It's not surprising, however, given the fact that I'm up against professional cryptographers.
Just to let you guys know, the new algo involves a modified version of Wagner's algorithm,
which I call "castrated Wagner's," with reduced memory bandwidth without compromising
algorithm binding. My brain is fried from thinking day and night about this problem and its many gotchas,
but we are nearly at the end of the tunnel.
You've piqued my curiosity. I never gave much thought to optimizing Wagner's algorithm, as I thought the algorithm binding limited what you could do. Maybe I'll take a break from the AMD GCN assembler docs and look back at the Equihash paper.
p.s. You know your brain is working hard when you get a headache from just thinking!
After a day of racking my brain, every faster version of the algorithm I've come up with results in close to 0 solutions. I've had ideas that might improve only round 0 or round 8, but nothing that would make a material difference for the algorithm as a whole. I'm not quite confident enough to say it's impossible to significantly optimize the algorithm, but I'm going back to reading the AMD GCN architecture docs, which I know can help optimize the implementation.
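For anyone following along, here is a toy sketch of the round structure being discussed. It is not the modified algorithm from this thread, and not a real miner: the parameters (n=24, k=3), the helper names `toy_hash` and `toy_wagner`, and the way the hash input is built are all assumptions chosen so it runs instantly. Real Equihash with k=9 has rounds 0 through 8 and a different input layout.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def toy_hash(seed: bytes, i: int, n: int) -> int:
    """Hypothetical n-bit hash of index i (n <= 64), built from BLAKE2b."""
    digest = hashlib.blake2b(seed + i.to_bytes(4, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - n)


def toy_wagner(seed: bytes, n: int = 24, k: int = 3):
    """Return tuples of 2**k distinct indices whose n-bit hashes XOR to zero."""
    assert n % (k + 1) == 0
    c = n // (k + 1)          # bits cancelled per round
    count = 1 << (c + 1)      # initial list size, 2**(c+1)
    # Each row is (remaining hash value, tuple of contributing indices).
    rows = [(toy_hash(seed, i, n), (i,)) for i in range(count)]
    for rnd in range(k):
        remaining = n - rnd * c   # bits not yet cancelled
        # Early rounds collide on the top c remaining bits;
        # the last round collides on all 2c bits that are left.
        shift = 0 if rnd == k - 1 else remaining - c
        buckets = defaultdict(list)
        for value, idx in rows:
            buckets[value >> shift].append((value, idx))
        nxt = []
        for bucket in buckets.values():
            for (va, ia), (vb, ib) in combinations(bucket, 2):
                # Skip pairs that reuse an index.
                if set(ia).isdisjoint(ib):
                    nxt.append((va ^ vb, ia + ib))
        rows = nxt
    return [idx for value, idx in rows if value == 0]


if __name__ == "__main__":
    for solution in toy_wagner(b"example seed"):
        print(solution)
```

With these toy parameters each round keeps the list at roughly its starting size, and the last round only leaves a couple of solutions on average. That may be one way to see why shortcuts tend to end near 0 solutions: anything that drops rows in an early round shrinks every later list, and there is very little slack left by the final round.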