I only wanted to make clear there's a need for improvement.
At the end of the day, HW alone does nothing and SW alone does very little. NV certainly has the better ecosystem around it right now.
Imagine putting a square peg in a round hole. To do this, GPUs split the peg into pieces, pass each smaller piece through, then put them back together on the other side.
AMD believes this is not an efficient use of transistors, and this is one reason they won the console war again. NV, by contrast, believes they should optimize the worst case, and this is what they do.
Chained hashing is even worse: you have taped several pegs together!
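To make the "taped pegs" picture concrete, here is a minimal sketch of what chained hashing means: the digest of one function becomes the input of the next, so every stage has to be pushed through the hardware in turn. The stage list below is an assumption for illustration only. X11 chains eleven different functions, most of which are not in Python's hashlib, so standard 512-bit functions stand in for them, and `chained_hash` is a made-up name.

```python
import hashlib

# Stand-in stages, NOT the real X11 list: these are just the 512-bit
# functions hashlib ships, used here to show the chaining structure.
STAGES = [hashlib.blake2b, hashlib.sha3_512, hashlib.sha512]

def chained_hash(data: bytes) -> bytes:
    """Feed the digest of each stage into the next one."""
    digest = data
    for stage in STAGES:
        # Each stage is a different algorithm with its own state layout,
        # so an implementation has to be written and tuned for every one.
        digest = stage(digest).digest()
    return digest

if __name__ == "__main__":
    print(chained_hash(b"block header bytes").hex())
```

Every extra stage is another kernel to write and tune, which is why the cost grows with the length of the chain.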
It is possible - and perhaps even likely - that many already have the improved kernels. If they have a room full of GPUs, that is a consistent advantage. If you have one, not so much.
Not all algorithms can be improved. Some are so simple they're already close to efficient... it's on a case-by-case basis. For example, Echo is nearly optimal. I honestly don't like X11/X13 much. The main problem I see is: it is inefficient for GPU users, it is inefficient for a small company doing FPGAs... it is a pipe dream for a big investor who already wants to roll out its own ASIC (they are shelling out seven digits, so paying four engineers instead of one isn't much of a problem). I simply don't like the idea, but it's a hash like everything else out there.