I suggest we give him some space to fix it, and move on to actual issues of substance.
Fair enough. In that case, "Wild Keccak" doesn't provide sufficient substance over alternative algorithms to be worthy of much consideration, in-my-most-humblest-of-opinions-without-offending-Michael.
*shrugs* That's fine, but let's consider it on its actual merits instead of trading thinly-veiled sarcasm. :-)
+ WK offers faster block verification than CN.
- WK can be searched in parallel, scalably, limited only by the die area you want to devote to Keccak processing.
? WK requires an amount of fast storage that scales linearly with time -- how this interacts with Moore's law is complicated, and whether or not its scratchpad will be overtaken by lithographic advances is something my crystal ball can't handle.
-> At present, WK's parallelism is thus limited by DRAM bandwidth for 256-bit reads, because the scratchpad size has already slipped out of L3 cache on most CPUs.
As a result of this, the GPU/ASIC resistance of WK is determined almost entirely by its scratchpad. This is interesting -- it's like scrypt-adaptive-N with an automatic way of scaling the amount of memory required, but without the verification slowdown of increasing N.
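To make that concrete, here's a toy sketch of the "growing, read-only scratchpad" idea. This is emphatically not the real Wild Keccak -- the function names, round count, and mixing are all made up -- it just shows why per-hash work stays constant while the memory footprint tracks the chain:

```python
import hashlib
import struct

def build_scratchpad(block_hashes):
    """Toy scratchpad: concatenate per-block data so the footprint grows
    with chain length (and therefore with time). Hypothetical, not the
    real Wild Keccak scratchpad construction."""
    return b"".join(block_hashes)

def toy_wk_hash(header, nonce, scratchpad, rounds=32):
    """Toy WK-style hash: a fixed number of wide, *read-only* lookups into
    the scratchpad, mixed into a running Keccak (SHA-3) state. Per-hash
    work is constant no matter how large the scratchpad grows, and since
    nothing is written, parallel searchers can share a single copy."""
    state = hashlib.sha3_256(header + struct.pack("<Q", nonce)).digest()
    n_words = len(scratchpad) // 32                  # 256-bit (32-byte) words
    for _ in range(rounds):
        idx = int.from_bytes(state[:8], "little") % n_words
        word = scratchpad[idx * 32:(idx + 1) * 32]   # one 256-bit read
        state = hashlib.sha3_256(state + word).digest()
    return state
```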
- CN has poor block verification performance. This may have negative implications both for the time to bring new nodes online and for block-flooding DoS resistance.
+ CN's use of AES is well-matched to functionality already optimized in silicon on CPUs.
+ CN cannot be searched in parallel, to the best of my knowledge, without a corresponding increase in the number of 2MB scratchpads used for searching. This occurs because the scratchpad is modified during the search, a key differentiator from prior work such as scrypt (see the toy sketch below).
=> As a consequence, the parallelism available for CN is limited by die area for an L3-based approach, or by DRAM bandwidth with 128-bit reads for a DRAM-based approach.
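And here's the contrasting toy sketch for the CN side (again, not the real CryptoNight: no AES, made-up iteration count and mixing). The thing to notice is the write-back: because the scratchpad is mutated during the search, every parallel nonce needs its own ~2MB copy, whereas the read-only WK scratchpad above can be shared.

```python
import hashlib

SCRATCHPAD_BYTES = 2 * 1024 * 1024   # the ~2MB figure from above

def toy_cn_hash(header, nonce, iters=1024):
    """Toy CN-style hash: the scratchpad is read *and written* during the
    search, so it cannot be shared between nonces -- every parallel lane
    needs its own ~2MB copy. Not the real CryptoNight."""
    seed = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    # Per-nonce scratchpad, filled from the seed.
    pad = bytearray((seed * (SCRATCHPAD_BYTES // len(seed)))[:SCRATCHPAD_BYTES])
    state = seed
    n_slots = SCRATCHPAD_BYTES // 16
    for _ in range(iters):
        idx = int.from_bytes(state[:8], "little") % n_slots
        chunk = bytes(pad[idx * 16:(idx + 1) * 16])   # 128-bit read...
        state = hashlib.sha256(state + chunk).digest()
        pad[idx * 16:(idx + 1) * 16] = state[:16]     # ...and write back
    return state
```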
At the present time, GPUs are the most efficient way to mine both (though more so for WK). Both take a DRAM-bandwidth-based approach, storing the scratchpad(s) in RAM and using thread parallelism to mask the access latency. WK's use of 256-bit reads and Keccak makes it a little more GPU-friendly, but the major difference between the two on GPU is due to the use of AES in CN. On an ASIC, the "AES-is-in-hardware-on-x86" advantage disappears, and the 128- vs 256-bit differences will be only modestly important (I'd guess 30%, but that's pulled out of thin air).
edit: I should clarify this: CN's better GPU-vs-ASIC ratio disappears. The CPUs will still keep their advantage relative to the GPUs, so the CPU/ASIC ratio of CN should be a little better than the CPU/ASIC ratio of WK. But the GPU/ASIC ratio for both should be relatively similar, affected mostly by 128- vs 256-bit DRAM reads.
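If anyone wants to play with the DRAM-bandwidth intuition, here's the back-of-envelope model I'm using. All the numbers are placeholders I made up, not measurements. It gives two ceilings -- one limited by useful bytes moved per hash, one limited by DRAM bursts issued per hash -- and real throughput sits at or below both, which is why the true 128-vs-256-bit gap lands somewhere between "no difference" and 2x, and why my 30% is only a guess:

```python
def hashrate_ceilings(bandwidth_gb_s, reads_per_hash, read_bytes, burst_bytes=32):
    """Two toy ceilings for a DRAM-bandwidth-bound miner:
      byte-limited:  bandwidth / useful bytes touched per hash
      burst-limited: bandwidth / (DRAM bursts issued per hash * burst size)
    Real throughput sits at or below both."""
    byte_limited = bandwidth_gb_s * 1e9 / (reads_per_hash * read_bytes)
    bursts_per_read = (read_bytes + burst_bytes - 1) // burst_bytes
    burst_limited = bandwidth_gb_s * 1e9 / (reads_per_hash * bursts_per_read * burst_bytes)
    return byte_limited, burst_limited

# Placeholder numbers, not measurements: ~300 GB/s of GPU memory bandwidth
# and an assumed 1000 scratchpad reads per hash.
print(hashrate_ceilings(300, 1000, 16))   # 128-bit reads (CN-style)
print(hashrate_ceilings(300, 1000, 32))   # 256-bit reads (WK-style)
```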
So: I do see a major advantage to Wild Keccak at this time in terms of its fast verification coupled with near-term relative CPU/GPU/ASIC balance. But it's an advantage that depends heavily on longer-term technology trends of continued lithographic scaling. Both schemes are vulnerable to large jumps in DRAM bandwidth enabled by future technologies such as TSV-stacked DRAM, but that's a little farther out in the crystal ball, and my best guess is that the manufacturing difficulties of stacked DRAM won't be ironed out to the point where it would be usable for cheap crypto mining in the next 5 years.
To me, this boils down to whether or not there are effective flooding DoS measures that can be implemented for the Cryptonotes without requiring a block verification. There probably are. But it's nice that WK makes it harder to mount *this particular* computational attack against nodes. It'd be interesting for someone to poke at that a little and see how bad the problem really is, instead of us speculating into the air about it.
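Just to make concrete the kind of measure I mean, here's one hypothetical shape it could take -- purely illustrative, not something any CryptoNote implementation actually does: per-peer rate limiting plus cheap structural checks before the node pays for a full PoW verification.

```python
import time
from collections import defaultdict

class BlockFloodGuard:
    """Hypothetical pre-filter, purely for illustration: cheap structural
    checks plus per-peer rate limiting before the node spends CPU on a
    full PoW verification. Not taken from any CryptoNote codebase."""

    def __init__(self, max_unverified_per_minute=5):
        self.limit = max_unverified_per_minute
        self.recent = defaultdict(list)            # peer_id -> timestamps

    def should_verify(self, peer_id, block):
        if not self._cheap_checks(block):
            return False                           # reject before doing PoW work
        now = time.time()
        window = [t for t in self.recent[peer_id] if now - t < 60.0]
        if len(window) >= self.limit:
            return False                           # this peer is flooding us
        window.append(now)
        self.recent[peer_id] = window
        return True

    @staticmethod
    def _cheap_checks(block):
        # e.g. plausible size and a known parent -- all much cheaper than
        # computing a CryptoNight hash.
        return block.get("size", 0) < 1_000_000 and "prev_hash" in block
```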
There's also the issue of verification speed, which with the current implementation of Cryptonote *is* an issue, though I think we both agree that it's "just" a matter of engineering. Again - it's nice that the WK design reduces the need for extra attention paid to high-performance coding, because trying to make things fast and correct is harder than just making them correct - but this is also something that can be resolved empirically.