And yes, AES-NI is already widely deployed.
My understanding is Intel has been on 22nm while ASICs have been at 28 or 40nm (?), so Intel has an inherent advantage if it decides to make a ubiquitously used instruction more efficient. As encryption becomes more widely used on everyone's computers, Intel will have an economic incentive to make AES-NI (and perhaps SHA-2?) more efficient.
You're right that the AES-NI unit on an Intel CPU will be faster than on any ASIC, but you'll never have thousands of such units on a CPU the way you could on a dedicated ASIC.
Thousands of AES units won't do anything for you. This is not a Bitcoin-style PoW that simply uses AES (as a hash function) instead of SHA; most of the Intel chip is being used for mining. In fact one might surmise (in part, of course) that the algorithm was designed by looking at the layout of an Intel CPU and working backwards from there.
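To make that concrete, here is a minimal sketch of what a CryptoNight-style memory-hard inner loop looks like using AES-NI intrinsics. This is illustrative only, not the actual Monero algorithm: the scratchpad size, iteration count, key schedule, and mixing step are all assumptions of mine chosen to show the structure. The point is that each iteration depends both on the previous iteration and on a pseudo-random location in a ~2MB scratchpad.

```c
/* Minimal sketch of a memory-hard mixing loop in the spirit of
 * CryptoNight. NOT the real algorithm: sizes, constants, and the
 * mixing step are illustrative assumptions.
 * Compile with: gcc -O2 -maes sketch.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdlib.h>

#define SCRATCHPAD_BYTES (2u * 1024 * 1024)  /* sized to live in L3 cache */
#define ITERATIONS       (1u << 19)

static void memory_hard_mix(uint8_t *pad, __m128i state, __m128i key)
{
    for (uint32_t i = 0; i < ITERATIONS; i++) {
        /* Data-dependent offset: the next access can't be predicted or
         * prefetched, because it depends on the current state. */
        uint32_t idx = (uint32_t)_mm_cvtsi128_si32(state)
                       & (SCRATCHPAD_BYTES - 16);
        __m128i *slot = (__m128i *)(pad + idx);
        /* One hardware AES round mixes the state with the pad word. */
        state = _mm_aesenc_si128(_mm_xor_si128(state,
                                 _mm_loadu_si128(slot)), key);
        /* Write back so later iterations depend on this one too. */
        _mm_storeu_si128(slot, state);
    }
}

int main(void)
{
    uint8_t *pad = calloc(SCRATCHPAD_BYTES, 1);
    if (!pad) return 1;
    memory_hard_mix(pad, _mm_set1_epi32(0x12345678),
                         _mm_set1_epi32(0x9abcdef0));
    free(pad);
    return 0;
}
```

A serial dependency chain through memory is exactly what extra AES units cannot shortcut: a second unit has nothing to do until the first finishes, and every parallel hashing engine would need its own fast 2MB of memory.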
Ah, that's interesting, I wasn't aware. Thanks for explaining!
smooth, his point is still valid in the sense that if the L3 cache is implementable on an ASIC, then many of the CPU's transistors may go unused, in which case the ASIC might in theory pack more hashrate onto the same die area, or deliver the equivalent hashrate at a lower hardware cost. And it is possible that the specific memory access pattern of Monero's proof-of-work hash could be optimized on an ASIC compared to the generalized L3 cache on the CPU. This would need much more investigation, but realize that caching is a very wide topic, e.g. consider the variable of set-associativity (see the sketch below). Be very careful making blanket statements, because the devil is in the details.
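As a concrete illustration of why set-associativity is a real variable, here is how an address maps into a set-associative cache. The parameters (64-byte lines, 16 ways, 2MB) are assumptions roughly in the range of an Intel L3 slice, not the spec of any particular part.

```c
/* How an address maps into a set-associative cache.
 * Parameters are illustrative assumptions, not any specific CPU. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES  64
#define WAYS        16
#define CACHE_BYTES (2u * 1024 * 1024)
#define NUM_SETS    (CACHE_BYTES / (LINE_BYTES * WAYS))  /* 2048 sets */

int main(void)
{
    uint64_t addr = 0x7f3a12c40ULL;    /* arbitrary example address */
    uint64_t line = addr / LINE_BYTES; /* which cache line */
    uint64_t set  = line % NUM_SETS;   /* which of the sets it falls in */
    uint64_t tag  = line / NUM_SETS;   /* tag stored for the lookup */
    printf("set %llu, tag %#llx (%u sets, %u ways each)\n",
           (unsigned long long)set, (unsigned long long)tag,
           (unsigned)NUM_SETS, (unsigned)WAYS);
    return 0;
}
```

An ASIC designer is free to change any of these parameters, or to drop the tags and comparators entirely and use a plain direct-addressed scratchpad RAM, which is exactly the kind of detail that would need measuring before making blanket claims either way.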
Yet my point was that if Intel's lithography[1] is at 22nm while the ASICs are at 28 or 40nm (due, I assume, to Intel's higher economies of scale and the consequent long-term investment schedule), then in terms of hashes per watt, Intel might have a potential advantage should it attempt to maximize some instruction, e.g. the aesenc instruction. Such a tightly encapsulated instruction may have far fewer avenues for optimization than the usage patterns of the L3 cache.
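For what it's worth, the headroom on aesenc is easy to probe from software. The hedged sketch below compares a latency-bound dependent chain against a throughput-bound set of independent chains; the chain count and iteration figure are arbitrary choices of mine, and clock()-based timing is crude, but the ratio between the two loops gives a rough feel for how deeply the unit is pipelined.

```c
/* Rough microbenchmark sketch: aesenc latency vs. throughput.
 * Iteration counts and timing method are crude, illustrative choices.
 * Compile with: gcc -O2 -maes bench.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N 100000000u

int main(void)
{
    __m128i k = _mm_set1_epi32(0x1234);

    /* Latency-bound: each aesenc depends on the previous result. */
    __m128i a = _mm_set1_epi32(1);
    clock_t t0 = clock();
    for (uint32_t i = 0; i < N; i++)
        a = _mm_aesenc_si128(a, k);
    double dep = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Throughput-bound: four independent chains keep the pipeline full. */
    __m128i b0 = _mm_set1_epi32(1), b1 = _mm_set1_epi32(2),
            b2 = _mm_set1_epi32(3), b3 = _mm_set1_epi32(4);
    t0 = clock();
    for (uint32_t i = 0; i < N; i++) {
        b0 = _mm_aesenc_si128(b0, k);
        b1 = _mm_aesenc_si128(b1, k);
        b2 = _mm_aesenc_si128(b2, k);
        b3 = _mm_aesenc_si128(b3, k);
    }
    double ind = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Keep results live so the compiler doesn't delete the loops. */
    volatile int sink = _mm_cvtsi128_si32(
        _mm_xor_si128(_mm_xor_si128(a, b0),
                      _mm_xor_si128(_mm_xor_si128(b1, b2), b3)));
    (void)sink;
    printf("dependent chain: %.2fs, 4 independent chains: %.2fs\n",
           dep, ind);
    return 0;
}
```

If the second loop takes about as long as the first despite doing 4x the work, the unit is already well pipelined and the remaining optimization headroom is mostly in latency and power, not raw throughput.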
Electricity consumption may be the more significant factor, especially given the point I made earlier: if home miners mine at a loss, then ASIC farms would mine at a loss too, provided the home mining setup is equivalent in power efficiency.
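A toy break-even calculation makes that dependency explicit. Every number below (hashrate, power draw, electricity price, revenue per hash) is a made-up placeholder; the point is only that profit per hash reduces to joules per hash times the price of electricity, which is the same equation for a home PC and an ASIC farm.

```c
/* Toy mining break-even calculation. Every number here is a
 * made-up placeholder; only the structure of the formula matters. */
#include <stdio.h>

int main(void)
{
    double hashes_per_sec  = 500.0;   /* hypothetical device hashrate */
    double watts           = 100.0;   /* hypothetical power draw */
    double usd_per_kwh     = 0.12;    /* hypothetical electricity price */
    double usd_per_hash    = 1.0e-8;  /* hypothetical revenue per hash */

    double joules_per_hash = watts / hashes_per_sec;
    double cost_per_hash   = joules_per_hash * usd_per_kwh / 3.6e6; /* J->kWh */
    double profit_per_hash = usd_per_hash - cost_per_hash;

    printf("cost/hash = %.3g USD, profit/hash = %.3g USD\n",
           cost_per_hash, profit_per_hash);
    return 0;
}
```

Two setups with the same joules per hash cross into losses at the same electricity price, regardless of how many machines are racked up.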
And you've still got all the latent power issues on a CPU, with so many blocks that aren't being utilized but are still powered on. Intel has been working hard on finer-grained power-gating of unused modules, as they realize power consumption is extremely important.
The CPU has an inherent disadvantage in that it is designed as a general-purpose computing device, so it can't be as specialized at any one computation as an ASIC can be.
There is much more that has to be investigated and I can find nearly nothing on Monero's proof-of-work hash. No benchmarking. No detailed whitepaper. Nothing but the source code.
[1] Lithography on silicon isn't consistently that small. That progress stalled somewhere in the realm of 30nm or so; other tricks (even 3D transistors) have been employed to give the appearance of what true 22nm geometry would provide. Intel has had delays moving to "14nm".