Scrypt's inner loop uses the very efficient Salsa20 core (ChaCha is nearly identical) from Bernstein, not SHA256, and it has a very low count of ALU operations. Afaik, Litecoin (which uses Scrypt) has ASICs, which I assume have displaced CPU and GPU mining. I haven't been following Litecoin since the ASICs were rumored to be released.
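For concreteness, here is the Salsa20/8 core as scrypt uses it inside BlockMix (this is the standard construction, nothing Litecoin-specific). The entire round is 32-bit add, xor, and rotate, with no S-boxes or multipliers, which is exactly why it maps to so little silicon:

```c
#include <stdint.h>

#define R(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* Salsa20/8 core: 4 double rounds over a 64-byte block held as sixteen
   32-bit words.  Every operation below is a plain ALU add/xor/rotate. */
static void salsa20_8(uint32_t B[16])
{
    uint32_t x[16];

    for (int i = 0; i < 16; i++)
        x[i] = B[i];

    for (int i = 0; i < 8; i += 2) {
        /* column round */
        x[ 4] ^= R(x[ 0] + x[12],  7);  x[ 8] ^= R(x[ 4] + x[ 0],  9);
        x[12] ^= R(x[ 8] + x[ 4], 13);  x[ 0] ^= R(x[12] + x[ 8], 18);
        x[ 9] ^= R(x[ 5] + x[ 1],  7);  x[13] ^= R(x[ 9] + x[ 5],  9);
        x[ 1] ^= R(x[13] + x[ 9], 13);  x[ 5] ^= R(x[ 1] + x[13], 18);
        x[14] ^= R(x[10] + x[ 6],  7);  x[ 2] ^= R(x[14] + x[10],  9);
        x[ 6] ^= R(x[ 2] + x[14], 13);  x[10] ^= R(x[ 6] + x[ 2], 18);
        x[ 3] ^= R(x[15] + x[11],  7);  x[ 7] ^= R(x[ 3] + x[15],  9);
        x[11] ^= R(x[ 7] + x[ 3], 13);  x[15] ^= R(x[11] + x[ 7], 18);
        /* row round */
        x[ 1] ^= R(x[ 0] + x[ 3],  7);  x[ 2] ^= R(x[ 1] + x[ 0],  9);
        x[ 3] ^= R(x[ 2] + x[ 1], 13);  x[ 0] ^= R(x[ 3] + x[ 2], 18);
        x[ 6] ^= R(x[ 5] + x[ 4],  7);  x[ 7] ^= R(x[ 6] + x[ 5],  9);
        x[ 4] ^= R(x[ 7] + x[ 6], 13);  x[ 5] ^= R(x[ 4] + x[ 7], 18);
        x[11] ^= R(x[10] + x[ 9],  7);  x[ 8] ^= R(x[11] + x[10],  9);
        x[ 9] ^= R(x[ 8] + x[11], 13);  x[10] ^= R(x[ 9] + x[ 8], 18);
        x[12] ^= R(x[15] + x[14],  7);  x[13] ^= R(x[12] + x[15],  9);
        x[14] ^= R(x[13] + x[12], 13);  x[15] ^= R(x[14] + x[13], 18);
    }

    for (int i = 0; i < 16; i++)
        B[i] += x[i];
}
```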
Yes, I was referring to the gains from SHA256 ASICs. I'm less familiar with Scrypt ASICs in all respects (how much they cost, how much performance they gain, etc.).
But as you are aware, Scrypt is designed in a manner that makes the time-memory trade-off (TMTO) particularly trivial to implement. Even then it isn't necessarily the case that a Scrypt ASIC would pay off (as you said). It seems they do, but that is an empirical result that depends on the particulars of the algorithm. It is not the case that the lookup-table approach is equally trivial for read-write scratchpads. Still feasible, possibly, but it isn't obvious.
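To make "trivial TMTO" concrete, here is a sketch of the classic trade-off against scrypt's SMix: store only every K-th scratchpad entry and regenerate the rest by recomputing forward from the nearest checkpoint. The block_mix and integerify below are toy placeholders of my own (not scrypt's real primitives); only the structure matters:

```c
#include <stdint.h>
#include <string.h>

#define N     1024   /* scratchpad entries (tiny here; real scrypt uses more) */
#define K     8      /* keep every K-th entry: 1/K the memory */
#define WORDS 16     /* one 64-byte block, as with scrypt r = 1 */

/* Toy stand-ins for scrypt's primitives -- placeholders, NOT the real ones. */
static void block_mix(uint32_t x[WORDS])
{
    for (int w = 0; w < WORDS; w++)
        x[w] = x[w] * 2654435761u + (uint32_t)w + x[(w + 1) % WORDS];
}
static uint32_t integerify(const uint32_t x[WORDS]) { return x[0]; }

static uint32_t checkpoints[N / K][WORDS];

/* Regenerate V[j] by recomputing forward from the nearest checkpoint. */
static void lookup_v(uint32_t j, uint32_t out[WORDS])
{
    uint32_t x[WORDS];
    memcpy(x, checkpoints[j / K], sizeof x);
    for (uint32_t i = j - (j % K); i < j; i++)
        block_mix(x);                  /* (K-1)/2 extra steps on average */
    memcpy(out, x, sizeof x);
}

void smix_tmto(uint32_t X[WORDS])
{
    /* First loop: store only 1/K of the scratchpad. */
    for (uint32_t i = 0; i < N; i++) {
        if (i % K == 0)
            memcpy(checkpoints[i / K], X, sizeof checkpoints[0]);
        block_mix(X);
    }
    /* Second loop: each pseudo-random read is served by recomputation. */
    for (uint32_t i = 0; i < N; i++) {
        uint32_t v[WORDS], j = integerify(X) % N;
        lookup_v(j, v);
        for (int w = 0; w < WORDS; w++)
            X[w] ^= v[w];
        block_mix(X);
    }
}
```

Memory shrinks by a factor of K at the cost of roughly (K-1)/2 extra block_mix calls per lookup, and it works precisely because V[j] depends only on V[j-1]. With a read-write scratchpad, the entry you need may since have been overwritten along a data-dependent path, so there is no equivalent straight-line recomputation, which is the point above.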
Indeed: making the hash more compute-intensive, so that it is less trivial for an ASIC to convert it to compute bound, thereby makes it compute bound in the first place. Hehe. See where I ended up?
Even if you are using dedicated hardware inside the CPU, the benefit must to some extent be diluted by accessing it through generalized pipelines, three levels of cache (L1, L2, L3), and all the glue logic that makes those parts of the CPU serve its general-purpose use case. It is very unlikely that you can isolate things such that you activate only the same number of transistors that an ASIC would devote to the specialized silicon.
Clearly true, but it is less clear that there is anything yielding "orders of magnitude" here, nor, again, that the gain is enough to offset the costs of trying to reduce latency.
Apologies; perhaps the plural "orders" in my prior post is what you meant by an overly generalized statement, and in that case you are literally correct.
When I write "orders" in the plural, I include the singular as one of the applicable cases.
I am more inclined to believe both Cryptonight and my pure AES-NI design are within 1 to 2 orders of magnitude of any possible ASIC optimization, in terms of the relevant parameters of Hash/watt and Hash/$ of capital outlay.
I would only argue that a more complex approach such as the one Cryptonight uses has a greater risk of slipping a bit, maybe into the 1 to 2 orders range. Again, we are just making very rough guesses based on plausible theories.
We try to minimize the advantage the ASIC will have, but we cannot be at parity with the ASIC.
Of course, it is a given that you can take some things off the chip and yield an ASIC with better marginal economics. But equally, ASICs can (probably) never equal the economies of scale of general-purpose CPUs, either in production or in R&D. So the actual numbers matter. If you can convincingly show an orders-of-magnitude gain, then it is easy to dismiss the rest. Otherwise not.
I almost mentioned Intel's superior fabs, but I didn't because, for one thing, Intel has been failing to meet deadlines in recent years. Moore's law appears to be failing.
If we are talking about a $4 billion Bitcoin market cap, then yeah. But if we are talking about a $10 trillion crypto-coin world, then maybe not so much any more.
Agreed the devil is in the details.
I think what I am trying to argue is that I've aimed to stay below a 2 orders-of-magnitude advantage for the ASIC, and, by being very conservative about which CPU features are combined, hopefully below 1 order of magnitude. And that is enough for me in my consensus network design, because I don't need economic parity with ASICs in order to eliminate the profitability of ASICs in my holistic coin design.
And I am saying that the extra CPU features Cryptonight leverages seem to add more risk, not less (not only the risk of ASICs but the risk that someone else finds a way to optimize the source code). So I opted for simplicity and clarity instead. AES benchmarks against ASICs are published; Cryptonight-versus-ASIC benchmarks don't exist. I prefer the marketing argument that is easier to support.
In order to make it uneconomical for the ASIC to convert the latency-bound hash to compute bound, you end up making your "memory-hard" hash compute bound in the first place! That was a major epiphany.
I don't think you have shown it is economical, only that it is possible. In fact you have specifically agreed that the gain from ASICs vs. in-CPU dedicated circuits is likely relatively small, which is why you propose using the in-CPU circuits!
I think perhaps you are missing my point. In order to make it merely "possible" and not likely for the ASIC to convert the latency-bound CPU hash to compute bound, you likely end up making the CPU hash compute bound anyway, by adding more computation between latency accesses, as I believe Cryptonight has done. I haven't benchmarked it, though.
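A rough way to state that trade-off, with illustrative numbers of my own choosing:

```latex
t_{\mathrm{hash}} \approx n \cdot \max(t_{\mathrm{mem}},\ t_{\mathrm{alu}})
```

If each of the n scratchpad accesses stalls for, say, ~100 ns of DRAM latency, then deterring the TMTO means inserting enough arithmetic that t_alu between accesses approaches that same ~100 ns. At that point, shrinking t_mem (the ASIC's whole latency advantage) barely changes t_hash: the hash has become compute bound by construction.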
So that is one reason why I discarded all that latency-bound circuitry and just applied K.I.S.S., relying on the documented less-than-one-order-of-magnitude advantage for ASICs w.r.t. AES-NI. And then I also did one innovation better than that (secret).
Thus why waste time pretending the memory-hard complexity is helping you? If anything it is likely hurting you. Just go directly to using AES-NI for the hash, which is what I did in spades.
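For the record, here is what "using AES-NI for the hash" means mechanically. This is only a minimal sketch of the primitive in a Davies-Meyer-style chain, with a made-up name (aesni_mix); it is not my actual (secret) design and not a vetted hash:

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2: _mm_xor_si128 */
#include <wmmintrin.h>   /* AES-NI: _mm_aesenc_si128; compile with -maes */

/* Each _mm_aesenc_si128 executes one full AES round (SubBytes,
   ShiftRows, MixColumns, AddRoundKey) in dedicated silicon, so an
   ASIC has comparatively little left to strip away. */
static __m128i aesni_mix(__m128i state, const __m128i *msg, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        __m128i t = _mm_xor_si128(state, msg[i]);   /* absorb 16-byte block */
        t = _mm_aesenc_si128(t, state);             /* two AES rounds keyed */
        t = _mm_aesenc_si128(t, msg[i]);            /* by chaining values   */
        state = _mm_xor_si128(t, state);            /* feed-forward         */
    }
    return state;
}
```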
Because I think this more clearly loses to economies of scale: an ASIC can replicate the relatively small AES hardware many, many times while shedding the overhead of packaging, of instruction decoding inside a CPU (you can never eliminate all of that overhead, no matter how "tight" your algorithm is), and of sitting inside a computer (memory, fans, batteries, I/O, etc.) with a relatively tiny hash rate, compared to a chip with 1000 AES circuits inside a mining rig with 100 such chips.
In terms of Hashes/capital outlay, yes. But that is not necessarily true for Hashes/watt.
Even you admitted upthread that the 40 Gbytes/sec for an AES ASIC might not be an order-of-magnitude advantage in terms of Hashes/watt. I had seen some stats on relative Hashes/watt, but I've forgotten what they said. I will need to dig through my past notes for them.
But for our application, I wasn't concerned about the capital cost, because the user's capital cost is $0: they already own a CPU.
I was supremely focused on Hashes/watt, because I know that if I can get users to mine, and they don't notice a 1% increase in their electricity bill, then their perceived Hashes/watt is effectively infinite.
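Purely illustrative arithmetic (the 900 kWh/month household is an assumption, not data):

```latex
0.01 \times \frac{900\ \text{kWh/month}}{730\ \text{h/month}} \approx 12\ \text{W}
```

A 1% bump on such a bill is roughly 12 W of continuous draw, plausibly in the range of what one busy core's AES units add, so the cost sits below the user's perception threshold and the perceived denominator of Hashes/watt is effectively zero.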
Also, the rest of the user's CPU isn't tied up by the hashing, so they can mine and work simultaneously. And thus I make ASICs relatively unprofitable. They may still be able to make a "profit" if the coin value is rising relative to the hashrate invested (which would be so awesome if you have 100 million users mining), but the ASICs likely cannot dominate the network hashrate. Or at least they can't make it totally worthless for a user to mine, if there is perceived value in microtransactions and users don't want the hassle of an exchange just to use a service they want that only accepts microtransactions. But even that is not the incentive I count on to get users to mine.
There is a lot of holistic economics involved in designing your proof-of-work hash. It is very complex.
Memory gives physical size (and also its own cost, but mostly just size), which makes it impossible to replicate to whatever arbitrary degree is needed to amortize product-type and packaging overhead.
Agreed, if Hashes/capital outlay is your objective, but you may also worsen your Hashes/watt relative to the ASIC. Maybe not, but again there are no Cryptonight vs. ASIC benchmarks, so we can't know. There are AES benchmarks.
There may be economic arguments, apart from mining efficiency, for why people mine on computers vs. mining rigs. But if those exist, they must not rely exclusively on pure raw efficiency (because that will clearly lose, by the argument in the previous paragraph) but on efficiency being "close enough" for the other factors to come into play. So again, the numbers matter. And once you resort to "close enough", then yet again the numbers matter.
Yup. Well stated: as we get closer to parity on efficiency, all sorts of game-theoretic economic scenarios come into play. That is what I meant by saying it is complex.
Anyway, I won't be surprised to be proven "wrong" (not really wrong, because I'm not saying any of this won't work, just that it isn't clear that it will work), but I will be surprised if the market cap at which Cryptonight ASICs appear isn't a lot higher than it was for Scrypt ASICs (adjusting for differences in time period and such).
Well, no one can, for any circuit of reasonable complexity implemented on the complex circuitry of a CPU, which is somewhat of a black box. Bernstein even complained recently that Intel isn't documenting all its instruction timings.
The basic problem, as I see it, is that the only thing Cryptonote does to deal with ASICs is the proof-of-work hash. It doesn't look at other aspects of the system for ways to drive away ASICs. And in some sense it shouldn't, since Cryptonote, like all existing crypto, is subject to a 51% attack and thus needs as much hashrate as it can get (up to some level).
EDIT: there is actually a real-world example of in-CPU crypto alone not being enough to compete with ASICs, and that is Intel's SHA extensions. They are not competitive with Bitcoin mining ASICs.
Seems I had read something about that, where Intel hadn't done the circuit work necessary to be competitive. As you said, SHA is a complex algorithm. If there is some instruction which is fundamental and highly optimized, e.g. a 64x64 multiply, it might be as optimized as it can get in silicon. I doubt Intel has fully optimized anything to that degree, but apparently on the AES benchmarks they got close.
It seems to me that a slower SHA hash is not something that can slow your computer to a crawl, whereas with full-disk encryption Intel needs AES to be very fast.