the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?)
No, these are.
Unfortunately, bounties don't seem to work, especially if you're looking to (dis)prove security with them. I think that's sad, because your ideas are probably the least ill-advised in the memory-hard workfunction space. (Though, seeing that you're now using SipHash internally, I might worry some about your approximation-freeness...)
I say "least ill-advised" with no intended insult to your work: I'm concerned that the whole notion of memory hardness as a useful part of a work function may be ill-advised. If anyone is looking for a grant to research the implications of how "memory hardness" moves costs from energy to die area, and whether it creates differential advantages for attackers or defenders as die-area costs are amortized over long-term operation, in either the PoW-blockchain model or the password-strengthening KDF model, I can very likely find funding for that (especially for the KDF model).
It is becoming conventional wisdom that memory-hard KDFs are "well advised", but the analysis in the scrypt paper ignores operating costs. I talked to Colin Percival, and it seems he didn't really have enough visibility into the hardware low levels to even begin to estimate the energy impact. Experience with mass-produced computing hardware on modern processes (e.g. GPUs) suggests that power costs exceed retail prices, let alone manufacturing costs, in only a couple of months of operation. So I'm personally pretty worried that memory-hard functions go in the wrong direction, creating amortization advantages for attackers compared to functions whose large-scale execution ends up thermally bound. But it's an informal concern, and I have no idea where the breakpoints are at which this would start to strongly favor attackers (or centralization). At a minimum, existing "cost" arguments for memory-hard functions in some applications ignore a majority of the cost (energy), so more research here could be pretty interesting.
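To make the amortization concern above concrete, here's a back-of-envelope sketch of how long a miner runs before cumulative energy spend overtakes the up-front hardware cost. All of the dollar and wattage figures are hypothetical placeholders I've picked for illustration, not measurements from any real device:

```python
def months_until_energy_exceeds_hardware(hardware_cost_usd,
                                         power_watts,
                                         usd_per_kwh):
    """Months of 24/7 operation until cumulative electricity cost
    equals the up-front hardware cost."""
    kwh_per_month = power_watts / 1000.0 * 24 * 30
    monthly_energy_cost = kwh_per_month * usd_per_kwh
    return hardware_cost_usd / monthly_energy_cost

# A compute-bound chip near manufacturing cost: say $50 of silicon
# drawing 250 W at $0.10/kWh -- energy dominates within months.
print(months_until_energy_exceeds_hardware(50, 250, 0.10))   # ~2.8 months

# A memory-hard miner shifts cost toward die area: say $400 of
# DRAM drawing only 50 W -- the hardware cost is amortized over years.
print(months_until_energy_exceeds_hardware(400, 50, 0.10))   # ~111 months
```

The point of the toy model is just that the crossover point moves by an order of magnitude or more depending on the energy/area split, which is exactly the regime where long-running attackers get an amortization edge.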
The other thing to stare at, in a more modern sense than the old CRAM (or Berkeley's IRAM), is the effect that TSV-stacked DRAM [1] will have on memory-hard PoW functions. This is something I don't think the current set of designers appreciates to the extent necessary. The "CryptoNight" algorithm (Bytecoin, Monero) is an example of this: it's a gorgeous design from the perspective of matching today's CPUs well, but its ASIC parallelism on a single chip is, at present, either area-limited by putting the 2MB scratchpad on die, or limited by the external pin bandwidth needed to stream all of the accesses to an external DRAM. A TSV-stacked DRAM architecture would blow the lid off of those limits.
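A rough sense of the pin-bandwidth limit: if every scratchpad access has to cross the chip boundary to external DRAM, the required bandwidth scales directly with hash rate. The per-hash constants below are my approximations of CryptoNight's main memory loop (2MB scratchpad, on the order of 2^19 iterations, roughly one 16-byte read and one 16-byte write each), so treat the exact numbers as illustrative:

```python
SCRATCHPAD_BYTES = 2 * 1024 * 1024  # 2 MB scratchpad per hash instance
ITERATIONS = 2 ** 19                # main-loop iterations per hash (approx.)
BYTES_PER_ITER = 2 * 16             # ~one 16-byte read + one 16-byte write

def external_bandwidth_gbps(hashes_per_second):
    """GB/s of DRAM traffic needed to sustain a given hash rate when
    every scratchpad access goes off-chip instead of to on-die SRAM."""
    return hashes_per_second * ITERATIONS * BYTES_PER_ITER / 1e9

# Even a modest 1,000 H/s chip would need roughly 17 GB/s of external
# bandwidth -- trivial for a TSV-stacked DRAM interface, painful for a
# conventional pin-limited package.
print(external_bandwidth_gbps(1000))
```

That's why stacking changes the picture: the bandwidth wall that currently caps single-chip parallelism simply goes away.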
gmaxwell, my starving-academic-researcher hat forces me to wonder to what degree you're interested in funding this general area of memory-hard PoW evaluation. I've been playing in the space for fun (and, ahh, personal profit by doing some fast-miner engineering), but I'm reaching the point where I feel like I have to either turn my bitcoin-related exploration into a legitimate research project, or quit. If you think it'd be worth your time, drop me a note: [email protected]. I'd love to actually throw some of my real day-job effort at this area. I bet I could lure one of my architecture colleagues into it, if the problem proved interesting enough.
Re bounties: The bounty isn't particularly financially attractive, but the problem has stuck in my teeth a little. I wrote up a previous post with a first-cut analysis of, and attack against, Cuckoo Cycle ( http://da-data.blogspot.com/2014/03/a-public-review-of-cuckoo-cycle.html ), but I think I've finally figured out what it was that was nagging me about it from the perspective of finding a better time-memory tradeoff. I don't think the scheme is broken at all, but I suspect I can show, in a fairly handwavy sense, that there's a superlinear but sub-quadratic way to trade computation for memory. And I have some reservations about the algorithmic complexity involved: sparse graphs are *interesting*, in the sense that there's structure and room for algorithmic optimization, and I find "interesting" scary when it comes to security.
I'm not the person to do it, but if someone were able to attack this one from the other direction and show a computational or time-memory tradeoff *lower bound*, it'd make me breathe a lot easier. [2]
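To illustrate why the distinction between superlinear and quadratic matters, here's a toy cost model: suppose cutting memory by a factor k costs roughly k**alpha extra computation. The exponent alpha is a free parameter of the sketch, not a derived bound; the conjecture above is just that it lies strictly between 1 and 2:

```python
def attacker_work(baseline_work, memory_reduction_k, alpha):
    """Computation needed after shrinking memory by a factor k,
    under a toy k**alpha time-memory tradeoff model."""
    return baseline_work * memory_reduction_k ** alpha

for k in (2, 4, 8, 16):
    linear = attacker_work(1.0, k, 1.0)  # alpha=1: memory trades 1:1 for time
    conj = attacker_work(1.0, k, 1.5)    # alpha=1.5: the conjectured regime
    quad = attacker_work(1.0, k, 2.0)    # alpha=2: strong penalty, TMTO-resistant
    print(k, linear, conj, quad)
```

At alpha=1 an attacker loses nothing by shrinking memory (total area-time product is flat); at alpha=2 the penalty grows fast enough that small-memory attacks are clearly unattractive. Anything in between is the murky zone where a lower bound would help.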
[1] TSVs, or through-silicon vias, directly connect two or more dies with a very dense 2D interconnect. The manufacturing aspects of stacking memory on top of processing chips are still very tricky because of issues such as differential thermal expansion, but we're getting there. Nvidia has stacked DRAM on their roadmap, for example, though they've already pushed it out by a year or so.
[2] I might be wrong, of course; I'll try to write it up over the coming days and make sure I've dotted my i's and crossed my t's.