Author

Topic: Cuckoo Cycle revisited (Read 2657 times)

donator
Activity: 1274
Merit: 1060
GetMonero.org / MyMonero.com
August 02, 2014, 03:27:58 AM
#8
This just happened in #tahoe-lafs - Zooko pinged out before I could reply, but thought I'd pop it up here for thoughts:

[07:07:59]  zooko:    So I was just looking at https://github.com/tromp/cuckoo
[07:08:46]  zooko:    And I was wondering: suppose you are a plucky little smartphone, and you're competing against a shiny, powerful server, to do cryptocurrency mining which uses cuckoo PoW.
[07:09:22]  zooko:    It occurs to me that if cuckoo can be tuned to fit into 32 KB, then maybe your relative performance vs. the server would be better, because you have a Cortex-A9 with 4 cores and each core has 32 KB of L1 cache
[07:09:42]  zooko:    and your L1 cache has latency comparable to your competitors.
[07:12:24]  zooko:    So, I'm speculating that this would be better for you than a variant of cuckoo pow which lets you use your 1–8 MB of L2 cache, or even your 1–4 GB of DRAM, but also lets your competitor do the same.
[07:12:35]  zooko:    I'm guessing that his DRAM is much more copious and also much lower latency than yours.


dga
hero member
Activity: 737
Merit: 511
August 01, 2014, 07:42:31 PM
#7
I posted my reply with a pointer to a very quickly-hacked-up algorithmic analysis and time/memory tradeoff algorithm and proof-of-concept implementation here:

https://bitcointalksearch.org/topic/m.8144512

direct link to the "paper" (if something so cobbled together can be called that):

http://www.cs.cmu.edu/~dga/crypto/cuckoo/analysis.pdf

I'll update it into something a little more professional in the coming week, but figured the discussion might benefit from having some preliminary results added to it.  Feedback appreciated!
dga
hero member
Activity: 737
Merit: 511
July 31, 2014, 09:39:31 PM
#6
the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?)
No, these are.

Unfortunately, bounties don't seem to work, especially if you're looking to (dis)prove security with them— I think it sad, because your ideas are probably the least ill-advised in the memory-hard workfunction space.  (Though, seeing that you're now using siphash internally, I might worry some about your approximation-freseness...)

I say "least ill-advised" with no intended insult to your work: I'm concerned that the whole notion of memory hardness as a useful part of a work-function my be ill-advised:  If anyone is looking for a grant for research on the implications of how "memory hardness" moves costs from energy to die-area and may or may not create differential advantages for attackers or defenders due die-area related costs being amortized during long term operation in either the POW-blockchain model or the password strengthening KDF model, I can very likely find funding for that (esp for the KDF model).

It is becoming conventional wisdom that memory hard KDFs are "well advised", but the analysis in the scrypt paper ignores operating costs— I talked to Colin Percival, and it seems he didn't really have enough visibility into the hardware low levels to even begin to estimate energy impact.  Experience on modern process mass produced computing hardware (e.g. GPUs) seems to show power costs exceeding retail, much less manufacturing costs, in only a couple months of operation, so I'm personally pretty worried that memory hard functions go in the wrong direction, creating amortization advantages for attackers compared to functions whos large scale execution end up thermally bound— but it's an informal concern and I have no idea where the breakpoints are where this would start to strongly favor attacks (/centralization). At a minimum existing "cost" arguments for memory hard functions in some applications are ignoring a majority of the cost (energy), so more research here could be pretty interesting.


The other thing to stare at in a more modern sense than the old CRAM (or berkeley's iram) is the effect that tsv-stacked DRAM [1] will have on memory-hard PoW functions.  This is something I don't think the current set of designers appreciate to the extent necessary.  The "CryptoNight" algorithm (Bytecoin, Monero) is an example of this:  It's a gorgeous design from the perspective of matching today's CPUs well, but its ASIC parallelism on a single chip is, at present, either area-limited by putting the 2MB scratchpad on die, or external pin-bandwidth limited by the need to stream all of the accesses to an external DRAM.  A tsv-stacked DRAM architecture would blow the lid off of those limits.

gmaxwell, my starving academic researcher hat forces me to wonder to what degree you're interested in funding this general area of memory-hard PoW evaluation.  I've been playing in the space for fun (and, ahh, personal profit by doing some fast miner engineering), but I'm reaching the point where I feel like I have to either turn my bitcoin-related exploration into a legitimate research project, or quit.  If you think it'd be worth your time, drop me a note - [email protected].  I'd love to actually throw some of my real day-job effort at this area.  I bet I could lure one of my architecture colleagues into it, if the problem proved interesting enough.

Re bounties:  The bounty isn't particularly financially attractive, but the problem has stuck in my teeth a little.  I wrote up a previous post with a first-cut analysis and attack against cuckoo cycle (http://da-data.blogspot.com/2014/03/a-public-review-of-cuckoo-cycle.html ), but I think I've finally figured out what it was that was nagging me about it from the perspective of finding a better time-memory tradeoff.  I don't think the scheme is broken at all, but I suspect I can show, in a fairly handwavy sense, that there's a superlinear but sub-quadratic way to trade computation for memory.  But I have some reservations about the algorithmic complexity involved -- sparse graphs are *interesting*, in the sense that there's structure and room for algorithmic optimization, and I find interesting scary when it comes to security. Smiley  I'm not the person to do it, but if someone were able to attack this one from the other direction and show a computational or time-memory tradeoff *lower bound*, it'd make me breathe a lot easier. [2]

[1] TSVs, or through-silicon vias, directly connect two or more dies with a very dense 2d interconnect.  The manufacturing aspects are still very tricky for stacking memory on top of processing chips because of issues such as differential thermal expansion, but we're getting there.  Nvidia has stacked dram on their roadmap, for example, though they've pushed it out by a year or so already.

[2] I might be wrong, of course;  I'll try to write it up over the coming days and make sure I've dotted enough i's and t's.
legendary
Activity: 990
Merit: 1108
July 31, 2014, 09:13:03 AM
#5
the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?)
No, these are.

For CRAM to be effective, it must be possible to partition the data-set over memory banks,
such that each bank can do independent local computation on the part of the data-set that it stores.
I don't see Cuckoo Cycle fitting into this mold, as any subarray of vertex degree counters needs access
to the entire bitvector of life edges in order to compute degrees correctly.
Can you elaborate on what it is that you would have each memory bank store and process?

Quote
Unfortunately, bounties don't seem to work, especially if you're looking to (dis)prove security with them— I think it sad, because your ideas are probably the least ill-advised in the memory-hard workfunction space.  (Though, seeing that you're now using siphash internally, I might worry some about your approximation-freseness...)

The bounties are to increase the likelihood of identifying opportunities for optimization that I may have overlooked. Of course, people finding such opportunities may decide to keep them secret in the hope of exploiting them later, but I'm hoping that at least some people will prefer to claim the bounty and help establish a more level playing field for when adoption does take place.

Quote
I say "least ill-advised" with no intended insult to your work: I'm concerned that the whole notion of memory hardness as a useful part of a work-function my be ill-advised:  If anyone is looking for a grant for research on the implications of how "memory hardness" moves costs from energy to die-area and may or may not create differential advantages for attackers or defenders due die-area related costs being amortized during long term operation in either the POW-blockchain model or the password strengthening KDF model, I can very likely find funding for that (esp for the KDF model).

It is becoming conventional wisdom that memory hard KDFs are "well advised", but the analysis in the scrypt paper ignores operating costs— I talked to Colin Percival, and it seems he didn't really have enough visibility into the hardware low levels to even begin to estimate energy impact.  Experience on modern process mass produced computing hardware (e.g. GPUs) seems to show power costs exceeding retail, much less manufacturing costs, in only a couple months of operation, so I'm personally pretty worried that memory hard functions go in the wrong direction, creating amortization advantages for attackers compared to functions whos large scale execution end up thermally bound— but it's an informal concern and I have no idea where the breakpoints are where this would start to strongly favor attacks (/centralization). At a minimum existing "cost" arguments for memory hard functions in some applications are ignoring a majority of the cost (energy), so more research here could be pretty interesting.

I think that economic consequences of large scale adoption of a memory-hard PoW are hard to predict. I am hoping that both many-core CPUs and FGPAs will become common features on desktop computers of the future, in which case many people will be able to mine with only a moderate efficiency handicap, which will probably never be the case for bitcoin.
staff
Activity: 4326
Merit: 8951
July 30, 2014, 11:41:09 PM
#4
the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?)
No, these are.

Unfortunately, bounties don't seem to work, especially if you're looking to (dis)prove security with them— I think it sad, because your ideas are probably the least ill-advised in the memory-hard workfunction space.  (Though, seeing that you're now using siphash internally, I might worry some about your approximation-freseness...)

I say "least ill-advised" with no intended insult to your work: I'm concerned that the whole notion of memory hardness as a useful part of a work-function my be ill-advised:  If anyone is looking for a grant for research on the implications of how "memory hardness" moves costs from energy to die-area and may or may not create differential advantages for attackers or defenders due die-area related costs being amortized during long term operation in either the POW-blockchain model or the password strengthening KDF model, I can very likely find funding for that (esp for the KDF model).

It is becoming conventional wisdom that memory hard KDFs are "well advised", but the analysis in the scrypt paper ignores operating costs— I talked to Colin Percival, and it seems he didn't really have enough visibility into the hardware low levels to even begin to estimate energy impact.  Experience on modern process mass produced computing hardware (e.g. GPUs) seems to show power costs exceeding retail, much less manufacturing costs, in only a couple months of operation, so I'm personally pretty worried that memory hard functions go in the wrong direction, creating amortization advantages for attackers compared to functions whos large scale execution end up thermally bound— but it's an informal concern and I have no idea where the breakpoints are where this would start to strongly favor attacks (/centralization). At a minimum existing "cost" arguments for memory hard functions in some applications are ignoring a majority of the cost (energy), so more research here could be pretty interesting.
legendary
Activity: 990
Merit: 1108
July 30, 2014, 02:36:06 PM
#3
Have you found a qualified ASIC expert with low-level hardware experience to evaluate it yet?

No; but that doesn't stop me from claiming that a good Cuckoo Cycle hardware setup might be an FPGA saturating (with random accesses) the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?).
legendary
Activity: 1120
Merit: 1164
July 30, 2014, 01:34:04 PM
#2
Have you found a qualified ASIC expert with low-level hardware experience to evaluate it yet?

I have zero reason to think you have the right background based on what you have previously said on this topic.
legendary
Activity: 990
Merit: 1108
July 30, 2014, 10:29:01 AM
#1
This is about the graph theoretic proof-of-work algorithm that I originally proposed in January and which is still not in use in any crypto-currency. Please see

https://bitcointalksearch.org/topic/cuckoo-cycle-speed-challenge-2500-in-bounties-707879

for an introduction to Cuckoo Cycle and the bounties I am offering to support its claims of optimality, memory-hardness, and GPU resistance.
Jump to: