to try to invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.
Please distinguish preimage attacks from multi-collision attacks. If we can find characteristics in the hash function which are not perfectly random and which have a repeatable relationship with certain changes of bits in the key from hash to hash, then we can perhaps influence the cycling in the Cuckoo graph. Cycle formation, or the way we optimize detecting cycles, may be impacted by patterning that deviates from a perfect Random Oracle.
My comment applies similarly to multi-collision attacks. I believe deviations from perfect randomness are not only way too expensive to detect but, more importantly, die out exponentially fast as the cycle grows.
Except if the cycle grows in a patterned way and that pattern is part of the attack algorithm, i.e. it deviates from the algorithm you use but is still able to produce cycles within the maximum nonce count (and does so with some performance advantage, obviously, else there is no point to it).
I think it is naive for you to assume that a random walk becomes more convoluted as cycle length grows when the entire risk is about whether the uniform randomness could be subverted.
I have a very creative imagination, which is why I am often able to come up with things that others don't think of.
There are so many permutations of possibilities that you sweep away with a generalized notion of random walking which may not hold when the random oracle assumption is subverted.
It may be that there is some differential analysis or distinguisher (even just a bias in a single bit relating relative outputs to relative inputs across multiple hashes) that can be precomputed and amortized over all the hashes, because it is a table of differentials of how the hash behaves over a huge number of invocations, where the spread across the enumerated space is smaller due to the large number of hashes computed. There are many different angles to analyse, because we are not considering just one hash, but millions of them for one Cuckoo Cycle solution.
Of course I respect Dan Bernstein as a much smarter cryptographer than I am, but unless I am reading what he has written, I am hesitant to be convinced he has considered all of what I am getting at just from your second-hand summary of what he said verbally at a conference.
I know you and Zooko disagree with me and Dan on this. You feel such attacks are conceivable. I feel such attacks are quite inconceivable. Clearly we're not going to convince each other. So let's just agree to disagree.
People are more than welcome to run Cuckoo Cycle with blake2b replacing siphash if they feel that trade-off is worthwhile. It's just not something I recommend.
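For anyone who wants to experiment with that swap, here is a minimal sketch (illustrative Python only; the reference miner's input encoding, masking, and key handling differ, so treat the details as assumptions) of where the hash plugs into edge generation:

```python
import hashlib

# Hypothetical sketch: derive an edge's two endpoints from a keyed blake2b
# in place of siphash. The encoding of the nonce and the reduction modulo
# the node count are my assumptions, not the reference implementation.
def edge_endpoints(key: bytes, nonce: int, half_nodes: int):
    def node(uorv: int) -> int:
        h = hashlib.blake2b((2 * nonce + uorv).to_bytes(8, "little"),
                            key=key, digest_size=8)  # key must be at most 64 bytes
        return int.from_bytes(h.digest(), "little") % half_nodes
    return node(0), node(1)  # one endpoint in each partition
```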
I am just arguing: why not be safe rather than sorry when so much is at stake? Why would you recommend adding unnecessary risk even where you think, as though omniscient, that there is no risk?
The basic algorithm doesn't use buckets. Just a cuckoo hash table. In the edge trimming case, only on the order of 1% of edges remain, and that table is a little more complicated in order to translate the sparseness into memory savings. But still no buckets.
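To make the trimming step concrete, here is a rough sketch (illustrative Python, not the reference implementation; the round count and data layout are arbitrary) of the leaf-edge trimming idea:

```python
from collections import Counter

# Edges whose endpoint has degree 1 cannot lie on a cycle, so repeatedly
# count degrees on one side and drop such edges, alternating sides.
# 'edges' is a list of (u, v) pairs; the survivors feed the cuckoo hash table.
def trim(edges, rounds=64):
    for r in range(rounds):
        side = r % 2  # alternate which partition we count degrees on
        deg = Counter(e[side] for e in edges)
        edges = [e for e in edges if deg[e[side]] > 1]
    return edges
```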
I don't see what you wrote that is different from what I wrote. You focus too literally on the use of the word 'buckets'. What I am saying is that the edge trimming eliminates most of the nodes from consideration, and then you have a hash table representing a sparse array for the remainder of the nodes, those which aren't on the pruned/trimmed leaf edges. Those remaining nodes were "buckets" in a regular array in the former untrimmed basic algorithm.
Sure, a URL would help.
Here it was:
https://bitcointalksearch.org/topic/m.13721473
Note that was quoting a document that AnonyMint wrote, I believe in late 2013 or early 2014. But he didn't publish all of that document until January 2016. Note he had published excerpts of that document on BCT when Monero was first released and AnonyMint was having an argument with the guy who optimized Monero's hash function. So we could go correlate that if we wanted to prove it. Anyway, I am fine with David getting the credit, as he developed the concept more than AnonyMint did.
A much earlier version used edge fractions lower than 1/2 as a difficulty control, but I abandoned it since the relationship between edge fraction and cycle probability is strongly non-linear, making dynamic difficulty control practically impossible.
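To give a feel for just how non-linear, here is a back-of-the-envelope sketch (my own rough estimate for a random bipartite graph with N/2 + N/2 nodes and M edges, not a figure from the paper):

```python
# Under the random-graph assumption above, the expected number of L-cycles
# comes out to roughly (2M/N)**L / L, so small changes in the edge fraction
# M/N move the cycle probability by orders of magnitude.
L = 42
for frac in (0.50, 0.48, 0.45, 0.40):
    print(f"M/N = {frac:.2f}: expected {L}-cycles ~ {(2*frac)**L / L:.2e}")
```

At M/N = 1/2 that comes to roughly 1/42, a few percent, and it collapses rapidly below that.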
Well my wink is about using maximum nonce counts, i.e. edge fraction M/N > 1/2.
Yeah I can see that dynamic difficulty control gets messed up if you deviate too far from 1/2. But difficulty can also be controlled externally by hashing the result with SHA256, which you also noted in your paper.
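Something along these lines, I assume (illustrative Python; the exact encoding of the cycle nonces is a guess on my part, not the paper's specification):

```python
import hashlib, struct

# Sketch of external difficulty control: hash the 42 edge nonces of a found
# cycle with SHA256 and require the digest, read as an integer, to fall
# below a target. The 4-byte little-endian packing and sorting are assumed.
def meets_difficulty(cycle_nonces, target: int) -> bool:
    data = b"".join(struct.pack("<I", n) for n in sorted(cycle_nonces))
    return int.from_bytes(hashlib.sha256(data).digest(), "big") < target
```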