Double hashing: less entropy? | Bitcointalksearch.org

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: someone42 on June 11, 2012, 02:15:04 PM

Quote from: Meni Rosenfeld on June 11, 2012, 11:55:38 AM

Right, this is because of the changing function, with some support from the continuity approximation.

Let's go to the extreme: Let's say after several rounds we end up with 2 elements, so k=2/N. Then after another round we should end up with 2-2/N) elements which is a bit less, and with additional rounds we converge to 0.

Of course, we cannot actually have a noninteger number of elements. But that's not a problem when we change the function each time. Most functions will map 2 elements to 2 elements; but after enough rounds we eventually stumble upon a function that maps both elements to one.

With a consistent function, this can't happen; either it sends them to 2 elements (which must be the same ones) or it doesn't. If it does then it remains at 2 elements forever without decreasing further.

And this doesn't have to happen when we're at 2 elements; after enough rounds, we've weeded out all elements that are not in the function's cycles, and what we're left with we're stuck with forever. And the set of cycle elements can be very large, though I have no idea just how large.

You've given me an idea on how to analyse the fixed-function case. You're better than me at this stuff, so correct me if I'm wrong.

Represent every element as a vertex in a graph. Then, add directed edges between vertices according to how SHA-256 maps elements onto other elements. The final set of cycle elements is all the stuff that is part of a cycle. The question is, how big is this set? I attempted a rough estimate, and was surprised at what I got.

The graph can be constructed by building chains/cycles. To begin, pick an element, run it through the random oracle, then add a directed edge between input and output. Then pick the output element, run it through the same random oracle ... etc. This builds up the first chain, and eventually this chain will loop back on itself, creating a cycle. How big is this cycle? Let N be the number of elements. After adding 1 edge, the probability that the chain will loop back on itself is 1 / N. After adding k edges, the probability that the chain will loop back on itself is k / N. Thus the probability that the first chain has k edges with no cycles is:

Code:

(1 - 1/N)(1 - 2/N)...(1 - k/N)

This can be approximated as (1 - k / (2N)) ^ k (as long as k is much smaller than N), and then approximated again as exp(-k ^ 2 / (2N)). Equating this to 0.5 reveals that the first chain, on average, has a length of roughly sqrt(N).

The rest of the graph can be constructed by picking an unconnected vertex and running it through the random oracle, building another chain. However, this time, the chain either loops back on itself (creating another cycle) or merges with a previous chain. Very roughly, the second chain has a 1/2 chance of creating another cycle and a 1/2 chance of merging with the first chain (because, its average length should be similar to the first chain's average length). Likewise, the ith chain has a 1/i chance of creating another cycle and a (i - 1)/i chance of merging with any one of the previous (i - 1) chains. The average number of cycles is then 1 + 1/2 + 1/3 + 1/4 + ...; the harmonic series. Even for 2 ^ 128 terms, this is only about 100.

My estimate for the size of the final set is: average cycle length * average number of cycles, and is very roughly, 100 * sqrt(N). For SHA-256, this is about 2 ^ 135. That's much lower than I expected! But to get to this, you probably have to go through an insane (like 2 ^ 128) number of rounds.

Looks ok, there are a few small issues but they don't matter much.

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Relevant discussion here:
http://crypto.stackexchange.com/questions/779/hashing-or-encrypting-twice-to-increase-security

dooglus

legendary

Activity: 2940

Merit: 1333

Quote from: ?? on ??

Quote from: kokjo on June 11, 2012, 06:18:37 AM

satoshi encouraged people to mine with gpus, he did foresee this.

Eh. I remember hearing the opposite. I probably remember wrong.

No, you're right. He foresaw it, but discouraged it:

Quote from: satoshi on December 12, 2009, 12:52:44 PM

We should have a gentleman's agreement to postpone the GPU arms race as long as we can for the good of the network. It's much easer to get new users up to speed if they don't have to worry about GPU drivers and compatibility. It's nice how anyone with just a CPU can compete fairly equally right now.

someone42

member

Activity: 78

Merit: 11

Chris Chua

Quote from: Meni Rosenfeld on June 11, 2012, 11:55:38 AM

Right, this is because of the changing function, with some support from the continuity approximation.

Let's go to the extreme: Let's say after several rounds we end up with 2 elements, so k=2/N. Then after another round we should end up with 2-2/N) elements which is a bit less, and with additional rounds we converge to 0.

Of course, we cannot actually have a noninteger number of elements. But that's not a problem when we change the function each time. Most functions will map 2 elements to 2 elements; but after enough rounds we eventually stumble upon a function that maps both elements to one.

With a consistent function, this can't happen; either it sends them to 2 elements (which must be the same ones) or it doesn't. If it does then it remains at 2 elements forever without decreasing further.

And this doesn't have to happen when we're at 2 elements; after enough rounds, we've weeded out all elements that are not in the function's cycles, and what we're left with we're stuck with forever. And the set of cycle elements can be very large, though I have no idea just how large.

You've given me an idea on how to analyse the fixed-function case. You're better than me at this stuff, so correct me if I'm wrong.

Represent every element as a vertex in a graph. Then, add directed edges between vertices according to how SHA-256 maps elements onto other elements. The final set of cycle elements is all the stuff that is part of a cycle. The question is, how big is this set? I attempted a rough estimate, and was surprised at what I got.

The graph can be constructed by building chains/cycles. To begin, pick an element, run it through the random oracle, then add a directed edge between input and output. Then pick the output element, run it through the same random oracle ... etc. This builds up the first chain, and eventually this chain will loop back on itself, creating a cycle. How big is this cycle? Let N be the number of elements. After adding 1 edge, the probability that the chain will loop back on itself is 1 / N. After adding k edges, the probability that the chain will loop back on itself is k / N. Thus the probability that the first chain has k edges with no cycles is:

Code:

(1 - 1/N)(1 - 2/N)...(1 - k/N)

This can be approximated as (1 - k / (2N)) ^ k (as long as k is much smaller than N), and then approximated again as exp(-k ^ 2 / (2N)). Equating this to 0.5 reveals that the first chain, on average, has a length of roughly sqrt(N).

The rest of the graph can be constructed by picking an unconnected vertex and running it through the random oracle, building another chain. However, this time, the chain either loops back on itself (creating another cycle) or merges with a previous chain. Very roughly, the second chain has a 1/2 chance of creating another cycle and a 1/2 chance of merging with the first chain (because, its average length should be similar to the first chain's average length). Likewise, the ith chain has a 1/i chance of creating another cycle and a (i - 1)/i chance of merging with any one of the previous (i - 1) chains. The average number of cycles is then 1 + 1/2 + 1/3 + 1/4 + ...; the harmonic series. Even for 2 ^ 128 terms, this is only about 100.

My estimate for the size of the final set is: average cycle length * average number of cycles, and is very roughly, 100 * sqrt(N). For SHA-256, this is about 2 ^ 135. That's much lower than I expected! But to get to this, you probably have to go through an insane (like 2 ^ 128) number of rounds.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: someone42 on June 11, 2012, 11:35:08 AM

Quote from: Meni Rosenfeld on June 11, 2012, 07:57:56 AM

No, because the assumptions I made become less true the more rounds are done (maybe they're not even accurate enough after one round). The set of all possible images of SHA256^N becomes smaller for larger N until it converges to a fixed set (which is probably very large). Then SHA-256 is a permutation (one-to-one mapping) on this set. (This is true for every function from a space to itself).

But this is probably because I assumed that each round had its own independent random oracle. The results may be different for a fixed function like SHA-256.

Right, this is because of the changing function, with some support from the continuity approximation.

Let's go to the extreme: Let's say after several rounds we end up with 2 elements, so k=2/N. Then after another round we should end up with 2-2/N) elements which is a bit less, and with additional rounds we converge to 0.

Of course, we cannot actually have a noninteger number of elements. But that's not a problem when we change the function each time. Most functions will map 2 elements to 2 elements; but after enough rounds we eventually stumble upon a function that maps both elements to one.

With a consistent function, this can't happen; either it sends them to 2 elements (which must be the same ones) or it doesn't. If it does then it remains at 2 elements forever without decreasing further.

And this doesn't have to happen when we're at 2 elements; after enough rounds, we've weeded out all elements that are not in the function's cycles, and what we're left with we're stuck with forever. And the set of cycle elements can be very large, though I have no idea just how large.

someone42

member

Activity: 78

Merit: 11

Chris Chua

Quote from: Meni Rosenfeld on June 11, 2012, 07:57:56 AM

No, because the assumptions I made become less true the more rounds are done (maybe they're not even accurate enough after one round). The set of all possible images of SHA256^N becomes smaller for larger N until it converges to a fixed set (which is probably very large). Then SHA-256 is a permutation (one-to-one mapping) on this set. (This is true for every function from a space to itself).

I thought it would be interesting to see what the entropy reduction is for multiple rounds. I assumed each round has its own independent random oracle which maps k * N elements to N potential output elements, where 0 <= k <= 1 and N is 2 ^ 256. For each round, I found that on average, exp(-k) * N output elements have no preimage. Therefore, each round maps k * N elements to (1 - exp(-k)) * N elements.

Iterating this, the entropy reduction (ignoring the non-uniform output distribution for now) is:

Round	Cumulative entropy reduction
1	0.6617
2	1.0938
4	1.6800
8	2.4032
16	3.2306
32	4.1285
64	5.0704
128	6.0381
256	7.0204

I don't observe any convergence, and indeed the equation k = 1 - exp(-k) has one solution: at k = 0. But this is probably because I assumed that each round had its own independent random oracle. The results may be different for a fixed function like SHA-256.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: Pieter Wuille on June 11, 2012, 07:48:03 AM

Quote from: Meni Rosenfeld on June 11, 2012, 07:15:53 AM

I did a calculation which says that every application of SHA-256 reduces entropy by about 0.5734 bits. I have no idea if that's correct.

Assuming SHA-256 is a random function (maps every input to a uniformly random independent output), you will end up having (on average) (1-1/e)*2^256 different outputs. This indeed means a loss of entropy of about half a bit. Further iterations map a smaller space to an output space of 2^256, and the loss of entropy of each further application drops very quickly. It's certainly not the case that you lose any significant amount by doing 1000 iterations or so.

I took it one step further and considered the distribution among the outputs, which is not uniform; the result for the amount of entropy lost is (1/e)*sum(log_2 n / (n-1)!). But this is probably little more than a nice exercise, as SHA-256 is likely too far from random for this calculation to be meaningful.

And I mistakenly input ln rather than log_2 to the software, so the value I want really should be 0.827.

TangibleCryptography

sr. member

Activity: 476

Merit: 250

Tangible Cryptography LLC

Quote from: ?? on ??

Do people who design hash functions target these kind of properties or are they sort of
an after-the-fact implication of stronger sought-after properties of the function ?

It is property they are targeting. The ideal hashing function is described as a random oracle
http://en.wikipedia.org/wiki/Random_oracle

No hashing function to date is a perfect random oracle but that is the ideal they are striving for.

Kazimir

legendary

Activity: 1176

Merit: 1011

Quote from: Meni Rosenfeld on June 11, 2012, 07:57:56 AM

No, because the assumptions I made become less true the more rounds are done (maybe they're not even accurate enough after one round). The set of all possible images of SHA256^N becomes smaller for larger N until it converges to a fixed set (which is probably very large). Then SHA-256 is a permutation (one-to-one mapping) on this set. (This is true for every function from a space to itself).

Ah right, yeah. It would be interesting to know (or at least have a reasonable estimate) how large this conversion set is.

Intuitively I'd say it's quite sufficient to hold as a secure hash for the forseeable future, although I don't have any reasonable arguments for this.

kokjo

legendary

Activity: 1050

Merit: 1000

You are WRONG!

i should really stop posting in this thread, im embarrassing myself.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: ?? on ??

Quote from: Meni Rosenfeld on June 11, 2012, 07:15:53 AM

I did a calculation which says that every application of SHA-256 reduces entropy by about 0.5734 bits. I have no idea if that's correct.

Mmmh.

Does that mean that sha256 ^N(some 256bit input) for N sufficiently large
would always converge to the same value, independent of the actual input.

No, because the assumptions I made become less true the more rounds are done (maybe they're not even accurate enough after one round). The set of all possible images of SHA256^N becomes smaller for larger N until it converges to a fixed set (which is probably very large). Then SHA-256 is a permutation (one-to-one mapping) on this set. (This is true for every function from a space to itself).

Quote from: ?? on ??

First, do we actually *know* that sha-256 is *not* a one to one mapping on the 256 bit space ?
If it turns out to be, then you've got nothing. I don't know the answer, I'm not a professional cryptographer,
but looking at the code for SHA-256, there doesn't seem to be an obvious dropping of bits within
the transform step itself, but then I am too lazy to analyze it in-depth.

SHA-256, as a cryptographic hash function, aspires to be indistinguishable from random. If it was in fact random, the number of preimages for every 256-bit element would follow the Poisson distribution - about 36% would have no preimage, 36% would have one, 18% two, 6% three and so on. So I'd say it's almost certain that it's not a 1-1 mapping.

Very interesting observation, and you're probably correct.
Even more interesting would be to find a way to measure if that
is indeed the case (even if the verification is in a monte-carlo sense)

That's probably as hard as determining whether the function is broken.

Quote from: ?? on ??

Finally, what someone said: the likely intent of the team who designed bitcoin was to slow mining down, not to
add a layer of security there.

What could that mean? The difficulty controls the mining rate. If a hash function half as hard would be chosen, the difficulty would double and you'd have the same generation rate.

You're right, but then I'm not entirely convinced by the 'this buys use time the day sha-256 gets cracked' either.
If that was their concern, why not combine two wildly differing 256-bit hash algorithm and XOR the result ?

That probably could work too, but that also doesn't guarantee improves security. Maybe the two hash functions are actually connected in some unexpected way and the XOR is actually weaker than either.

SHA-256 itself is multiple iterations of a basic function. I'm guessing it is assumed that the basic function itself is ok and that the more times you apply it, the harder it is to attack.

Topic: Double hashing: less entropy? (Read 4058 times)