I want to make some comments on how LBC works. I take inspiration from this Wuille's comment:
https://bitcoin.stackexchange.com/questions/54841/birthday-attack-on-p2sh/54844The aim of the following considerations is to help bitcointalkforum's users to understand better what (and why) LBC is doing. For this reason, I will make some simplifications, like 20M = 2^24 instead of 20M = 2^24.2535 and so on. In this context above all I care about the concepts.*************************************************************************************************************************************************************************************************THE THEORYThe theory:
1)
Preimage search: given a hash H, find X such that Hash(X) = H.
2)
Collision search: find an X and a Y, such that X != Y and Hash(X) = Hash(Y).
These 2 problems look similar, but they are actually very different. In our context, X is a public key, H = hash160(X) is a bitcoin address:
1)
Preimage search: given a particular address H, find
a public key X such that hash160(X) = H.
2)
Collision search: find
two public keys X and Y, such that X != Y and hash160(X) = hash160(Y) .
A preimage search for a 160-bit hash function needs 2^160 steps. A collision search however only needs 2^80 steps.
Example of a preimage search: You choose
a particular address H.
Then you generate:
2^160 public keys Xi and store the corresponding addresses Hi = hash160(Xi)
in a list
L= {H1, H2, H3, .....H2^160} If they are all different, you will get surely a match between an Hi and the H you chose (Hi = H). So we say that there are needed
2^160 steps to find an Xi (a pre-image) of H=Hi.
Note: Xi is only
one preimage, since we don't know how H was generated, we can't say we have found a collision (we don't have
two different public keys but only one that produces for sure H)
Example of a collision search:Your purpose here is to find a couple ("collision" implies always at least two) of public keys X and Y that generate the same address H,
any address H, you can't choose it.
You generate:
a list L1 of 2^80 random addresses Hi =>
L1= {H1, H2, H3, .....H2^80} where Hi = Hash160(Xi)
and
a list L2 of other 2^80 random addresses Aj =>
L2= {A1, A2, A3, .....A2^80} where Aj = hash160(Yj)
"
You sort both lists. Then you go through both lists, and find a hash that exists in both in a single iteration. This is the critical point. It is likely that such a collision exists,
because even though there are only 2^80 entries in each list, there are 2^160 combinations of entries. Each of those has a 2^(-160) chance to be a collision.
This is also the explanation of the Birthday Paradox, and such an attack is therefore called a birthday attack." (Wuille)
So we say that there are needed
2^80 steps to find an Xi and Yj (a collision) such that Hi=Aj.
If you noted, the difference between the first case (preimage) and the second one (collision) is that in the first case we are looking for something we chose, a particular address (in LBC we have the UTXO data set, we are assuming now for the sake of simplicity it is always the same during the research) and we get only one preimage (one public key), in the second case we find two public keys that correspond to a random address.
To sum up, we can think at these 2 problems using 2 lists:
preimage search: L1 = {H1, H2, H3, .....H2^160} --> you have to generate
randomly 2^160 addresses ->
2^160 stepsL2 = {H} --> H is the
particular prefixed address you chose
collision search: L1 = {H1, H2, H3, .....H2^80} --> you have to generate
randomly 2^80 addresses
L2= {A1, A2, A3, ..... A2^80} --> you have to generate
randomly 2^80 addresses ->
2^80 steps (2^81 at the most)
The fact that the number of the elements of the 2 lists in the second case is the same (this number is equal - more or less - to the square root of the number of the elements in L1 in the first case) makes less expensive to perform a collision search.
This point is very important.
A
metaphor to visualize the difference between a preimage search (looking for a prefixed target using randomly generated elements) and a collision (a chance "encounter" between two elements of two random sequences):
imagine you are in a room.
1) In that room there is a fixed target (a single point). If a fly moves randomly, what are the chances that it hit the target? Very low (this is a preimage, to hit a specific target using randomly generated addresses).
2) Now imagine that the target turns into another fly.
Then we have two flies that move randomly in the room. What are the chances that one hit the other one? Much more. This is a collision.
*************************************************************************************************************************************************************************************************+*************************************************************************************************************************************************************************************************THE CURRENT LBC APPROACHOk, now how is the LBC's approach in relation to these two problems?
To start, LBC performs a
preimage search, where L2 contains not only 1 but 20M = (about) 2^24 addresses taken from the UTXO data set:
preimage search: L1 = {H1, H2, H3, .....H2^160} --> we can think that it is randomly generated
L2 = {A1, A2, ......, A2^24} --> UTXO data set, it is not randomly generated!!!
in this case you can note that there are 2^160 * 2^24 combinations of entries, then we can reduce L1 from 2^160 to 2^136 addresses:
L1 = {H1, H2, H3, .....H2^136} --> the addresses we have to generate to get a single preimage
L2 = {A1, A2, ......, A2^24} --> the UTXO data set, these are "data", we don't have to compute these
I repeat: we are taking care of
a preimage search problem. In LBC project we call "collision" what actually is a "preimage". Infact our difficulty is nearer to 160 bit than 80 bit. We want to find preimages (public keys) only of some particular hashes (addresses from UTXO).
Now imagine we find a preimage, i.e. we find a public key Xi such that Hi = Aj for some Aj in L2 (remember: Hi = hash160(Xi))
Then, only if we can prove that the public key Xi is different from the public key Yj that produces Aj, we can claim to have found a real collision too (usually we don't have Yj, how to get this information is a real problem)
Only in this case we can go from a single preimage to two preimages and then get a real collision.**************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************SOME IDEA FOR THE FUTURECan we improve the odds to find a preimage/collision?
My guess:
we should extend the L2 list (and extend the corresponding bloom filter to not rise the false positive rate, but this is a technical issue)
Remember:
if L2 becomes bigger, then L1 becomes smaller, and L1's size is the difficulty of our task (because:
size_of_L1 = 2^160 / size_of_L2).
L2's elements can be collected and stored easily, those of L1 don't, L1 is too big.
To extend L2, we have these possibilities:
1) we wait for bitcoin's net grows up naturally
2) we generate a few random millions of addresses (but in this way we shift from a preimage search to a purely random collision search task)
3) we use addresses already generated and stored in blockchain even if they have no more bitcoin (always interesting for me; besides, if we find a preimage of one of these addresses, then we will know for sure if we have a collision too, because all the addresses with at least an output transaction have already exposed their public key)
Narrow down the list L1 is a way to speedup our search, on the other hand we need to rise the speed at which we generate the list L1. Any idea is welcome.
To speedup my ECC library I need to realize a couple of assembly functions. I don't know so much about assembly. If someone is able to help me, I will be happy.
*************************************************************************************************************************************************************************************************