After a discussion with Rico about probability in this project, I made some computations to try to understand better the problem.
So: now we are generating many private keys ( from #1 to #2^160 ) to get some collisions in space of addresses. Collision: 2 different keys that "generate" the same address. We want to get a collision to unlock one of the 14M that we are monitoring. We don't want "bad" collisions, i.e. collision between 2 keys that we generate, because we can't catch them and then we want to avoid them because they are a waste of time (we don't want to compute many times the same addresses ).
Rico tried to avoid useless collisions working on 2^160 keys instead of 2^256. In my opinion there are still bad collisions and overall we can't get all addresses starting from only 2^160 keys.
Theory:
From a private key to the relative address there are 3 steps:
private key (a simple number) --> public key (a point of a curve, (x,y)) --> sha256 --> ripemd160 --> address
**************************************************************************************************
Let's imagine for a moment that we could generate all the points of the elliptic curve secp256k: there are about 2^256 points.
Each point can be represented in 2 ways: "04xy" (to get the uncompressed public key) or "02-3x" (to get the compressed public key), so there are 2^257 distinct representations (different inputs) for sha256.
Question 1:
how many distinct 256 bit strings we could get via sha256 from these 2^257 representations?If we look at theory** of hash function
Theorem : In hashing n items into a hash table with k locations, the expected number of empty locations is k*(1−1/k)^n.
the number of strings that we cannot get is then:
k (1-1/k) ^n
where n=2^257 (input) and k = 2^256 (output).
The result is: 2^256 * (1 - 1/2^256) ^ (2^257) = 2^256 * ((1 - 1 / 2^256)^(2^256))^(2) = 2^256 * ((1/e)^2) = 0,135 * 2^256 so we can get at this stage the 86,5% of all the 256 bit strings.
Now we pass from 256bit strings to 160 bit strings with ripemd160.
Question 2:
how many distinct 160 bit strings we could get via ripemd160 from these 0,865 * 2^256 strings?Input: 0,865 * 2^256 strings of 256 bit --> Output: ?? strings of 160 bit
If we apply again the formula k (1-1/k) ^n, with n = 0,865 * 2^256 , k = 2^160,
we get 0,865*2^160*((1-1/0,865*2^160)^2^256 ) = 0,865*2^160 * (1/e)^(2^95) = 0 strings that we cannot obtain.
Substantially we can say that we can get 2^160 strings, i.e.
there are 2^160 distinct addresses .
*************************************************************************************************
But what are the numbers in our case? We don't generate 2^257 representations, but only 2^161 (2^160 private keys <--> 2^160 points <--> 2^161 point representations)
Question 1:
how many distinct 256 bit strings we could get via sha256 from these 2^161 representations?If we apply again the formula k (1-1/k) ^n, with n = 2^161 , k = 0,865*2^256, we obtain substantially 2^161 different strings.
Question 2:
how many distinct 160 bit strings we could get via ripemd160 from these 2^161 strings?the formula k (1-1/k) ^n, with n = 2^161 , k = 2^160, we obtain that there are:
2^160 * ((1-1/2^160)^(2^161)) = 2^160*(1/e)^2 =
0,135 * 2^160 strings (addresses) that we cannot get in our project.
At the end,
we will have a 13,5% of collisions (the first after 2^80 keys), 1 collisions each 7 strings.
So, even if we generated 2^160 private keys, we will never find out the private keys of the 13,5% (1.9M) of our 14M of addresses with bitcoin.
And if we generated only 2^159 private keys (2^160 point representations), then there will be 2^160*(1/e) = 0,37 * 2^160 addresses that we cannot get, i.e.
in that case we will never find out the private keys of 5.2 M of our 14M of addresses with bitcoin.
**
https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=7&ved=0ahUKEwivm9eGq5LTAhXDu48KHcIMCHcQFghMMAY&url=https%3A%2F%2Fmath.dartmouth.edu%2Farchive%2Fm19w03%2Fpublic_html%2FSection6-5.pdf&usg=AFQjCNEmK51LFxc1lsDXxHoiTbYKpOHZCg&sig2=rKC2czitmxgO8GRPSNbPTg