
Topic: Large Bitcoin Collider (Collision Finders Pool) - page 43. (Read 193404 times)

legendary
Activity: 1932
Merit: 2077

If I am not entirely mistaken, I compute 1 inverse per 4096 private keys (8192 addresses).


OK, then I cannot read your code correctly.

This piece of code:

Code:
for (i = 0; i < 4096; i++)
  ......
  secp256k1_ge_set_gej_zinv(&r[i], &a[i], &az[i]);

doesn't mean "compute 4096 field inverses"! Roll Eyes   I definitely misunderstood the meaning of that line. Smiley


At the moment, the time cost for generating 16777216 uncompressed public keys is ~6 seconds on my machine, which - if I compute correctly - boils down to ~0.36 us per private key. The other 16 seconds are required for creating the compressed keys (negligible), for computing the hash160 of both the uncompressed and the compressed key, and for checking these hash160s against the bloom filter.

Of these 16 seconds, 7 s go to the two RIPEMD160 computations and 8 s to the two SHA256 computations. The rest is bloom filter, startup/init etc.

The bottleneck is then SHA256 - you need mining hardware. Wink
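For reference, the hash160 discussed above is RIPEMD160(SHA256(pubkey)), computed once for the 65-byte uncompressed key and once for the 33-byte compressed key. A minimal single-key sketch using OpenSSL's one-shot digest helpers (illustrative only - the function name is made up and this is not the pool's optimized code):

Code:
// hash160 = RIPEMD160(SHA256(pubkey)) - minimal sketch, not optimized.
// pub is 65 bytes (uncompressed, leading 0x04) or 33 bytes (compressed).
#include <stddef.h>
#include <openssl/sha.h>
#include <openssl/ripemd.h>

static void hash160(const unsigned char *pub, size_t publen,
                    unsigned char out[20]) {
  unsigned char sha[SHA256_DIGEST_LENGTH];

  SHA256(pub, publen, sha);          // first pass: SHA256 of the public key
  RIPEMD160(sha, sizeof(sha), out);  // second pass: RIPEMD160 of that digest
}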
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Hi,

first let me thank you for your input, I very much appreciate it. After developing this thing for over 6 months, I am of course aware of several things (like not having to worry about side-channel attacks and other security issues). I also believe that there is not much left of the original brainflayer code in the HRD-core generator. The original brainflayer gave me a keygen rate of 520000 keys/s, whereas I am now at 730000 keys/s.

Quote
The time cost for computing one public key given a random private key is 47.2 us.

At the moment, the time cost for generating 16777216 uncompressed public keys is ~6 seconds on my machine, which - if I compute correctly - boils down to ~0.36 us per private key. The other 16 seconds are required for creating the compressed keys (negligible), for computing the hash160 of both the uncompressed and the compressed key, and for checking these hash160s against the bloom filter.

Of these 16 seconds, 7 s go to the two RIPEMD160 computations and 8 s to the two SHA256 computations. The rest is bloom filter, startup/init etc.

Quote
How do you generate each public key (in batchpj)? And how do you convert jacobian coordinates to affine (hrdec_ge_set_all_gej_new(batchpa, batchpj))? How many operations (multiplication, inverse) are involved?

I have posted the code already.

Quote
2 private keys --> 2 public keys   (only 1 inverse)  --> IA = 1/2 = 0.5
-----------------------------------------------------------------------------------------------------------------------------
better:
2 private keys --> 4 private keys (k, n-k, k+1, n-k-1) --> 4 uncompressed public keys (only 1 inverse) --> IA = 0.25
---------------------------------------------------------------------------------------------------------------------------------------------------
much better:
2 private keys --> 4 private keys --> 8 public keys, 4 compressed and 4 uncompressed, 8 addresses (only 1 inverse) --> IA=0.125

If I am not entirely mistaken, I compute 1 inverse per 4096 private keys (8192 addresses).



Rico
legendary
Activity: 1932
Merit: 2077
At the moment, the client precomputes a table of 4096 uncompressed public keys with basically this code:
Code:
int
hrdec_pubkey_batch_incr(uchar (*pub)[65],
                        uchar start[32]) {
  unsigned int  i;

  hrdec_mult_gen2(&batchpj[0], start);

  for (i = 1; i < 4096; ++i) {
    hrdec_gej_add_ge(&batchpj[i], &batchpj[i-1], &incr_a);             // increment public key
  }

  hrdec_ge_set_all_gej_new(batchpa, batchpj);                          // convert all jacobian coordinates to affine

  for (i = 0; i < 4096; ++i) {                                         // compute uncompressed public keys
    pub[i][0] = 0x04;
    secp256k1_fe_normalize_var(&batchpa[i].x);
    hrdec_fe_get_b32(pub[i] +  1, &batchpa[i].x);

    secp256k1_fe_normalize_var(&batchpa[i].y);
    hrdec_fe_get_b32(pub[i] + 33, &batchpa[i].y);
  }
 ................
  for (i = 0; i < 4096; i++) {
    r[i].infinity = a[i].infinity;
    secp256k1_ge_set_gej_zinv(&r[i], &a[i], &az[i]);
  }
  free(az);
}

Although I have already optimized the sh*t out of this code, I still feel it's way too clumsy as is and potentially could be improved a lot.
If anyone has any suggestions, I'm all ears.


The code from libsecp256k1 you are using has a different purpose than yours:
Quote
The process of cracking Bitcoin brain wallets is to repeatedly generate public keys using guessed passwords.
Key generation method as we described in 2.1.1, is to compute Q = dP. Here d is a SHA256 hash of the generated password,
P is a fixed point which is the base point G for secp256k1...

The time cost for computing one public key given a random private key is 47.2 us.
source

You don't have to generate one public key from a random private key, nor do you generate a lot of public keys from a lot of sparse random keys; you have to generate a batch of "consecutive" public keys, a much more specific (and less expensive) task. Furthermore, you don't have to worry about side-channel attacks and other security issues.

How do you generate each public key (in batchpj)? And how do you convert jacobian coordinates to affine (hrdec_ge_set_all_gej_new(batchpa, batchpj))? How many operations (multiplication, inverse) are involved?

Once you have k*G, you only have to add 1*G at each step: k*G --> (k+1)*G --> (k+2)*G, ... so the effort should be focused on optimizing the addition '+G': minimizing the number of operations for each addition '+G'.

The computation of a multiplicative inverse is the most time-consuming operation (1 inverse = 22 multiplications - see table 2), so I would start cutting there:

                                     goal: fewer inverses per public key, or better: fewer inverses per address (IA)

---------------------------------------------------------------------------------------------------------------------------------
For example, given a, b elements of the field:

a*b, inv(a*b) --> inv(a) = b*inv(a*b), inv(b) = a*inv(a*b) --> 1 inverse + 3 multiplications instead of 2 inverses

So:

private key k     -->    k*G    =  (X1,Y1,Z1) (jacobian coordinates)  no inverse
private key k+1 --> k*G + G = (X1,Y1,Z1)+(XG,YG,1)=(X2,Y2,Z2) (jacobian coordinates)  no inverse


Z=Z1*Z2,  inv(Z)  --> inv(Z1)=Z2*inv(Z),  inv(Z2)=Z1*inv(Z)      only 1 inverse

jacobian coordinates (X1,Y1,Z1)  -->   (X1*inv(Z1)^2, Y1*inv(Z1)^3) affine
jacobian coordinates (X2,Y2,Z2)  -->   (X2*inv(Z2)^2, Y2*inv(Z2)^3) affine

2 private keys --> 2 public keys   (only 1 inverse)  --> IA = 1/2 = 0.5
-----------------------------------------------------------------------------------------------------------------------------
better:
2 private keys --> 4 private keys (k, n-k, k+1, n-k-1) --> 4 uncompressed public keys (only 1 inverse) --> IA = 0.25
---------------------------------------------------------------------------------------------------------------------------------------------------
much better:
2 private keys --> 4 private keys --> 8 public keys, 4 compressed and 4 uncompressed, 8 addresses (only 1 inverse) --> IA=0.125
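To make the shared-inverse trick concrete, here is a minimal sketch of the 2-element case using libsecp256k1's internal field API (the function name fe_inv2_var is made up for illustration):

Code:
// Montgomery's trick for two field elements: 1 inversion + 3 multiplications
// instead of 2 inversions. iz1/iz2 receive 1/z1 and 1/z2.
static void fe_inv2_var(secp256k1_fe *iz1, secp256k1_fe *iz2,
                        const secp256k1_fe *z1, const secp256k1_fe *z2) {
  secp256k1_fe z12, u;

  secp256k1_fe_mul(&z12, z1, z2);  // Z = Z1*Z2
  secp256k1_fe_inv_var(&u, &z12);  // u = 1/Z - the only inversion
  secp256k1_fe_mul(iz1, z2, &u);   // 1/Z1 = Z2 * u
  secp256k1_fe_mul(iz2, z1, &u);   // 1/Z2 = Z1 * u
}

The same chaining generalizes to n elements (1 inversion + 3*(n-1) multiplications), which is what the 4096-element batch in hrdec_ge_set_all_gej_new further down the page does.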
legendary
Activity: 1120
Merit: 1037
฿ → ∞
We know that:

private key k :    --> public key: (X,Y) = k*G
private key n-k :  --> public key: (X,-Y) = (X,p-Y)  (no need for multiplication/inverse anymore)

...

So you can calculate:

(1*G,(n-1)*G),  (2*G,(n-2)*G),  (3*G,(n-3)*G),   .....,   (k*G,(n-k)*G), .... ,   (2^159*G, (n-2^159)*G)

k from 1 to 2^159 and n-1 downto n-2^159

Same number of keys, half the time, double the speed.

Very intriguing, both for the effective reduction in computational effort per key, and because it remains manageable from the interval-arithmetic side for keeping track of "computed areas", since both sequences are contiguous. I'll certainly have a more detailed look at this.



Rico
legendary
Activity: 1932
Merit: 2077
We know that:

private key k :    --> public key: (X,Y) = k*G

private key n-k :  --> public key: (X,-Y) = (X,p-Y)  (no need for multiplication/inverse anymore)

Example:

Code:
k:  0xf3bb9dc5a  (65426545754)      
-->
X: 0x6ee5f74076df253037be5956bd169d1ce1a2e586cf3b233df6f9f4a31d9b955b
Y: 0xf9680b09669f0df113b9b0bb4d37b531858a73913ac38025a31d322801d660f0
-->
Address: 1LjMTsSMPrXoFFbiSExHpXsBuDamRxjs8R


n-k: 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e7d947c64e7 (115792089237316195423570985008687907852837564279074904382605163141452734948583)
-->
X': 0x6ee5f74076df253037be5956bd169d1ce1a2e586cf3b233df6f9f4a31d9b955b    (X'=X)
Y': 0x0697f4f69960f20eec464f44b2c84ace7a758c6ec53c7fda5ce2cdd6fe299b3f  (Y' = p - 0xf9680b09669f0df113b9b0bb4d37b531858a73913ac38025a31d322801d660f0)
-->
Address: 1MutomTTupZ1XVvzUmzivGfWH1HCP9Y5wE

So you can calculate:

(1*G,(n-1)*G),  (2*G,(n-2)*G),  (3*G,(n-3)*G),   .....,   (k*G,(n-k)*G), .... ,   (2^159*G, (n-2^159)*G)

k from 1 to 2^159 and n-1 downto n-2^159

Same number of keys, half the time, double the speed.
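The (X, p-Y) step really is inversion-free: with libsecp256k1's internal group API it is a single field negation (a sketch, assuming the internal secp256k1_ge_neg helper; the wrapper name is made up):

Code:
// Public key of n-k from the public key of k: same X, Y' = p - Y.
// No multiplication and no inversion - just one field negation.
static void pubkey_of_n_minus_k(secp256k1_ge *r, const secp256k1_ge *kG) {
  secp256k1_ge_neg(r, kG);  // keeps x, replaces y with its negation mod p
}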
legendary
Activity: 1120
Merit: 1037
฿ → ∞
a) The odds are about the same: only 1/2^(256-160) = 1/2^96 that it is an original key. 200 leading zeroes or 2^(2^87) have exactly the same chance of happening.

If you look at the statistical distribution of the Hamming weight of private keys, how can you say that 200 leading zeroes are not an exceptional (= rare) event? It is true that, given one (unknown) original private key, the chance that a supposedly alternate key is in fact the original is 1/2^96 in both search spaces. But the hypothesis of this actually being an alternate key is much more probable in the "leading zeroes" case.

Code:
2^256        = 115792089237316195423570985008687907853269984665640564039457584007913129639936
ncr(256;128) = 5768658823449206338089748357862286887740211701975162032608436567264518750790
ncr(256;56)  = 1529810590453037906058619994778570990615168461145717234720
2^56         = 72057594037927936
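To put a number on how far weight 56 lies out in the tail, log2 of these probabilities can be estimated with lgamma (a sketch using only libm; printed values are approximate):

Code:
// P(weight = w) = ncr(256; w) / 2^256 for a uniform 256-bit key,
// computed in log2 via lgamma to avoid big-integer arithmetic.
#include <math.h>
#include <stdio.h>

static double log2_ncr(int n, int w) {
  return (lgamma(n + 1.0) - lgamma(w + 1.0) - lgamma(n - w + 1.0)) / log(2.0);
}

int main(void) {
  printf("log2 P(w = 128) = %.1f\n", log2_ncr(256, 128) - 256);  // ~ -4.3
  printf("log2 P(w =  56) = %.1f\n", log2_ncr(256,  56) - 256);  // ~ -66.0
  return 0;
}

A key with 200 leading zero bits has Hamming weight at most 56, i.e. it lies roughly 60 powers of two deeper in the tail than a typical weight-128 key.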

Rico
legendary
Activity: 1932
Merit: 2077

There are two problems. a) The resulting private keys from doubling become statistically indistinguishable from "real non-pathological(*)" private keys, and b) the whole interval arithmetic for parallelization of the distribution is moot. At the moment, the client precomputes a table of 4096 uncompressed public keys with basically this code:

....

(*) If the pool finds - as it currently does by design - a private key that has e.g. 200 leading zeroes, you can assume that this private key is either "the original key" of an address generated by some bad software or as a test, or that by a very marginal chance it was generated correctly but by accident ended up like that. If - on the other hand - your private key has around 128 bits set/clear, you can assume it is a correctly generated original key.

a) The odds are about the same: only 1/2^(256-160) = 1/2^96 that it is an original key. 200 leading zeroes or 2^(2^87) have exactly the same chance of happening.


b) I see your point.

Furthermore, finding the inverse of an element in a finite field is most of the workload, much more expensive than multiplication.

legendary
Activity: 1120
Merit: 1037
฿ → ∞
I was thinking:

why do we calculate: G, G+G, G+G+G, G+G+G+G, ..., 2^160*G (private keys: 1, 2, 3, 4, ..., 2^160)

instead of: G, 2*G, 4*G, 8*G, 16*G, ..., 2^(2^160)*G?
(private keys: 1, 2, 4, 8, 16, ..., 2^(2^160) mod n)
...
I understand that point doubling is much cheaper than point addition (the 2J case is the optimal one; the secp256k1 library uses J+A).

I know that our points form a finite cyclic group that is isomorphic to the additive group of Z/nZ. So if we use the second sequence instead of the first one, we probably cannot obtain the entire group. But you only need 2^160 points, not all ~2^256.

Am I wrong?

There are two problems. a) The resulting private keys from doubling become statistically indistinguishable from "real non-pathological(*)" private keys, and b) the whole interval arithmetic for parallelization of the distribution is moot. At the moment, the client precomputes a table of 4096 uncompressed public keys with basically this code:

Code:
int
hrdec_pubkey_batch_incr(uchar (*pub)[65],
                        uchar start[32]) {
  unsigned int  i;

  hrdec_mult_gen2(&batchpj[0], start);

  for (i = 1; i < 4096; ++i) {
    hrdec_gej_add_ge(&batchpj[i], &batchpj[i-1], &incr_a);             // increment public key
  }

  hrdec_ge_set_all_gej_new(batchpa, batchpj);                          // convert all jacobian coordinates to affine

  for (i = 0; i < 4096; ++i) {                                         // compute uncompressed public keys
    pub[i][0] = 0x04;
    secp256k1_fe_normalize_var(&batchpa[i].x);
    hrdec_fe_get_b32(pub[i] +  1, &batchpa[i].x);

    secp256k1_fe_normalize_var(&batchpa[i].y);
    hrdec_fe_get_b32(pub[i] + 33, &batchpa[i].y);
  }

  return 0;
}

static void hrdec_ge_set_all_gej_new(secp256k1_ge *r, const secp256k1_gej *a) {
  secp256k1_fe *az;
  secp256k1_fe u;
  unsigned int i;

  az = (secp256k1_fe *)malloc(sizeof(secp256k1_fe) * 4096);

  az[0] = a[0].z;

  for (i = 1; i < 4096; i++) {
    secp256k1_fe_mul(&az[i], &az[i - 1], &a[i].z);
  }

  secp256k1_fe_inv_var(&u, &az[4095]);         // u = 1/(z[0]*...*z[4095]) - the only field inversion

  for (i = 4095; i > 0; i--) {
    secp256k1_fe_mul(&az[i], &az[i - 1], &u);  // az[i] = 1/z[i]
    secp256k1_fe_mul(&u, &u, &a[i].z);         // strip z[i]: u = 1/(z[0]*...*z[i-1])
  }

  az[0] = u;                                   // what remains is 1/z[0]

  for (i = 0; i < 4096; i++) {
    r[i].infinity = a[i].infinity;
    secp256k1_ge_set_gej_zinv(&r[i], &a[i], &az[i]);
  }
  free(az);
}
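For scale: the batch above spends (4096-1) multiplications on the running products, 1 inversion, and 2*(4096-1) multiplications on the back-substitution - roughly 3 multiplications plus 1/4096th of an inversion per key, compared to ~22 multiplication-equivalents per key if every point were inverted individually (using the 1 inverse = 22 multiplications estimate quoted earlier in the thread).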


Although I have already optimized the sh*t out of this code, I still feel it's way too clumsy as is and potentially could be improved a lot.
If anyone has any suggestions, I'm all ears.



(*) If the pool finds - as it currently does by design - a private key that has e.g. 200 leading zeroes, you can assume that this private key is either "the original key" of an address generated by some bad software or as a test, or that by a very marginal chance it was generated correctly but by accident ended up like that. If - on the other hand - your private key has around 128 bits set/clear, you can assume it is a correctly generated original key.


Rico
legendary
Activity: 1932
Merit: 2077
What function do you use in your program?

secp256k1_gej_add_ge_var, and mainly in a descendant of the secp256k1_ecmult_gen2 function from here:

https://github.com/ryancdotorg/brainflayer/blob/master/ec_pubkey_fast.c
....

I was thinking:

why do we calculate: G, G+G, G+G+G, G+G+G+G, ..., 2^160*G (private keys: 1, 2, 3, 4, ..., 2^160)

instead of: G, 2*G, 4*G, 8*G, 16*G, ..., 2^(2^160)*G?
(private keys: 1, 2, 4, 8, 16, ..., 2^(2^160) mod n)

From here:

[image: table comparing the field-operation costs of point doubling and point addition in various coordinate systems]

I understand that point doubling is much cheaper than point addition (the 2J case is the optimal one; the secp256k1 library uses J+A).

I know that our points form a finite cyclic group that is isomorphic to the additive group of Z/nZ. So if we use the second sequence instead of the first one, we probably cannot obtain the entire group. But you only need 2^160 points, not all ~2^256.

Am I wrong?
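One way to see problem b) from the reply above: under k -> 2k mod n the visited private keys scatter over the whole keyspace instead of forming a contiguous, easily bookkept interval. A toy model (illustrative - the small prime 1000003 merely stands in for the real group order n):

Code:
// Toy model of the doubling walk k -> 2k mod n.
#include <stdint.h>
#include <stdio.h>

int main(void) {
  const uint64_t n = 1000003;  // stand-in prime, NOT the real group order
  uint64_t k = 123457;         // arbitrary starting key

  for (int i = 0; i < 10; i++) {
    printf("step %2d: key %6llu\n", i, (unsigned long long)k);
    k = (2 * k) % n;           // doubling the point doubles the key mod n
  }
  return 0;
}

After the first wraparound the keys jump all over [1, n-1], so there is no compact interval representation of "already searched" ranges - unlike with k -> k+1.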
legendary
Activity: 1120
Merit: 1037
฿ → ∞
root@server-02:~# ./LBC -h
JSON not found - installing it.
Can't locate JSON.pm in @INC (you may need to install the JSON module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at ./LBC line 88.
BEGIN failed--compilation aborted at ./LBC line 88.


tried to google it and installed all kinds of shit ... nothing helped ... same problem every time. The system is an Ubuntu 14.04 LTS.


I take it you've read https://lbc.cryptoguru.org/man/admin#on-linux and gcc is installed.

If so, try to install JSON manually by doing

Code:
> cpan

cpan> install JSON

and if it fails, paste the results.


Rico
member
Activity: 62
Merit: 10
root@server-02:~# ./LBC -h
JSON not found - installing it.
Can't locate JSON.pm in @INC (you may need to install the JSON module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at ./LBC line 88.
BEGIN failed--compilation aborted at ./LBC line 88.


tried to google it and installed all kinds of shit ... nothing helped ... same problem every time. The system is an Ubuntu 14.04 LTS.
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
R9 290X GPU, 4 GB GDDR5, 2,816 processors and 5,600 GFLOPS,
the OpenCL 32-bit implementation uses the 32-bit scalar type of OpenCL 1.2

On a R9 290X, the OpenCL 32-bit implementation performs 1,778,000 scalar multiplications per second, i.e.
one scalar multiplication in 563 nanoseconds

Almost 2 MKeys/s (without sha256/ripemd160) and on another curve (Curve25519).

Is it possible to do better on secp256k1?

It certainly must be, as the lower bound for an OpenCL implementation of key generation on my notebook GPU (Quadro M2000M - 640 cores) is 15 Mkeys/s. (As I said, vanitygen does that - inefficiently; computing the compressed public key once you have computed the uncompressed one is quite easy.) My estimate is that we should be able to squeeze 20 Mkeys/s out of the Quadro M2000M.

My R9-280X does some 29 Mkeys/s (again vanitygen); an optimal implementation should squeeze 38-40 Mkeys/s out of it.


Rico


PS: I ditched the field_5x52_asm_impl.h in secp256k1 altogether, optimizing field_5x52_int128_impl.h (a lot). Now it's faster.  Smiley
legendary
Activity: 1932
Merit: 2077
If anyone's interested, from this article:

Quote
R9 290X GPU, 4 GB GDDR5, 2,816 processors and 5,600 GFLOPS,
the OpenCL 32-bit implementation uses the 32-bit scalar type of OpenCL 1.2

On a R9 290X, the OpenCL 32-bit implementation performs 1,778,000 scalar multiplications per second, i.e.
one scalar multiplication in 563 nanoseconds

Almost 2 MKeys/s (without sha256/ripemd160) and on another curve (Curve25519).

Is it possible to do better on secp256k1?
legendary
Activity: 1638
Merit: 1001
Re GPU optimization and CUDA:

Wolf0 spent some time in the Monero thread in May-June 2014 optimizing GPU mining of Monero. Might be a source of magic beans for this project. Also, DGA was doing a similar thing around the same time on the same subject.

As an aside, have you considered that the cosmological Big Bang was the result of people long ago searching for hash collisions - and finding one? So maybe a helmet, safety glasses, etc.

legendary
Activity: 1120
Merit: 1037
฿ → ∞

Current generation speed:  23.71 Mkeys/s (*)

very very slow ... w8ing for GPU update ... thats when the real speed will appear .

I could do 5-6 times the current speed alone with GPU ...

If you have something, by all means code and advice is welcome.

I have an unpublishable GPU prototype based on vanitygen. It's basically half the speed of vanitygen (which does 15 MKeys/s on my notebook), as it uses 2 passes: one for compressed and one for uncompressed public keys. Also, it requires around 4 GB RAM on startup until the AVL data structures are fiddled up. So on my notebook, the GPU prototype needs around 4 GB memory on startup and 3.4 GB of GPU RAM, and gives around 7 MKeys/s.

Compared to this, the SHA256 and RIPEMD160 code from hashcat 3.2 indicates around 150 Mega-hash160(x) per second, and I am working on bringing the private key -> x step (a.k.a. the EC arithmetic) to the GPU as well. For that, I need to strip down everything not relevant to generation and wrap my head around the OpenCL distribution of the RIPEMD160(SHA256(EC(private key))) x 2 pipeline.

Then I have a GPU implementation I got some time ago from a user - allegedly working, but (according to his own claims) with bad performance. This solution seems to be Windows- and AMD-GPU-only, and quite honestly I have not even been able to compile it.

All in all, I agree that what we currently see as total pool speed may in a few weeks come from one mid-range GPU.


Rico
member
Activity: 62
Merit: 10

Current generation speed:  23.71 Mkeys/s (*)

very very slow ... w8ing for GPU update ... thats when the real speed will appear .

I could do 5-6 times the current speed alone with GPU ...
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Time since inception: 6 months
Number of keys generated: 250 trillion
Number of unspents found: 2 (genuine, no bounties etc.)
Total collisions confirmed:   0
Current generation speed:  23.71 Mkeys/s (*)

Updated BLF on server. Upon restart, your clients should auto-update.


(*) The equivalent of about 185000 pages on directory.io, where each address (compressed and uncompressed) is checked against ~9000000 addresses with unspent outputs - per second.
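(Assuming directory.io's 128 keys per page: 23710000 keys/s / 128 keys per page ≈ 185000 pages per second.)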


Rico
legendary
Activity: 1120
Merit: 1037
฿ → ∞
this project is trying to find private keys to random public addresses with btc in it, is it not?

Well, the public addresses appear random, but the private keys "behind them" are not.

E.g. would you know, that
1MaGmdHoZpGzZk8m92rWSJ5LxUGic8RLjv
and
1Gs7mi3eerJwzUrs39fMBa3Uh4skw3ew96

are neighbors? They are in at least one case, where their private keys differ by just 1.
Of course you cannot see this until you have computed them. (http://directory.io/1938276999168)

The pool is generating hash160s by simply incrementing the underlying private keys.
And it checks all generated keys against all known hash160s with unspent outputs on them. Yes.

Well - for a very special definition of "simply incrementing":

Quote
i thought that was, for all intents and purposes, impossible due to the truly astronomical odds.
so how is this finding them?

https://lbc.cryptoguru.org/man/theory#what-we-do

One reason why the pool keeps finding keys is the speed increase. As a long-term average, the pool is
getting faster and faster. Machines are faster, the generator gets faster, people come and go,
but in the end, the more people let the LBC client run, the bigger the impact.

Another reason is that we may hit private keys that were generated by some badly designed
software, are tests, or whatever.

And - of course - another reason may be that this task is not at all as impossible as it is always presented.

Quote
btw love the name.

 Smiley Thanks - me too.


Rico
legendary
Activity: 4354
Merit: 3614
what is this "brake pedal" you speak of?
ok, 1st, im stupid. with that in mind please go easy on me.

this project is trying to find private keys to random public addresses with btc in it, is it not?

i thought that was, for all intents and purposes, impossible due to the truly astronomical odds.

so how is this finding them?

or do i have the whole purpose and results of this project wrong.

btw love the name.
legendary
Activity: 1120
Merit: 1037
฿ → ∞
A legit collision?  Shocked Shocked
To prove there is a collision, you have to prove that two different inputs result in the same output.

This is correct. Right now, all the pool has is one key. Again, a very strange key with many, many leading zeroes.
The address found is younger, but still from a time ~3 months before even the theoretical inception of the LBC project.

If we could leave the usual "how do you prove it's not you who deposited..." aside, we could analyze what the pool has found and where to look for that other private key.

On a side note, I would like to mention that, given the speed of the pool at the time of inception and the linear search since then, the projected find time of this key would be around ... 19055 days, ~52 years in the future.

If we look at the originating address https://blockchain.info/de/address/1C22ZukCKVszze8KXVLfAMRaeqhRzWEZGi

it seems there were a lot of 0.0001 transactions done - mostly to addresses that have significant funds on them. I don't know exactly what this is (begging or whatever), but the address found is different from all the other targets of 1C22ZukCKVszze8KXVLfAMRaeqhRzWEZGi.
I'm looking right now to see which P2SH 378e8af588e6433032d0f5fb548431408c4bcb3c resolves to (hm.. none - so it's standalone).


Rico