
Topic: brute-forcing public keys at amazing speed 2.2 PH/s on CPU [malware warning] - page 8.

member
Activity: 873
Merit: 22
I published an exe file for Windows:
https://github.com/JeanLucPons/BSGS/releases/tag/1.0

To solve a 110-bit puzzle, on my config, it would take a very, very long time: ~4 billion years!


Hello, respected Jean!

When I launch the release on my PC with 8 GB of memory, once the memory is fully used the working speed drops to zero. Can you please modify the code so that it works well on PCs with less memory than 12 GB?


Thanks

sr. member
Activity: 642
Merit: 316
-snip-
Thanks a lot for the release.

Have you tested this release on your config? How does its speed compare with your code?
Jean_Luc's program is much faster than mine!
Because it has a higher base hashrate. I mean, for example, the base hashrate on an i7 3770 is 14 Mkey/s. I can't get that speed in my program because I use a simple language where big-integer calculations go through a third-party library. Because of that, I have a low base hashrate due to integer conversions.
sr. member
Activity: 443
Merit: 350
-snip-
Thanks a lot for the release.

Have you tested this release on your config? How does its speed compare with your code?
sr. member
Activity: 462
Merit: 701
Ah, I understand the difference between our hashtables.
When you compute a pubkey, you compare the (masked) pubkey with each element of the hashtable (in your last optimisation you added sorting for this).
But I try another way: I don't need to compare the pubkey with each element, because I have a table of 2^24 entries.
In that case I need only one lookup into the table: I immediately learn whether the hash exists, and if it does, I get a pointer to the memory where the G counter (or several G counters, if there was a collision) is stored.

Ok, if I understand your strategy well, you face the birthday paradox problem. When you fill your hashtable, you will start to get collisions around the sqrt(2^24) = 2^12th insertion. So for 2^24 G counters, you would need a hashtable of ~2^48 entries to avoid collisions.
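As a quick sanity check of this estimate (my own sketch, not part of either program): after N random insertions into M buckets, the expected number of occupied buckets is M*(1 - (1 - 1/M)^N), and the first collision is expected near sqrt(M).

Code:
#include <stdio.h>
#include <math.h>

int main(void) {
  double M = pow(2, 24);  /* hashtable slots */
  double N = pow(2, 24);  /* baby-step insertions */
  /* Expected occupied slots: M*(1 - (1 - 1/M)^N) ~= M*(1 - exp(-N/M)) */
  double unique = M * (1.0 - exp(-N / M));
  printf("expected unique hashes: %.0f\n", unique);             /* ~10.6 million */
  printf("expected collisions:    %.0f\n", N - unique);         /* ~6.2 million */
  printf("first collision near:   %.0f insertions\n", sqrt(M)); /* ~4096 = 2^12 */
  return 0;
}

These expected values agree almost exactly with the HashTable Info stats Etar posted (10604491 unique hashes, 6172725 collisions).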
jr. member
Activity: 50
Merit: 2
From what I have learnt about Bitcoin, I can say that deriving a private key from a public key is not possible.
The public key is derived from the private key, but that function is not reversible, according to what I have learnt.
So deriving the private key from the public key is not feasible, in my opinion.

P.S.: I have read this in some articles and have watched YouTube videos which say the same.

Also, if it is miraculously possible, could you please explain how, in layman's terms?
sr. member
Activity: 642
Merit: 316
If you use a 24-bit hashtable and 2^30 keys for the baby steps:

You will have (on average) a list of 2^(30-24) = 64 items per hash.
Each item of this list contains the generator index and an additional part of the key.

Code:
typedef struct {
  uint32_t b0; // LSB key (24 bit) + HSB Offset (8 bit)
  uint32_t p;  // LSB Offset
} ENTRY;

The hash is the 3 MSB bytes of the key.
b0 holds the 3 LSB bytes of the key (plus the 8 MSB bits of the offset).
So we store 48 bits of the key.
p (offset) is the "generator index" corresponding to the key p.G

Then, to access it, you compute the hash (one simple & operation) and you have direct access to this small list.
You can sort each list according to the key part, and then you need ~6 operations (on average) to get the generator index.
I did it like this in HashTable::Get().

Then, as I don't store the full X value of p.G, I get some false collisions that I need to check with a computePublicKey.
A false collision happens every 2^(24+24-30) = 2^18 giant steps, which means I need to perform an extra EC multiplication every 262144 giant steps.

Depending on the memory you have, you may need to adjust the HASH size and the number of key bits stored in the entry, in order to minimise false collisions.
Ah, I understand the difference between our hashtables.
When you compute a pubkey, you compare the (masked) pubkey with each element of the hashtable (in your last optimisation you added sorting for this).
But I try another way: I don't need to compare the pubkey with each element, because I have a table of 2^24 entries.
In that case I need only one lookup into the table: I immediately learn whether the hash exists, and if it does, I get a pointer to the memory where the G counter (or several G counters, if there was a collision) is stored.
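A minimal sketch (hypothetical, not Etar's actual code) of that direct-indexed layout: a single array access per pubkey, with each slot pointing to the G counter(s) that hashed there:

Code:
#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint32_t *counters; /* G counters stored under this hash, NULL if slot empty */
  uint32_t  n;        /* how many of them (n > 1 means a fill-time collision) */
} SLOT;

#define TABLE_BITS 24
#define TABLE_MASK ((1u << TABLE_BITS) - 1)

/* One lookup: NULL if the hash is absent, otherwise the list of candidate
   G counters, which still has to be verified against the pubkey. */
static const SLOT *lookup(const SLOT *table, uint32_t hash24) {
  const SLOT *s = &table[hash24 & TABLE_MASK];
  return s->counters ? s : NULL;
}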
sr. member
Activity: 462
Merit: 701
If you use a 24-bit hashtable and 2^30 keys for the baby steps:

You will have (on average) a list of 2^(30-24) = 64 items per hash.
Each item of this list contains the generator index and an additional part of the key.

Code:
typedef struct {
  uint32_t b0; // LSB key (24 bit) + HSB Offset (8 bit)
  uint32_t p;  // LSB Offset
} ENTRY;

The hash is the 3 MSB bytes of the key.
b0 holds the 3 LSB bytes of the key (plus the 8 MSB bits of the offset).
So we store 48 bits of the key.
p (offset) is the "generator index" corresponding to the key p.G

Then, to access it, you compute the hash (one simple & operation) and you have direct access to this small list.
You can sort each list according to the key part, and then you need ~6 operations (on average) to get the generator index.
I did it like this in HashTable::Get().

Then, as I don't store the full X value of p.G, I get some false collisions that I need to check with a computePublicKey.
A false collision happens every 2^(24+24-30) = 2^18 giant steps, which means I need to perform an extra EC multiplication every 262144 giant steps.

Depending on the memory you have, you may need to adjust the HASH size and the number of key bits stored in the entry, in order to minimise false collisions.
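A simplified sketch of that lookup (the real implementation is HashTable::Get() in the repository; the exact bit layout of b0 here is my assumption for illustration):

Code:
#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint32_t b0; /* key LSB (24 bits) << 8 | offset MSB (8 bits), assumed layout */
  uint32_t p;  /* offset LSB (32 bits) */
} ENTRY;

typedef struct {
  ENTRY *items; /* small list, sorted by the 24-bit key part */
  size_t n;     /* ~64 on average for a 24-bit hash and 2^30 baby steps */
} BUCKET;

#define HASH_MASK 0xFFFFFFu /* computing the hash is one simple & operation */

/* x48: the 48 stored bits of the key's x value (24 MSB + 24 LSB).
   On a match, returns 1 and the candidate generator index; the caller
   still verifies it with computePublicKey (false collisions happen). */
static int hashtable_get(const BUCKET *table, uint64_t x48, uint64_t *offset) {
  uint32_t hash  = (uint32_t)(x48 >> 24) & HASH_MASK; /* direct bucket access */
  uint32_t kpart = (uint32_t)x48 & HASH_MASK;
  const BUCKET *b = &table[hash];
  size_t lo = 0, hi = b->n; /* binary search: ~log2(64) = 6 steps on average */
  while (lo < hi) {
    size_t mid = (lo + hi) / 2;
    if ((b->items[mid].b0 >> 8) < kpart) lo = mid + 1; else hi = mid;
  }
  if (lo < b->n && (b->items[lo].b0 >> 8) == kpart) {
    *offset = ((uint64_t)(b->items[lo].b0 & 0xFFu) << 32) | b->items[lo].p;
    return 1;
  }
  return 0;
}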
sr. member
Activity: 642
Merit: 316
Nice job! Thank you.

You're welcome Smiley

So, your total time for 16 keys is 216 minutes, including 9 minutes for the 2^30 precomputation. That means the search for the 16 keys took 207 minutes, or approx. 13 minutes per 64-bit key. As I saw from your results, the time per 64-bit key varied from 2 min to 19 min.

An average time of 13 minutes is very good. I guess it is faster than the result reached by the Pollard kangaroo method shared in BurtW's topic.

If you look at the key that was solved in 19 min, you see that ...7D0E6081C7E0E865 is close to ...8000000000000000 (the end of thread #3's range) and that it requires a total of 2^33.86 giant steps (max 2^34), so we are close to the end of the range.
It takes ~20 minutes to browse the full 2^64 range, so the theoretical average time to solve a key should be 10 min, without taking the baby-step precalculation into account.

I really don't understand something ))
I tried to implement a hashtable, because a hashtable is faster than binary search.
So I use a 24-bit mask, like in your implementation.
Total baby steps = 2^24 = 16777216 (just for a test; in real usage it should be more).
And I get a lot of collisions when filling the table:
Code:
----------HashTable Info----------
Total unique hashes:10604491x12=127253892 bytes
Total items:16777216x4=67108864 bytes
Total 194362756 bytes
Total colisions:6172725
Max. colisions:10
----------------------------------
What I see from this info:
there were 10604491 unique 24-bit combinations out of 16777216 in total,
6172725 of the 24-bit combinations had collisions,
and the maximum number of collisions on a single hash was 10.
I think this is a terrible collision result  Shocked
sr. member
Activity: 462
Merit: 701
Nice job! Thank you.

You're welcome Smiley

So, your total time for 16 keys is 216 minutes, including 9 minutes for the 2^30 precomputation. That means the search for the 16 keys took 207 minutes, or approx. 13 minutes per 64-bit key. As I saw from your results, the time per 64-bit key varied from 2 min to 19 min.

An average time of 13 minutes is very good. I guess it is faster than the result reached by the Pollard kangaroo method shared in BurtW's topic.

If you look at the key that was solved in 19 min, you see that ...7D0E6081C7E0E865 is close to ...8000000000000000 (the end of thread #3's range) and that it requires a total of 2^33.86 giant steps (max 2^34), so we are close to the end of the range.
It takes ~20 minutes to browse the full 2^64 range, so the theoretical average time to solve a key should be 10 min, without taking the baby-step precalculation into account.
sr. member
Activity: 642
Merit: 316
I published an exe file for Windows:
https://github.com/JeanLucPons/BSGS/releases/tag/1.0

To solve a 110-bit puzzle, on my config, it would take a very, very long time: ~4 billion years!

Thanks a lot for the release.
sr. member
Activity: 462
Merit: 701
I published an exe file for Windows:
https://github.com/JeanLucPons/BSGS/releases/tag/1.0

To solve a 110-bit puzzle, on my config, it would take a very, very long time: ~4 billion years!
newbie
Activity: 17
Merit: 25
I did a few optimizations (committed to GitHub); I reached 13.6M giant steps of 2^30 per second on my Xeon X5647 2.93GHz with 12 GB of RAM.
It solves the 16 keys of odolvlobo's set in 3:35:53 from scratch, I mean without any precomputation.
If I convert to Etar's unit: it does 14.6 PKey/s.

I posted the result in the readme of my program:
https://github.com/JeanLucPons/BSGS/blob/master/README.md

Thanks to odolvlobo for providing this challenge.
Best wishes to Etar in optimizing his program and fixing his issue on GPU.


Thanks. How long would it take you to solve the 110-bit range key of the puzzle transaction?
sr. member
Activity: 443
Merit: 350
I did a few optimizations (committed to GitHub); I reached 13.6M giant steps of 2^30 per second on my Xeon X5647 2.93GHz with 12 GB of RAM.
It solves the 16 keys of odolvlobo's set in 3:35:53 from scratch, I mean without any precomputation.
If I convert to Etar's unit: it does 14.6 PKey/s.
-snip-

Nice job! Thank you.
So, your total time for 16 keys is 216 minutes, including 9 minutes for the 2^30 precomputation. That means the search for the 16 keys took 207 minutes, or approx. 13 minutes per 64-bit key. As I saw from your results, the time per 64-bit key varied from 2 min to 19 min.

An average time of 13 minutes is very good. I guess it is faster than the result reached by the Pollard kangaroo method shared in BurtW's topic.
sr. member
Activity: 642
Merit: 316
I did a few optimizations (committed to GitHub); I reached 13.6M giant steps of 2^30 per second on my Xeon X5647 2.93GHz with 12 GB of RAM.
It solves the 16 keys of odolvlobo's set in 3:35:53 from scratch, I mean without any precomputation.
If I convert to Etar's unit: it does 14.6 PKey/s.

I posted the result in the readme of my program:
https://github.com/JeanLucPons/BSGS/blob/master/README.md

Thanks to odolvlobo for providing this challenge.
Best wishes to Etar in optimizing his program and fixing his issue on GPU.

It was a good thread, I think.
Each of us took away something instructive here.
Can you make a release of your program? I mean an exe file for Windows.
sr. member
Activity: 462
Merit: 701
I did a few optimizations (committed to GitHub); I reached 13.6M giant steps of 2^30 per second on my Xeon X5647 2.93GHz with 12 GB of RAM.
It solves the 16 keys of odolvlobo's set in 3:35:53 from scratch, I mean without any precomputation.
If I convert to Etar's unit: it does 14.6 PKey/s.

I posted the result in the readme of my program:
https://github.com/JeanLucPons/BSGS/blob/master/README.md

Thanks to odolvlobo for providing this challenge.
Best wishes to Etar in optimizing his program and fixing his issue on GPU.
sr. member
Activity: 462
Merit: 701
How and where can you store all 2^40 keys?

I store only 49 bits (25+24) per key (using a HASH_SIZE of 25 in https://github.com/JeanLucPons/BSGS/blob/master/HashTable.h).
I accept having a few false collisions and doing a little more computation to handle them.
Each item in the table is 64 bits: 40 bits for the generator index and 24 bits for a part of the key.

I chose this maximum of 40 bits (32+8) because more than 2^32 keys is possible, but you're right: the 40 bits could be decreased and a few more bits used for the key, which would decrease the probability of false collisions.

This program is here to be modified and adapted...
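To make the packing concrete, here is a hypothetical helper (my illustration; the field order is an assumption, not necessarily the program's actual layout) for a 64-bit item holding a 40-bit generator index and 24 key bits:

Code:
#include <stdint.h>

/* Pack a 40-bit generator index and 24 key bits into a single 64-bit item. */
static inline uint64_t pack_item(uint64_t gen_index, uint32_t key24) {
  return (gen_index & 0xFFFFFFFFFFULL) | ((uint64_t)(key24 & 0xFFFFFFu) << 40);
}

static inline uint64_t item_gen_index(uint64_t item) {
  return item & 0xFFFFFFFFFFULL;  /* low 40 bits */
}

static inline uint32_t item_key24(uint64_t item) {
  return (uint32_t)(item >> 40);  /* high 24 bits */
}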


sr. member
Activity: 443
Merit: 350
-snip-
I store only 40 bits for the generator index (so 2^40 keys max in my hash table) and 24 (+25) bits of the x value: 64 bits per item, plus an array of 2^25 pointers + 2 * 2^25 short arrays.
-snip-

How and where can you store all 2^40 keys?
sr. member
Activity: 462
Merit: 701
This problem is typically a time-memory trade-off problem, so the goal is to find the best time-memory trade-off.
I store only 40 bits for the generator index (so 2^40 keys max in my hash table) and 24 (+25) bits of the x value: 64 bits per item, plus an array of 2^25 pointers + 2 * 2^25 short arrays.
So I get a few false collisions (one every 2^(49-30) = 2^19 giant steps on average, since each giant step checks 2^30 baby keys against a 49-bit fingerprint) that I need to check, but I save a lot of memory and my giant step is much larger.
I'm able to browse the 64-bit range in 24 min with my 9-year-old Xeon (12 GB).
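For scale, a rough footprint under the sizes just described (my arithmetic, assuming 8-byte items, 8-byte pointers and 2-byte shorts):

Code:
#include <stdio.h>

int main(void) {
  unsigned long long items    = (1ULL << 30) * 8;     /* 2^30 64-bit items */
  unsigned long long pointers = (1ULL << 25) * 8;     /* 2^25 bucket pointers */
  unsigned long long shorts   = 2 * (1ULL << 25) * 2; /* 2 x 2^25 short arrays */
  printf("~%.1f GB\n", (items + pointers + shorts) / 1e9); /* ~9.0 GB */
  return 0;
}

That is consistent with needing a 12 GB machine, and with the 8 GB machine reported at the top of the page grinding to a halt.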

sr. member
Activity: 642
Merit: 316
I moved my hashtable to a hash size of 25 bits (still 2^30 points inside); I gain 20% in speed for a 500 MB memory overhead.

Edit:
Concerning GPUs, access to global memory is slower than local memory, but I'm surprised that it is so slow in your case.
For other programs I have implemented binary search in global memory (millions of items) with very good performance.
In that case a hashtable is better than a binary search.

It is clear that if you do not check all 32 bytes, but only 16 or 8 bytes, then the speed will be higher; that is natural.
Nevertheless, this does not justify such a sharp drop in the hashrate as the table grows. Indeed, a 2-fold increase in the table size adds only one additional lookup step.

Edit:
If I correctly understood the logic of your code: when the public key is calculated, you look up the first 3 bytes of the key in the hash table. If those 3 bytes are there, then you get the offset >> add the offset to the start private key >> compute the public key >> compare the public keys.
But 24 bits is very small - there are a lot of collisions.
sr. member
Activity: 462
Merit: 701
I moved my hashtable to a hash size of 25 bits (still 2^30 points inside); I gain 20% in speed for a 500 MB memory overhead.

Edit:
Concerning GPUs, access to global memory is slower than local memory, but I'm surprised that it is so slow in your case.
For other programs I have implemented binary search in global memory (millions of items) with very good performance.
In that case a hashtable is better than a binary search.