Pages:
Author

Topic: brute-forcing public keys at amazing speed 2.2 PH/s on CPU [malware warning] - page 3. (Read 3490 times)

member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
arulbero you giant-step code down then not enoth memory to ?

legendary
Activity: 1948
Merit: 2097

warning: #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance" [-Wcpp]
   #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance"


Yes, do you remenber this ?
We have fighting with a f.ck.ng bug when you have implemented the ModSqr function, i had to add a volatile to prevent gcc to perform optimization !
This volatile is not necessary for gcc > 7.3

Ok, I tried with "unsigned char c" (I removed the check):

from
2**(33.42) keys / (13*60+44)s = 13.947.463 = 14 MKeys/s  your program with volatile

to
2**(32.18) keys / (4*60+12.0)s = 19.308.330 = 19,3 MKeys/s with unsigned char

and the bug is still there, with unsigned char I get this error:

./kangaroo -d 8 input
Kangaroo v1.3
ParsePublicKeyHex: Error invalid public key specified (Not lie on elliptic curve)
input, error line 2: 0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED8 87AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
Start:49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5E0000000000000000
Stop :49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EFFFFFFFFFFFFFFFF

it's weird that the program doesn't stop, I think it should terminate in this case.
sr. member
Activity: 462
Merit: 701

warning: #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance" [-Wcpp]
   #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance"


Yes, do you remenber this ?
We have fighting with a f.ck.ng bug when you have implemented the ModSqr function, i had to add a volatile to prevent gcc to perform optimization !
This volatile is not necessary for gcc > 7.3
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
Quote from: arulbero
[/quote

Hello Arulbero. Can you modyfi your break-short  code for import pubkeys from .txt files ?

This very hard recompile and edit code for run tests.

Big thank you.

Have a nice time in this thread. Wink
legendary
Activity: 1948
Merit: 2097
I presume the main differences between your ModMul function (if it is still the same you give me) and mine are in:
imm_umul() where I use intrinsic functions and where you have used inline assembly.
On windows it is not possible to do inline assembly, gcc support intrinsic functions so it should generate a good code.
I really don't want to fight with inline assembly on Linux and this horrible AT&T syntax, i did few ones where no intrinsic exists, this was a nightmare...


Yes, you're right.
Besides when I compile I get this message:

warning: #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance" [-Wcpp]
   #warning "GCC lass than 7.3 detected, upgrade gcc to get best perfromance"

Maybe with a new version of gcc the performance of your function would be better on my system.



I replaced ModSquareK1(Int *a) too:

./kangaroo -d 8 input
Kangaroo v1.3
Start:49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5E0000000000000000
Stop :49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 8
Range width: 2^64
Number of random walk: 2^13.00 (Max DP=17)
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
[24.33 MKey/s][GPU 0.00 MKey/s][Count 2^33.42][Dead 2][08:56][3440.8MB]  
Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 09:15


from 16 to 24 MKeys/s, +50% ...




EDIT:

2**(33.42) keys / (13*60+44)s = 13.947.463 = 14 MKeys/s  your program

2**(32.47) keys / (5*60+26)s = 18.248.465 = 18,2 MKeys/s  ModMul replaced

2**(33.42) keys / (9*60+15)s = 20707585 = 20,7 MKeys/s  ModMul+ModSqr replaced

2**(33.24) keys / (7*60+58)s = 21223116 = 21,2 MKeys/s  ModMul+ModSqr+ModSub replaced


from 14 to 21,2 MKeys/s: +50%

sr. member
Activity: 462
Merit: 701
I presume the main differences between your ModMul function (if it is still the same you give me) and mine are in:
imm_umul() where I use intrinsic functions and where you have used inline assembly.
On windows it is not possible to do inline assembly, gcc support intrinsic functions so it should generate a good code.
I really don't want to fight with inline assembly on Linux and this horrible AT&T syntax, i did few ones where no intrinsic exists, this was a nightmare...
legendary
Activity: 1948
Merit: 2097
./kangaroo -d 8 input
Kangaroo v1.3
Start:49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5E0000000000000000
Stop :49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 8
Range width: 2^64
Number of random walk: 2^13.00 (Max DP=17)
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
[16.09 MKey/s][GPU 0.00 MKey/s][Count 2^33.42][Dead 3][13:26][3423.9MB]  
Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 13:44

with my function in IntMod.cpp:

./kangaroo -d 8 input
Kangaroo v1.3
Start:49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5E0000000000000000
Stop :49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 8
Range width: 2^64
Number of random walk: 2^13.00 (Max DP=17)
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
[21.08 MKey/s][GPU 0.00 MKey/s][Count 2^32.47][Dead 0][05:16][1774.8MB]  
Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 05:26
sr. member
Activity: 462
Merit: 701
Hi arulbero Smiley

The key you tested are
  ...7F2F1A1DDF3E9FF8 so near the worst case if you run with 8 thread.
or ...BB3EF3883C1866D4 idem

I didn't optimized this program up to the ninja level, I spend time on the Kangaroo one Wink
legendary
Activity: 1948
Merit: 2097
I published an exe file for Windows:
https://github.com/JeanLucPons/BSGS/releases/tag/1.0

Hi, I tested on my ubuntu 17.04 (and a mobile cpu: Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz) your BSGS program with a 64 bit key:

Code:
input file
40000000
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8

Results:

Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 22:49


Then I replaced only this function void void Int::ModMulK1(Int *a, Int *b) with a my function and I get this result:

Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 20:00


Time ago I did a my version of the break-short program:

time ./break-short
Build Hash
Search Keys
Error!!! False collision!
Error!!! False collision!
Build Hash
Search Keys
Error!!! False collision!
Found private key  1: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4

real   11m58,270s

A note: your program uses 9 GB RAM and 8 cores (for the giant steps), my version of break-short uses 8 GB RAM and only 1 core.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

When I use 18 GB RAM:

Code:
input file
80000000
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8

your program:

Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 18:19

your program with my function:

Key# 0 Pub:  0x0259A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC
       Priv: 0x49DCCFD96DC5DF56487436F5A1B18C4F5D34F65DDB48CB5EBB3EF3883C1866D4

Done: Total time 17:15


my version of break-short program (16 GB RAM):

time ./break-short
Build Hash
Search Keys
Error!!! False collision!
Found private key  1: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4

real   5m54,964s

If I could use parallel programming and exploit all my 8 cores, I think it would take probably less than 3 minutes to crack a 64 bit key only with a cpu.
sr. member
Activity: 462
Merit: 701
Your empirical tests actually showed the results very closed to benchmark for Kangaroo known algorithm.
Yes, the 2*sqrt(n) was for simple Kangaroo method.

The test i did was a simple calculation on a classical birthday paradox on 2 tables instead of one. I draw alternatively a random number in [0..N[ in each table and look for collision between the 2 tables. This converges to sqrt(PI.N). sqrt(PI/2.N) on a single table.

Now, I'm doing tests a in real situation, 1024 kangaroos, 10000 40 bit keys informally distributed, for the moment it seems to converge to 2.sqrt(N) + nbKangaroo.2^dp as expected, with dp=7 on 40bit search. In this situation (1024 << sqrt(N) - 2^dp), the approximation using the asymptote is good.

How do you tink, will it be faster?

Thanks for the reading.
No I don't think it will be faster

Example for Np=2, you have 2 ranges of w/2 so you perform a total of operation:

2.sqrt(w/2) + 2.sqrt(w/2) = 4/sqrt(2).sqrt(w) which is more than 2.sqrt(w)



sr. member
Activity: 443
Merit: 350
-snip-
It ends in sqrt(PI).sqrt(n) ~= 1.772.sqrt(n)

Note that the precision afer 1.000.000 trials is about 1/sqrt(1.000.000)=0.001 so 3 digits are expected to be good.
2^0.825836 = 1.77256  , sqrt(PI) = 1.77245

Your empirical tests actually showed the results very closed to benchmark for Kangaroo known algorithm.
Yes, the 2*sqrt(n) was for simple Kangaroo method.

This is a very nice reading as well: https://www.math.auckland.ac.nz/~sgal018/crypto-book/ch14.pdf

For three-kangaroo it is 1.818*sqrt(n) - Excercise 14.5.11 in reading above
For four-kkangaroo it is 1.714*sqrt(n) - Excercise  14.5.12 in reading above

You are using herds of kangaroos, so it is not simple one.

Can you please also have a look at the 14.6 section of the reading above? There is a Distributed Kanagroo Algorithm: let Np is the number of processors; [0, w] is the range. They divide the interval into Np sub-intervals of size w/Np and then run the kangaroo algorithm in parallel on each sub-interval.
The solution is to use a herd of Np/2 tame kangaroos and a herd of Np/2 wild kangaroos. This gives an algorithm with running time O(sqrt(w/Np)).

How do you tink, will it be faster?

sr. member
Activity: 462
Merit: 701
I did a test, wanting to know the average time of the birthday paradox when searching collision between 2 tables (like the kangaroo problem).

The kangaroo method is announced to be 2.sqrt(n) but this is for a simple 2 stages algorithm where:
 - you first travel a single tame kangaroo sqrt(n) steps to setup a trap
 - then you make steps with the wild until a collision occurs (it falls in the trap), this second stage is expected to end in sqrt(n) steps.
The factor 2 comes from that. There no need of a hashtable there.


I did a test on 1.000.000 searches (24bits) using a table:

[ 999999] (2^12.825836) (Theoretical 2^12.825748)

It ends in sqrt(PI).sqrt(n) ~= 1.772.sqrt(n)

Note that the precision afer 1.000.000 trials is about 1/sqrt(1.000.000)=0.001 so 3 digits are expected to be good.

2^0.825836 = 1.77256  , sqrt(PI) = 1.77245
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
Jean luc when do you add multi pub key search and how would it be work?


I would like first to finalize the calculation of estimated memory and operation (important to tune the dp) and the save/load/merge work.
Then I will implement multiple key. The basic idea for multiple keys is to start wild kangaroos with different keys.



Beeeeeegest thank you Jean_Luc.  I think your relise will be great again Smiley)) like all your previous relises.
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
OK but can you maybe add first multi pub key search and after that you can optimize all together?

What pubkey you interested ? Mining pubkey in very hard task I think, and very needed first.
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
OK but can you maybe add first multi pub key search and after that you can optimize all together?

Helli. You have multi GPU balid. You can work from one to next target fine. Other members in this thread have only one GPU !!! I think this thread for no brain-less people with single GPU bitcoin miners Wink Have a nice day.

P.s. Luc, please do not share code for multy key finder. In future year, after we are making all test s ;-)share sach code will be  fine I thinck.
full member
Activity: 431
Merit: 105
hi Jean_Luc.

was wondering if it was possible to have a verbose method available, to see a bit more details. about wich range or space it is at.

thanks a lot great software very fast. especially 1.3 Cool
jr. member
Activity: 91
Merit: 3
OK but can you maybe add first multi pub key search and after that you can optimize all together?
sr. member
Activity: 462
Merit: 701
Jean luc when do you add multi pub key search and how would it be work?

I would like first to finalize the calculation of estimated memory and operation (important to tune the dp) and the save/load/merge work.
Then I will implement multiple key. The basic idea for multiple keys is to start wild kangaroos with different keys.
jr. member
Activity: 91
Merit: 3
Jean luc when do you add multi pub key search and how would it be work?
sr. member
Activity: 462
Merit: 701
Yes you're right but I have no conversion routine from int256 to double.
And the precision should be enough for the floor function, it is important that the program runs with the good range as it is use for the jump table. I will try to improve things. Still fighting with the memory and operation estimation...
Pages:
Jump to: