Author

Topic: Pollard's kangaroo ECDLP solver - page 142. (Read 60189 times)

sr. member
Activity: 443
Merit: 350
May 04, 2020, 09:40:02 AM
#72
-snip-
I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R
Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt
It runs in stat mode.
If you want to make test.

Yes, I made a test with your last commit on CPU (16 threads) Intel Xeon 2Ghz: 1000 keys of 40bit range were found for 1:39 min Smiley So, this is 99 seconds for 1000 keys (10 keys for 1 second)

Code:
Kangaroo v1.4gamma
Start:62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0000000000
Stop :62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AFFFFFFFFFF
Keys :1000
Number of CPU thread: 16
Range width: 2^40
Jump Avg distance: 2^18.99
Number of kangaroos: 2^14.00
Suggested DP: 5
Expected operations: 2^21.00
Expected RAM: 6.0MB
DP size: 5 [0xf800000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos

Key# 0 [1S]Pub:  0x022FF363EBC536B7CF20DECD41F0B86B90992937B4E38D9F616C7F980AACD9C922
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31CED094F0
[  0] 2^20.958 Dead:8 Avg:2^20.958 (2^21.000)

Key# 1 [1S]Pub:  0x0290E532FA5DABF668B4EC1F7FC4727AFEB1A4AE3FB3009E74045797B7C55C12A7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31D2F397AF
[  1] 2^21.390 Dead:12 Avg:2^21.190 (2^21.000)

Key# 2 [1N]Pub:  0x02032B8F3648BBC1D1666CAC1D0100462913C72E8736DE615C807CC13B78A63FFA
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A00A71C8425
[  2] 2^20.854 Dead:6 Avg:2^21.087 (2^21.000)

Key# 3 [1N]Pub:  0x03FB6688436949D86D446A2CEF4A66958D99490064F0BF9D7F677F83EC867F9B46
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A73420E46DF
[  3] 2^21.579 Dead:29 Avg:2^21.226 (2^21.000)

Key# 4 [1N]Pub:  0x02CBCACB58AF8BFD517BDAE6363C3E050E988E46A222F79BAEE88A512D6B885F36
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ACEF20EBCFD
[  4] 2^20.672 Dead:9 Avg:2^21.131 (2^21.000)

.............................................................
deleted due to message size restricrtions
.............................................................

Key#994 [1S]Pub:  0x038926680E04C0622E1DC996AEC5B7AF1CCFAB9186BC801D6563C67A9E6CD836B3
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AF7B4AE2C1F
[994] 2^21.280 Dead:12 Avg:2^20.792 (2^21.000)

Key#995 [1N]Pub:  0x0260FA97C34F238CB14858BF439CBCE84C8BC0FAFA7B9F6B693A5ADF7E8ADB8648
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD8100D4FFE
[995] 2^20.110 Dead:2 Avg:2^20.792 (2^21.000)

Key#996 [1N]Pub:  0x03E7E47031870D9E1E028E7996B2FB14E494E659399F0C21698C9410451E85E97B
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ADC798EE740
[996] 2^21.098 Dead:12 Avg:2^20.792 (2^21.000)

Key#997 [1N]Pub:  0x025F07F6D4F63D6FEB1857B3E82080C1EF18DDCB535C8698215B2B4CCA0A2E2302
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD23E45CBA9
[997] 2^20.731 Dead:11 Avg:2^20.792 (2^21.000)

Key#998 [1S]Pub:  0x039F0D9D4A0ECEB0EA57BB3183A4C691B2D1568C819386777F8DB1E91763607BD7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AB7B2FEFC39
[998] 2^20.949 Dead:7 Avg:2^20.792 (2^21.000)

Key#999 [1S]Pub:  0x034D3E8FB5B7D9FA27B207BB9A7EE0D43D30021682965BC3D495DFEC7C40D1ABB6
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0074C8377A
[999] 2^21.273 Dead:21 Avg:2^20.793 (2^21.000)

Done: Total time 01:39

Full test results are here (valid for 180 days only): https://pastebin.com/JFkpvhAA

Maybe this will be helpful for you, Jean_Luc
sr. member
Activity: 462
Merit: 701
May 04, 2020, 08:41:39 AM
#71
Yes, these estimations are trickly to calculate, i will try improve things...

Impressive result with the 8 RTX 2080 Ti Cheesy
member
Activity: 144
Merit: 10
May 04, 2020, 08:37:46 AM
#70
@Jean_Luc

On the GPU side of things, I get different estimations for "Suggested DP", "Expected operations", and "Expected RAM" for the same problem/interval size on machines with dissimilar hardware configurations, please see below. It may be also helpful to add standard estimations based on the interval size alone. For example, on one my of servers configured with 8x RTX 2080 Ti while using the program defaults, the performance in terms iterations per second quickly degrades from ~10G to less than ~1G.

Thanks

8x RTX 2080 Tis with PL=240W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0,1,2,3,4,5,6,7 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^41.99
Number of kangaroos: 2^24.09
Suggested DP: 17
Expected operations: 2^44.36
Expected RAM: 1646.6MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos in 19929.3ms
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos in 20113.4ms
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos in 20136.4ms
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos in 20160.7ms
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos in 20179.5ms
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 20247.4ms
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos in 22114.7ms
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos in 22203.1ms
[9802.06 MK/s][GPU 9802.06 MK/s][Count 2^44.27][Dead 2][41:14 (Avg 38:27)][1550.0MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 41:48

1x RTX 2070 with PL=120W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^42.03
Number of kangaroos: 2^20.17
Suggested DP: 21
Expected operations: 2^43.65
Expected RAM: 1003.7MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 9119.2ms
[768.80 MK/s][GPU 768.80 MK/s][Count 2^43.99][Dead 3][07:13:01 (Avg 04:58:44)][1278.3MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 07:13:16


sr. member
Activity: 462
Merit: 701
May 04, 2020, 08:17:47 AM
#69
Thanks for the benchmark Wink

I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R

Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt

It runs in stat mode.
If you want to make test.
sr. member
Activity: 443
Merit: 350
May 04, 2020, 07:56:55 AM
#68
This is a benchmark post with speeds for Kangaroo GPU Solver. All the tests were made with default DP and default grid size (calculated by a program). I guess that some plays with DP and grid size could change (increase or decrease) the speed. If somebody knows the optimal values, please let us know.

Code:
 Card Model           Grid size      DP        Tested speed
---------------------------------------------------------------
GTX 1050 Ti          Grid(12x256)   DP 16     115 MKey/sec
GTX 1080 Ti          Grid(56x256)   DP 15     500 MKey/sec
Tesla T4 16Gb        Grid(80x128)   DP 14     565 MKey/sec
RTX 2080ti 11Gb      Grid(136x128)  DP 13     1225 MKey/sec
Tesla V100 32Gb      Grid(160x128)  DP 13     1420 MKey/sec
---------------------------------------------------------------

It looks like the Kangaroo solver works fine with GeForce family cards but not with Quadro/NVS family. For example Tesla V100 is 4-5 times more expensive than RTX 2080ti, but has the speed only 15% higher.

And we also can learn from this table that the most efficient card for kangaroo solver is RTX 2080ti 11Gb which has the best performance in terms of speed/cost.

PS. Tests were made for 10 keys with 2^72 range width

EDIT: some spelling mistakes were corrected
sr. member
Activity: 443
Merit: 350
May 04, 2020, 07:37:00 AM
#67
While the code is being optimized with the random walks, I have some more tests with 72bit key in order to know the speed rate per second for different GPU devices.

Tesla T4 16Gb has 565MKey/sec
RTX 2080ti 11Gb has 1225MKey/sec

Tesla T4 seems to be close to 1080ti results, but 2080ti is close to Tesla V100.

Tesla T4
Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :10
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.05
Number of kangaroos: 2^20.32
Suggested DP: 14
Expected operations: 2^37.15
Expected RAM: 710.0MB
DP size: 14 [0xfffc000000000000]
GPU: GPU #0 Tesla T4 (40x64 cores) Grid(80x128) (129.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.32 kangaroos in 7606.8ms
[566.64 MK/s][GPU 566.64 MK/s][Count 2^37.26][Dead 1][05:32 (Avg 04:28)][772.3MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[566.40 MK/s][GPU 566.40 MK/s][Count 2^36.95][Dead 0][04:38 (Avg 04:28)][621.6MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[563.86 MK/s][GPU 563.86 MK/s][Count 2^36.44][Dead 0][03:18 (Avg 04:30)][439.2MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[566.37 MK/s][GPU 566.37 MK/s][Count 2^38.16][Dead 3][10:27 (Avg 04:28)][1431.0MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[567.90 MK/s][GPU 567.90 MK/s][Count 2^37.10][Dead 2][05:06 (Avg 04:28)][693.4MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[567.67 MK/s][GPU 567.67 MK/s][Count 2^37.89][Dead 5][08:45 (Avg 04:28)][1194.9MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[568.12 MK/s][GPU 568.12 MK/s][Count 2^37.69][Dead 1][07:34 (Avg 04:27)][1036.6MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[568.25 MK/s][GPU 568.25 MK/s][Count 2^37.42][Dead 0][06:18 (Avg 04:27)][860.3MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[566.46 MK/s][GPU 566.46 MK/s][Count 2^37.77][Dead 1][08:01 (Avg 04:28)][1096.9MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[565.15 MK/s][GPU 565.15 MK/s][Count 2^36.78][Dead 0][04:06 (Avg 04:29)][554.3MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05

Done: Total time 01:04:59

RTX 2080ti
Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :10
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.00
Number of kangaroos: 2^21.09
Suggested DP: 13
Expected operations: 2^37.15
Expected RAM: 1419.0MB
DP size: 13 [0xfff8000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 15220.1ms
[1229.83 MK/s][GPU 1229.83 MK/s][Count 2^37.86][Dead 4][03:52 (Avg 02:03)][2330.7MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[1229.29 MK/s][GPU 1229.29 MK/s][Count 2^37.30][Dead 1][02:52 (Avg 02:03)][1578.0MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[1224.87 MK/s][GPU 1224.87 MK/s][Count 2^37.83][Dead 3][04:02 (Avg 02:04)][2274.8MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[1219.55 MK/s][GPU 1219.55 MK/s][Count 2^36.94][Dead 1][02:18 (Avg 02:04)][1236.3MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[1222.71 MK/s][GPU 1222.71 MK/s][Count 2^36.18][Dead 0][01:28 (Avg 02:04)][732.0MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[1221.42 MK/s][GPU 1221.42 MK/s][Count 2^36.97][Dead 0][02:20 (Avg 02:04)][1255.1MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[1219.38 MK/s][GPU 1219.38 MK/s][Count 2^37.61][Dead 1][03:30 (Avg 02:04)][1953.7MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[1227.23 MK/s][GPU 1227.23 MK/s][Count 2^37.14][Dead 0][02:36 (Avg 02:04)][1414.3MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[1224.22 MK/s][GPU 1224.22 MK/s][Count 2^37.10][Dead 0][02:32 (Avg 02:04)][1373.2MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[1229.32 MK/s][GPU 1229.32 MK/s][Count 2^35.49][Dead 0][01:00 (Avg 02:03)][453.5MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05

Done: Total time 28:49
sr. member
Activity: 462
Merit: 701
May 04, 2020, 07:28:52 AM
#66
I use a random jump table with a controlled average.
I store as before, no inforamtion on symetry is needed, the key is solved later.


Code:
void Kangaroo::CreateJumpTable() {

  int jumpBit = rangePower / 2;
  if(jumpBit > 128) jumpBit = 128;
  int maxRetry = 100;
  bool ok = false;
  double maxAvg = pow(2.0,(double)jumpBit - 0.95);
  double minAvg = pow(2.0,(double)jumpBit - 1.05);
  double distAvg;
  //::printf("Jump Avg distance min: 2^%.2f\n",log2(minAvg));
  //::printf("Jump Avg distance max: 2^%.2f\n",log2(maxAvg));

  // Kangaroo jumps

  // Positive only
  while(!ok && maxRetry>0 ) {
    Int totalDist;
    totalDist.SetInt32(0);
    for(int i = 0; i < NB_JUMP; ++i) {
      jumpDistance[i].Rand(jumpBit);
      if(jumpDistance[i].IsZero())
        jumpDistance[i].SetInt32(1);
      totalDist.Add(&jumpDistance[i]);
    }
    distAvg = totalDist.ToDouble() / (double)(NB_JUMP);
    ok = distAvg>minAvg && distAvg    maxRetry--;
  }

  for(int i = 0; i < NB_JUMP; ++i) {
    Point J = secp->ComputePublicKey(&jumpDistance[i]);
    jumpPointx[i].Set(&J.x);
    jumpPointy[i].Set(&J.y);
  }

  if(!ok) {
    ::printf("Warning, jump Avg distance out of bounds: 2^%.2f (restart the program)\n",log2(distAvg));
  } else {
    ::printf("Jump Avg distance: 2^%.2f\n",log2(distAvg));
  }

}

Here is the code of the random walk:

Code:
void Kangaroo::SolveKeyCPU(TH_PARAM *ph) {

  // Global init
  int thId = ph->threadId;
  counters[thId] = 0;

  // Create Kangaroos
  KANGAROO *herd = new KANGAROO[CPU_GRP_SIZE];

  IntGroup *grp = new IntGroup(CPU_GRP_SIZE);
  Int *dx = new Int[CPU_GRP_SIZE];

  if(keyIdx==0) {
    ::printf("SolveKeyCPU Thread %d: %d kangaroos\n",ph->threadId,CPU_GRP_SIZE);
  }

  ph->hasStarted = true;

  // Using Affine coord
  Int dy;
  Int rx;
  Int ry;
  Int _s;
  Int _p;

  uint64_t minW = 0;
  uint64_t maxW = 0;
  uint64_t minT = 0;
  uint64_t maxT = 0;

  while(!endOfSearch) {
  for(int j = 0; j    Create(&herd[j],j%2);

  uint64_t cnt = 0;
  while(!endOfSearch && cnt
    for(int g = 0; g < CPU_GRP_SIZE; g++) {

      uint64_t jmp = herd[g].pos.x.bits64[0] % NB_JUMP;
      Int *p1x = &jumpPointx[jmp];
      Int *p2x = &herd[g].pos.x;
      dx[g].ModSub(p2x,p1x);

    }
    grp->Set(dx);
    grp->ModInv();

    for(int g = 0; g < CPU_GRP_SIZE; g++) {

      uint64_t jmp = herd[g].pos.x.bits64[0] % NB_JUMP;
      Int *p1x = &jumpPointx[jmp];
      Int *p1y = &jumpPointy[jmp];
      Int *p2x = &herd[g].pos.x;
      Int *p2y = &herd[g].pos.y;

      dy.ModSub(p2y,p1y);
      _s.ModMulK1(&dy,&dx[g]);
      _p.ModSquareK1(&_s);

      rx.ModSub(&_p,p1x);
      rx.ModSub(p2x);

      ry.ModSub(p2x,&rx);
      ry.ModMulK1(&_s);
      ry.ModSub(p2y);

      herd[g].distance.ModAddK1order(&jumpDistance[jmp]);

      // Equivalence symmetry class switch
      if(ry.ModPositiveK1())
        herd[g].distance.ModNegK1order();

      herd[g].pos.x.Set(&rx);
      herd[g].pos.y.Set(&ry);

    }

    for(int g = 0; g < CPU_GRP_SIZE && !endOfSearch; g++) {

      if(IsDP(herd[g].pos.x.bits64[3])) {
        LOCK(ghMutex);
        if(!endOfSearch) {

          if( !AddToTable(&herd[g].pos.x,&herd[g].distance,herd[g].type) ) {
            // Collision inside the same herd
            // We need to reset the kangaroo
            Create(&herd[g],herd[g].type,false);
            collisionInSameHerd++;
          }

        }
        UNLOCK(ghMutex);
      }

      if(!endOfSearch) counters[thId] ++;
      cnt++;

    }

  }
  }

  // Free
  delete grp;
  delete dx;
  delete[] herd;

  ph->isRunning = false;

}


Edit: And kangaroo creation:

Code:
void Kangaroo::Create(KANGAROO *k,int type,bool lock) {

  k->type = type;

  if(lock) LOCK(ghMutex);
  k->distance.Rand(rangePower - 1);
  if(lock) UNLOCK(ghMutex);

  if( type == TAME ) {

    // Tame in [0..N/2]
    k->pos = secp->ComputePublicKey(&k->distance);

  } else {

    // Wild in [-N/4..N/4]
    k->distance.ModSubK1order(&rangeWidthDiv4);
    Point O = secp->ComputePublicKey(&k->distance);
    k->pos = secp->AddDirect(keyToSearch,O);

  }

  // Equivalence symmetry class switch
  if(k->pos.y.ModPositiveK1())
    k->distance.ModNegK1order();

}
legendary
Activity: 1948
Merit: 2097
May 04, 2020, 06:59:38 AM
#65
There are only positive jumps in this table.
Loop are still possible because of the equivalence class switch "(xP,min{yP,q−yP})".

Only positive? Why? You mean you store only +k*G but you do -k*G too?

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.

I will try with python to do some tests. Could you provide me only the array of jumps you used?

like: [+-1,+-2,+-4,+-8,+-16].  

Given m and DP I want to see if we can set up a suitable array of jumps such that we minimize the probability of cycle.

You have to change your jumps. But choose them in a "intelligent" way, I think it is not too difficult.
'Intelligent' means that a man can't do this, let the probability do it. I think that the main problem is that you want to choose the jumps by yourself. Why?

You should study separately a sequence of jumps such that the probability to come back before 2^DP steps is low. This study depends only from jumps, DP and m (the average jump), not from points/coordinates/N!

For example, with these jumps [+-1,+-2,+-4,+-8,+-16]  you try to generate many sequences (starting from 0). If the number of sequences with lenght < 2^DP (or < k*2^DP) is too high, change randomly the jumps and retry. Lenght of sequence: number of steps before to get back to 0. Deal simply with integer numbers.

This setting phase should be done automatically at the start of the program. In this way for each DP you will have configured the jumps in order to get a lower probability of generating loops.
sr. member
Activity: 462
Merit: 701
May 04, 2020, 06:31:10 AM
#64
Hi,

Many thanks MrFreeDragon the tests Wink
Many thanks to arulbero for helping me solving the issue Smiley

I found the bugs, these was 2 !
The jump table size, I already checked that but the combination of the 2 bugs make this hard to find.
The split of equivalence class in the random walk.

There are only positive jumps in this table.
Loop are still possible because of the equivalence class switch "(xP,min{yP,q−yP})".

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.

The gain I get by spreading to [-N/8,N/8] was due to the fact that it optimize the overlap between Tame and Wild.
If we can compress the Wild to a very small range, we can reach 1.25 sqrt(N) but this a limit and it has dramatic effects on the border and on wild collisions.
 
Using dp 5 (500 trials, Wild in [-N/4 N/4])
[500] 2^17.581 Dead:0 Avg:2^20.608 (2^20.556)  => 1.52sqrt(N)

Using dp 5 (500 trials, Wild in [-N/8 N/8])
[500] 2^21.693 Dead:103 Avg:2^20.480 (2^20.556) => 1.39sqrt(N)


I will now code the GPU and I publish it ASAP Wink
sr. member
Activity: 443
Merit: 350
May 03, 2020, 10:11:21 PM
#63
This is just a test of speed on Tesla V100 32Gb.

For speed tests were used 72bit keys (available together with 14beta release)
The GPU speed is 1415-1425 Mkey/sec with v1.4beta. The 72bit keys is found for 2-3min per one key.

Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :20
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.04
Number of kangaroos: 2^21.32
Suggested DP: 13
Expected operations: 2^37.15
Expected RAM: 1419.0MB
DP size: 13 [0xfff8000000000000]
GPU: GPU #0 Tesla V100-PCIE-32GB (80x64 cores) Grid(160x128) (249.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.32 kangaroos in 15264.7ms
[1424.23 MK/s][GPU 1424.23 MK/s][Count 2^37.12][Dead 0][02:00 (Avg 01:46)][1399.1MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[1421.86 MK/s][GPU 1421.86 MK/s][Count 2^37.40][Dead 3][02:40 (Avg 01:47)][1688.6MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[1424.46 MK/s][GPU 1424.46 MK/s][Count 2^37.33][Dead 1][02:34 (Avg 01:46)][1619.6MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[1415.73 MK/s][GPU 1415.73 MK/s][Count 2^36.93][Dead 1][02:00 (Avg 01:47)][1224.5MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[1416.26 MK/s][GPU 1416.26 MK/s][Count 2^37.58][Dead 2][03:00 (Avg 01:47)][1916.6MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[1419.49 MK/s][GPU 1419.49 MK/s][Count 2^37.06][Dead 0][02:10 (Avg 01:47)][1340.6MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[1420.57 MK/s][GPU 1420.57 MK/s][Count 2^37.53][Dead 6][02:54 (Avg 01:47)][1848.9MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[1419.72 MK/s][GPU 1419.72 MK/s][Count 2^37.03][Dead 2][02:08 (Avg 01:47)][1316.6MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[1420.12 MK/s][GPU 1420.12 MK/s][Count 2^37.10][Dead 1][02:14 (Avg 01:47)][1381.6MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[1422.92 MK/s][GPU 1422.92 MK/s][Count 2^38.40][Dead 2][05:06 (Avg 01:47)][3383.7MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05
legendary
Activity: 1948
Merit: 2097
May 03, 2020, 11:39:36 AM
#62
The number field sieve is applicable to finite fields and to elliptic curves that admit embeddings into relatively small finite fields. Such elliptic curves are called pairing-friendly, and are not normally used for ordinary ECDH key agreement or ECDSA or EdDSA signatures like Bitcoin—secp256k1, for instance, ...

It is not allowed to copy something without quoting the source:

https://crypto.stackexchange.com/questions/52655/what-are-the-fastest-attacks-on-ecdlp


legendary
Activity: 1948
Merit: 2097
May 03, 2020, 10:24:11 AM
#61
It works with dp=0 because random walk are not really important, they just generate random sequence, and you don't care about the path, so it correctly solve the key (the distance is also stored in the hastable) but with dp>0 paths are of course important.
You're right there is probably a bug with my storage of symmetric point.

I have explained the bug,
the bug is not in the storage, is in the way you generate the next step of each sequence:

only symmetric jumps (+-G, +-2G, +-4G, +- 17G) lead a tame and a wild from 2 symmetric points (first collision) to other 2 symmetric points until they meet the 'same' 2 DPs (other 2 symmetric points).

If we call 2 symmetric point (a collision) T and W=-T , and DP > 0, the next step produces 2 different points:

T,   W=-T                                                                ->        T1,  W1        ->   Tk = DP,    Wk != Tk   Wk is not a DP!!

(x,y),  (x,-y)                                                           -> (x1,y1),  (x1',y1') ->    (xk, yk),  (xk',yk')

same x -> same jump = +s*G -> different points                x1 != x1'                    xk !=xk'

if T and W are 2 symmetric points, T1=T+s*G and W1=-T+s*G are not!!!

because you use +s*G for T and W, even if they have 2 opposite y-coordinates!

You have to use +s*G for T and -s*G for W (or viceversa)!!

In this way:

T,    W=-T                                                                       ->     T1=T+s*G , W1=-T-s*G    and T1 = -W1 !
(x,y),  (x,-y)                                                                    ->          (x1,y1)  ,     (x1,-y1)
same x -> opposite jumps = +-s*G -> symmetric points                        x1 = x1  
sr. member
Activity: 462
Merit: 701
May 03, 2020, 09:34:52 AM
#60
It works with dp=0 because random walk are not really important, they just generate random sequence, and you don't care about the path, so it correctly solve the key (the distance is also stored in the hastable) but with dp>0 paths are of course important.
You're right there is probably a bug with my storage of symmetric point.

legendary
Activity: 1948
Merit: 2097
May 03, 2020, 06:58:52 AM
#59
You still exploit the symmetry. Instead of searching between [0,N] you search between [0,N/2] and there is few calculation to make when a collision on a tame.x and wild.x is reached in order to get the good key. I did test and it works as expected.
I think the trick may be that after sqrt(N) total steps the kangaroos starts to overlap each other, still investigating...


No at all. You are simply searching in the second half of the interval, and you concentrate all the tame there.


In fact i have removed the negative jump.
 ... you search a priv key between [0,N/2].  I spread my tame between [0,N/2.G] and the wild between [P-N/4.G,P+N/4.G] and it converges to expected average of 1.47sqrt(N) using dp=0.

You would have the same exact result (with DP > 0) of this:

given [a,b],  translate to [0,N] and search between [N/2, N]

tame start from [N/2,N]
wild from [(P-N/2) - N/4, (P-N/2) + N/4] = [P - 3N/4, P - N/4]

you don't need the symmetry to do this. You are using the old algorithm with different start ranges.

Instead of searching between [0,N] you search between [0,N/2] and there is few calculation to make when a collision on a tame.x and wild.x is reached in order to get the good key.

with dp=0 there is not much bad collisions, no more than expected.
Strangely when using dp > 0, lots of dead kangaroos appear and probably cycle, even with a small number of kangaroo. I have to understand why may be bug somewhere...

Bug found!

if tame.x = wild.x how can you detect the collision? they move from (-Q , Q) to (-Q+k*G , +Q+k*G)
same x, same jump, different paths.
If you don't use negative jumps (+kG and -kG), this is not possible.

This explain pefectly why when you set DP > 0 you pass from 1.47.sqrt(N) to 2.sqrt(N), because:

    -Q     !=     Q
 tame.x  = wild.x  could be a collision, but at next step:

    -Q     !=     Q            -->       -Q+k*G   !=  Q+k*G     you have now 2 separate walks
 tame.x  = wild.x                         tame.x  !=   wild.x

then tame and wild don't reach a common DP, unless you turn -Q and Q into distinguished points setting DP=0. With DP>0  tame.x = wild.x (-Q  != Q) becomes a collision not detectable.
sr. member
Activity: 462
Merit: 701
May 03, 2020, 06:21:28 AM
#58
You still exploit the symmetry. Instead of searching between [0,N] you search between [0,N/2] and there is few calculation to make when a collision on a tame.x and wild.x is reached in order to get the good key. I did test and it works as expected.
I think the trick may be that after sqrt(N) total steps the kangaroos starts to overlap each other, still investigating...
legendary
Activity: 1948
Merit: 2097
May 03, 2020, 06:10:44 AM
#57
In fact i have removed the negative jump.
There is no need to do complex thing in order to take benefit of symmetry, the main trick is just to translate the public key by -N/2.G and you search a priv key between [0,N/2]. The symmetry is 100% free.

If you have removed the negative jumps you are not exploiting the symmetry! If a tame and a wild meet 2 symmetric points (same coordinate-x), they don't collide!

As I have a large number of kangaroo, each kangaroo does not move a lot. The average distance for sqrt(N/2) steps in (N/2)/numberOfKangaroo.

with dp=0 there is not much bad collisions, no more than expected.
Strangely when using dp > 0, lots of dead kangaroos appear and probably cycle, even with a small number of kangaroo. I have to understand why may be bug somewhere...

Loops are not possible if you go always in one direction.
sr. member
Activity: 462
Merit: 701
May 03, 2020, 05:44:00 AM
#56
In fact i have removed the negative jump.
There is no need to do complex thing in order to take benefit of symmetry, the main trick is just to translate the public key by -N/2.G and you search a priv key between [0,N/2]. The symmetry is 100% free.
Si I spread my tame between [0,N/2.G] and the wild between [P-N/4.G,P+N/4.G] and it converges to expected average of 1.47sqrt(N) using dp=0.
As I have a large number of kangaroo, each kangaroo does not move a lot. The average distance for sqrt(N/2) steps in (N/2)/numberOfKangaroo.

with dp=0 there is not much bad collisions, no more than expected.
Strangely when using dp > 0, lots of dead kangaroos appear and probably cycle, even with a small number of kangaroo. I have to understand why may be bug somewhere...
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
May 03, 2020, 04:35:00 AM
#55
@Luc

Can you realise in your code modificarion of ranges ?


From:

Start-dfggsdgdsfh1111111
End- FFFFFABC112222233

To:
O

ABC888888

Etc...

Huh?



Biiigesr thank you.



@Luc, can you realiase this in code ?

Pliiiise Huh

legendary
Activity: 1948
Merit: 2097
May 03, 2020, 03:17:31 AM
#54
I must say I'm a bit lost, they announced an algorithm in 1.36√N and they measured 1.47√N. They say that the complexity is actually (1.36 + epsilon)√N and they estimate this epsilon to ~0.1 without being clear on it.
https://www.iacr.org/archive/pkc2010/60560372/60560372.pdf

If we compute the complexity of a symmetric set, we have well : (2(2−√2)PI)/√2 √N = 1.468√N !

So i would rather say that the complexity is (1.47+epsilon)√N with an epsilon which depend on number of kangaroo, dp and N as for the standard version.

Very interesting.

I'm still fighting with collision, when using dp > 0, it seems that the symmetry generates a lot of wrong collisions.

From your tests the algorithm converges in 1.47.sqrt(N) with dp=0:

I did a test using dp 0 (much slower):
The algorithm runs in 2^20.554 = ~1.47 sqrt(N) (1000 trials) as announced on the paper so it seems ok

but 2.sqrt(N) with dp=3:

i did a test with dp 3 and it also converge to 2^21 (same algorithm, no change on the range in function of dp).
This is tricky, there is something I need to understand with those random walks...

Assuming you used the same jumps, there is only one explanation: the algorithm produces a useful collision in 1.47*sqrt(n) steps but you take a long time to detect it because the walks are too short, they collapse in a loop before to reach a DP. It is not a problem of bad collisions between different tames or between different wilds but "between" the same tame / the same wild (loops). Then a wild and a tame meet frequently (first collision after 1.47.sqrt(n)) but they die immediately after. This issue can be fixed (in my opinion), because these bad collisions (I call them "bad collisions 2" = collision with itself = loop) are distinguishable from the others and you can lower them without increasing the other ones.

With DP = 3 means that the lenght of many walks are less than 8!!! It takes +36% (1.47 -> 2) to produce a collision followed by a DP!!!

You have to change your jumps. But choose them in a "intelligent" way, I think it is not too difficult.
'Intelligent' means that a man can't do this, let the probability do it. I think that the main problem is that you want to choose the jumps by yourself. Why?

You should study separately a sequence of jumps such that the probability to come back before 2^DP steps is low. This study depends only from jumps, DP and m (the average jump), not from points/coordinates/N!

For example, with these jumps [+-1,+-2,+-4,+-8,+-16]  you try to generate many sequences (starting from 0). If the number of sequences with lenght < 2^DP (or < k*2^DP) is too high, change randomly the jumps and retry. Lenght of sequence: number of steps before to get back to 0. Deal simply with integer numbers.

This setting phase should be done automatically at the start of the program. In this way for each DP you will have configured the jumps in order to get a lower probability of generating loops.
sr. member
Activity: 462
Merit: 701
May 03, 2020, 12:25:25 AM
#53
Hi,

I must say I'm a bit lost, they announced an algorithm in 1.36√N and they measured 1.47√N. They say that the complexity is actually (1.36 + epsilon)√N and they estimate this epsilon to ~0.1 without being clear on it.
https://www.iacr.org/archive/pkc2010/60560372/60560372.pdf

If we compute the complexity of a symmetric set, we have well : (2(2−√2)PI)/√2 √N = 1.468√N !

So i would rather say that the complexity is (1.47+epsilon)√N with an epsilon which depend on number of kangaroo, dp and N as for the standard version.

I'm still fighting with collision, when using dp > 0, it seems that the symmetry generates a lot of wrong collisions.

Jump to: