Author

Topic: Pollard's kangaroo ECDLP solver - page 141. (Read 59389 times)

sr. member
Activity: 462
Merit: 701
May 05, 2020, 08:41:58 AM
#87
OK, i you want to test large range and symmetry, put 512 in the NB_JUMP...

I'm currently working on the save work options:
Code:
-w workfile: Specify file to save work into
 -i workfile: Specify file to load work from
 -wi workInterval: Periodic interval (in seconds) for saving work

Concerning optimization of the complexity, I will investigate on the Wild compression and Tame/Wild ratio which seems more promising to me...
member
Activity: 144
Merit: 10
May 05, 2020, 08:33:57 AM
#86
Thanks for the tests Wink
Stop your program, it will never ends if you have let the NB_JUMP to 32 and even with 512 I don't know if it will end Wink
The second test with CPU is not displaying: "Kangaroo v1.4gamma (with symmetry)" ?
Did you git pull between the correction I made concerning this (commit at 13:22 UTC+2) ?


No, it was from yesterday’s commit, now running CPU tests with latest commit.
sr. member
Activity: 462
Merit: 701
May 05, 2020, 08:20:57 AM
#85
Thanks for the tests Wink
Stop your program, it will never ends if you have let the NB_JUMP to 32 and even with 512 I don't know if it will end Wink
The second test with CPU is not displaying: "Kangaroo v1.4gamma (with symmetry)" ?
Did you git pull between the correction I made concerning this (commit at 13:22 UTC+2) ?

member
Activity: 144
Merit: 10
May 05, 2020, 07:59:20 AM
#84
@Jean_Luc

Below are the CPU/GPU results for an interval 70-bit in size with and without the use of symmetry. So far, the use of symmetry is not looking too good for intervals greater than ~45-bit in size.

AMD Ryzen 7 1800X CPU without Symmetry:
Code:
 ./kangaroo -t 16 -d 16 in70.txt
Kangaroo v1.4beta
Start:1FFFFFFFFFFFFFFFFF
Stop :3FFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^70
Jump Avg distance: 2^35.02
Number of kangaroos: 2^14.00
Suggested DP: 20
Expected operations: 2^36.15
Expected RAM: 89.6MB
DP size: 16 [0xffff000000000000]
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
[53.01 MK/s][GPU 0.00 MK/s][Count 2^36.10][Dead 0][26:31 (Avg 23:55)][91.4MB]
Key# 0 Pub:  0x0290E6900A58D33393BC1097B5AED31F2E4E7CBD3E5466AF958665BC0121248483
       Priv: 0x349B84B6431A6C4EF1

Done: Total time 26:32

AMD Ryzen 7 1800X CPU with Symmetry:
Code:
./kangaroo -t 16 -d 16 in70.txt
Kangaroo v1.4gamma
Start:1FFFFFFFFFFFFFFFFF
Stop :3FFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^70
Jump Avg distance: 2^34.00
Number of kangaroos: 2^14.00
Suggested DP: 20
Expected operations: 2^35.55
Expected RAM: 59.7MB
DP size: 16 [0xffff000000000000]
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
[35.34 MK/s][GPU 0.00 MK/s][Count 2^39.22][Dead 416][05:39:09 (Avg 23:47)][99.6MB]
Key# 0 [1S]Pub:  0x0290E6900A58D33393BC1097B5AED31F2E4E7CBD3E5466AF958665BC0121248483
       Priv: 0x349B84B6431A6C4EF1
[  0] 2^39.219 Dead:416 Avg:2^39.219 (2^35.554)

Done: Total time 05:39:10

Nvidia RTX 2070 GPU without Symmetry:
Code:
./kangaroo -t 0 -d 14 -gpu -gpuId 0 in70.txt
Kangaroo v1.4gamma
Start:1FFFFFFFFFFFFFFFFF
Stop :3FFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^70
Jump Avg distance: 2^33.97
Number of kangaroos: 2^20.17
Suggested DP: 14
Expected operations: 2^36.40
Expected RAM: 423.2MB
DP size: 14 [0xfffc000000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 7948.8ms
[857.53 MK/s][GPU 857.53 MK/s][Count 2^35.58][Dead 1][01:08 (Avg 01:45)][243.9MB]
Key# 0 [1S]Pub:  0x0290E6900A58D33393BC1097B5AED31F2E4E7CBD3E5466AF958665BC0121248483
       Priv: 0x349B84B6431A6C4EF1
[  0] 2^35.606 Dead:1 Avg:2^35.606 DeadAvg:1.0 (2^36.400)

Done: Total time 01:18

Nvidia RTX 2070 GPU with Symmetry still running:
Code:
./kangaroo -t 0 -d 14 -gpu -gpuId 0 in70.txt
Kangaroo v1.4gamma (with symmetry)
Start:1FFFFFFFFFFFFFFFFF
Stop :3FFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^70
Jump Avg distance: 2^34.98
Number of kangaroos: 2^20.17
Suggested DP: 14
Expected operations: 2^36.06
Expected RAM: 333.8MB
DP size: 14 [0xfffc000000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (153.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 8044.0ms
[826.63 MK/s][GPU 826.63 MK/s][Count 2^39.99][Dead 1845][25:06 (Avg 01:26)][27.0MB]
sr. member
Activity: 462
Merit: 701
May 05, 2020, 06:30:51 AM
#83
I committed the code for the GPU.
Symmetry is not enabled by default, you have to edit the Constants.h file and comment out the USE_SYMMETRY define.

Code:
// Use symmetry
//#define USE_SYMMETRY

You can change the jump table size in the same file but high number decrease GPU performance.

Code:
// Number of random jumps
// Max 512 for the GPU
#define NB_JUMP 32

Unfortunately, symmetry on large range without killing GPU performance seems difficult as lots of kangaroo end in infinite cycle.
After all, the title of the article was: "Using Equivalence Classes to Accelerate Solving the Discrete Logarithm Problem in a Short Interval"
They well mention this problem of cycle in the article.

The trick of saving the last jump avoid cycle of 2, but not the others.
May be solution could be to precompute a random jump table in order to minimize cycles, but it could be difficult for large random walks of 2^dp. If we recreate kangaroo at each ~2^dp iterations, the creation of a kangaroo is expensive.
Symmetry can bring a maximum gain of sqrt(2) = ~1.41 , so to win, we have to find a way to do all the job in less than 41%.


sr. member
Activity: 462
Merit: 701
May 05, 2020, 12:57:07 AM
#82
Hi,

Thanks for this detailed report, very interesting !
I did few quick tests with a logic slightly different than yours, still only positive jump in the table and a I get the next one if equal to the last one.

Code:
#ifdef USE_SYMMETRY
        // Limit cycle
        if( jmp == herd[g].lastJump ) jmp = (herd[g].lastJump + 1) % NB_JUMP;
#endif

After 100 trials, DP=5, 40bit search, 2048 kangaroos:

With symmetry: Here only cycles or natural path collisions which have a distinguished point on it are detected.
The average length of walks is Avg/2048 = 2^20.614/2048 = 783

NB_JUMP = 32
[100] 2^20.263 Dead:44 Avg:2^20.823 DeadAvg:71.7 (2^20.614)

NB_JUMP = 64
[100] 2^19.985 Dead:17 Avg:2^20.662 DeadAvg:28.4 (2^20.614)

NB_JUMP = 128
[100] 2^19.346 Dead:4 Avg:2^20.581 DeadAvg:19.6 (2^20.614)

NB_JUMP = 256
[100] 2^20.938 Dead:20 Avg:2^20.510 DeadAvg:15.1 (2^20.614)

Then above 256 the dead average seems to converge to ~15, which should be the natural collision rate when using symmetry for this configuration.

No symmetry: Here we have only natural collisions in same herd, which is independent of the table size.

NB_JUMP = 64
[100] 2^22.203 Dead:5 Avg:2^21.098 DeadAvg:1.3 (2^21.097)

NB_JUMP = 128
[100] 2^19.432 Dead:0 Avg:2^21.044 DeadAvg:1.2 (2^21.097)

NB_JUMP = 256
[100] 2^20.805 Dead:1 Avg:2^20.959 DeadAvg:1.4 (2^21.097)

legendary
Activity: 1932
Merit: 2077
May 04, 2020, 04:36:13 PM
#81
Recap of my tests:


1)

normal generation of jumps

20000 walks
num_jumps: 32 + 32
max_length of each walk = 32
jumpbit = 20

7516 loops of length 2, 118 of length 4:

(7516+118+4+2) / 20000 = 38,2 % of the walks have a loop before 32 steps

7516/20000 = 37,58% of the walks have a loop with 2 only points

[7516, 0, 118, 0, 4, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

----------------------------------------------------------------------------------------------------------------------------------------------------

2)

if(next_jump != -prev_jump): ok
else
 next_jump = prev_jump  #(you double the previous jump)

20000 walks
num_jumps: 32 + 32
max_length of each walk = 32
jumpbit = 20

0 loops of length 2, only 139 loops of length 4 and 7 of length 6:

(139+7+9+1+3) / 20000 = 0,795 % against  38,2%!

[0, 0, 139, 0, 7, 0, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

----------------------------------------------------------------------------------------------------------------------------------------------------

3)

if(next_jump != -prev_jump): ok
else
 next_jump = -sign(next_jump)*floor(distAvg)*G   #a different step, with the opposite sign of the normal step

20000 walks
num_jumps: 32 + 32
max_length of each walk = 32
jumpbit = 20

0 loops of length 2, only 128 loops of length 4 and 6 of length 6, 2 of length 8:

(128 + 1 + 6 +2 + 1 + 1) / 20000 = 0,695% of the walks have a loop before 32 steps

[0, 0, 128, 1, 6, 0, 2, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

-------------------------------------------------------------------------------------------------------------------------------------------------------

4)

if(next_jump != -prev_jump): ok
else
 next_jump = prev_jump  #(you double the previous jump)

20000 walks
num_jumps: 64 + 64
max_length of each walk = 32
jumpbit = 20

0 loops of length 2, only 40 loops of length 4:

(40+1+1) / 20000 = 0,21 % of the walks enter in a loop in less than 32 steps

[0, 0, 0, 40, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

loops of length 4 are of "second order":

probability that (a+b+c+d=0) = (1/128)*(127/128)*(1/128)
probability that each sequence of 4 consecutive points don't create a loop = (1 - (1/128)*(127/128)*(1/128))^32
probability to find a loop in a walk of 32 steps: 1 - (1 - (1/128)*(127/128)*(1/128))^32 =  0.001936

expected value of number of walks with a loop of 4 points: 2000 * 0.001936 = 38,7
-------------------------------------------------------------------------------------------------------------------------------------------------------

5)

if(next_jump != -prev_jump): ok
else
 next_jump = prev_jump  #(you double the previous jump)

20000 walks
num_jumps: 64 + 64
max_length of each walk = 128
jumpbit = 20

157 loops of length 4, only 4 loops of length 6:

(157+4+2+1+1+1+1+1+1+2+1+1+1+1+1+1+1) / 20000 = 0,89% of the walks enter in a loop in less than 128 steps


[0, 0, 157, 0, 4, 0, 2, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sr. member
Activity: 462
Merit: 701
May 04, 2020, 12:41:25 PM
#80
OK thansk I'll try tomorow
The 1.4gamma is still under dev...
member
Activity: 144
Merit: 10
May 04, 2020, 12:27:34 PM
#79
@Jean_Luc

For larger search intervals, the previous version (v1.4beta) appears to be significantly faster in terms of wall clock time than the current version (v1.4gamma). I am ready to test with a 90-bit search intervals once the latest GPU code is ready.

AMD Ryzen 7 1800X Eight-Core Processor with program defaults (v1.4beta):
Code:
./kangaroo -t 16 in65.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^32.01
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.65
Expected RAM: 8.8MB
DP size: 17 [0xffff800000000000]
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
[52.89 MK/s][GPU 0.00 MK/s][Count 2^33.00][Dead 1][03:06 (Avg 04:14)][10.5MB]
Key# 0 Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867

Done: Total time 03:06

AMD Ryzen 7 1800X Eight-Core Processor with DP=8 (v1.4beta):

Code:
./kangaroo -t 16 -d 8 in65.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.97
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.65
Expected RAM: 4011.6MB
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
[49.60 MK/s][GPU 0.00 MK/s][Count 2^33.43][Dead 0][04:20 (Avg 04:31)][3443.0MB]
Key# 0 Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867

Done: Total time 04:33
legendary
Activity: 1932
Merit: 2077
May 04, 2020, 12:11:53 PM
#78
Many thanks.
I will try this tomorrow. Hope the test of the last jump and the symmetry will no kill GPU performance.
We have to find a determinist logic to get an other jump if we get the opposite.
Thanks Wink

It is only a test between 2 integers (32 bit <-> 32 bit, a previous index with a current index in an array of 32/64 elements,  I don't think it is too much )

you compute the next jump as usual, then

if(next_jump != -prev_jump): ok
else
 next_jump = prev_jump  (you double the previous jump)

P -> P+k*G -> P+k*G+k*G

Note: in this case the inverse of (x2-x1) is the same:

(P + k*G) + k*G has the same inverse of (P+k*G) - k*G,

then you can compute the inverse before to check the sign of the y-coordinate and the prev_jump
member
Activity: 144
Merit: 10
May 04, 2020, 11:59:43 AM
#77
@Kpot87
Please send a private message and I will respond when I have a minute or two.

@Jean_Luc
Please see below for 65-bit interval tests with your latest CPU (Kangaroo v1.4gamma) changes:

AMD Ryzen 7 1800X Eight-Core Processor with program defaults:
Code:
./kangaroo -t 16 in65.txt
Kangaroo v1.4gamma
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.00
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.33
Expected RAM: 7.3MB
DP size: 17 [0xffff800000000000]
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
[33.27 MK/s][GPU 0.00 MK/s][Count 2^35.06][Dead 145][19:25 (Avg 05:25)][7.5MB]
Key# 0 [1S]Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867
[  0] 2^35.061 Dead:145 Avg:2^35.061 (2^33.333)

Done: Total time 19:27

AMD Ryzen 7 1800X Eight-Core Processor with DP=8:
Code:
./kangaroo -t 16 -d 8 in65.txt
Kangaroo v1.4gamma
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.00
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.05
Expected RAM: 2659.0MB
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
[34.98 MK/s][GPU 0.00 MK/s][Count 2^34.42][Dead 638][12:46 (Avg 04:14)][1250.3MB]
Key# 0 [1S]Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867
[  0] 2^34.425 Dead:639 Avg:2^34.425 (2^33.054)

Done: Total time 12:52
sr. member
Activity: 462
Merit: 701
May 04, 2020, 11:54:06 AM
#76
Many thanks.
I will try this tomorrow. Hope the test of the last jump and the symmetry will no kill GPU performance.
We have to find a determinist logic to get an other jump if we get the opposite.
Thanks Wink
legendary
Activity: 1932
Merit: 2077
May 04, 2020, 11:20:46 AM
#75
I use a random jump table with a controlled average.
I store as before, no inforamtion on symetry is needed, the key is solved later.


Code:
void Kangaroo::CreateJumpTable() {

  int jumpBit = rangePower / 2;
  if(jumpBit > 128) jumpBit = 128;
  int maxRetry = 100;
  bool ok = false;
  double maxAvg = pow(2.0,(double)jumpBit - 0.95);
  double minAvg = pow(2.0,(double)jumpBit - 1.05);
  double distAvg;
  //::printf("Jump Avg distance min: 2^%.2f\n",log2(minAvg));
  //::printf("Jump Avg distance max: 2^%.2f\n",log2(maxAvg));

  // Kangaroo jumps

  // Positive only
  while(!ok && maxRetry>0 ) {
    Int totalDist;
    totalDist.SetInt32(0);
    for(int i = 0; i < NB_JUMP; ++i) {
      jumpDistance[i].Rand(jumpBit);
      if(jumpDistance[i].IsZero())
        jumpDistance[i].SetInt32(1);
      totalDist.Add(&jumpDistance[i]);
    }
    distAvg = totalDist.ToDouble() / (double)(NB_JUMP);
    ok = distAvg>minAvg && distAvg    maxRetry--;
  }

  for(int i = 0; i < NB_JUMP; ++i) {
    Point J = secp->ComputePublicKey(&jumpDistance[i]);
    jumpPointx[i].Set(&J.x);
    jumpPointy[i].Set(&J.y);
  }

  if(!ok) {
    ::printf("Warning, jump Avg distance out of bounds: 2^%.2f (restart the program)\n",log2(distAvg));
  } else {
    ::printf("Jump Avg distance: 2^%.2f\n",log2(distAvg));
  }

}

I simulated many walks with your CreateJumpTable (I did a porting in python).

I found a very interesting thing: finally I know how are these loops!!

I started 2000 walks (not together, I was interested in 'selfcollisions' only) from a position '0', and using only integer numbers I found out that:

almost all cycles have length exactly = 2!

I generated 32 positive jumps (and the 32 negative, don't worry, it is like you did but I prefer to write them)

Code:
jumpBit = 20
[b]Jump Avg distance[/b]:
minAvg, [b]distAvg[/b], maxAvg
506428.82601934916 [b]530326.5625[/b] 542776.9763909484

[907868, -907868, 716251, -716251, 1007711, -1007711, 720977, -720977, 901512, -901512, 835718, -835718, 363381, -363381, 475363, -475363, 770732, -770732, 217555, -217555, 67168, -67168, 615504, -615504, 584908, -584908, 768134, -768134, 450427, -450427, 926634, -926634, 458021, -458021, 854265, -854265, 589522, -589522, 280841, -280841, 172112, -172112, 32886, -32886, 279644, -279644, 525179, -525179, 200017, -200017, 372430, -372430, 66448, -66448, 905754, -905754, 1000160, -1000160, 331424, -331424, 170629, -170629, 401275, -401275]

and these are the results:

I got 787 (39,35%) collisions (only cycles!!) over 2000 walks (I stopped each walk after 32 steps) but the most important thing is that I understood why.

783 over 787 has length 2!!  Only 2 loops have length 4 and other 2 loops have length 6.

This is logic: what is the chance that a point Q produces a loop of length 2?

Q -> R -> Q   The probability is 1/64 (1/(number of effectively different jumps), if the jumps from R and Q is opposite to the jumps from Q and R.

Then when you produce a sequence of points, each point you generate has the probability 1/(number of effectively different jumps)) to create a loop of length 2 with the next point!

Some basic computation:  

probability to meet a 1 point that create a loop of length 2 with the next point: 1/NUM_JUMPS
probability to generate 32 (or 2^DP) consecutive points that not produce a 2-loop : (1 - 1/NUM_JUMPS)^32
probability to generate 32 consecutive points with at least one that produces a 2-loop: 1 - (1-NUM_JUMPS)^32

in my case: 1 - (1 - 1/64)^32 = 0.39586 probability of collision in a walk

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.


This explain perfectly why you have to use 65536 jumps instead of 32, because in this way you got:

1 - (1 - 1/65536)^32 = 0.000488

The simplest way to avoid to use a large number of jumps and to avoid almost all loops is to modify the condition of the jump, you have to store the last movement (jump):

you have: P -> Q -> R

when you have to decide how to do the next jump (from Q to R), do it normally unless the random jump is the opposite of the previous, in that case instead of doing -k*G do +k*G (in this way you avoid the loop and mantain the symmetry).

EDIT: the probability to have a loop of length 3 is instead very low, because you should have 3 numbers (jump) a,b,c such that a + b + c = 0 and that is not frequent (a,b,c are 3 random numbers between 32(64) generated in a 2^20 space!)

Second test:

20000 walks:

7610 loops of length 2, 101 loops of length 4, 5 loops of length 6,  ....
this is the sequence of lenght of each loop:

-> 7610, 0, 101, 0, 5, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 9, 0

38,05 %  of walk fall in a 2-loop!
38,66 % of loops in walks limited to 32 steps ...

jr. member
Activity: 43
Merit: 1
May 04, 2020, 10:50:18 AM
#74
@Jean_Luc

On the GPU side of things, I get different estimations for "Suggested DP", "Expected operations", and "Expected RAM" for the same problem/interval size on machines with dissimilar hardware configurations, please see below. It may be also helpful to add standard estimations based on the interval size alone. For example, on one my of servers configured with 8x RTX 2080 Ti while using the program defaults, the performance in terms iterations per second quickly degrades from ~10G to less than ~1G.

Thanks

8x RTX 2080 Tis with PL=240W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0,1,2,3,4,5,6,7 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^41.99
Number of kangaroos: 2^24.09
Suggested DP: 17
Expected operations: 2^44.36
Expected RAM: 1646.6MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos in 19929.3ms
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos in 20113.4ms
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos in 20136.4ms
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos in 20160.7ms
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos in 20179.5ms
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 20247.4ms
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos in 22114.7ms
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos in 22203.1ms
[9802.06 MK/s][GPU 9802.06 MK/s][Count 2^44.27][Dead 2][41:14 (Avg 38:27)][1550.0MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 41:48

1x RTX 2070 with PL=120W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^42.03
Number of kangaroos: 2^20.17
Suggested DP: 21
Expected operations: 2^43.65
Expected RAM: 1003.7MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 9119.2ms
[768.80 MK/s][GPU 768.80 MK/s][Count 2^43.99][Dead 3][07:13:01 (Avg 04:58:44)][1278.3MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 07:13:16



Hi! How you connect 8RTX ? Rizer? Thanks
sr. member
Activity: 462
Merit: 701
May 04, 2020, 10:07:15 AM
#73
Many thanks.
Note that here you have an expected of 2^21 because it take in consideration the overload due dp and the number of kangaroos.
I see that it over estimate a bit.
These results are very interesting to adapt the calculation of expected memory and operations.

To answer a bit in details to HardwareCollector, these calculations are dependent on the range size, the number of kangaroo (so the number of thread, grid size, number of GPU,... ) and dp (number of distinguished point bits) so it is normal that you get different results on different configurations.

The calculation of suggested DP is a poor approximation , it need some works to improve it.

sr. member
Activity: 443
Merit: 350
May 04, 2020, 09:40:02 AM
#72
-snip-
I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R
Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt
It runs in stat mode.
If you want to make test.

Yes, I made a test with your last commit on CPU (16 threads) Intel Xeon 2Ghz: 1000 keys of 40bit range were found for 1:39 min Smiley So, this is 99 seconds for 1000 keys (10 keys for 1 second)

Code:
Kangaroo v1.4gamma
Start:62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0000000000
Stop :62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AFFFFFFFFFF
Keys :1000
Number of CPU thread: 16
Range width: 2^40
Jump Avg distance: 2^18.99
Number of kangaroos: 2^14.00
Suggested DP: 5
Expected operations: 2^21.00
Expected RAM: 6.0MB
DP size: 5 [0xf800000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos

Key# 0 [1S]Pub:  0x022FF363EBC536B7CF20DECD41F0B86B90992937B4E38D9F616C7F980AACD9C922
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31CED094F0
[  0] 2^20.958 Dead:8 Avg:2^20.958 (2^21.000)

Key# 1 [1S]Pub:  0x0290E532FA5DABF668B4EC1F7FC4727AFEB1A4AE3FB3009E74045797B7C55C12A7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31D2F397AF
[  1] 2^21.390 Dead:12 Avg:2^21.190 (2^21.000)

Key# 2 [1N]Pub:  0x02032B8F3648BBC1D1666CAC1D0100462913C72E8736DE615C807CC13B78A63FFA
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A00A71C8425
[  2] 2^20.854 Dead:6 Avg:2^21.087 (2^21.000)

Key# 3 [1N]Pub:  0x03FB6688436949D86D446A2CEF4A66958D99490064F0BF9D7F677F83EC867F9B46
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A73420E46DF
[  3] 2^21.579 Dead:29 Avg:2^21.226 (2^21.000)

Key# 4 [1N]Pub:  0x02CBCACB58AF8BFD517BDAE6363C3E050E988E46A222F79BAEE88A512D6B885F36
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ACEF20EBCFD
[  4] 2^20.672 Dead:9 Avg:2^21.131 (2^21.000)

.............................................................
deleted due to message size restricrtions
.............................................................

Key#994 [1S]Pub:  0x038926680E04C0622E1DC996AEC5B7AF1CCFAB9186BC801D6563C67A9E6CD836B3
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AF7B4AE2C1F
[994] 2^21.280 Dead:12 Avg:2^20.792 (2^21.000)

Key#995 [1N]Pub:  0x0260FA97C34F238CB14858BF439CBCE84C8BC0FAFA7B9F6B693A5ADF7E8ADB8648
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD8100D4FFE
[995] 2^20.110 Dead:2 Avg:2^20.792 (2^21.000)

Key#996 [1N]Pub:  0x03E7E47031870D9E1E028E7996B2FB14E494E659399F0C21698C9410451E85E97B
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ADC798EE740
[996] 2^21.098 Dead:12 Avg:2^20.792 (2^21.000)

Key#997 [1N]Pub:  0x025F07F6D4F63D6FEB1857B3E82080C1EF18DDCB535C8698215B2B4CCA0A2E2302
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD23E45CBA9
[997] 2^20.731 Dead:11 Avg:2^20.792 (2^21.000)

Key#998 [1S]Pub:  0x039F0D9D4A0ECEB0EA57BB3183A4C691B2D1568C819386777F8DB1E91763607BD7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AB7B2FEFC39
[998] 2^20.949 Dead:7 Avg:2^20.792 (2^21.000)

Key#999 [1S]Pub:  0x034D3E8FB5B7D9FA27B207BB9A7EE0D43D30021682965BC3D495DFEC7C40D1ABB6
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0074C8377A
[999] 2^21.273 Dead:21 Avg:2^20.793 (2^21.000)

Done: Total time 01:39

Full test results are here (valid for 180 days only): https://pastebin.com/JFkpvhAA

Maybe this will be helpful for you, Jean_Luc
sr. member
Activity: 462
Merit: 701
May 04, 2020, 08:41:39 AM
#71
Yes, these estimations are trickly to calculate, i will try improve things...

Impressive result with the 8 RTX 2080 Ti Cheesy
member
Activity: 144
Merit: 10
May 04, 2020, 08:37:46 AM
#70
@Jean_Luc

On the GPU side of things, I get different estimations for "Suggested DP", "Expected operations", and "Expected RAM" for the same problem/interval size on machines with dissimilar hardware configurations, please see below. It may be also helpful to add standard estimations based on the interval size alone. For example, on one my of servers configured with 8x RTX 2080 Ti while using the program defaults, the performance in terms iterations per second quickly degrades from ~10G to less than ~1G.

Thanks

8x RTX 2080 Tis with PL=240W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0,1,2,3,4,5,6,7 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^41.99
Number of kangaroos: 2^24.09
Suggested DP: 17
Expected operations: 2^44.36
Expected RAM: 1646.6MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos in 19929.3ms
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos in 20113.4ms
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos in 20136.4ms
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos in 20160.7ms
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos in 20179.5ms
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 20247.4ms
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos in 22114.7ms
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos in 22203.1ms
[9802.06 MK/s][GPU 9802.06 MK/s][Count 2^44.27][Dead 2][41:14 (Avg 38:27)][1550.0MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 41:48

1x RTX 2070 with PL=120W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^42.03
Number of kangaroos: 2^20.17
Suggested DP: 21
Expected operations: 2^43.65
Expected RAM: 1003.7MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 9119.2ms
[768.80 MK/s][GPU 768.80 MK/s][Count 2^43.99][Dead 3][07:13:01 (Avg 04:58:44)][1278.3MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 07:13:16


sr. member
Activity: 462
Merit: 701
May 04, 2020, 08:17:47 AM
#69
Thanks for the benchmark Wink

I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R

Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt

It runs in stat mode.
If you want to make test.
sr. member
Activity: 443
Merit: 350
May 04, 2020, 07:56:55 AM
#68
This is a benchmark post with speeds for Kangaroo GPU Solver. All the tests were made with default DP and default grid size (calculated by a program). I guess that some plays with DP and grid size could change (increase or decrease) the speed. If somebody knows the optimal values, please let us know.

Code:
 Card Model           Grid size      DP        Tested speed
---------------------------------------------------------------
GTX 1050 Ti          Grid(12x256)   DP 16     115 MKey/sec
GTX 1080 Ti          Grid(56x256)   DP 15     500 MKey/sec
Tesla T4 16Gb        Grid(80x128)   DP 14     565 MKey/sec
RTX 2080ti 11Gb      Grid(136x128)  DP 13     1225 MKey/sec
Tesla V100 32Gb      Grid(160x128)  DP 13     1420 MKey/sec
---------------------------------------------------------------

It looks like the Kangaroo solver works fine with GeForce family cards but not with Quadro/NVS family. For example Tesla V100 is 4-5 times more expensive than RTX 2080ti, but has the speed only 15% higher.

And we also can learn from this table that the most efficient card for kangaroo solver is RTX 2080ti 11Gb which has the best performance in terms of speed/cost.

PS. Tests were made for 10 keys with 2^72 range width

EDIT: some spelling mistakes were corrected
Jump to: