Pages:
Author

Topic: Pollard's kangaroo ECDLP solver - page 67. (Read 60698 times)

member
Activity: 406
Merit: 47
March 24, 2021, 06:40:55 AM

I not understand in c++

reference from python code

tame and wild both random  by using function random.randint(start,end)

problem in code is bits (puzzle 120 = 120 bit)

script generate tame and wild first

tame
t.append((3 << problem - 2) + random.randint(1, 1 << problem - 1))

wild
w.append(random.randint(1, 1 << problem - 1))


and use modulo  with  0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
( 115792089237316195423570985008687907853269984665640564039457584007908834671663 )
for generate some hash in hex value and compare

tame-wild = private key

I understand overview but not yet know in detail have a lot value in code


legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
March 24, 2021, 05:56:14 AM
Of all the images you found, this one:



most resembles what's actually going on in the program. It's just a random walk through different points.

Although now I sort of get why some people call this Pollard's lambda because the diagrams look like the Greek letter, but when two kangaroos collide which makes them go on the same trajectory, future points don't necessarily form a curve, but rather random lines like the above two.

@WanderingPhilosipher I observed a strange result after running your 80-bit range input, GPU speed suddenly tripled to 610MKeys/s and all the dead kangaroos and collisions disappeared.

It makes me think that JLPons' CUDA search routine doesn't run as fast with small ranges. Strangely my 8-thread Xeon exhausted the 40-bit range instantly, but when I tried the same range with 48 threads and a Tesla T4 it started taking forever to search it.
full member
Activity: 1232
Merit: 242
Shooters Shoot...
March 24, 2021, 01:18:47 AM
Quote
I search with keyword  Pollards kangaroo got a lot of image some from keyword Pollards  and some from keyword kangaroo

Are they same?

or kangaroo have multiple formula
in case if have multiple formula Can somebody help to try code other formula for testing?
The ones that form a circle of sorts looks like Pollard's Rho method, not Kangaroo
member
Activity: 406
Merit: 47
March 24, 2021, 01:07:05 AM
I search with keyword  Pollards kangaroo got a lot of image some from keyword Pollards  and some from keyword kangaroo

Are they same?

or kangaroo have multiple formula
in case if have multiple formula Can somebody help to try code other formula for testing?





























member
Activity: 406
Merit: 47
March 23, 2021, 11:10:10 PM
Some Speed test for you kangaroo

What are you speed result?

recommend to test puzzle #70 and puzzle #75 not a long time

try testing time not over 5-10 minute

Kangaroo.exe -gpu -o result.txt in.txt


Puzzle #40
in.txt
Code:
8000000000
FFFFFFFFFF
03A2EFA402FD5268400C77C20E574BA86409EDEDEE7C4020E4B9F0EDBEE53DE0D4
1050  laptop ==>  02 second  - 03 second
keyspace jump = 549755813888

Puzzle #60
in.txt
Code:
800000000000000
FFFFFFFFFFFFFFF
0348E843DC5B1BD246E6309B4924B81543D02B16C8083DF973A89CE2C7EB89A10D
1050  laptop ==>  20 second  - 25 second
keyspace = 576460752303423488

Puzzle #63
in.txt
Code:
4000000000000000
7FFFFFFFFFFFFFFF
0365EC2994B8CC0A20D40DD69EDFE55CA32A54BCBBAA6B0DDCFF36049301A54579
1050  laptop ==>  30 second  - 40 second
keyspace = 4611686018427387904

puzzle #65
in.txt
Code:
10000000000000000
1FFFFFFFFFFFFFFFF
0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
1050  laptop ==>  2 minute
keyspace = 18446744073709551616

puzzle #70
in.txt
Code:
200000000000000000
3FFFFFFFFFFFFFFFFF
0290E6900A58D33393BC1097B5AED31F2E4E7CBD3E5466AF958665BC0121248483
1050  laptop ==>  4 minute - 6 minute solve
keyspace = 590295810358705651712

puzzle #75
in.txt
Code:
4000000000000000000
7FFFFFFFFFFFFFFFFFF
03726B574F193E374686D8E12BC6E4142ADEB06770E0A2856F5E4AD89F66044755
1050 laptop ==>  40 minute - 50 minute solve
keyspace = 18889465931478580854784

puzzle #80
in.txt
Code:
80000000000000000000
FFFFFFFFFFFFFFFFFFFF
037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC
(may be take  5-6 hour)
keyspace = 604462909807314587353088

puzzle #85
in.txt
Code:
1000000000000000000000
1FFFFFFFFFFFFFFFFFFFFF
0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A

puzzle #90
in.txt
Code:
20000000000000000000000
3FFFFFFFFFFFFFFFFFFFFFF
035C38BD9AE4B10E8A250857006F3CFD98AB15A6196D9F4DFD25BC7ECC77D788D5

puzzle #95
in.txt
Code:
400000000000000000000000
7FFFFFFFFFFFFFFFFFFFFFFF
02967A5905D6F3B420959A02789F96AB4C3223A2C4D2762F817B7895C5BC88A045

puzzle #100
in.txt
Code:
8000000000000000000000000
FFFFFFFFFFFFFFFFFFFFFFFFF
03D2063D40402F030D4CC71331468827AA41A8A09BD6FD801BA77FB64F8E67E617
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
March 21, 2021, 11:49:19 PM
Would someone explain in simple terms what this kangaroo tool does exactly? Thanks.

Basically we are generating a tame/wild "kangaroo" pair, which is actually just two sets of starting points. The starting point will be different depending on whether the kangaroo is labeled "wild" or "tame".

Then add [this is point addition we're talking here] slightly different incremental terms (mod the secp256k1 group order) until they either equal each other or we have added so much term value that it exceeds the search interval.

i.e if the interval is 2^40 we can only point add that much to the starting point before we have to discard the pair and start with a new one.

To speed things up we start hundreds of kangaroo pairs in parallel and see if any wild kangaroo equals any any tame kangaroo.

This program only stores the x coordinate of points do it might look like it's just storing and adding numbers, but actually the y point is computed from the x point so it's a compact way of storing points.
newbie
Activity: 12
Merit: 0
March 21, 2021, 09:01:49 PM
Would someone explain in simple terms what this kangaroo tool does exactly? Thanks.
jr. member
Activity: 56
Merit: 26
March 21, 2021, 03:41:57 PM
So you said you used 4 cores, how many "kangaroos" per core/thread were you running?

one kangaroo per core.

full member
Activity: 1232
Merit: 242
Shooters Shoot...
March 21, 2021, 03:31:33 PM

So each tame or wild thread, must jump, check for dp, if dp send to hashtable and then jump again, if no dp, jump again, rinse and repeat.

Yes !

In your test, you merely walked through 1 billion random points, correct?

1 billions points of random walk determined by the formula :
jD=[G,2*G,4*G...2^n-1*G]
 tamePos[j+1] =  tamePos[j] + jD[tamePos[j].x % n]    (like in Jean-Luc Pons algorithm explanation) j
So you said you used 4 cores, how many "kangaroos" per core/thread were you running?
jr. member
Activity: 56
Merit: 26
March 21, 2021, 03:23:13 PM

So each tame or wild thread, must jump, check for dp, if dp send to hashtable and then jump again, if no dp, jump again, rinse and repeat.

Yes !

In your test, you merely walked through 1 billion random points, correct?

1 billions points of random walk determined by the formula :
jD=[G,2*G,4*G...2^n-1*G]
 tamePos[j+1] =  tamePos[j] + jD[tamePos[j].x % n]    (like in Jean-Luc Pons algorithm explanation) j
full member
Activity: 1232
Merit: 242
Shooters Shoot...
March 21, 2021, 03:05:54 PM
So your speed (30MKey/s) does not include actually searching for/finding and writing/storing distinguished points to RAM/file?

Exactly, but i'don't think it will decreases dramatically the performance.

in the way i'm imagine the system, the clients (lots of ARM CPUs) will be only dedicated to be wild or tame kangaroo and to performs pure increments in the random walk.

If a distinguished point is found (i've to find a way to  be faster as possible) : example (AND mask beetween the first   of the 5 limbs (5*52bits)of X  and the desired DP,
 the client   send the X coordinate through a socket to the centralized server (computer with and hashtable checking continuously if a collision is found between received x coordinate and previously ones ).
I think it will  not slow significantly the computing because a client send a message to the server  only 1 on 2^DP times on average and a the stop of computing during the socket communication can be optimised to be fast.
 

I don't know your setup so I'm just asking questions.

So each tame or wild thread, must jump, check for dp, if dp send to hashtable and then jump again, if no dp, jump again, rinse and repeat.

In your test, you merely walked through 1 billion random points, correct?
jr. member
Activity: 56
Merit: 26
March 21, 2021, 02:56:45 PM
So your speed (30MKey/s) does not include actually searching for/finding and writing/storing distinguished points to RAM/file?

Exactly, but i'don't think it will decreases dramatically the performance.

in the way i'm imagine the system, the clients (lots of ARM CPUs) will be only dedicated to be wild or tame kangaroo and to performs pure increments in the random walk.

If a distinguished point is found (i've to find a way to  be faster as possible) : example (AND mask beetween the first   of the 5 limbs (5*52bits)of X  and the desired DP,
 the client   send the X coordinate through a socket to the centralized server (computer with and hashtable checking continuously if a collision is found between received x coordinate and previously ones ).
I think it will  not slow significantly the computing because a client send a message to the server  only 1 on 2^DP times on average and a the stop of computing during the socket communication can be optimised to be fast.
 
full member
Activity: 1232
Merit: 242
Shooters Shoot...
March 21, 2021, 02:26:04 PM


If you manage to push it a little over 30MKeys/s (like by using Neon instruction set) then you can match GPU performance.

In fact this speed is achieved with a modified version of the secp256k1 library of bitcoin core where every verification instructions have been removed for field and group function.
The library field_5*52.h  (instead of 10*26 for x86 cpu) (limbs registers) is automatically include for 64bits architecture and you've right it's maybe 10x faster.

I will study the NEON instructions to improve but it will be difficult because this benchmark doesn't include the DP detection code.
It's only a pure random walk benchmark on 1 Billion points.

 
So your speed (30MKey/s) does not include actually searching for/finding and writing/storing distinguished points to RAM/file?
jr. member
Activity: 56
Merit: 26
March 21, 2021, 01:42:39 PM


If you manage to push it a little over 30MKeys/s (like by using Neon instruction set) then you can match GPU performance.

In fact this speed is achieved with a modified version of the secp256k1 library of bitcoin core where every verification instructions have been removed for field and group function.
The library field_5*52.h  (instead of 10*26 for x86 cpu) (limbs registers) is automatically include for 64bits architecture and you've right it's maybe 10x faster.

I will study the NEON instructions to improve but it will be difficult because this benchmark doesn't include the DP detection code.
It's only a pure random walk benchmark on 1 Billion points.

 
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
March 21, 2021, 01:01:33 PM
The first benchmark on ARM (Raspberry PI 400 without overclocking ) are better than I expected (around 30MKeys/s with 4 cores).

That's an impressive speed, considering that Xeon E3-1230 @3.2GHz with 4c/8t only does 10MKeys/s and that the CPU inside this is:

Code:
Broadcom BCM2711 quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz

So clock speed and thread count is halved on this silicon but yet it's 3x faster. And this makes sense because x86_64 has 16 64-bit registers and 8 128-bit SSE registers (Kangaroo doesn't use AVX yet), while armv8-a (the arch used in Cortex A72) has 64 registers and 32 more of the wider registers.

x86 was always an inefficient arch anyway because of all the backward compatibility it had to preserve. With ARM it's like "recompile for our new generation or else" and stuff written for armv8 won't work on armv7 AFAIK.

Do you know if there is a very low cost card (mini PC) with ARM CPU (similar to raspberry pi compute module 4 see link below) to do the work (in a parallelized way) and see if the ratio (computing power)/price might be better compared to the latest GPU CUDA Compatible graphics card.

A 2080Ti costs $1000 and can do about ~1100MKeys/s per my guessing. A Raspberry Pi 4 costs $35 and you can buy up 28 of those for each 2080Ti, and all those Pi 4's combined can do 30x28=840 MKeys/s which is only a little slower than GPU's speed. If you manage to push it a little over 30MKeys/s (like by using Neon instruction set) then you can match GPU performance.
jr. member
Activity: 56
Merit: 26
March 21, 2021, 12:29:32 PM
Hi everybody,
I'm writing a  light beta library in C  to run Pollard kangaroo algorithm on ARM64 CPU.

I am at the very beginning of this project but i am able to do benchmark.

when the library will be functionnal i will posted it on github.

the library (a shared object file) will be callable by python with ctypes module
The herds  of kangaroos will be controlled directly with python (for convenience and ease).
The idea is to have a lot of clients running on ARM64 low cost CPU to compute parallelized secp256k1 points additions  .
The memory (DP  points of a client) will be returned with a client-server socket data transfer (server will have the main hashtable of DP ) as the Kangaroo software from Jean-Luc Pons.

The first benchmark on ARM (Raspberry PI 400 without overclocking ) are better than I expected (around 30MKeys/s with 4 cores).

What do you think about this performance ?
Do you know if there is a very low cost card (mini PC) with ARM CPU (similar to raspberry pi compute module 4 see link below)  to do the work (in a parallelized way) and see if the ratio (computing power)/price might be better compared to the latest GPU CUDA Compatible graphics card.

https://www.raspberrypi.org/products/compute-module-4/?variant=raspberry-pi-cm4001000



member
Activity: 406
Merit: 47
March 21, 2021, 06:54:39 AM

Did you do this first?
 

yes I use nano edit "Makefile"


legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
March 21, 2021, 06:00:59 AM
~snip

Did you do this first?

For some reason nobody updated the CUDA path so you need to edit the Makefile yourself and find the line that says

Code:
CUDA = /usr/local/cuda-8.0

And replace it with

Code:
CUDA = /usr/local/cuda

The CUDA installer creates a symlink at /usr/local/cuda to whatever the actual installed CUDA version is.
member
Activity: 406
Merit: 47
March 21, 2021, 04:19:44 AM


Did you just type "make" or "make gpu=1"? (What will work is "make gpu=1 ccap=52" or 61 or 75 or whatever your CUDA compute cap is)

By the way. WSL speed will be slower than just compiling it on Windows itself, because WSL emulates system calls. The GPUs will also not be recognized unless you install the Ubuntu(?) WSL drivers from NVIDIA's download page, the Windows drivers don't work from WSL.

I use "make" only
make can compile success and can run kangaroo but can not using option -gpu
show error GPU code not compiled, use -DWITHGPU when compiling.


Code:
make gpu=1
mkdir -p obj
cd obj &&       mkdir -p SECPK1
cd obj && mkdir -p GPU
/usr/local/cuda-8.0/bin/nvcc -maxrregcount=0 --ptxas-options=-v --compile --compiler-options -fPIC -ccbin /usr/bin/g++-4.8 -m64 -O2 -I/usr/local/cuda-8.0/include -gencode=arch=compute_,code=sm_ -o obj/GPU/GPUEngine.o -c GPU/GPUEngine.cu
make: /usr/local/cuda-8.0/bin/nvcc: Command not found
make: *** [Makefile:75: obj/GPU/GPUEngine.o] Error 127


Code:
make gpu=1 ccap=52
cd obj &&       mkdir -p SECPK1
cd obj && mkdir -p GPU
/usr/local/cuda-8.0/bin/nvcc -maxrregcount=0 --ptxas-options=-v --compile --compiler-options -fPIC -ccbin /usr/bin/g++-4.8 -m64 -O2 -I/usr/local/cuda-8.0/include -gencode=arch=compute_52,code=sm_52 -o obj/GPU/GPUEngine.o -c GPU/GPUEngine.cu
make: /usr/local/cuda-8.0/bin/nvcc: Command not found
make: *** [Makefile:75: obj/GPU/GPUEngine.o] Error 127

I will try on real Ubuntu on small harddisk try install


legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
March 21, 2021, 04:03:42 AM
I try this version
https://github.com/JeanLucPons/Kangaroo

on my linux subsystem (windows 10)

I compile with command  make success
make

and use
./kangaroo -gpu in.txt

I got message

GPU code not compiled, use -DWITHGPU when compiling.

Did you just type "make" or "make gpu=1"? (What will work is "make gpu=1 ccap=52" or 61 or 75 or whatever your CUDA compute cap is)

By the way. WSL speed will be slower than just compiling it on Windows itself, because WSL emulates system calls. The GPUs will also not be recognized unless you install the Ubuntu(?) WSL drivers from NVIDIA's download page, the Windows drivers don't work from WSL.
Pages:
Jump to: