Topic: Pollard's kangaroo ECDLP solver - page 94. (Read 60189 times)

sr. member
Activity: 462
Merit: 701
June 20, 2020, 10:55:25 PM
Many thanks to all of you for these tests ;)
I will have a look at that tomorrow.

Concerning the server: in 2.0 the DP checking is enabled. It may slow the server down if it receives too many DPs.
If you compile it yourself, try commenting out line 542 in Network.cpp as below:

Code:
//#define VALIDITY_POINT_CHECK
#ifdef VALIDITY_POINT_CHECK
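
For readers unfamiliar with the pattern: once the #define is commented out, the preprocessor drops everything inside the matching #ifdef block, so the check is never compiled in. A minimal sketch follows; the function name validityCheckEnabled is hypothetical and this is not the actual code of Network.cpp.

```cpp
#include <cassert>

// Commenting out the next line removes the guarded code at compile time.
//#define VALIDITY_POINT_CHECK

// Hypothetical helper: reports whether the guarded block was compiled in.
bool validityCheckEnabled() {
#ifdef VALIDITY_POINT_CHECK
  return true;   // the (possibly slow) per-DP check would run here
#else
  return false;  // check compiled out
#endif
}
```

Re-enabling the check only requires removing the leading // and rebuilding.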
full member
Activity: 1232
Merit: 242
Shooters Shoot...
June 20, 2020, 10:13:55 PM
Quote
Bro, how do I use the -g option?

I tried -g g1136,g1256,g2136,g2256 but it did not work!

Help me please, I really need it right now.

Big thank you.

-gpu tells the program to look for compatible GPUs. -g takes one grid size pair per GPU, e.g. -g 136,256 for one card; if you have more than one, say 2, it would be -g 136,256,136,256. With 2 GPUs you also pass -gpuId 0,1 (or whatever your device numbers are).

Here's my config for a 4 gpu setup:
Code:
-gpu -g 150,384,150,384,150,384,150,384 -gpuId 0,1,2,3

Your -g has to match your -gpuId option, meaning if you have 1 card, only put in the grid size for 1 card, e.g. -g 150,384 -gpuId 0

Make sense?

Edit: it looks like you are trying to run 2 GPUs, so your setup should be:
Code:
-gpu -g 136,256,136,256 -gpuId 0,1
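
The rule above (one grid pair per GPU, in the same order as the ids) can be sketched as a small helper that assembles the argument string. This is only an illustration: buildGpuArgs is a hypothetical name, and it assumes the device ids are simply 0..n-1 as in the examples above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Build "-gpu -g gx0,gy0,gx1,gy1,... -gpuId 0,1,..." from one grid pair per GPU.
// Assumption: device ids are 0..n-1, matching the examples in the post.
std::string buildGpuArgs(const std::vector<std::pair<int, int>>& grids) {
  std::string g, ids;
  for (std::size_t i = 0; i < grids.size(); ++i) {
    if (i > 0) { g += ","; ids += ","; }
    g += std::to_string(grids[i].first) + "," + std::to_string(grids[i].second);
    ids += std::to_string(i);
  }
  return "-gpu -g " + g + " -gpuId " + ids;
}
```

The number of pairs after -g always equals the number of ids after -gpuId, which is exactly the mismatch in the failed attempt quoted above.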
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 20, 2020, 07:18:17 PM
Speed test

Code:
Kangaroo v2.0
Start:20000000000000000
Stop :7FFFFFFFFFFFFFFFFFFF
Keys :25
Number of CPU thread: 0
Range width: 2^79
Jump Avg distance: 2^38.96
Number of kangaroos: 2^22.09
Suggested DP: 14
Expected operations: 2^40.60
Expected RAM: 3885.1MB
DP size: 14 [0xFFFC000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (177.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (177.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [9.7s]
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [10.2s]
[1737.96 MK/s][GPU 1737.96 MK/s][Count 2^39.63][Dead 1][09:20 (Avg 15:57)][1591.6/1996.1MB]

But it computed only one key from the in.txt file!

Code:

Kangaroo v2.0
Start:20000000000000000
Stop :7FFFFFFFFFFFFFFFFFFF
Keys :19
Number of CPU thread: 0
Range width: 2^79
Jump Avg distance: 2^38.96
Number of kangaroos: 2^22.09
Suggested DP: 14
Expected operations: 2^40.60
Expected RAM: 3885.1MB
DP size: 14 [0xFFFC000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (177.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (177.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [8.7s]
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [9.9s]
[1419.18 MK/s][GPU 1419.18 MK/s][Count 2^40.49][Dead 0][16:50 (Avg 19:32)][2877.5/3603.3MB]


On Windows Server 2019 there are many memory crashes.

@Jean_Luc maybe it is because of the -m 1 option: kangaroo.exe stops ALL work, not just the work on 1 pubkey?

The previous alpha version also aborts the whole computation with the "-m 1" flag.

Big thank you Jean_Luc_Pons for your work.
member
Activity: 873
Merit: 22
June 20, 2020, 06:35:25 PM
Bro, how do I use the -g option?

I tried -g g1136,g1256,g2136,g2256 but it did not work!

Help me please, I really need it right now.

Big thank you.
full member
Activity: 1232
Merit: 242
Shooters Shoot...
June 20, 2020, 06:29:46 PM
Something is off with the server (version 3). I'll start with 6 clients and the server slowly but surely starts showing clients dropping off. However, when I go check the clients, they say "server ok". Never had that issue with previous server versions.
member
Activity: 873
Merit: 22
June 20, 2020, 11:24:50 AM
@Jean_Luc help please:

Compilation error on Ubuntu:

Code:
main.cpp:335:13: error: 'exit' was not declared in this scope exit(0);

Can someone please help me fix this error?

How do I fix this?

Big thank you.
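
A note on this error: GCC reports "'exit' was not declared in this scope" when the header that declares exit() is not included in that file; adding #include <stdlib.h> near the top of the file is the usual remedy. A minimal sketch, where requireArgs is a hypothetical function and not code from the repository:

```cpp
#include <cassert>
#include <stdio.h>
#include <stdlib.h>  // declares exit(); without it GCC reports the error above

// Hypothetical helper: quit when no argument is given,
// otherwise return the number of arguments after the program name.
int requireArgs(int argc) {
  if (argc < 2) {
    printf("usage: prog <args>\n");
    exit(0);
  }
  return argc - 1;
}
```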
member
Activity: 873
Merit: 22
June 20, 2020, 10:53:12 AM
@Jean_Luc help please:

Compilation error on Ubuntu:

Code:
main.cpp:335:13: error: 'exit' was not declared in this scope exit(0);

How do I fix this?


Big thank you.
jr. member
Activity: 30
Merit: 149
June 20, 2020, 09:56:14 AM
Quote
Will you commit your source?

I have not yet decided if I want to.
sr. member
Activity: 462
Merit: 701
June 20, 2020, 09:38:30 AM
Will you commit your source?
jr. member
Activity: 30
Merit: 149
June 20, 2020, 09:28:29 AM
https://github.com/brichard19/eclambda

Can anyone try my tool on a 2080ti? On a 2080S it gets around 1300MKeys/sec when using 24-bit DP.
sr. member
Activity: 652
Merit: 316
June 20, 2020, 04:32:51 AM
-snip-
Thanks for testing it ;)

+100 MKeys/s in 2.0 on a 2080 Ti.
The expected number of operations is very different between v1.11 and 2.0.
Also, 2.0 uses less GPU memory and less host memory.

Code:
Kangaroo v1.11alpha
Start:4000000000000000000
Stop :7FFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^74
Jump Avg distance: 2^37.02
Number of kangaroos: 2^22.09
Suggested DP: 11
Expected operations: 2^39.07
Expected RAM: 348.1MB
DP size: 16 [0xFFFF000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x256) (417.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^22.09 kangaroos [27.1s]
[1389.79 MK/s][GPU 1389.79 MK/s][Count 2^39.04][Dead 3][07:40 (Avg 06:55)][265.3/338.2MB]
Key# 0 [1S]Pub:  0x03726B574F193E374686D8E12BC6E4142ADEB06770E0A2856F5E4AD89F66044755
       Priv: 0x4C5CE114686A1336E07

Code:
Kangaroo v2.0
Start:4000000000000000000
Stop :7FFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^74
Jump Avg distance: 2^37.02
Number of kangaroos: 2^22.09
Suggested DP: 12
Expected operations: 2^38.60
Expected RAM: 254.9MB
DP size: 16 [0xFFFF000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x256) (347.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^22.09 kangaroos [19.4s]
[1502.21 MK/s][GPU 1502.21 MK/s][Count 2^38.26][Dead 0][04:03 (Avg 04:37)][154.8/200.3MB]
Key# 0 [1S]Pub:  0x03726B574F193E374686D8E12BC6E4142ADEB06770E0A2856F5E4AD89F66044755
       Priv: 0x4C5CE114686A1336E07
Thanks for the new release.
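
The "DP size" lines in the logs pair a bit count with a 64-bit mask (14 with 0xFFFC000000000000, 16 with 0xFFFF000000000000), i.e. the mask has that many leading bits set. A minimal sketch of the relation; the function names are hypothetical, and the distinguished-point test shown is an assumption about how such a mask is typically applied, not a quote of the Kangaroo source.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the top dpBits bits of a 64-bit word set.
uint64_t dpMask(int dpBits) {
  if (dpBits <= 0) return 0;
  if (dpBits >= 64) return ~0ULL;
  return ~0ULL << (64 - dpBits);
}

// Assumption: a point counts as distinguished when the masked bits of the
// most significant 64-bit word of its x-coordinate are all zero.
bool isDistinguished(uint64_t xHigh, int dpBits) {
  return (xHigh & dpMask(dpBits)) == 0;
}
```

Under this assumption roughly one point in 2^dpBits is distinguished, so a larger DP size means fewer stored points and a smaller hash table.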
jr. member
Activity: 40
Merit: 2
June 20, 2020, 01:17:34 AM
My speed increased by 60 MK/s :)
sr. member
Activity: 462
Merit: 701
June 20, 2020, 12:53:03 AM
@COBRAS
I answered you on the github ticket.

New release 2.0 is out:
    Performance increase
    Kangaroo backup via the server (-wss)
    Fixed rare wrong points

https://github.com/JeanLucPons/Kangaroo/releases/tag/2.0
Thanks for testing it ;)
member
Activity: 348
Merit: 34
June 19, 2020, 10:36:50 PM
What is your GPU model?
member
Activity: 873
Merit: 22
June 19, 2020, 04:26:40 PM
Sorry for the off-topic.

Can someone share a compiled Ubuntu 16 / CUDA 10 build of Kangaroo with me? I really need it. I tried to compile the latest version from GitHub but ran into trouble.

Please PM me if you can do this.


Code:


 make gpu=1 ccap=20 all
cd obj &&       mkdir -p SECPK1
g++ -DWITHGPU -m64 -mssse3 -Wno-unused-result -Wno-write-strings -O2 -I. -I/usr/local/cuda-10.0/include -o obj/SECPK1/IntGroup.o -c SECPK1/IntGroup.cpp
SECPK1/IntGroup.cpp: In constructor 'IntGroup::IntGroup(int)':
SECPK1/IntGroup.cpp:24:42: error: 'malloc' was not declared in this scope
   subp = (Int *)malloc(size * sizeof(Int));
                                          ^
SECPK1/IntGroup.cpp: In destructor 'IntGroup::~IntGroup()':
SECPK1/IntGroup.cpp:28:12: error: 'free' was not declared in this scope
   free(subp);
            ^
Makefile:80: recipe for target 'obj/SECPK1/IntGroup.o' failed
make: *** [obj/SECPK1/IntGroup.o] Error 1



Huh??
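
A note on the two errors in this log: malloc() and free() are declared in <stdlib.h>, so a file that uses them without including that header (directly or through another header) fails on compilers that do not pull it in implicitly. A minimal sketch of the pattern with the include in place; Buffer is a hypothetical stand-in and not the IntGroup class from the repository.

```cpp
#include <cassert>
#include <stdlib.h>  // declares malloc() and free()

// Hypothetical stand-in for a class that owns a malloc'ed buffer.
class Buffer {
 public:
  explicit Buffer(int size) : size_(size) {
    data_ = (long long*)malloc(size * sizeof(long long));
  }
  ~Buffer() { free(data_); }
  // Copying is disabled so the buffer is never freed twice.
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
  int size() const { return size_; }
  long long* data() { return data_; }

 private:
  int size_;
  long long* data_;
};
```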
legendary
Activity: 1948
Merit: 2097
June 19, 2020, 01:36:49 PM

Quote
But why didn't it work when we moved the DPs to range*32 with arulbero's method?


Because each patch can reach only 1/32 of the points.
sr. member
Activity: 652
Merit: 316
June 19, 2020, 01:32:16 PM
I tried to search for keys in the same range as the work file. Everything is much faster:
I solved 1 key in a 54-bit range and saved only the tame and wild DPs. After that I ran a test with 1000 pubkeys and got a decent result.
Expected ops: 2^28.06 for one key.
On average I got 2^27.20 for one key. This value depends on how many DPs you have collected in the work file.

Here is the work file info:
Code:
DP bits   : 8
Start     : 40000000000000
Stop      : 7FFFFFFFFFFFFF
DP Count  : 658682 2^19.329
HT Max    : 12 [@ 009F12]
HT Min    : 0 [@ 000015]
HT Avg    : 2.51
HT SDev   : 1.58
But why didn't it work when we moved the DPs to range*32 with arulbero's method?
P.S. Interestingly, 2^27.20 is exactly 2^28.06 - (tameDPs+wildDPs/2)*(2^DP)
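
The expected figure of 2^28.06 for a 54-bit range is consistent with the usual kangaroo estimate of about 2*sqrt(N) group operations for a range of N keys. A minimal sketch, where the constant 2.08 is an assumption chosen because it reproduces the quoted figure, and any overhead from distinguished points is ignored:

```cpp
#include <cassert>
#include <cmath>

// log2 of the expected group operations for a range of width 2^rangeBits,
// using ops ~ c * sqrt(N). The constant c = 2.08 is an assumption.
double expectedOpsLog2(double rangeBits, double c = 2.08) {
  return std::log2(c) + rangeBits / 2.0;
}
```

For example, expectedOpsLog2(54) gives about 28.06, so an observed average of 2^27.20 corresponds to a bit more than half of the expected work.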

member
Activity: 873
Merit: 22
June 19, 2020, 07:23:08 AM
Quote
It has been solved since 18.06.

Good day. What was the length of the privkey, bro?
full member
Activity: 282
Merit: 114
June 19, 2020, 07:18:45 AM
Quote
I also ran a test on 1000 pubkeys in the same range, but with normal solving, without tricks.
Here is the result:
Total    OP: 273125509453.87 = 2^37.99
Average  OP: 28.04

Unfortunately the difference is very small.

--------------------------------------------------------------------------------------------------------

I read this article:

https://medium.com/@johncantrell97/how-i-checked-over-1-trillion-mnemonics-in-30-hours-to-win-a-bitcoin-635fe051a752

this is the puzzle https://twitter.com/alistairmilne/status/1266037520715915267

I think that zielar could have won that prize easily too.

About this part of the article:

Quote
In a GPU you have four main types of memory available to you (Global, Constant, Local, and Private). Global memory is shared across all GPU cores and is very slow to access, you want to minimize its use as much as possible. Constant and Private memory are extremely fast but limited in space. I believe most devices only support 64kB of constant memory. Local memory is shared by a “group” of workers and its speed is somewhere between Global and Constant.

My goal was to fit everything I needed into the 64kB of constant memory and never need to read from global or local memory to maximize the speed of the program. This proved to be a bit tricky because the standard precomputed secp256k1 multiplication table took up exactly 64kB by itself.

@JeanLuc

How much constant memory do you use for the multiplication and for the addition?

32 jumps take 16 kbit (2 kB) for the x- and y-coordinates + 8 kbit (1 kB) for their private keys (32 * 256 bit = 8 kbit) + what else?



It has been solved since 18.06.
legendary
Activity: 1948
Merit: 2097
June 19, 2020, 01:40:55 AM
Quote
@JeanLuc
How much constant memory do you use for the multiplication and for the addition?
32 jumps take 16 kbit (2 kB) for the x- and y-coordinates + 8 kbit (1 kB) for their private keys (32 * 256 bit = 8 kbit) + what else?

I use the following setting to prefer L1 cache as shared mem is not used.
cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

In constant mem:
Code:
__device__ __constant__ uint64_t _0[] = { 0ULL,0ULL,0ULL,0ULL,0ULL };
__device__ __constant__ uint64_t _1[] = { 1ULL,0ULL,0ULL,0ULL,0ULL };
__device__ __constant__ uint64_t _P[] = { 0xFFFFFFFEFFFFFC2F,0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF,0ULL };
__device__ __constant__ uint64_t MM64 = 0xD838091DD2253531; // 64bits lsb negative inverse of P (mod 2^64)
__device__ __constant__ uint64_t _O[] = { 0xBFD25E8CD0364141ULL,0xBAAEDCE6AF48A03BULL,0xFFFFFFFFFFFFFFFEULL,0xFFFFFFFFFFFFFFFFULL };
__device__ __constant__ uint64_t jD[NB_JUMP][4];
__device__ __constant__ uint64_t jPx[NB_JUMP][4];
__device__ __constant__ uint64_t jPy[NB_JUMP][4];

I will definitely reduce jD to 128 bits in the next release. The less constant memory is used, the better: 64 KB is available, but for the L1 cache the lower the usage, the better.


128 bit * 32 = 4 kbit (512 bytes) saved, good.

If you accept breaking compatibility with the #115 search, you can save another 1 kbit (128 bytes) by picking as jump points only points whose x-coordinate has its first 32 bits = 0; you have many of them in the file of old DPs.
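
The sizes discussed in this exchange can be checked with plain sizeof arithmetic; note that 32 * 256 bit is 8192 bits, which is 1024 bytes. A minimal sketch, assuming NB_JUMP = 32 (the "32 jumps" mentioned above) and the limb counts visible in the listing; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NB_JUMP = 32;  // assumption: 32 jumps, as stated in the post

// Bytes used by the three jump tables, for a jump distance of distLimbs 64-bit limbs.
constexpr std::size_t jumpTableBytes(std::size_t distLimbs) {
  return NB_JUMP * distLimbs * sizeof(uint64_t)  // jD
       + 2 * NB_JUMP * 4 * sizeof(uint64_t);     // jPx and jPy, 256 bits each
}

// Small constants from the listing: _0, _1, _P (5 limbs each), MM64 (1), _O (4 shown).
constexpr std::size_t smallConstBytes() {
  return (3 * 5 + 1 + 4) * sizeof(uint64_t);
}
```

Under these assumptions the listing occupies 3072 + 160 = 3232 bytes of constant memory, and shrinking jD from 256 to 128 bits (4 limbs to 2) saves 512 bytes.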