Pollard's kangaroo ECDLP solver - page 9.

Baboshka

newbie

Activity: 13

Merit: 0

Quote from: WanderingPhilospher on January 12, 2024, 05:23:15 PM

Yes, you are correct, it was meant to expand the search range, like you said, be able to solve 160.

However, I am pretty sure it had bugs or speed was lost. I'm not sure it he has tinkered with it any more since his original post.

Check out Etar's kangaroo version; it is same as JLPs (but trailing DPs; zeros at the end) but can solve up to 192 bit range.

Hi WanderingPhilospher, thanks for the quick answer

Actually the 256 version is not working, i cloned the code from the Repo and build it with CUDA 12.3 & sm_86 and tested it with the same example from JeanLucPons
This one

Code:

49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
0335BB25364370D4DD14A9FC2B406D398C4B53C85BE58FCC7297BD34004602EBEC

The original JeanLucPons Kangaroo took 1:30 to find both minute while this 256 version didn't find anything even after 20 min ... therefore i think its not helpful.

Regarding the Etar's kangaroo .. you mean this one https://github.com/Etayson/Etarkangaroo ?
I wonder why there is no source code

what do you mean with this "trailing DPs; zeros at the end" can you explain more please

Regards

WanderingPhilospher

full member

Activity: 1232

Merit: 242

Shooters Shoot...

Quote from: Baboshka on January 12, 2024, 02:54:28 PM

Quote from: NotATether on March 25, 2021, 05:42:59 AM

I pushed an update to Kangaroo-256 a couple of days ago that fixes the botched GPU implementation. https://gitea.datahoarding.agency/ZenulAbidin/Kangaroo-256

Next I want to add the ability to create your own dpmask - whether you want certain bits to always be set and others always be cleared. I think I will need two different masks for this, one for the bits that should be set and another for the ones that should be cleared.

Currently all of the dpmask bits are at the left of every kangaroo but I think the ability to define it at random positions could influence the speed it takes to find a collision. I could even make the dpmask to change at random it I want to.

Hi NotATether, can you please explain what this version with 256 can do that the original version from JeanLucPons can not do ?

- Is it just faster ?
- can solve ranges that the JeanLucPons version can not solve (like puzzle 160 for example because the key space is wider)?

I know that i am a bit later as that was posted 3 years ago, but i find the subject very interesting.

Regards

Yes, you are correct, it was meant to expand the search range, like you said, be able to solve 160.

However, I am pretty sure it had bugs or speed was lost. I'm not sure it he has tinkered with it any more since his original post.

Check out Etar's kangaroo version; it is same as JLPs (but trailing DPs; zeros at the end) but can solve up to 192 bit range.

Baboshka

newbie

Activity: 13

Merit: 0

Quote from: NotATether on March 25, 2021, 05:42:59 AM

I pushed an update to Kangaroo-256 a couple of days ago that fixes the botched GPU implementation. https://gitea.datahoarding.agency/ZenulAbidin/Kangaroo-256

Next I want to add the ability to create your own dpmask - whether you want certain bits to always be set and others always be cleared. I think I will need two different masks for this, one for the bits that should be set and another for the ones that should be cleared.

Currently all of the dpmask bits are at the left of every kangaroo but I think the ability to define it at random positions could influence the speed it takes to find a collision. I could even make the dpmask to change at random it I want to.

Hi NotATether, can you please explain what this version with 256 can do that the original version from JeanLucPons can not do ?

- Is it just faster ?
- can solve ranges that the JeanLucPons version can not solve (like puzzle 160 for example because the key space is wider)?

I know that i am a bit later as that was posted 3 years ago, but i find the subject very interesting.

Regards

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: WanderingPhilospher on December 25, 2023, 09:57:03 PM

but I had tweaked the way DPs were found.

Can we have your code?

WanderingPhilospher

full member

Activity: 1232

Merit: 242

Shooters Shoot...

Quote

4090 cards have a lot more cache than previous ones.
Looking for changes to fit to cache as much data as possible to speed up kangaroo and prevent memory bottleneck kangaroo have.

Code def needs tweaked to see if there as any pickup in performance.

Code also needs tweaked in how it finds DPs.

The best speed I got out of Kangaroo with a 4090 was 7,750 MKey/s and an A100 got 7,350 MKey/s; but I had tweaked the way DPs were found.

I haven't messed with the code since #125 was found though.

Hope you have success!

Just found old exe file and ran a full 44 bit range to check avg speed; 3060Ti = 2,685 MKey/s Kangaroo.

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: arulbero on December 25, 2023, 04:06:47 PM

Quote from: 3dmlib on December 25, 2023, 03:15:19 PM

But why you use 01 as starting point? Roll Eyes

I already told you:

Quote from: arulbero on December 25, 2023, 10:00:36 AM

For sequential private keys there is the NextKey function.

DoubleDirect does, as the name indicates, only the double of P.

Then, if you start from G:

G
P = DoubleDirect(G) : G -> P = 2G
P = NextKey(P) : P -> P+1 = 3G
P = NextKey(P) : P -> P+1 = 4G
P = NextKey(P) : P -> P+1 = 5G

and so on

If you use 0000000000000000000000000000000000000000000000000000000000000002

as starting point it should work.

You can't compute G+G = 2G with NextKey, what is not clear?

Quote from: 3dmlib on December 22, 2023, 09:20:26 AM

Point Secp256K1::NextKey(Point &key) {
  // Input key must be reduced and different from G
  // in order to use AddDirect
  return AddDirect(key,G);
}

You are using the NextKey in the only case you can't (and it is clearly written)

Thank you very much. Now I understand, finally Wink

It working on CPU fine. Moving on to GPU

4090 cards have a lot more cache than previous ones.
Looking for changes to fit to cache as much data as possible to speed up kangaroo and prevent memory bottleneck kangaroo have.

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: 3dmlib on December 25, 2023, 03:15:19 PM

But why you use 01 as starting point? Roll Eyes

I already told you:

Quote from: arulbero on December 25, 2023, 10:00:36 AM

For sequential private keys there is the NextKey function.

DoubleDirect does, as the name indicates, only the double of P.

Then, if you start from G:

G
P = DoubleDirect(G) : G -> P = 2G
P = NextKey(P) : P -> P+1 = 3G
P = NextKey(P) : P -> P+1 = 4G
P = NextKey(P) : P -> P+1 = 5G

and so on

If you use 0000000000000000000000000000000000000000000000000000000000000002

as starting point it should work.

You can't compute G+G = 2G with NextKey, what is not clear?

Quote from: 3dmlib on December 22, 2023, 09:20:26 AM

Point Secp256K1::NextKey(Point &key) {
  // Input key must be reduced and different from G
  // in order to use AddDirect
  return AddDirect(key,G);
}

You are using the NextKey in the only case you can't (and it is clearly written)

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: WanderingPhilospher on December 25, 2023, 02:10:10 PM

Hard to tell anything without the results of what your program is spitting out.

What happens when you use NextKey? Give exact examples.

// test 1 (WORKING) returns correct public key and BTC address for 0x01 and 0x02 private keys
privateKey.SetBase16((char*)"0000000000000000000000000000000000000000000000000000000000000001");
point = secp->ComputePublicKey(&privateKey);
std::cout << " (" << secp->GetPublicKeyHex(true, point) << ") [" << secp->GetAddress(P2PKH, true, point) << "]" << std::endl;

privateKey.AddOne();

point = secp->ComputePublicKey(&privateKey);
std::cout << " (" << secp->GetPublicKeyHex(true, point) << ") [" << secp->GetAddress(P2PKH, true, point) << "]" << std::endl;

// test 2 (NOT WORKING) returns correct public key and BTC address for 0x01 private key only
privateKey.SetBase16((char*)"0000000000000000000000000000000000000000000000000000000000000001");
point = secp->ComputePublicKey(&privateKey);
std::cout << " (" << secp->GetPublicKeyHex(true, point) << ") [" << secp->GetAddress(P2PKH, true, point) << "]" << std::endl;

point = secp->NextKey(point);

std::cout << " (" << secp->GetPublicKeyHex(true, point) << ") [" << secp->GetAddress(P2PKH, true, point) << "]" << std::endl;

// test 3 as arulbero suggested (NOT WORKING) returns correct public key and BTC address for 0x01 private key only
privateKey.SetBase16((char*)"0000000000000000000000000000000000000000000000000000000000000001");
Point P(secp->ComputePublicKey(&privateKey));
std::cout << " (" << secp->GetPublicKeyHex(true, secp->ComputePublicKey(&privateKey)) << ") [" << secp->GetAddress(P2PKH, true, secp->ComputePublicKey(&privateKey)) << "]" << std::endl;

P = secp->NextKey(P);

std::cout << " (" << secp->GetPublicKeyHex(true, P) << ") [" << secp->GetAddress(P2PKH, true, P) << "]" << std::endl;

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: 3dmlib on December 25, 2023, 02:42:26 PM

Quote from: arulbero on December 25, 2023, 02:15:22 PM

do you set bool reduce = True?

Yes.
I'm using ComputePublicKey() function from vanitysearch project, not from kangaroo project. It do reduction by default at the end of ComputePublicKey() function.
NextKey() function still not working as it should.

Looking at this code:

https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L42C1-L52C4

Code:

  // Compute Generator table
  Point N(G);
  for(int i = 0; i < 32; i++) {
    GTable[i * 256] = N;
    N = DoubleDirect(N);
    for (int j = 1; j < 255; j++) {
      GTable[i * 256 + j] = N;
      N = AddDirect(N, GTable[i * 256]);
    }
    GTable[i * 256 + 255] = N; // Dummy point for check function
  }

I would try this:

Code:

Int* privateKey = "some random private key";

Point P(secp->ComputePublicKey(&privateKey));


while(1) {

      P = NextKey(P)
}

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: arulbero on December 25, 2023, 02:15:22 PM

do you set bool reduce = True?

Yes.
I'm using ComputePublicKey() function from vanitysearch project, not from kangaroo project. It do reduction by default at the end of ComputePublicKey() function.
NextKey() function still not working as it should.

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: 3dmlib on December 25, 2023, 01:52:39 PM

This is NOT working:
point = secp->ComputePublicKey(&privateKey);
while(1)
{
point = secp->NextKey(point);
}

https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L59C7-L59C60

Secp256K1::ComputePublicKey(Int *privKey,bool reduce)

do you set bool reduce = True?

WanderingPhilospher

full member

Activity: 1232

Merit: 242

Shooters Shoot...

Quote from: 3dmlib on December 25, 2023, 01:52:39 PM

Quote from: arulbero on December 25, 2023, 10:00:36 AM

Quote from: 3dmlib on December 22, 2023, 02:16:56 PM

UPD: 25.12.2023
Actually DoubleDirect() also not working. It starting to calculate wrong keys after some interval. Why this is can happen?
The only function work so far for sequential private keys to public keys is ComputePublicKey().

For sequential private keys there is the NextKey function.

DoubleDirect does, as the name indicates, only the double of P.

Then, if you start from G:

G
P = DoubleDirect(G) : G -> P = 2G
P = NextKey(P) : P -> P+1 = 3G
P = NextKey(P) : P -> P+1 = 4G
P = NextKey(P) : P -> P+1 = 5G

and so on

But what if I need to start not from G point...

Int* privateKey = "some random private key";
Point* point;

This is working:
point = secp->ComputePublicKey(&privateKey);
while(1)
{
   privateKey.AddOne();
   point = secp->ComputePublicKey(&privateKey);
}

This is NOT working:
point = secp->ComputePublicKey(&privateKey);
while(1)
{
   point = secp->NextKey(point);
}

Hard to tell anything without the results of what your program is spitting out.

What happens when you use NextKey? Give exact examples.

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: arulbero on December 25, 2023, 10:00:36 AM

Quote from: 3dmlib on December 22, 2023, 02:16:56 PM

UPD: 25.12.2023
Actually DoubleDirect() also not working. It starting to calculate wrong keys after some interval. Why this is can happen?
The only function work so far for sequential private keys to public keys is ComputePublicKey().

For sequential private keys there is the NextKey function.

DoubleDirect does, as the name indicates, only the double of P.

Then, if you start from G:

G
P = DoubleDirect(G) : G -> P = 2G
P = NextKey(P) : P -> P+1 = 3G
P = NextKey(P) : P -> P+1 = 4G
P = NextKey(P) : P -> P+1 = 5G

and so on

But what if I need to start not from G point...

Int* privateKey = "some random private key";
Point* point;

This is working:
point = secp->ComputePublicKey(&privateKey);
while(1)
{
   privateKey.AddOne();
   point = secp->ComputePublicKey(&privateKey);
}

This is NOT working:
point = secp->ComputePublicKey(&privateKey);
while(1)
{
   point = secp->NextKey(point);
}

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: 3dmlib on December 22, 2023, 02:16:56 PM

UPD: 25.12.2023
Actually DoubleDirect() also not working. It starting to calculate wrong keys after some interval. Why this is can happen?
The only function work so far for sequential private keys to public keys is ComputePublicKey().

For sequential private keys there is the NextKey function.

DoubleDirect does, as the name indicates, only the double of P.

Then, if you start from G:

G
P = DoubleDirect(G) : G -> P = 2G
P = NextKey(P) : P -> P+1 = 3G
P = NextKey(P) : P -> P+1 = 4G
P = NextKey(P) : P -> P+1 = 5G

and so on

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: arulbero on December 22, 2023, 11:36:37 AM

you need instead the DoubleDirect function:
https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L438

Thanks.
DoubleDirect() works, but it only about 10% faster, than do entire private key to public key calculation with ComputePublicKey() function.
Is any faster way possible to do this 'increment private key by 1 and get public key' method?

UPD: 25.12.2023
Actually DoubleDirect() also not working. It starting to calculate wrong keys after some interval. Why this is can happen?
The only function work so far for sequential private keys to public keys is ComputePublicKey().

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: 3dmlib on December 22, 2023, 09:20:26 AM

What is reduced public key?

A public key is a point of the elliptic curve. A point can be represented by 3 coordinates (X,Y,Z) (projective coordinates) or by 2 coordinates (x,y) (affine coordinates).

The passage from projective to affine coordinates is the "reduction"

x = X/Z
y = Y/Z

https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/Point.cpp#L64

Quote from: 3dmlib on December 22, 2023, 09:20:26 AM

Point Secp256K1::NextKey(Point &key) {
  // Input key must be reduced and different from G
  // in order to use AddDirect
  return AddDirect(key,G);
}

Is anybody can explain me why this function in secp256k1.cpp file not work correctly?
As I understand it should return next public key like as increment private key by 1 and compute public key from it.

The addition between 2 points in projective coordinates is performed by the function Add:

https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L369

and it is faster, because it avoids the inverse of x;

the addition between the same 2 points in affine coordinates is performed instead by the function AddDirect:

https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L238.

To compute NextKey function (that calls the AddDirect function), you have:

a) to use a point in affine coordinates (because the AddDirect function needs affine coordinates)
b) to avoid the case P = G, because to perform G+G (in the general P + P ) the AddDirect function doesn't work

you need instead the DoubleDirect function:
https://github.com/JeanLucPons/Kangaroo/blob/354bb80491752262eaca3613557e1bd306b5414d/SECPK1/SECP256K1.cpp#L438

3dmlib

jr. member

Activity: 44

Merit: 2

Hello.

Point Secp256K1::NextKey(Point &key) {
  // Input key must be reduced and different from G
  // in order to use AddDirect
  return AddDirect(key,G);
}

Is anybody can explain me why this function in secp256k1.cpp file not work correctly?
As I understand it should return next public key like as increment private key by 1 and compute public key from it.
What is reduced public key?

Thanks.

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: ABCbits on December 12, 2023, 03:43:07 AM

Quote from: 3dmlib on December 11, 2023, 08:53:58 AM

Quote from: ABCbits on December 11, 2023, 06:12:21 AM

--snip--

Yes. I tried all possible grid size combinations. Also, I tried to compile on latest cuda version using more recent compute_89 shader model. Same result. The maximum I have on 4090 is about 2500 MK/s. On 3090 is 2200 MK/s. 4090 should be faster. Looks like Kangaroo need some optimization for large number of cores 4090 have. Any ideas where this optimization can be made? Thanks.

I see. While i have no idea which optimization can be done, you might want to try fork of this software or implementation written by different people. Here are few example,
https://github.com/ZenulAbidin/Kangaroo-256
https://github.com/PawelGorny/Kangaroo
https://github.com/secoc/Pollard-Rho-kangaroo

P.S. i haven't tried any of those.

After some tests I found that memory speed is an issue. It GPU memory bound, not GPU chip. This is why it can't scale well for more GPU cores.

ABCbits

legendary

Activity: 2870

Merit: 7490

Crypto Swap Exchange

Quote from: 3dmlib on December 11, 2023, 08:53:58 AM

Quote from: ABCbits on December 11, 2023, 06:12:21 AM

--snip--

Yes. I tried all possible grid size combinations. Also, I tried to compile on latest cuda version using more recent compute_89 shader model. Same result. The maximum I have on 4090 is about 2500 MK/s. On 3090 is 2200 MK/s. 4090 should be faster. Looks like Kangaroo need some optimization for large number of cores 4090 have. Any ideas where this optimization can be made? Thanks.

I see. While i have no idea which optimization can be done, you might want to try fork of this software or implementation written by different people. Here are few example,
https://github.com/ZenulAbidin/Kangaroo-256
https://github.com/PawelGorny/Kangaroo
https://github.com/secoc/Pollard-Rho-kangaroo

P.S. i haven't tried any of those.

3dmlib

jr. member

Activity: 44

Merit: 2

Quote from: ABCbits on December 11, 2023, 06:12:21 AM

Quote from: 3dmlib on December 11, 2023, 05:08:43 AM

Hello. What parameters or something to change in the code to speed-up this program on 4090 card? It only 1.2 times faster than on 3090 card, but it should be about 1.9 times faster. Thanks.

After searching for all post in this thread[1] and github repository[2], it looks like you're first person who mention "4090" or any RTX 4000 series. And while it's obvious, have you tried all guidance on github repository. For example,

Quote from: https://github.com/JeanLucPons/Kangaroo

-g g1x,g1y,g2x,g2y,...: Specify GPU(s) kernel gridsize, default is 2*(MP),2*(Core/MP)

Quote from: https://github.com/JeanLucPons/Kangaroo

Powerfull GPUs with large number of cores won't be very efficient on small range, you can try to decrease the grid size in order to have less kangaroos but the GPU performance may not be optimal

[1] https://ninjastic.space/search?content=4090&topic_id=5244940
[2] https://github.com/search?q=repo%3AJeanLucPons%2FKangaroo%204090&type=issues

Yes. I tried all possible grid size combinations. Also, I tried to compile on latest cuda version using more recent compute_89 shader model. Same result. The maximum I have on 4090 is about 2500 MK/s. On 3090 is 2200 MK/s. 4090 should be faster. Looks like Kangaroo need some optimization for large number of cores 4090 have. Any ideas where this optimization can be made? Thanks.

Topic: Pollard's kangaroo ECDLP solver - page 9. (Read 60582 times)