Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or has it been proven that only 2 steps are sufficient (I doubt it)?
I could brag that I managed to write, from scratch, a completely working Kangaroo that currently runs 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because, for one, I don't need to prove anything to anyone except myself, and secondly, it's really OK if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that never reaches more than 200 Mops/s with JLP's app. BTW, the stats in JLP's program are biased; the real speed is slower than the one printed on the screen (there's some nonsense speed-smoothing computation in there, and the computed time durations are bugged between the worker threads and the main thread). Real speed is 10% slower due to these bugs. Whatever.
It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!
Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.
So, why not contribute to the community and share your impressive work?
This coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for some tools to help me recover a wallet.dat password from 2016.
It's not about "believing", and I truly don't care on making an impression. I already indicated multiple times what the bottlenecks are:
- use of GPU global memory, caused by per-thread local stacks, by the chosen GPU group size, and by several places where a stack isn't needed at all. This is the #1 slowdown (see the sketch below).
- parts of the algorithm can be merged together, and other parts are really just textbook methods, not optimized for CUDA hardware and the parallel paradigm.
Using registers only, a smaller group size, actually using the fast on-chip shared memory for inter-thread calculations, and thinking harder about what the algorithm implies when run on a massive number of threads do make an implementation run 6x faster. This is not simply a set of optimizations on top of JLP's Kangaroo; I am talking about a completely from-scratch, well-thought-out system here. It's very nice that this software was used to break the 115-bit puzzle, but I will again have my suspicions that it was used for 120 or 125. It is just not up to that level of competence, from a design perspective.
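To illustrate the stack point, here is a hedged sketch (not JLP's code; kernel names, the 128-thread group size and the toy arithmetic are all mine): a per-thread array indexed in a data-dependent way typically gets placed by the compiler in "local" memory, which physically lives in device DRAM, while the rewritten version keeps the running value in a register and routes the only inter-thread step through on-chip shared memory.
[code]
#include <cstdio>
#include <cuda_runtime.h>

#define GROUP 128   // a smaller thread-group size, as argued above

// Per-thread stack with data-dependent indexing: the compiler typically has
// to place the array in "local" memory, which physically lives in device DRAM.
__global__ void jumps_with_stack(const unsigned long long *in,
                                 unsigned long long *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned long long v = in[tid], stack[64];
    int sp = 0;
    for (int i = 0; i < 64; ++i)
        if ((v >> i) & 1) stack[sp++] = v * (i + 1);   // push: off-chip traffic
    unsigned long long acc = 0;
    while (sp > 0) acc ^= stack[--sp];                  // pop:  off-chip traffic
    out[tid] = acc;
}

// Same per-thread work with no stack at all: the accumulator stays in a
// register, and the example cross-thread step goes through shared memory.
__global__ void jumps_no_stack(const unsigned long long *in,
                               unsigned long long *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned long long v = in[tid], acc = 0;
    for (int i = 0; i < 64; ++i)
        if ((v >> i) & 1) acc ^= v * (i + 1);           // XOR order doesn't matter

    __shared__ unsigned long long tile[GROUP];          // on-chip, per block
    tile[threadIdx.x] = acc;
    __syncthreads();
    out[tid] = acc ^ tile[threadIdx.x ^ 1];             // combine with a neighbour,
                                                        // no DRAM round trip
}

int main()
{
    const int n = 1 << 20;   // multiple of GROUP, so no thread exits before __syncthreads()
    unsigned long long *in, *out;
    cudaMallocManaged(&in,  n * sizeof *in);
    cudaMallocManaged(&out, n * sizeof *out);
    for (int i = 0; i < n; ++i) in[i] = 0x9E3779B97F4A7C15ULL * (i + 1);

    jumps_with_stack<<<n / GROUP, GROUP>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("with stack : out[123] = %llu\n", out[123]);

    jumps_no_stack<<<n / GROUP, GROUP>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("no stack   : out[123] = %llu\n", out[123]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
[/code]
Compiling with -Xptxas -v should show the first kernel needing per-thread local/stack storage while the second needs none; that difference is what I mean by staying on-chip.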
Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or has it been proven that only 2 steps are sufficient (I doubt it)?
Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2 folding reduction steps). However, according to page 15 of this research paper:
https://eprint.iacr.org/2021/1151.pdf under the analysis of "Sloppy reduction", it is proven that unless a correctness check is performed, a 3rd reduction step is needed.
Otherwise, the modular reduction has a chance (around one in 2^193) of producing an incorrect result, because:
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R² − 1], the probability of an incorrect result is about r(r − 1) / (2N)
There is a potential carry after the 2nd reduction that is simply assumed to never happen, and no one can explain why that assumption is safe.
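To make the spot concrete, here is a hedged, scaled-down sketch (my own toy parameters and names, not JLP's code): the modulus is shrunk from 2^256 − 2^32 − 977 to a 64-bit value of the same shape, so the leftover carry after the second fold actually shows up often instead of roughly once in 2^193 products; the folding structure itself is the same.
[code]
#include <cstdint>
#include <cstdio>
#include <random>

typedef unsigned __int128 u128;

// Toy modulus p = 2^64 - C with the same shape as secp256k1's
// P = 2^256 - 2^32 - 977, shrunk so the leftover carry is easy to observe.
static const uint64_t C = (1ULL << 32) + 977;
static const uint64_t P = (uint64_t)0 - C;       // wraps around to 2^64 - C

// One folding step: hi*2^64 + lo  ≡  lo + hi*C  (mod p)
static u128 fold(u128 x)
{
    return (u128)(uint64_t)x + (u128)(uint64_t)(x >> 64) * C;
}

int main()
{
    std::mt19937_64 rng(1);
    long long carries = 0, trials = 1000000;

    for (long long i = 0; i < trials; ++i) {
        uint64_t a = rng() % P, b = rng() % P;

        u128 x2 = fold(fold((u128)a * b));       // the "sloppy" 2-fold reduction
        if (x2 >> 64) ++carries;                 // carry left over: a fixed-width
                                                 // implementation would drop it here

        u128 x3 = fold(x2);                      // 3rd fold handles the carry
        while (x3 >= P) x3 -= P;                 // final conditional subtraction
        if ((uint64_t)x3 != (uint64_t)(((u128)a * b) % P)) {
            printf("mismatch\n");
            return 1;
        }
    }
    printf("carry still present after 2 folds: %lld of %lld products\n",
           carries, trials);
    return 0;
}
[/code]
So either the dropped carry has to be proven impossible for the operands the code actually feeds in, or a 3rd fold / final check like the one above is required.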