
Topic: Pollard's kangaroo ECDLP solver - page 4. (Read 60037 times)

member
Activity: 503
Merit: 38
June 28, 2024, 05:39:04 PM
This is coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for tools to help me recover a wallet.dat password from 2016.

Look, I'm old enough to remember when floppy disks were cutting-edge technology. Back in my day, we didn't have these fancy-schmancy high-resolution screens, and I didn't always need glasses to tell the difference between people. So forgive me if I mix up faces or names now and then. At this rate, I'll be mistaking my cat for the vacuum cleaner  Grin
member
Activity: 165
Merit: 26
June 28, 2024, 05:16:10 PM
Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing; otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write, from scratch, a completely working Kangaroo that currently runs 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because, for one, I don't need to prove anything to anyone except myself, and secondly, it's really OK if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats in JLP's program are biased; the real speed is slower than the one printed on the screen (there's some nonsense speed-smoothing computation in there, and the computed time durations are bugged between the worker threads and the main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin

This is coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for tools to help me recover a wallet.dat password from 2016.

It's not about "believing", and I truly don't care about making an impression. I have already indicated multiple times what the bottlenecks are:

- use of GPU global memory for per-thread local stacks, forced by the GPU group size, in multiple places where you don't even need a stack at all. This is the #1 slowdown.
- parts of the algorithm can be merged together; other parts are really just textbook methods, unoptimized for CUDA hardware and the parallel paradigm.

Using registers only, a smaller group size, actually using the fast on-chip shared memory for inter-thread calculations, and thinking harder about what the algorithm implies when run on a massive number of threads do make an implementation run 6x faster. These are not simply some optimizations on top of JLP's Kangaroo; I am talking about a completely from-scratch, well-thought-out system here. It's very nice that this software was used to break the 115-bit puzzle, but I will again have my suspicions that it was used for 120 or 125. It is just not up to that level of competence, from a design perspective.

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing; otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (two folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction", it is shown that unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) of being incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a potential carry after the 2nd reduction step that is assumed to never happen, and no one can explain why.
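To make the concern concrete, here is a minimal Python sketch of the folding idea (my own toy model of the technique discussed in the paper, not JLP's actual code; the constants are the real secp256k1 ones):

Code:
# Toy model of 2-fold "sloppy" reduction for secp256k1, where
# p = 2^256 - c with c = 2^32 + 977, so 2^256 == c (mod p).
P = 2**256 - 2**32 - 977
C = 2**32 + 977
MASK = 2**256 - 1

def fold(t):
    # Rewrite t = hi*2^256 + lo as lo + hi*c, which is congruent mod p.
    return (t & MASK) + (t >> 256) * C

def sloppy_mul(a, b):
    t = fold(fold(a * b))   # after two folds t can still slightly exceed
    return t & MASK         # 2^256, and that carry is silently dropped

def strict_mul(a, b):
    t = fold(fold(a * b))
    t = fold(t)             # the "missing" 3rd step absorbs the rare carry
    while t >= P:           # final conditional subtraction
        t -= P
    return t

strict_mul always returns the canonical residue. sloppy_mul returns a value congruent mod P as well, except in the rare case where the dropped carry makes it incongruent, which is exactly the roughly one-in-2**193 failure probability quoted above; hence the paper's requirement of either a 3rd step or an explicit correctness check.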
member
Activity: 503
Merit: 38
June 28, 2024, 08:12:33 AM
Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing; otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write, from scratch, a completely working Kangaroo that currently runs 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because, for one, I don't need to prove anything to anyone except myself, and secondly, it's really OK if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats in JLP's program are biased; the real speed is slower than the one printed on the screen (there's some nonsense speed-smoothing computation in there, and the computed time durations are bugged between the worker threads and the main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin
newbie
Activity: 6
Merit: 0
June 27, 2024, 05:14:45 PM
I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially, i.e., key1, key2, key3. It does not search for key2 before key1 is found. Likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. Say we have created 500 pubkeys. Is it possible to search all of them at the same time? Is there anyone working on this? Can you share?

You would have to run on 500 different machines if you wanted to execute the search in parallel with the same amount of power, depending on the bit range. Think of it like this: say you wanted to search 500 pubkeys in the 125-bit range all at the same time; the computing power would be spread too thin. It would be more efficient to just run a single-pubkey search on 500 different machines concurrently, even though it sounds crazy. I think this is where FUTURE quantum computers would come in. Now, if you wanted to search for 500 pubkeys in, say, the 50-bit range, with a modern GPU you could do it relatively quickly on 1 computer.
newbie
Activity: 8
Merit: 0
June 27, 2024, 02:50:33 PM
I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially, i.e., key1, key2, key3. It does not search for key2 before key1 is found. Likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. Say we have created 500 pubkeys. Is it possible to search all of them at the same time? Is there anyone working on this? Can you share?

It is like getting into a taxi and asking the driver to go in multiple directions at the same time.
jr. member
Activity: 65
Merit: 1
34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ
June 27, 2024, 08:53:30 AM
I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially, i.e., key1, key2, key3. It does not search for key2 before key1 is found. Likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. Say we have created 500 pubkeys. Is it possible to search all of them at the same time? Is there anyone working on this? Can you share?
member
Activity: 165
Merit: 26
June 08, 2024, 04:50:47 AM
Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing; otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?
member
Activity: 873
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 08, 2024, 02:03:45 AM
Does anyone know if pubkeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college, and I remember reading about this, but I can't find the post anymore. A reward is possible for verified files.

You can subtract each offset from 0 to 2^20 from the pubkey and then divide by 2^20; you get 2^20 pubkeys, and one of them will correspond to a ~2^100 key, if you are talking about a 2^120 pubkey.
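For clarity, here is that trick sketched at the scalar level in Python (a toy model: real code would compute Q_i = (P - i*G) * inv(2^20) on curve points, but since the group is cyclic of order n, arithmetic mod n shows the same thing; the example key is made up):

Code:
# Scalar-level sketch of the subtract-and-divide trick. Real code does
# Q_i = (P - i*G) * inv(2^20 mod n) on points; scalars model it exactly.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
S = 2**20
S_INV = pow(S, -1, N)            # modular inverse (Python 3.8+)

k = (1 << 120) | 0xABCDEF        # made-up ~2^120 private key
i = k % S                        # the one offset (of 2^20 tried) that works
candidate = ((k - i) * S_INV) % N
assert candidate == k >> 20      # ~2^100 key, searchable by Kangaroo

Only one of the 2^20 candidates actually lands in the smaller range, so you trade a 2^20-times narrower interval for 2^20 pubkeys to search; whether that helps depends on how efficiently many targets can be searched at once.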
newbie
Activity: 38
Merit: 0
June 07, 2024, 11:49:39 PM
Are there any places where save files are shared for puzzle #130, either using JeanLuc's kangaroo or Etar's kangaroo?
jr. member
Activity: 46
Merit: 29
May 30, 2024, 10:54:30 AM
I'm having a hard time understanding how distinguished points (DPs) work. From what I know, you only save the start point, trail length, and DP per trail, and you start a new trail when a DP is found. You repeat this until you find a colliding DP and somehow find the "actual" collision of the pubkey.

I don't understand how finding and storing only DPs would lead to finding the pubkey collision. I fully understand how the kangaroo method works by creating 2 maps to find a collision using the birthday paradox, but I'm not sure how you can do this without storing the whole set of points.
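The piece that usually resolves this confusion: the jump function is deterministic, i.e., the next jump depends only on the current point. So once a tame and a wild kangaroo land on the same point anywhere along their trails, they follow the exact same path from then on and will both step onto the same next distinguished point. Storing only DPs (plus each trail's distance) is therefore enough to detect a collision that happened somewhere earlier. A toy Python sketch on plain integers (made-up jump table and DP mask, no EC math):

Code:
# Toy model of distinguished points: deterministic walks on integers.
JUMPS = [1, 3, 7, 15, 31, 63, 127, 255]  # made-up jump table
DP_MASK = (1 << 8) - 1                   # "distinguished" = low 8 bits zero

def next_dp(x):
    dist = 0
    for _ in range(1 << 14):             # cap trail length, like real solvers
        if x & DP_MASK == 0:
            return x, dist
        j = JUMPS[x % len(JUMPS)]        # jump depends only on the position
        x, dist = x + j, dist + j
    return None, dist                    # abandon an overlong trail

seen = {}                                # DP -> (start, distance)
for start in range(1, 20000, 97):        # many independent trails
    dp, dist = next_dp(start)
    if dp is None:
        continue
    if dp in seen:
        # The two trails merged somewhere before this DP. In Kangaroo, the
        # stored distances of a tame/wild pair then yield the private key.
        print("collision at DP", dp, ":", seen[dp], "and", (start, dist))
        break
    seen[dp] = (start, dist)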
newbie
Activity: 9
Merit: 5
May 15, 2024, 05:23:51 AM
Does anyone know if pubkeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college, and I remember reading about this, but I can't find the post anymore. A reward is possible for verified files.
newbie
Activity: 9
Merit: 0
May 02, 2024, 05:44:40 PM
It is possible to reduce the complexity of the classic kangaroo algorithm a little bit by spreading the starting kangaroos in a non-uniform manner.
Roughly speaking, if the private key lies in certain areas it will be found faster. As a counterpart, if it lies in certain other areas it will be found slower.
But on average, considering the whole range, the complexity is well reduced.
If you read and understand carefully the reference on the github page, you will find the trick Wink



First of all, welcome back master @Jean_Luc .. so happy to see your post.

You can't compare your deep knowledge to others', as you developed those tools and have deep knowledge of the math used.

I consider myself good at programming (I have many years of experience in C++, but I'm very new to GPUs) .. I have a good math background, but that part, "elliptic curve cryptography", is another world.

So it's not easy .. yes, I have downloaded your source code, but it's not easy at all to understand the magic happening inside.

I am learning a lot, and I must say many thanks to @WanderingPhilospher, as he is a big help answering my doubts ...


I don't know if you do stuff on YouTube, but I can imagine that some simple videos explaining some of those aspects would be gorgeous .. maybe I wish for too much ..


Regards
newbie
Activity: 2
Merit: 0
April 23, 2024, 03:39:10 PM
...

I don't think anybody proficient enough for such a port would encounter your issue, because they would take care of the differences between Unix/Linux and Windows. On Linux, almost everything is a file, and /dev/urandom is the non-blocking random number generator device (file). Reading from this file gives you non-blocking random numbers.

You would have to port everything that's specific to Linux to the way it's done on Windows. Are you sure you have the knowledge to do this? Based on your question, I highly doubt it.

I solved it and it's working now with VS2022 and CUDA 12.1. I made some mistakes in the linking.

My question was a little bit short, sorry
hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
April 23, 2024, 02:32:53 PM
...

I don't think anybody proficient enough for such a port would encounter your issue, because they would take care of the differences between Unix/Linux and Windows. On Linux, almost everything is a file, and /dev/urandom is the non-blocking random number generator device (file). Reading from this file gives you non-blocking random numbers.

You would have to port everything that's specific to Linux to the way it's done on Windows. Are you sure you have the knowledge to do this? Based on your question, I highly doubt it.
newbie
Activity: 2
Merit: 0
April 22, 2024, 04:43:43 PM
Hi!

I migrated the Kangaroo repo to VS2022 (Windows), but I got an issue:
"Failed to open /dev/urandom No such file or directory"
Has anyone encountered this problem?
jr. member
Activity: 50
Merit: 7
April 19, 2024, 11:01:36 PM
You talk about crashing SSDs; more evidence that you don't know about this program or how it works. If an SSD can't handle being written to 12 times a day, then what's the point of even owning one, lol. Yes, with this program you control what is saved to your hard drive and how often DPs are written.

Now you claim 130 with DP 32 is magnitudes more than 115 with DP 25, which further shows you don't know how kangaroo or this program works. Magnitudes? lol.

Maybe look at the code of the program you are commenting on, before commenting on it.
I believe you are the one who should look into the source code before making any other statements about it. Maybe check the hash lookup part; you may be surprised...

Let's say you wait 1 year to fill up your DP cache (in RAM) and then decide to write it to disk. Do you think you will have to write the same amount of data you have in RAM? Before answering, maybe create a simple C program that maps a 2 TB file to memory and writes a single byte at offset 10 GB, for example. No seeks, no nothing. I don't think you get the actual difference between offline storage and accessing random addresses of volatile memory, which is O(1). This is why it's called "random access" indexing: any byte you want to read gets read in constant time.
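For anyone curious, the suggested experiment is a few lines with Python's mmap module instead of C (filename and sizes are illustrative; on Linux filesystems like ext4/xfs the truncated file stays sparse):

Code:
import mmap

SIZE = 2 * 1024**4                  # 2 TB logical size
with open("bigfile.bin", "wb") as f:
    f.truncate(SIZE)                # instant: a sparse file, no data written

with open("bigfile.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)   # map the whole file
    mm[10 * 1024**3] = 0xFF         # touch one byte at offset 10 GB:
    mm.flush()                      # a single page goes to disk, not 10 GB
    mm.close()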

You are right, though, about the storage size difference: it's just 2**(0.5) between #115 with DP 25 and #130 with DP 32.
I cannot tell you how many actual storage bytes were required, because I don't care how the program stored them, just their count.

Quote
This program also solved #115 in 13 days (114 bit key on the Secp256K1 field). It required 2**58.36 group operations using DP25 to complete.
2**58.36 operations = 57.36 bits of jumps for each kang set = 2**(57.36 - 25) stored jumps each, so storage space was around 2*2**32.36 items for the complete merged "central table" / hashmap. Which is almost perfectly equal to my estimates.
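A quick sanity check of that arithmetic in Python (taking the quoted 2**58.36 figure at face value):

Code:
import math

ops_total = 2 ** 58.36         # group operations reported for puzzle #115
ops_per_herd = ops_total / 2   # split between the tame and wild herds
dp_bits = 25                   # DP25: about 1 in 2^25 points is stored
stored = 2 * (ops_per_herd / 2 ** dp_bits)
print("stored DPs ~ 2^%.2f" % math.log2(stored))   # ~2^33.36 items total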
I fully understand the code and how this program works. All aspects.

115 was solved with a little more than 300GB of, wait, checks notes, “files”.

That was your first rant, exabytes lol.

I hope no SSDs were lost solving 110 and 115 with this program. RIP SSDs and all of your exabytes that you stored. 😁

RIP my SSDs on the distributed Kangaroo project xD
member
Activity: 63
Merit: 14
April 19, 2024, 07:36:18 PM
When we restore a search from a saved work file, it only shows the first key in the input file after the end range, so what happens to the other keys we input for the search?

And if we have a speed of 1000 MK/s, it would be divided by the number of keys, right? Meaning half the speed is used to search for the second key if we input 2 keys? If that is the case, then the expected time should double if there is more than 1 key.
Thus having 4 keys to search for a 130-bit key could take up to 2**68 operations, correct?

This program cannot search for multiple keys simultaneously.

It will go over each key in the input file, one at a time.

If a solution is found, it then goes on to the next.

Someone correct me if I'm wrong, but otherwise I think you need to use the -m parameter to give up the search and start on the next key, in a scenario where the solution could not be found within the given range.

jr. member
Activity: 50
Merit: 3
April 19, 2024, 09:45:09 AM
When we restore a search from a saved work file, it only shows the first key in the input file after the end range, so what happens to the other keys we input for the search?

And if we have a speed of 1000 MK/s, it would be divided by the number of keys, right? Meaning half the speed is used to search for the second key if we input 2 keys? If that is the case, then the expected time should double if there is more than 1 key.
Thus having 4 keys to search for a 130-bit key could take up to 2**68 operations, correct?
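Under the usual heuristic that Kangaroo needs about 2*sqrt(W) group operations for an interval of width W, the 2**68 figure above checks out as a ballpark; a quick Python sketch (assuming a 2^129-wide #130 interval, and sequential searching so the work adds up linearly):

Code:
import math

W = 2 ** 129                 # width of the #130 range [2^129, 2^130)
per_key = 2 * math.sqrt(W)   # ~2^65.5 expected group operations per key
four_keys = 4 * per_key      # sequential search of 4 keys quadruples it
print("one key ~ 2^%.1f, four keys ~ 2^%.1f"
      % (math.log2(per_key), math.log2(four_keys)))   # ~2^65.5 and ~2^67.5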
full member
Activity: 1232
Merit: 242
Shooters Shoot...
April 10, 2024, 11:13:05 PM
Even if, let's say, #130 is barely possible by stretching limits and a lot of hashpower, we should certainly start thinking of ideas for improving further if we really have to attempt #135 and beyond.
I agree with you. I am currently searching for #130, but how can we improve any of the existing tools? Are we talking about creating mathematical shortcuts such as division/subtraction, making the current programs faster, or both lol?!

I am definitely interested in all possibilities.

Also, I am a fan of your python coding, 10000%; really great job(s) on your github repos!
member
Activity: 503
Merit: 38
April 10, 2024, 01:55:09 PM
There is a new NVIDIA GPU that was announced, the GB200, which is vastly more powerful than the current GPUs.

You need a GB200 NVL72 rack to solve 130. Go find out how much that costs.  Grin

I meant when they become available through data centers and gpu servers.

I think there will be a long waiting line.