Author

Topic: 5-7 kangaroo method (Read 967 times)

sr. member
Activity: 1579
Merit: 267
October 21, 2024, 06:28:41 AM
#44
My point is clear: even a single kangaroo must be fast. If your kangs jump 46K per seconds each - you are on the right way Smiley
Though usually with this approach you will suffer from slow inversion.

thx. Well said, usually - since you basically solved 120 single-handed, I'm sure we both know this isn't the case, unless it's coded by a 1st year college student (or forked from github, which may be even worse). Or is it? Let's leave this as a mistery at least for a while, please, probably less than 3 guys here may understand it. You seem to know a few things, mister...

You can call me Mr. Hot.
?
Activity: -
Merit: -
January 04, 2025, 03:59:39 PM
#40
I feel like you're talking nonsense and I can't understand a single word.
newbie
Activity: 9
Merit: 0
December 29, 2024, 03:11:29 AM
#39
LLE. Damn, you made me go back to the drawing board, just thinking on what you might have used to get to such speeds. For some of the things I came up with as explanation, there's literally just a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that keep useless information. There's no other way I see, since I already got 100% compute throughput in the CUDA profiler but I'm nowhere near your speed. Great, now I also have a headache.

Don't work so hard, there is no reason for that Smiley
One more tip for you. As far as I remember there is one more problem in original JLP GPU code: every GPU thread processes A LOT of kangs. It will cause an issue with DPs for high-number puzzles like #130. It's obvious but it seems nobody understands it. Don't you see it too?

I remember that JLP kangaroo only support less than 125bit interval search as the output dist is only 32*4 bits?
sr. member
Activity: 1579
Merit: 267
October 21, 2024, 06:26:57 AM
#38
That dude expands the math as you gets hooked.

My point is clear: even a single kangaroo must be fast. If your kangs jump 46K per seconds each - you are on the right way Smiley
Though usually with this approach you will suffer from slow inversion.

thx. Well said, usually - since you basically solved 120 single-handed, I'm sure we both know this isn't the case, unless it's coded by a 1st year college student (or forked from github, which may be even worse). Or is it? Let's leave this as a mistery at least for a while, please, probably less than 3 guys here may understand it. You seem to know a few things, mister...

You can call me Mr. Hot.
member
Activity: 165
Merit: 26
October 20, 2024, 03:51:40 PM
#37
My point is clear: even a single kangaroo must be fast. If your kangs jump 46K per seconds each - you are on the right way Smiley
Though usually with this approach you will suffer from slow inversion.

thx. Well said, usually - since you basically solved 120 single-handed, I'm sure we both know this isn't the case, unless it's coded by a 1st year college student (or forked from github, which may be even worse). Or is it? Let's leave this as a mistery at least for a while, please, probably less than 3 guys here may understand it. You seem to know a few things, mister...
?
Activity: -
Merit: -
October 20, 2024, 02:23:40 PM
#36
Yeah, but still, what's your point? That JLP had it wrong? I already knew this!

My point is clear: even a single kangaroo must be fast. If your kangs jump 46K per seconds each - you are on the right way Smiley
Though usually with this approach you will suffer from slow inversion.

What matters is the effective total speed, not the speed of each individual kangaroo.

If each kangaroo jumps only 180 times a sec - it really matters  Grin
member
Activity: 110
Merit: 61
October 20, 2024, 12:57:44 PM
#35
So for DP=32 every kang finds a DP every 270 days (on average).
Correct?
Now the problem is obvious, isn't it?

Yes, such a problem does exist. The kangaroos still find the DP, because of their huge amounts, but the paths are too short and the first DP found by single kangaroo is always much worse than the subsequent ones.

As for me, I launched JLP Kangaroo with specially modified grid size, group size and other slight improvements, lets call it "fast kangaroos". But I personally know programming in CUDA quite poorly, so I couldn't write a good kernel for these conditions, and I couldn't achieve the necessary speed increase so that, so to speak, quality worked, not quantity.


?
Activity: -
Merit: -
October 20, 2024, 11:44:56 AM
#34
Yeah, sorry, forgot I had changed the group size to 512, though it only impacts the initial time to compute the kangaroos, not the kernel performance. Some people here claim to get to 4 Gk/s on a RTX 4090, I did not investigate too much what the best grid size would actually be, I gave up trying as soon as the GPU -check flag was filling the stdout with "warning: xxxxx DPs were lost" and "item not found"...

Probably they use even larger grid for 4GH...
So if we have 16.7M kangs at 3GH it means that each kang jumps 180 times per sec, it's 15.5M per day.
For #130 you have to use DP=30 at least, or better 32 (especially for #135).
So for DP=32 every kang finds a DP every 270 days (on average).
Correct?
Now the problem is obvious, isn't it?
member
Activity: 165
Merit: 26
October 20, 2024, 11:33:45 AM
#33
No, I don't mean the speed itself, the problem is in DPs.
To demonstrate it, I need to know best speed parameters for 4090 with JLP's app.
When I start Kangaroo.exe with default parameters it shows "Grid(256x128)" and speed is about 1.8GH. I remember that speed can be better if use bigger grid, but don't remember best values. I think nobody uses it with default grid size if there are better values.
If you know them, I will explain the problem with exact numbers.

Optimal grid size in JLP's implementation is a hide and seek game. That's because theoretically optimal size results in much degraded slowdown. There's also a huge amount of kangaroos computed first (gridSize x 512) and you can end up with resource overload.

Since his kernel compiles to 96 registers (ccap 8.9 for RTX 4090) maybe try a grid like (128 * 2) x 512 so each SM fits 2 blocks each of 512 threads. That's close to 70 million kangaroos.

What problem was in DPs, except for the fact that there's all the chances in the world they might be too many of them and get lost?

So 128*2 * 512 = 131K threads and each thread processes a group of 128 kangs, right? It's 16.8M kangs, correct? And speed is about 3GH?

Yeah, sorry, forgot I had changed the group size to 512, though it only impacts the initial time to compute the kangaroos, not the kernel performance. Some people here claim to get to 4 Gk/s on a RTX 4090, I did not investigate too much what the best grid size would actually be, I gave up trying as soon as the GPU -check flag was filling the stdout with "warning: xxxxx DPs were lost" and "item not found"...
?
Activity: -
Merit: -
October 20, 2024, 11:06:47 AM
#32
No, I don't mean the speed itself, the problem is in DPs.
To demonstrate it, I need to know best speed parameters for 4090 with JLP's app.
When I start Kangaroo.exe with default parameters it shows "Grid(256x128)" and speed is about 1.8GH. I remember that speed can be better if use bigger grid, but don't remember best values. I think nobody uses it with default grid size if there are better values.
If you know them, I will explain the problem with exact numbers.

Optimal grid size in JLP's implementation is a hide and seek game. That's because theoretically optimal size results in much degraded slowdown. There's also a huge amount of kangaroos computed first (gridSize x 512) and you can end up with resource overload.

Since his kernel compiles to 96 registers (ccap 8.9 for RTX 4090) maybe try a grid like (128 * 2) x 512 so each SM fits 2 blocks each of 512 threads. That's close to 70 million kangaroos.

What problem was in DPs, except for the fact that there's all the chances in the world they might be too many of them and get lost?

So 128*2 * 512 = 131K threads and each thread processes a group of 128 kangs, right? It's 16.8M kangs, correct? And speed is about 3GH?
?
Activity: -
Merit: -
October 20, 2024, 09:09:12 AM
#31
No, I don't mean the speed itself, the problem is in DPs.
To demonstrate it, I need to know best speed parameters for 4090 with JLP's app.
When I start Kangaroo.exe with default parameters it shows "Grid(256x128)" and speed is about 1.8GH. I remember that speed can be better if use bigger grid, but don't remember best values. I think nobody uses it with default grid size if there are better values.
If you know them, I will explain the problem with exact numbers.
?
Activity: -
Merit: -
October 20, 2024, 06:02:46 AM
#30
LLE. Damn, you made me go back to the drawing board, just thinking on what you might have used to get to such speeds. For some of the things I came up with as explanation, there's literally just a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that keep useless information. There's no other way I see, since I already got 100% compute throughput in the CUDA profiler but I'm nowhere near your speed. Great, now I also have a headache.

Don't work so hard, there is no reason for that Smiley
One more tip for you. As far as I remember there is one more problem in original JLP GPU code: every GPU thread processes A LOT of kangs. It will cause an issue with DPs for high-number puzzles like #130. It's obvious but it seems nobody understands it. Don't you see it too?
member
Activity: 165
Merit: 26
October 17, 2024, 07:38:11 AM
#29
#120 was pretty easy, just 20pcs of 4090 for six weeks.

Let's say you needed 1.36 sqrt(n) ops, that would mean

Code:
1.36*sqrt(2**119)/(14*86400)/20

op/s per each GPU, which is close to 46 Gk/s

Now, a 4090 can do at max 82 Tflop/s

That's 1782 float32 ops for a single jump.

For a jump you'd need around 6 256-bit muls and 6 additions (if inversions amortized to negligible).

To do a 256-bit mod mul you'd need around 8 int32 * 8 int32 = 64 32-bit multiplications and 56 32-bit additions = 119 operations just for the first step and around another 18 mul + a few adds or so to reduce mod N.
For a single 256-bit addition you'd need around 8 operations (16 to do mod N).

That adds up to around 170 or so ops per 256-bit mul and 16 per add.

Estimated cost: 170 * 6 + 16 * 6 = 1116 ops.

Theoretically, feasible since 1116 < 1782, but are you some kind of a genius or WTF? I didn't even put in the rest of the overhead work required.

LE: ok, I f**ed up. I noticed you said six weeks.

Still, that ends up as 15 Gk/s which is crazy fast.

---

LLE. Damn, you made me go back to the drawing board, just thinking on what you might have used to get to such speeds. For some of the things I came up with as explanation, there's literally just a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that keep useless information. There's no other way I see, since I already got 100% compute throughput in the CUDA profiler but I'm nowhere near your speed. Great, now I also have a headache.
?
Activity: -
Merit: -
October 17, 2024, 07:09:58 AM
#28
Most people here don't have access to/possibility to rent much computing power, so it doesn't matter to them whether the K is 2.0, 1.7 or 1.2. That's why attempts to do some kind of magic begin.  

Yes, most people don't have resources to solve a high-number puzzles, but it does not mean that they should use magic. It's waste of time unless you are real genius who can break EC, but in this case you don't need to solve puzzles, you have better things to do. So people waste their time for nothing, better buy lottery tickets.

How many GPUs did you use to solve 120-130?

#120 was pretty easy, just 20pcs of 4090 for six weeks.

By the way, I was very surprised when I saw how quickly kangaroos get into a primitive two-point cycle.

It's not rocket science: on average every 2*SizeOfListOfJumps jumps.

Problem is on dealing with cycles on a GPU, oh well, let me just say that having even a couple of registers used for something else than performing some parts of some jump of some kangaroo, can decrease the speed by well over 100%. Not to mention the extra required access to memory when the registers are all full (to "deal with the cycles" one way or the other) bringing the speed down even further.

You just don't have enough skills in GPU coding.
member
Activity: 165
Merit: 26
October 17, 2024, 03:58:24 AM
#27
Fruitless cycles is the main problem of using equivalence, but there is few good ways to deal with them. By the way, I was very surprised when I saw how quickly kangaroos get into a primitive two-point cycle.

How does "good ways" look like? Without decreasing the overall speed more than ~35% (that would be the point at which dealing with the cycles has no benefits over not having them at all, e.g going from 1.36 to 1.72).

Of course when you're running on a CPU, dealing with the cycles is almost un-noticeable, so it's worth it, and can indeed bring down the complexity to 1.36 or even lower (1.25 at the best).But this benefit is quickly offset by the much more performant GPU devices.

Problem is on dealing with cycles on a GPU, oh well, let me just say that having even a couple of registers used for something else than performing some parts of some jump of some kangaroo, can decrease the speed by well over 100%. Not to mention the extra required access to memory when the registers are all full (to "deal with the cycles" one way or the other) bringing the speed down even further.

Not sure of RetiredCoder's method, technically if he solved 120 as he said ("crazy fast kangaroos") then he had no cycles to deal with at all. Or he found some clever way to deal with them in a way that doesn't greatly bring a GPU to a halt... but I'm not counting on that, it would be a waste of resources IMHO.
member
Activity: 110
Merit: 61
October 17, 2024, 02:12:30 AM
#26
Such papers is just a way to get a degree, zero new ideas.
And all this is not important actually.
The main problem that you won't get better than K=1.7 this way even if you use 1000 instead of 3-4-5-7 kangs.
K=1.7 is boring, really. It's like solving puzzles #6x  Grin
Why don't you try this? https://www.iacr.org/archive/pkc2010/60560372/60560372.pdf
Going this way (a long way btw) you can "invent" something interesting and get real K (not in theory but real) around 1.2 if you are smart enough Cheesy I call this approach "mirrored" kangs.
But, of course, this way is not so easy as "normal" kangs that always jump in one direction.
Just a tip for you because it seems you read some papers instead of creating magic circles/code as most people here Cheesy
Though I doubt that you will break #135, it's always fun to invent and discover so you can spend a lot of time having fun...

Most people here don't have access to/possibility to rent much computing power, so it doesn't matter to them whether the K is 2.0, 1.7 or 1.2. That's why attempts to do some kind of magic begin. How many GPUs did you use to solve 120-130?

Fruitless cycles is the main problem of using equivalence, but there is few good ways to deal with them. By the way, I was very surprised when I saw how quickly kangaroos get into a primitive two-point cycle.

Also, thanks for sharing 120 PK.
member
Activity: 165
Merit: 26
October 11, 2024, 04:29:54 AM
#25
John M. Pollard: “Randomness in these algorithms is crucial for their functioning.”

Impress me a little. First, where is that citation?

Secondly, if you make claims, it is not our responsibility to prove you are wrong, but YOUR responsibility to prove you are right.

But I'll make an exception for you, it is your lucky day Smiley

So, how exactly do you use randomness in your Kangaroo understanding of how the algorithm works?

And how exactly you break the birthday paradox, and ruin the stuff entirely, let us understand what do you mean by "making the algorithm more deterministic" by "removing the randomness"? What randomness to remove in the first place? Where is that located, and how do you mess around with it, to be able to claim that you create determinism out of uncontrollable chaos?


By applying precomputation in the Kangaroo algorithm, greater efficiency is achieved because precomputed intermediate results are reused instead of recalculating them each time. This reduces the number of necessary operations and, consequently, the computation time.

However, by reducing the number of operations, the probability of collisions also decreases. This is because the birthday paradox is based on the probability that two elements in a large set will coincide. With fewer operations, there are fewer opportunities for collisions to occur.

What mental institution did you ask ChatGPT to pretend to come out from before producing these paragraphs? I cannot find a single phrase in this quote that makes common sense. Did you even provide the correct tokens to GPT, to begin with? Seems like you asked something related to some birthday paradox, and made GPT hallucinate a little about what precomputing means for kangaroo, while this topic discusses totally different things. Maybe read some actual real-life studies before making a shame of yourself again.
jr. member
Activity: 47
Merit: 12
gmaxwell creator of 1000 BTC puzzl + Pinapple fund
October 10, 2024, 08:52:39 PM
#24
Why do people think they are smarter than studied people that spent many month with big brains on this topic?
Kangaroo cannot be optimized more than what you can find publicly.
Accept YOU won't crack 135bit with kangaroo.

Edit: Don't bother with some ~10% performance optimizations. Maybe some nerds care.
But in the end you feed this to the big players so they can crack it. Good job!
newbie
Activity: 24
Merit: 2
October 10, 2024, 06:06:34 PM
#23
John M. Pollard: “Randomness in these algorithms is crucial for their functioning.”

Impress me a little. First, where is that citation?

Secondly, if you make claims, it is not our responsibility to prove you are wrong, but YOUR responsibility to prove you are right.

But I'll make an exception for you, it is your lucky day Smiley

So, how exactly do you use randomness in your Kangaroo understanding of how the algorithm works?

And how exactly you break the birthday paradox, and ruin the stuff entirely, let us understand what do you mean by "making the algorithm more deterministic" by "removing the randomness"? What randomness to remove in the first place? Where is that located, and how do you mess around with it, to be able to claim that you create determinism out of uncontrollable chaos?


By applying precomputation in the Kangaroo algorithm, greater efficiency is achieved because precomputed intermediate results are reused instead of recalculating them each time. This reduces the number of necessary operations and, consequently, the computation time.

However, by reducing the number of operations, the probability of collisions also decreases. This is because the birthday paradox is based on the probability that two elements in a large set will coincide. With fewer operations, there are fewer opportunities for collisions to occur.
member
Activity: 165
Merit: 26
October 10, 2024, 05:44:36 PM
#22
John M. Pollard: “Randomness in these algorithms is crucial for their functioning.”

Impress me a little. First, where is that citation?

Secondly, if you make claims, it is not our responsibility to prove you are wrong, but YOUR responsibility to prove you are right.

But I'll make an exception for you, it is your lucky day Smiley

So, how exactly do you use randomness in your Kangaroo understanding of how the algorithm works?

And how exactly you break the birthday paradox, and ruin the stuff entirely, let us understand what do you mean by "making the algorithm more deterministic" by "removing the randomness"? What randomness to remove in the first place? Where is that located, and how do you mess around with it, to be able to claim that you create determinism out of uncontrollable chaos?
newbie
Activity: 24
Merit: 2
October 10, 2024, 05:30:59 PM
#21
I don't understand why certain people prefer to create new accounts and troll around serious discussions.

@ElonMusk, if you are who I believe to be - you may need some professional help - friendly advice.

It's obvious no amount of evidence, common sense, or explanations can treat certain issues. This is like a dick contest: I show you that kangaroo with precomputed DPs can break any 80 bits or so key in a couple of seconds (random, sequential, you name it, as long as its in the range), there is always someone that brings up the "but its probabilistic" argument, however has nothing even remotely similar to offer, wtf?! Maybe I missed the day where everyone uses a super-computer and runs BSGS with a few PB of RAM (not that it would suffice).

This topic was concluded already: the 4 kangaroo method remains the fastest method of the kangaroo family, 5 kangaroos or more are sub-optimal, just as suggested in the original paper. Anything else is off-topic.

EOF.

Stop promoting garbage. If you precompute certain values, you might lose the effect of the birthday paradox in the sense that you are reducing randomness and possible combinations. Precomputation can make the problem more deterministic and less dependent on the random coincidences that make the birthday paradox so surprising. You are leading people into a void.

Excellent!  Grin
I think now it's time for some magic circles, don't hesitate!


The birthday paradox is based on probability and randomness, which allows for finding collisions more efficiently than with brute force. By modifying the kangaroo algorithm in a more deterministic way, this probabilistic advantage is lost, and it becomes an approach closer to brute force.

The essence of the birthday paradox is that, with enough randomness, it is more likely to find collisions in a large set of data. By eliminating or reducing randomness, this property is lost, and the algorithm become less efficient.

You are completely and utterly unaware of how Kangaroo works, please stop now.

There is no randomness in the algorithm, ZERO randomness. I repeat: there is no usage of any random features, random functions, or random generators in the Kangaroo algorithm. In contrast, it is actually very well deterministic in nature, otherwise the collisions will not occur. . Having more DPs simply means there are more traps to be hit, increasing the chances. So please go back to the drawing board.

John M. Pollard: “Randomness in these algorithms is crucial for their functioning.”
newbie
Activity: 24
Merit: 2
October 10, 2024, 05:14:47 PM
#20
Stop promoting garbage. If you precompute certain values, you might lose the effect of the birthday paradox in the sense that you are reducing randomness and possible combinations. Precomputation can make the problem more deterministic and less dependent on the random coincidences that make the birthday paradox so surprising. You are leading people into a void.

Excellent!  Grin
I think now it's time for some magic circles, don't hesitate!


The birthday paradox is based on probability and randomness, which allows for finding collisions more efficiently than with brute force. By modifying the kangaroo algorithm in a more deterministic way, this probabilistic advantage is lost, and it becomes an approach closer to brute force.

The essence of the birthday paradox is that, with enough randomness, it is more likely to find collisions in a large set of data. By eliminating or reducing randomness, this property is lost, and the algorithm become less efficient.
?
Activity: -
Merit: -
October 10, 2024, 04:58:42 PM
#19
Stop promoting garbage. If you precompute certain values, you might lose the effect of the birthday paradox in the sense that you are reducing randomness and possible combinations. Precomputation can make the problem more deterministic and less dependent on the random coincidences that make the birthday paradox so surprising. You are leading people into a void.

Excellent!  Grin
I think now it's time for some magic circles, don't hesitate!
newbie
Activity: 24
Merit: 2
October 10, 2024, 04:37:28 PM
#18
You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G

And next step is to calculate required memory for 0.5*sqrt(2^134) points.
After this calculation you will lose all your interest in BSGS  Grin
Or may be you are going to break a lot of <64bit points? Even so, kang is faster because you can have precomputed db of DPs and reduce time in 10 times or even more.


If the goal is to retrieve a private key in a 2^134 interval, there is no doubt: BSGS is not the algorithm to use.

But in a 2^64 interval, BSGS can be faster.  

For one 64bit point without precomputing - yes. For many points - kang with precomputed DPs is faster.
Of course you can do the same for BSGS, precompute more points to speedup it, but it won't be so effective as for kang, to speedup BSGS in 10 times you will have to precompute (and store) much more points than for kang (because BSGS does not use birthday paradox sqrt).

Stop promoting garbage. If you precompute certain values, you might lose the effect of the birthday paradox in the sense that you are reducing randomness and possible combinations. Precomputation can make the problem more deterministic and less dependent on the random coincidences that make the birthday paradox so surprising. You are leading people into a void.
?
Activity: -
Merit: -
October 10, 2024, 01:18:29 PM
#17
If we keep silence, maybe he will go away  Cheesy
member
Activity: 165
Merit: 26
October 10, 2024, 01:08:37 PM
#16
Kangaroo is only probabilistic, don't lose focus, Kangaroo doesn't need much power to play the birthday paradox puzzles. Be careful not to ruin the birthday paradox in the search for more speed.

How does that matter? You can't have fast memory in the order of 2**67 elements, and likely not in the lifetime of our grand-grand children. Don't confuse fast memory with storage capacity, the difference between these two is in the order of 5 magnitudes or more. So the only option is to use whatever can be applied to the constraints we're having, no matter if it's probabilistic or not. Kangaroo doesn't require fast memory at all, it requires zero memory actually, and whatever slow storage, only for the offline step (matching), which doesn't influence at all the computation parts. BSGS directly depends on having access to fast memory and this is part of the algorithm itself.
newbie
Activity: 24
Merit: 2
October 10, 2024, 12:16:41 PM
#15


And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.




BSGS algorithm  applied to secp256k1 curve only need 1.0 sqrt(N) operations (on average) and 1.5 sqrt(N) operations in the worst case:

https://eprint.iacr.org/2015/605.pdf

You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G
and compute in the worst case 1.0 * sqrt(N) keys (the giant steps)

Only additions and subtractions, no multiplications.

Quote
Conclusion

...

The new algorithms, like the original, have low overhead but high memory.
This means that our algorithms are useful only for discrete logarithm problems
small enough for the lists to fit into fast memory

Kangaroo doesn't need fast memory. Or any memory at all. I'd say it's a good trade-off.

Kangaroo is only probabilistic, don't lose focus, Kangaroo doesn't need much power to play the birthday paradox puzzles. Be careful not to ruin the birthday paradox in the search for more speed.
member
Activity: 165
Merit: 26
October 10, 2024, 12:04:22 PM
#14


And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.




BSGS algorithm  applied to secp256k1 curve only need 1.0 sqrt(N) operations (on average) and 1.5 sqrt(N) operations in the worst case:

https://eprint.iacr.org/2015/605.pdf

You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G
and compute in the worst case 1.0 * sqrt(N) keys (the giant steps)

Only additions and subtractions, no multiplications.

Quote
Conclusion

...

The new algorithms, like the original, have low overhead but high memory.
This means that our algorithms are useful only for discrete logarithm problems
small enough for the lists to fit into fast memory

Kangaroo doesn't need fast memory. Or any memory at all. I'd say it's a good trade-off.
?
Activity: -
Merit: -
October 10, 2024, 10:18:39 AM
#13
You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G

And next step is to calculate required memory for 0.5*sqrt(2^134) points.
After this calculation you will lose all your interest in BSGS  Grin
Or may be you are going to break a lot of <64bit points? Even so, kang is faster because you can have precomputed db of DPs and reduce time in 10 times or even more.


If the goal is to retrieve a private key in a 2^134 interval, there is no doubt: BSGS is not the algorithm to use.

But in a 2^64 interval, BSGS can be faster.  

For one 64bit point without precomputing - yes. For many points - kang with precomputed DPs is faster.
Of course you can do the same for BSGS, precompute more points to speedup it, but it won't be so effective as for kang, to speedup BSGS in 10 times you will have to precompute (and store) much more points than for kang (because BSGS does not use birthday paradox sqrt).
newbie
Activity: 24
Merit: 2
October 10, 2024, 10:13:21 AM
#12


And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.




BSGS algorithm  applied to secp256k1 curve only need 1.0 sqrt(N) operations (on average) and 1.5 sqrt(N) operations in the worst case:

https://eprint.iacr.org/2015/605.pdf

You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G
and compute in the worst case 1.0 * sqrt(N) keys (the giant steps)

Only additions and subtractions, no multiplications.

BSGS is faster, but Kangaroo is very good at aiming, how good will he be?
legendary
Activity: 1948
Merit: 2097
October 10, 2024, 10:03:52 AM
#11
You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G

And next step is to calculate required memory for 0.5*sqrt(2^134) points.
After this calculation you will lose all your interest in BSGS  Grin
Or may be you are going to break a lot of <64bit points? Even so, kang is faster because you can have precomputed db of DPs and reduce time in 10 times or even more.


If the goal is to retrieve a private key in a 2^134 interval, there is no doubt: BSGS is not the algorithm to use.

But in a 2^64 interval, BSGS can be faster. 
?
Activity: -
Merit: -
October 10, 2024, 09:47:51 AM
#10
You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G

And next step is to calculate required memory for 0.5*sqrt(2^134) points.
After this calculation you will lose all your interest in BSGS  Grin
Or may be you are going to break a lot of <64bit points? Even so, kang is faster because you can have precomputed db of DPs and reduce time in 10 times or even more.
legendary
Activity: 1948
Merit: 2097
October 10, 2024, 08:47:53 AM
#9


And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.




BSGS algorithm  applied to secp256k1 curve only need 1.0 sqrt(N) operations (on average) and 1.5 sqrt(N) operations in the worst case:

https://eprint.iacr.org/2015/605.pdf

You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G
and compute in the worst case 1.0 * sqrt(N) keys (the giant steps)

Only additions and subtractions, no multiplications.
member
Activity: 165
Merit: 26
October 09, 2024, 02:28:34 PM
#8

I also discovered that C can go as low as 1.05 if we also allow 1.05 sqrt(N) stored items, with 3 kangaroos alone, and only addition group operations. And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.


Can you provide a link?

https://bitcointalksearch.org/topic/m.64515007

I only have experimental proofs (e.g "it works as advertised"). Don't ask why though. I'm not a number theorist. It's basically the 3 kangaroo method combined with the van Oorschot parallelization method, without using DPs.
legendary
Activity: 1948
Merit: 2097
October 09, 2024, 10:56:44 AM
#7

I also discovered that C can go as low as 1.05 if we also allow 1.05 sqrt(N) stored items, with 3 kangaroos alone, and only addition group operations. And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.


Can you provide a link?
?
Activity: -
Merit: -
October 08, 2024, 02:49:38 PM
#6
OK, I don't insist, see no reasons to do that if you are happy with 1.7 Grin
member
Activity: 165
Merit: 26
October 08, 2024, 02:16:49 PM
#5
Such papers is just a way to get a degree, zero new ideas.
And all this is not important actually.
The main problem that you won't get better than K=1.7 this way even if you use 1000 instead of 3-4-5-7 kangs.
K=1.7 is boring, really. It's like solving puzzles #6x  Grin
Why don't you try this? https://www.iacr.org/archive/pkc2010/60560372/60560372.pdf
Going this way (a long way btw) you can "invent" something interesting and get real K (not in theory but real) around 1.2 if you are smart enough Cheesy I call this approach "mirrored" kangs.
But, of course, this way is not so easy as "normal" kangs that always jump in one direction.
Just a tip for you because it seems you read some papers instead of creating magic circles/code as most people here Cheesy
Though I doubt that you will break #135, it's always fun to invent and discover so you can spend a lot of time having fun...

Thanks, we all already know about that paper, and I discussed it a few times in some of my posts. I also implemented it (even in my public Python kangaroo script) and it indeed brings C to ~ 1.25 - 1.45 but at the expense of much more complex logic, since we can have cycles in the side by side walks, which need to be handled. In addition, that method completely ignores the cost of having to create new starting points, which is not something to be ignored (in fact, is very very costly). This problem is also mentioned in the 3 & 4 kangaroo methods paper (which was actually a follow up to the paper you cited).

There is also a more recent "fix" to the Gaudry-Schost method (but it is written in Chinese) which brings down C a little bit more, by refining the wild set bounds.

However even though C is lower than 1.7, the actual runtime is worse, because of the additional restarts, and cycle detection logic. That means it requires more computing power and additional registers, at the machine level. If well optimized, it can beat 4-Kangaroo on a CPU + RAM, but on a GPU the added requirements (cycle detection, more registers) do not reach the efficiency (throughput) of Kangaroo method, which is simple and direct: jump forward, check if DP, repeat.

I don't intend on ruining someone's tenure, but I also highly suspect that the paper about 5 and 7 kangaroos was not reviewed by supervisors or even by the peer community..

I also discovered that C can go as low as 1.05 if we also allow 1.05 sqrt(N) stored items, with 3 kangaroos alone, and only addition group operations. And still people believe BSGS would work faster. for whatever reason, with TB of RAM.. I let them continue believing that.

Oh and BTW 3 & 4-kang methods are already "mirrored", this is basically the reason why C was brought down from 2.0 to 1.72.
?
Activity: -
Merit: -
October 08, 2024, 11:26:12 AM
#4
Such papers is just a way to get a degree, zero new ideas.
And all this is not important actually.
The main problem that you won't get better than K=1.7 this way even if you use 1000 instead of 3-4-5-7 kangs.
K=1.7 is boring, really. It's like solving puzzles #6x  Grin
Why don't you try this? https://www.iacr.org/archive/pkc2010/60560372/60560372.pdf
Going this way (a long way btw) you can "invent" something interesting and get real K (not in theory but real) around 1.2 if you are smart enough Cheesy I call this approach "mirrored" kangs.
But, of course, this way is not so easy as "normal" kangs that always jump in one direction.
Just a tip for you because it seems you read some papers instead of creating magic circles/code as most people here Cheesy
Though I doubt that you will break #135, it's always fun to invent and discover so you can spend a lot of time having fun...
member
Activity: 165
Merit: 26
October 08, 2024, 05:34:32 AM
#3

It is basically the same problem as when some guys attempt to divide some public key by 2, 4, 8 etc.

Every time we "divide" we cannot tell if the resulting point lies in the lower half of the interval, or whether it is offset by (groupOrder + 1)/2, because we don't know the parity of the private key.


Regarding this very specific problem, one limited approach I can think of is to subtract G from your initial pubkey and keep them both; the initial and the offset resulting from this subtraction.

Now one of these two is sure to have an even private key, as some pubkey minus G = its private key minus 0x1.

Then you "divide" them both and at least one resulting offset will be within the desired interval.

Maybe you could find a similar work around for your specific problem


Yes, that approach works, but what are the consequences?

First, let's say the private key is 0x1. So, we would rather add G and then divide, otherwise we need to add a special condition check in the algorithm ("is P the point at infinity"), making it slower.

But after division, we have two resulting points:

[1/2]P
[1/2](P + G)

and we don't know which one of these two is the one that sits in the lower interval, and which one sits in another interval (points starting at keys [1/2 mod n] = (n+1)/2 onward).

So we must look for either of them. Since we now have two keys to search for, any algorithm will require two times more work at some stage, in an interval that is half in size (either the lower interval, or the [1/2 + k] interval)

Let's say our algorithm has an average run-time of C * sqrt(N) where C is some (any) constant factor and N is the size of the interval.

Let's simplify C to be 1 (we'll have the same end-result if C is either lower or greater then 1).

If we look for the private key in the original interval, average run-time is then sqrt(N)

If we look for 2 private keys in a half-range interval, average run-time is then 2 * sqrt(N/2) = sqrt(2) * sqrt(N) = 1.41 sqrt(N)

My conclusion (non-definitive and open for debates): the average run-time increases by 41% every time we  "divide public key by 2" and attempt to work-around the parity problem.

Now, the cited study uses values for the exponent as 6-decimal digit figures. Following the same logic, I would think that this means the predicted results of his analysis are only valid if the private key that is beaing searched is a multiple of 1.000.000. Otherwise, his wild starting points (the exponents are being used to compute initial wild kangaroo positions) are not really guaranteed to lie in the search interval, but rather in 1.000.000 different intervals.
member
Activity: 63
Merit: 14
October 07, 2024, 02:52:20 PM
#2

It is basically the same problem as when some guys attempt to divide some public key by 2, 4, 8 etc.

Every time we "divide" we cannot tell if the resulting point lies in the lower half of the interval, or whether it is offset by (groupOrder + 1)/2, because we don't know the parity of the private key.


Regarding this very specific problem, one limited approach I can think of is to subtract G from your initial pubkey and keep them both; the initial and the offset resulting from this subtraction.

Now one of these two is sure to have an even private key, as some pubkey minus G = its private key minus 0x1.

Then you "divide" them both and at least one resulting offset will be within the desired interval.

Maybe you could find a similar work around for your specific problem
member
Activity: 165
Merit: 26
October 07, 2024, 04:00:21 AM
#1
So while rethinking a bit on Pollard's 3 kangaroo method I was reminded about this paper:

Kangaroo Methods for Solving the Interval Discrete Logarithm Problem
https://www.math.auckland.ac.nz/~sgal018/AFhons.pdf

In that, the author attempts to find better running times than the 4-kangaroo method presented by Pollard et. all in 2011 or so, and claims to have found them, at least by using 5 kangaroos. Then he also presents a 7-kangaroo algorithm with an almost similar run-time (but not really better).

However I'm either stupid or missed something obvious. The author uses some smart approximation algorithms to find some constants that should be used for the exponents of the Tame and Wild kangaroos.

But these constants are not integers, so the author suggests to use the closest integer values when multiplied by N (interval size). The entire analysis is based on the assumption that these Wild kangaroos start off inside the interval.

What I do not understand is: if we are given a public key. and are told it is found in some interval of size N, then the only known points that we can tell for sure are in some interval are:

- the public key itself
- the opposite point (between -N and -1)
- integer multiples of either of the above two points
- added or subtracted known deltas of any point in the above three cases

So, how did the author reached the conclusion of finding better algorithms, given that we can't be sure that when we do an exponentiation of the form:

Code:
g**0.815N * h

then the resulting point will only ever lie in a known interval as the original only when both N and the private key are both divisible by 1000?

It is basically the same problem as when some guys attempt to divide some public key by 2, 4, 8 etc.

Every time we "divide" we cannot tell if the resulting point lies in the lower half of the interval, or whether it is offset by (groupOrder + 1)/2, because we don't know the parity of the private key.

So, is this study wrong? There are also no given experimental tests.
Jump to: