Topic: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo - page 4

newbie
Activity: 7
Merit: 0
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)
I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster.
Works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.

You can improve it in many ways.
For example, since L2 is useless for old cards, disable setting the persistent part of L2 and set
#define PNT_GROUP_CNT      48
and change these lines in KernelB:

Code:
//calc original kang_ind
u32 tind = (THREAD_X + gr_ind2 * BLOCK_SIZE); //0..3071
u32 warp_ind = tind / (32 * PNT_GROUP_CNT / 2); // 0..7
u32 thr_ind = (tind / 4) % 32; //index in warp 0..31
u32 g8_ind = (tind % (32 * PNT_GROUP_CNT / 2)) / 128; // 0..2
u32 gr_ind = 2 * (tind % 4); // 0, 2, 4, 6

May I ask why my 4060 Ti graphics card only reaches a speed of just over 2000 MKeys/s?
CUDA devices: 1, CUDA driver/runtime: 12.6/12.5
GPU 0: NVIDIA GeForce RTX 4060 Ti, 16.00 GB, 34 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1
Solving point: Range 76 bits, DP 16, start...
SOTA method, estimated ops: 2^38.202, RAM for DPs: 0.367 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 23.090.
GPU 0: allocated 1187 MB, 208896 kangaroos.
GPUs started...
MAIN: Speed: 2332 MKeys/s, Err: 0, DPs: 345K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
MAIN: Speed: 2320 MKeys/s, Err: 0, DPs: 704K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m

Do you expect better speed? Why? A 4090 has 128 CUs, a 4060 Ti only 34.

Hello, can you tell me in which file the L2 setup can be found and what has to be deactivated?
Thank you

I found the following in GpuKang.cpp.
Is this the right place?
Quote
//allocate gpu mem
   //L2   
   int L2size = KangCnt * (3 * 32);
   total_mem += L2size;
   err = cudaMalloc((void**)&Kparams.L2, L2size);
   if (err != cudaSuccess)
   {
      printf("GPU %d, Allocate L2 memory failed: %s\n", CudaIndex, cudaGetErrorString(err));
      return false;
   }
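As far as I can tell from the CUDA documentation, the "persistent part of L2" is normally configured with cudaDeviceSetLimit and a stream access-policy window rather than with this cudaMalloc (which just allocates a device buffer that happens to be named L2), so I guess the place to deactivate is wherever calls like the following appear. This is only a generic sketch of the CUDA pattern, not RCKangaroo's actual code:
Code:
// Generic CUDA pattern for reserving a persistent slice of L2 cache
// (illustrative only; search the RCKangaroo sources for these calls).
// L2 persistence requires compute capability 8.0+ (Ampere/Ada), so on
// older cards it should simply be skipped.
#include <cuda_runtime.h>
#include <cstddef>

void set_persistent_l2(void* buf, size_t bytes, cudaStream_t stream)
{
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, bytes);

    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = buf;    // buffer to keep resident in L2
    attr.accessPolicyWindow.num_bytes = bytes;
    attr.accessPolicyWindow.hitRatio  = 1.0f;
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
}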
?
Activity: -
Merit: -
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)
I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster.
Works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.

You can improve it in many ways.
For example, since L2 is useless for old cards, disable setting the persistent part of L2 and set
#define PNT_GROUP_CNT      32
and change these lines in KernelB:

Code:
//calc original kang_ind
u32 tind = (THREAD_X + gr_ind2 * BLOCK_SIZE); //0..3071
u32 warp_ind = tind / (32 * PNT_GROUP_CNT / 2); // 0..7
u32 thr_ind = (tind / 4) % 32; //index in warp 0..31
u32 g8_ind = (tind % (32 * PNT_GROUP_CNT / 2)) / 128; // 0..2
u32 gr_ind = 2 * (tind % 4); // 0, 2, 4, 6

May I ask why my 4060 Ti graphics card only reaches a speed of just over 2000 MKeys/s?
CUDA devices: 1, CUDA driver/runtime: 12.6/12.5
GPU 0: NVIDIA GeForce RTX 4060 Ti, 16.00 GB, 34 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1
Solving point: Range 76 bits, DP 16, start...
SOTA method, estimated ops: 2^38.202, RAM for DPs: 0.367 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 23.090.
GPU 0: allocated 1187 MB, 208896 kangaroos.
GPUs started...
MAIN: Speed: 2332 MKeys/s, Err: 0, DPs: 345K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
MAIN: Speed: 2320 MKeys/s, Err: 0, DPs: 704K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m

Do you expect better speed? Why? A 4090 has 128 CUs, a 4060 Ti only 34.
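For a rough sanity check, if the speed scaled linearly with the number of CUs (which ignores clock and memory differences, so treat it only as an estimate): 2332 MKeys/s x 128 / 34 is about 8.8 GKeys/s, so ~2.3 GKeys/s on a 4060 Ti is roughly what the CU count predicts.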
sr. member
Activity: 652
Merit: 316
Did you make a mistake?
I use Visual Studio. Oh, sorry, it is in the file RCKangaroo.cpp, line 99 (I changed the previous post).
Code:
if (deviceProp.major < 6)
{
printf("GPU %d - not supported, skip\r\n", i);
continue;
}
I also made other changes, which is probably why the line numbers don't match. But you can easily find the lines you need, because I didn't add anything new, I only changed existing lines.
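For reference, the deviceProp in that check is the host-side cudaDeviceProp struct filled by cudaGetDeviceProperties() during device enumeration, which is why the same line cannot be pasted into RCGpuCore.cu (see the "identifier deviceProp is undefined" error further down). A minimal sketch of the surrounding host-side pattern, with illustrative names, not the exact RCKangaroo code:
Code:
// Minimal host-side enumeration sketch (illustrative, not the exact
// RCKangaroo code): deviceProp only exists where cudaGetDeviceProperties
// is called, i.e. in the .cpp host code, not inside the CUDA kernels.
#include <cuda_runtime.h>
#include <cstdio>

int count_usable_gpus()
{
    int cnt = 0, usable = 0;
    cudaGetDeviceCount(&cnt);
    for (int i = 0; i < cnt; i++)
    {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, i);
        if (deviceProp.major < 6)   // the relaxed check discussed above
        {
            printf("GPU %d - not supported, skip\r\n", i);
            continue;
        }
        usable++;
    }
    return usable;
}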
newbie
Activity: 7
Merit: 0
@Etar, how did you get your GTX 1660 working? Can you list all the changes and show them? Many thanks.
file: RCGpuCore.cu
line 285: u64* table = LDS + 8 * JMP_CNT + 16 * THREAD_X;
Line 99: if (deviceProp.major < 6)
file: defs.h
#define LDS_SIZE_A         (64 * 1024)
#define LDS_SIZE_B         (64 * 1024)
#define LDS_SIZE_C         (64 * 1024)
#define JMP_CNT            512
file: RCKangaroo.vcxproj
line 118: compute_75,sm_75;compute_75,sm_75
line 141: compute_75,sm_75;compute_75,sm_75

Code:
CUDA devices: 1, CUDA driver/runtime: 12.6/12.1
GPU 0: NVIDIA GeForce GTX 1660 SUPER, 6.00 GB, 22 CUs, cap 7.5, PCI 1, L2 size: 1536 KB
Total GPUs for work: 1

MAIN MODE

Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000

Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 570.958.
GPU 0: allocated 772 MB, 135168 kangaroos.
GPUs started...
MAIN: Speed: 599 MKeys/s, Err: 0, DPs: 88K/77175K, Time: 0d:00h:00m:10s, Est: 0d:02h:20m:43s

file: RCGpuCore.cu
Line 99 looks like this
SubModP(tmp, x, jmp_x);

If you replace it with
if (deviceProp.major < 6)

This error occurs when building:
RCGpuCore.cu(99): error: identifier "deviceProp" is undefined
     if (deviceProp.major < 6)
         ^

1 error detected in the compilation of "RCGpuCore.cu".
make: *** [Makefile:26: RCGpuCore.o] Error 2

Did you make a mistake?

Thank you for your effort
newbie
Activity: 2
Merit: 0
How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings but it failed.

As far as I remember these cards have only 64 KB of shared memory.
Set JMP_CNT to 512 and change 17 to 16 in this line in KernelB:
u64* table = LDS + 8 * JMP_CNT + 17 * THREAD_X;
and recalculate the LDS_SIZE_ constants.
I think that's enough, though maybe I forgot something...
The main issue is not compiling but optimization: my code is not written for 20xx and 30xx cards, so it won't reach good speed there.
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)
Thank you very much. Following your suggestion, I modified the LDS_SIZE_ constants and it now runs perfectly on my 2070. Thanks again for your excellent work. In my comparison tests it finds the private key faster than JLP's code. Next, I will try to understand your code and optimize it with the help of Claude 3.5 to increase the speed. Maybe you have more good suggestions.
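For anyone repeating this on another 64 KB card: judging from the offsets in that KernelB line, the shared-memory layout is 8 u64 per jump entry times JMP_CNT, plus 16 u64 of per-thread scratch, so "recalculate the LDS_SIZE_ constants" is simple arithmetic. A minimal sketch of the check, assuming u64 entries and a 256-thread block (the block size is my assumption, check the sources for the real value):
Code:
// Back-of-the-envelope check of the 64 KB shared-memory budget.
// Assumptions (not taken from the sources): u64 table entries, 256 threads/block.
#include <cstddef>
#include <cstdint>

constexpr std::size_t JMP_CNT   = 512;  // reduced jump table
constexpr std::size_t BLOCK_THR = 256;  // assumed threads per block
constexpr std::size_t PER_THR   = 16;   // the "17 -> 16" change

constexpr std::size_t LDS_BYTES =
    (8 * JMP_CNT + PER_THR * BLOCK_THR) * sizeof(std::uint64_t);

static_assert(LDS_BYTES == 64 * 1024, "fits exactly into 64 KB of shared memory");

int main() { return 0; }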
newbie
Activity: 6
Merit: 0
How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings but it failed.

As far as I remember these cards have only 64 KB of shared memory.
Set JMP_CNT to 512 and change 17 to 16 in this line in KernelB:
u64* table = LDS + 8 * JMP_CNT + 17 * THREAD_X;
and recalculate the LDS_SIZE_ constants.
I think that's enough, though maybe I forgot something...
The main issue is not compiling but optimization: my code is not written for 20xx and 30xx cards, so it won't reach good speed there.
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)


May I ask why my 4060 Ti graphics card only reaches a speed of just over 2000 MKeys/s?



CUDA devices: 1, CUDA driver/runtime: 12.6/12.5
GPU 0: NVIDIA GeForce RTX 4060 Ti, 16.00 GB, 34 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1
Solving point: Range 76 bits, DP 16, start...
SOTA method, estimated ops: 2^38.202, RAM for DPs: 0.367 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 23.090.
GPU 0: allocated 1187 MB, 208896 kangaroos.
GPUs started...
MAIN: Speed: 2332 MKeys/s, Err: 0, DPs: 345K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
MAIN: Speed: 2320 MKeys/s, Err: 0, DPs: 704K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
sr. member
Activity: 652
Merit: 316
@Etar, how did you get your GTX 1660 working? Can you list all the changes and show them? Many thanks.
file: RCGpuCore.cu
line 285: u64* table = LDS + 8 * JMP_CNT + 16 * THREAD_X;
file: RCKangaroo.cpp
Line 99: if (deviceProp.major < 6)
file: defs.h
#define LDS_SIZE_A         (64 * 1024)
#define LDS_SIZE_B         (64 * 1024)
#define LDS_SIZE_C         (64 * 1024)
#define JMP_CNT            512
file: RCKangaroo.vcxproj
line 118: compute_75,sm_75;compute_75,sm_75
line 141: compute_75,sm_75;compute_75,sm_75

Code:
CUDA devices: 1, CUDA driver/runtime: 12.6/12.1
GPU 0: NVIDIA GeForce GTX 1660 SUPER, 6.00 GB, 22 CUs, cap 7.5, PCI 1, L2 size: 1536 KB
Total GPUs for work: 1

MAIN MODE

Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000

Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 570.958.
GPU 0: allocated 772 MB, 135168 kangaroos.
GPUs started...
MAIN: Speed: 599 MKeys/s, Err: 0, DPs: 88K/77175K, Time: 0d:00h:00m:10s, Est: 0d:02h:20m:43s
newbie
Activity: 7
Merit: 0
I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster.
Works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.

@Etar, how did you get your GTX 1660 working? Can you list all the changes and show them? Many thanks.
sr. member
Activity: 652
Merit: 316
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)
I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster.
Works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.
?
Activity: -
Merit: -
How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings but it failed.

As far as I remember these cards have only 64 KB of shared memory.
Set JMP_CNT to 512 and change 17 to 16 in this line in KernelB:
u64* table = LDS + 8 * JMP_CNT + 17 * THREAD_X;
and recalculate the LDS_SIZE_ constants.
I think that's enough, though maybe I forgot something...
The main issue is not compiling but optimization: my code is not written for 20xx and 30xx cards, so it won't reach good speed there.
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their bad speed.
But feel free to modify/optimize the sources for your hardware :)
newbie
Activity: 2
Merit: 0
How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings but it failed.
copper member
Activity: 5
Merit: 0
This is my version of the Makefile:

Code:
PROJECT = RCKangaroo
CC = g++
NVCC = nvcc
CFLAGS = -Wall -O2
INCLUDES = -I/usr/local/cuda/include 
LIBDIRS = -L/usr/local/cuda/lib64   
LIBS_CUDA = -lcudart               

# Source files
CPP_FILES = $(wildcard *.cpp)
CU_FILES = $(wildcard *.cu)
OBJECTS = $(CPP_FILES:.cpp=.o) $(CU_FILES:.cu=.o)

all: $(PROJECT)

$(PROJECT): $(OBJECTS)
	$(NVCC) $(OBJECTS) -o $(PROJECT) $(LIBDIRS) $(LIBS_CUDA)

%.o: %.cpp
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

%.o: %.cu
	$(NVCC) -O2 $(INCLUDES) -c $< -o $@

clean:
	rm -f $(PROJECT) $(OBJECTS)
newbie
Activity: 22
Merit: 1
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
3090 shows only about 3 GKeys/s; it could really be faster.

Thank you so much, any advice for Linux users?

A Linux exe is included as well, or you can compile it yourself.

Really? "3090 shows about 3GKeys/sec only" this is a record man, I can believe! I just can say Thanks!


Do you have a Makefile compatible with Linux?


Sure

Code:
CC = g++
NVCC = nvcc
CFLAGS = -c -O3 -g  # Added optimization and debugging
LDFLAGS = -L/usr/local/cuda/lib64 -lcudart -I/usr/local/cuda/include
ARCH_FLAGS = -arch=sm_86  # Explicitly specify the architecture for RTX 3060
OBJ = RCGpuCore.o Ec.o GpuKang.o RCKangaroo.o utils.o
TARGET = RCKangaroo

all: $(TARGET)

$(TARGET): $(OBJ)
	$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS)

RCGpuCore.o: RCGpuCore.cu
	$(NVCC) $(CFLAGS) $(ARCH_FLAGS) RCGpuCore.cu

Ec.o: Ec.cpp
	$(CC) $(CFLAGS) Ec.cpp

GpuKang.o: GpuKang.cpp
	$(CC) $(CFLAGS) GpuKang.cpp

RCKangaroo.o: RCKangaroo.cpp
	$(CC) $(CFLAGS) RCKangaroo.cpp

utils.o: utils.cpp
	$(CC) $(CFLAGS) utils.cpp

clean:
	rm -f *.o $(TARGET)
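Note that -arch=sm_86 targets Ampere (RTX 30xx); for other generations the flag changes accordingly, e.g. sm_89 for RTX 40xx or sm_75 for GTX 16xx / RTX 20xx, which is the Linux counterpart of the compute_75,sm_75 change in the .vcxproj mentioned above.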
copper member
Activity: 5
Merit: 0
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
3090 shows only about 3 GKeys/s; it could really be faster.

Thank you so much, any advice for Linux users?

A Linux exe is included as well, or you can compile it yourself.

Really? "3090 shows about 3GKeys/sec only" this is a record man, I can believe! I just can say Thanks!


Do you have a Makefile compatible with Linux?
newbie
Activity: 22
Merit: 1
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
3090 shows only about 3 GKeys/s; it could really be faster.

Thank you so much, any advice for Linux users?

A Linux exe is included as well, or you can compile it yourself.

Really? "3090 shows about 3GKeys/sec only" this is a record man, I can believe! I just can say Thanks!
?
Activity: -
Merit: -
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
3090 shows only about 3 GKeys/s; it could really be faster.

Thank you so much, any advice for Linux users?

A Linux exe is included as well, or you can compile it yourself.
newbie
Activity: 22
Merit: 1
As promised, here is the third and final part: RCKangaroo, Windows/Linux, open source:
https://github.com/RetiredC/RCKangaroo
This software demonstrates a fast implementation of the SOTA method and advanced loop handling on RTX 40xx cards.
Note that I have not included all possible optimizations because it's public code and I want to keep it as simple/readable as possible. Anyway, it's fast enough to demonstrate the advantage and you can improve it further if you have enough skills.

Thank you so much, any advice for Linux users?
newbie
Activity: 1
Merit: 0
Running puzzle #85 on an RTX 4060:

Quote
CUDA devices: 1, CUDA driver/runtime: 12.7/12.6
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU, 8.00 GB, 24 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1

MAIN MODE

Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000

Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 523.378.
GPU 0: allocated 841 MB, 147456 kangaroos.
GPUs started...
MAIN: Speed: 1180 MKeys/s, Err: 0, DPs: 174K/77175K, Time: 0d:00h:00m, Est: 0d:01h:11m
...
MAIN: Speed: 1132 MKeys/s, Err: 0, DPs: 154839K/77175K, Time: 0d:02h:28m, Est: 0d:01h:14m
Stopping work ...
Point solved, K: 2.310 (with DP and GPU overheads)


PRIVATE KEY: 00000000000000000000000000000000000000000011720C4F018D51B8CEBBA8

Comparing the number of operations with JLP's Kangaroo:

Quote
Kangaroo v2.1
Start:1000000000000000000000
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^84
Jump Avg distance: 2^42.03
Number of kangaroos: 2^23.32
Suggested DP: 16
Expected operations: 2^43.12
Expected RAM: 6347.6MB
DP size: 16 [0xFFFF000000000000]

Quick comparison: with the same DP it uses half the RAM and fewer operations.
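For a compact comparison, both estimates can be converted to the K factor (expected operations divided by the square root of the range size). A minimal sketch using only the numbers printed in the two logs above:
Code:
// K = expected ops / sqrt(range size), computed from the two estimates above.
#include <cmath>
#include <cstdio>

int main()
{
    const double range_bits    = 84.0;
    const double sota_ops_log2 = 42.202;  // RCKangaroo estimate
    const double jlp_ops_log2  = 43.12;   // JLP Kangaroo estimate

    const double k_sota = std::pow(2.0, sota_ops_log2 - range_bits / 2.0);
    const double k_jlp  = std::pow(2.0, jlp_ops_log2  - range_bits / 2.0);

    std::printf("K: SOTA ~%.2f, JLP ~%.2f\n", k_sota, k_jlp);  // roughly 1.15 vs 2.17
    return 0;
}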
?
Activity: -
Merit: -
Some explanations about other GPUs support:
1. I have zero interest in old cards (same for AMD cards) so I don't have them for development/tests and don't support them.
2. You can easily enable support for older NVIDIA cards and it will work, but my code is designed for the latest generation; for previous generations it's not optimal and the speed is not the best, which is why I disabled them.
newbie
Activity: 1
Merit: 0
Thanks for sharing your repositories and source code.

Great job!