Hi! pretty cool tool :-)
Do you plan to implement, if possible:
- a "continue" option, in case the GPU stops and you want to resume from where it stopped?
- a "multiple pubkey" option, such as an input file with a list of pubkeys?
Thanks and have a good one.
1. Maybe.
2. No, it's a bad idea.
I must admit you used some really clever tricks to make maximum usage of shared memory (L1) and L2 caches. I'm still trying to figure out the way you keep track of the jump distances using the shared memory instead of updating them using L2.
After adapting my own kernel to load/store stuff using L2 (instead of only once, before and after all the jumps), I reached 9.7 GK/s on an RTX 4090 (64 jump points, DP 32), which was a 75% increase in speed, and I haven't even tried micro-optimizing it yet, as I did before. So I guess this was the missing piece of knowledge I needed to go beyond the 8+ GK/s advertised by others around here, after trying every advanced optimization I could think of to speed things up.
So did you start work on solving 135?
Yes, 10G for 4090 is ok.
And then one day you will understand that the only way to improve it further is to use symmetry and get a sqrt(2) boost. Yes, you will lose some speed, but the total improvement is worth it.
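For anyone wondering what "symmetry" means here: on a curve of the form y^2 = x^3 + 7, a point P = (x, y) and its negation -P = (x, p - y) share the same x, so the walk can treat {P, -P} as one class and search roughly half as many classes, which is where the sqrt(2) comes from. Below is a minimal sketch of picking one representative per class. It is not RCKangaroo's code: the toy prime `P_MOD = 97` and the helper names (`on_curve`, `negate`, `canonical`, `find_point`) are made up for illustration, and choosing the smaller y is just one possible rule.

```python
# Toy illustration of the negation symmetry on y^2 = x^3 + 7 (mod p).
# P_MOD is a small hypothetical prime, NOT the secp256k1 field prime.
P_MOD = 97


def on_curve(pt):
    """Return True if pt = (x, y) satisfies y^2 = x^3 + 7 mod P_MOD."""
    x, y = pt
    return (y * y - (x ** 3 + 7)) % P_MOD == 0


def negate(pt):
    """Return -P, which has the same x and the negated y."""
    x, y = pt
    return (x, (-y) % P_MOD)


def canonical(pt):
    """Pick one representative of the class {P, -P}: the one with smaller y.

    This rule is an assumption for the sketch; real solvers may use
    another rule, for example the parity of y.
    """
    x, y = pt
    return (x, min(y, (-y) % P_MOD))


def find_point():
    """Brute-force search for any affine point with y != 0 on the toy curve."""
    for x in range(P_MOD):
        for y in range(1, P_MOD):
            if on_curve((x, y)):
                return (x, y)
    raise ValueError("no point found")


if __name__ == "__main__":
    p = find_point()
    # Both P and -P map to the same representative.
    print(p, negate(p), canonical(p), canonical(negate(p)))
```

Since both P and -P collapse to the same representative, a walk defined on representatives can fall into short cycles, which is why extra loop-handling code is needed and why some raw speed is lost.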
From RCKangaroo readme:
Fastest ECDLP solvers will always use SOTA method, as it's 1.39 times faster and requires less memory for DPs compared to the best 3-way kangaroos with K=1.6. Even if you already have a faster implementation of kangaroo jumps, incorporating SOTA method will improve it further. While adding the necessary loop-handling code will cause you to lose about 5–15% of your current speed, the SOTA method itself will provide a 39% performance increase. Overall, this translates to roughly a 25% net improvement, which should not be ignored if your goal is to build a truly fast solver.
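The arithmetic in that readme passage is easy to check: the net factor is the method gain multiplied by the fraction of raw speed you keep. A quick sketch, using only the numbers quoted above (the function name `net_speedup` is mine, not from the readme):

```python
def net_speedup(method_gain, speed_loss):
    """Net factor = algorithmic gain times the fraction of raw speed kept.

    method_gain: e.g. 1.39 for the figure quoted from the readme.
    speed_loss:  fraction of kernel speed lost to loop handling, e.g. 0.10.
    """
    return method_gain * (1.0 - speed_loss)


if __name__ == "__main__":
    # The readme quotes a 5-15% speed loss for the loop-handling code.
    for loss in (0.05, 0.10, 0.15):
        print(f"loss {loss:.0%}: net x{net_speedup(1.39, loss):.2f}")
```

With a loss between 5% and 15%, the net factor lands between about 1.18 and 1.32, and a 10% loss gives about 1.25, which matches the "roughly 25%" in the readme.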