I think BSGS-cuda works better than JLP BSGS.
JLP BSGS is good but takes a very long time (on my GPU).
JLP's BSGS does not support GPU; his is CPU only.
Side by side tests of BSGS Cuda and JLP's Kangaroo...
4 pubkeys all in 65 bit range:
Kangaroo total time = 2 mins 34 seconds:
[4921.81 MK/s][GPU 4517.36 MK/s][Count 2^33.89][Dead 0][04s (Avg 04s)][121.0/159.5MB]
Key# 0 [1S]Pub: 0x02400C76A4D227D7BCFE00DC5CE7C935DE02AD42749A712ED4D98D290313DC49D2
Priv: 0x17838B13505B26867
[1135.79 MK/s][GPU 1135.79 MK/s][Count 2^34.27][Dead 0][34s (Avg 18s)][156.6/202.5MB]
Key# 1 [1S]Pub: 0x021D6440B8338632692397D3D98FB6B62055E267E4333EC2A9316E72845649109A
Priv: 0x18838B13505B26867
[1485.74 MK/s][GPU 1485.74 MK/s][Count 2^34.64][Dead 0][36s (Avg 13s)][201.9/258.9MB]
Key# 2 [1S]Pub: 0x03047BA9686B470D7BCCFF8305D1C440389CE43A111CA79DFD25C9943B1949F729
Priv: 0x1012A713505B26867
[1835.35 MK/s][GPU 1835.35 MK/s][Count 2^34.94][Dead 2][38s (Avg 11s)][246.9/315.1MB]
Key# 3 [1S]Pub: 0x02094C07F799C681B9A501A70618E260E47E777A141BF6A445523254DAF1085385
Priv: 0x1F028A10C05B26867
Done: Total time 02:34
BSGS Cuda total time = 1 min 29 seconds:
GPU#2 Cnt:000000000000000000000000000000000000000000000000b850800000000001 859MKey/s x134217728 2^29.75 x2^28=2^57.75
KEY!!>000000000000000000000000000000000000000000000001f028a10c05b26867
Pub: 094c07f799c681b9a501a70618e260e47e777a141bf6a445523254daf1085385c22b8f7747f0b280dac05dc2f60085de07af8e080bf32a1d3befb1f83c1f5404
****************************
Found in 19 seconds
GPU #0 finished
GPU #2 finished
GPU #1 finished
GPU #3 finished
Total time 00:01:29s
cuda finished ok
Press Enter to exit
For at least this range (and probably for larger ranges, up to a certain size), the BSGS Cuda program will be faster when checking multiple pubkeys, because its spin-up time between pubkeys (finding one key and moving on to the next) is much shorter than the Kangaroo program's.
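
To illustrate why moving from one key to the next can be cheap with BSGS, here is a toy sketch in Python. It is not the BSGS Cuda code: it works with integers modulo a small prime instead of secp256k1 points, it runs on the CPU, and the names (build_baby_table, solve, solve_many) are made up for this example. The assumption here, which is not confirmed by the logs above, is that the baby-step table depends only on the search range and not on the pubkey, so it can be built once and reused for every key.

```python
from math import isqrt

P = 2**31 - 1  # toy prime modulus (stand-in for the curve group; an assumption)
G = 7          # toy base element (stand-in for the curve generator)


def build_baby_table(m):
    """Map G^j mod P -> j for j in [0, m). Built once, independent of any target."""
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)
        value = value * G % P
    return table


def solve(target, table, m):
    """Find k in [0, m*m) with G^k == target (mod P), or None if not in range."""
    # G^(-m) mod P via Fermat's little theorem (P is prime).
    giant = pow(pow(G, m, P), P - 2, P)
    gamma = target
    for i in range(m):
        j = table.get(gamma)
        if j is not None:
            return i * m + j
        gamma = gamma * giant % P  # one giant step: divide by G^m
    return None


def solve_many(targets, range_size):
    """Solve several targets in [0, range_size) while sharing one baby table."""
    m = isqrt(range_size - 1) + 1  # ceil(sqrt(range_size))
    table = build_baby_table(m)    # paid once for all targets
    return [solve(t, table, m) for t in targets]


if __name__ == "__main__":
    secrets = [5, 123456, 2**20 - 1]
    targets = [pow(G, k, P) for k in secrets]
    print(solve_many(targets, 2**20))
```

In this sketch, each extra target only costs its own giant steps; the table setup is not repeated. Whether that is the whole reason for the shorter spin-up seen in the BSGS Cuda run above is a guess on my part.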