
Topic: BSGS solver for cuda - page 10. (Read 3292 times)

member
Activity: 406
Merit: 45
October 11, 2021, 09:56:51 PM
#19

4 pubkeys all in 65 bit range:


It works fast on a 65-bit range.

Is it still too limited in power to find a key in a 120-bit range?

And is it still limited to ranges that, like the 65-bit one, start close to the key?
member
Activity: 406
Merit: 45
October 11, 2021, 09:46:53 PM
#18
Quote
I think BSGS-cuda works better than JLP's BSGS.
JLP's BSGS is good but takes a very long time (for my GPU).
JLP's BSGS does not support GPU; his is CPU only.

Correct, sorry, I forgot that. I meant that it runs very slowly on my laptop; sometimes I give up and end the task after waiting overnight.
full member
Activity: 1050
Merit: 219
Shooters Shoot...
October 11, 2021, 03:31:57 PM
#17
Quote
I think BSGS-cuda works better than JLP's BSGS.
JLP's BSGS is good but takes a very long time (for my GPU).
JLP's BSGS does not support GPU; his is CPU only.

Side by side tests of BSGS Cuda and JLP's Kangaroo...

4 pubkeys all in 65 bit range:

Kangaroo total time = 2 mins 34 seconds:
Code:
[4921.81 MK/s][GPU 4517.36 MK/s][Count 2^33.89][Dead 0][04s (Avg 04s)][121.0/159.5MB]
Key# 0 [1S]Pub:  0x02400C76A4D227D7BCFE00DC5CE7C935DE02AD42749A712ED4D98D290313DC49D2
       Priv: 0x17838B13505B26867
[1135.79 MK/s][GPU 1135.79 MK/s][Count 2^34.27][Dead 0][34s (Avg 18s)][156.6/202.5MB]
Key# 1 [1S]Pub:  0x021D6440B8338632692397D3D98FB6B62055E267E4333EC2A9316E72845649109A
       Priv: 0x18838B13505B26867
[1485.74 MK/s][GPU 1485.74 MK/s][Count 2^34.64][Dead 0][36s (Avg 13s)][201.9/258.9MB]
Key# 2 [1S]Pub:  0x03047BA9686B470D7BCCFF8305D1C440389CE43A111CA79DFD25C9943B1949F729
       Priv: 0x1012A713505B26867
[1835.35 MK/s][GPU 1835.35 MK/s][Count 2^34.94][Dead 2][38s (Avg 11s)][246.9/315.1MB]
Key# 3 [1S]Pub:  0x02094C07F799C681B9A501A70618E260E47E777A141BF6A445523254DAF1085385
       Priv: 0x1F028A10C05B26867

Done: Total time 02:34

BSGS Cuda total time = 1 min 29 seconds:
Code:
GPU#2 Cnt:000000000000000000000000000000000000000000000000b850800000000001 859MKey/s x134217728 2^29.75 x2^28=2^57.75
KEY!!>000000000000000000000000000000000000000000000001f028a10c05b26867
Pub: 094c07f799c681b9a501a70618e260e47e777a141bf6a445523254daf1085385c22b8f7747f0b280dac05dc2f60085de07af8e080bf32a1d3befb1f83c1f5404
****************************
Found in 19 seconds
GPU #0 finished
GPU #2 finished
GPU #1 finished
GPU #3 finished
Total time 00:01:29s
cuda finished ok

Press Enter to exit


For at least this range (and probably for larger ones, up to a certain size), the BSGS Cuda program is faster when checking multiple pubkeys, because its spin-up time between pubkeys (finding one pubkey and moving on to the next) is much shorter than the Kangaroo program's.
sr. member
Activity: 616
Merit: 312
October 11, 2021, 03:24:13 PM
#16
Mandatory update v1.2.1
* Fixed a bug with multi-GPU searching.
member
Activity: 406
Merit: 45
October 11, 2021, 06:46:42 AM
#15

Thanks, Etar.

I think BSGS-cuda works better than JLP's BSGS.
JLP's BSGS is good but takes a very long time (for my GPU).

I tested the first sample command from the GitHub page.

Speed result (GTX 1050 GPU on a laptop):
Code:
Found in 972 seconds
Total time 00:16:21s


Code:
bsgscudaHT2.exe -t 512 -b 68 -p 256 -pb 59A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8 -pk 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000 -w 26
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 0x59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
Range begin: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
Items number set to #67108864
APP VERSION: 1.2
Found 1 Cuda device.
Cuda device:NVIDIA GeForce GTX 1050(4095Mb)
Device have: MP:5 Cores+320
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000008000000
GiantSUBpubkey:(a94c6524bd40d2bbdac85c056236a79da78bc61fd5bdec9d2bf26bd84b2438e84adfe0266d069d7f0286de6afafe61c581a2c39f5f1c64d43d1d37230e799a3b)
*******************************
Total GPU Memory Need: 1584.000Mb
*******************************
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_67108864_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_b.BIN
[0] chunk:536870912b
Done in 00:00:00s
Verify baby array...ok
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_s.BIN
[0] chunk:536870912b
Done in 00:00:00s
Verify sorted array...ok
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_ht.BIN
[0] chunk:805306368b
Verify packed HT items...ok
Verify packed HT items sorting...ok
Total removed items: 0, freed memory: 1312.000 MB
GPU count #1
START RANGE= 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
SUBpoint= (3c52f78892c8c2f5c51d7249951bbb1c302a8ed4d37561724e68e8d22db14a69, e0ba5063f64117bccd7fc6c1d5b97df4f0bdc5a6ba481f21e69da330ed9750ae)
FINDpubkey= (59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc, 994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8)
NewFINDpubkey= (de84b4334e87f1d1466f8c382c279ab7ac0e20d3510cec74abfd4b6b94fc7833, 9d2e496386ca9fafd5e806ddeba50e875b3a56fd1bde9711581957f5229d0663)
***************************
GPU #0 launched
GPU #0 TotalBuff: 1584.000Mb
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000000066000000000001 99MKey/s x67108864 2^26.63 x2^27=2^53.63
GPU#0 Cnt:00000000000000000000000000000000000000000000000000cc000000000001 100MKey/s x67108864 2^26.65 x2^27=2^53.65


Result
Code:
GPU#0 Cnt:000000000000000000000000000000000000000000000000ba78000000000001 98MKey/s x67108864 2^26.62 x2^27=2^53.62
GPU#0 Cnt:000000000000000000000000000000000000000000000000bade000000000001 98MKey/s x67108864 2^26.62 x2^27=2^53.62
***********GPU#0************
Total solutions: 1
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 972 seconds
GPU #0 finished
Total time 00:16:21s
cuda finished ok

Press Enter to exit
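
The progress lines in the log above can be read as arithmetic. The following is a minimal sketch, not taken from the BSGS-cuda source: it assumes that "MKey/s" means 2^20 giant steps per second and that each giant step covers twice the item count (both inferred from the "x67108864 ... x2^27" part of the log lines), and that the range is scanned sequentially from its start (suggested by the growing Cnt value). The function names are hypothetical.

```python
from math import log2

MKEY = 2**20  # assumption: "MKey/s" in the log means 2**20 giant steps per second


def effective_rate_log2(mkeys_per_s: float, items: int) -> float:
    """log2 of keys covered per second: giant-step rate times 2 * items."""
    return log2(mkeys_per_s * MKEY) + log2(2 * items)


def seconds_to_reach(offset: int, mkeys_per_s: float, items: int) -> float:
    """Time for a sequential scan from the range start to reach `offset`."""
    return offset / (mkeys_per_s * MKEY * 2 * items)


# GTX 1050 run: 98 MKey/s with 67108864 (2^26) items.
items = 67108864
# Low 64 bits of the found key, i.e. its distance from the range start.
offset = 0xBB3EF3883C1866D4

print(f"effective rate: 2^{effective_rate_log2(98, items):.2f} keys/s")
print(f"estimated time to hit the key: {seconds_to_reach(offset, 98, items):.0f} s")
```

Under these assumptions the estimate lands close to the reported "Found in 972 seconds". The small difference from the logged exponent is expected, since the log prints the MKey/s figure without decimals.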
sr. member
Activity: 616
Merit: 312
October 11, 2021, 01:25:25 AM
#14
How many bytes of memory do you need to store one babystep? Does the hashtable use GPU memory or global RAM?
Each baby step uses 8 bytes of memory. The HT (hash table) is stored in GPU memory.
With -w 26 and -htsz 25 (the default), the app generates 2^26 baby steps, which are stored in an HT of size (2^25 + 2^26) * 8 bytes.
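
The size formula above can be checked with a few lines of Python. This is only a worked calculation of the quoted formula; the function name is made up for illustration, and the 8 bytes per item is the figure stated in the post.

```python
BYTES_PER_ITEM = 8  # stated in the post: each baby step takes 8 bytes


def ht_size_bytes(w: int, htsz: int = 25) -> int:
    """Hash-table size for 2**w baby steps, using the formula quoted above."""
    return (2**htsz + 2**w) * BYTES_PER_ITEM


size = ht_size_bytes(26)
print(size, "bytes =", size / 2**20, "MiB")  # 805306368 bytes = 768.0 MiB
```

The result, 805306368 bytes, matches the chunk size printed when the `_ht.BIN` file is loaded in the GTX 1050 log earlier in the thread.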
full member
Activity: 1050
Merit: 219
Shooters Shoot...
October 10, 2021, 07:00:11 PM
#13
I ran the same test as Etar and JLP, with 16 pubkeys:

Code:
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
04A50FBBB20757CC0E9C41C49DD9DF261646EE7936272F3F68C740C9DA50D42BCD3E48440249D6BC78BC928AA52B1921E9690EBA823CBC7F3AF54B3707E6A73F34
0404A49211C0FE07C9F7C94695996F8826E09545375A3CF9677F2D780A3EB70DE3BD05357CAF8340CB041B1D46C5BB6B88CD9859A083B0804EF63D498B29D31DD1
040B39E3F26AF294502A5BE708BB87AEDD9F895868011E60C1D2ABFCA202CD7A4D1D18283AF49556CF33E1EA71A16B2D0E31EE7179D88BE7F6AA0A7C5498E5D97F
04837A31977A73A630C436E680915934A58B8C76EB9B57A42C3C717689BE8C0493E46726DE04352832790FD1C99D9DDC2EE8A96E50CAD4DCC3AF1BFB82D51F2494
040ECDB6359D41D2FD37628C718DDA9BE30E65801A88A00C3C5BDF36E7EE6ADBBAD71A2A535FCB54D56913E7F37D8103BA33ED6441D019D0922AC363FCC792C29A
0422DD52FCFA3A4384F0AFF199D019E481D335923D8C00BADAD42FFFC80AF8FCF038F139D652842243FC841E7C5B3E477D901F88C5AB0B88EE13D80080E413F2ED
04DB4F1B249406B8BD662F78CBA46F5E90E20FE27FC69D0FBAA2F06E6E50E536695DF83B68FD0F396BB9BFCF6D4FE312F32A43CF3FA1FE0F81DF70C877593B64E0
043BD0330D7381917F8860F1949ACBCCFDC7863422EEE2B6DB7EDD551850196687528B6D2BC0AA7A5855D168B26C6BAF9DDCD04B585D42C7B9913F60421716D37A
04332A02CA42C481EAADB7ADB97DF89033B23EA291FDA809BEA3CE5C3B73B20C49C410D1AD42A9247EB8FF217935C9E28411A08B325FBF28CC2AF8182CE2B5CE38
04513981849DE1A1327DEF34B51F5011C5070603CA22E6D868263CB7C908525F0C19EBA6BD2A8DCF651E4342512EDEACB6EA22DA323A194E25C6A1614ABD259BC0
04D4E6FA664BD75A508C0FF0ED6F2C52DA2ADD7C3F954D9C346D24318DBD2ECFC6805511F46262E10A25F252FD525AF1CBCC46016B6CD0A7705037364309198DA1
0456B468963752924DBF56112633DC57F07C512E3671A16CD7375C58469164599D1E04011D3E9004466C814B144A9BCB7E47D5BACA1B90DA0C4752603781BF5873
04D5BE7C653773CEE06A238020E953CFCD0F22BE2D045C6E5B4388A3F11B4586CBB4B177DFFD111F6A15A453009B568E95798B0227B60D8BEAC98AF671F31B0E2B
04B1985389D8AB680DEDD67BBA7CA781D1A9E6E5974AAD2E70518125BAD5783EB5355F46E927A030DB14CF8D3940C1BED7FB80624B32B349AB5A05226AF15A2228
0455B95BEF84A6045A505D015EF15E136E0A31CC2AA00FA4BCA62E5DF215EE981B3B4D6BCE33718DC6CF59F28B550648D7E8B2796AC36F25FF0C01F8BC42A16FD9

Total time:
Code:
GPU #4 finished
GPU #1 finished
GPU #5 finished
GPU #2 finished
GPU #3 finished
Total time 00:15:11s
cuda finished ok

Press Enter to exit

For comparison, JLP with CPU only took 3 hours and 35 minutes.
full member
Activity: 1050
Merit: 219
Shooters Shoot...
October 10, 2021, 06:20:54 PM
#12
RTX 3070 = 1,000 MKey/s

Code:
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 34 seconds
GPU #1 finished
GPU #3 finished
GPU #5 finished
GPU #2 finished
GPU #4 finished

Default settings. Have not tinkered with settings to see if GPUs can gain any speed.
a.a
member
Activity: 126
Merit: 36
October 10, 2021, 05:54:00 PM
#11
COBRAS, then how about you start testing and benchmarking? Or should others do that for you too?


I suppose that was sarcasm :D

Yeah, something like sarcasm. COBRAS is a lazy lurker. And you can see that his last post does not make any sense.
member
Activity: 107
Merit: 61
October 10, 2021, 05:11:25 PM
#10
How many bytes of memory do you need to store one babystep? Does the hashtable use GPU memory or global RAM?
jr. member
Activity: 38
Merit: 34
October 10, 2021, 04:04:18 PM
#9
OK, and how fast would it be with the interval
000000000....00000000
Ffffffffffff......fffffffffffff

?
sr. member
Activity: 616
Merit: 312
October 10, 2021, 03:12:18 PM
#8
With v1.2 and a single 2080 Ti, I solved the example pubkeys in the range:
start: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
end: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff
in 28 minutes with the parameter -w 26.
Here are the pubkeys searched:
Code:
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
04A50FBBB20757CC0E9C41C49DD9DF261646EE7936272F3F68C740C9DA50D42BCD3E48440249D6BC78BC928AA52B1921E9690EBA823CBC7F3AF54B3707E6A73F34
0404A49211C0FE07C9F7C94695996F8826E09545375A3CF9677F2D780A3EB70DE3BD05357CAF8340CB041B1D46C5BB6B88CD9859A083B0804EF63D498B29D31DD1
040B39E3F26AF294502A5BE708BB87AEDD9F895868011E60C1D2ABFCA202CD7A4D1D18283AF49556CF33E1EA71A16B2D0E31EE7179D88BE7F6AA0A7C5498E5D97F
04837A31977A73A630C436E680915934A58B8C76EB9B57A42C3C717689BE8C0493E46726DE04352832790FD1C99D9DDC2EE8A96E50CAD4DCC3AF1BFB82D51F2494
040ECDB6359D41D2FD37628C718DDA9BE30E65801A88A00C3C5BDF36E7EE6ADBBAD71A2A535FCB54D56913E7F37D8103BA33ED6441D019D0922AC363FCC792C29A
0422DD52FCFA3A4384F0AFF199D019E481D335923D8C00BADAD42FFFC80AF8FCF038F139D652842243FC841E7C5B3E477D901F88C5AB0B88EE13D80080E413F2ED
04DB4F1B249406B8BD662F78CBA46F5E90E20FE27FC69D0FBAA2F06E6E50E536695DF83B68FD0F396BB9BFCF6D4FE312F32A43CF3FA1FE0F81DF70C877593B64E0
043BD0330D7381917F8860F1949ACBCCFDC7863422EEE2B6DB7EDD551850196687528B6D2BC0AA7A5855D168B26C6BAF9DDCD04B585D42C7B9913F60421716D37A
04332A02CA42C481EAADB7ADB97DF89033B23EA291FDA809BEA3CE5C3B73B20C49C410D1AD42A9247EB8FF217935C9E28411A08B325FBF28CC2AF8182CE2B5CE38
04513981849DE1A1327DEF34B51F5011C5070603CA22E6D868263CB7C908525F0C19EBA6BD2A8DCF651E4342512EDEACB6EA22DA323A194E25C6A1614ABD259BC0
04D4E6FA664BD75A508C0FF0ED6F2C52DA2ADD7C3F954D9C346D24318DBD2ECFC6805511F46262E10A25F252FD525AF1CBCC46016B6CD0A7705037364309198DA1
0456B468963752924DBF56112633DC57F07C512E3671A16CD7375C58469164599D1E04011D3E9004466C814B144A9BCB7E47D5BACA1B90DA0C4752603781BF5873
04D5BE7C653773CEE06A238020E953CFCD0F22BE2D045C6E5B4388A3F11B4586CBB4B177DFFD111F6A15A453009B568E95798B0227B60D8BEAC98AF671F31B0E2B
04B1985389D8AB680DEDD67BBA7CA781D1A9E6E5974AAD2E70518125BAD5783EB5355F46E927A030DB14CF8D3940C1BED7FB80624B32B349AB5A05226AF15A2228
0455B95BEF84A6045A505D015EF15E136E0A31CC2AA00FA4BCA62E5DF215EE981B3B4D6BCE33718DC6CF59F28B550648D7E8B2796AC36F25FF0C01F8BC42A16FD9
That is 6 times faster than JLP's original CPU-based BSGS.
a.a
member
Activity: 126
Merit: 36
October 10, 2021, 12:23:47 PM
#7
COBRAS, then how about you start testing and benchmarking? Or should others do that for you too?
member
Activity: 846
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
October 10, 2021, 08:12:12 AM
#6
It is my implementation of the Baby-Step Giant-Step (BSGS) algorithm for Nvidia cards (CUDA and Windows x64 only).
https://github.com/Etayson/BSGS-cuda
Let me know your speed results.

I tested your BSGS on a GTX 1660s; the speed was significantly slower than JeanLucPons' Kangaroo:
BSGS-cuda => 330 Mkey/s
Kangaroo 2.2 => 450 Mkey/s

We need real tests of how much time each program needs to find an example private key, to see which code finds it faster.
newbie
Activity: 25
Merit: 14
October 10, 2021, 06:35:50 AM
#5
It is my implementation of the Baby-Step Giant-Step (BSGS) algorithm for Nvidia cards (CUDA and Windows x64 only).
https://github.com/Etayson/BSGS-cuda
Let me know your speed results.

I tested your BSGS on a GTX 1660s; the speed was significantly slower than JeanLucPons' Kangaroo:
BSGS-cuda => 330 Mkey/s
Kangaroo 2.2 => 450 Mkey/s
member
Activity: 846
Merit: 22
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
October 10, 2021, 01:12:57 AM
#4
It is my implementation of the Baby-Step Giant-Step (BSGS) algorithm for Nvidia cards (CUDA and Windows x64 only).
https://github.com/Etayson/BSGS-cuda
Let me know your speed results.


Great. I think your project will be more usable than JLP's Kangaroo.

Tuning JLP's Kangaroo is a real pain!
full member
Activity: 1050
Merit: 219
Shooters Shoot...
October 09, 2021, 04:40:45 PM
#3
awesome. will let you know speed on various GPUs once I run it
jr. member
Activity: 38
Merit: 34
October 09, 2021, 03:15:17 PM
#2
It seems that you know a bit about the BSGS algorithm and x86 assembler...

So I would like to ask you one question. I have already modified Jean's BSGS for the curve "r1" (BTC uses k1).

What is the meaning of the start value? Jean even needs start and stop values for k1 and k2.

Must the searched k lie in this interval?
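
The role of a start value in a baby-step giant-step search can be illustrated with a toy example. The sketch below is not Jean's or Etar's code: it works in the multiplicative group modulo a prime instead of on an elliptic curve, and the function name and parameters are made up for illustration. It needs Python 3.8+ for negative exponents in `pow`.

```python
from math import isqrt


def bsgs_in_range(g: int, h: int, p: int, start: int, width: int):
    """Find k in [start, start + width) with pow(g, k, p) == h, else None."""
    m = isqrt(width - 1) + 1  # table size, chosen so that m * m >= width

    # Baby steps: remember the exponent j for each g^j, j = 0 .. m-1.
    baby = {}
    e = 1
    for j in range(m):
        baby.setdefault(e, j)
        e = e * g % p

    # Shift the target by the start value: h * g^(-start) = g^(k - start).
    target = h * pow(g, -start, p) % p
    giant = pow(g, -m, p)  # one giant step lowers the exponent by m

    for i in range(m):
        j = baby.get(target)
        if j is not None:
            k = start + i * m + j
            return k if k < start + width else None
        target = target * giant % p
    return None


p = 2**61 - 1  # a Mersenne prime, used here only as a toy modulus
secret = 10**9 + 777
h = pow(3, secret, p)
print(bsgs_in_range(3, h, p, start=10**9, width=1024))
```

In this sketch the start value only shifts the target before the search, so the loop examines exponents from `start` up to `start + m*m - 1` and nothing else. A key outside that window is not reached unless the group order is small enough for the exponents to wrap around.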
sr. member
Activity: 616
Merit: 312
October 09, 2021, 01:46:55 PM
#1
It is my implementation of the Baby-Step Giant-Step (BSGS) algorithm for Nvidia cards (CUDA and Windows x64 only).
https://github.com/Etayson/BSGS-cuda
Let me know your speed results.