BSGS solver for cuda - page 5. | Bitcointalksearch.org

arulbero

legendary

Activity: 1968

Merit: 2130

Do you use the negation map to speed up the algorithm ? --> pag. 8-9 https://eprint.iacr.org/2015/605.pdf

You need to compute only:

sqrt(n) / 2 baby steps : for example, n = 2^60 -> 2^29 baby steps

sqrt(n) giant steps : for example, n = 2^60 -> 2^30 giant steps

It is like to shift the public key of 2^59 steps, and search in the interval [1,..., 2^59] instead of [1,...., 2^60] exploiting the symmetrie.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

Etar

sr. member

Activity: 654

Merit: 316

Quote from: ssxb on October 23, 2021, 10:30:22 PM

just a quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?

Bsgs will never faster then kangaroo on big ranges.
Example puzzle #80, width 79bit
Kanagaroo need in average 2^40.5 op to solve key
2080ti can serach DPs with speed 1500Mkey/s = 2^30.48
So you need in average 2^10 second to find key.

Bsgscuda can search 2^61/s, so 2^79 / 2^61 = 2^18 second to check whole 79bit range. Ofcourse key can be close to the begining and you can found very fast but it is not 100%.
we can devide pub to 64 pubkey and try to find one of them in range 1..3ffffffffffffffffff fortunately, the key we need is the first one on the list but it was just lucky.
by the way privkey will be 3a869719b73046d6b46 = 2^73.87
so we need 2^73.87 / 2^61/s = 2^12.87 seconds to find key. It is 7 more timer then need for kangaroo

ssxb

jr. member

Activity: 81

Merit: 2

Quote from: Etar on October 23, 2021, 02:16:04 PM

Quote from: mamuu on October 23, 2021, 02:13:47 PM

Hi,
Please , can you write a little tutorial on usage?

did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t (all params multiple of 2)

some great work from your side , appreciate

just a quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?

Etar

sr. member

Activity: 654

Merit: 316

Quote from: mamuu on October 23, 2021, 02:13:47 PM

Hi,
Please , can you write a little tutorial on usage?

did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t (all params multiple of 2)

mamuu

member

Activity: 73

Merit: 19

Quote from: Etar on October 23, 2021, 12:13:51 PM

Code:

KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s

double giant step defeated.
v1.7.1 released with maximum perfomance.

Hi,
Please , can you write a little tutorial on usage?

Etar

sr. member

Activity: 654

Merit: 316

Code:

KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s

double giant step defeated.
v1.7.1 released with maximum perfomance.

math09183

member

Activity: 170

Merit: 58

Quote from: Etar on October 23, 2021, 02:01:46 AM

Quote from: math09183 on October 23, 2021, 01:53:32 AM

LOL

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable Grin

Most of code have bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?

Relax, haters gonna hate Grin

Anyway, good job, I appreciate your work. Sh*t happens.

Etar

sr. member

Activity: 654

Merit: 316

Quote from: _Counselor on October 22, 2021, 09:09:15 AM

What the kind of problem was?
I think you exploited symmetry to double size of giant steps?
Why it did not find some keys?

if we talk about doubled GS (Giant Step)
For ex, option -p 8 -w 4 mean baby array 2^4 =16
each giant step (doubled) is 16*2=32
let say we should find pubkey with privkey=32
program substruct GS from public key and look to a Baby array to check overlap.
but if you substruct 32-32 then you get 64 and this value is not present in the baby array.
But if we used the usual GS
32-16=16 and 16 is present in baby array - pubkey solved.

So with doubled GS not finded every (baby array size)*2 keys.

Etar

sr. member

Activity: 654

Merit: 316

Quote from: math09183 on October 23, 2021, 01:53:32 AM

LOL

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable Grin

Most of code have bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?

math09183

member

Activity: 170

Merit: 58

Quote from: Etar on October 21, 2021, 07:47:06 AM

STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.

LOL

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable Grin

_Counselor

member

Activity: 111

Merit: 61

Quote from: Etar on October 22, 2021, 07:55:03 AM

The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I run several tests with different small -w -p options with 1024 pubkeys file and all keys are founded.
True, now the total indicator is 2 times less, due to the fact that the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
release 1.7.0 available on github.

What the kind of problem was?
I think you exploited symmetry to double size of giant steps?
Why it did not find some keys?

Etar

sr. member

Activity: 654

Merit: 316

The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I run several tests with different small -w -p options with 1024 pubkeys file and all keys are founded.
True, now the total indicator is 2 times less, due to the fact that the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
release 1.7.0 available on github.

studyroom1

jr. member

Activity: 40

Merit: 7

Quote from: Etar on October 21, 2021, 07:47:06 AM

STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.

oh when can we see next update Sad

Etar

sr. member

Activity: 654

Merit: 316

STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.

Etar

sr. member

Activity: 654

Merit: 316

Quote from: lostrelic on October 21, 2021, 05:22:37 AM

Hi Etar thanks for your continuing support for this program.
Quick question the fastest I get is 2^60 if I try to get 2^61 it sticks on add baby points to hashtable? I’ve got a 3080 16gb ram and 500gb ssd any ideas on settings to try? or how long should I wait for it to load?
Thanks Relic

Screen what i post in post above it is the latest verion and not yet published(tested).
By the way v1.6.0 shoud works fine for you but in little less perfomance, at v1.6.0 2080ti speed 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
If you have 16gb gpu ram then try -w 31 and -htsz 29
In any case 3080 shoud have better perfomance then 2080ti even with the same size of baby array that i use, try set -t 512 -b 136 -p 512 -w 30 -htsz 28

P.s. Maybe you stick on add baby points to hashtable because have little memory on PC to generate HT in RAM. I generate HT -w 30 on PC that have 32GB of ram.
For -w 31 you need 64gb of ram to creat all arrays.
To launch solver you will need less more memory with already generated arrays.

lostrelic

jr. member

Activity: 32

Merit: 1

Hi Etar thanks for your continuing support for this program.
Quick question the fastest I get is 2^60 if I try to get 2^61 it sticks on add baby points to hashtable? I’ve got a 3080 16gb ram and 500gb ssd any ideas on settings to try? or how long should I wait for it to load?
Thanks Relic

Etar

sr. member

Activity: 654

Merit: 316

Quote from: studyroom1 on October 21, 2021, 03:58:37 AM

impressive Etar . i have question . lets say if you have 1m keys in file and you load in bsgscuda and set scan range only to 64, now my question is if gpu finished whole 64 range scan for key1 than gpu will abandoned search of key1 and move to key2?

your program is doing that or you will impalement this. right?

Use -pk to set start range and -pke to set endrange. if pubkey will not find in this range then seraching will be switched to next pubkey.

studyroom1

jr. member

Activity: 40

Merit: 7

Quote from: Etar on October 20, 2021, 10:54:04 AM

With last ptx optimisation (forgot about simmetry in batch point addition)
solve 16 pubkeys from JLP in 58s

Code:

...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit

Seems like it is the maximum that I can achieve in single 2080ti.
Ofcourse JLP would probably have done it even faster

impressive Etar . i have question . lets say if you have 1m keys in file and you load in bsgscuda and set scan range only to 64, now my question is if gpu finished whole 64 range scan for key1 than gpu will abandoned search of key1 and move to key2?

your program is doing that or you will impalement this. right?

Etar

sr. member

Activity: 654

Merit: 316

With last ptx optimisation (forgot about simmetry in batch point addition)
solve 16 pubkeys from JLP in 58s

Code:

...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit

Seems like it is the maximum that I can achieve in single 2080ti.
Ofcourse JLP would probably have done it even faster

ssxb

jr. member

Activity: 81

Merit: 2

Quote from: bigvito19 on October 20, 2021, 08:28:20 AM

I'm testing with the divisor keys on a smaller range, but its not solving the key with keyhunt. does it work the same with xpoint mode?

you need to adjust K and N as smaller range will be not solved if power of K and N is more than range count or if number of keys will be more or less than power of your hardware.

remember tweak is seriously needed while keeping K and N according to your hardware power as well as adjust K and N according to number of keys you will load in software ~ do the test again and again and again

Topic: BSGS solver for cuda - page 5. (Read 3980 times)