Pages:
Author

Topic: BSGS solver for cuda - page 5. (Read 3975 times)

legendary
Activity: 1948
Merit: 2097
October 24, 2021, 11:32:04 AM
Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like to shift the public key of 2^59 steps, and search in the interval [1,..., 2^59] instead of [1,...., 2^60] exploiting the symmetrie.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.
sr. member
Activity: 652
Merit: 316
October 24, 2021, 02:55:06 AM

just a  quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?


Bsgs will never faster then kangaroo on big ranges.
Example puzzle #80, width 79bit
Kanagaroo need in average 2^40.5 op to solve key
2080ti can serach DPs with speed 1500Mkey/s = 2^30.48
So you need in average 2^10 second to find key.

Bsgscuda can search 2^61/s, so 2^79 / 2^61 = 2^18 second to check whole 79bit range. Ofcourse key can be close to the begining and you can found very fast but it is not 100%.
we can devide pub to 64 pubkey and try to find one of them in range 1..3ffffffffffffffffff fortunately, the key we need is the first one on the list but it was just lucky.
by the way privkey will be 3a869719b73046d6b46 = 2^73.87
so we need  2^73.87 / 2^61/s = 2^12.87 seconds to find key. It is 7 more timer then need for kangaroo
jr. member
Activity: 81
Merit: 2
October 23, 2021, 10:30:22 PM
Hi,
Please , can you write a little tutorial on usage?
did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t  (all params multiple of 2)


some great work from your side  , appreciate

just a  quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?
sr. member
Activity: 652
Merit: 316
October 23, 2021, 02:16:04 PM
Hi,
Please , can you write a little tutorial on usage?
did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t  (all params multiple of 2)
member
Activity: 73
Merit: 19
October 23, 2021, 02:13:47 PM
Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
double giant step defeated.
v1.7.1 released with maximum perfomance.

Hi,
Please , can you write a little tutorial on usage?
sr. member
Activity: 652
Merit: 316
October 23, 2021, 12:13:51 PM
Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
double giant step defeated.
v1.7.1 released with maximum perfomance.
member
Activity: 170
Merit: 58
October 23, 2021, 10:17:08 AM

LOL  Cheesy

That's what happens when you use  ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  Grin

Most of code have bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?

Relax, haters gonna hate  Grin

Anyway, good job, I appreciate your work. Sh*t happens.
sr. member
Activity: 652
Merit: 316
October 23, 2021, 02:02:51 AM
What the kind of problem was?
I think you exploited symmetry to double size of giant steps?
Why it did not find some keys?

if we talk about doubled GS (Giant Step)
For ex, option -p 8 -w 4 mean baby array 2^4 =16
each giant step (doubled) is 16*2=32
let say we should find pubkey with privkey=32
program substruct GS from public key and look to a Baby array to check overlap.
but if you substruct 32-32 then you get 64 and this value is not present in the baby array.
But if we used the usual GS
32-16=16 and 16 is present in baby array - pubkey solved.

So with doubled GS not finded every (baby array size)*2 keys.
sr. member
Activity: 652
Merit: 316
October 23, 2021, 02:01:46 AM

LOL  Cheesy

That's what happens when you use  ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  Grin

Most of code have bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?
member
Activity: 170
Merit: 58
October 23, 2021, 01:53:32 AM
STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.

LOL  Cheesy

That's what happens when you use  ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  Grin
member
Activity: 110
Merit: 61
October 22, 2021, 09:09:15 AM
The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I run several tests with different small -w -p options with 1024 pubkeys file and all keys are founded.
True, now the total indicator is 2 times less, due to the fact that the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
release 1.7.0 available on github.
What the kind of problem was?
I think you exploited symmetry to double size of giant steps?
Why it did not find some keys?
sr. member
Activity: 652
Merit: 316
October 22, 2021, 07:55:03 AM
The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I run several tests with different small -w -p options with 1024 pubkeys file and all keys are founded.
True, now the total indicator is 2 times less, due to the fact that the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
release 1.7.0 available on github.
jr. member
Activity: 40
Merit: 7
October 21, 2021, 08:58:10 AM
STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.

oh when can we see next update Sad
sr. member
Activity: 652
Merit: 316
October 21, 2021, 07:47:06 AM
STOP using BSGScuda, i found a bug that not all public keys found. I can`t say now from which version this bug apear, so don`t use programm while i am do not solve issue.
sr. member
Activity: 652
Merit: 316
October 21, 2021, 06:46:14 AM
Hi Etar thanks for your continuing support for this program.
Quick question the fastest I get is 2^60 if I try to get 2^61 it sticks on add baby points to hashtable? I’ve got a 3080 16gb ram and 500gb ssd any ideas on settings to try? or how long should I wait for it to load?
Thanks Relic
Screen what i post in post above it is the latest verion and not yet published(tested).
By the way v1.6.0 shoud works fine for you but in little less perfomance, at v1.6.0 2080ti speed 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
If you have 16gb gpu ram then try -w 31 and -htsz 29
In any case 3080 shoud have better perfomance then 2080ti even with the same size of baby array that i use, try set -t 512 -b 136 -p 512 -w 30 -htsz 28

P.s. Maybe you stick on add baby points to hashtable because have little memory on PC to generate HT in RAM. I generate HT -w 30 on PC that have 32GB of ram.
For -w 31 you need 64gb of ram to creat all arrays.
To launch solver you will need less more memory with already generated arrays.
jr. member
Activity: 32
Merit: 1
October 21, 2021, 05:22:37 AM
Hi Etar thanks for your continuing support for this program.
Quick question the fastest I get is 2^60 if I try to get 2^61 it sticks on add baby points to hashtable? I’ve got a 3080 16gb ram and 500gb ssd any ideas on settings to try? or how long should I wait for it to load?
Thanks Relic
sr. member
Activity: 652
Merit: 316
October 21, 2021, 05:11:16 AM
impressive Etar . i have question . lets say if you have 1m keys in file and you load in bsgscuda and set scan range only to 64, now my question is if gpu finished whole 64 range scan for key1 than gpu will abandoned  search of key1 and move to key2?

your program is doing that or you will impalement this. right?
Use -pk to set start range and -pke to set endrange. if pubkey will not find in this range then seraching will be switched to next pubkey.
jr. member
Activity: 40
Merit: 7
October 21, 2021, 03:58:37 AM
With last ptx optimisation (forgot about simmetry in batch point addition)
solve 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
Seems like it is the maximum that I can achieve in single 2080ti.
Ofcourse JLP would probably have done it even faster Smiley



impressive Etar . i have question . lets say if you have 1m keys in file and you load in bsgscuda and set scan range only to 64, now my question is if gpu finished whole 64 range scan for key1 than gpu will abandoned  search of key1 and move to key2?

your program is doing that or you will impalement this. right?
sr. member
Activity: 652
Merit: 316
October 20, 2021, 10:54:04 AM
With last ptx optimisation (forgot about simmetry in batch point addition)
solve 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
Seems like it is the maximum that I can achieve in single 2080ti.
Ofcourse JLP would probably have done it even faster Smiley
jr. member
Activity: 81
Merit: 2
October 20, 2021, 08:46:08 AM
I'm testing with the divisor keys on a smaller range, but its not solving the key with keyhunt. does it work the same with xpoint mode?

you need to adjust K and N as smaller range will be not solved if power of K and N is more than range count or if number of keys will be more or less than power of your hardware.

remember tweak is seriously needed while keeping K and N according to your hardware power as well as adjust K and N according to number of keys you will load in software ~ do the test again and again and again
Pages:
Jump to: