Pages:
Author

Topic: BSGS solver for cuda - page 4. (Read 3975 times)

sr. member
Activity: 652
Merit: 316
November 05, 2021, 02:10:45 PM
3 * RTX 3090 (GPU Ram only use 5 GB ) ...but RTX3090 have 24 GB  DDR6
-snip-
Read what i write above and readme.md file on github.
You need increase -w parameter and set -htsz dependency of -w

The main task is to fill the GPU memory as much as possible with the help of the -w parameter(and -htsz)
And then fill free GPU  memory with the -p parameter -b (not more then x2 of SM) and -t (not more then 512)

Presettings say that for your RTX 3090 good config is:
-t 512 -b 328 -p 530 -w 31 -htsz 29
you fill 20436.750 MB from free 20450.000 but you need around 58GB of host RAM to generate all arrays.
with saved arrays you need much less memory to launch app.
In this case you perfomance will be around 2^62 per card.
sr. member
Activity: 652
Merit: 316
November 05, 2021, 01:48:44 PM
Hi,

Thanks for the great program Etar. I was wondering if there is a way to save work? or if BSGS does not work like that? Or with the hash table, does it continue from where it left off already?

Thanks
released v1.7.2
Current state is saved to file currentwork.txt(file name can`t be changed) every 180s (by default) but you can change this parametr with -wt
If app crash or you stop app, you can start working from the last saved state. if the launch configuration has not been changed.
set parametr -wl in your bat file with file name of state and app will start from this state.
Also added presettings for each card(just showing) but you can try to use this presetings to fill full your GPU memory.
jr. member
Activity: 77
Merit: 7
November 01, 2021, 05:09:28 PM
Hi,

Thanks for the great program Etar. I was wondering if there is a way to save work? or if BSGS does not work like that? Or with the hash table, does it continue from where it left off already?

Thanks
jr. member
Activity: 82
Merit: 8
November 01, 2021, 02:23:26 PM
3 * RTX 3090 (GPU Ram only use 5 GB ) ...but RTX3090 have 24 GB  DDR6
#70  My Computer took 53 seconds  to solve .....


Code:

D:\BTC\cuda_BSGS>bsgscudaHT_1_7_1.exe -d 0,1,2 -t 512 -b 68 -p 256 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -pke   0x0000000000000000000000000000000000000000000000400000000000000000 -w 29 -htsz 28
Used GPU devices #0,1,2
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Range end: 0x0000000000000000000000000000000000000000000000400000000000000000
Items number set to 2^29
HT size number set to 2^28
APP VERSION: 1.7.1
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce GT 1030(2047Mb)
Device have: MP:3 Cores+192
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000040000000
GiantSUBpubkey: 02e1efb9cd05adc63bcce10831d9538c479cf1d05fefdd08b2448d70422ede454c
*******************************
Total GPU Memory Need: 4912.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_536870912_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 4912.000Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 4912.000Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 4912.000Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
  END RANGE= 0000000000000000000000000000000000000000000000400000000000000000
WIDTH RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
GPU#2 Cnt:0000000000000000000000000000000000000000000000000088000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#1 Cnt:0000000000000000000000000000000000000000000000000044000000000001
GPU#1 Cnt:000000000000000000000000000000000000000000000000aacc000000000001 1818MKey/s x536870912 2^30.83 x2^30=2^60.83
GPU#0 Cnt:000000000000000000000000000000000000000000000000b5b0000000000001 1936MKey/s x536870912 2^30.92 x2^30=2^60.92
GPU#2 Cnt:000000000000000000000000000000000000000000000000bbcc000000000001 1990MKey/s x536870912 2^30.96 x2^30=2^60.96
GPU#1 Cnt:0000000000000000000000000000000000000000000000017864000000000001 2189MKey/s x536870912 2^31.10 x2^30=2^61.10
.....
.....
.....
[Delete]
.....
.....
.....
***********GPU#1************
KEY[1]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
****************************
Found in 52 seconds
GPU#1 job finished
GPU#2 job finished
GPU#0 job finished
Working time 00:00:53s
Total time 00:01:40s
GPU#1 thread finished
GPU#2 thread finished
GPU#0 thread finished
cuda finished ok

Press Enter to exit
sr. member
Activity: 652
Merit: 316
October 29, 2021, 07:38:42 AM
Quote
with bsgscuda and single 2080ti found in 1s.

Good....
Use my NVIDIA GeForce RTX 3090 Founders Edition
#65  only spent 38 seconds to solved
I have 3  RTX 3090 Founders Edition graphics cards 
-snip-
you need to tune -w params (D:\BTC\cuda_BSBG>SET items_size=26) set at least 30 and -htsz  29
and you will solve this puzzle #64 16 time faster
jr. member
Activity: 82
Merit: 8
October 29, 2021, 05:51:42 AM
Quote
with bsgscuda and single 2080ti found in 1s.

Good....
Use my NVIDIA GeForce RTX 3090 Founders Edition
#65  only spent 38 seconds to solved
I have 3  RTX 3090 Founders Edition graphics cards 

Code:
D:\BTC\cuda_BSBG>SET pub=30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc
D:\BTC\cuda_BSBG>SET rangestart=0x0000000000000000000000000000000000000000000000010000000000000000
D:\BTC\cuda_BSBG>SET rangeend=  0x0000000000000000000000000000000000000000000000020000000000000000
D:\BTC\cuda_BSBG>SET thread_size=512
D:\BTC\cuda_BSBG>SET block_size=68
D:\BTC\cuda_BSBG>SET pparam_size=256
D:\BTC\cuda_BSBG>SET items_size=26
D:\BTC\cuda_BSBG>bsgscudaHT_1_7_1.exe -d 0 -t 512 -b 68 -p 256 -pb 30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc -pk 0x0000000000000000000000000000000000000000000000010000000000000000 -pke   0x0000000000000000000000000000000000000000000000020000000000000000 -w 26
Used GPU devices #0
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc
Range begin: 0x0000000000000000000000000000000000000000000000010000000000000000
Range end: 0x0000000000000000000000000000000000000000000000020000000000000000
Items number set to 2^26
APP VERSION: 1.7.1
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce GT 1030(2047Mb)
Device have: MP:3 Cores+192
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000008000000
GiantSUBpubkey: 03a94c6524bd40d2bbdac85c056236a79da78bc61fd5bdec9d2bf26bd84b2438e8
*******************************
Total GPU Memory Need: 1328.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_33554432_htGPU.BIN
[0] chunk:536870912b
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_67108864_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
GPU count #1
GPU #0 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 1328.000Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_33554432_htCPU.BIN
[0] chunk:805306368b
START RANGE= 0000000000000000000000000000000000000000000000010000000000000000
  END RANGE= 0000000000000000000000000000000000000000000000020000000000000000
WIDTH RANGE= 0000000000000000000000000000000000000000000000010000000000000000
SUBpoint= (3322d401243c4e2582a2147c104d6ecbf774d163db0f5e5313b7e0e742d0e6bd, a918f8681699b10a404fe643b2250648d7fa09c15d78c509db0c5d1593d7498f)
FINDpubkey: 0230210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1b
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000008b3000000000001 2225MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:00000000000000000000000000000000000000000000000011b2800000000001 2302MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:0000000000000000000000000000000000000000000000001a98800000000001 2271MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:000000000000000000000000000000000000000000000000234b800000000001 2222MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000002c07000000000001 2234MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000034ba000000000001 2223MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000003da8800000000001 2279MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:0000000000000000000000000000000000000000000000004697000000000001 2279MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:0000000000000000000000000000000000000000000000004f39000000000001 2206MKey/s x67108864 2^31.11 x2^27=2^58.11
GPU#0 Cnt:00000000000000000000000000000000000000000000000057fd000000000001 2238MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000060c1000000000001 2237MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000069c0800000000001 2297MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:00000000000000000000000000000000000000000000000072c0000000000001 2302MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:0000000000000000000000000000000000000000000000007b84000000000001 2238MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:0000000000000000000000000000000000000000000000008404000000000001 2169MKey/s x67108864 2^31.08 x2^27=2^58.08
GPU#0 Cnt:0000000000000000000000000000000000000000000000008d0c000000000001 2305MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:00000000000000000000000000000000000000000000000095bf000000000001 2219MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000009e61000000000001 2205MKey/s x67108864 2^31.11 x2^27=2^58.11
GPU#0 Cnt:000000000000000000000000000000000000000000000000a714000000000001 2224MKey/s x67108864 2^31.12 x2^27=2^58.12
***********GPU#0************
KEY[1]: 0x000000000000000000000000000000000000000000000001a838b13505b26867
   Pub: 0230210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1b
****************************
Found in 38 seconds
GPU#0 job finished
Working time 00:00:38s
Total time 00:00:43s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
sr. member
Activity: 652
Merit: 316
October 29, 2021, 12:20:18 AM
-snip-
Code:
============== KEYFOUND ============== 1
BSGS FOUND PrivateKey  0x6abe1f9b67e114
======================================
Start 1  :  2021-10-29 11:34:19
Start 1_0:  2021-10-29 11:34:45
END   1  :  2021-10-29 11:34:59

1 CPU speed:
BSGS Check 0x38D7EA4C68000  key / second


with bsgscuda and single 2080ti found in 1s.
Code:
FINDpubkey: 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
***********GPU#0************
KEY[1]: 0x000000000000000000000000000000000000000000000000006abe1f9b67e114
   Pub: 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:01s
bsgs cuda can be used in a rig, for example with 6-8 cards.
jr. member
Activity: 82
Merit: 8
October 28, 2021, 10:42:12 PM
based on above table you can increase speed if you will utilize both bloom+bp https://github.com/iceland2k14/bsgs
so CPU cores are less powerful than cuda and i was thinking [not sure possible or not] if we load all bp in RAM and use some bloom in GPU memory perhaps their will be some dramatic speed boost

I had try iceland2k14's BSGS
Intel i7-7800X  + 24 GB  DDR4-2400

Code:
D:\python\BSGS_ice>python bsgs_dll_secp256k1.py -p 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963 -b bPfile.bin -bl bloomfile.bin -n 1000000000000000 -keyspace 40000000000000:80000000000000

[+] Starting Program : BSGS mode     Version [ 13072021 ]
[+] Search Started for the Public key:  0485a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f9630eb400323654cec63999b56f4ba44e8b21ab92d9d697fabe4666df3678585669
[+] Search Mode: Sequential search in the given range
[+] Reading bloom filter from file complete in : 26.70087 sec
[+] Reading Baby table from file complete in : 0.03731 sec
[+] seq value: 1000000000000000    m value : 3999999999
[+] Search Range: 0x40000000000000  to  0x80000000000000
                                                                [+] k1: 0x40000000000000
PVK not found. 1000.00000 Trillion scanned in 1.14 sec. New range [+] k1: 0x438d7ea4c68001
PVK not found. 1000.00000 Trillion scanned in 1.18 sec. New range [+] k1: 0x471afd498d0002
PVK not found. 1000.00000 Trillion scanned in 1.03 sec. New range [+] k1: 0x4aa87bee538003
PVK not found. 1000.00000 Trillion scanned in 1.05 sec. New range [+] k1: 0x4e35fa931a0004
PVK not found. 1000.00000 Trillion scanned in 0.99 sec. New range [+] k1: 0x51c37937e08005
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5550f7dca70006
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x58de76816d8007
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5c6bf526340008
PVK not found. 1000.00000 Trillion scanned in 1.09 sec. New range [+] k1: 0x5ff973cafa8009
PVK not found. 1000.00000 Trillion scanned in 0.93 sec. New range [+] k1: 0x6386f26fc1000a
PVK not found. 1000.00000 Trillion scanned in 0.96 sec. New range [+] k1: 0x6714711487800b
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x6aa1efb94e000c
============== KEYFOUND ============== 1
BSGS FOUND PrivateKey  0x6abe1f9b67e114
======================================
Start 1  :  2021-10-29 11:34:19
Start 1_0:  2021-10-29 11:34:45
END   1  :  2021-10-29 11:34:59

1 CPU speed:
BSGS Check 0x38D7EA4C68000  key / second

sr. member
Activity: 652
Merit: 316
October 24, 2021, 02:05:38 PM

I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G  
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps




it is universal arrays for any ranges. You not need to recompute array if you change range or pubkeys.
size of giant array is equil to thread number * block number * pparam
for 2080ti i use 512 thread 138 blocks and pparam = 480
so totaly i have 33914880 doubled giant values
So each cuda kernel call calculate 33914880 * 2(due to +y/-y in batch additions) giat steps
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 01:57:39 PM

I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G 
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps



sr. member
Activity: 652
Merit: 316
October 24, 2021, 01:46:12 PM
Also used double giant step, so we substuct double giant value from pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.
I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 01:33:05 PM
Also used double giant step, so we substuct double giant value from pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.
sr. member
Activity: 652
Merit: 316
October 24, 2021, 12:57:01 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here ?
No. used mult mod P
i use optimized square mod P in PB https://github.com/Etayson/BSGS-cuda/blob/e41fff517b8de153b6bf9846ee7abb47524fe43e/lib/Curve64.pb#L2161
but need buffer 512bytes for this, so i did not transfer it to cuda ptx
Also used double giant step, so we substuct double giant value from pub.
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 12:52:59 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here ?
sr. member
Activity: 652
Merit: 316
October 24, 2021, 12:44:48 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 12:40:09 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
with group inverse, calculate inverse for batch and use it in addition. Sorry if i misunderstud question.

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.

If you have to compute A + B:




If you have to compute A - B, since -B = (xB, n-yB)

1/(xb-xa) is the same as in A + B.
Because  for example P+2G and P-2G use the same inverse.

 
sr. member
Activity: 652
Merit: 316
October 24, 2021, 12:31:37 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
with group inverse, calculate inverse for batch and use it in addition. Sorry if i misunderstud question.
In the same way as in bitcrack https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L642
and then https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L656
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 12:18:43 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
legendary
Activity: 1948
Merit: 2097
October 24, 2021, 12:16:14 PM
As you already explained few days ago, cuda BSGS is searching the keys one after another, not in parallel. I know that, I read carefully Wink. With my previous post I meant that even if it would run in parallel it would slow down as described. So read my post in conjunctive and if someone would modify it to process in parallel I expect that behaviours.

BSGS works in this way:

suppose we know that P = k*G, and the private key k is in [1,...., 2^60] range

1) precompute 2^29 baby steps (they are simple public keys): 1*G, 2*G, 3*G, ...., 2^29 * G

2) split the public key P in many other public keys (they are called giant steps, 2^30 public keys):
P, P - 1*(2^30*G), P - 2*(2^30*G), P - 3*(2^30)G, ..., P - (2^30 - 1)*(2^30*G)

3) for each giant steps, you check if it lies in [1, ..., 2^30] range (if it is equal to a 'baby step' public key)

4) if    P - a*(2^30*G) = +-b*G   then  P = (a*2^30 +- b)*G   then the private key is k = a*2^30 +- b

If you want to search 2 public keys, P1 and P2, you can use the same baby steps, but you need to generate 2^31 giant steps instead of 2^30.
sr. member
Activity: 652
Merit: 316
October 24, 2021, 12:02:06 PM
Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like to shift the public key of 2^59 steps, and search in the interval [1,..., 2^59] instead of [1,...., 2^60] exploiting the symmetrie.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.
Pages:
Jump to: