VanitySearch (Yet another address prefix finder) - page 56.

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 20, 2019, 04:00:00 AM

A new release of VanitySearch (1.9) is out:

Code:

Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'

The new version is slower on my PC (132 MKeys/s versus 162 MKeys/s).
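To put those two figures in proportion, the relative slowdown can be worked out as below. This is only a small arithmetic sketch; the helper name `relative_slowdown` is made up for illustration and is not part of VanitySearch.

```cpp
#include <cassert>

// Fraction of throughput lost when going from `before` to `after`.
// Both values must use the same unit (e.g. MKeys/s).
// Hypothetical helper, not taken from the VanitySearch sources.
double relative_slowdown(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before;
}
```

With the numbers reported here, `relative_slowdown(162.0, 132.0)` is 30/162, i.e. a drop of roughly 18.5%.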

Jean_Luc

sr. member

Activity: 462

Merit: 701

Hello,

A new release of VanitySearch (1.9) is out:

Code:

Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'

Windows binaries: https://github.com/JeanLucPons/VanitySearch/releases/tag/1.9
Linux binaries: http://zelda38.free.fr/VanitySearch/ (Experimental)

Thanks for testing it!

TryNinja

legendary

Activity: 2758

Merit: 6830

Quote from: Lisa Finn on March 19, 2019, 01:11:55 PM

Is this really legal in Asian countries like India?

Why would a Bitcoin address generator be illegal anywhere?

Lisa Finn

newbie

Activity: 4

Merit: 0

Quote from: Jean_Luc on February 20, 2019, 09:31:36 AM

Hello,

I would like to present a new Bitcoin address prefix finder called VanitySearch. It is very similar to Vanitygen.
The main differences from Vanitygen are that VanitySearch does not use the heavy OpenSSL for CPU calculation, and that the kernel is written in CUDA in order to take full advantage of inline PTX assembly.
On my Intel Core i7-4770, VanitySearch runs ~4 times faster than vanitygen64 (1.32 MKey/s -> 5.27 MKey/s).
On my GeForce GTX 645, VanitySearch runs ~1.5 times faster than oclvanitygen (9.26 MKey/s -> 14.548 MKey/s).
If you want to compare VanitySearch and Vanitygen results, use the -u option to search for uncompressed addresses.
VanitySearch may not compute a good grid size for your GPU, so try several values with the -g option to find the best performance.
Using compressed addresses is roughly 20% faster.

VanitySearch is available from https://github.com/JeanLucPons/VanitySearch

There are still lots of improvements to make.
Feel free to test it and to submit issues.

Thanks.
Sorry for my bad English.
Jean-Luc

Is this really legal in Asian countries like India?

Jean_Luc

sr. member

Activity: 462

Merit: 701

Linux binaries are available for download here (experimental).
They are compiled with CUDA SDK 10.
Thanks for testing them ;)

http://zelda38.free.fr/VanitySearch/

Jean_Luc

sr. member

Activity: 462

Merit: 701

Hello,

Quote from: stortz on March 17, 2019, 05:43:52 PM

it ran, but just closed after finding it
did it generate the private keys into a file?
I am confused

To write the key to a file, use the -o option.

Code:

VanitySearch -stop -gpu -o key.txt 1stortz

Many thanks to stivensons for the report.

stivensons

jr. member

Activity: 82

Merit: 1

Quote from: Jean_Luc on March 17, 2019, 01:28:46 PM

Quote from: stivensons on March 17, 2019, 01:04:15 PM

If you post a Windows release, I can test it too.

You can test with the release you have.
You can try:

Code:

VanitySearch -gpuId 0 -check 
VanitySearch -gpuId 6 -check (On the 3GB)

Thanks

Tomorrow I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.

CUDA 10:

Code:

G:\vanitysearch>vanitysearch   -gpuId 0 -check
GetBase10() Results OK
Add() Results OK : 567.189 MegaAdd/sec
Mult() Results OK : 38.169 MegaMult/sec
Div() Results OK : 4.410 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 281.352 KiloInv/sec
IntGroup.ModInv() : 8.365 MegaInv/sec
ModMulK1() : 10.770 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 1060 6GB (10x128 cores) Grid(80x128)
Seed: 1853432973
296.742 MegaKey/sec
ComputeKeys() found 1947 items , CPU check...
GPU/CPU check OK

Code:

G:\vanitysearch>vanitysearch   -gpuId 6 -check
GetBase10() Results OK
Add() Results OK : 556.067 MegaAdd/sec
Mult() Results OK : 35.273 MegaMult/sec
Div() Results OK : 4.104 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 260.561 KiloInv/sec
IntGroup.ModInv() : 7.773 MegaInv/sec
ModMulK1() : 9.881 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #6 GeForce GTX 1060 3GB (9x128 cores) Grid(72x128)
Seed: 2205931314
260.131 MegaKey/sec
ComputeKeys() found 1752 items , CPU check...
GPU/CPU check OK

stortz

jr. member

Activity: 40

Merit: 15

I tried your program with the parameters as shown in the sample + my username

Code:

-stop -gpu 1stortz

it ran, but just closed after finding it
did it generate the private keys into a file?
I am confused

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: stivensons on March 17, 2019, 01:04:15 PM

If you post a Windows release, I can test it too.

You can test with the release you have.
You can try:

Code:

VanitySearch -gpuId 0 -check 
VanitySearch -gpuId 6 -check (On the 3GB)

Thanks

Tomorrow I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.

stivensons

jr. member

Activity: 82

Merit: 1

Quote from: Jean_Luc on March 17, 2019, 11:55:04 AM

OK, thanks. Could you try running cuda-memcheck on the release version?

If you post a Windows release, I can test it too.

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 17, 2019, 11:55:04 AM

OK, thanks. Could you try running cuda-memcheck on the release version?

Code:

~/VanitySearch-1.8$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 123.457 MegaAdd/sec
Mult() Results OK : 23.148 MegaMult/sec
Div() Results OK : 5.208 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 341.317 KiloInv/sec
IntGroup.ModInv() : 9.130 MegaInv/sec
ModMulK1() : 12.968 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 223215
95.697 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
Expected item not found 3412bb65 cb39a716 67dcd486 209b19df c65e364c
Expected item not found fefea644 d535267a 46308e46 c579e91b 0aad3ee2
Expected item not found 3412726b 9830f325 9c5f0d95 a99e2a9b 6c473922
Expected item not found 341292e1 b4a39d2c 59e34f3d 38725b42 dfc2e801
Expected item not found fefeba57 c1209e3d 1b79200c b9529018 de0e35e4
Expected item not found fefe4aaa 34f02402 4ed76c83 a1d60efc 8c79f7a6
Expected item not found fefe8742 63e9b7bc b13a08f1 28229fd8 30987ed3
CPU found 22 items
========= ERROR SUMMARY: 0 errors

Jean_Luc

sr. member

Activity: 462

Merit: 701

OK, thanks. Could you try running cuda-memcheck on the release version?

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 17, 2019, 10:47:15 AM

I committed a new Makefile with a debug option.

Code:

make clean
make gpu=1 debug=1 all

In debug mode no inlining is done.

Obviously, it is much slower, so launch:

Code:

pons@linpons:~/VanitySearch$ ./VanitySearch -g 1 -check

Code:

./VanitySearch -g 1 -check
GetBase10() Results OK
Add() Results OK : 108.696 MegaAdd/sec
Mult() Results OK : 10.684 MegaMult/sec
Div() Results OK : 1.656 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 132.041 KiloInv/sec
IntGroup.ModInv() Results OK : 2.222 MegaInv/sec
ModMulK1() Results OK : 3.661 MegaMult/sec
ModMulK1order() Results OK : 1.700 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 888394
193.110 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
GPU/CPU check OK

Code:

~/VanitySearch$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 109.890 MegaAdd/sec
Mult() Results OK : 10.695 MegaMult/sec
Div() Results OK : 1.818 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 130.572 KiloInv/sec
IntGroup.ModInv() Results OK : 2.182 MegaInv/sec
ModMulK1() Results OK : 3.602 MegaMult/sec
ModMulK1order() Results OK : 1.684 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 781110
15.061 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors

Code:

~/VanitySearch$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 32 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 80.000 MegaAdd/sec
Mult() Results OK : 10.030 MegaMult/sec
Div() Results OK : 1.883 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 130.924 KiloInv/sec
IntGroup.ModInv() Results OK : 2.221 MegaInv/sec
ModMulK1() Results OK : 3.659 MegaMult/sec
ModMulK1order() Results OK : 1.704 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(32x128)
Seed: 639838
59.308 KiloKey/sec
ComputeKeys() found 721 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors

Jean_Luc

sr. member

Activity: 462

Merit: 701

I committed a new Makefile with a debug option.

Code:

make clean
make gpu=1 debug=1 all

In debug mode no inlining is done.

Obviously, it is much slower, so launch:

Code:

pons@linpons:~/VanitySearch$ ./VanitySearch -g 1 -check

Jean_Luc

sr. member

Activity: 462

Merit: 701

Could you try this:

Code:

pons@linpons:~/VanitySearch$ /usr/local/cuda/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check

On my Linux machine it does not work (the hardware is too old), but on Windows it ends like this:

Code:

C:\C++\VanitySearch\x64\ReleaseSM30>cuda-memcheck --tool memcheck VanitySearch.exe -g 1 -check
...
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 645 (3x192 cores) Grid(1x128)
Endianness: Little
Seed: 1006346800
401.220 KiloKey/sec
ComputeKeys() found 46 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 17, 2019, 08:14:52 AM

Just as an experiment:
Try reducing the number of threads per block from 128 to 64.
If that works, double the number of blocks per grid using -g.

GPUEngine.h:28

Code:

#define NB_TRHEAD_PER_GROUP 64

There is a typo in the code ;)

The errors remain.

Jean_Luc

sr. member

Activity: 462

Merit: 701

Just as an experiment:
Try reducing the number of threads per block from 128 to 64.
If that works, double the number of blocks per grid using -g.

GPUEngine.h:28

Code:

#define NB_TRHEAD_PER_GROUP 64

There is a typo in the code ;)
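The point of doubling -g after halving the threads per block is to keep the total number of GPU threads unchanged. A minimal sketch of that arithmetic follows; the function names are hypothetical and the code only models the Grid(blocks x threads) figures printed in the logs, not the actual launch code.

```cpp
#include <cassert>
#include <cstdint>

// Total number of GPU threads for a Grid(blocks x threads_per_block) launch.
uint32_t total_threads(uint32_t blocks, uint32_t threads_per_block) {
    return blocks * threads_per_block;
}

// Number of blocks needed to keep the same total thread count after
// changing the threads-per-block value. Hypothetical helper.
uint32_t blocks_for_same_total(uint32_t old_blocks, uint32_t old_tpb,
                               uint32_t new_tpb) {
    assert(new_tpb > 0);
    uint32_t total = total_threads(old_blocks, old_tpb);
    assert(total % new_tpb == 0);  // this sketch only handles exact divisions
    return total / new_tpb;
}
```

For example, Grid(16x128) is 2048 threads; with 64 threads per block the same total needs 32 blocks, i.e. -g 32 instead of -g 16.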

Jean_Luc

sr. member

Activity: 462

Merit: 701

OK, that confirms what I suspected.
It seems that this code is now near the limit of what CUDA (or nvcc) can handle.
Maybe CUDA SDK 10 can help.
I'll try (for other users too) to make things work with CUDA 10 under Linux.
I'll also try to reduce the code size.

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 17, 2019, 04:48:34 AM

I would try:
__noinline__ _ModMult,
__noinline__ ModNeg256

Problem solved!!!

Code:

__device__ __noinline__ void ModNeg256(uint64_t *r, uint64_t *a) {
__device__ __noinline__ void ModNeg256(uint64_t *r) {
__device__ __noinline__ void ModSub256(uint64_t *r, uint64_t *a, uint64_t *b) {
__device__ __noinline__ void ModAdd256(uint64_t *r, uint64_t *b) {
__device__ __noinline__ void ModSub256(uint64_t *r, uint64_t *b) {
__device__ __noinline__ void _ModMult(uint64_t *r, uint64_t *a, uint64_t *b) {
__device__ __noinline__ void _ModMult(uint64_t *r, uint64_t *a) {
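The signatures above use CUDA's `__noinline__` qualifier. For anyone who wants to experiment with the same idea on the host side, a rough C++ analogue is sketched below. The `NOINLINE` macro and the toy function `mod_neg64` are assumptions for illustration only; the function works on a single 64-bit value, not on the 256-bit limb arrays that `ModNeg256` handles.

```cpp
#include <cassert>
#include <cstdint>

// Compiler-specific "do not inline" attribute (host-side analogue only).
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Toy modular negation: returns (-a) mod p for 0 <= a < p.
// The compiler is asked to keep this as a real call instead of
// expanding it into every caller.
NOINLINE uint64_t mod_neg64(uint64_t a, uint64_t p) {
    assert(p > 0 && a < p);
    return a == 0 ? 0 : p - a;
}
```

Keeping a function out of line generally costs some call overhead in exchange for a smaller caller, which would be consistent with the speed change reported in this post.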

Code:

./VanitySearch -g 16 -check
GetBase10() Results OK
Add() Results OK : 256.410 MegaAdd/sec
Mult() Results OK : 21.186 MegaMult/sec
Div() Results OK : 4.785 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 327.826 KiloInv/sec
IntGroup.ModInv() Results OK : 8.977 MegaInv/sec
ModMulK1() Results OK : 12.876 MegaMult/sec
ModMulK1order() Results OK : 6.280 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(16x128)
Endianness: Little
Seed: 120744
85.474 MegaKey/sec
ComputeKeys() found 394 items , CPU check...
Expected item not found fefea433 b7c86941 0c9e4746 90f5216a 5c48b7db (thread=1534, incr=510, endo=1)
CPU found 395 items
GPU: point correct [70/70]
GPU: endo #1 correct [75/76]
GPU: endo #2 correct [69/69]
GPU: sym/point correct [58/58]
GPU: sym/endo #1 correct [58/58]
GPU: sym/endo #2 correct [64/64]

The speed is now about 145 MKeys/s vs 162 MKeys/s.

Thanks!!!

EDIT:

I think I have to add a few more __noinline__ qualifiers...

Code:

ComputeKeys() found 380 items , CPU check...
Expected item not found 3412f1c5 0b1d320d 010f9de1 08deea41 d42a2b22 (thread=1070, incr=-514, endo=0)
Expected item not found fefe61da f2af1a6e c20ea91b 56ebc050 be432b01 (thread=1922, incr=511, endo=1)
CPU found 378 items
GPU: point correct [67/67]
GPU: endo #1 correct [54/55]
GPU: endo #2 correct [56/56]
GPU: sym/point correct [67/68]
GPU: sym/endo #1 correct [69/69]
GPU: sym/endo #2 correct [63/63]

Jean_Luc

sr. member

Activity: 462

Merit: 701

After the point marked in the code below, about 50% of the calculations are wrong.
On my two configurations, everything works fine.
It really looks like the weird problem I had last time.

_GetHash160Comp is OK; it is also tested on its own by the check function.
_ModMult is heavily used during the ECC calculation.
CHECK_POINT() works 100% in the first case.

I would try:
__noinline__ _ModMult,
__noinline__ ModNeg256
Remove the whole lookup32 test in CheckPoint() (not used here)

I will add more info...

Code:

__device__ __noinline__ void CheckHashComp(prefix_t *prefix, uint64_t *px, uint64_t *py,
                                           int32_t incr, uint32_t tid, uint32_t *lookup32, uint32_t *out) {

  uint32_t h[20];
  uint64_t pe1x[4];
  uint64_t pe2x[4];

  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 0);                    // <-- 100% OK up to here, means that (px,py) is good
  _ModMult(pe1x, px, _beta);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);    // <-- 50% wrong from here
  CHECK_POINT(h, incr, 1);
  _ModMult(pe2x, px, _beta2);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 2);

  ModNeg256(py);

  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 0);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 1);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 2);

}
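For readers wondering why one computed point is hashed six times: on a curve of the form y^2 = x^3 + 7 (the secp256k1 equation), multiplying x by a cube root of unity, or negating y, gives another point on the curve. The sketch below demonstrates this on a toy field. The prime 13, the value beta = 3 and all names are assumptions chosen for illustration; the real code uses 256-bit values (`_beta`, `_beta2`).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Point { uint32_t x, y; };

// Toy parameters (illustration only): y^2 = x^3 + 7 over the field of
// integers modulo 13, with BETA = 3 satisfying BETA^3 = 1 (mod 13).
constexpr uint32_t P = 13;
constexpr uint32_t BETA = 3;
constexpr uint32_t BETA2 = (BETA * BETA) % P;  // 9

// True if pt satisfies y^2 = x^3 + 7 (mod P).
bool on_curve(Point pt) {
    uint32_t lhs = (pt.y * pt.y) % P;
    uint32_t rhs = (pt.x * pt.x % P * pt.x + 7) % P;
    return lhs == rhs;
}

// Derive six candidates in the same order as CheckHashComp:
// point, endo #1, endo #2, then the same three with y negated.
std::array<Point, 6> candidates(Point pt) {
    assert(pt.x < P && pt.y < P);
    uint32_t e1 = (pt.x * BETA) % P;   // x * beta
    uint32_t e2 = (pt.x * BETA2) % P;  // x * beta^2
    uint32_t ny = (P - pt.y) % P;      // -y mod P
    return {{ {pt.x, pt.y}, {e1, pt.y}, {e2, pt.y},
              {pt.x, ny},   {e1, ny},   {e2, ny} }};
}
```

Starting from the toy point (7, 5), all six candidates satisfy the curve equation. This mirrors the "point", "endo #1", "endo #2" and "sym/..." tallies in the check output, so an error that first appears after the first `_ModMult` would only affect the endo candidates.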
