
Topic: VanitySearch (Yet another address prefix finder) - page 54. (Read 31159 times)

legendary
Activity: 1914
Merit: 2071

I tried your function on my Linux config but it does not bring a significant performance increase.
Mainly because adding temporary variables adds more spill moves, which are slower; sometimes it is better to recompute.
On your hardware you have many more registers available, so the performance increase should be more significant.

A tip: maybe you can try playing with the maximum register count (maxrregcount) in the makefile; for compute cap 5.0, nvcc CUDA 10, use 120 registers.
The random problem you have may also be due to wrong register sharing between threads, which would explain the strange and random behavior. Reducing the number of registers used, by inlining, also reduces the probability that this happens.
It might be an explanation...

With "-maxrregcount=50" I got 188 MKeys/s (but there are still errors).
sr. member
Activity: 462
Merit: 696
Already tried with "LD_LIBRARY_PATH"; the problem is the driver. I have Ubuntu 17.04, and I cannot install a new driver on it.

OK, it's too bad that the driver is not compatible.

I tried your function on my Linux config but it does not bring a significant performance increase.
Mainly because adding temporary variables adds more spill moves, which are slower; sometimes it is better to recompute.
On your hardware you have many more registers available, so the performance increase should be more significant.

A tip: maybe you can try playing with the maximum register count (maxrregcount) in the makefile; for compute cap 5.0, nvcc CUDA 10, use 120 registers.
The random problem you have may also be due to wrong register sharing between threads, which would explain the strange and random behavior. Reducing the number of registers used, by inlining, also reduces the probability that this happens.
It might be an explanation...

legendary
Activity: 1914
Merit: 2071
Many thanks for the tips ;)
I will try this.

You don't want to try the binary? The libcudart.so.10.0 is also available from the given link. You do not need to set up CUDA SDK 10 (unless a driver problem appears, but this may work without installing anything).
You can just copy VanitySearch50 and libcudart.so.10.0 into a directory and set LD_LIBRARY_PATH.
Code:
export LD_LIBRARY_PATH=.
./VanitySearch50 ...

This is mainly to see if the problem is solved with CUDA 10 or if it comes from elsewhere.


Already tried with "LD_LIBRARY_PATH"; the problem is the driver. I have Ubuntu 17.04, and I cannot install a new driver on it.
sr. member
Activity: 462
Merit: 696
(I'm not sure what C means; I suppose it means "with carry")

Yes,
ADDO is the initial add without carry-in, and it sets the carry flag
ADDC is add with carry, and it sets the carry flag
ADD is add with carry, and it does not set the carry flag
Same for SUB
Functions may have a 1 suffix for the unary form.
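
The naming scheme above can be illustrated with a host-side sketch. This is not the VanitySearch source: the helper names `addo`, `addc`, `add_final` and `add256` are hypothetical, and the hardware carry flag is modelled as an explicit variable. It only shows how the three variants chain together across four 64-bit limbs.

```cpp
#include <cstdint>

// Hypothetical stand-in for the hardware carry flag (an assumption for
// illustration; on the GPU the flag lives in the instruction pipeline).
struct Carry { uint64_t flag = 0; };

// "ADDO": initial add, ignores any previous carry, sets the carry flag.
inline uint64_t addo(Carry &c, uint64_t a, uint64_t b) {
  uint64_t r = a + b;
  c.flag = (r < a) ? 1 : 0;
  return r;
}

// "ADDC": adds the incoming carry and sets the carry flag for the next limb.
inline uint64_t addc(Carry &c, uint64_t a, uint64_t b) {
  uint64_t t = a + b;
  uint64_t c1 = (t < a) ? 1 : 0;   // carry out of a + b
  uint64_t r = t + c.flag;
  uint64_t c2 = (r < t) ? 1 : 0;   // carry out of adding the incoming carry
  c.flag = c1 | c2;
  return r;
}

// "ADD": adds the incoming carry but leaves the carry flag alone (last limb).
inline uint64_t add_final(const Carry &c, uint64_t a, uint64_t b) {
  return a + b + c.flag;
}

// r = a + b (mod 2^256), little-endian 64-bit limbs.
inline void add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
  Carry c;
  r[0] = addo(c, a[0], b[0]);
  r[1] = addc(c, a[1], b[1]);
  r[2] = addc(c, a[2], b[2]);
  r[3] = add_final(c, a[3], b[3]);
}
```

The SUB variants would follow the same pattern with a borrow instead of a carry.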
sr. member
Activity: 462
Merit: 696
Many thanks for the tips ;)
I will try this.

You don't want to try the binary? The libcudart.so.10.0 is also available from the given link. You do not need to set up CUDA SDK 10 (unless a driver problem appears, but this may work without installing anything).
You can just copy VanitySearch50 and libcudart.so.10.0 into a directory and set LD_LIBRARY_PATH.
Code:
export LD_LIBRARY_PATH=.
./VanitySearch50 ...

This is mainly to see if the problem is solved with CUDA 10 or if it comes from elsewhere.
legendary
Activity: 1914
Merit: 2071
Another sub function, if you want to test it:


Code:
__device__ void ModSub256(uint64_t *rp, uint64_t *ap, uint64_t *bp) {

 
  uint64_t a0, a1, a2, a3, b0, b1, b2, b3, r0, r1, r2, r3;
  int8_t c0, c1, c2, c3;


  a0 = ap[0];
  a1 = ap[1];
  a2 = ap[2];
  a3 = ap[3];

  b0 = bp[0];
  b1 = bp[1];
  b2 = bp[2];
  b3 = bp[3];
 
  /*
  r0 = a0 - b0;
  c0 = (a0 < b0) ? 1 : -1;
  c0 = (r0 == 0) ? 0 : c0;
 
  r1 = a1 - b1;
  c1 = (a1 < b1) ? 1 : -1;
  c1 = (r1 == 0) ? c0 : c1;
  r1 = r1 - (c0 == 1);
  
  r2 = a2 - b2;
  c2 = (a2 < b2) ? 1 : -1;
  c2 = (r2 == 0) ? c1 : c2;
  r2 = r2 - (c1 == 1);

  r3 = a3 - b3;
  c3 = (a3 < b3) ? 1 : -1;
  c3 = (r3 == 0) ? c2 : c3;
  r3 = r3 - (c2 == 1);
  */


  
  c0 = a0 < b0;
  r0 = a0 - b0;
  
  c1 = a1 < b1;
  r1 = a1 - b1;
  if(r1 == 0){ c1 = c0;}
  if(c0) {r1 = r1 - 1;}
  

  c2 = a2 < b2;
  r2 = a2 - b2;
  if(r2 == 0){ c2 = c1;}
  if(c1) {r2 = r2 - 1;}

  c3 = a3 < b3;
  r3 = a3 - b3;
  if(r3 == 0){ c3 = c2;}
  if(c2) {r3 = r3 - 1;}

  
  if(c3 == 1){

    if(r0 > 0x1000003d0){  //almost always --> no borrow

      r0 = r0 - 0x1000003d1;

    }
    else{

      //c[0] = (r0 < 0x1000003d1) ? 1 : -1;
      //c0 = (r0 == 0x1000003d1) ? 0 : 1;
      //c0 = 1; // for sure r0 < 0x1000003d1

      r0 = r0 - 0x1000003d1;
      r1 = r1 - 1;  //c0 is 1

      c1 = (r1 == 0xffffffffffffffff) ? 1 : -1;
      c2 = (r2 == 0) ? c1 : -1;

      if(c1 == 1) r2 = r2 - 1;
      if(c2 == 1) r3 = r3 - 1;

    }
  }
  
  
  
  rp[0] = r0;
  rp[1] = r1;
  rp[2] = r2;
  rp[3] = r3;


  return;
 
}
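
For anyone who wants to test a function like the one above, a plain host-side reference can be handy. The sketch below is an assumption-laden illustration, not code from VanitySearch: `modSub256Ref` is a hypothetical name, inputs are assumed to be already reduced (0 <= a, b < P), and it uses the straightforward "subtract, then add P on borrow" approach rather than the branchy version posted here.

```cpp
#include <cstdint>

// Hypothetical host-side reference: r = (a - b) mod P for the secp256k1
// field prime P = 2^256 - 0x1000003D1. Limbs are little-endian 64-bit words.
// Assumes 0 <= a, b < P.
inline void modSub256Ref(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
  static const uint64_t P[4] = {0xFFFFFFFEFFFFFC2FULL, 0xFFFFFFFFFFFFFFFFULL,
                                0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL};

  // Plain 256-bit subtraction with borrow propagation.
  uint64_t borrow = 0;
  for (int i = 0; i < 4; i++) {
    uint64_t d = a[i] - b[i];
    uint64_t b1 = (a[i] < b[i]) ? 1 : 0;  // borrow from a[i] - b[i]
    uint64_t e = d - borrow;
    uint64_t b2 = (d < borrow) ? 1 : 0;   // borrow from the incoming borrow
    r[i] = e;
    borrow = b1 | b2;
  }

  // A final borrow means a < b: bring the result back into range by adding P.
  if (borrow) {
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
      uint64_t t = r[i] + P[i];
      uint64_t c1 = (t < r[i]) ? 1 : 0;
      uint64_t s = t + carry;
      uint64_t c2 = (s < t) ? 1 : 0;
      r[i] = s;
      carry = c1 | c2;
    }
  }
}
```

Comparing the outputs of the two versions on random reduced inputs, plus edge cases such as a == b and a limb equal to zero, should reveal any mismatch in the borrow handling.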


legendary
Activity: 1914
Merit: 2071
New version is slower on my PC (132 MKeys/s versus 162 MKeys/s).

On my Windows machine, performance is the same as in the previous release (CUDA 10).
Slightly slower on Linux (CUDA 8.0), from 39.5MK/s to 37.9MK/s.

Anyway,
Do you compile, or do you use the Linux binaries?
Did you solve your problem? I haven't managed to reproduce the issue yet.


I compile the source myself. No, my problem is not solved. I have only Cuda 8.0.


Some ideas for (maybe) a little speed improvement:


1) in __device__ void ComputeKeys (GPUCompute.h) instead of doing HSIZE times

Code:
ModNeg256(dy,Gy[i]);  <--
ModSub256(dy, py);

you could do:

Code:
ModSub256(dy, pyn, Gy[i]);

and compute pyn only once:

Code:
ModNeg256(pyn,py);

2) instead of

Code:
ModAdd256(py, Gy[i]);

you could do:

Code:
ModSub256(py, sy);

To sum up:

Code:
ModSub256(dy, pyn, Gy[i]);

_ModMult(_s, dy, dx[i]);      //  s = (p2.y-p1.y)*inverse(p2.x-p1.x)
 _ModMult(_p2, _s, _s);        // _p = pow2(s)

ModSub256(px, _p2, px);
ModSub256(px, Gx[i]);         // px = pow2(s) - p1.x - p2.x;

ModSub256(py, sx, px);
 _ModMult(py, _s);             // py = - s*(ret.x-p2.x)
 ModSub256(py, sy);         // py = - p2.y - s*(ret.x-p2.x);  


3) in __device__ void ModSub256 instead of

Code:
     if ((int64_t)t < 0) {
    UADDO1(r[0], _P[0]);
    UADDC1(r[1], _P[1]);
    UADDC1(r[2], _P[2]);
    UADD1(r[3], _P[3]);
  }

something like this would be better:

Code:
  if ((int64_t)t < 0) {
    USUBO1(r[0], 0x01000003d1);
    USUBC1(r[1], 0ULL);
    USUBC1(r[2], 0ULL);
    USUBC1(r[3], 0ULL);
  }

(I'm not sure what C means; I suppose it means "with carry")
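
Point 3 rests on the fact that the secp256k1 field prime is P = 2^256 - 0x1000003D1, so adding P modulo 2^256 gives the same 256-bit result as subtracting 0x1000003D1. A minimal host-side sketch of that equivalence follows; the helper names `addP`, `subK` and `sameResult` are hypothetical and the code is plain C++, not the PTX macros from the thread.

```cpp
#include <cstdint>

// secp256k1 field prime P = 2^256 - 0x1000003D1, little-endian 64-bit limbs.
static const uint64_t P[4] = {0xFFFFFFFEFFFFFC2FULL, 0xFFFFFFFFFFFFFFFFULL,
                              0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL};

// r = r + P (mod 2^256), propagating the carry limb by limb.
inline void addP(uint64_t r[4]) {
  uint64_t carry = 0;
  for (int i = 0; i < 4; i++) {
    uint64_t t = r[i] + P[i];
    uint64_t c1 = (t < r[i]) ? 1 : 0;
    uint64_t s = t + carry;
    uint64_t c2 = (s < t) ? 1 : 0;
    r[i] = s;
    carry = c1 | c2;
  }
}

// r = r - 0x1000003D1 (mod 2^256), propagating the borrow limb by limb.
inline void subK(uint64_t r[4]) {
  const uint64_t K = 0x1000003D1ULL;
  uint64_t borrow = (r[0] < K) ? 1 : 0;
  r[0] -= K;
  for (int i = 1; i < 4; i++) {
    uint64_t next = (r[i] < borrow) ? 1 : 0;
    r[i] -= borrow;
    borrow = next;
  }
}

// Returns true when both routes give the same 256-bit result for x.
inline bool sameResult(const uint64_t x[4]) {
  uint64_t a[4], b[4];
  for (int i = 0; i < 4; i++) { a[i] = x[i]; b[i] = x[i]; }
  addP(a);
  subK(b);
  return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
```

Whether the subtract form is actually faster on a given GPU is a separate question that only a benchmark can answer; the sketch only shows that the two forms agree.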
sr. member
Activity: 462
Merit: 696
New version is slower on my PC (132 MKeys/s versus 162 MKeys/s).

On my Windows machine, performance is the same as in the previous release (CUDA 10).
Slightly slower on Linux (CUDA 8.0), from 39.5MK/s to 37.9MK/s.

Anyway,
Do you compile, or do you use the Linux binaries?
Did you solve your problem? I haven't managed to reproduce the issue yet.
legendary
Activity: 1914
Merit: 2071
A new release of VanitySearch (1.9) is out:

Code:
Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'


New version is slower on my PC (132 MKeys/s versus 162 MKeys/s).
sr. member
Activity: 462
Merit: 696
Hello,

A new release of VanitySearch (1.9) is out:

Code:
Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'

Windows binaries: https://github.com/JeanLucPons/VanitySearch/releases/tag/1.9
Linux binaries: http://zelda38.free.fr/VanitySearch/ (Experimental)

Thanks for testing it! :)
legendary
Activity: 2758
Merit: 6830
Is this really legal in Asian countries like India???
Why would a Bitcoin address generator be illegal anywhere?
newbie
Activity: 4
Merit: 0
Hello,

I would like to present a new Bitcoin address prefix finder called VanitySearch. It is very similar to Vanitygen.
The main differences from Vanitygen are that VanitySearch does not use the heavy OpenSSL for CPU calculation and that the kernel is written in CUDA in order to take full advantage of inline PTX assembly.
On my Intel Core i7-4770, VanitySearch runs ~4 times faster than vanitygen64. (1.32 Mkey/s -> 5.27 MK/s)
On my GeForce GTX 645, VanitySearch runs ~1.5 times faster than oclvanitygen. (9.26 Mkey/s -> 14.548 MK/s)
If you want to compare VanitySearch and Vanitygen results, use the -u option to search for uncompressed addresses.
VanitySearch may not compute a good grid size for your GPU, so make several tries with the -g option to find the best performance.
Using compressed addresses is roughly 20% faster.

VanitySearch is available from https://github.com/JeanLucPons/VanitySearch

There are still lots of improvements to make.
Feel free to test it and to submit issues.

Thanks.
Sorry for my bad English.
Jean-Luc

Is this really legal in Asian countries like India???
sr. member
Activity: 462
Merit: 696
Linux binaries are available for download here (experimental).
They are compiled with CUDA SDK 10.
Thanks for testing them ;)

http://zelda38.free.fr/VanitySearch/
sr. member
Activity: 462
Merit: 696
Hello,

it ran, but just closed after finding it
did it generate the private keys into a file?
I am confused

To output the key in a file, use the -o option.
Code:
VanitySearch -stop -gpu -o key.txt 1stortz

Many thanks, stivensons, for the report :)
jr. member
Activity: 82
Merit: 1
If you post a Windows release, I can test it too :)

You can test with the release you have.
You can try:
Code:
VanitySearch -gpuId 0 -check 
VanitySearch -gpuId 6 -check (On the 3GB)
Thanks ;)


Tomorrow, I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.





cuda 10

Code:
G:\vanitysearch>vanitysearch   -gpuId 0 -check
GetBase10() Results OK
Add() Results OK : 567.189 MegaAdd/sec
Mult() Results OK : 38.169 MegaMult/sec
Div() Results OK : 4.410 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 281.352 KiloInv/sec
IntGroup.ModInv() : 8.365 MegaInv/sec
ModMulK1() : 10.770 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 1060 6GB (10x128 cores) Grid(80x128)
Seed: 1853432973
296.742 MegaKey/sec
ComputeKeys() found 1947 items , CPU check...
GPU/CPU check OK

Code:
G:\vanitysearch>vanitysearch   -gpuId 6 -check
GetBase10() Results OK
Add() Results OK : 556.067 MegaAdd/sec
Mult() Results OK : 35.273 MegaMult/sec
Div() Results OK : 4.104 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 260.561 KiloInv/sec
IntGroup.ModInv() : 7.773 MegaInv/sec
ModMulK1() : 9.881 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #6 GeForce GTX 1060 3GB (9x128 cores) Grid(72x128)
Seed: 2205931314
260.131 MegaKey/sec
ComputeKeys() found 1752 items , CPU check...
GPU/CPU check OK
jr. member
Activity: 40
Merit: 15
I tried your program with the parameters as shown in the sample + my username
Code:
-stop -gpu 1stortz

it ran, but just closed after finding it
did it generate the private keys into a file?
I am confused
sr. member
Activity: 462
Merit: 696
If you post a Windows release, I can test it too :)

You can test with the release you have.
You can try:
Code:
VanitySearch -gpuId 0 -check 
VanitySearch -gpuId 6 -check (On the 3GB)
Thanks ;)


Tomorrow, I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.



jr. member
Activity: 82
Merit: 1
OK, thanks. Could you try running cuda-memcheck on the release version?


If you post a Windows release, I can test it too :)

legendary
Activity: 1914
Merit: 2071
OK, thanks. Could you try running cuda-memcheck on the release version?



Code:
~/VanitySearch-1.8$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 123.457 MegaAdd/sec
Mult() Results OK : 23.148 MegaMult/sec
Div() Results OK : 5.208 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 341.317 KiloInv/sec
IntGroup.ModInv() : 9.130 MegaInv/sec
ModMulK1() : 12.968 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 223215
95.697 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
Expected item not found 3412bb65 cb39a716 67dcd486 209b19df c65e364c
Expected item not found fefea644 d535267a 46308e46 c579e91b 0aad3ee2
Expected item not found 3412726b 9830f325 9c5f0d95 a99e2a9b 6c473922
Expected item not found 341292e1 b4a39d2c 59e34f3d 38725b42 dfc2e801
Expected item not found fefeba57 c1209e3d 1b79200c b9529018 de0e35e4
Expected item not found fefe4aaa 34f02402 4ed76c83 a1d60efc 8c79f7a6
Expected item not found fefe8742 63e9b7bc b13a08f1 28229fd8 30987ed3
CPU found 22 items
========= ERROR SUMMARY: 0 errors
sr. member
Activity: 462
Merit: 696
OK, thanks. Could you try running cuda-memcheck on the release version?