
Topic: VanitySearch (Yet another address prefix finder) - page 51. (Read 32072 times)

legendary
Activity: 1932
Merit: 2077
Another nice option would be adding the split-key generator; that way these work requests could be solved:

https://vanitypool.appspot.com/availableWork

Yes, that could be interesting.
But as we start with an offset (the given public key), can we still apply the symmetry and endomorphism optimizations?


Yes.

Let P be the public key (the offset), and let 'secret' be the private key of P.

Let 's' be the random starting private key you generate from the seed, and let Q = s*G be the current start point.

Instead of starting from Q, you start from Q + P. Let's call it Q' = Q + P.

Now you find a public key R' such that ripemd160(sha256(R')) gives an address with the desired prefix.

Your program works exactly as it does now: it knows only the private key of Q, not of Q'. You simply obtain a partial private key instead of the final private key of Q'.


Now there are 3 possibilities:

If R' is a standard point (no symmetry / endo), then R' = R + P, and you know only the private key k of R (k*G = R), not the private key of R'. In this case k is the correct partial private key.
The end user needs to add the k you found to the secret private key of P (which only he knows) to get the final private key:

final private key = (k + secret) mod n  (explanation:  (k + secret)*G = k*G + secret*G = R + P = R')


If R' is a symmetric point, then R' = -(R + P). You have found the private key k of -R (k*G = -R), nothing more.

The final user needs to build the final private key this way:

final private key = (k - secret) mod n  (explanation:  (k - secret)*G = k*G - secret*G = -R - P = R')



If R' is an endo point, then R' = lambda1*(R + P) (or lambda2).

You found the private key k of lambda1*R (k*G = lambda1*R), then:

final private key = (k + lambda1*secret) mod n
(explanation:  (k + lambda1*secret)*G = k*G + lambda1*secret*G = lambda1*R + lambda1*P = R')
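
To illustrate the three cases, here is a minimal, unoptimized Python sketch (a toy check of the algebra above, not code from VanitySearch; the helper functions and names are only for this example):

Code:
import random

# secp256k1 parameters: field prime, group order, endomorphism scalar, generator.
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    # Affine point addition, None is the point at infinity (Python 3.8+ for pow(x, -1, p)).
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % p == 0: return None
    if A == B:
        l = 3 * A[0] * A[0] * pow(2 * A[1], -1, p) % p
    else:
        l = (B[1] - A[1]) * pow(B[0] - A[0], -1, p) % p
    x = (l * l - A[0] - B[0]) % p
    return (x, (l * (A[0] - x) - A[1]) % p)

def mul(k, A):
    # Double-and-add scalar multiplication.
    R = None
    while k:
        if k & 1: R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def neg(A): return (A[0], (-A[1]) % p)

secret = random.randrange(1, n); P = mul(secret, G)  # pool user's key pair ('secret' stays private)
k      = random.randrange(1, n); R = mul(k, G)       # partial private key found by the search

# Case 1: match on R' = R + P          -> final key = (k + secret) mod n
assert mul((k + secret) % n, G) == add(R, P)

# Case 2: match on R' = -(R + P); the search reports k2 with k2*G = -R
k2 = (n - k) % n
assert mul((k2 - secret) % n, G) == neg(add(R, P))

# Case 3: match on R' = lambda1*(R + P); the search reports k3 with k3*G = lambda1*R
k3 = LAMBDA * k % n
assert mul((k3 + LAMBDA * secret) % n, G) == mul(LAMBDA, add(R, P))

print("All three reconstruction cases check out")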
sr. member
Activity: 462
Merit: 701
Another nice option would be adding the split-key generator; that way these work requests could be solved:

https://vanitypool.appspot.com/availableWork

Yes, that could be interesting.
But as we start with an offset (the given public key), can we still apply the symmetry and endomorphism optimizations?
legendary
Activity: 1932
Merit: 2077
Another nice option would be adding the split-key generator; that way these work requests could be solved:

https://vanitypool.appspot.com/availableWork
legendary
Activity: 1382
Merit: 1122
Very cool. I'm glad to see a compressed key option, but of course would love native segwit/P2SH as well (as I see has already been brought up).

I'll test it out when I have a chance! Thanks Jean_Luc for taking this on!
member
Activity: 117
Merit: 32
Hello, it seems faster to me

works perfectly

Code:
VanitySearchCUDA8.exe -gpu 1Testrrr
VanitySearch v1.10
Difficulty: 51529903411245
Search: 1Testrrr [Compressed]
Start Sat Mar 30 12:38:01 2019
Base Key:57AA37BF4DB32E5668B51B6599428CF30FB675996C0A10FDA84761AE0B3C3BE9
Number of CPU thread: 3
GPU: GPU #0 GeForce GT 520M (1x48 cores) Grid(8x128)
10.233 MK/s (GPU 7.026 MK/s) (2^27.89) [P 0.00%][50.00% in 40.4d][0]
sr. member
Activity: 462
Merit: 701
I updated the GitHub release and added the CUDA8 binary.
I managed to work around the compiler issue.
Now both the CPU and GPU run at nominal speed.

https://github.com/JeanLucPons/VanitySearch/releases/tag/1.10

Thanks for testing ;)
sr. member
Activity: 462
Merit: 701
Thank you very much for testing :)
I'm starting to fight with VS2015. It seems that this time the problem comes from sub_borrow :D
member
Activity: 117
Merit: 32
Good evening Jean_Luc, thank you for the great work. Here are the results of my tests on my configuration: NVIDIA GeForce GT 520M, Intel Core i5-2430M.

Code:
VanitySearchCUDA8.exe -check
VanitySearch v1.10
GetBase10() Results OK
Add() Results OK : 61.690 MegaAdd/sec
Mult() Results OK : 6.542 MegaMult/sec
Div() Results OK : 834.376 KiloDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 103.923 KiloInv/sec
IntGroup.ModInv() Results OK : 1.565 MegaInv/sec
ModMulK1() Results OK : 1.939 MegaMult/sec
ModSquareK1() Results OK : 1.922 MegaMult/sec
ModMulK1order() Results OK : 1.031 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Adress : 16S5PAsGZ8VFM1CRGGLqm37XHrp46f6CTn OK(comp)!
Adress : 1Tst2RwMxZn9cYY5mQhCdJic3JJrK7Fq7 OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GT 520M (1x48 cores) Grid(8x128)
Seed: 1024278713
7.191 MegaKey/sec
ComputeKeys() found 176 items , CPU check...
GPU/CPU check OK

Code:
VanitySearchCUDA8.exe -l
VanitySearch v1.10
GPU #0 GeForce GT 520M (1x48 cores) (Cap 2.1) (1024.0 MB) (Multiple host threads)

Code:
VanitySearchCUDA8.exe -t 4 -gpu 1Testrrr
VanitySearch v1.10
Difficulty: 51529903411245
Search: 1Testrrr [Compressed]
Start Sat Mar 30 01:58:45 2019
Base Key:EB4A2ED806A60F7C834D62E4CA6C12DFA10C57FDD8075060EADC02E849168E02
Number of CPU thread: 4
GPU: GPU #0 GeForce GT 520M (1x48 cores) Grid(8x128)
7.731 MK/s (GPU 6.929 MK/s) (2^27.07) [P 0.00%][50.00% in 53.4d][0]

Code:
VanitySearchCUDA8.exe -gpu 1Testrrr
VanitySearch v1.10
Difficulty: 51529903411245
Search: 1Testrrr [Compressed]
Start Sat Mar 30 01:45:44 2019
Base Key:497A23E01DE91DCC1958FF3111AC7158E295766E266BEBD85879A53689C3CEF3
Number of CPU thread: 3
GPU: GPU #0 GeForce GT 520M (1x48 cores) Grid(8x128)
7.770 MK/s (GPU 7.058 MK/s) (2^28.94) [P 0.00%][50.00% in 53.1d][0]

CPU only - much slower

Code:
1Testrrr
VanitySearch v1.10
Difficulty: 51529903411245
Search: 1Testrrr [Compressed]
Start Sat Mar 30 02:06:38 2019
Base Key:1CDE92D7AB47587457BFE29138D7F766885CACEF828B2EBF2C27008857B3F019
Number of CPU thread: 4
0.840 MK/s (GPU 0.000 MK/s) (2^24.02) [P 0.00%][50.00% in 1.3y][0]

 :)

sr. member
Activity: 462
Merit: 701
Hi,

I compiled VanitySearch for Windows with CUDA8 (only for compute capability 2.0).
Unfortunately, VS2015 performs wrong optimizations :( so I had to disable optimization for the CPU code.
But the GPU code seems to work as expected at normal speed.
I will see if I can find where the compiler fails and whether I can find a workaround.
Thanks for testing it.

http://zelda38.free.fr/VanitySearch/
sr. member
Activity: 462
Merit: 701
Is there a theoretical model that would allow calculating the maximum performance for given hardware?

This would give an idea of how much more optimization can be achieved.

It is difficult to say.
Concerning VanitySearch, I think we are near the maximum if we keep the present algorithms.
There are still a few things to do, but a significant performance increase would require new algorithms, such as 'partial' SHA or RIPE reversing, in order to stop some calculations before the complete result is obtained, or something like that...

legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
Better than nothing.

Is there a theoretical model that would allow calculating the maximum performance for given hardware?

This would give an idea of how much more optimization can be achieved.
sr. member
Activity: 462
Merit: 701
Hello,

I set up the GTX 1050 Ti and I implemented the funnel shift for the SHA and RIPE rotations (not yet for ModInv).
I was expecting a more significant performance increase (I got only a little less than 3%).
Better than nothing.

Code:
C:\C++\VanitySearch\x64\ReleaseSM30>VanitySearch.exe -t 0 -gpu 1Testtttt
VanitySearch v1.11
Difficulty: 2988734397852221
Search: 1Testtttt [Compressed]
Start Thu Mar 28 14:48:27 2019
Base Key:3ECA27E3A98E4267E3D308CAA7E66B8972C31C4C02A7D16616BA46C32C59AFAC
Number of CPU thread: 0
GPU: GPU #0 GeForce GTX 1050 Ti (6x128 cores) Grid(48x128)
220.180 MK/s (GPU 220.180 MK/s) (2^32.76) [P 0.00%][50.00% in 109.4d][0]

Code:
C:\C++\VanitySearch\x64\ReleaseSM30>VanitySearch.exe -t 0 -gpu 1Testtttt
VanitySearch v1.11
Difficulty: 2988734397852221
Search: 1Testtttt [Compressed]
Start Thu Mar 28 14:51:10 2019
Base Key:7B8EEDDA6E7E418C9639AB5BBF0C14D2487D676ADDE6FC494F2504D3A026EF3B
Number of CPU thread: 0
GPU: GPU #0 GeForce GTX 1050 Ti (6x128 cores) Grid(48x128)
226.483 MK/s (GPU 226.483 MK/s) (2^32.85) [P 0.00%][50.00% in 106.4d][0]
sr. member
Activity: 462
Merit: 701
Thanks for adding the version number! ;D

You're welcome. ;)

Anyway, I managed to get back a used GTX 1050 Ti and I should be able to implement the funnel shift (for compute capability >= 3.5), which should speed up hashing and the ModInv 62-bit shift (unless nvcc is smart enough to use the funnel shift on its own when it sees something like ((x>>(32-n))|(x<<n))).

donator
Activity: 4760
Merit: 4323
Leading Crypto Sports Betting & Casino Platform
Hello,

I published a new release (1.10):

-Support for compressed private key (Tested with Electrum 3.3.4)
-Slight performance increase

Thanks for testing it
Have fun ;)

Thanks for adding the version number! ;D
sr. member
Activity: 462
Merit: 701

Do you have a _ModSqrMontgomery function?


No.

On the CPU:
One DRS62 ModInv costs roughly 160 ModSquareK1() (58.717 MegaS/sec divided by 362.696 KiloI/sec is about 162); however, DRS62 works for any odd prime.
An optimization can also be done for the SecpK1 prime, as there are 2 multiplications by P.
DRS62: 362.696 KiloI/sec
ModSquareK1: 58.717 MegaS/sec

On the GPU, the 62-bit right shift can also be optimized with the funnel shift.
legendary
Activity: 1932
Merit: 2077

Do you have a _ModSqrMontgomery function?

I would try to compute the inverse this way:

Code:
__device__  void _ModInv(uint64_t* a) {

  uint64_t x2[4], x3[4], x6[4], x9[4], x11[4], x22[4], x44[4], x88[4], x176[4], x220[4], x223[4], t1[4];
  uint8_t j;

  /** The binary representation of (p - 2) has 5 blocks of 1s, with lengths in
    *  { 1, 2, 22, 223 }. Use an addition chain to calculate 2^n - 1 for each block:
    *  [1], [2], 3, 6, 9, 11, [22], 44, 88, 176, 220, [223]
    */
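  /* Assumed calling conventions for this sketch (hypothetical helpers, not
   * necessarily the existing VanitySearch GPU primitives):
   *   _ModSqr(r, a)     : r = a^2 mod p      _ModSqr(r)        : r = r^2 mod p
   *   _ModMult(r, a)    : r = r*a mod p      _ModMult(r, a, b) : r = a*b mod p
   * All operands are 256-bit values stored as uint64_t[4]. */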

  _ModSqr(x2, a);
  _ModMult(x2, a);

  _ModSqr(x3, x2);
  _ModMult(x3, a);

  memcpy(x6,x3,32);
  _ModSqr(x6);
  _ModSqr(x6);
  _ModSqr(x6);
  _ModMult(x6, x3);


  memcpy(x9,x6,32);
  _ModSqr(x9);
  _ModSqr(x9);
  _ModSqr(x9);
  _ModMult(x9, x3);

  memcpy(x11,x9,32);
  _ModSqr(x11);
  _ModSqr(x11);
  _ModMult(x11, x2);

  memcpy(x22,x11,32);
  for (j=0; j<11; j++) {
    _ModSqr(x22);
  }
  _ModMult(x22, x11);

  memcpy(x44,x22,32);
  for (j=0; j<22; j++) {
    _ModSqr(x44);
  }
  _ModMult(x44, x22);

  memcpy(x88,x44,32);
  for (j=0; j<44; j++) {
    _ModSqr(x88);
  }
  _ModMult(x88, x44);

  memcpy(x176,x88,32);
  for (j=0; j<88; j++) {
    _ModSqr(x176);
  }
  _ModMult(x176, x88);

  memcpy(x220,x176,32);
  for (j=0; j<44; j++) {
    _ModSqr(x220);
  }
  _ModMult(x220, x44);

  memcpy(x223,x220,32);
  _ModSqr(x223);
  _ModSqr(x223);
  _ModSqr(x223);
  _ModMult(x223, x3);

/* The final result is then assembled using a sliding window over the blocks. */

  memcpy(t1,x223,32);
  for (j=0; j<23; j++) {
    _ModSqr(t1);
  }
  _ModMult(t1, x22);
  _ModSqr(t1);
  _ModSqr(t1);
  _ModSqr(t1);
  _ModSqr(t1);
  _ModSqr(t1);
 
  _ModMult(t1, a);
  _ModSqr(t1);
  _ModSqr(t1);
  _ModSqr(t1);
 
  _ModMult(t1, t1, x2);
  _ModSqr(t1);
  _ModSqr(t1);
 
  _ModMult(a, t1);

}
sr. member
Activity: 462
Merit: 701
Hello,

I published a new release (1.10):

-Support for compressed private key (Tested with Electrum 3.3.4)
-Slight performance increase

Thanks for testing it
Have fun ;)
legendary
Activity: 1932
Merit: 2077
gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)

Ok. I observed the issue with gcc 6, but with my gcc 7.3.0 it worked. It seems that this optimization bug is still present in 7.0.1. mmm... I will add a test on the minor version and keep the volatile for gcc < 7.3. I tried with gcc 8.2 and it also works.
Thanks for the report.



With -O0 in the makefile
Code:
CXXFLAGS   =  -DWITHGPU -m64 -mssse3 -Wno-write-strings -O0 -I. -I$(CUDA)/include

it works without "volatile".
sr. member
Activity: 462
Merit: 701
gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)

Ok. I observed the issue with gcc 6, but with my gcc 7.3.0 it worked. It seems that this optimization bug is still present in 7.0.1. mmm... I will add a test on the minor version and keep the volatile for gcc < 7.3. I tried with gcc 8.2 and it also works.
Thanks for the report.

legendary
Activity: 1932
Merit: 2077
Hi, I've just downloaded the VanitySearch master; it works perfectly if I add "volatile" in this piece of code:

OK, which release of gcc are you using to compile VanitySearch (not the CUDA code)?


gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)