VanitySearch (Yet another address prefix finder) - page 53.

arulbero

legendary

Activity: 1968

Merit: 2130

Do you have a _ModSqrMontgomery function?

I would try to compute the inverse this way:

Code:

__device__ void _ModInv(uint64_t* a) {

uint64_t x2[4], x3[4], x6[4], x9[4], x11[4], x22[4], x44[4], x88[4], x176[4], x220[4], x223[4], t1[4];
uint8_t j;

/** The binary representation of (p - 2) has 5 blocks of 1s, with lengths in
* { 1, 2, 22, 223 }. Use an addition chain to calculate 2^n - 1 for each block:
* [1], [2], 3, 6, 9, 11, [22], 44, 88, 176, 220, [223]
*/

_ModSqr(x2, a);
_ModMult(x2, a);

_ModSqr(x3, x2);
_ModMult(x3, a);

memcpy(x6,x3,32);
_ModSqr(x6);
_ModSqr(x6);
_ModSqr(x6);
_ModMult(x6, x3);

memcpy(x9,x6,32);
_ModSqr(x9);
_ModSqr(x9);
_ModSqr(x9);
_ModMult(x9, x3);

memcpy(x11,x9,32);
_ModSqr(x11);
_ModSqr(x11);
_ModMult(x11, x2);

memcpy(x22,x11,32);
for (j=0; j<11; j++) {
_ModSqr(x22);
}
_ModMult(x22, x11);

memcpy(x44,x22,32);
for (j=0; j<22; j++) {
_ModSqr(x44);
}
_ModMult(x44, x22);

memcpy(x88,x44,32);
for (j=0; j<44; j++) {
_ModSqr(x88);
}
_ModMult(x88, x44);

memcpy(x176,x88,32);
for (j=0; j<88; j++) {
_ModSqr(x176);
}
_ModMult(x176, x88);

memcpy(x220,x176,32);
for (j=0; j<44; j++) {
_ModSqr(x220);
}
_ModMult(x220, x44);

memcpy(x223,x220,32);
_ModSqr(x223);
_ModSqr(x223);
_ModSqr(x223);
_ModMult(x223, x3);

/* The final result is then assembled using a sliding window over the blocks. */

memcpy(t1,x223,32);
for (j=0; j<23; j++) {
_ModSqr(t1);
}
_ModMult(t1, x22);
_ModSqr(t1);
_ModSqr(t1);
_ModSqr(t1);
_ModSqr(t1);
_ModSqr(t1);

_ModMult(t1, a);
_ModSqr(t1);
_ModSqr(t1);
_ModSqr(t1);

_ModMult(t1, t1, x2);
_ModSqr(t1);
_ModSqr(t1);

_ModMult(a, t1);

}
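To make the addition-chain idea above easier to follow, here is a scaled-down sketch in host-side C++. It is not the secp256k1 code: it assumes a toy prime p = 2^61 - 1 so that everything fits in 64-bit words, it relies on the GCC/Clang `unsigned __int128` extension, and the names `mulmod`, `sqrn` and `inv61` are hypothetical. As in the function above, x_k stands for a^(2^k - 1) and the inverse is computed as a^(p-2).

```cpp
#include <cassert>
#include <cstdint>

// Toy modulus (assumption for this sketch): the Mersenne prime 2^61 - 1.
static const uint64_t P61 = (1ULL << 61) - 1;

// Modular multiplication through a 128-bit intermediate (GCC/Clang extension).
static uint64_t mulmod(uint64_t a, uint64_t b) {
  return (uint64_t)((unsigned __int128)a * b % P61);
}

// Squares x n times, i.e. returns x^(2^n) mod P61.
static uint64_t sqrn(uint64_t x, int n) {
  while (n-- > 0) x = mulmod(x, x);
  return x;
}

// Returns a^(p-2) mod p. In binary, p - 2 = 2^61 - 3 is 59 ones, then 0, then 1.
static uint64_t inv61(uint64_t a) {
  a %= P61;
  assert(a != 0);  // zero has no inverse

  // Addition chain for the block of 59 ones: 2, 3, 6, 7, 14, 28, 56, 59.
  // Squaring x_m n times and multiplying by x_n gives x_(m+n).
  uint64_t x2  = mulmod(sqrn(a,   1),  a);
  uint64_t x3  = mulmod(sqrn(x2,  1),  a);
  uint64_t x6  = mulmod(sqrn(x3,  3),  x3);
  uint64_t x7  = mulmod(sqrn(x6,  1),  a);
  uint64_t x14 = mulmod(sqrn(x7,  7),  x7);
  uint64_t x28 = mulmod(sqrn(x14, 14), x14);
  uint64_t x56 = mulmod(sqrn(x28, 28), x28);
  uint64_t x59 = mulmod(sqrn(x56, 3),  x3);

  // Shift in the last two exponent bits "01": two squarings, one multiplication.
  return mulmod(sqrn(x59, 2), a);
}
```

For any a that is not a multiple of p, `mulmod(a, inv61(a))` should give 1. The secp256k1 version follows the same pattern, only with more blocks of ones and 256-bit operands.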

Jean_Luc

sr. member

Activity: 462

Merit: 701

Hello,

I published a new release (1.10):

-Support for compressed private key (Tested with Electrum 3.3.4)
-Slight performance increase

Thanks for testing it.
Have fun ;)

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 26, 2019, 02:27:29 PM

Quote from: arulbero on March 26, 2019, 02:19:31 PM

gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)

OK. I observed the issue with gcc 6, but with my gcc 7.3.0 it worked. It seems that this optimization bug is still present in 7.0.1. Hmm... I will add a test on the minor version and keep the volatile for gcc < 7.3. I tried with gcc 8.2 and it also works.
Thanks for the report.

With -O0 in the makefile

Code:

CXXFLAGS = -DWITHGPU -m64 -mssse3 -Wno-write-strings -O0 -I. -I$(CUDA)/include

it works without "volatile".

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: arulbero on March 26, 2019, 02:19:31 PM

gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)

OK. I observed the issue with gcc 6, but with my gcc 7.3.0 it worked. It seems that this optimization bug is still present in 7.0.1. Hmm... I will add a test on the minor version and keep the volatile for gcc < 7.3. I tried with gcc 8.2 and it also works.
Thanks for the report.
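A minimal sketch of what such a minor-version test could look like, using the predefined `__GNUC__` and `__GNUC_MINOR__` macros. The names `older_than` and `CARRY_QUALIFIER` are hypothetical and only serve the illustration; they are not taken from the VanitySearch sources.

```cpp
// Hypothetical helper: true when version (major, minor) is older than
// (ref_major, ref_minor). Shown only to make the comparison explicit.
constexpr bool older_than(int major, int minor, int ref_major, int ref_minor) {
  return major < ref_major || (major == ref_major && minor < ref_minor);
}

// Same comparison at preprocessing time. Clang also defines __GNUC__,
// so it is excluded here (an assumption of this sketch).
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 7 || (__GNUC__ == 7 && __GNUC_MINOR__ < 3))
  #define CARRY_QUALIFIER volatile  // gcc older than 7.3: keep the workaround
#else
  #define CARRY_QUALIFIER           // gcc 7.3 or newer, or another compiler
#endif

// Possible use inside the function discussed in this thread:
//   CARRY_QUALIFIER unsigned char c;

static_assert(older_than(7, 0, 7, 3), "gcc 7.0 would keep the volatile");
static_assert(!older_than(8, 2, 7, 3), "gcc 8.2 would not");
```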

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 26, 2019, 01:58:15 PM

Quote from: arulbero on March 26, 2019, 01:39:09 PM

Hi, I've just downloaded the VanitySearch Master, it works perfectly if I add "volatile" in this piece of code:

OK, which release of gcc are you using to compile VanitySearch (not the CUDA code)?

gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2)

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: arulbero on March 26, 2019, 01:39:09 PM

Hi, I've just downloaded the VanitySearch Master, it works perfectly if I add "volatile" in this piece of code:

OK, which release of gcc are you using to compile VanitySearch (not the CUDA code)?

arulbero

legendary

Activity: 1968

Merit: 2130

Hi, I've just downloaded the VanitySearch Master, it works perfectly if I add "volatile" in this piece of code:

Code:

void Int::ModSquareK1(Int *a) {

#ifndef WIN64
#if __GNUC__ <= 6
  #warning "GCC less than 7 detected, upgrade gcc to get best performance"
  volatile unsigned char c; // <--
#else
  volatile unsigned char c; // <--
#endif
#else
  unsigned char c;
#endif

Jean_Luc

sr. member

Activity: 462

Merit: 701

Yes, today the default is to free only one core when the GPU is enabled; I will change this to the number of GPUs.
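As a rough illustration of that rule (one CPU core left free per GPU), a sketch could look like this. The function name `default_cpu_threads` and its parameters are hypothetical, not VanitySearch's actual code.

```cpp
// Hypothetical helper: number of CPU search threads to start by default
// when gpu_count GPUs are enabled on a machine with cpu_cores cores.
// One core per GPU is left free to feed that GPU; never goes below zero.
static int default_cpu_threads(int cpu_cores, int gpu_count) {
  int n = cpu_cores - gpu_count;
  return n > 0 ? n : 0;
}
```

With 8 cores and 1 GPU this gives 7 threads, and the user can still override it with -t.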

OgNasty

donator

Activity: 4760

Merit: 4323

Quote from: Jean_Luc on March 24, 2019, 11:57:42 AM

Yes, this is because with -t 8 your CPU becomes a bottleneck and cannot handle the GPU/CPU exchange.
With a good GPU key rate, it is generally better to free 1 CPU core per GPU.

I think most users with newer GPUs would benefit from the power efficiency gains of running with -t 0. I would even argue that should be the default when a GPU is detected instead of the other way around where you have to enable GPUs.

Edit: I'd also like to see the version number shown with the startup information.

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: Telariust on March 24, 2019, 07:25:46 AM

Do you recognize this crash error?

No, I have never experienced this crash. Thanks for the info ;)

Quote from: asche on March 24, 2019, 07:55:12 AM

Is this included in your roadmap?

Salut

I'm not yet familiar with P2SH addresses; I have to learn about them in detail. Maybe for 1-to-1 multisig P2SH.

Quote from: asche on March 24, 2019, 07:55:12 AM

Nice work anyway!

Thanks

Quote from: RobertPaulig on March 24, 2019, 09:36:56 AM

It is very strange that it runs slower with the CPU threads than without them.

Yes, this is because with -t 8 your CPU becomes a bottleneck and cannot handle the GPU/CPU exchange.
With a good GPU key rate, it is generally better to free 1 CPU core per GPU.

Quote from: RobertPaulig on March 24, 2019, 09:36:56 AM

Jean_Luc, thank you for your hard work. What happens if I interrupt the execution? Does VanitySearch keep its results?

If you are using a passphrase and you want to restart a search, you have to change your passphrase (1 character is enough), otherwise you will recompute exactly the same thing. If you're using the default random seed, the seed will change, so you won't recompute the same thing and there is no need to save anything.
But I recommend using a passphrase in order to generate safe private keys.

RobertPaulig

newbie

Activity: 7

Merit: 1

Win10, Cuda 10
i7 3700k, 8 GB RAM

Code:

vanitysearch -stop -t 0 -gpu -gpuId 0 -i input_addres.txt -o output_file.txt
Search: 1Testtttt [Compressed]
Start Sun Mar 24 17:22:35 2019
Base Key:E50C09A69B313FCC6480B3390C47BBD55D6FFFEEBBC36D3881E011AE0330275
Number of CPU thread: 0
GPU: GPU #0 GeForce GTX 1080 Ti (28x128 cores) Grid(224x128)
967.926 MK/s (GPU 967.926 MK/s) (2^32.44) [P 0.00%][50.00% in 24.9d][0]0]

Code:

vanitysearch -stop -t 8 -gpu -gpuId 0 -i input_addres.txt -o output_file.txt
Difficulty: 2988734397852221
Search: 1Testtttt [Compressed]
Start Sun Mar 24 17:26:34 2019
Base Key:912441F08928FCEF7B5D6F9A1232221AF9FF3F6E653586F9146625C436060099
Number of CPU thread: 8
GPU: GPU #0 GeForce GTX 1080 Ti (28x128 cores) Grid(224x128)
914.418 MK/s (GPU 896.216 MK/s) (2^33.38) [P 0.00%][50.00% in 26.3d][0]0]

It is very strange that it runs slower with the CPU threads than without them.

Jean_Luc, thank you for your hard work. What happens if I interrupt the execution? Does VanitySearch keep its results?

asche

legendary

Activity: 1484

Merit: 1491

I forgot more than you will ever know.

Salut Jean-Luc

Do you plan to add support for P2SH (segwit starting with 3) addresses to your tool anytime soon? That would be nice to have.

For instance this project by nullios implemented both P2SH and bech32 addies.

Is this included in your roadmap?

Nice work anyway!

Telariust

jr. member

Activity: 38

Merit: 18

Quote from: Jean_Luc on March 24, 2019, 12:32:04 AM

Here is the code of oclvanitygen to perform an addition with carry:

Code:

#define bn_addc_word(r, a, b, t, c) do { \
t = a + b + c; \
c = (t < a) ? 1 : ((c & (t == a)) ? 1 : 0); \
r = t; \
} while (0)

This code may have a problem, look:

post moved to https://bitcointalksearch.org/topic/m.52110068

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: Lolo54 on March 23, 2019, 03:57:09 PM

For the moment only on Linux, but it seems to me that Jean_Luc is trying, or will try, to make it work on Windows as well. I think it is more difficult than on Linux.

Yes, on Windows there is no way to set up CUDA SDK 8.0 if a recent compiler (VC2017) is installed, even if the right one (VC2013) is also installed. The SDK setup fails. So the only solution is to start from a fresh install without VC2017.

Quote from: DaveF on March 23, 2019, 09:00:24 PM

Also, any chance of OpenCL or is it going to be CUDA only?
Thanks,
Dave

The problem with OpenCL is that I don't know how to access the carry flag or how to perform a wide 64-bit multiplication (i64xi64=>i128).

For instance:

Here is the code of oclvanitygen to perform an addition with carry:

Code:

#define bn_addc_word(r, a, b, t, c) do { \
t = a + b + c; \
c = (t < a) ? 1 : ((c & (t == a)) ? 1 : 0); \
r = t; \
} while (0)

This can be reduced to a single adc instruction with CUDA (and also with Visual C++, gcc, etc.)!
Some OpenCL driver compilers are smart enough to understand this code and reduce it to a single adc instruction, but not all!
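For readers who want to experiment with the carry logic of the macro above, here is a plain C++ sketch of the same test, chained over four 64-bit limbs. The names `addc64` and `add256` are hypothetical; whether a given compiler turns this into adc instructions is exactly the open question of this post.

```cpp
#include <cassert>
#include <cstdint>

// Returns the low 64 bits of a + b + carry and updates carry (0 or 1).
// Same test as the macro: overflow happened if the sum wrapped below a,
// or if it equals a while a carry came in (b was all ones).
static inline uint64_t addc64(uint64_t a, uint64_t b, uint64_t &carry) {
  assert(carry <= 1);
  uint64_t t = a + b + carry;
  carry = ((t < a) || (carry && (t == a))) ? 1 : 0;
  return t;
}

// r = a + b on 256-bit numbers stored as 4 limbs, least significant first.
// Returns the final carry out of the top limb.
static inline uint64_t add256(uint64_t r[4], const uint64_t a[4],
                              const uint64_t b[4]) {
  uint64_t carry = 0;
  for (int i = 0; i < 4; i++) r[i] = addc64(a[i], b[i], carry);
  return carry;
}
```

Adding 1 to a value with all four limbs set to all ones should give four zero limbs and a returned carry of 1.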

For the wide 64-bit multiplication (i64xi64=>i128), CUDA offers the needed instructions (mul.lo.u64 and mul.hi.u64), but with OpenCL it seems that the only way is to split the operands into 32-bit integers and use 64-bit integers to perform the partial multiplications (i32xi32=>i64).

If an OpenCL expert knows how to perform this efficiently, it would be great.
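The 32-bit fallback described above can be sketched in plain C++ as follows. The function name `mul64wide` is hypothetical, and this is only the schoolbook decomposition, not a tuned kernel. OpenCL C also defines a built-in `mul_hi` for integer types, which may be worth benchmarking against it.

```cpp
#include <cstdint>

// Computes the full 128-bit product a * b from four 32x32=>64 partial
// products. hi receives bits 64..127, lo receives bits 0..63.
static inline void mul64wide(uint64_t a, uint64_t b,
                             uint64_t &hi, uint64_t &lo) {
  const uint64_t MASK = 0xFFFFFFFFULL;
  uint64_t a0 = a & MASK, a1 = a >> 32;
  uint64_t b0 = b & MASK, b1 = b >> 32;

  uint64_t p00 = a0 * b0;  // weight 2^0
  uint64_t p01 = a0 * b1;  // weight 2^32
  uint64_t p10 = a1 * b0;  // weight 2^32
  uint64_t p11 = a1 * b1;  // weight 2^64

  // Sum of everything that lands on bits 32..63; it cannot overflow
  // 64 bits because each of the three terms is below 2^32.
  uint64_t mid = (p00 >> 32) + (p01 & MASK) + (p10 & MASK);

  lo = (mid << 32) | (p00 & MASK);
  hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

For example, multiplying 0xFFFFFFFFFFFFFFFF by itself should give hi = 0xFFFFFFFFFFFFFFFE and lo = 1.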

DaveF

legendary

Activity: 3500

Merit: 6320

Also, any chance of OpenCL or is it going to be CUDA only?
Thanks,
Dave

Lolo54

member

Activity: 131

Merit: 32

Quote from: DaveF on March 23, 2019, 03:19:55 PM

Quote from: Jean_Luc on March 22, 2019, 09:10:25 AM

Another report from a user using CUDA 8 and gcc 4.8 on a GeForce GTX 460. It works.

Does CUDA 8 work on Windows or only Linux?

-Dave

For the moment only on Linux, but it seems to me that Jean_Luc is trying, or will try, to make it work on Windows as well. I think it is more difficult than on Linux.

DaveF

legendary

Activity: 3500

Merit: 6320

Quote from: Jean_Luc on March 22, 2019, 09:10:25 AM

Another report from a user using CUDA 8 and gcc 4.8 on a GeForce GTX 460. It works.

Does CUDA 8 work on Windows or only Linux?

-Dave

arulbero

legendary

Activity: 1968

Merit: 2130

Quote from: Jean_Luc on March 23, 2019, 12:51:01 PM

Quote from: arulbero on March 23, 2019, 12:42:24 PM

How is it possible??

Found by the CPU? Try with -t 0...

Ok! Mystery solved!

Jean_Luc

sr. member

Activity: 462

Merit: 701

Quote from: arulbero on March 23, 2019, 12:42:24 PM

How is it possible??

Found by the CPU? Try with -t 0...

arulbero

legendary

Activity: 1968

Merit: 2130

Very strange error.

If I modify the function __device__ void _ModMult(uint64_t *r, uint64_t *a, uint64_t *b) in any way, for example like this:

Code:

  // Reduce from 320 to 256 
  UADD1(t[4],0ULL);
  UMULLO(al,t[4], 0x1000003D1ULL);
  UMULHI(ah,t[4], 0x1000003D1ULL);
  UADDO(r[0],r512[0], al);
  UADDC(r[1],r512[1], ah);
  UADDC(r[2],r512[2], 0ULL);
  UADD(r[3],r512[3], 0ULL);

  UADD1(r[3],0x07ULL);  // <-- deliberately added error

With the check option I get errors everywhere, as expected:

Code:

CPU found 1539 items
GPU: point   correct [0/271]
GPU: endo #1 correct [0/248]
GPU: endo #2 correct [0/260]
GPU: sym/point   correct [0/255]
GPU: sym/endo #1 correct [0/265]
GPU: sym/endo #2 correct [0/240]
GPU/CPU check Failed !

but I still get a correct result with the standard command:

Code:

~/VanitySearch$ ./VanitySearch -stop -t 7 -gpu 1111
Difficulty: 16777216
Search: 1111 [Compressed]
Start Sat Mar 23 18:39:22 2019
Base Key:12FF1E3D528DC8068438E8ED181E1F2505E877A7543869B0B38E500F5FA284F9
Number of CPU thread: 7
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(64x128)

Pub Addr: 1111Cf8ucVbgUtANTRGwQsWVpXVZvqFT6
Prv Addr: 5HxepgskWZ53AokCCvk8d1ZZGinupSX4Sm7tNQygZ9zQpkftRQJ
Prv Key : 0x12FF1E3D528DC8068438E8ED181E1F2505E877A7543869B5B38E500F5FA4D5D3
Check   : 1DFm6mzxxKqFo9bysKC9x1TxEz5Z9d9uAb
Check   : 1111Cf8ucVbgUtANTRGwQsWVpXVZvqFT6 (comp)

How is it possible??
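For context on the reduction step quoted in this post: the secp256k1 prime is p = 2^256 - 0x1000003D1, so 2^256 is congruent to 0x1000003D1 modulo p, and the fifth limb can be folded back into the low 256 bits by multiplying it by that constant. Here is a host-side C++ sketch of this folding, assuming the GCC/Clang `unsigned __int128` extension. The names `fold_add` and `reduce320` are hypothetical, and this is not the GPU code itself.

```cpp
#include <cstdint>

// 2^256 mod p for the secp256k1 prime p = 2^256 - 0x1000003D1.
static const uint64_t K1 = 0x1000003D1ULL;

// Adds x * K1 to the 256-bit number r (4 limbs, least significant first)
// and returns whatever overflows past bit 255.
static uint64_t fold_add(uint64_t r[4], uint64_t x) {
  unsigned __int128 acc = (unsigned __int128)x * K1;
  for (int i = 0; i < 4; i++) {
    acc += r[i];
    r[i] = (uint64_t)acc;  // keep the low 64 bits in this limb
    acc >>= 64;            // carry the rest into the next limb
  }
  return (uint64_t)acc;
}

// Folds a 320-bit value t (5 limbs) into 256 bits in r. The result is
// congruent to t modulo p and below 2^256, but not necessarily below p.
static void reduce320(uint64_t r[4], const uint64_t t[5]) {
  for (int i = 0; i < 4; i++) r[i] = t[i];
  uint64_t over = fold_add(r, t[4]);
  while (over != 0) over = fold_add(r, over);  // fold a rare final overflow
}
```

Any extra constant added to r[3] after this step, as in the modified snippet, changes the residue, which would be consistent with the check option reporting failures on every GPU result.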
