Author

Topic: Vanitygen: Vanity bitcoin address generator/miner [v0.22] - page 186. (Read 1153743 times)

member
Activity: 67
Merit: 130
Just put together a script (pywallet.py 1.0) that exports and imports private keys in a shortened format (mostly as a lightweight alternative to showwallet for those who couldn't compile the branch). It requires only the OpenSSL libraries (for elliptic curve cryptography). URL: https://github.com/joric/pywallet

jr. member
Activity: 42
Merit: 1000
pc
sr. member
Activity: 253
Merit: 250
Indeed!  Try running two instances at four threads each.  If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.

Fascinating. Running two instances at four threads each gives me about 260000–275000 K/s per instance, with each instance using a bit under 400% CPU (probably about as much as they can get with the other programs I have running here).
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:
cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]

That's oddly slow. You should be getting about twice that key rate on that CPU.

Quote
Code:
cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one would expect: roughly 100% times the number of threads.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

With 5 threads, CPU usage hovered between 420% and 440% and the keygen rate dropped, which makes me think there's some kind of contention for something that's not CPU-bound.

Indeed!  Try running two instances at four threads each.  If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.
hero member
Activity: 756
Merit: 502
I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

Sorry, I tend to go into denial mode if someone has better hardware than I do.
pc
sr. member
Activity: 253
Merit: 250
Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might spill the contents of your L1/L2 caches now. So it may be more efficient to not run hyperthreaded in this version.

There are some profiling tools from Intel that would help figure this out. I haven't used any of them yet.

You could also play with that pool size.

I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

It's so awesome to be churning through billions of addresses. Amusing how this is even less useful than mining, and yet somehow more fun.
hero member
Activity: 756
Merit: 502
Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might spill the contents of your L1/L2 caches now. So it may be more efficient to not run hyperthreaded in this version.

There are some profiling tools from Intel that would help figure this out. I haven't used any of them yet.

You could also play with that pool size.
pc
sr. member
Activity: 253
Merit: 250
Great, negative scalability.  Are you using regular expressions?  How fast does it run with just one thread?

Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:
cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]
                         
cebu:~% nice ./Applications/vanitygen -i -t 2 1abcdefg
Difficulty: 13628644118
[162979 K/s][total 1505280][Prob 0.0%][50% in 16.1h]

cebu:~% nice ./Applications/vanitygen -i -t 3 1abcdefg
Difficulty: 13628644118
[237562 K/s][total 903168][Prob 0.0%][50% in 11.0h]

cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one would expect: roughly 100% times the number of threads.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

With 5 threads, CPU usage hovered between 420% and 440% and the keygen rate dropped, which makes me think there's some kind of contention for something that's not CPU-bound.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 6 1abcdefg
Difficulty: 13628644118
[261357 K/s][total 9182592][Prob 0.1%][50% in 10.0h]

cebu:~% nice ./Applications/vanitygen -i -t 7 1abcdefg
Difficulty: 13628644118
[245618 K/s][total 1705984][Prob 0.0%][50% in 10.7h]

Using 6 and 7 threads was roughly the same as 5, with CPU slightly higher, maybe between 425% and 445%.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 8 1abcdefg
Difficulty: 13628644118
[200385 K/s][total 2358272][Prob 0.0%][50% in 13.1h]

Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.
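To make the scaling in these runs easier to read, here is a minimal Python sketch that turns the key rates quoted above into speedup and per-thread efficiency relative to the single-thread run. The numbers are copied from the output in this post; the function name `scaling_table` is made up for illustration and is not part of vanitygen.

```python
# Key rates (keys/s) copied from the vanitygen runs above, keyed by the -t value.
rates = {1: 80020, 2: 162979, 3: 237562, 4: 299808,
         5: 262992, 6: 261357, 7: 245618, 8: 200385}

def scaling_table(rates):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread rate."""
    base = rates[1]
    rows = []
    for threads in sorted(rates):
        speedup = rates[threads] / base
        rows.append((threads, speedup, speedup / threads))
    return rows

for threads, speedup, efficiency in scaling_table(rates):
    print(f"-t {threads}: {speedup:.2f}x speedup, {efficiency:.0%} per-thread efficiency")

# Thread count with the highest measured key rate.
best = max(rates, key=rates.get)
print("best thread count:", best)
```

On these figures the rate peaks at 4 threads (close to linear scaling) and falls to roughly 2.5x the single-thread rate at 8 threads, which is the negative scalability discussed in the thread.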

I don't remember exactly what speeds I was getting before on v0.8, but when I ran 8 threads it was using about 800% CPU, and I'm pretty sure it was well south of 200000 K/s, probably more like 100000. I really don't remember, though, so I wouldn't rely on that number at all.

And just for completeness, here's my hardware configuration:
Code:
  Model Name: Mac Pro
  Model Identifier: MacPro4,1
  Processor Name: Quad-Core Intel Xeon
  Processor Speed: 2.26 GHz
  Number Of Processors: 2
  Total Number Of Cores: 8
  L2 Cache (per core): 256 KB
  L3 Cache (per processor): 8 MB
  Memory: 32 GB
  Processor Interconnect Speed: 5.86 GT/s
  Boot ROM Version: MP41.0081.B07
  SMC Version (system): 1.39f5
  SMC Version (processor tray): 1.39f5

Thanks again!
hero member
Activity: 756
Merit: 502
New version 0.10 is up. This version is approx. 6X (!!) faster at prefix matching

Congratulations on this optimization! I profiled vanitygen 0.9 earlier and also noticed that the inversion was taking so much time, but you have already found a solution. Are you very familiar with OpenSSL internals? It certainly seems so.

If someone can port two important functions to the GPU, one being EC_POINT_add() and the other being EC_POINTs_make_affine(), this thing will fly. Even more so if the SHA256 and RIPEMD160 hashes are also done on the GPU.  Here is the relevant part of the profiler output. The number in the second column is seconds spent inside the function and its children. The total execution time was about 25 seconds in this test run.

Code:
-----------------------------------------------
[3]     99.9    0.01   24.92       1         vg_thread_loop(_vg_context_s*) [3]
                0.00   12.55  249406/250471      EC_POINT_add [7]
                0.00    7.91     932/941         EC_POINTs_make_affine [9]
                0.00    1.82  219546/219548      EC_POINT_point2oct [16]
                0.00    1.53  272770/272775      SHA256 [19]
                0.01    0.71  226973/226974      RIPEMD160 [26]

Anything else is peanuts in comparison, including the prefix matching.
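As a rough illustration of how lopsided that profile is, the following Python sketch converts the seconds quoted above into shares of the time spent in vg_thread_loop. The figures are copied from the profiler output; the helper name `shares` is hypothetical.

```python
# (self, children) seconds for each callee of vg_thread_loop, from the profile above.
profile = {
    "EC_POINT_add": (0.00, 12.55),
    "EC_POINTs_make_affine": (0.00, 7.91),
    "EC_POINT_point2oct": (0.00, 1.82),
    "SHA256": (0.00, 1.53),
    "RIPEMD160": (0.01, 0.71),
}
loop_total = 0.01 + 24.92  # self + children seconds of vg_thread_loop itself

def shares(profile, total):
    """Fraction of the loop's total time attributed to each callee."""
    return {name: (self_s + child_s) / total
            for name, (self_s, child_s) in profile.items()}

for name, share in shares(profile, loop_total).items():
    print(f"{name:24s} {share:6.1%}")
```

The two elliptic-curve functions together account for a bit over 80% of the loop time in this run, which is why they are singled out as the candidates for a GPU port.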

legendary
Activity: 1974
Merit: 1030
If you have a sec, give some details.  Are you using prefixes?  How many cores, how fast, and how fast with a single thread?

Intel(R) Xeon(R) CPU E5420  @ 2.50GHz (8 cores):

Code:
$ ./vg-0.6 -it1 1Loaners & sleep 10; kill $!
[1] 30177
Difficulty: 28173812690
[28363 K/s][total 280000][Prob 0.0%][50% in 8.0d]
$ ./vg-0.6 -i 1Loaners & sleep 10; kill $!
[1] 30179
Difficulty: 28173812690
[174878 K/s][total 1520000][Prob 0.0%][50% in 1.3d]
$ ./vg-0.10 -it1 1Loaners & sleep 10; kill $!
[1] 30188
Difficulty: 28173812690
[164485 K/s][total 1605696][Prob 0.0%][50% in 1.4d]
$ ./vg-0.10 -i 1Loaners & sleep 10; kill $!
[1] 30190
Difficulty: 28173812690
[884067 K/s][total 8430080][Prob 0.0%][50% in 6.1h]

v0.6 single thread to v0.6 8 threads: 174878/28363 = 6.1657x (expect 8x)
v0.6 single thread to v0.10 single thread: 164485/28363 = 5.7992x (expect 6x as announced)
v0.6 8 threads to v0.10 8 threads: 884067/174878 = 5.0553x (expect 6x as announced)

Oops, my fault, it's not 4x but 5x. I stopped vanitygen v0.6 8 hours ago and started v0.10 a few minutes ago. I judged the improvement not by the rate but by the time remaining, and I suspect I didn't take into account that when I stopped v0.6 this morning, it had been running for some hours and the time remaining was, of course, less than at the start.
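The relationship between the key rate and the "50% in" figure can be sketched in Python. This is an assumption about how the estimate is derived, not something taken from the vanitygen source: it treats every key as an independent trial with success probability 1/difficulty, so the cumulative chance of a match reaches 50% after about difficulty × ln 2 keys. The function name `seconds_to_half_chance` is made up for illustration.

```python
import math

def seconds_to_half_chance(difficulty, keys_per_second, keys_done=0):
    """Seconds until the cumulative chance of a match reaches 50%.

    Assumes each key matches independently with probability 1/difficulty,
    so P(no match after n keys) is about exp(-n / difficulty) and the 50%
    point sits at n = difficulty * ln(2). Keys already tried (keys_done)
    count toward that target, so the remaining time shrinks as a run ages.
    """
    keys_needed = difficulty * math.log(2) - keys_done
    return max(keys_needed, 0) / keys_per_second

difficulty = 28173812690  # from the runs quoted above
runs = [("v0.6, 1 thread", 28363), ("v0.6, 8 threads", 174878),
        ("v0.10, 1 thread", 164485), ("v0.10, 8 threads", 884067)]
for label, rate in runs:
    hours = seconds_to_half_chance(difficulty, rate) / 3600
    print(f"{label}: {hours:.1f} h ({hours / 24:.1f} d)")
```

Under that assumption the results agree with the 8.0d, 1.3d, 1.4d and 6.1h shown in the output above to the displayed precision. Because the remaining time depends on how many keys have already been tried, comparing a long-running instance with a freshly started one understates the speedup, while comparing key rates does not.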
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
I have a dual-quad-core Mac Pro with hyperthreading, and on previous versions if I ran at 8 threads I got optimal performance, but I noticed with the new version that at 8 threads I was still only using "400%" of a cpu, so I tried running at 4 threads instead and got up to 300000 K/s instead of around 200000 K/s. So, I don't know if others have a similar configuration, but it might be good to play around with the number of threads to try to hit the optimal rate for your platform.

Great, negative scalability.  Are you using regular expressions?  How fast does it run with just one thread?

I'm seeing a 4x increase. I don't mind not getting 6x; 4x is an amazing improvement in any case.

an0therlr3, you might be noticing some scalability issues as well.  If you have a sec, give some details.  Are you using prefixes?  How many cores, how fast, and how fast with a single thread?
legendary
Activity: 1974
Merit: 1030
I'm seeing a 4x increase. I don't mind not getting 6x; 4x is an amazing improvement in any case.
pc
sr. member
Activity: 253
Merit: 250
I have a dual-quad-core Mac Pro with hyperthreading, and on previous versions if I ran at 8 threads I got optimal performance, but I noticed with the new version that at 8 threads I was still only using "400%" of a cpu, so I tried running at 4 threads instead and got up to 300000 K/s instead of around 200000 K/s. So, I don't know if others have a similar configuration, but it might be good to play around with the number of threads to try to hit the optimal rate for your platform.

Thank you very much for this.
sr. member
Activity: 252
Merit: 250
New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses.  This optimization also makes the cost of regular expressions much more acute.  The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.

Congrats!

But... any news about entropy import?
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses.  This optimization also makes the cost of regular expressions much more acute.  The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.
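For readers wondering what "computing batches of modular inverses" can look like, here is a minimal Python sketch of the standard technique usually called Montgomery's trick. It is an illustration of the general idea only, not a description of OpenSSL's or vanitygen's actual code; the function name `batch_inverse` is hypothetical, and the modulus is the secp256k1 field prime used by Bitcoin.

```python
# Field prime of secp256k1, the curve Bitcoin uses.
P = 2**256 - 2**32 - 977

def batch_inverse(values, p=P):
    """Invert every element of `values` modulo p using a single inversion.

    Montgomery's trick: build running prefix products, invert only the
    final product, then walk backwards peeling off one inverse at a time.
    All values must be nonzero modulo p.
    """
    prefix = []
    acc = 1
    for v in values:
        acc = acc * v % p
        prefix.append(acc)

    inv_acc = pow(acc, -1, p)  # the one expensive inversion (Python 3.8+)

    inverses = [0] * len(values)
    for i in range(len(values) - 1, 0, -1):
        # inv_acc is the inverse of values[0] * ... * values[i] at this point.
        inverses[i] = inv_acc * prefix[i - 1] % p
        inv_acc = inv_acc * values[i] % p
    if values:
        inverses[0] = inv_acc
    return inverses

sample = [2, 3, 5, 7, 123456789]
for v, inv in zip(sample, batch_inverse(sample)):
    assert v * inv % P == 1
```

The point of the trick is that n inversions are replaced by one inversion plus about 3(n - 1) multiplications. Since a modular inversion costs far more than a multiplication, larger batches amortize that cost better, at the price of holding more intermediate values in memory.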
jr. member
Activity: 42
Merit: 1000
I got this estimate for my pattern: 9.47e+33y
What exactly does this mean in decimal?
Maybe 9.47*2.7183^33 years?
full member
Activity: 134
Merit: 102
I got this estimate for my pattern: 9.47e+33y
What exactly does this mean in decimal?
Maybe 9.47*2.7183^33 years?

It means 9.47*10^33. I assume years.

http://en.wikipedia.org/wiki/Scientific_notation#E_notation
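A short Python sketch makes the difference between the two readings concrete. The string `9.47e+33` is taken from the question above; everything else is illustrative.

```python
import math
from decimal import Decimal

estimate = "9.47e+33"  # the figure printed before the trailing "y"

# E notation means "times ten to the power of", not Euler's number e.
as_power_of_ten = int(Decimal(estimate))
print(f"{as_power_of_ten:,} years")

# The misreading from the question, for comparison.
as_power_of_e = 9.47 * math.e ** 33
print(f"{as_power_of_e:.3g} (if 'e' meant Euler's number)")
```

The first line prints 947 followed by 31 zeros, which is many orders of magnitude larger than the value the Euler's-number reading would give.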
donator
Activity: 2772
Merit: 1019
Found my username.
My computer said it would take 7 years to find your username. Would you be interested in finding a couple of vanity addresses for me?

Uh-oh. You'd have to trust the guy quite a lot.
hero member
Activity: 518
Merit: 500
Found my username.
My computer said it would take 7 years to find your username. Would you be interested in finding a couple of vanity addresses for me?
Really, 7 years? Is that case sensitive? I searched case-insensitive and got lucky with just a capital at the start. I've been searching for 7 prefixes ranging from 6 to 8 characters long for the past couple of days. I've found 6 of them so far. My computer is almost three years old and not anything special.

I can search an address for you if you'd like.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
Can I make a feature request?
Have it save the info to a text file when it finds a match for a prefix, but keep searching for the other patterns.

Good call!

Version 0.9 is up now.  Use the "-o" option to specify an output file for matches.  Use the "-k" option to keep patterns after finding matches.