Vanitygen: Vanity bitcoin address generator/miner [v0.22] - page 186.

Joric

member

Activity: 67

Merit: 130

Just built a script (pywallet.py 1.0) that allows exporting/importing private keys in a shortened format (mostly as a lightweight alternative to showwallet for those who didn't manage to compile the branch). It requires only the OpenSSL libraries (for elliptic curve cryptography). URL: https://github.com/joric/pywallet
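Assuming the "shortened format" here is the Base58Check wallet-import format (WIF) for private keys (a 51-character string beginning with "5" for an uncompressed mainnet key), the encoding step can be sketched in Python as below. This only illustrates the format and is not pywallet's code; `b58encode`, `b58check` and `secret_to_wif` are hypothetical names chosen for the example.

```python
import hashlib

# Bitcoin's Base58 alphabet (no 0, O, I or l, to avoid look-alike characters).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58, writing each leading zero byte as '1'."""
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(out))


def b58check(payload: bytes) -> str:
    """Append the first 4 bytes of double SHA-256 as a checksum, then Base58-encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return b58encode(payload + checksum)


def secret_to_wif(secret: int) -> str:
    """Encode a 256-bit secret as an uncompressed mainnet WIF key (version byte 0x80)."""
    return b58check(b"\x80" + secret.to_bytes(32, "big"))


print(secret_to_wif(0x0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D))
```

The 4-byte checksum is what lets an import tool reject a mistyped key instead of silently importing the wrong secret.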

Ukigo

jr. member

Activity: 42

Merit: 1000

Quote from: EricJ2190 on July 11, 2011, 11:35:21 PM

It means 9.47*10^33. I assume years.

http://en.wikipedia.org/wiki/Scientific_notation#E_notation

Thanks. It's even better than I thought :)

pc

sr. member

Activity: 253

Merit: 250

Quote from: samr7 on July 12, 2011, 03:29:01 PM

Indeed! Try running two instances at four threads each. If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.

Fascinating. Running two instances at four threads each gives me about 260000–275000 K/s per instance, with each taking up a bit under 400% CPU (probably about as much as they can get with the other programs I have running here).

samr7

full member

Activity: 140

Merit: 430

Firstbits: 1samr7

Quote from: pc on July 12, 2011, 10:10:28 AM

Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:

cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]

That's oddly slow; you should be getting about twice that key rate on that CPU.

Quote

Code:

cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one would expect: roughly 100% times the number of threads.

Code:

cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

With 5 threads, CPU usage hovered between 420% and 440% and the keygen rate dropped, which makes me think there's some kind of contention for something that's not CPU-bound.

Indeed! Try running two instances at four threads each. If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: pc on July 12, 2011, 11:29:40 AM

I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

Sorry, I tend to go into denial mode if someone has better hardware than I do.

pc

sr. member

Activity: 253

Merit: 250

Quote from: cbuchner1 on July 12, 2011, 11:13:59 AM

Quote from: pc on July 12, 2011, 10:10:28 AM

Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might now be spilling the contents of your L1/L2 caches, so it may be more efficient not to run hyperthreaded with this version.

There are some profiling tools from Intel that would let you figure this out. I haven't used any of them yet.

You could also play with that pool size.

I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

It's so awesome churning through billions of addresses. Amusing how this is even less useful than mining, and yet somehow more fun.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: pc on July 12, 2011, 10:10:28 AM

Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might now be spilling the contents of your L1/L2 caches, so it may be more efficient not to run hyperthreaded with this version.

There are some profiling tools from Intel that would let you figure this out. I haven't used any of them yet.

You could also play with that pool size.

pc

sr. member

Activity: 253

Merit: 250

Quote from: samr7 on July 12, 2011, 08:54:26 AM

Great, negative scalability. Are you using regular expressions? How fast does it run with just one thread?

Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:

cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]
                         
cebu:~% nice ./Applications/vanitygen -i -t 2 1abcdefg
Difficulty: 13628644118
[162979 K/s][total 1505280][Prob 0.0%][50% in 16.1h]

cebu:~% nice ./Applications/vanitygen -i -t 3 1abcdefg
Difficulty: 13628644118
[237562 K/s][total 903168][Prob 0.0%][50% in 11.0h]

cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one would expect: roughly 100% times the number of threads.

Code:

cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

With 5 threads, CPU usage hovered between 420% and 440% and the keygen rate dropped, which makes me think there's some kind of contention for something that's not CPU-bound.

Code:

cebu:~% nice ./Applications/vanitygen -i -t 6 1abcdefg
Difficulty: 13628644118
[261357 K/s][total 9182592][Prob 0.1%][50% in 10.0h]

cebu:~% nice ./Applications/vanitygen -i -t 7 1abcdefg
Difficulty: 13628644118
[245618 K/s][total 1705984][Prob 0.0%][50% in 10.7h]

Using 6 and 7 threads was roughly the same as 5, with CPU slightly higher, maybe between 425% and 445%.

Code:

cebu:~% nice ./Applications/vanitygen -i -t 8 1abcdefg
Difficulty: 13628644118
[200385 K/s][total 2358272][Prob 0.0%][50% in 13.1h]

Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I don't remember exactly what speeds I was getting before on v0.8, but when I ran 8 threads it was using about 800% CPU, and I'm pretty sure it was well south of 200000 K/s, probably more like 100000. I really don't remember, though, so I wouldn't rely on that number at all.
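Putting the single-run rates from this post side by side makes the knee easier to see. The Python sketch below (names are illustrative) computes the speedup relative to one thread and the per-thread efficiency from the K/s figures quoted above. Each figure comes from one short run, so small differences are probably noise.

```python
# Key rates (K/s) from the vanitygen 0.10 runs above:
#   nice ./Applications/vanitygen -i -t N 1abcdefg
rates = {1: 80020, 2: 162979, 3: 237562, 4: 299808,
         5: 262992, 6: 261357, 7: 245618, 8: 200385}


def scaling_table(rates):
    """Return (threads, speedup vs. 1 thread, per-thread efficiency) rows."""
    base = rates[1]
    return [(n, rates[n] / base, rates[n] / base / n) for n in sorted(rates)]


for n, speedup, eff in scaling_table(rates):
    print(f"{n} threads: {speedup:4.2f}x speedup, {eff:6.1%} per-thread efficiency")

print("fastest thread count in these runs:", max(rates, key=rates.get))
```

On these numbers throughput scales almost linearly up to 4 threads (about 3.75x) and then falls off, which lines up with the CPU-usage observations in the post.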

And just for completeness, here's my hardware configuration:

Code:

  Model Name:	Mac Pro
  Model Identifier:	MacPro4,1
  Processor Name:	Quad-Core Intel Xeon
  Processor Speed:	2.26 GHz
  Number Of Processors:	2
  Total Number Of Cores:	8
  L2 Cache (per core):	256 KB
  L3 Cache (per processor):	8 MB
  Memory:	32 GB
  Processor Interconnect Speed:	5.86 GT/s
  Boot ROM Version:	MP41.0081.B07
  SMC Version (system):	1.39f5
  SMC Version (processor tray):	1.39f5

Thanks again!

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: samr7 on July 12, 2011, 04:52:30 AM

New version 0.10 is up. This version is approx. 6X (!!) faster at prefix matching

Congratulations on this optimization! I profiled vanitygen 0.9 earlier and also noticed the issue with the inversion taking so much time, but you've found a solution already. Are you very familiar with OpenSSL internals? It certainly seems so.

If someone can port two important functions to the GPU, one being EC_POINT_add() and the other EC_POINTs_make_affine(), this thing will fly. Even more so if the SHA256 and RIPEMD160 hashes are also done on the GPU. Here is the relevant excerpt of the profiler output. The number in the second column is the seconds spent inside the function and its children. The total execution time was about 25 seconds in this test run.

Code:

-----------------------------------------------
[3]     99.9    0.01   24.92       1         vg_thread_loop(_vg_context_s*) [3]
                0.00   12.55  249406/250471      EC_POINT_add [7]
                0.00    7.91     932/941         EC_POINTs_make_affine [9]
                0.00    1.82  219546/219548      EC_POINT_point2oct [16]
                0.00    1.53  272770/272775      SHA256 [19]
                0.01    0.71  226973/226974      RIPEMD160 [26]

Anything else is peanuts in comparison, including the prefix matching.
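To put those figures in proportion, the Python sketch below turns the gprof excerpt into shares of the roughly 24.93 s run and applies Amdahl's law to get an upper bound on what offloading only the two EC functions could buy. The bound assumes the offloaded work becomes free and ignores data-transfer and synchronization costs, so any real GPU port would gain less. The helper names are made up for the example.

```python
# Seconds attributed to each callee of vg_thread_loop in the gprof excerpt
# above (the self and children columns added together).
profile = {
    "EC_POINT_add": 0.00 + 12.55,
    "EC_POINTs_make_affine": 0.00 + 7.91,
    "EC_POINT_point2oct": 0.00 + 1.82,
    "SHA256": 0.00 + 1.53,
    "RIPEMD160": 0.01 + 0.71,
}
TOTAL = 0.01 + 24.92  # vg_thread_loop's own self + children time


def shares(profile, total):
    """Fraction of the total run time attributed to each function."""
    return {name: t / total for name, t in profile.items()}


def amdahl(p, s):
    """Amdahl's law: overall speedup when a fraction p of the work runs s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


s = shares(profile, TOTAL)
for name, frac in sorted(s.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {frac:6.1%}")

ec = s["EC_POINT_add"] + s["EC_POINTs_make_affine"]
print(f"EC share {ec:.1%}; upper bound if made free: {amdahl(ec, float('inf')):.1f}x")
```

The two EC functions come to about 82% of this run, so offloading only them caps the overall gain at roughly 5.6x; moving the hashing onto the GPU as well raises that ceiling.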

dserrano5

legendary

Activity: 1974

Merit: 1030

Quote from: samr7 on July 12, 2011, 08:54:26 AM

If you have a sec, give some details. Are you using prefixes? How many cores, how fast, and how fast with a single thread?

Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (8 cores):

Code:

$ ./vg-0.6 -it1 1Loaners & sleep 10; kill $!
[1] 30177
Difficulty: 28173812690
[28363 K/s][total 280000][Prob 0.0%][50% in 8.0d]
$ ./vg-0.6 -i 1Loaners & sleep 10; kill $!
[1] 30179
Difficulty: 28173812690
[174878 K/s][total 1520000][Prob 0.0%][50% in 1.3d]
$ ./vg-0.10 -it1 1Loaners & sleep 10; kill $!
[1] 30188
Difficulty: 28173812690
[164485 K/s][total 1605696][Prob 0.0%][50% in 1.4d]
$ ./vg-0.10 -i 1Loaners & sleep 10; kill $!
[1] 30190
Difficulty: 28173812690
[884067 K/s][total 8430080][Prob 0.0%][50% in 6.1h]

v0.6 single thread to v0.6 8 threads: 174878/28363 = 6.1657x (expect 8x)
v0.6 single thread to v0.10 single thread: 164485/28363 = 5.7992x (expect 6x as announced)
v0.6 8 threads to v0.10 8 threads: 884067/174878 = 5.0553x (expect 6x as announced)

Oops, my fault, it's not 4x but 5x. I stopped vanitygen v0.6 8 hours ago and started v0.10 a few minutes ago. I judged the improvement not by the rate but by the time remaining, and I suspect I didn't take into account that when I stopped v0.6 this morning, it had been running for some hours, so the time remaining was, of course, less than at the start.
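A sketch of how a "50% in ..." figure can be derived from the difficulty and the key rate, assuming each candidate key matches independently with probability 1/Difficulty, so that P(at least one match within n keys) = 1 - (1 - 1/D)^n. It reads "K/s" as keys per second, which fits the logs above (about 10 s runs reaching totals near 280000 at 28363 K/s). If the displayed time is the time remaining, it shrinks as keys accumulate even at a constant rate, which is the effect described above. This is not vanitygen's code, and the function names are hypothetical.

```python
import math


def keys_for_probability(difficulty, p):
    """Keys needed until P(at least one match) reaches p, assuming each key
    matches independently with probability 1/difficulty."""
    # Solve 1 - (1 - 1/D)**n = p for n; log1p keeps precision for huge D.
    return math.log1p(-p) / math.log1p(-1.0 / difficulty)


def seconds_remaining(difficulty, keys_per_s, keys_done, p=0.5):
    """Seconds until the cumulative match probability reaches p at a constant rate."""
    needed = keys_for_probability(difficulty, p) - keys_done
    return max(needed, 0.0) / keys_per_s


# The v0.6 single-thread run above: Difficulty 28173812690, 28363 K/s, 280000 keys.
days = seconds_remaining(28173812690, 28363, 280000) / 86400
print(f"50% in {days:.1f}d")
```

This prints the same 8.0d as the log line; plugging in the v0.10 8-thread numbers (884067 K/s, 8430080 keys) gives about 6.1h, again in line with the log.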

samr7

full member

Activity: 140

Merit: 430

Firstbits: 1samr7

Quote from: pc on July 12, 2011, 06:38:23 AM

I have a dual-quad-core Mac Pro with hyperthreading. On previous versions I got optimal performance running at 8 threads, but with the new version I noticed that at 8 threads I was only using "400%" of a CPU, so I tried running at 4 threads instead and got up to 300000 K/s instead of around 200000 K/s. I don't know if others have a similar configuration, but it might be good to play around with the number of threads to find the optimal rate for your platform.

Great, negative scalability. Are you using regular expressions? How fast does it run with just one thread?

Quote from: dserrano5 on July 12, 2011, 08:33:22 AM

I'm seeing a 4x increase. I don't mind not getting 6x; 4x is an amazing improvement in any case.

an0therlr3, you might be noticing some scalability issues as well. If you have a sec, give some details. Are you using prefixes? How many cores, how fast, and how fast with a single thread?

dserrano5

legendary

Activity: 1974

Merit: 1030

I'm seeing a 4x increase. I don't mind not getting 6x; 4x is an amazing improvement in any case.

pc

sr. member

Activity: 253

Merit: 250

I have a dual-quad-core Mac Pro with hyperthreading. On previous versions I got optimal performance running at 8 threads, but with the new version I noticed that at 8 threads I was only using "400%" of a CPU, so I tried running at 4 threads instead and got up to 300000 K/s instead of around 200000 K/s. I don't know if others have a similar configuration, but it might be good to play around with the number of threads to find the optimal rate for your platform.

Thank you very much for this.

Shevek

sr. member

Activity: 252

Merit: 250

Quote from: samr7 on July 12, 2011, 04:52:30 AM

New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses. This optimization also makes the cost of regular expressions much more acute. The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.

Congratz!

But... any news about entropy import?

samr7

full member

Activity: 140

Merit: 430

Firstbits: 1samr7

New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses. This optimization also makes the cost of regular expressions much more acute. The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.
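The batching idea described here is commonly known as Montgomery's simultaneous-inversion trick: n field elements can be inverted with a single modular inversion plus roughly 3(n-1) multiplications. The Python sketch below shows the trick over the secp256k1 field prime. It is a generic illustration under that assumption, not OpenSSL's EC_POINTs_make_affine code, and `batch_inverse` is a hypothetical name.

```python
# secp256k1 field prime (the curve Bitcoin uses).
P = 2**256 - 2**32 - 977


def batch_inverse(values, p=P):
    """Invert every element of `values` modulo the prime p using one modular
    inversion plus about 3*(n-1) multiplications (Montgomery's trick)."""
    n = len(values)
    if n == 0:
        return []
    # prefix[i] = values[0] * ... * values[i]  (mod p)
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        if v % p == 0:
            raise ZeroDivisionError("zero has no inverse")
        acc = acc * v % p
        prefix[i] = acc
    # The single expensive inversion, of the whole product (Fermat: a^(p-2) = a^-1 mod p).
    inv = pow(acc, p - 2, p)
    out = [0] * n
    # Walk backwards, peeling one factor off the running inverse at each step.
    for i in range(n - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p   # (v0..vi)^-1 * (v0..v(i-1)) = vi^-1
        inv = inv * values[i] % p          # now (v0..v(i-1))^-1
    out[0] = inv
    return out


zs = [3, 5, 7, 2**200 + 1]
print(all(z * z_inv % P == 1 for z, z_inv in zip(zs, batch_inverse(zs))))
```

Assuming points are kept in Jacobian coordinates, converting each one to affine form needs the inverse of its Z coordinate, which is where this pays off. The larger the batch, the better the one inversion is amortized, at the cost of keeping more intermediate values in memory.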

Ukigo

jr. member

Activity: 42

Merit: 1000

I have got this estimate for my pattern: 9.47e+33y
What exactly does this mean in decimal? Huh

Maybe 9.47*2.7183^33 years? Undecided

EricJ2190

full member

Activity: 134

Merit: 102

Quote from: Ukigo on July 11, 2011, 11:31:28 PM

I have got this estimate for my pattern: 9.47e+33y
What exactly does this mean in decimal? Huh

Maybe 9.47*2.7183^33 years? Undecided

It means 9.47*10^33. I assume years.

http://en.wikipedia.org/wiki/Scientific_notation#E_notation
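A quick Python check of the E-notation reading: the "e+33" suffix means "times 10 to the 33rd power", not a power of Euler's number. The variable names are only for illustration.

```python
import math
from decimal import Decimal

# vanitygen's estimate for the pattern above, with the trailing "y" (years) dropped.
estimate = "9.47e+33"

# In E notation, "e+33" means "times 10 to the 33rd power".
exact = Decimal(estimate)
years = int(exact)  # 947 followed by 31 zeros
print(f"{years:,} years")

# The guess in the question read "e" as Euler's number (about 2.7183) instead.
misread = 9.47 * math.exp(33)
print(f"9.47*e^33 is only {misread:.3e}, about {float(exact) / misread:.1e} times smaller")
```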

molecular

donator

Activity: 2772

Merit: 1019

Quote from: Ryland R. Taylor-Almanza on July 10, 2011, 11:23:30 AM

Quote from: brendio on July 10, 2011, 09:02:43 AM

Found my username

My computer said it would take 7 years to find your username? Would you be interested in finding a couple vanity addresses for me?

Uh-oh. You'd have to trust the guy a lot, since whoever finds the address also knows its private key.

brendio

hero member

Activity: 518

Merit: 500

Quote from: Ryland R. Taylor-Almanza on July 10, 2011, 11:23:30 AM

Quote from: brendio on July 10, 2011, 09:02:43 AM

Found my username

My computer said it would take 7 years to find your username? Would you be interested in finding a couple vanity addresses for me?

Really, 7 years? Is that case sensitive? I searched case-insensitive and got lucky with just a capital at the start. I've been searching for 7 prefixes ranging from 6 to 8 characters long for the past couple of days. I've found 6 of them so far. My computer is almost three years old and not anything special.

I can search an address for you if you'd like.

samr7

full member

Activity: 140

Merit: 430

Firstbits: 1samr7

Quote from: bmgjet on July 10, 2011, 03:44:47 AM

Can I make a feature request?
Have it save the info to a txt file when it finds a prefix, but keep trying for other combinations.

Good call!

Version 0.9 is up now. Use the "-o" option to specify an output file for matches. Use the "-k" option to keep patterns after finding matches.
