long time lurker; not much of a poster...
With simple C code, on a single thread, my cpu gets ~20K keys/second.
roughly broken down as:
EC_POINT_add: 19808 ticks
EC_POINT_point2oct: 120116 ticks
sha256: 27840 ticks
ripemd160: 3152 ticks
Scanned 763188 keys in 40 seconds, 19079 keys/second
i.e. as noted in this thread, the EC point conversion takes most of the time.
This can be further optimised in several ways - ignore the Y component or as noted in the VanityGen code; batch up a set of results and
use EC_POINTs_make_affine to simplify this operation.
I then patched the VanityGen code to solve the same problem.
This is only a very small mod.
- set the initial private key = 1, rather than random
- change the pattern matching [you could probably even use the existing one; but there is no need to convert to b58]
BitcoinPuzzle
roughly broken down as:
EC_POINT_add: 5300 ticks
EC_POINTs_make_affine 1193800 ticks [called once per 256 keys]
i.e. 4663 ticks per key
EC_POINT_point2oct: 2760 ticks
sha256: 1216 ticks
ripemd160: 1484 ticks
[186.63 Kkey/s]
VanityGen
[167 Kkey/s]
The saving on the EC point conversion can be seen, as can the better quality hash code used by OpenSSL compared to the code I was using.
If I run this on all cores; I get ~1.4M keys/second.
I've not tried the GPU version; but would expect comparable results to VanityGen generation.
The VanityGen thread suggests that a reasonable GPU can do ~30Mkeys/second.
Now, consider that the average time to find a key is half the search space.
So, for a 40 bit key, the average search = 1/4 * 2^40 = 2.74877906944 x10^11
i.e.
bits search_len rate time
25 8388608 1.4M 6 secs
40 274877906944 1.4M 196341 secs = 3.15 days
40 274877906944 30M 9163 secs = 2.5 hours
50 281474976710656 30M 9382500 secs = 109 days
51 562949953421312 30M 18765000 secs = 217 days
... I'm not going to be running for 200 days for $20
Burt, after generating the public key, it requires converting from elliptic curve coordinates to bytes.
The operation is 'simply' X' = X/Z^2 - but that means a divide in the finite field.
I *think* that make_affine step forces Z to be a constant for all the block that are affined, so this division only has to be done once per block.
EC_POINT_point2oct actually calculates both X'= X/Z^2 and Y'=Y/Z^3; but as we are using a compressed key, we don't need the Y' term.
I've not checked to see if this code gets optimised away properly; if not, it might save 10%.