Relax - I know the math. (I think)
On the other hand, when it comes to this topic, many do not seem to know the math(*). Sure - big numbers are involved. Gargantuan numbers. Not exactly Googol- or Googolplex-like numbers, but still big enough. When it comes to big numbers, people seem to exhibit an intriguing weakness in dealing with them.
(*)
- The first problem for some people is deciding whether a 2^256 (privkey) or a 2^160 (hash160) search space is involved when brute-forcing ... over and over again.
- Because it seems such an unfeasible task, people dismiss it - often with the standard dismissal phrase "You should use the hardware for mining instead", even though that phrase is obsolete in some contexts today. Right.
- Many seem to have problems distinguishing between attacking one specific public address and attacking ANY public address. Sure - the odds differ by about 6 orders of magnitude, but given the gargantuan numbers above, that is negligible - isn't it?
- Some confuse a collision attack with cracking poorly chosen privkeys.
- ...
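The "one specific address vs. ANY funded address" point above is plain arithmetic. A small Python illustration, assuming the 2^160 hash160 space and the ~2.5 million funded addresses this study keeps in memory (variable names are my own):

```python
import math

# Back-of-the-envelope odds. Assumed figures: the 2^160 hash160 space
# and the 2.5 million funded addresses tracked in this study.
SPACE = 2**160            # size of the RIPEMD160(SHA256(pubkey)) space
FUNDED = 2_500_000        # funded addresses kept in memory here

p_specific = 1 / SPACE    # one random key hits ONE given address
p_any = FUNDED / SPACE    # one random key hits ANY funded address

# the "about 6 orders of magnitude" advantage: log10(2.5e6) ~ 6.4
advantage = math.log10(p_any / p_specific)

# expected keys per hit stays astronomical either way: ~5.8e41
keys_per_hit = 1 / p_any
```

So the "any address" attacker gains a factor of a few million - and still faces ~10^41 keys per expected hit.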
Now - because my SHA256-ASIC miners are not capable of driving a collision attack and because my servers can use some load and because I seem to have way too much time on my hands and because I like to boldly go where no man has gone before...
I started a feasibility study for a Collision Attack

It is a feasibility study because, at the moment, I do not think I will ever find a privkey for some (foreign) address that has funds on it - and even if I do, I think the benefit of documenting that may be higher than just slurping the funds. We'll see...
So what have I done so far?

At the moment, I have written a small Perl script that parses the output of the blockparser to get the N topmost addresses with funds on them (I'm cutting it off after 2.5 mio) and stores them in-memory in a hash.
I also modified the directory.io no latency code to give me a bunch of addresses from a certain offset, sped up the code and lowered the memory footprint for the privkeys. Again, Perl to the rescue: I'm using a little parallelization to launch the Go programs.
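The pipeline described above boils down to one big set-membership test: load the funded addresses into a hash, then probe every generated address against it. A minimal Python sketch (file format and function names are made up for illustration; the actual tooling here is Perl plus Go):

```python
# Minimal sketch of the core check loop. The 'address balance' line
# format and the function names are hypothetical illustrations.

def load_funded(path):
    """Parse 'address balance' lines into an in-memory set (the 'hash')."""
    funded = set()
    with open(path) as fh:
        for line in fh:
            addr, _balance = line.split()
            funded.add(addr)
    return funded

def check_block(addresses, funded):
    """Return any generated addresses that collide with funded ones."""
    return [a for a in addresses if a in funded]

# usage sketch:
# funded = load_funded("topmost_funded.txt")
# hits = check_block(generated_addresses, funded)
```

With O(1) lookups, checking a block of 2M generated addresses against 2.5M funded ones costs 2M probes, not 5·10^12 pairwise comparisons.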
All put together, I am able - at the moment - to check 2 million pubkeys per second against the 2.5 million addresses with funds on them. That's a neat 5000000000000 comparisons per second. Given the amount of time I spent hacking this (roughly a day), it's ok. The number looks impressive, but as is, I have a 50% chance of hitting an address with funds on it only after a mere 1691552820984841340513524111940 days (or 4634391290369428330174038662 years).
Of course, I could hit one tomorrow, but the odds are just low ATM.
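The day/year figures above can be reproduced in a few lines - a sketch assuming the 2^160 address space and treating "50% chance" as covering half the space, which matches the numbers quoted:

```python
# Reproducing the 50%-chance estimate. Assumed figures match the text:
# 2^160 address space, 2.5M funded targets, 2M pubkeys checked per second.
SPACE = 2**160
TARGETS = 2_500_000
RATE = 2_000_000          # pubkeys checked per second

cmp_per_sec = RATE * TARGETS                      # 5e12 comparisons/s
days = 0.5 * SPACE / TARGETS / RATE / 86400       # ~1.69e30 days
years = days / 365                                # ~4.63e27 years
print(f"{cmp_per_sec:.1e} cmp/s, {days:.3e} days, {years:.3e} years")
```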
So why am I doing this anyway?

Besides the "boldly go where no man..." issue, I'm actually looking at how far I can push this. I want to see for myself how far an individual with some skills and access to decent hardware can push this issue. Consider it my rainy Google Summer of Code. Obviously, if I could speed the whole thing up by a factor of 4634391290369428330174038662, I could hit an address once a year or so. I intend to edit this post - or maybe even participate in some constructive discussion here - and update any progress there may be.
You can also think of it as a canary project. You now know someone is definitely doing it. Someone who has decided to talk about it. So are your funds safe? Theoretically, yes. Practically? Probably yes. If not, I will tell.
What am I going to do next?

Investigating speedup, of course. The pubkey generation is the most time-consuming part at the moment. oclvanitygen claims a nice 16 MKeys/s even on my "tiny" notebook, but that's with a prefix comparison. Trying to feed it 2.5 mio full-length prefixes fails. ;-) However, I will spend some time to see how/if I could make use of GPU power for this specific use case.
I'll have a look at Amazon EC2 services once I feel the per-server efficiency is approaching the optimum (which it clearly isn't ATM). My rough estimate is that I could speed up this task by a factor of 100k, but beyond that I'm out of ideas.
Still - if I manage to check 200 bn keys per second against all addresses with funds on them for a month, I will consider this feasibility study a success. The result will probably be "not feasible", but the study can then be attached for reference to these feasibility discussions. :-)
Will you provide the code?

No, at the moment I have no plans to do so. I may provide it when I consider the study done and if I am convinced no harm could be caused with it.
What I can do at the moment is provide you - for a small fee (0.01 BTC) - with 10 blocks of 2M pubkeys (1M uncompressed and the corresponding 1M compressed) computed from consecutive privkeys. Just say the offset. Each block is 75MB uncompressed and 52MB bzip2-compressed. You can also get the list of 9.2 mio public addresses with funds on them for the same fee. I'd be interested to hear about your use case.
What does it look like on the console?

I'd like to show it, but the code tags here seem to filter and not show things verbatim. Ok, so a picture:
The "FOUND:" message is what one would expect to see if a collision occurred. Right now it's just an address I injected into the "addresses with funds" hash to test if it actually works. It has the privkey 666000066 - see also here. ;-) Every dot in there is the check of a 2M block of pubkeys.
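For reference, here is a minimal, self-contained sketch of the privkey-to-address derivation that directory.io-style pages perform (pure Python, no external libraries; slow and for illustration only - the actual generators in this study are Go programs):

```python
import hashlib

# secp256k1 curve parameters (y^2 = x^3 + 7 over F_P)
P  = 2**256 - 2**32 - 977
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def ec_add(p1, p2):
    """Point addition on secp256k1; None is the point at infinity."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, point=(Gx, Gy)):
    """Double-and-add scalar multiplication: k * G."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload):
    """Base58Check: payload + 4-byte double-SHA256 checksum."""
    chk = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + chk
    n = int.from_bytes(data, "big")
    s = ""
    while n:
        n, r = divmod(n, 58)
        s = B58[r] + s
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes -> '1'
    return "1" * pad + s

def privkey_to_address(k, compressed=False):
    """privkey -> pubkey -> hash160 -> Base58Check P2PKH address."""
    x, y = ec_mul(k)
    if compressed:
        pub = bytes([2 + (y & 1)]) + x.to_bytes(32, "big")
    else:
        pub = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
    h160 = hashlib.new("ripemd160", hashlib.sha256(pub).digest()).digest()
    return base58check(b"\x00" + h160)

# usage (needs an OpenSSL build with RIPEMD-160 enabled):
# print(privkey_to_address(666000066))  # the injected test key above
# privkey 1 gives 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm (directory.io page 1)
```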
Rico
update 16-07-17:

After some tweaks and using 12 CPU cores, I can check 6M keys per second against the 2.5M addresses with funds on them, but I'm running into an I/O bottleneck, so I have distributed the load across several I/O subsystems on the machine. That's better, but doesn't quite cut it (I still see disk waits). I could use a RAMdisk, but then a check batch would be limited to 2bn keys and cleanup would be required. Pipes are too slow, and once-generated pubkeys would be lost. Right now I'm storing them for later re-runs/analysis.
update 16-07-19:

Never trust anyone. :-) It seems the znort-blockparser is garbling up some data (it's presenting me with public addresses starting with "4" - see here). Also, it's quite difficult to test whether your checks are really working, because 100% of the time you have no matches ... so you may not immediately realize that a change to the program introduced a bug that would prevent any match. So now I have a test suite!
update 16-07-20:

It seems the "4"-prefix addresses are a feature, not a bug. I'm just ignoring these addresses right now, but I am looking at ALL addresses with a balance (4368371 with my data). Revisited the first 10 bn addresses with that extended cross-checking - still no match. :-) So while I lowered the average time until I find an address with funds on it to 884080757329110464399284207 years (I literally spared 3750310533040317865774754455 years) - it's still a tadpole.
update 16-07-21:

Tried to use the gccgo compiler for the Go key-generation program, but failed. Apparently the Go btc library has some optimized routines written in assembler. Unfortunately, they are in Plan 9 syntax, whereas gccgo expects GNU syntax. :-/ Had a look at some rainbow table techniques and also made some calculations about various probabilities of a birthday attack. Also thought about offsets to start the attack from - 0 probably isn't the best choice. Meanwhile, 12 CPUs crunch the data at 26210226000000 checks per second (previously 19279164000000).
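The birthday-attack probabilities mentioned above can be sketched with the standard approximation (function and variable names are my own; this assumes hash160 values behave like uniform random draws from the 2^160 space):

```python
import math

# Birthday bound: probability that at least two of n generated keys
# collide with EACH OTHER in a space of the given size.
def birthday_collision_prob(n, space):
    """P(collision) ~ 1 - exp(-n^2 / (2 * space))."""
    return -math.expm1(-(n * n) / (2 * space))

# 30bn keys in the 2^160 space: self-collisions are hopeless too
p_self = birthday_collision_prob(30_000_000_000, 2**160)

# keys needed for a 50% self-collision chance: sqrt(2 ln 2 * space)
n_half = math.sqrt(2 * math.log(2) * 2**160)      # ~1.42e24 keys
```

In other words, even the birthday effect only helps once you can generate on the order of 10^24 keys.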
update 16-07-24:

I'm now generating the blocks on the fly, as the 2 terabytes I allocated to storing the computed keys are full.
I also had to fix the handling of big integer numbers: starting from 0, it's a long way until you run into some (64bit?) overflow, but when I gave it an offset like 84295974720949272949333500620693206304552891965859367805505284669110495, the poor thing folded. That works now too, and I'm trying some more "promising" offsets. In the meantime, I had/have a look at the Perl bitcoin library and am shaping it up a little, because I have some neat ideas for interweaving that library, my trading bot (well - it doesn't run on CEX.io anymore) and some nice scripts I created during this study into a bigger blob.
BTW - I'm now checking against all addresses that have at least 1 Satoshi balance on them (9207506).
update 16-07-27:

The checking of the 30bn keys for duplicates is not feasible at the moment. A naive approach would require 0.5 * (30bn)^2 comparisons; a cached approach (with, say, k keys in memory and O(1) lookup) is still O(N^2 / k). I tried a Bloom filter for 3bn keys and a 0.00001 probability of false positives, which took 67GB of memory, but after feeding it roughly 1.5bn keys, the false positives kept pouring in... I will set up another machine with 144GB and try there, but right now I'm busy with other things, so I'll simply let the regular process crunch on. Updated the status report below.
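For comparison with that 67GB figure, the textbook optimal Bloom filter size for these parameters works out to roughly 9GB of raw bit array (a sketch of the standard sizing formulas; a concrete implementation can of course need considerably more memory than the theoretical optimum):

```python
import math

# Standard Bloom filter sizing:
#   m = -n * ln(p) / (ln 2)^2   bits
#   k = (m / n) * ln 2          hash functions
def bloom_size(n, p):
    """Return (bits, hash_count) for n items at false-positive rate p."""
    m = -n * math.log(p) / (math.log(2) ** 2)
    k = (m / n) * math.log(2)
    return math.ceil(m), round(k)

bits, k = bloom_size(3_000_000_000, 1e-5)
print(f"{bits / 8 / 1e9:.1f} GB bit array, {k} hash functions")
```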
update 16-07-28:

I've examined something like 36 bits of the search space. ;-) 40 bits will be about 16 times as much effort, and then there are still 120 bits left. First doubts appear as to whether I will ever be able to complete this task. Maybe I only need faster machines?
update 16-07-31:

Had a look at the secp256k1 library to speed things up. Unfortunately it's - still - quite poorly documented, so for a non-C programmer like myself it's hard to get something working. However, I did manage to hack the bitcoin-tool to do my bidding. Unfortunately, the resulting C code is slower than the Go code I'm using right now, so no speedup at the moment...
update 16-08-03:

*DING* - as of today, over half a billion pages equivalent to those on directory.io have been searched. That's an equivalent of over 64 billion addresses (the total sum is somewhere around 70 billion checked). I'm thinking about setting up a distributed infrastructure for this, if for nothing else than to coordinate several of my machines working on this. That will require implementing some rudimentary "work distributor", a client-server protocol and of course shaping up the client side. One more thing: I am investigating the use of the Fast Intel SHA256 implementation to possibly speed things up a little more.
update 16-08-04:

Suggested a collision finders pool here. Distributing the work to some computers of mine, so I worked a little on the client. About 22 CPU cores in total, so you will see the number of checked pages in the status below rising somewhat faster.
update 16-08-07:

Some more hacking on the client (which is called LBC now), and I also started to hack a "pooling server". Changed the block size from 1000000 to 1048576, so a block is now exactly a 20-bit search space.
update 16-08-11:

The pooling server is in beta test mode. Stats will be presented on the stats page there once it's up. Before long, we will have the equivalent of 1 billion pages of directory.io processed.
update 16-08-12:

End of status updates here. For current statistics, please go here; for any news, go here.