Relax - I know the math. (I think)
On the other hand, when it comes to this topic, many do not seem to know the math(*). Sure - big numbers are involved. Gargantuan numbers. Not exactly Googol- or Googolplex-like numbers, but still big enough. When it comes to big numbers, people seem to exhibit an intriguing weakness in dealing with them.
(*)
- The first problem for some people is deciding whether a 2^256 (privkey) or a 2^160 (hash160) search space is involved when brute-forcing ... over and over again.
- Because it seems such an unfeasible task, people dismiss it - often with the standard dismissal phrase "You should use the hardware for mining instead", even though that phrase is obsolete in some contexts today. Right.
- Many seem to have problems distinguishing between attacking one specific public address and attacking ANY public address. Sure - the odds differ by about 6 orders of magnitude, but given the gargantuan numbers above, that is negligible - isn't it?
- Some confuse a collision attack with cracking poorly chosen privkeys.
- ...
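The "one specific address vs. ANY funded address" point above is plain arithmetic. A small Python illustration, assuming the 2^160 hash160 space and the ~2.5 million funded addresses this study keeps in memory (variable names are my own):

```python
import math

# Back-of-the-envelope odds. Assumed figures: the 2^160 hash160 space
# and the 2.5 million funded addresses tracked in this study.
SPACE = 2**160            # size of the RIPEMD160(SHA256(pubkey)) space
FUNDED = 2_500_000        # funded addresses kept in memory here

p_specific = 1 / SPACE    # one random key hits ONE given address
p_any = FUNDED / SPACE    # one random key hits ANY funded address

# the "about 6 orders of magnitude" advantage: log10(2.5e6) ~ 6.4
advantage = math.log10(p_any / p_specific)

# expected keys per hit stays astronomical either way: ~5.8e41
keys_per_hit = 1 / p_any
```

So the "any address" attacker gains a factor of a few million - and still faces ~10^41 keys per expected hit.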
Now - because my SHA256-ASIC miners are not capable of driving a collision attack and because my servers can use some load and because I seem to have way too much time on my hands and because I like to boldly go where no man has gone before...
I started a feasibility study for a Collision Attack

It is a feasibility study because, at the moment, I do not think I will ever find a privkey for some (foreign) address that has funds on it - and even if I do, I think the benefit of documenting that may be higher than just slurping the funds. We'll see...
So what have I done so far?

At the moment, I have written a small Perl script that parses the output of the blockparser to get the N topmost addresses with funds on them (I'm cutting it off after 2.5 mio) and stores them in-memory in a hash.
I also modified the directory.io no latency code to give me a bunch of addresses from a certain offset, sped up the code and lowered the memory footprint for the privkeys. Again, Perl to the rescue: I'm using a little parallelization to launch the Go programs.
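The pipeline described above boils down to one big set-membership test: load the funded addresses into a hash, then probe every generated address against it. A minimal Python sketch (file format and function names are made up for illustration; the actual tooling here is Perl plus Go):

```python
# Minimal sketch of the core check loop. The 'address balance' line
# format and the function names are hypothetical illustrations.

def load_funded(path):
    """Parse 'address balance' lines into an in-memory set (the 'hash')."""
    funded = set()
    with open(path) as fh:
        for line in fh:
            addr, _balance = line.split()
            funded.add(addr)
    return funded

def check_block(addresses, funded):
    """Return any generated addresses that collide with funded ones."""
    return [a for a in addresses if a in funded]

# usage sketch:
# funded = load_funded("topmost_funded.txt")
# hits = check_block(generated_addresses, funded)
```

With O(1) lookups, checking a block of 2M generated addresses against 2.5M funded ones costs 2M probes, not 5·10^12 pairwise comparisons.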
All put together, I am able - at the moment - to check 2 million pubkeys per second against the 2.5 million addresses with funds on them. That's a neat 5000000000000 comparisons per second. Given the amount of time I spent hacking this (roughly a day), it's ok. The number looks impressive, but as is, I have a 50% chance of hitting an address with funds on it only after a mere 1691552820984841340513524111940 days (or 4634391290369428330174038662 years).
Of course, I could hit one tomorrow, but the odds are just low ATM.
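The day/year figures above can be reproduced in a few lines - a sketch assuming the 2^160 address space and treating "50% chance" as covering half the space, which matches the numbers quoted:

```python
# Reproducing the 50%-chance estimate. Assumed figures match the text:
# 2^160 address space, 2.5M funded targets, 2M pubkeys checked per second.
SPACE = 2**160
TARGETS = 2_500_000
RATE = 2_000_000          # pubkeys checked per second

cmp_per_sec = RATE * TARGETS                      # 5e12 comparisons/s
days = 0.5 * SPACE / TARGETS / RATE / 86400       # ~1.69e30 days
years = days / 365                                # ~4.63e27 years
print(f"{cmp_per_sec:.1e} cmp/s, {days:.3e} days, {years:.3e} years")
```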
So why am I doing this anyway?

Besides the "boldly go where no man..." issue, I'm actually looking at how far I can push this. I want to see for myself how far an individual with some skills and access to decent hardware can push this issue. Consider it my rainy Google Summer of Code. Obviously, if I could speed the whole thing up by a factor of 4634391290369428330174038662, I could hit an address once a year or so. I intend to edit this post - or maybe even participate in some constructive discussion here - and update any progress there may be.
You can also think of it as a canary project. You now know someone is definitely doing it. Someone who has decided to talk about it. So are your funds safe? Theoretically, yes. Practically? Probably yes. If not, I will tell.
What am I going to do next?

Investigating speedup, of course. The pubkey generation is the most time-consuming part at the moment. oclvanitygen claims a nice 16 MKeys/s even on my "tiny" notebook, but that's with a prefix comparison. Trying to feed it 2.5 mio full-length prefixes fails. ;-) However, I will spend some time to see how/if I could make use of GPU power for this specific use case.
I'll have a look at Amazon EC2 services once I feel the per-server efficiency is approaching the optimum (which it clearly isn't ATM). My rough estimate is that I could speed up this task by a factor of 100k, but beyond that I'm out of ideas.
Still - if I manage to check 200 bn keys per second against all addresses with funds on them for a month, I will consider this feasibility study a success. The result will probably be "not feasible", but the study can then be attached for reference to these feasibility discussions. :-)
Will you provide the code?

No, at the moment I have no plans to do so. I may provide it when I consider the study done and if I am convinced no harm could be caused with it.
What I can do at the moment is provide you - for a small fee (0.01 BTC) - with 10 blocks of 2M pubkeys (1M uncompressed and the corresponding 1M compressed) computed from consecutive privkeys. Just say the offset. Each block is 75MB uncompressed and 52MB bzip2-compressed. You can also get the list of 9.2 mio public addresses with funds on them for the same fee. I'd be interested to hear about your use case.
What does it look like on the console?

I'd like to show it, but the code tags here seem to filter and not show things verbatim. Ok, so a picture:
The "FOUND:" message is what one would expect to see if a collision occurred. Right now it's just an address I injected into the "addresses with funds" hash to test if it actually works. It has the privkey 666000066 - see also here. ;-) Every dot in there is the check of a 2M block of pubkeys.
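For reference, here is a minimal, self-contained sketch of the privkey-to-address derivation that directory.io-style pages perform (pure Python, no external libraries; slow and for illustration only - the actual generators in this study are Go programs):

```python
import hashlib

# secp256k1 curve parameters (y^2 = x^3 + 7 over F_P)
P  = 2**256 - 2**32 - 977
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def ec_add(p1, p2):
    """Point addition on secp256k1; None is the point at infinity."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, point=(Gx, Gy)):
    """Double-and-add scalar multiplication: k * G."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload):
    """Base58Check: payload + 4-byte double-SHA256 checksum."""
    chk = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + chk
    n = int.from_bytes(data, "big")
    s = ""
    while n:
        n, r = divmod(n, 58)
        s = B58[r] + s
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes -> '1'
    return "1" * pad + s

def privkey_to_address(k, compressed=False):
    """privkey -> pubkey -> hash160 -> Base58Check P2PKH address."""
    x, y = ec_mul(k)
    if compressed:
        pub = bytes([2 + (y & 1)]) + x.to_bytes(32, "big")
    else:
        pub = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
    h160 = hashlib.new("ripemd160", hashlib.sha256(pub).digest()).digest()
    return base58check(b"\x00" + h160)

# usage (needs an OpenSSL build with RIPEMD-160 enabled):
# print(privkey_to_address(666000066))  # the injected test key above
# privkey 1 gives 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm (directory.io page 1)
```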
Rico
update 16-07-17:

After some tweaks and using 12 CPU cores, I can check 6M keys per second against the 2.5M addresses with funds on them, but I'm running into an I/O bottleneck, so I have distributed the load across several I/O subsystems on the machine. That's better, but doesn't quite cut it (I still see disk waits). I could use a RAMdisk, but then a check batch would be limited to 2bn keys and cleanup would be required. Pipes are too slow, and once-generated pubkeys would be lost. Right now I'm storing them for later re-runs/analysis.
update 16-07-19:

Never trust anyone. :-) It seems the znort-blockparser is garbling up some data (it's presenting me with public addresses starting with "4" - see here). Also, it's quite difficult to test whether your checks are really working, because 100% of the time you have no matches ... so you may not immediately realize that a change to the program introduced a bug that would prevent any match. So now I have a test suite!
update 16-07-20:

It seems the "4"-prefix addresses are a feature, not a bug. I'm just ignoring these addresses right now, but I am looking at ALL addresses with a balance (4368371 with my data). Revisited the first 10 bn addresses with that extended cross-checking - still no match. :-) So while I lowered the average time until I find an address with funds on it to 884080757329110464399284207 years (I literally spared 3750310533040317865774754455 years) - it's still a tadpole.
update 16-07-21:

Tried to use the gccgo compiler for the Go key-generation program, but failed. Apparently the Go btc library has some optimized routines written in assembler. Unfortunately, they are in Plan 9 syntax, whereas gccgo expects GNU syntax. :-/ Had a look at some rainbow table techniques and also made some calculations about various probabilities of a birthday attack. Also thought about offsets to start the attack from - 0 probably isn't the best choice. Meanwhile, 12 CPUs crunch the data at 26210226000000 checks per second (previously 19279164000000).
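The birthday-attack probabilities mentioned above can be sketched with the standard approximation (function and variable names are my own; this assumes hash160 values behave like uniform random draws from the 2^160 space):

```python
import math

# Birthday bound: probability that at least two of n generated keys
# collide with EACH OTHER in a space of the given size.
def birthday_collision_prob(n, space):
    """P(collision) ~ 1 - exp(-n^2 / (2 * space))."""
    return -math.expm1(-(n * n) / (2 * space))

# 30bn keys in the 2^160 space: self-collisions are hopeless too
p_self = birthday_collision_prob(30_000_000_000, 2**160)

# keys needed for a 50% self-collision chance: sqrt(2 ln 2 * space)
n_half = math.sqrt(2 * math.log(2) * 2**160)      # ~1.42e24 keys
```

In other words, even the birthday effect only helps once you can generate on the order of 10^24 keys.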
update 16-07-24:

I'm now generating the blocks on the fly, as the 2 terabytes I allocated to storing the computed keys are full.
I also had to fix the handling of big integer numbers: starting from 0, it's a long way until you run into some (64bit?) overflow, but when I gave it an offset like 84295974720949272949333500620693206304552891965859367805505284669110495, the poor thing folded. That works now too, and I'm trying some more "promising" offsets. In the meantime, I had/have a look at the Perl bitcoin library and am shaping it up a little, because I have some neat ideas for interweaving that library, my trading bot (well - it doesn't run on CEX.io anymore) and some nice scripts I created during this study into a bigger blob.
BTW - I'm now checking against all addresses that have at least 1 Satoshi balance on them (9207506).
update 16-07-27:

The checking of the 30bn keys for duplicates is not feasible at the moment. A naive approach would require 0.5 * (30bn)^2 comparisons; a cached approach (with, say, k keys in memory and O(1) lookup) is still O(N^2 / k). I tried a Bloom filter for 3bn keys and a 0.00001 probability of false positives, which took 67GB of memory, but after feeding it roughly 1.5bn keys, the false positives kept pouring in... I will set up another machine with 144GB and try there, but right now I'm busy with other things, so I'll simply let the regular process crunch on. Updated the status report below.
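For comparison with that 67GB figure, the textbook optimal Bloom filter size for these parameters works out to roughly 9GB of raw bit array (a sketch of the standard sizing formulas; a concrete implementation can of course need considerably more memory than the theoretical optimum):

```python
import math

# Standard Bloom filter sizing:
#   m = -n * ln(p) / (ln 2)^2   bits
#   k = (m / n) * ln 2          hash functions
def bloom_size(n, p):
    """Return (bits, hash_count) for n items at false-positive rate p."""
    m = -n * math.log(p) / (math.log(2) ** 2)
    k = (m / n) * math.log(2)
    return math.ceil(m), round(k)

bits, k = bloom_size(3_000_000_000, 1e-5)
print(f"{bits / 8 / 1e9:.1f} GB bit array, {k} hash functions")
```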
update 16-07-28:

I've examined something like 36 bits of the search space. ;-) 40 bits will be about 16 times as much effort, and then there are still 120 bits left. First doubts appear as to whether I will ever be able to complete this task. Maybe I only need faster machines?
update 16-07-31:

Had a look at the secp256k1 library to speed things up. Unfortunately it's - still - quite poorly documented, so for a non-C programmer like myself it's hard to get something working. However, I did manage to hack the bitcoin-tool to do my bidding. Unfortunately, the resulting C code is slower than the Go code I'm using right now, so no speedup at the moment...
update 16-08-03:

*DING* - as of today, over half a billion pages equivalent to those on directory.io have been searched. That's an equivalent of over 64 billion addresses (the total sum is somewhere around 70 billion checked). I'm thinking about setting up a distributed infrastructure for this, if for nothing else than to coordinate several of my machines working on this. That will require implementing some rudimentary "work distributor", a client-server protocol and of course shaping up the client side. One more thing: I am investigating the use of the Fast Intel SHA256 implementation to possibly speed things up a little more.
update 16-08-04:

Suggested a collision finders pool here. Distributing the work to some computers of mine, so I worked a little on the client. About 22 CPU cores in total, so you will see the number of checked pages in the status below rising somewhat faster.
update 16-08-07:

Some more hacking on the client (which is called LBC now), and I also started to hack a "pooling server". Changed the block size from 1000000 to 1048576, so a block is now exactly a 20-bit search space.
update 16-08-11:

The pooling server is in beta test mode. Stats will be presented on the stats page there once it's up. Before long, we will have the equivalent of 1 billion pages of directory.io processed.
update 16-08-12:

End of status updates here. For current statistics, please go here; for any news, go here.