Pages:
Author

Topic: What are the chances of an address collision? and what happens when it does? - page 3. (Read 22435 times)

legendary
Activity: 1092
Merit: 1016
760930
Quote
    * getting hit by lightning while taking a crap for how many years in a row is equally probable as finding a collision: log(1E-65) / log(6.8E-12) = 5.8

I don't get why you put logs here

perhaps because we're talking about taking a crap?
hero member
Activity: 728
Merit: 540
Quote
    * getting hit by lightning while taking a crap for how many years in a row is equally probable as finding a collision: log(1E-65) / log(6.8E-12) = 5.8

I don't get why you put logs here
donator
Activity: 2772
Merit: 1019
on a side not: the probability of getting struck by lightning during your 80 year long life is 1/3500???

1.0 / (1.0 - pow((1.0-1.0/280000),80)) = 3500

on the bright side: in case that happens to me the probability of me taking a crap at that time is only 1/525600.
donator
Activity: 2772
Merit: 1019
The number of addresses generated isn't important it is the number of funded addresses.  A collision with a spent change address that will never be used again isn't very valuable.

So if we assume 1 trillion active addresses (10 billion users with an average of 100 active funded addresses at any point in time). The odds of finding a collision are ~ 1 in 1*10^65.

If the odds of getting hit by lightning are 1 in 280,000 then the odds of getting stuck by lightning 12 times per year is roughly the same as creating a collision with with one of the 1 trillion active addresses.  I don't like lightning comparison because lightning is localized (not randomly distributed).  If we use the lottery example the odds of buying lottery tickets for 8 consecutive games and winning the jackpot on all 8 is roughly the same as finding a single collision.  

I would point out that the single random collision is unlikely to be very profitable.  If there are 1 trillion addresses and all coins are mined the average address will hold ~0.000021 BTC (21 uBTC - microBTC).  Another way to look at it is if we assume in this 10 billion scenario Bitcoin has replaced all global money supplies then the collective value of the network would be on the order of ~$60T (2012 dollars).  At 1 trillion addresses the value of finding that collision (less likely than winning lottery 8 times in a row) would be ~$60.  BTW fun fact that would make the exchange rate $2.857 per uBTC!


thanks, Death&Taxes for doing the math.

What would you suggest to answer to a newbies question: "But wouldn't 2 people accidentally find the same address at some point?"

I still 'd like to be able to answer something like: "It's possible, but it's about as likely as you getting struck by lightning while taking a crap - *pause* - every year for 5 years in a row!". Just don't know.

Let me try to do the math:

  • probability of getting struck by lightning in any given year: 1/280000.
  • probability of taking a shit at any given point in time: 1/(60*24*365) = 1/525600 (assuming you take a crap every day and the actual process takes 1 minute)
  • probability of getting struck by lightning while taking a crap in any given year: 1/(280000*525600) = 1/1.47E11 = 6.8E-12
  • probability of finding a collision: 1E-65
  • getting hit by lightning while taking a crap for how many years in a row is equally probable as finding a collision: log(1E-65) / log(6.8E-12) = 5.8

is my math roughly correct?
kjj
legendary
Activity: 1302
Merit: 1026
An address collision would be a SHA256 collision

And if you find a SHA256 collision then bitcoin is the last of our problem



Technically it would be a RIPEMD160(SHA256(SHA256())) collision. Not nitpicking, it's an important distinction given that the last step in that process yields 2^160 possible addresses instead of 2^256. 96 bits of keyspace ain't no joke son.

If I understand this right, does this basically mean that each address actually has tons of private keys (only one of which is known)?

Well, probably.  The mapping isn't known to be evenly distributed.  But on average, there should be something like 296 keys per address.

P.S.  It is RIPEMD-160(SHA-256(pubkey)) .  Only two hashes, not three.
hero member
Activity: 728
Merit: 540
An address collision would be a SHA256 collision

And if you find a SHA256 collision then bitcoin is the last of our problem



Technically it would be a RIPEMD160(SHA256(SHA256())) collision. Not nitpicking, it's an important distinction given that the last step in that process yields 2^160 possible addresses instead of 2^256. 96 bits of keyspace ain't no joke son.

If I understand this right, does this basically mean that each address actually has tons of private keys (only one of which is known)?

That reminds me of this joke :

An 80-years old woman is attending a lecture on astronomy, where the astronomer states that our sun will explode and transform itself into a red giant within 5 billion years.

At the end of the lecture, the old lady asks the astronomer :

- How long did you say our sun will last ?
- 5 billion years
- What a relief ! I thought you said 5 millions !

legendary
Activity: 1092
Merit: 1016
760930
An address collision would be a SHA256 collision

And if you find a SHA256 collision then bitcoin is the last of our problem



Technically it would be a RIPEMD160(SHA256(SHA256())) collision. Not nitpicking, it's an important distinction given that the last step in that process yields 2^160 possible addresses instead of 2^256. 96 bits of keyspace ain't no joke son.

If I understand this right, does this basically mean that each address actually has tons of private keys (only one of which is known)?
donator
Activity: 1218
Merit: 1079
Gerald Davis
The number of addresses generated isn't important it is the number of funded addresses.  A collision with a spent change address that will never be used again isn't very valuable.

So if we assume 1 trillion active addresses (10 billion users with an average of 100 active funded addresses at any point in time). The odds of finding a collision are ~ 1 in 1*10^65.

If the odds of getting hit by lightning are 1 in 280,000 then the odds of getting stuck by lightning 12 times per year is roughly the same as creating a collision with with one of the 1 trillion active addresses.  I don't like lightning comparison because lightning is localized (not randomly distributed).  If we use the lottery example the odds of buying lottery tickets for 8 consecutive games and winning the jackpot on all 8 is roughly the same as finding a single collision.  

I would point out that the single random collision is unlikely to be very profitable.  If there are 1 trillion addresses and all coins are mined the average address will hold ~0.000021 BTC (21 uBTC - microBTC).  Another way to look at it is if we assume in this 10 billion scenario Bitcoin has replaced all global money supplies then the collective value of the network would be on the order of ~$60T (2012 dollars).  At 1 trillion addresses the value of finding that collision (less likely than winning lottery 8 times in a row) would be ~$60.  BTW fun fact that would make the exchange rate $2.857 per uBTC!
donator
Activity: 2772
Merit: 1019
Comparatively speaking, your odds of being struck by lightning in a given calendar year are about 1 in 280,000. The odds of winning my local lottery are about 1 in 176,000,000. So finding a collision on your first try is roughly equivalent to being hit by lightning 16,540,000,000,000,000,000,000,000 times per second for an entire year or winning the lottery 830,000,000,000,000,000,000,000,000,000 times.

If you find a collision I would stay indoors and play the lottery.
The highlighted part is wrong. The odds of getting struck by lightning n times is not p*n, it's p^n. Ie. the probability of throwing two sixes in a row with a dice is not 1/12, it's 1/36.
[/quote]

right, so how many lightning strikes per year are we talking to have the same probability of seeing an accidental collision occur between any two generated addresses (assuming 10 billion bitcoin users that each generate 1000 addresses per year)?
legendary
Activity: 980
Merit: 1008
Chance are negligible. If collision occurs with a funded address, attacker you can transfer funds elsewhere.



Chances are still negligible when 1 billion people are using it?  also can't I just run some kind of bots, that randomly generate addresses to see if
they have funds in them?

Yes, chances remain negligible. You could run your bot, but it'd be a waste of electricity. Chances are you'd wait the lifetime of the universe before finding a collision.

But probability also says, I could have a success on my first run? isn't it?

Sure, you could absolutely find a success on your first run, but let's apply probability to your scenario.

Let's say there are a billion people using 10 addresses each for 10 billion total addresses.

This means that each address you generate has a (1/2^160)*10,000,000,000 possibility of holding a balance, giving your first attempt a 0.0000000000000000000000000000000000000684% chance of finding a collision on your first attempt.

You are correct in stating that with each try it will either happen or it won't, there is no in-between state, and you're correct in stating that it's possible. It's also bad news for the account holder that a collision would give you control of those funds.

Comparatively speaking, your odds of being struck by lightning in a given calendar year are about 1 in 280,000. The odds of winning my local lottery are about 1 in 176,000,000. So finding a collision on your first try is roughly equivalent to being hit by lightning 16,540,000,000,000,000,000,000,000 times per second for an entire year or winning the lottery 830,000,000,000,000,000,000,000,000,000 times.

If you find a collision I would stay indoors and play the lottery.
The highlighted part is wrong. The odds of getting struck by lightning n times is not p*n, it's p^n. Ie. the probability of throwing two sixes in a row with a dice is not 1/12, it's 1/36.

What I'd like to know is why some letters have higher probability than others? Is that because some letters are more likely on the elliptic curve vs not? If you plug sequences into vanitygen it will give different difficulties depending on letter, eg.

11... > 1A...
1Q... > 1D...
1R... > 1Q... and so on.

I made a table of the first three chars after 1 to help me in a project with fast lookup of difficulty calculation. There are broad classes of probabilities depending on what letters are involved. I guess there is some non 1-1 mapping from 2^160 possible addresses and base 58 representations.

It's not related to elliptic curves, as an address is created from a hash, which is basically just random data.
The reason is that the data that is encoded as base58 (and thus becomes an address) is ||<20 byte RIPEMD160 public key hash>||<4 byte checksum> where || denotes concatenation.

This version byte is 0 on the main network. So the binary data of an address will always start with 0000 0000. Since this data is converted into a large number, the probability that some of the leading zero bits will be included in the leftmost bits of the large number is greater than none of the leading zero bits being included (since there are eight of them). This results in the probability of the number starting with a small digit being greater.

This large number is basically just continually divided by 58, and the *remainder* of this division is then used to determine which character is added to the address string (based on it's index position in the string "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"). This is repeated in a loop until the large number has been reduced to zero. After this *the string is reversed*. This means that the last remainder will be the first character in the address (after the '1' which will always be there because a '1' is prepended to the address for every leading zero-byte, of which there is always one because the "version byte" is always zero on the main network). So since there is a greater probability that the first part of the number is a small digit, there is a greater chance that the first character of the address will be one of the first characters of the string from which characters are selected based on their index position in that string.
legendary
Activity: 924
Merit: 1004
Firstbits: 1pirata
I'm not saying collisions anywhere close to likely to occur, but...

Given your example of 1 billion users at 10 addresses each:

There are 2^160 or about 1,460,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible addresses
In your scenario, 1,000,000,000 people are using 10 addresses each for a total of 10,000,000,000 possible addresses
10,000,000,000 / 2^160 should yield the probability of a collision occurring
10,000,000,000 / 2^160 = 0.00000000000000000000000000000000000000684

So the chances of a collision occurring in your scenario are approximately 0.000000000000000000000000000000000000684%

This is the chance of finding a collision when generating 1 address.

The chance of a collision occuring "at all" is higher.

Let's assume (worst case) we use bitcoin for 1E3 years and there are 1E10 people populating earth at any point in time who will generate on average 1E3 addresses during their lifetimes.

What are the chances of a collision to occur under these assumptions?


Hopefully in 1E3 years if a human "stumbles" onto someone else's money they will not touch it, or give it back right away.

"Real is gonna change... just watch"
donator
Activity: 2772
Merit: 1019
I'm not saying collisions anywhere close to likely to occur, but...

Given your example of 1 billion users at 10 addresses each:

There are 2^160 or about 1,460,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible addresses
In your scenario, 1,000,000,000 people are using 10 addresses each for a total of 10,000,000,000 possible addresses
10,000,000,000 / 2^160 should yield the probability of a collision occurring
10,000,000,000 / 2^160 = 0.00000000000000000000000000000000000000684

So the chances of a collision occurring in your scenario are approximately 0.000000000000000000000000000000000000684%

This is the chance of finding a collision when generating 1 address.

The chance of a collision occuring "at all" is higher.

Let's assume (worst case) we use bitcoin for 1E3 years and there are 1E10 people populating earth at any point in time who will generate on average 1E3 addresses during their lifetimes.

What are the chances of a collision to occur under these assumptions?
legendary
Activity: 1092
Merit: 1016
760930
Chance are negligible. If collision occurs with a funded address, attacker you can transfer funds elsewhere.



Chances are still negligible when 1 billion people are using it?  also can't I just run some kind of bots, that randomly generate addresses to see if
they have funds in them?

Yes, chances remain negligible. You could run your bot, but it'd be a waste of electricity. Chances are you'd wait the lifetime of the universe before finding a collision.

But probability also says, I could have a success on my first run? isn't it?

Hey, I started this little pet project with you in mind Smiley
sr. member
Activity: 462
Merit: 250
You are, however, much better off generating collisions on various deterministic wallets like brainwallets etc... there are plenty of people out there who do not get it why some passwords/keys must be strong.

If lucky you will teach some punks a good lesson BTW.

repeating the obvious: the seed secret for deterministic wallets is the equivalent to password.
not only must it be secret, also difficult to guess.
donator
Activity: 980
Merit: 1000
I'm also wondering how long it would take to crack an address/key pair. For example, let's say I have a lot of hardware at my disposal, and can generate
1 Billion address/key pair per second. How long would it take to crack a specific address? can some one do a math on this one? I would guess not very long.

You guess wrong.

1 billion per second is nothing in an address space of size 2^160. It would take ~ 3.4 E+21 times the age of the Universe in average.

Let Wolfram Alpha do it for you http://www.wolframalpha.com/input/?i=(2%5E160)%20%2F%201e%2B9%20seconds%20to%20years
legendary
Activity: 1806
Merit: 1003
I'm also wondering how long it would take to crack an address/key pair. For example, let's say I have a lot of hardware at my disposal, and can generate
1 Billion address/key pair per second. How long would it take to crack a specific address? can some one do a math on this one? I would guess not very long.
donator
Activity: 1218
Merit: 1079
Gerald Davis
To me a "valid" address is one that has a private key.
The checksum verification of an address does not warrant that a private key exist for that address.

Hence my post above: the point represented by the public key MUST be on the ellptic currve.

In short,
1/you can random type an address with a valid checksum
2/ funds can be sent to that address because miners will get it on the blockchain
3/Nobody will ever spend those funds again because there is no private key for that address: the bitcoins on that address are lost for ever

The network has no method of determining if the private key is "known".  It doesn't matter if a private key was known but no longer is (lost),  is unknown but theoretically possible, or simply impossible (there is no possible 256 bit number which produces that address).

In any address if the private key is currently unknown the coins can't be spent.  If an address confirms to the protocol spec it is valid.

Due to the checksum the probability of entering an valid but incorrect address due to a typo is less than 1 in 4 billion.  Realistically it isn't going to happen.
legendary
Activity: 1221
Merit: 1025
e-ducat.fr
What I'd like to know is why some letters have higher probability than others? Is that because some letters are more likely on the elliptic curve vs not? If you plug sequences into vanitygen it will give different difficulties depending on letter, eg.

11... > 1A...
1Q... > 1D...
1R... > 1Q... and so on.

I made a table of the first three chars after 1 to help me in a project with fast lookup of difficulty calculation. There are broad classes of probabilities depending on what letters are involved. I guess there is some non 1-1 mapping from 2^160 possible addresses and base 58 representations.

In ECDSA, you start with a large integer as your private key then multiply ("multiplication" here must be understood as a group operator not as arithmetic multiplication) the base point G of your elliptic curve to get the corresponding public key.
Therefore your public key is a point (x,y) on the elliptic curve.

In bitcoin, your address is some hash of you public key.
As a result, if you start from a random string, the likelihood of a collision between said string and a "valid" address depends on how many integers n satisfy the condition "n*G is a point on the curve AND this point hashes to a matching string".
Therefore a brute force algorithm (like vanitygen) might display varying difficulty as a function of the chosen string.
full member
Activity: 161
Merit: 100
There is probability then when you throw ball it will not bounce but go through wall....
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
What I'd like to know is why some letters have higher probability than others? Is that because some letters are more likely on the elliptic curve vs not? If you plug sequences into vanitygen it will give different difficulties depending on letter, eg.

11... > 1A...
1Q... > 1D...
1R... > 1Q... and so on.

I made a table of the first three chars after 1 to help me in a project with fast lookup of difficulty calculation. There are broad classes of probabilities depending on what letters are involved. I guess there is some non 1-1 mapping from 2^160 possible addresses and base 58 representations.
Pages:
Jump to: