Pages:
Author

Topic: Statistical analysis of Bitcoin public key distribution (Read 4856 times)

legendary
Activity: 1260
Merit: 1019
Quote
No.  On average there approximately 296 key pairs mapped to each Bitcoin address.
Your statement has it backwards.

Thank you. I am very sorry. Thinking in English is quite hard for me. Thanks for correcting my mistake
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!

Quote
but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?
That is correct. There is 296 addresses for each private key. No problem.

No.  On average there approximately 296 key pairs mapped to each Bitcoin address.

Your statement has it backwards.

To spend the BTC at a Bitcoin address you need to have one of the approximately 296 possible key pairs that will properly map to that Bitcoin address.
sr. member
Activity: 467
Merit: 267
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!
Elliptic curve mathematics is much more complicated than this. Besides, there are technically multiple values because the EC forms a cyclic group but we choose the normalized value. anyway Why the sarcastic comment?
legendary
Activity: 1260
Merit: 1019
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!

Quote
but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?
That is correct. There is 296 addresses for each private key. No problem.

UPD: sorry, this sentence has mistake. see replies below.
legendary
Activity: 3472
Merit: 4801
Correct my misunderstanding then:

Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key, but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?

There is nothing wrong with your understanding.

When I say:
the only way you "could" do something

I mean it in the same way as when I say:
"The only way all the air molecules in the room could spontaneously collect in one corner instantly suffocating everyone in the room is if there is a flaw in our current understanding of physics."

Sure, there is an "incredibly unlikely, not impossible" probability of it happening, but in terms that the typical human being understands it can't happen.

Meanwhile when you say "not impossible", you are simply using a non-zero probability number (regardless of how extremely small that number is) as an indication of "possibility".
sr. member
Activity: 467
Merit: 267
1. The hash function maps billions of 256 bits input to the same value.  It's just that it's not computably easy to find a collision.
2. Bitcoin verifies that you can spend from an output by checking that the public key that you provide has the same hash the address and that you have the private key.
In other words, when the money was sent out, only the hash was given. To claim the money, you show the public key.
member
Activity: 75
Merit: 10

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)

Correct my misunderstanding then:

Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key, but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?
legendary
Activity: 1143
Merit: 1000
I dont know which algorithm it is using,

Clearly.  Please explain how random guessing and tossing out words that you think might be associated with cryptography would be helpful at all?

if its gpg,

It's not.  The public key is ECDSA using the secp256k1 curve. The bitcoin address is a RIPEMD-160 hash of a SHA-256 hash.

should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

The entropy is in the choosing of the private key.  The questions being asked, as I understand them, are if a statistical analysis has been attempted to determine if there are any flaws in any of ECDSA, SHA-256, or RIPEMD-160 that might lead to discovery that either public keys or bitcoin addresses are less random than believed.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)

Thanks alot, this clears alot of things to me Cheesy
sr. member
Activity: 467
Merit: 267
Some other queries:
Top 10 richest addresses:

Code:
select address, balance / 1e8 from balances order by balance desc limit 1,10;
1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw 144341.53393243
13Df4x5nQo7boLWHxQCbJzobN5gUNT65Hh 134033.21518705
1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF 79957.05831838
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx 69471.09995224
16ZbpCEyVVdqu8VycWR8thUL2Rd9JnjzHt 61693.55853673
1PB4xXUFyy4kSNqroCBVaQuCuw9VcN3be4 61544.25018469
1PnMfRF2enSZnR6JSexxBHuQnxG8Vo5FVK 61534.65308581
18f1yugoAJuXcHAbsuRVLQC9TezJ6iVRLp 61517.02586555
1DiHDQMPFu4p84rkLn6Majj2LCZZZRQUaa 61442.99572266
1AhTjUMztCihiTyA4K6E3QEpobjWLwKhkR 61387.07600804

Code:
select count(*) from balances;
3460367

The median is only 1 mbtc
Code:
select address, balance / 1e8 from balances order by balance asc limit 1730183,1;
1H6c1fDEzJBhvViHhtrW5L9YTuSfRNp44w 0.001

Total balance
Code:
select sum(balance) / 1e8 from balances;
13316310.28675084

The top 18 (0.0005 %) addresses own 10% of the total
Code:
select sum(balance) / 1e8 from (select balance from balances  order by balance desc limit 18) as T;
1345428.14052543

The top 3000 (0.09%) own more than half
Code:
select sum(balance) / 1e8 from (select balance from balances  order by balance desc limit 3000) as T;
6820871.23187743

sr. member
Activity: 467
Merit: 267
I found different results.

Code:
addr  count      total balance
1BiT 11784 990.72856
1Btc 1015 2415.5241782
1MUo 614 169.74258679
1tip 588 9.21681564
1CaR 557 479.44438884
1Pay 511 502.95050021
1MaR 506 242.68727996
1AGa 496 835.41160235
1AgZ 463 429.46682356
1AGB 452 470.97131551
1Agw 443 515.40205199
1CHA 428 1142.89394438
1AGY 427 1293.95860474
1BaN 409 650.5419544
1CaN 402 435.30669108
1CAs 391 2208.6754669
1AgX 390 459.49381862
1BrA 367 229.65373024
...

For a total of 3.4 mil addresses
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs

Edit: BTW, these stats only cover p2pkh-type outputs.
THANKS!  I wonder about the data, perhapse there is a couple of bugs?

1)  The sum of the first column (number of address ever in each bin) is 48,297,867 and that seem right.  However the sum of the second column, the number of active addresses, is only 345,558.  That seems low to me.  That would indicate that all of the BTC in existence are stored on only 345,558 unique addresses?

2) This is a nit but there are only 195,111 rows.  The last row for bin 1zzz is missing.

I did a quick sort on the number of currently active addresses in each bin and the top ten are:

Code:

 Index    Bin    Ever   Now
------   ----   -----   ---
 32543   1Ag6     852   106
 32542   1Ag5     842    82
  9852   13vs     760    71
 32541   1Ag4     879    56
 23402   17xV     666    51
 32593   1Agx     837    48
   684   11Co      45    35
 35159   1BTC    1551    34
     0   1111     374    33
  1548   11Th      50    32

There does seem to be a lot of them in the 1Ag range but this is easily explained by Casascius because the way he created his coins skews the addresses into certain bins.  For example he created 1786 addresses starting with 1Ag in this batch of coins alone:

http://casascius.uberbills.com/?type=1&status=active

I also did a quick sort on the number of addresses ever in each bin and the top 20 are:

Code:

 Index    Bin    Ever   Now
------   ----   -----   ---
116804   1bit   12043     6
 35159   1BTC    1551    34
 68892   1MUo    1148     5
 36069   1Bit     892    27
 36712   1Buy     883     1
 32541   1Ag4     879    56
 32571   1Aga     869     4
 32572   1Agb     867     8
 32539   1Ag2     866     3
 32592   1Agw     863     7
 32538   1Ag1     859     7
 32595   1Agz     856     5
 32543   1Ag6     852   106
 32594   1Agy     846    25
 32542   1Ag5     842    82
 32593   1Agx     837    48
 75469   1PSC     834     9
 66967   1Luc     819    16
 37550   1CAR     813    13
 34187   1BAS     807    31

Again, vanity addresses obviously skew the distribution.
newbie
Activity: 16
Merit: 0
Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs

Edit: BTW, these stats only cover p2pkh-type outputs.
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
I know this is silly so don't tell me it is silly.

I threw the code snippet together quickly so don't critique the code, it is for illustration purposes only.

I am willing to throw a small amount of BTC your way if you do this for me (I am too busy).  Please PM me with your bid and a super short description of what technology you would use (Java, C++, database, etc.)  to produce the result I want.

Here is what I want:

Scan through the entire blockchain and sort all the Bitcoin addresses found into "bins" by the first N letters.  The number of possible bins is related to N as follows:

Code:
//             Number of
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

I think an N of 4 will be good as a first pass.  If this turns out to be interesting then an N of 5 might be interesting.

For each bin 1111 through 1zzz I want to know how many addresses exit in that bin and how many of those addresses have positive balances.  Here is a short code snippet to illustrate what I am looking for.  

Code:
#include

using namespace std;

static const int N = 4;

//             Number of
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

char base58[] = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

int main (int argc, char *argv[])
{
    cout << argv[1] << endl;

    if (argv[1][0] == '1') {

        int bin = 0;

        for (int i=1; i
            for (int j=0; j<58; ++j) {

                if (argv[1][i] == base58[j]) {
                    bin = bin * 58 + j;
                    break;
                }
            }
        }

        cout << "Increment bin " << bin << endl;

        // Here you would increment AdddressExists[bin]

        // Here you would check to see if the address has a balance and if so
        // increment HasBalance[bin]

    } else {

        cout << "Did not start with 1" << endl;
    }

    return 0;
}

At the end of the program produce a CSV file something like this:

Code:
0,"1111",5,2
1,"1112",0,0
...
195110,"1zzy",1,0
195111,"1zzz",2,1
newbie
Activity: 17
Merit: 0
Maybe it is possible to use pub keys as random numbers? Maybe it is possible to use Bitcoin as random generator?
legendary
Activity: 3472
Merit: 4801
I dont know which algorithm it is using,

Clearly.  Please explain how random guessing and tossing out words that you think might be associated with cryptography would be helpful at all?

if its gpg,

It's not.  The public key is ECDSA using the secp256k1 curve. The bitcoin address is a RIPEMD-160 hash of a SHA-256 hash.

should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

The entropy is in the choosing of the private key.  The questions being asked, as I understand them, are if a statistical analysis has been attempted to determine if there are any flaws in any of ECDSA, SHA-256, or RIPEMD-160 that might lead to discovery that either public keys or bitcoin addresses are less random than believed.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)
legendary
Activity: 1143
Merit: 1000
I dont know which algorithm it is using, if its gpg, should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.
hero member
Activity: 900
Merit: 1014
advocate of a cryptographic attack on the globe
That's funny, I was just wondering about this myself. I'd be interested to see the results of both address distribution and public key distribution.
newbie
Activity: 14
Merit: 0
Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  Grin

Just a proof of concept of finding an address generation flaw with a few hundred thousand of addresses between each "flawed" generated address.

I think we've gotten pretty off-topic here, but I've learnt a lot about the blockchain and more information on bitcoin overall, so thanks a ton!
legendary
Activity: 1260
Merit: 1019
Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  Grin
newbie
Activity: 14
Merit: 0
UPD: there is no such thing as "personally generated addresses"
Address is either in blockchain or not exists  Grin

By personally generated addresses, I meant addresses that $username generated. The ones where he has the private key / public key pair. Sorry for not being very clear about that. I suppose it'd be a bit silly for him to include addresses that he already has the private key of.

Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.
Pages:
Jump to: