Topic: [PAID] 0.5 BTC bounty for graph of unspent outputs (Read 1973 times)

hero member
Activity: 770
Merit: 566
fractally
I just sent you the last .05 BTC for your latest update.
legendary
Activity: 2088
Merit: 1015
New version of the csv file containing:
Block #, Total unspent, Transactions in block, New outputs in block

Code:
unspent_v2.csv (4.5 MB)
https://mega.co.nz/#!bZp01SQY!HRkpblOZmobu_UUNpnOtKS65zeIjRiiMoKMaGOEsHNE

Note: Still only goes up to block 237,270.
Note 2: If you verify against blockchain.info by viewing a block by its number, you must subtract 1. For example, http://blockchain.info/block-height/237262 is block # 237263 in the .csv (blockchain.info numbers the first block 0; the .csv numbers it 1).

Edit: Looking back, it seems you wanted total outputs, not "new outputs in block". However, you can simply sum the "new outputs in block" values for every block up to a given block to get the total.
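
For anyone who wants that total without a spreadsheet, here is a minimal Python sketch of the summation. It assumes the .csv has no header row and exactly the four integer columns listed above; the function names and the sample rows are made up for illustration and are not taken from unspent_v2.csv.

```python
import csv
import io
from itertools import accumulate


def load_rows(text):
    """Parse CSV text with columns: block #, total unspent, tx count, new outputs.

    Assumes no header row and integer values; adjust if the real file differs.
    """
    reader = csv.reader(io.StringIO(text))
    return [tuple(int(field) for field in row) for row in reader if row]


def total_outputs(rows):
    """Running total of outputs ever created, never subtracting spent ones."""
    return list(accumulate(row[3] for row in rows))


def blockchain_info_height(csv_block_number):
    """The .csv counts blocks from 1, blockchain.info counts from 0."""
    return csv_block_number - 1


# Made-up sample rows for illustration only.
sample = "1,1,1,1\n2,2,1,1\n3,4,2,3\n"
rows = load_rows(sample)
totals = total_outputs(rows)
print(totals)
print(blockchain_info_height(237263))
```

To run it on the real file, read the file's text and pass it to `load_rows` in place of `sample`.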
legendary
Activity: 2088
Merit: 1015
PAID. I was unable to open the .xls in Keynote and thus couldn't copy your address.
Thanks. As I said, I plan to update again after downloading more of the blockchain, and I will add the total outputs portion as well.
hero member
Activity: 770
Merit: 566
fractally
Could you send me your bitcoin address in a form I could copy and paste vs in your graphic?
Sure, it was in the spreadsheet ;)
1DtEUTEUBHrUSWTEDLz99Mrz2a2WjaVKeM


Also, could you export it as a .csv? I don't have Excel, and Numbers will not open your .xls file because it has too many rows.
Sure can. The code exports a .csv; I only opened it in Excel to make the graph. However, if the problem was the number of rows, importing the .csv will still give 237k rows and likely cause the same error.

Code:
unspent.csv (3.3 MB)
https://mega.co.nz/#!ONAzgaqA!CDZ8Dlr2NIXeMTaq5Pg0REH0XYFs8Fwtinvb-aiHvS0



PAID. I was unable to open the .xls in Keynote and thus couldn't copy your address.
hero member
Activity: 770
Merit: 566
fractally
What this graph tells me is that if we were to 'compress' the bitcoin blockchain with the simple approach of only storing the unspent outputs, at 50 bytes each, it would require a 350 MB output database plus an index, or about 500 MB. It is also growing by about 1 million new unspent outputs per month, or about 50 MB / month, and accelerating. Thus even if the only thing we did was optimize the storage of the blockchain, it would likely require over 10 GB in less than 10 years' time.

I suspect that the number of unspent outputs is greatly increased by mining pools, sdice, and the perverse incentive not to combine dust. Suppose we were to charge based upon the number of new unspent outputs instead of the total transaction size? Suppose that transactions with more inputs than outputs were 'free' and we somehow rewarded miners for including these transactions? Perhaps the reduction in storage size would be all of the 'fee' required?

Now suppose that funds that are not spent for a few years start to be 'charged' storage fees? Why should the entire network incur an ongoing 'cost' to store these outputs forever, even when people have lost their private keys or generated 'dust'?

I suspect that with a few incentive changes we could reduce the number of unspent outputs by 50% or more.

The other thing we can conclude from this is that the total number of bitcoin users is WELL UNDER 1 million if we assume that the average wallet contains a mere 10 unspent outputs (350 MB / 50 bytes is about 7 million outputs, so roughly 700,000 wallets).

What I also conclude from this is that if every user only had a handful of accounts (checking, savings, business, etc.), on the same order of magnitude that they currently manage in the banking system, then the network 'at maturity' would have 10 * 10,000,000,000 accounts * 50 bytes and require about 5 TB just to store the unspent outputs. If bitcoin doesn't do something, it will hit that number long before 'maturity', and thus long before technology will enable the average computer to store it and access it in reasonable time.

The primary argument for lots of addresses per account is the theory that it increases privacy. The reality is that most of that privacy is an illusion and that some other solution (Zerocoin, Open Transactions) should be used instead.


This also tells me that even going so far as to 'distribute' the outputs via a hash table and then 'prove' the outputs are in the chain via a Merkle tree wouldn't actually save any space. Merkle trees would require at least 28 bytes just for the hash, and the Merkle tree of the full set of outputs would be the same size as the outputs.
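
The arithmetic behind those estimates can be checked with a short Python sketch. It takes the 50 bytes per output, 1 million new outputs per month, 10 outputs per wallet, and 10 accounts for each of 10 billion users straight from the post, uses decimal units (1 MB = 10^6 bytes), and ignores the index. The variable names are only illustrative.

```python
BYTES_PER_OUTPUT = 50  # per-output size assumed in the post


def storage_bytes(n_outputs, bytes_per_output=BYTES_PER_OUTPUT):
    """Raw size of an unspent-output set, ignoring any index overhead."""
    return n_outputs * bytes_per_output


# A 350 MB database at 50 bytes per output implies this many unspent outputs.
current_outputs = 350_000_000 // BYTES_PER_OUTPUT

# 1 million new unspent outputs per month.
monthly_growth = storage_bytes(1_000_000)

# Straight-line projection over 10 years (120 months), with no acceleration.
ten_year_linear = storage_bytes(current_outputs) + 120 * monthly_growth

# Upper bound on wallets if each holds 10 unspent outputs.
wallets = current_outputs // 10

# 'At maturity': 10 accounts for each of 10 billion users.
maturity = storage_bytes(10 * 10_000_000_000)

print(current_outputs, monthly_growth, ten_year_linear, wallets, maturity)
```

Note that the straight-line projection comes to about 6.35 GB after 10 years, so the "over 10 GB" figure depends on the growth continuing to accelerate.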
hero member
Activity: 770
Merit: 566
fractally
bitspill:  I will throw in an extra 0.05BTC to rerun your script and include the 'total outputs' in addition to 'unspent' outputs.   
Something like http://blockchain.info/charts/n-transactions-total but by block rather than by date?

By block, and not transactions but total outputs, i.e., maintain a running total without subtracting the spent outputs.
hero member
Activity: 770
Merit: 566
fractally
What this graph tells me is that if we were to 'compress' the bitcoin blockchain with the simple approach of only storing the unspent outputs, at 50 bytes each, it would require a 350 MB output database plus an index, or about 500 MB. It is also growing by about 1 million new unspent outputs per month, or about 50 MB / month, and accelerating. Thus even if the only thing we did was optimize the storage of the blockchain, it would likely require over 10 GB in less than 10 years' time.

I suspect that the number of unspent outputs is greatly increased by mining pools, sdice, and the perverse incentive not to combine dust. Suppose we were to charge based upon the number of new unspent outputs instead of the total transaction size? Suppose that transactions with more inputs than outputs were 'free' and we somehow rewarded miners for including these transactions? Perhaps the reduction in storage size would be all of the 'fee' required?

Now suppose that funds that are not spent for a few years start to be 'charged' storage fees? Why should the entire network incur an ongoing 'cost' to store these outputs forever, even when people have lost their private keys or generated 'dust'?

I suspect that with a few incentive changes we could reduce the number of unspent outputs by 50% or more.

The other thing we can conclude from this is that the total number of bitcoin users is WELL UNDER 1 million if we assume that the average wallet contains a mere 10 unspent outputs (350 MB / 50 bytes is about 7 million outputs, so roughly 700,000 wallets).

What I also conclude from this is that if every user only had a handful of accounts (checking, savings, business, etc.), on the same order of magnitude that they currently manage in the banking system, then the network 'at maturity' would have 10 * 10,000,000,000 accounts * 50 bytes and require about 5 TB just to store the unspent outputs. If bitcoin doesn't do something, it will hit that number long before 'maturity', and thus long before technology will enable the average computer to store it and access it in reasonable time.

The primary argument for lots of addresses per account is the theory that it increases privacy. The reality is that most of that privacy is an illusion and that some other solution (Zerocoin, Open Transactions) should be used instead.


legendary
Activity: 2088
Merit: 1015
bitspill:  I will throw in an extra 0.05BTC to rerun your script and include the 'total outputs' in addition to 'unspent' outputs.   
Something like http://blockchain.info/charts/n-transactions-total but by block rather than by date?
legendary
Activity: 2088
Merit: 1015
Could you send me your bitcoin address in a form I could copy and paste vs in your graphic?
Sure, it was in the spreadsheet ;)
1DtEUTEUBHrUSWTEDLz99Mrz2a2WjaVKeM


Also, could you export it as a .csv? I don't have Excel, and Numbers will not open your .xls file because it has too many rows.
Sure can. The code exports a .csv; I only opened it in Excel to make the graph. However, if the problem was the number of rows, importing the .csv will still give 237k rows and likely cause the same error.

Code:
unspent.csv (3.3 MB)
https://mega.co.nz/#!ONAzgaqA!CDZ8Dlr2NIXeMTaq5Pg0REH0XYFs8Fwtinvb-aiHvS0

hero member
Activity: 770
Merit: 566
fractally
Also, could you export it as a .csv? I don't have Excel, and Numbers will not open your .xls file because it has too many rows.

hero member
Activity: 770
Merit: 566
fractally
Spreadsheet is linked below. One thing to note: the data I have is only up through block 237,270, as I do not currently have the entire blockchain downloaded (slow internet sucks). I have bitcoin-qt downloading more and plan to update when the entire chain is downloaded.

unspent.xlsx (5.2 MB)
https://mega.co.nz/#!aMBACaba!N402Lnf1DyK-ZCxIAEAkUjcL6WFkJrRSqChpI0L7N0c

Edit: It seems the forum bbcode is hating on that link with the ! in it so you will need to copy-pasta

Could you send me your bitcoin address in a form I could copy and paste vs in your graphic?

legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
Ah yeah I forgot about the bounty, sorry
hero member
Activity: 770
Merit: 566
fractally
He could post it because I already gave him credit for the bounty (conditionally) and thus no one could steal it.

bitspill:  I will throw in an extra 0.05BTC to rerun your script and include the 'total outputs' in addition to 'unspent' outputs.   
legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
Textbox intentionally placed over portion of graph until complete so we don't have anyone trying to crop out the graph and claim it ;)
You made the xls file public anyway, so I don't get the point
legendary
Activity: 2088
Merit: 1015
Spreadsheet is linked below. One thing to note: the data I have is only up through block 237,270, as I do not currently have the entire blockchain downloaded (slow internet sucks). I have bitcoin-qt downloading more and plan to update when the entire chain is downloaded.

unspent.xlsx (5.2 MB)
https://mega.co.nz/#!aMBACaba!N402Lnf1DyK-ZCxIAEAkUjcL6WFkJrRSqChpI0L7N0c

Edit: It seems the forum bbcode is hating on that link with the ! in it so you will need to copy-pasta
legendary
Activity: 2088
Merit: 1015
The data was obtained by looping over every block, removing each transaction's inputs from a list and then adding the transaction to that list. After each block I save how many transactions are in the list.

More specifically, it is a modified version of mb300sd's code: https://github.com/mb300sd/Bitcoin-Tool/blob/master/Bitcoin%20Tool/Apps/ComputeUnspentTxOuts.cs

In the loop iterating over the blocks, I save the current count to an array and print that array to a .csv file at the end.

Edit: Forgot to mention I will send the Excel file when I get back to my computer later tonight.
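
The method described above can be sketched in a few lines of Python. This is a toy model, not mb300sd's C# code: the block and transaction layout (dicts with `txid`, `inputs`, `n_outputs`) is hypothetical, and it tracks individual outputs rather than whole transactions, which is an assumption about what "total unspent" counts.

```python
def unspent_counts(blocks):
    """Return the size of the unspent-output set after each block.

    Each block is a list of transactions. Each transaction is a dict with
    'txid', 'inputs' (a list of (txid, index) pairs it spends), and
    'n_outputs'. This layout is hypothetical, chosen only for the sketch.
    """
    unspent = set()
    counts = []
    for block in blocks:
        for tx in block:
            # Spending an output removes it from the set.
            for outpoint in tx["inputs"]:
                unspent.discard(outpoint)
            # Every new output starts out unspent.
            for index in range(tx["n_outputs"]):
                unspent.add((tx["txid"], index))
        # One data point per block, as in the .csv.
        counts.append(len(unspent))
    return counts


# Made-up two-block chain: a coinbase, then a coinbase plus one spend.
blocks = [
    [{"txid": "a", "inputs": [], "n_outputs": 1}],
    [
        {"txid": "b", "inputs": [], "n_outputs": 1},
        {"txid": "c", "inputs": [("a", 0)], "n_outputs": 2},
    ],
]
counts = unspent_counts(blocks)
print(counts)
```

A set keyed by (txid, index) keeps removal cheap; a plain list would work too but gets slow as the set grows into the millions.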
hero member
Activity: 770
Merit: 566
fractally
That graph looks interesting. Could you tell me how you calculated the numbers (only so that I can be sure these are the numbers I was looking for)?

Right now I will assume you have won the bounty, provided your methodology is accurate and you provide me with that XLS sheet. So don't worry about anyone else trying to copy it.
legendary
Activity: 2088
Merit: 1015
Are you looking for something along the lines of this?

[image: graph of unspent outputs, with a textbox covering part of it]


Textbox intentionally placed over portion of graph until complete so we don't have anyone trying to crop out the graph and claim it ;)
sr. member
Activity: 333
Merit: 252
The reference client does a nice job of combining small outputs as inputs to new transactions.
That is, as far as I understand, it makes some effort to minimize the UTXO set and the fee the user has to pay at the same time.
hero member
Activity: 770
Merit: 566
fractally
I would like to see a graph (and data table) on the total number of 'unspent outputs' over time, perhaps once per block.

It seems like every time there is a transaction with change, at least 1 input becomes 2 outputs, but how often do later transactions 're-combine' those outputs as inputs? How much would the blockchain compress if we only had to store the unspent outputs?

I would guess that the number of unspent outputs will grow in proportion to the user base.

