Topic: Bitcoin client operating with a finite amount of disk space

member
Activity: 70
Merit: 18
I'm not worried about block chain size nearly as much as log file size.  Is there some way to make bitcoin restrain its log file to a certain size, deleting older log data as it goes?  I've had to switch storage spaces several times to accommodate the log file, or delete the log file after shutting down the bitcoin client.
Just found that there is an undocumented -printtoconsole option that will attempt to write to stdout instead of to the log file.  It may or may not succeed in writing to stdout, but it does seem to suppress appending to the log file.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
What is even in the log file?  I never knew that it existed (and was a concern) until this thread...

I am much more concerned about the blockchain file, because that's a critical part of the protocol.  Presumably, the log file can be trimmed... but the blk0001.dat cannot.  And the more successful bitcoin becomes, the more that filesize is going to spiral out of control.

I guess there are no options for the miners; they're going to have to hold the whole file no matter what.  But for the users, a reduced set will become necessary.  Maybe not just yet, but eventually.  Once the BTC network starts processing 1000 transactions per block, the blockchain is going to grow about 10-30 MB per day... or possibly 10 GB per year.  It can still be handled by the miners, but the average user isn't going to want to hold that much data just to use the program.
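To sanity-check those figures: at the ~10-minute average block interval there are 144 blocks per day, so 10-30 MB/day works out to roughly 3.7-11 GB/year, and at 1000 transactions per block it implies an average of about 70-210 bytes per transaction. A quick Python sketch of that arithmetic (the function names are only illustrative, and 1 GB is taken as 1000 MB):

```python
BLOCKS_PER_DAY = 24 * 60 // 10  # one block per ~10 minutes on average -> 144


def daily_to_yearly(mb_per_day: float) -> float:
    """Convert a daily growth rate in MB to GB per year (1 GB = 1000 MB)."""
    return mb_per_day * 365 / 1000


def implied_bytes_per_tx(mb_per_day: float, tx_per_block: int = 1000) -> float:
    """Average transaction size implied by a daily growth rate."""
    return mb_per_day * 1_000_000 / (BLOCKS_PER_DAY * tx_per_block)


for rate in (10, 30):
    print(f"{rate} MB/day -> {daily_to_yearly(rate):.2f} GB/year, "
          f"~{implied_bytes_per_tx(rate):.0f} bytes/tx at 1000 tx/block")
```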
member
Activity: 92
Merit: 10
I'm not worried about block chain size nearly as much as log file size.  Is there some way to make bitcoin restrain its log file to a certain size, deleting older log data as it goes?  I've had to switch storage spaces several times to accommodate the log file, or delete the log file after shutting down the bitcoin client.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I'm saying the latter -- I don't see why the client ever needs to distinguish between those two, it only matters whether the transaction is valid. 

The only difference might be that a 0-confirmation transaction is a bit less trustworthy on a light node, because it doesn't have the ability to verify the transaction itself.  It only knows for sure when it sees that transaction in a block, or can guess with high confidence that it wouldn't have received that Tx data unless it was valid, since invalid Tx's don't get very far in the network.

I'm working on some block-chain analysis tools right now, and playing/testing with the 374 MB of data up to block 136496.  I'm not quite there yet, but I share your intrigue and might take a shot at that calculation in the next few days.
member
Activity: 70
Merit: 18
Quote
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I don't see why you need to keep that transaction.  The entire state of the network (and everyone's balances) can be determined solely by the set of unused TxOut objects and the hash/index of their parent Tx object.  And that's all the information you need to sign new transactions.

I agree everything that needs to be known about current balances exists in unspent outputs, and I agree you can authenticate valid transactions without keeping the spent outputs or even the hashes of the spent Tx.

Are you saying it's possible to distinguish between a double-spend and an orphan?  Or that you don't need to distinguish between them?  If it's the latter, I would agree for space-constrained nodes you can just discard an orphan/double-spend without worrying which case it happens to be.  Maybe you could even give it the benefit of the doubt and hang on to it until you see the next block or two before you toss it out.

I'd be curious to know quantitatively what fraction of the space is taken by transactions whose outputs have all been used.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Quote
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I don't see why you need to keep that transaction.  The entire state of the network (and everyone's balances) can be determined solely by the set of unused TxOut objects and the hash/index of their parent Tx object.  And that's all the information you need to sign new transactions.  All the TxIns and previous TxOuts are only necessary for blockchain verification, but that is done by the miners before they include them in a block.  Your client can get away with storing just the TxOut information above, and trust that they are valid because they were part of the longest blockchain (which is difficult to fake), and would not be there if they weren't valid.

You can store all the Tx's in a tree data structure, whose values are arrays of TxOut objects.  As TxOuts are spent, you can remove them from the array, saving about 40 bytes for each of them.  When the last TxOut in the array is spent, you can also remove the Tx node, which saves another 40 bytes (approx).  You would only need to keep that data (and/or its hash) if you were concerned about verifying the transaction history.
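A minimal in-memory sketch of the structure described above, assuming a Python dict stands in for the "tree": parent Tx hash maps to a dict of output index to TxOut, outputs are removed when spent, and the Tx node disappears with its last output. The `TxOut` fields, the short hex-string hashes, matching balances by output script, and the `UnspentSet` name are all hypothetical simplifications; like the approach described, it keeps nothing needed to re-verify history.

```python
from dataclasses import dataclass


@dataclass
class TxOut:
    value: int     # amount in satoshis
    script: bytes  # hypothetical stand-in for "who can spend this output"


class UnspentSet:
    """Parent-tx hash -> {output index: TxOut}, holding only unspent outputs."""

    def __init__(self) -> None:
        self._by_tx: dict[str, dict[int, TxOut]] = {}

    def add_tx(self, tx_hash: str, outputs: list[TxOut]) -> None:
        # Every output of a newly accepted transaction starts out unspent.
        self._by_tx[tx_hash] = dict(enumerate(outputs))

    def spend(self, tx_hash: str, index: int) -> TxOut:
        # Remove one output; raises KeyError if unknown or already spent.
        outs = self._by_tx[tx_hash]
        txout = outs.pop(index)
        if not outs:
            # Last output spent: drop the whole transaction node as well.
            del self._by_tx[tx_hash]
        return txout

    def balance(self, script: bytes) -> int:
        # Sum of unspent outputs locked to the given script.
        return sum(o.value for outs in self._by_tx.values()
                   for o in outs.values() if o.script == script)

    def __len__(self) -> int:
        return len(self._by_tx)  # number of transactions still referenced


utxo = UnspentSet()
utxo.add_tx("aa11", [TxOut(50, b"alice"), TxOut(25, b"bob")])
utxo.spend("aa11", 0)
print(utxo.balance(b"bob"), len(utxo))  # 25 1
utxo.spend("aa11", 1)
print(len(utxo))                        # 0
```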
legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
So there it is: 4.2 GB. (Gedit tried to load the whole thing...)

Code:
ubuntu@ubuntu:/media/803819A438199A6C/bitcoins$ tail debug.log
StopNode()
Running BitcoinMiner with 2 transactions in block
ThreadBitcoinMiner exiting, 0 threads remaining
DBFlush(true)
blkindex.dat refcount=0
blkindex.dat flush
wallet.dat refcount=0
wallet.dat flush
Bitcoin exiting


I suppose that is what I get for running the "beta" version: It is saving a lot of debugging information. Log rotation would probably help, but I doubt it is a priority if only used for debugging.
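Until the client caps its own log, a small script can trim debug.log between runs by keeping only its most recent part. A minimal Python sketch; the `shrink_log` name and the size thresholds are hypothetical, and it assumes the client is stopped while the file is rewritten, since the client appends to the same file while running.

```python
import os


def shrink_log(path: str, max_bytes: int, keep_bytes: int) -> bool:
    """If the file at `path` exceeds max_bytes, keep only its last ~keep_bytes.

    Returns True if the file was trimmed. Assumes the client is NOT running.
    """
    size = os.path.getsize(path)
    if size <= max_bytes:
        return False
    with open(path, "rb") as f:
        f.seek(max(0, size - keep_bytes))
        tail = f.read()
    if size > keep_bytes:
        # We started mid-file, so the first line is probably partial: drop it.
        newline = tail.find(b"\n")
        if newline != -1:
            tail = tail[newline + 1:]
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(tail)
    os.replace(tmp, path)  # replace the original with the trimmed copy
    return True
```

For example, `shrink_log("debug.log", 100_000_000, 10_000_000)` would cut a log larger than 100 MB down to roughly its last 10 MB.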
kjj
legendary
Activity: 1302
Merit: 1026
The blockchain right now is not 600 MB, it's more like 400, excluding the index files, and that can be compressed to at least 80% of the original size. And 16 gigs should be good for a Linux install and 2 more years' worth of blockchain, worst case.

It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.

No, it doesn't save multiple copies.  When there is a fork, it keeps both blocks, but it doesn't need to make a copy of the rest of the chain to do it.
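A fork costs one extra block, not a copy of the chain, because each block only records its parent's hash. A minimal Python sketch of that idea; the `BlockTree` name and short string hashes are hypothetical, and choosing the best tip by greatest height is a simplification (the client actually prefers the chain with the most cumulative proof-of-work).

```python
class BlockTree:
    """Blocks stored once each, keyed by hash, each pointing at its parent."""

    def __init__(self, genesis: str) -> None:
        self.parent: dict[str, str | None] = {genesis: None}  # hash -> prev hash
        self.height: dict[str, int] = {genesis: 0}

    def add_block(self, block_hash: str, prev_hash: str) -> None:
        # A fork is just a second child of the same parent: one new entry,
        # no copy of the chain below it.
        self.parent[block_hash] = prev_hash
        self.height[block_hash] = self.height[prev_hash] + 1

    def best_tip(self) -> str:
        # Simplification: "longest" = greatest height (real rule: most work).
        return max(self.height, key=self.height.get)

    def chain(self, tip: str) -> list[str]:
        # Walk parent pointers back to genesis, then reverse.
        out = []
        node: str | None = tip
        while node is not None:
            out.append(node)
            node = self.parent[node]
        return out[::-1]


t = BlockTree("g")
t.add_block("a", "g")
t.add_block("b", "a")
t.add_block("b2", "a")  # competing block at the same height as "b"
t.add_block("c", "b2")  # the b2 branch is now longer
print(t.best_tip(), t.chain("c"))  # c ['g', 'a', 'b2', 'c']
print(len(t.parent))               # 5
```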

Check to see if you have a debug.log.  If I don't clear mine often it gets huge.  Currently around 1.5 GB.
legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
I was using the "real" network with the official client in -gen mode.
sr. member
Activity: 350
Merit: 251
It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.
I have no idea about the test network, but my entire %appdata%\bitcoin dir has never gone over 800 MB yet, though I would assume it will by the end of next month at most.
legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
The blockchain right now is not 600 MB, it's more like 400, excluding the index files, and that can be compressed to at least 80% of the original size. And 16 gigs should be good for a Linux install and 2 more years' worth of blockchain, worst case.

It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.
member
Activity: 70
Merit: 18
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I'm not sure how much space savings this would provide, but I think it would be substantial.
sr. member
Activity: 322
Merit: 251
FirstBits: 168Bc
I would like to see an e-wallet service that uses a finite set of keys for each user wallet and lets the user download a copy.

And/or a service that handles the block chain and protocol allowing my client to deal only with my transactions and keys such as Webcoin and BitcoinJS are attempting:

hero member
Activity: 576
Merit: 514
The blockchain right now is not 600 MB, it's more like 400, excluding the index files, and that can be compressed to at least 80% of the original size. And 16 gigs should be good for a Linux install and 2 more years' worth of blockchain, worst case.

So far bitcoin has been out for around 30 months and has only just reached about 400 MB, 500 if you count indexes. That's 2.5 years.

My current bitcoin directory has 705MB right now.
A backup from the 18th has 612MB.
Another from the 14th has 592MB.

So, from 14th->18th, disk space increased by 20MB; over those 4 days, that's 5MB/day.
From 18th->27th, another 93MB were saved, which over 9 days means about 10.3MB/day.
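The per-day rates come from dividing each size difference by the number of days elapsed between backups. A short Python sketch, assuming all three dates fall in the same month and the sizes were taken at comparable times of day:

```python
# (day of month, bitcoin directory size in MB) from the three measurements above;
# assumes all three dates are in the same month
samples = [(14, 592), (18, 612), (27, 705)]


def daily_rates(points):
    """Growth in MB/day between consecutive (day, size) samples."""
    return [(s1 - s0) / (d1 - d0)
            for (d0, s0), (d1, s1) in zip(points, points[1:])]


print([round(r, 1) for r in daily_rates(samples)])  # [5.0, 10.3]
```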

Of course it's easy to say "quit whining, disk space is cheap", but when Bitcoin wants to enter the mobile/smartphone market, disk space and traffic do matter.
Also, if you are an optimist and hope that Bitcoin will catch on and grow quickly, then the number of transactions will increase, which in turn needs even more disk space.
So, the faster Bitcoin grows, the bigger the storage requirements. This means the initial download/verify time will also increase and most likely anger new users.
Last but not least, when reaching a certain number of transactions per second, the everyday John Doe simply won't have the bandwidth to deal with the blockchain growth.
hero member
Activity: 812
Merit: 1000
So far bitcoin has been out for around 30 months and has only just reached about 400 MB, 500 if you count indexes. That's 2.5 years.
Yes, but bitcoin is also still waiting for its major breakthrough and doesn't really have many users and shops yet.

If the first 2.5 years made 400 MB, I bet the *next* 2.5 years would easily make an extra 4000 MB (if not compressed or pruned).
newbie
Activity: 28
Merit: 0
So far bitcoin has been out for around 30 months and has only just reached about 400 MB, 500 if you count indexes. That's 2.5 years.
Yes, but bitcoin is also still waiting for its major breakthrough and doesn't really have many users and shops yet.
kjj
legendary
Activity: 1302
Merit: 1026
The block chain is not very compressible, since most of it is hashes.  I got 21% space savings with gzip and 22% savings with bzip2.  (436690683 vs. 345345519 vs. 341013976 bytes)
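The savings percentages follow directly from the three byte counts, and the "mostly hashes" explanation can be illustrated with random bytes, which behave like hash output and which general-purpose compressors cannot shrink. A short Python sketch using the standard `gzip` and `bz2` modules; using random data as a stand-in for hashes is an illustrative assumption, not a measurement of the actual chain.

```python
import bz2
import gzip
import os


def savings_percent(raw_size: int, compressed_size: int) -> float:
    """Space saved by compression, as a percentage of the raw size."""
    return 100 * (1 - compressed_size / raw_size)


# Block chain sizes in bytes, as reported above: raw, gzip, bzip2.
raw, gz, bz = 436690683, 345345519, 341013976
print(f"gzip:  {savings_percent(raw, gz):.1f}%")  # gzip:  20.9%
print(f"bzip2: {savings_percent(raw, bz):.1f}%")  # bzip2: 21.9%

# Random bytes stand in for hash data: expect ~0% (or slightly negative) savings.
sample = os.urandom(1_000_000)
print(f"random, gzip:  {savings_percent(len(sample), len(gzip.compress(sample))):.2f}%")
print(f"random, bzip2: {savings_percent(len(sample), len(bz2.compress(sample))):.2f}%")
```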

The real magic comes in when you realize that you can prune old transactions.  Since any transaction output in the chain can be spent by at most one new transaction, you can delete any transaction whose outputs were all spent more than X blocks ago, with no ill effects.  Someone wrote a tool for that, and if I recall, he reported that something like 70% of the chain can be pruned already.
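The pruning rule can be sketched as: a stored transaction becomes deletable once every one of its outputs has been spent and the most recent of those spends is buried more than X blocks deep. A minimal Python sketch; `StoredTx`, its fields, and the `depth` parameter (standing in for X) are hypothetical names chosen for illustration, not the tool mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class StoredTx:
    tx_hash: str
    n_outputs: int
    # output index -> height of the block whose transaction spent it
    spent_at: dict[int, int] = field(default_factory=dict)


def prunable(tx: StoredTx, tip_height: int, depth: int) -> bool:
    """True if every output is spent and the last spend is more than `depth` blocks deep."""
    if not tx.spent_at or len(tx.spent_at) < tx.n_outputs:
        return False  # at least one output is still unspent
    return tip_height - max(tx.spent_at.values()) > depth


def prune(txs: list[StoredTx], tip_height: int, depth: int) -> list[StoredTx]:
    # Keep only the transactions that are not yet safe to drop.
    return [tx for tx in txs if not prunable(tx, tip_height, depth)]


txs = [
    StoredTx("a", 2, {0: 100, 1: 120}),  # fully spent, last spend at height 120
    StoredTx("b", 2, {0: 100}),          # output 1 still unspent
    StoredTx("c", 1, {0: 195}),          # fully spent, but only recently
]
kept = prune(txs, tip_height=200, depth=50)
print([tx.tx_hash for tx in kept])  # ['b', 'c']
```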

All non-miner clients need are block headers and access to a trusted node that has the transactions cached.

Ah, no. All a non-miner client needs is a protocol talking to a trusted server.

No need to store anything except addresses and private keys.

You can send signed transactions to the server and get balances and new transfers from the server.

Stick it THIN - so you don't need to sync anything. Laptop open, check, finished.

The trusted server part is currently difficult, so I expect a medium-weight client to pop up and be useful sooner than a fully stripped lightweight client.

That is, unless one of the 3 or more people/groups working on hardware wallets makes some big progress before a serious smartphone developer gets the itch to code up a medium client.
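To put "block headers only" into numbers: every block header is serialized as 80 bytes, so a header-only client's storage grows linearly with height and stays small. A short Python sketch, using height 136496 (mentioned earlier in the thread) as an example and assuming the ~10-minute average block interval:

```python
HEADER_BYTES = 80     # serialized size of one block header
BLOCKS_PER_DAY = 144  # one block per ~10 minutes on average (assumption)


def headers_size(height: int) -> int:
    """Bytes needed to store every header from genesis (height 0) to `height`."""
    return (height + 1) * HEADER_BYTES


print(headers_size(136496))                 # 10919760
print(BLOCKS_PER_DAY * 365 * HEADER_BYTES)  # 4204800 bytes of new headers per year
```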
full member
Activity: 140
Merit: 100
All non-miner clients need are block headers and access to a trusted node that has the transactions cached.

Ah, no. All a non-miner client needs is a protocol talking to a trusted server.

No need to store anything except addresses and private keys.

You can send signed transactions to the server and get balances and new transfers from the server.

Stick it THIN - so you don't need to sync anything. Laptop open, check, finished.
sr. member
Activity: 350
Merit: 251
Can you say more about compressing the data?  How could this be accomplished in a transparent way?

I understand that there is a maximum number of transactions that can be included in a single block.  Do we know how much disk space a maxed-out block like this will consume?

You state that 16 gigs should be good for 2 years' worth of block chain data.  Is this a wild-ass guess or based on some reasonable assumptions?  If the estimate is based on some assumptions, would you please share the data you used in your calculations?
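On the maxed-out block question: the client of that era caps a block by its serialized size (the MAX_BLOCK_SIZE constant, 1,000,000 bytes) rather than by a transaction count, so a full block is about 1 MB however many transactions it holds. A rough upper-bound sketch in Python; the 250-byte average transaction is a hypothetical figure chosen only for illustration, and the 144 blocks/day assumes the ~10-minute average interval.

```python
MAX_BLOCK_SIZE = 1_000_000  # bytes; block size cap in the client of that era
BLOCKS_PER_DAY = 144        # one block per ~10 minutes on average


def worst_case_growth(days: int) -> int:
    """Upper bound on chain growth in bytes if every block were completely full."""
    return days * BLOCKS_PER_DAY * MAX_BLOCK_SIZE


def approx_tx_per_full_block(avg_tx_bytes: int) -> int:
    """Rough transaction count in a full block (ignores the 80-byte header).
    avg_tx_bytes is an assumed figure, not a protocol constant."""
    return MAX_BLOCK_SIZE // avg_tx_bytes


print(worst_case_growth(1))           # 144000000 bytes per day
print(worst_case_growth(365))         # 52560000000 bytes per year
print(approx_tx_per_full_block(250))  # 4000
```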

So far bitcoin has been out for around 30 months and has only just reached about 400 MB, 500 if you count indexes. That's 2.5 years.

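Average block size over the most recent N blocks:
past 100,000 blocks: 4,244b
past 50,000 blocks: 7,900b
past 25,000 blocks: 13,558b
past 10,000 blocks: 22,871b
past 5,000 blocks: 24,994b
past 1,000 blocks: 23,705b
past 500 blocks: 23,170b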

You can see that it doubles very roughly every 40,000-60,000 blocks, but this figure could very easily fail depending on bitcoin's growth or death. So the average a year from now would be about 40,000b; let's just assume from now on the size per block is 40,000b for the first year:
40,000 × 6 × 24 × 365 = 2,102,400,000
Now double 40,000 for the second year:
80,000 × 6 × 24 × 365 = 4,204,800,000
Add them to get 6,307,200,000.
I don't know if these figures are bits or bytes, but I'll assume bytes.
That's 5.87 gigabytes, assuming worst-case scenarios: I applied the numbers that would exist at the end of each year from the beginning of that year, so under this growth assumption the numbers cannot be higher than this.
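The projection above can be reproduced in a few lines of Python, under the post's own assumptions: 6 blocks per hour, a 40,000-byte average block in year one that doubles each year, and figures in bytes. Dividing by 2^30 gives the 5.87 figure, so it is 5.87 GiB (about 6.31 decimal GB). The `projected_bytes` helper is a hypothetical generalization to more years.

```python
BLOCKS_PER_YEAR = 6 * 24 * 365  # 6 blocks/hour -> 52,560 blocks per year


def projected_bytes(start_avg: int, years: int,
                    blocks_per_year: int = BLOCKS_PER_YEAR) -> int:
    """Total growth if the average block size starts at start_avg bytes
    and doubles every year (the post's assumption)."""
    return sum(start_avg * 2**k * blocks_per_year for k in range(years))


total = projected_bytes(40_000, 2)  # 40,000 b/block in year 1, 80,000 in year 2
print(f"{total:,} bytes = {total / 2**30:.2f} GiB = {total / 1e9:.2f} GB")
# 6,307,200,000 bytes = 5.87 GiB = 6.31 GB
```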

Again, these numbers are probably wrong because of human behavior, myself and others, but it also seems to look a bit like Moore's law, except the numbers are doubling sooner than every 18 months.

I got the data from Block Explorer, btw.