The last time I started a new client install from scratch, it took me more than a full day to catch up to the current block. I have also noticed over the last few weeks that the daily "Bitcoins Sent" figure on bitcoinwatch.com is consistently >1 million. It's tough to believe that the BTC community is so active that more than 20% of all BTC in existence changes hands every day. I suspect some people have recognized that they can add a ton of bloat to the network by juggling transactions between their own addresses. My understanding is that if they make sure every transaction has only 1 input and 1 output, they won't have to pay a transaction fee. Is this correct?

Regardless, it's pretty inconvenient to acquire the entire block chain from the BTC network. I want to make sure I understand how this works, because I want to build a client that gracefully avoids this kind of bloat:
- (1) Right now, there's only about 11 MB worth of block headers (135,000 blocks x 80 bytes/block), which should take a very short amount of time to download, and mere seconds to verify hash-integrity of the entire blockchain [headers]. However, the global transaction list is considerably larger, and currently downloaded in its entirety by the official BTC client, all the time.
- (2) The block headers give no information about which transactions were included in the block, only the Merkle tree root (a hash) of the transaction list for that block. So if the client wants to know and verify a BTC address balance (without anyone else's help), he has to download the transaction list for every block, at least back to the original coinbase transactions of all the coins contained in the address.
- (3) You don't need the entire block chain to send/receive BTC, you only need the block headers. A client can create transaction messages and sign them without transaction lists. It couldn't verify whether the transaction was valid, but the transaction will be rejected by the network anyway, if it's not valid.
- (4) Only miners would need the entire block chain, so that they can successfully, and quickly, verify the transactions they are trying to include in their blocks.
- (5) A client that receives only the headers (assuming it's the right chain), doesn't have to trust the other nodes around him so much when requesting data. For instance, if he requests and receives the transaction list for only block X, he can quickly construct and verify the merkle tree against the block X header. He can't verify the entire history, but presumably, if he has the longest/correct blockchain, those transactions must've been valid to be included in a block.
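To make point (1) concrete, here is a minimal Python sketch of the size arithmetic and of the hash-linkage check over a list of 80-byte headers. It assumes the usual header layout (version, previous block hash, Merkle root, time, bits, nonce) and double SHA-256 as the block hash. The names `make_header` and `headers_are_linked` are made up for illustration, and the sketch checks only that each header points at the hash of the one before it; it does not check proof of work.

```python
import hashlib
import struct
from typing import Sequence

HEADER_SIZE = 80  # bytes per block header


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def make_header(prev_hash: bytes, merkle_root: bytes, nonce: int = 0) -> bytes:
    """Build a toy 80-byte header (hypothetical helper, for the demo only).

    Layout: version (4) | prev hash (32) | merkle root (32) |
            time (4) | bits (4) | nonce (4), all little-endian.
    """
    return struct.pack("<I32s32sIII", 1, prev_hash, merkle_root, 0, 0x1D00FFFF, nonce)


def headers_are_linked(headers: Sequence[bytes]) -> bool:
    """Return True if every header references the hash of its predecessor."""
    for i, header in enumerate(headers):
        if len(header) != HEADER_SIZE:
            return False
        # Bytes 4..36 of a header hold the previous block's hash.
        if i > 0 and header[4:36] != double_sha256(headers[i - 1]):
            return False
    return True


# Size estimate from point (1): 135,000 headers at 80 bytes each.
total_header_bytes = 135_000 * HEADER_SIZE  # 10,800,000 bytes, about 11 MB

# Build a toy three-header chain and check it.
h0 = make_header(b"\x00" * 32, b"\x11" * 32)
h1 = make_header(double_sha256(h0), b"\x22" * 32)
h2 = make_header(double_sha256(h1), b"\x33" * 32)
chain_ok = headers_are_linked([h0, h1, h2])
```

Changing any byte of an earlier header changes its hash, so every later header stops linking to it. That is why the header chain alone is cheap to check.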
Are (1)-(4) correct assumptions? Perhaps the transaction validity assumption in (5) is weak, since he could've been fed fake tx lists and headers by a dishonest node trying to trick him, knowing he won't follow the transaction history. Though, any dishonest node would have to have a ton of power to even produce a single bogus block, and surely, the client would get word of the longer/correct blockchain within seconds or minutes.
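The check in (5) can be sketched roughly as follows. This assumes the standard Bitcoin Merkle construction (double SHA-256, pairing neighbouring hashes, duplicating the last hash when a level has an odd count) and that the Merkle root sits at bytes 36 to 68 of the header. The function names are made up for illustration, and the "transactions" in the usage lines are placeholder byte strings, not real serialized transactions.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Fold a list of transaction hashes into a single Merkle root."""
    if not txids:
        raise ValueError("a block has at least one transaction (the coinbase)")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


def tx_list_matches_header(raw_txs: List[bytes], header: bytes) -> bool:
    """Check a received transaction list against the root stored in the header."""
    txids = [double_sha256(tx) for tx in raw_txs]
    return merkle_root(txids) == header[36:68]


# Placeholder data: three fake "transactions" and a header carrying their root.
raw_txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root([double_sha256(tx) for tx in raw_txs])
header = b"\x00" * 36 + root + b"\x00" * 12  # 80 bytes, only the root is meaningful
list_ok = tx_list_matches_header(raw_txs, header)
```

A node that hands over an altered, reordered, or truncated transaction list for block X produces a different root, so the mismatch shows up without downloading any other block.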
So first of all, is it possible to package up the transaction lists into something like a BitTorrent file and distribute them as one giant chunk of compressed data? The first 134,000 blocks aren't going to change, so why require the BTC nodes/network to fill all the block-data requests? BitTorrent is designed to handle this. And if I'm not mistaken, someone acquiring 1 GB of block data should be able to download it in about 10 minutes, and verify the entire set in less than 10 minutes on a modern computer. This would require the BitTorrent protocol to be included in the BTC client, but it seems like it would be worth it. Package up every 5000 blocks into a new btc_Blk0_to_Blk1.torrent file and let the block chain be distributed that way. The BTC network would only have to handle requests for the most recent blocks.
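Here is a small sketch of how the 5000-block packaging could be laid out. The file naming follows the btc_Blk0_to_Blk1.torrent pattern above with the start and end heights filled in, which is only my assumption about how the names would look. The function names are invented for this example, and nothing here touches a real BitTorrent library.

```python
from typing import List, Tuple

CHUNK_SIZE = 5000  # blocks per torrent file, as proposed above


def chunk_ranges(block_count: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Return (first_height, last_height) for every complete chunk.

    Blocks past the last complete chunk are left out; those would still be
    requested from the regular BTC network.
    """
    full_chunks = block_count // chunk_size
    return [(i * chunk_size, (i + 1) * chunk_size - 1) for i in range(full_chunks)]


def torrent_name(first: int, last: int) -> str:
    """Hypothetical file name for one chunk."""
    return f"btc_Blk{first}_to_Blk{last}.torrent"


ranges = chunk_ranges(134_000)
names = [torrent_name(first, last) for first, last in ranges]
blocks_left_for_network = 134_000 - len(ranges) * CHUNK_SIZE

# Rough bandwidth needed to fetch 1 GB in 10 minutes.
megabytes_per_second = 1000 / (10 * 60)  # about 1.67 MB/s
```

With 134,000 blocks this gives 26 complete chunks covering heights 0 through 129,999, leaving the newest 4,000 blocks to be fetched the normal way.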
Second, would it be possible to implement a new kind of message on the BTC network that allows lightweight clients to leverage heavyweight clients to provide only the relevant transaction history, if they don't want the whole block chain? The lightweight client knows that address X is mine, and wants to verify the balance and integrity of the address. So he sends out this special request for address X, and a client on the network that has the full chain can send back a list of block numbers/hashes that are relevant to that address. Then, the lightweight client only needs to request that list of full blocks from the network. Again, since he has the "correct" blockchain headers, there should be no problem trusting arbitrary nodes to give him the right block listing. The client can continue to download every new block as it is broadcast, but will discard it if it's not relevant to himself.
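To illustrate the message flow I have in mind, here is a toy in-memory sketch. Everything in it is hypothetical: no such message exists in the current protocol, `FullNode`, `blocks_for_address`, and `fetch_relevant_blocks` are names made up for this example, and blocks are plain dictionaries rather than real serialized blocks.

```python
from typing import Dict, List, Set


class FullNode:
    """Hypothetical heavyweight peer that indexes which blocks touch each address."""

    def __init__(self, blocks: Dict[str, List[dict]]):
        self.blocks = blocks  # block hash -> list of transactions
        self.index: Dict[str, List[str]] = {}
        for block_hash, txs in blocks.items():
            for tx in txs:
                for address in tx["addresses"]:
                    hashes = self.index.setdefault(address, [])
                    if block_hash not in hashes:
                        hashes.append(block_hash)

    def blocks_for_address(self, address: str) -> List[str]:
        """The proposed new request: which block hashes involve this address?"""
        return list(self.index.get(address, []))

    def get_block(self, block_hash: str) -> List[dict]:
        """Ordinary full-block request."""
        return self.blocks[block_hash]


def fetch_relevant_blocks(
    known_header_hashes: Set[str], node: FullNode, address: str
) -> Dict[str, List[dict]]:
    """Lightweight client side: fetch only blocks that are in its own header chain."""
    relevant = {}
    for block_hash in node.blocks_for_address(address):
        if block_hash not in known_header_hashes:
            continue  # not part of the chain this client trusts, so ignore it
        relevant[block_hash] = node.get_block(block_hash)
    return relevant


# Toy data: address "X" appears in blocks h1 and h3.
node = FullNode({
    "h1": [{"addresses": ["X", "Y"]}],
    "h2": [{"addresses": ["Z"]}],
    "h3": [{"addresses": ["X"]}],
})
my_blocks = fetch_relevant_blocks({"h1", "h2", "h3"}, node, "X")
```

A real client would also check each returned block's transaction list against the Merkle root in the matching header. One thing the headers cannot reveal is a node that simply leaves a relevant block out of the list, so asking more than one node seems prudent.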
I want to extend this discussion, but this message is long enough already! Maybe I'll stop there until I know my assumptions are correct.
-Eto