(note: new ideas are posted further down this thread)
What?A method for massively reducing initial blockchain download for full nodes
Why?In order to validate a transaction, you basically need to check the following:
- the scriptsigs of its inputs match the scriptpubkeys of the outputs of the "source transactions" (this is a simplified description)
- the "source transactions" are all valid
- the transaction is not a double-spend of another transaction
Maybe I forgot some checks, but the point I want to make is: the last of these checks can only be done if you have access to all transactions, so you need the whole block chain, at least starting from the block that contains the oldest "source transaction". Since there will always be some very old unspent transactions (e.g. ones for which the private key was lost, without any proof that it was really lost), in practice you'll need the entire block chain in order to be able to validate arbitrary transactions.
You can also do that last check if you only keep track of the unspent transactions: if you know all the unspent transactions, you can forget all the spent transactions: you know a transaction isn't a double-spend if all its inputs come from unspent transactions. If, at some point in time, you know all unspent transactions, you can continue to keep track of the changing set of unspent transactions. The problem is in initialization: when you first set up your node, the only way to find out which transactions are unspent, is to download the entire block chain. This also means there need to be plenty of other nodes which can give you the entire block chain.
Initial block chain download is already a big PITA now, and it is only going to be worse in the future: the block chain can only grow. One might argue that only the miners (or maybe only pool operators?) really need to validate a transaction, and that other people can generally trust a transaction when it has been included in the main branch of the block chain. However, this kind of reasoning forgets the following:
- Mining (including the transaction validation) is meant to be a massively decentralized community effort, not something that is done by some pool operator aristocracy.
- The truth about which transactions have happened is meant to be publicly readable, not to be held only in the secret books of the pool operator priesthood.
The validation elite should be so large that it becomes completely unfeasible for its members to cheat on the "non-elite" by conspiracy. So, it should always be easy for new people to enter this elite. As described above, initial block download is an essential part of this, and it should be as painless as possible. We need some way so that it is only necessary to download and store the block headers and the unspent transactions (with the partial Merkle trees of the unspent transactions, as described by Satoshi). Even full nodes should be allowed to forget spent transactions.
How not to do it?You might simply ask other (full) nodes which transactions are unspent. There are some checks you can perform on whether the answer is honest, e.g. the unspent outputs should add up to the total number of mined bitcoins, and the transactions shouldn't be double spends of each other, but there are no sufficient checks to make certain that you have the correct set of unspent transactions. Worse: if the total amount in the outputs adds up to more than the number of mined bitcoins, you have no way of knowing which transaction is wrong: you can only do that by retrieving
spent transactions, so you still need other nodes to give you the spent part of the block chain.
How to do it?For every block, or maybe once every so-many blocks, make a Merkle tree containing ALL UNSPENT transactions of that moment, including the ones that were already included in the block chain. Include the root hash of that Merkle tree in a block (preferably in the block header, but otherwise in a fixed place in the block Merkle tree). The "unspent transaction" Merkle tree should be made in a deterministic way (e.g. sorted by transaction hash) so it can be validated by checking the root hash value only. Miners should only build on top of a block if they have checked that the root hash of the "unspent transaction" Merkle tree is correct. Blocks with an incorrect root hash should be rejected.
Now, a new full node doesn't need to download the entire block chain anymore. It just needs to find the longest proof-of-work chain of block headers, download the most recent(*) "unspent transactions" Merkle tree and download all transactions that happened after that point.
How to improve this idea?- If this new Merkle tree is included in every block, maybe it can completely replace the traditional way of putting transactions into blocks?
- What would be the best way to sort transactions in this new tree? Preferably, it should not be necessary to rebuild the entire tree in order to update it with new transactions.
(*) for security, take one that already has a high number of confirmations