There have been quite a lot of threads lately with people complaining about the size of the block chain, specifically (1) how long it takes to download for new users and (2) the amount of disk space used, often combined with complaints that the "core dev team" isn't doing anything about it.
This is just a quick note to explain where we're up to with this.
- Starting from a few days ago, MultiBit is the default recommended desktop client on the bitcoin.org "choose your wallet" page. MultiBit is what we call an "SPV wallet", so it is capable of processing thousands of blocks per second, and its checkpoints are refreshed frequently enough that brand new users will usually be synced with the chain in 5 seconds or less. I'll explain a bit more about how this works in a moment, as we have had many newbies join us in recent months who may not be familiar with the details.
- At some point Bitcoin-Qt will change such that it's able to delete old blocks. The details are still being worked out, but most likely you'll be able to say "Use up to 10 GB of disk space" and it will never use more than that. Nodes will broadcast how much of the chain they have and are able to serve. New nodes that are starting from scratch will have to search out other nodes that still have the full chain and sync from them, but any node that just wasn't online for a while and needs to grab the latest parts of the chain will be able to use most of the others. By controlling disk space usage, you can also indirectly control bandwidth usage (you can't upload data you don't have).
The latter piece of work isn't done yet, basically because Pieter has been busy lately with other things (like: real life). He did the bulk of the work already last year, but some parts still need to be designed and written. Remember that nearly everyone taking part is still a volunteer except for Gavin.
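Since the details of block deletion are still being worked out, the following is only a rough sketch of what a "use up to N bytes" cap could mean in practice: keep the newest blocks and drop the oldest until the total fits. The function name and the plain list of block sizes are made up for illustration and are not taken from Bitcoin-Qt.

```python
def blocks_to_keep(block_sizes, budget_bytes):
    """Keep the newest blocks that fit in budget_bytes.

    block_sizes[h] is the size in bytes of the block at height h.
    Returns (first_kept_height, bytes_used). If nothing fits,
    first_kept_height equals len(block_sizes).
    Hypothetical helper, not Bitcoin-Qt's actual pruning logic.
    """
    used = 0
    first_kept = len(block_sizes)
    # Walk backwards from the chain tip so the most recent blocks survive.
    for height in range(len(block_sizes) - 1, -1, -1):
        if used + block_sizes[height] > budget_bytes:
            break
        used += block_sizes[height]
        first_kept = height
    return first_kept, used
```

A node pruned this way could advertise `first_kept_height` so that peers know which part of the chain it is still able to serve.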
Intro to SPV mode
OK, now that we're recommending MultiBit as the default wallet app for new users, what does this do? MultiBit is like the Android "Bitcoin Wallet" app by Andreas Schildbach. They're both based on the bitcoinj project that I run. Essentially, these clients download sub-parts of the block chain and then do a bunch of maths to verify that it all hangs together. Because it doesn't download the whole chain, an SPV wallet is light and fast. But because it does download and verify parts of the chain, an SPV wallet doesn't really have to trust the remote server, so it can talk to the regular P2P network. This makes it more decentralised than something like Electrum or blockchain.info, which rely on special servers.
How does this work? It's described in Satoshi's original white paper in the "simplified payment verification" section (hence, SPV). But here's a brief description to save you opening up your copy of his paper.
Each block in the Bitcoin protocol has two parts, the header and a list of transactions. The header contains data linking the block to a place in the chain (like the hash of the previous block). A full Bitcoin node (Bitcoin-Qt/bitcoind) examines the block headers to figure out which chain of blocks has the most mining work done on it, and then verifies all the transactions in order in those blocks. The best chain determines the order in which the transactions are applied to the database, and thus which transaction loses if there's a double spend. But (and this is crucial), the ordering is the only thing determined by miners. All the transactions still have to make sense. Miners don't have arbitrary power: if they mined a block that just magicked money out of nowhere or included bogus transactions, full nodes would all reject it.
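To make the header part concrete, here is a minimal sketch of how an 80-byte header is hashed, how the "link" to the previous block can be checked, and how the hash is compared against the difficulty target. It uses the well-known genesis block values as test data. The function names are made up for this example and do not come from Bitcoin-Qt or bitcoinj.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_header(version, prev_hash_hex, merkle_root_hex,
                     timestamp, bits, nonce) -> bytes:
    """Build the 80-byte header. Hashes are displayed big-endian
    but stored byte-reversed inside the header."""
    return (struct.pack("<I", version)
            + bytes.fromhex(prev_hash_hex)[::-1]
            + bytes.fromhex(merkle_root_hex)[::-1]
            + struct.pack("<III", timestamp, bits, nonce))


def header_hash_hex(header: bytes) -> str:
    """Hash of the header in the usual display (big-endian hex) form."""
    return double_sha256(header)[::-1].hex()


def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field into the full target.
    Simplified: assumes an exponent of at least 3."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))


def has_valid_work(header: bytes, bits: int) -> bool:
    """Proof of work: the header hash, read as a number, must not exceed the target."""
    return int(header_hash_hex(header), 16) <= bits_to_target(bits)


def links_to(child_prev_hash_hex: str, parent_header: bytes) -> bool:
    """A block is linked to its parent by carrying the parent's header hash."""
    return child_prev_hash_hex == header_hash_hex(parent_header)


# Genesis block header fields.
GENESIS = serialize_header(
    1,
    "00" * 32,
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505,
    0x1D00FFFF,
    2083236893,
)
# header_hash_hex(GENESIS) ->
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

Note that none of this looks at the transactions themselves. Checking those is the extra work that only a full node does.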
In SPV mode things work differently. Because SPV wallets don't download the full chain, they can't verify each transaction individually or build a copy of the database. Instead they verify the headers to find the best chain, and then assume the contents of the best chain must be correct. This is usually a valid assumption, because the majority of mining power is honest. However, if there were to be a 51% attack, then SPV wallets might display arbitrary nonsense for as long as the attack lasts. They would get back to reality once the good chain became longer (harder) than the bogus chain again.
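"Longer (harder)" here means "more total work", not "more blocks". A minimal sketch of that comparison follows, assuming each candidate chain has already passed the header checks and is represented only by the compact difficulty ("bits") of each of its headers. The function names and the dictionary layout are made up for illustration.

```python
def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field into the full target.
    Simplified: assumes an exponent of at least 3."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))


def block_work(bits: int) -> int:
    """Expected number of hashes needed to find a block at this target."""
    return 2 ** 256 // (bits_to_target(bits) + 1)


def chain_work(chain_bits) -> int:
    """Total work of a chain, given the 'bits' value of each header."""
    return sum(block_work(bits) for bits in chain_bits)


def best_chain(chains: dict) -> str:
    """Pick the name of the candidate chain with the most total work."""
    return max(chains, key=lambda name: chain_work(chains[name]))


# Three easy blocks versus two blocks that are each 256 times harder.
candidates = {
    "long_easy": [0x1D00FFFF] * 3,
    "short_hard": [0x1C00FFFF] * 2,
}
winner = best_chain(candidates)  # "short_hard": fewer blocks, more work
```

This is why an attacker cannot win just by producing lots of cheap blocks: they have to out-work the honest miners.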
This leads to the question of how SPV wallets find transactions that send them money, if they don't download the whole chain. The answer is that they upload a "filter" to the remote Bitcoin nodes, and each node applies it to every transaction in each block. If the filter matches, the transaction is sent to the SPV wallet along with a mathematical proof that it was really in the chain (we call this proof a Merkle branch, after Ralph Merkle who invented them). The wallet verifies this proof and thus knows the transaction really was accepted by the majority of miners, without having to trust the server.
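Here is a minimal sketch of what verifying a Merkle branch involves, assuming the wallet already has the Merkle root from a verified header. The real P2P protocol sends the proof in a more compact "partial Merkle tree" encoding, so this is the underlying idea rather than the wire format, and the function names are made up for illustration.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_branch(leaves, index):
    """Server side: compute the root and the sibling hashes ("branch")
    needed to prove that leaves[index] is in the tree."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # Bitcoin duplicates the last hash on odd levels
        branch.append(level[index ^ 1])  # sibling of the current node
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch


def verify_branch(leaf, index, branch, root) -> bool:
    """Wallet side: rebuild the path from the leaf up to the root."""
    current = leaf
    for sibling in branch:
        if index % 2 == 0:
            current = double_sha256(current + sibling)  # we are the left child
        else:
            current = double_sha256(sibling + current)  # we are the right child
        index //= 2
    return current == root


# Five made-up transaction hashes standing in for a block's contents.
tx_hashes = [double_sha256(f"tx-{i}".encode()) for i in range(5)]
root, branch = merkle_root_and_branch(tx_hashes, 3)
ok = verify_branch(tx_hashes[3], 3, branch, root)
```

The wallet only needs the transaction, a handful of sibling hashes, and the header. For a block with thousands of transactions the branch is still only around a dozen hashes long.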
Because we're talking to random computers on the internet and not a trusted third party, the filter is designed to let you control your privacy. It is not a list of your addresses, as it is with the blockchain.info wallet. It's actually what we call a Bloom filter (named after Burton Howard Bloom, who invented them in 1970). You can't directly get the user's addresses back out of a Bloom filter. Instead you have to test each one you find in the chain against it to see if it matches. Also, the filter can be made "noisy", which means it randomly matches some other addresses as well. When the Bitcoin P2P node you're downloading from finds a match, it doesn't know if it really found one of your transactions or if it was a false positive. And because there are so many P2P nodes, it's possible to split up your list of addresses and send a subset to lots of different peers, so none get an accurate idea of what's in your wallet (bitcoinj doesn't do that today though). By adjusting your false positive rate, you can decide how much bandwidth you want to spend on garbling the other nodes' picture of your wallet. If you're on a very slow or expensive link, you might decide you want no noise in your filter at all. If you're on a fast wifi connection, you might be OK with downloading a megabyte or two of other people's transactions just to obscure which ones are yours.
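A toy Bloom filter shows where the noise comes from. This is only a sketch: it derives its hash functions from SHA-256 in the standard library, whereas the filters used on the P2P network use a different hash function and encoding, and the class, the function names, and the made-up address strings are purely illustrative.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash functions."""

    def __init__(self, size_bits: int, num_hashes: int):
        self.size_bits = size_bits
        self.num_hashes = num_hashes  # must be below 256 in this sketch
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item: bytes):
        # One bit position per hash function, seeded by a single byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # True for everything that was added, and for some things that weren't.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filter(size_bits, num_hashes, items):
    bloom = BloomFilter(size_bits, num_hashes)
    for item in items:
        bloom.add(item)
    return bloom


def count_matches(bloom, candidates) -> int:
    return sum(1 for c in candidates if bloom.might_contain(c))


MY_ADDRESSES = [f"my-address-{i}".encode() for i in range(10)]
OTHER_ADDRESSES = [f"other-address-{i}".encode() for i in range(1000)]

tight = build_filter(4096, 5, MY_ADDRESSES)  # big filter: very little noise
noisy = build_filter(64, 2, MY_ADDRESSES)    # small filter: lots of false positives
```

Both filters match every one of the wallet's own addresses. The small one also matches a noticeable share of unrelated addresses, which costs bandwidth but makes it harder for a peer to tell which matches are real.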
Using these fancy mathematical tools, MultiBit and the Android wallet app give us the same nice performance that we can get from a web wallet like blockchain.info or Coinbase, but without the need for any central servers and while keeping Bitcoin's P2P nature intact. SPV wallets will always be fast no matter how popular Bitcoin gets. Together with being able to delete old blocks, these are our solutions to the ever-growing size of the chain - which has been Satoshi's plan since the very first day Bitcoin was announced.
I hope it's all clearer now and everyone understands what's going on.