Author

Topic: An idea that can massively reduce initial blockchain download (Read 3773 times)

cjp
full member
Activity: 210
Merit: 124
https://bitcointalk.org/index.php?topic=88208.0...?

There is a bounty set up to actually implement this in the Project Development forum.
That concept looks complicated.

This morning I woke up with a more simple idea: just make a bit field that indicates whether transactions are spent or not.
This is a visualization of the idea:

Blue blocks = block headers
White circles = unspent tx
Black circles = spent tx
Black/white blocks = merkle tree nodes (black = all underlying tx are spent)
Following the red line, this would encode to the following bits:
1/1011/110010/10/110100/0/1110111
(slashes indicate block boundaries, these are for reading convenience only and are not actually stored.)
Note that redundant information is skipped, in such a way that once an entire Merkle (sub-)tree is spent, it only takes a single bit of storage.

Since the block headers contain the transaction counts, and the transaction count determines the Merkle tree shape,  this bit field can be used unambiguously to reconstruct which transactions are spent and which are not. After retrieving this information, a node can request the unspent parts of the Merkle tree using the existing Bitcoin protocol.

That doesn't sound so complicated, does it? It is very much in line with the existing part of the Bitcoin protocol; the only extensions we need are:
  • There is a standardized way of including the hash of this bit field into a block
  • Nodes have a way of giving this bit field to each other
  • Miners don't build on top of a block that contains an incorrect bit field hash
member
Activity: 70
Merit: 10
I'm starting to become familiar with where I can find what in the Bitcoin source code.
Interesting attitude. But exactly so, I started to learn in many details how exactly the bitcoin-protocol works and what each detail in the block-chain means
by implementing both algorithms by my own. Smiley Because the only fully trustful reference are the bitcoin-client sources - there seem to exist no official specifications how it should work in detail (e.g. the specification in wiki-bitcoin are mainly (hand-edited) copies from the source code). :-/

smtp
cjp
full member
Activity: 210
Merit: 124
Or are you really one of these extremly rare people who indeed checked all the source code of the bitcoin-client for a back-door or have you written your own bitcoin client?

LOL I do actually make a diff of the source code for each new version I install, and give it a quick review to check whether there's nothing suspect and it corresponds with the changelog.  Grin

I haven't written my own bitcoin client yet, but I've made my own implementation of several parts of the Bitcoin protocol (e.g. a parser of raw transaction data and a base58 encoder). I'm starting to become familiar with where I can find what in the Bitcoin source code.
member
Activity: 70
Merit: 10
What?
A method for massively reducing initial blockchain download for full nodes
The most real-time consuming action is not the pure download but the signature-verification for each TX-out in a block since bitcoin-0.8.0
(I wrote my own independent blockchain-parser and did all kind of verifications & checks) -- assuming you have a reasonable internet-bandwith connection.

Therefore you could simple fetch a "verified" blockchain (including a special PK-signature) by downloaded it from a trusted website(s) and only these web-sites need to update this block-chain-file(s) increamentally for public use. Then there is no signature checking necessary from the client anymore and you have a huge speed improvement when using the blockchain data.

Well, I wrote "trusted" website (by you). You doesn't trust ANY website in this matter? Then I wonder that you are using the bitcoin-client. Or are you really one of these extremly rare people who indeed checked all the source code of the bitcoin-client for a back-door or have you written your own bitcoin client?  Cool

smtp
cjp
full member
Activity: 210
Merit: 124
https://bitcointalk.org/index.php?topic=88208.0...?

There is a bounty set up to actually implement this in the Project Development forum.

Thanks for the link. It's good to see people are already looking into this; unfortunately the ideas in that thread seem extremely complicated: while they might solve "as many problems as possible", this massively reduces the likelihood of these ideas being implemented. If you want something that works, you need to keep it very simple.

I don't know how high that bounty is, and whether it is backed by someone with a similarly high reputation, but for now I'll work on another concept that will be "very important for the future of Bitcoin", and leave this concept for others to implement.
legendary
Activity: 2618
Merit: 1007
https://bitcointalk.org/index.php?topic=88208.0...?

There is a bounty set up to actually implement this in the Project Development forum.
cjp
full member
Activity: 210
Merit: 124
(note: new ideas are posted further down this thread)

What?
A method for massively reducing initial blockchain download for full nodes

Why?
In order to validate a transaction, you basically need to check the following:
  • the scriptsigs of its inputs match the scriptpubkeys of the outputs of the "source transactions" (this is a simplified description)
  • the "source transactions" are all valid
  • the transaction is not a double-spend of another transaction
Maybe I forgot some checks, but the point I want to make is: the last of these checks can only be done if you have access to all transactions, so you need the whole block chain, at least starting from the block that contains the oldest "source transaction". Since there will always be some very old unspent transactions (e.g. ones for which the private key was lost, without any proof that it was really lost), in practice you'll need the entire block chain in order to be able to validate arbitrary transactions.

You can also do that last check if you only keep track of the unspent transactions: if you know all the unspent transactions, you can forget all the spent transactions: you know a transaction isn't a double-spend if all its inputs come from unspent transactions. If, at some point in time, you know all unspent transactions, you can continue to keep track of the changing set of unspent transactions. The problem is in initialization: when you first set up your node, the only way to find out which transactions are unspent, is to download the entire block chain. This also means there need to be plenty of other nodes which can give you the entire block chain.

Initial block chain download is already a big PITA now, and it is only going to be worse in the future: the block chain can only grow. One might argue that only the miners (or maybe only pool operators?) really need to validate a transaction, and that other people can generally trust a transaction when it has been included in the main branch of the block chain. However, this kind of reasoning forgets the following:
  • Mining (including the transaction validation) is meant to be a massively decentralized community effort, not something that is done by some pool operator aristocracy.
  • The truth about which transactions have happened is meant to be publicly readable, not to be held only in the secret books of the pool operator priesthood.
The validation elite should be so large that it becomes completely unfeasible for its members to cheat on the "non-elite" by conspiracy. So, it should always be easy for new people to enter this elite. As described above, initial block download is an essential part of this, and it should be as painless as possible. We need some way so that it is only necessary to download and store the block headers and the unspent transactions (with the partial Merkle trees of the unspent transactions, as described by Satoshi). Even full nodes should be allowed to forget spent transactions.

How not to do it?
You might simply ask other (full) nodes which transactions are unspent. There are some checks you can perform on whether the answer is honest, e.g. the unspent outputs should add up to the total number of mined bitcoins, and the transactions shouldn't be double spends of each other, but there are no sufficient checks to make certain that you have the correct set of unspent transactions. Worse: if the total amount in the outputs adds up to more than the number of mined bitcoins, you have no way of knowing which transaction is wrong: you can only do that by retrieving spent transactions, so you still need other nodes to give you the spent part of the block chain.

How to do it?
For every block, or maybe once every so-many blocks, make a Merkle tree containing ALL UNSPENT transactions of that moment, including the ones that were already included in the block chain. Include the root hash of that Merkle tree in a block (preferably in the block header, but otherwise in a fixed place in the block Merkle tree). The "unspent transaction" Merkle tree should be made in a deterministic way (e.g. sorted by transaction hash) so it can be validated by checking the root hash value only. Miners should only build on top of a block if they have checked that the root hash of the "unspent transaction" Merkle tree is correct. Blocks with an incorrect root hash should be rejected.

Now, a new full node doesn't need to download the entire block chain anymore. It just needs to find the longest proof-of-work chain of block headers, download the most recent(*) "unspent transactions" Merkle tree and download all transactions that happened after that point.

How to improve this idea?
  • If this new Merkle tree is included in every block, maybe it can completely replace the traditional way of putting transactions into blocks?
  • What would be the best way to sort transactions in this new tree? Preferably, it should not be necessary to rebuild the entire tree in order to update it with new transactions.

(*) for security, take one that already has a high number of confirmations
Jump to: