This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea
with pictures. I believe this relieves two major problems with the network at once -- compression/pruning, and lightweight node security -- and does so in a non-disruptive way. I am not positive that this is
the right way to go, but it definitely warrants discussion.
Summary: [SEE ILLUSTRATIONS BELOW] Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes. The leaves of this tree actually correspond to addresses/scripts, and the data at the leaf is actually a root of the unspent-TxOut list for that address/script. To maintain security of the tree signatures, it will be included in the header of an alternate blockchain, which will be secured by merged mining.
This provides the same compression as the simpler unspent-TxOut merkle tree,
but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders. Therefore, even lightweight nodes can get full address information,
from any untrusted peer, and with only a tiny amount of downloaded data (a few kB).
(NOTE: I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section: a variant of the merkle-tree will have to be to used that guarantees efficient updating of the tree.)
(1) Major Benefits:(1a) Near-optimal blockchain compression: theoretically, the size of the pruned blockchain would be proportional to the transaction volume (thus could go up or down), instead of the entire global history which always increases in size. In practice, it wouldn't be so clean, but you really won't do any better than this. Whoops! Before this idea was fully developed, I had overlooked the fact that full nodes will still have to maintain the transaction-indexed database. This address-indexed DB is not a replacement, but would have to be in addition to the that. Therefore, it necessarily increases the amount of work and data storage of a full node. But it can simply be an "add-on" to an existing "ultraprune" implementation. (Either way, this should actually be a downside).- (1b) Trustless lightweight-node support: New nodes entering the network for the first time, will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads). A single honest peer out of thousands guarantees you get, and recognize, good data.
- (1c) Perfectly non-disruptive: There is no main-network protocol or blockchain changes at all. All the balance-tree information is maintained and verified in a separate blockchain through merged mining. In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement)
- (1d) Efficient tree querying&updating: The full-but-pruned nodes of the network will be able to maintain this data structure efficiently. New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is)
- (1e) No user-setup or options: Unlike overlay networks, achieving full trust does not require finding a trusted node, or subscribing to a service. Just like the main blockchain -- you find a bunch of random peers and get the longest chain. This could be bootstrapped in a similar fashion as the main network.
(2) Downsides and Uncertainties:- (1a) See revised (1a) above
- (2a) Complexity of concept: This is not simple. It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height). This is entirely feasible, but it could be a bear to implement it.
- (2b) Uncertainties about lite-node bootstrap data: Depending on how the data is structured, there may still be a bit of a data for a lite node to download to get the full security of a full node. It will, undoubtedly, be much less than downloading the entire chain. But, there is obviously implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4GB, as of this writing). UPDATE: My initial estimate based on the "Hybrid PATRICIA/Brandais Tree" (aka Reiner-Tree), is that a wallet with 100 addresses could verify its own balance with about 250 kB.
- (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree. But it should be possible to create an alternative with the following properties:
- Commutative computation: a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
- O(log(N)) updating: removing or adding a single leaf node should be able to be done in O(log(N)) time. With a vanilla merkle tree, this is true only if you remove a node and add a node to the same leaf location.
(3) Assumptions::- (3a) Need verifiable tree roots: I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network. If there's enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable. I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
- (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.
- (3c) Efficient sorting: Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer. However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.
Alt-Chain Merkle Tree construction:-- For each address/script, collect all unspent-TxOuts
-- Compute merkle root of each TxOut tree
-- Sort roots, use as leaf nodes for a master-merkle-tree.
-- Include merkle-root of master tree in alternate chain header.
https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression.png
Getting your balance:-- Download headers of both chains
-- Request unspent-TxOut-hash list.
-- Compute sub-merkle root for this address
-- Request secondary-branch nodes (O(log(N))
-- Compute master root; compare to block header
-- Request the full TxOuts for each unspent-TxOut-hash above
https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression2.png
Alternate Chain:All data is included on the alternate blockchain, which is maintained through merged mining on the main chain. This is only one extra tx per block on the main chain. That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain.
https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinerchain.png
Yes, this is a huge undertaking. Yes, there's a lot of uncertainties. Yes, I need a new merkle tree structure.
But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?)
Alright, tear it apart!
UPDATE:After lots and lots of discussion and debate,
I believe that the address index should be maintained as a trie-like structure. Other's have expressed interest in a binary-search tree (BST). Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc. My preference is the creme-de-la-creme of tries -- a hybrid PATRICIA tree (level-compressed trie) and de-la-Braindais tree (node-compressed). It looks something like this:
https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/DataStructures_Values.pngThe structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it. The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying
all transactions, in order, to have a well-defined internal structure). For education on trie structures, see my pretty pictures
in this post.