Pages:
Author

Topic: Ultimate blockchain compression w/ trust-free lite nodes - page 22. (Read 87937 times)

vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded?  (imagine p2pool).

I would have to imagine it was the second.

If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as all the blocks representing diffs to the alt chain would be the same as (in proportion) to the diffs on the main chain.  That is, at least the way I understood it.

If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree.  Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created.  Anything older than the superblock would be simply unnecessary.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.

You and I are both constructing the same binary/red-back tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.  

I have to add {3,4,7,8} and remove {13} from my tree.   You just add all 10 elements in a specified order.  

We're going to get different roots.  I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.  

That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer.  

As I said before, a trie would work, but is generally very space inefficient, and that matters here.
hero member
Activity: 868
Merit: 1000
I agree with others, this is a hot issue. And as much as I dislike gambles, I'm thankful Satoshidice put pressure on the blockchain like this.
legendary
Activity: 2198
Merit: 1311
No clue about any of this stuff, but thank you guys for working on this.  I think it's important to have this sorted out before adoption and use gets really heavy.
full member
Activity: 126
Merit: 110
Andrew Miller
There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

1) TxOuts are identified by the transaction hash and an index within that transaction. So we need to search by (txhash,idx) in order to see if an output has been spent. When outputs are inserted into this tree, they're stored in sorted order according to (txhash,idx).

2) It's also desirable to find all the available txouts for a particular address. Let a second Merkle tree contain keys of the form (scriptpubkey). Now, given a root hash, you can ask for a verifiable list of all your spendable coins.


Alternately, instead of thinking of it as a different tree for each index, you can think of it as a composite structure. The general form is a Search DAG (Directed Acyclic Graph), but the idea is exactly the same [5]. (This includes B-Trees).



[5] A General Model for Authenticated Data Structures
     Martel, Nuckolls, Devanbu, Gertz, Kwong, Stubblebine, 2004.
     http://truthsayer.cs.ucdavis.edu/algorithmica.pdf
legendary
Activity: 2128
Merit: 1073
(2c) Merkle-tree Alternative Needed
I think this is the crucial observation. Bitcoin doesn't really have a single contiguous bitstream that would need a protection of a hash tree. It has a collection of transactions that are only weakly ordered in a lattice fashion.

The better data structure would be probably some classic database representation like B-tree with cryptographically signed transaction log. To allow integrity verification with a trucated log the log blocks should contain something like a hash of inorder traversal of the database content before the update. This will allow for a quick verification of the new log blocks received from the network.

To allow backward compatibility with forward-delta block-chain the Bitcoin protocol would need additional constraint on the ordering of the transactions in the blocks.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
full member
Activity: 126
Merit: 110
Andrew Miller
Let me try to explain a solution to the 'alternate Merkle tree' you require.

The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N).

There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkle

I'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this!


P.S. Most of these structures only require collision resistant hash functions. However, if you want to use a fancy hash functions with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4].
 

[1] Certificate Revocation and Certificate Update
     Noar and Nissim, 1998. USENIX
     https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] Authenticated Data Structures
     Roberto Tamassia, 2003.
     http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf

[3] Persistent Authenticated Data Structures and their applications
     Anagnostopoulos, Goodrich, and Tamassia
     http://cs.brown.edu/people/aris/pubs/pad.pdf

[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking
     Papamanthao and Tamassia, 2011
     http://www.cse.msstate.edu/~ramkumar/gw-102.pdf
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Before I read this I just want to quickly post that I personally, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future and I really hope the core team has an order of priorities planed accordingly.

I too believe this is a critical issue for Bitcoin, as a whole.  I had floated the idea that handling blockchain size was critical, in the past, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on-board.  It can be done completely indepedently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely indepdently.

legendary
Activity: 1078
Merit: 1003
Before I read this I just want to quickly post that I personally, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future and I really hope the core team has planed an order of priorities accordingly.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.  I believe this relieves two major problems with the network at once -- compression/pruning, and lightweight node security -- and does so in a non-disruptive way.  I am not positive that this is the right way to go, but it definitely warrants discussion.



Summary:  [SEE ILLUSTRATIONS BELOW]

Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes.  The leaves of this tree actually correspond to addresses/scripts, and the data at the leaf is actually a root of the unspent-TxOut list for that address/script.  To maintain security of the tree signatures, it will be included in the header of an alternate blockchain, which will be secured by merged mining.  

This provides the same compression as the simpler unspent-TxOut merkle tree, but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders.  Therefore, even lightweight nodes can get full address information, from any untrusted peer, and with only a tiny amount of downloaded data (a few kB).  

(NOTE:  I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section: a variant of the merkle-tree will have to be to used that guarantees efficient updating of the tree.)


(1) Major Benefits:
  • (1a) Near-optimal blockchain compression:  theoretically, the size of the pruned blockchain would be proportional to the transaction volume (thus could go up or down), instead of the entire global history which always increases in size.  In practice, it wouldn't be so clean, but you really won't do any better than this.  Whoops! Before this idea was fully developed, I had overlooked the fact that full nodes will still have to maintain the transaction-indexed database.  This address-indexed DB is not a replacement, but would have to be in addition to the that.  Therefore, it necessarily increases the amount of work and data storage of a full node.  But it can simply be an "add-on" to an existing "ultraprune" implementation.  (Either way, this should actually be a downside).
  • (1b) Trustless lightweight-node support:  New nodes entering the network for the first time, will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.
  • (1c) Perfectly non-disruptive:  There is no main-network protocol or blockchain changes at all.  All the balance-tree information is maintained and verified in a separate blockchain through merged mining.  In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement)
  • (1d) Efficient tree querying&updating:  The full-but-pruned nodes of the network will be able to maintain this data structure efficiently.  New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is)
  • (1e) No user-setup or options:  Unlike overlay networks, achieving full trust does not require finding a trusted node, or subscribing to a service.  Just like the main blockchain -- you find a bunch of random peers and get the longest chain.  This could be bootstrapped in a similar fashion as the main network.

(2) Downsides and Uncertainties:
  • (1a) See revised (1a) above
  • (2a) Complexity of concept:  This is not simple.  It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height).  This is entirely feasible, but it could be a bear to implement it.
  • (2b) Uncertainties about lite-node bootstrap data:  Depending on how the data is structured, there may still be a bit of a data for a lite node to download to get the full security of a full node.  It will, undoubtedly, be much less than downloading the entire chain.  But, there is obviously implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4GB, as of this writing).  UPDATE: My initial estimate based on the "Hybrid PATRICIA/Brandais Tree" (aka Reiner-Tree), is that a wallet with 100 addresses could verify its own balance with about 250 kB.
  • (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree.  But it should be possible to create an alternative with the following properties:
    • Commutative computation:  a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
    • O(log(N)) updating: removing or adding a single leaf node should be able to be done in O(log(N)) time.  With a vanilla merkle tree, this is true only if you remove a node and add a node to the same leaf location.

(3)  Assumptions::
  • (3a) Need verifiable tree roots:  I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network.  If there's enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable.  I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
  • (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.  
  • (3c) Efficient sorting:  Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer.  However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.



Alt-Chain Merkle Tree construction:

-- For each address/script, collect all unspent-TxOuts
-- Compute merkle root of each TxOut tree
-- Sort roots, use as leaf nodes for a master-merkle-tree.  
-- Include merkle-root of master tree in alternate chain header.


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression.png



Getting your balance:

-- Download headers of both chains
-- Request unspent-TxOut-hash list.  
-- Compute sub-merkle root for this address
-- Request secondary-branch nodes  (O(log(N))
-- Compute master root; compare to block header
-- Request the full TxOuts for each unspent-TxOut-hash above


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression2.png



Alternate Chain:
All data is included on the alternate blockchain, which is maintained through merged mining on the main chain.  This is only one extra tx per block on the main chain.  That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain.


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinerchain.png



Yes, this is a huge undertaking.  Yes, there's a lot of uncertainties. Yes, I need a new merkle tree structure.
But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?)

Alright, tear it apart!




UPDATE:

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.  Other's have expressed interest in a binary-search tree (BST).  Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc.  My preference is the creme-de-la-creme of tries -- a hybrid PATRICIA tree (level-compressed trie) and de-la-Braindais tree (node-compressed).  It looks something like this:


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/DataStructures_Values.png

The structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it.  The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying all transactions, in order, to have a well-defined internal structure).  For education on trie structures, see my pretty pictures in this post.
Pages:
Jump to: