Pages:
Author

Topic: Ultimate blockchain compression w/ trust-free lite nodes - page 13. (Read 87937 times)

legendary
Activity: 1358
Merit: 1003
Ron Gross
Perhaps it's time to formulate this in the wiki?
Maybe as a BIP draft?

Or is it too soon?
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Could you describe how you would update the root hash for the trie?

In both BSTs and trie structures there will be leaf nodes and branch nodes.  In some structures (such as the BST) branch nodes may also contain leaf data.  In the context of this proposal, the master tree will be the set of all addresses containing any unspent TxOuts.  So a given leaf represents one such address.  

--For the purposes of organizing the tree/trie, that leaf's value will be the 20-byte address.  So the 20-byte address is the "key" of the master tree.
--For the purposes of computing the root hash, that leaf's value is the root of the subtree of unspent TxOuts for that address (which is constructed in a similar way to the address tree)
--For the purposes of computing the root hash, all branch nodes' hash values are the hash of the concatenation of its children's hash values.

In the simplest case, you have only pure branch nodes and pure leaf nodes.   This looks very much like the merkle trees we already use.  Then each node's "hash value" is the hash of the two children "hash values" concatenated.  [LeftPtrNodeValue || RightPtrNodeValue].  You walk up the tree computing this for every node until you get to the root.

In the BST case where nodes are actually both branch nodes and leaf nodes, then you just use hash([LeafValue | LeftPtrNodeValue | RightPtrNodeValue]).  

In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings".  So a leaf node's hash value is just the root hash of the subtree.  And a branch node's value is the hash of the concatenated skip string and its non-null child node values.

When you add or remove a node from the tree, you are changing the hash value of its parent, which changes its parent, etc.  So for a BST, you to add or delete a node, you have to recompute/update O(logN) other nodes hash values to get the new root value.  Same thing for the trie, except there is exactly 20 parents to update, regardless of the size of the tree (and see below for why this will ultimately be only 4-6).

I really need to make a picture...  (coming soon)


It most certainly takes fewer than 40 hops to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

Right, number of hops of a red-black tree will be 2*logN worst case.  So your point is well-received that N will probably never get high enough in this environment for the query/insert/delete time of a basic trie to absolutely beat the binary search tree.  Both of them have completely acceptable query/insert/delete times.  So two things to mention:
(1) It's still entirely accurate to label trie operations as O(1).  It just not be relevant for this application.
(2) I'm actually proposing a Patricia tree, which is level-compressed.  So a typical access time will be 4-6 hops.  Even with trillions of nodes in the tree.  The absolute max is 20 but it would never be realized.
full member
Activity: 126
Merit: 110
Andrew Miller
Could you describe how you would update the root hash for the trie?

Also, it most certainly takes fewer than 40 hops* to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

*Assuming one hop is actually "8 hops". I don't feel like I'm making this point too clear. You can have a 'balanced B tree' that has up to 8 children at each level. Then there are only 20 hops.

Or to put it another way, if there are so many transactions that we have collisions by birthday paradox, then we would need to pick a bigger hash.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, right?

There does appear to be a disconnect here.  Let me explain what my understanding of your proposal is:

(1) There is a database of TxOuts.  It has all current, unspent TxOuts in it.  TxOuts are added and deleted (or marked deleted) when blocks come in.
(2) The TxOuts need to be organized into a tree of some sort in order to come up with the single "merkle root" to be included in the meta-chain header ("merkle" is in quotes, because using a binary tree or trie it's no longer a "merkle" tree).  The relationship between DB elements (1) will be represented either as a Binary Search Tree or trie using "pointers".
(3) Thus, after every block, you update the database of TxOuts (1), and the pointers in binary search tree or trie (2).  I was focusing on (2) when I say that you need to store pointers.  Not necessarily in RAM... but somewhere, data needs to be held that identifies the structure of the tree/trie associated with the TxOuts in database (1). 
(4) In a classic binary search tree held in RAM, it uses pointers -- specifically a left pointer and a right pointer for each node.  In this context, whether it's held in RAM or on disk, there's still 8 bytes needed (at minimum) per pointer to represent the relationships.  And a binary tree will use about 1 such pointer per node.

This is not a negligible amount of data.   If TxOuts are 35 bytes, pointers are 8 bytes, then a "tree" of pointers will be about 25% of the size of the entire TxOut database.  It's a rough estimate, not intended to be exact.

So now back to my original point:  it looks to me that in order to "save the state" of the tree, you need to save the entire tree of pointers between nodes (2).  And you need the full tree.  Sure the TxOuts database stores each TxOut once, but saving all the pointers to represent the past states of the tree will triple your space requirements to save just 8 blocks.


Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Another disconnect here.  I'm not sure where your example came from, but here's what I'm referring to -- let's compare a binary search tree full of Bitcoin addresses to a trie of Bitcoin addresses (the entire set of all addresses on the network containing unspent TxOuts).  Assume at a given time there are N such addresses.

Binary Search Tree (red-black or any balanced tree) -- depth is O(log(N)).   
   -- Querying an element is O(logN)
   -- Inserting an element is O(logN)
   -- Deleting an element is O(logN) 
Basic trie:  depth of the tree is equal to the length of the longest key:  so the depth is 20 (because addresses are 20 bytes).
   -- Querying an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Inserting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Deleting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want

So even if there are 100 quintillion bitcoin addresses with unspent TxOuts in the master alt-chain tree, it still takes you exactly 20 hops to query, insert or delete a value in the trie (and less if you're using a patricia tree).
full member
Activity: 126
Merit: 110
Andrew Miller
I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, and hashes at each level, right?

Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Still, I am leaning towards thinking the simplicity of the trie outweighs any marginal cost benefits of the balanced binary tree. They're of the same order in all operations.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of every previous blockchain, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle roll backs efficiently.

[Image suppressed]

(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.

I'm not 100% sure I understand that picture.  It looks like you are saving the pointer-tree of each state, though there's only ever one copy of each piece of underlying data.  Multiple trees will be maintained, but leaf nodes will point to global copies of TxOut data.   Is that correct?

In that case, saving the state of the tree at a previous time means storing a whole lot of pointer data.  If I consider a basic binary search tree and use 8-byte pointers (assuming it's all in memory), then each node in your binary tree is going to store at least two pointers (prtLeft & ptrRight) and probably a few more bytes for random stuff like isLeaf and isRedOrBlack, etc.  So I assume 20 bytes per tree node.  For a tree of 2 million unspent TxOuts, then that's about 40 MB of storage for each state you want to store (each block).  And that's going to go up linearly with the size of the tree. 


(NOTE: I'm discussing this as if it's a single tree full of TxOuts, even though my proposal is about a main tree full of addresses containing unspent TxOuts, and subtrees of TxOuts.  However, the concepts are still valid, it's just easier to make my points as if it's a single tree of TxOuts).

On the other hand, if you're using a trie, you don't have to save the state of the entire trie since its structure is purely deterministic for a given set of addresses.  You only have to save the nodes that were removed that might have to be re-added later in the event of a rollback.  You look at the OutPoints of the TxIns in each tx of the block that need to be reversed, and re-add them to the trie.  Then remove the ones that were added by those transactions.  This is all standard O(1) trie operations.

Thus, the data that needs to be stored in order to rollback a block in the trie (besides the blockdata itself) is proportional to the transaction volume on the networkAnd transaction volume is capped by network rules.  Currently, each block could be up to 1MB.  If you filled a block full of transactions that were entirely TxIns with nothing else, it would result in removing about 7200 TxOuts from the tree.  That means that that absolute upper limit for storage per block you want to be able to roll back is about 250 kB, regardless of tree size.

full member
Activity: 126
Merit: 110
Andrew Miller
It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.



Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of every previous blockchain, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle roll backs efficiently.


(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.

I think it's an interesting idea, and as you suggested, I don't want to derail the thread.  However, I think that the amount of history to maintain is not a critical question.  Standard reorgs on the main chain are rarely more than 1 block.  We save enough info for 10 blocks, and if a reorg happens further back than that (and the client is willing to continue), then the node can just request the missing information from peers.  It's all verifiable information, and if you are trying to catch up to the longest chain there should be plenty of peers who can supply it.

legendary
Activity: 1596
Merit: 1100
It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

During the output overflow incident, we were all rather glad that the collective client base did not do this.  Once miners upgraded, everyone reorg'd back into working order.

This is just one of many reasons why there is a 120-block new-bitcoin confirmation policy in place.

vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.

legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Perhaps that's another reason to use insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X,  I don't specify all of X's data to spend it (thats burried elsewhere in the chain) only the hash. Order invariant wouldn't let me recover that. I need some kind of undo data, even if its just the location of the original txn so that I could fetch it. (though more efficient if you can serve me up a whole undo block)

Yeah, I actually realized that and was editing my previous post to say this:

It's not actually trivial to reverse blocks in any particular pure-pruned scheme, since adding blocks involves deleting UTXOs.  So reversing the block means you have to re-add UTXOs that are not defined by the block you are reversing (it only references the OutPoint by hash:index, not the whole UTXO).  So you have to either save them or request them from another node.  Perhaps the solution is to keep a circular buffer of the last N UTXOs that were removed, as long as N is enough to cover the last, say 50 blocks.  Any time you use map.delete when updating the tree, you use a buffer.add to save it (which will also discard the oldest element in the buffer).  Then when you need to reverse a tx, you know the OutPoints that need to be re-added, and if N is large enough, you'll have it in the buffer.  Worst case, you have to fetch some data from another node, which isn't terrible.

staff
Activity: 4284
Merit: 8808
Perhaps that's another reason to use insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X,  I don't specify all of X's data to spend it (thats burried elsewhere in the chain) only the hash. Order invariant wouldn't let me recover that. I need some kind of undo data, even if its just the location of the original txn so that I could fetch it. (though more efficient if you can serve me up a whole undo block)
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.

Perhaps that's another reason to use insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.
staff
Activity: 4284
Merit: 8808
One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

+100

The term that makes the most sense for me is a "meta-tree".  It would never be "mined" - it would simply be committed to in normal blocks - optionally - at least until it is proven to work in practice and a decision is made to make it a mandatory part of the protocol.

full member
Activity: 156
Merit: 100
Firstbits: 1dithi
The "alt-chain" term confuses people. It's more like a "sub-chain" of the main blockchain. It doesn't generate any coins at all, and it's a temporal fix until the majority of miners support it.
sr. member
Activity: 389
Merit: 250
The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

The chain has established itself as a good proof-of-work system, which is the largest reason to stick with it that I can see right off hand. However it may run more like P2Pool in that storing old blocks (other than headers) may be slightly useless, or at least contrary to the goal of eliminating extra data. Setting up the alt-chain with either "none" or "some" updates may be the way to go to preserve simplicity however compared to normal chains. For one of the datastructure guys, would there be an advantage in marking updates as "a few" vs "many" in terms of packing the data? Maybe something else worth looking into for someone who knows how the current setup would work.

As a much more random aside, any idea what the alt-chain coins could be used for? and if transfer would be worthwhile. If transfer doesn't matter just give 1 coin per block and let people sign with their keys how many they've accumulated helping to secure the chain for lite nodes.  Smiley
staff
Activity: 4284
Merit: 8808
The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.
sr. member
Activity: 389
Merit: 250
No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).
Everything would work as is, but this chain wouldn't work quite the same I don't think. Since this alt-chain essentially needs to have the same number of blocks as the primary bitcoin chain. If the alt-chain catches up there's no incentive, or value, in mining anything else on it, which otherwise "wastes" the workers that are participating on the chain.

My main point is that linking the block count to the main chain one-to-one changes things somewhat. Or is this not as big of an issue as I'm thinking it is?

Edit: Fixed typos made while using my phone.
legendary
Activity: 905
Merit: 1012
No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).
Pages:
Jump to: