Topic: Proposal: How to cull the blockchain

member
Activity: 98
Merit: 26
October 30, 2017, 09:22:19 PM
#8
Today, I perused Wuille's ECMH idea, followed his references and also searched bitcoin-dev for related threads. Not a comprehensive deep-dive, but a start.

While some of these ideas address full-node sync time, others optimize for different problems. Here are some of the related problem areas I've seen addressed:

- Full-node sync time
- UTXO set size growth
- UTXO memory footprint (mining)
- "Dormant" UTXOs (incl. dust, unspendable (non-provable), cold storage, etc.)
- Merklized UTXO constructs (UTXO "commit")
- Network bandwidth
- Storage costs

The concern I have with Wuille's approach is its complexity. The blockchain is not an especially performance-optimized structure; specifically, whenever security and performance are mutually exclusive, the blockchain design chooses security over performance. Hence, 10 minutes per block, 1MB block size, etc. Of course, for the same security it's always preferable to choose better performance, all else being equal. But I don't agree that EC-based "128-bit equivalent" security is suitable for UTXO-set operations, because the UTXO set is built out of blocks that are directly secured by mining.

I think a Merklix tree offers a superior alternative that captures the benefits of ECMH without requiring any leap of faith about theoretical bit-equivalent security. By inserting in txid-radix order and propagating the Merkle hash up to the root, only O(log N) hashes need to be recalculated for a UTXO set of size N.
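For concreteness, here is a minimal sketch (purely illustrative Python, not tied to any existing implementation) of a Merklix-style binary radix tree keyed on txid bits. It omits the path compression a real Merklix tree would do, but it shows the property that matters here: an insert only rehashes the nodes along one root-to-leaf path.

Code:
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

EMPTY = b"\x00" * 32  # stand-in hash for an absent subtree (an assumption, not a spec)

class Node:
    def __init__(self):
        self.left = None
        self.right = None
        self.leaf = None   # serialized leaf payload, if this node holds a UTXO
        self.hash = EMPTY

    def rehash(self):
        if self.leaf is not None:
            self.hash = H(b"leaf" + self.leaf)
        else:
            lh = self.left.hash if self.left else EMPTY
            rh = self.right.hash if self.right else EMPTY
            self.hash = H(b"node" + lh + rh)

def bit(key: bytes, i: int) -> int:
    return (key[i // 8] >> (7 - i % 8)) & 1

def insert(node: Node, txid: bytes, payload: bytes, depth: int = 0) -> None:
    # Only the nodes on this txid's path are rehashed on the way back up;
    # with path compression the path length is O(log N) in the leaf count.
    if depth == len(txid) * 8:          # key exhausted: store the leaf here
        node.leaf = payload
    else:
        attr = "right" if bit(txid, depth) else "left"
        child = getattr(node, attr)
        if child is None:
            child = Node()
            setattr(node, attr, child)
        insert(child, txid, payload, depth + 1)
    node.rehash()

After a series of inserts, root.hash is the value a UTXO commitment could carry; a deletion would rehash the same single path.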

I think the angst over how to serialize transaction data is unnecessary because the blockchain already is a serialized structure containing all transaction data. For the leaves of the Merklix tree, we define the canonical form of each "leaf" as the serialized TXID + witness + UTXO balance, exactly as these are represented in the block structure itself. In this way there is no added burden; it's just a matter of juggling a few pointers (or offsets, if you're using a pointer-safe language). Here is a very similar proposal that is basically an ad hoc, misshapen Merklix tree.
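To make that concrete, here is one hypothetical leaf encoding, sketched under the assumption that a leaf carries the txid, output index, balance, and locking script copied verbatim from the block data; the exact field list and ordering are my illustration, not part of this proposal.

Code:
import struct

def serialize_leaf(txid: bytes, vout: int, amount_sats: int, script_pubkey: bytes) -> bytes:
    # Hypothetical canonical leaf bytes; every field already exists in the
    # block structure, so building this is just pointer/offset juggling.
    assert len(txid) == 32
    return (txid
            + struct.pack("<I", vout)                # output index
            + struct.pack("<q", amount_sats)         # value in satoshis
            + struct.pack("<H", len(script_pubkey))  # script length prefix
            + script_pubkey)                         # locking script, verbatim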

As explained in this blog post, the Merklix tree could even provide horizontal scaling in transaction processing by allowing nodes to host only part of the UTXO set (with Merkle proofs) and hand off requests for any missing parts of the set to another node. This would not require any nodes to run in "backup" or "archive" mode, so long as all the nodes, taken together, hold a complete copy of the UTXO set. Of course, we are a long way from the kind of scale where that would matter, but it's always nice to have a solution that is both parallelizable and scalable at O(log N).
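To illustrate how a node hosting only part of the UTXO set could still answer queries verifiably, here is a minimal membership-proof check against the committed root. It assumes the same leaf/node hashing rules as the tree sketch above; the proof format (sibling hash plus a side flag per level) is an assumption for illustration.

Code:
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_membership(root_hash: bytes, leaf_payload: bytes, path) -> bool:
    # path: list of (sibling_hash, leaf_side_is_right) pairs, ordered leaf-to-root
    h = H(b"leaf" + leaf_payload)
    for sibling, is_right in path:
        h = H(b"node" + sibling + h) if is_right else H(b"node" + h + sibling)
    return h == root_hash

A node missing the requested subtree can forward the query to a peer that has it; the requester only needs the root hash to check whichever node answers.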

Settling on a definition for calculating a UTXO-commit Merkle root does not compel anybody to choose a particular internal implementation, because (a) the blockchain already is serialized, and (b) any definition of a UTXO-commit structure is, by its very nature, normative about how the value is calculated but not about how the node operator chooses to implement that calculation. Suppose we define the UTXO Merkle root in terms of a radix tree on txids; that does not actually compel anybody to store transactions in a radix tree. A full node could sync on the blockchain using any internal implementation, then calculate the UTXO Merkle root by iterating over its chosen UTXO representation in TXID order with nothing more than a linked list of O(log N) hashes.

This is an added step for sync, but it pays for itself: to get synced, a full node only needs the latest UTXO-commit Merkle root and a correct copy of the UTXO set from any other node that will serve it. If desired, the full node can then begin "working back" through the blockchain (at least to the point where the UTXO-commit fork occurred).
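Here is a sketch of that "linked list of hashes" step: streaming the UTXO set in TXID order and folding completed subtrees as you go keeps only O(log N) pending hashes in memory. The merge and odd-count handling rules below are illustrative assumptions, not a proposed consensus definition.

Code:
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def utxo_merkle_root(leaves_in_txid_order):
    # stack holds (height, subtree_hash); at most one completed subtree per
    # height is pending, so the stack stays O(log N) deep.
    stack = []
    for leaf in leaves_in_txid_order:
        height, h = 0, H(b"leaf" + leaf)
        # merging equal-height subtrees works like binary carry propagation
        while stack and stack[-1][0] == height:
            _, left = stack.pop()
            height, h = height + 1, H(b"node" + left + h)
        stack.append((height, h))
    if not stack:
        return H(b"")  # empty-set convention (assumption)
    # fold any leftover unequal-height subtrees from right to left
    _, acc = stack.pop()
    while stack:
        _, left = stack.pop()
        acc = H(b"node" + left + acc)
    return acc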

Description of Merklix tree
member
Activity: 98
Merit: 26
October 30, 2017, 12:48:44 PM
#7
Thanks for that info. Can you tell me who is working on this? Any links to working docs/discussions/dev-lists? In my opinion, this is the last technological hurdle that Bitcoin needs to jump before it is ready to scale to widespread adoption - that's why I care about this issue a lot.
Pieter Wuille has been working on this. A mailing list email about this is here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html

What he has been working on is a way to do rolling UTXO set hashes so that the UTXO set can be committed to the blockchain to allow for UTXO set syncing. Once we figure that out, the rest is fairly trivial.

Thanks for that link.
staff
Activity: 3458
Merit: 6793
Just writing some code
October 30, 2017, 10:06:57 AM
#6
Thanks for that info. Can you tell me who is working on this? Any links to working docs/discussions/dev-lists? In my opinion, this is the last technological hurdle that Bitcoin needs to jump before it is ready to scale to widespread adoption - that's why I care about this issue a lot.
Pieter Wuille has been working on this. A mailing list email about this is here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html

What he has been working on is a way to do rolling UTXO set hashes so that the UTXO set can be committed to the blockchain to allow for UTXO set syncing. Once we figure that out, the rest is fairly trivial.
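For readers who want the intuition behind a rolling set hash, here is a toy Python illustration of the general idea (not Wuille's actual ECMH or MuHash construction, and not its parameters). Adding a UTXO multiplies a running value by a hash of that UTXO modulo a prime, removing it multiplies by the modular inverse, so the state depends only on the final set and updates in O(1) per change.

Code:
import hashlib

P = 2**127 - 1  # an arbitrary Mersenne prime, for illustration only

def elem_hash(utxo_bytes: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(utxo_bytes).digest(), "big") % P
    return h if h != 0 else 1  # avoid the degenerate zero element

class RollingSetHash:
    def __init__(self):
        self.state = 1
    def add(self, utxo_bytes: bytes):
        self.state = (self.state * elem_hash(utxo_bytes)) % P
    def remove(self, utxo_bytes: bytes):
        # multiply by the inverse (Fermat's little theorem, since P is prime)
        self.state = (self.state * pow(elem_hash(utxo_bytes), P - 2, P)) % P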
member
Activity: 98
Merit: 26
October 29, 2017, 11:58:44 PM
#5
Thanks for the feedback. Put that way, this could be seen as a "full node with fast p2p join" protocol: the small, latest suffix of the blockchain (plus a snapshot) is used to bootstrap the full node, and the operator could set an option to keep pulling in archival data from other archiving full nodes after the blockchain suffix has been verified and the node has gone online.
This idea is known as UTXO syncing and is something that has been proposed and thought about for several years now. It is still being worked on, and enabling UTXO syncing would require a fork. Enabling UTXO syncing requires that each block commit to the entire UTXO set. People are working on ways to securely and efficiently commit to the UTXO set, but all of that is still being researched.

Thanks for that info. Can you tell me who is working on this? Any links to working docs/discussions/dev-lists? In my opinion, this is the last technological hurdle that Bitcoin needs to jump before it is ready to scale to widespread adoption - that's why I care about this issue a lot.
staff
Activity: 3458
Merit: 6793
Just writing some code
October 29, 2017, 11:49:46 PM
#4
Thanks for the feedback. Put that way, this could be seen as a "full node with fast p2p join" protocol: the small, latest suffix of the blockchain (plus a snapshot) is used to bootstrap the full node, and the operator could set an option to keep pulling in archival data from other archiving full nodes after the blockchain suffix has been verified and the node has gone online.
This idea is known as UTXO syncing and is something that has been proposed and thought about for several years now. It is still being worked on, and enabling UTXO syncing would require a fork. Enabling UTXO syncing requires that each block commit to the entire UTXO set. People are working on ways to securely and efficiently commit to the UTXO set, but all of that is still being researched.
member
Activity: 98
Merit: 26
October 29, 2017, 11:47:14 PM
#3
Interesting idea. One thing to note is that this would result in the loss of historical transaction data prior to the snapshot unless further stratification of the network occurred, i.e.:

* Full nodes with historical txs
* Full nodes with snapshots
* SPV nodes
* Web wallets

However, I guess this would allow a new type of fully verifying node to exist, which would be good.

Thanks for the feedback. Put that way, this could be seen as a "full node with fast p2p join" protocol: the small, latest suffix of the blockchain (plus a snapshot) is used to bootstrap the full node, and the operator could set an option to keep pulling in archival data from other archiving full nodes after the blockchain suffix has been verified and the node has gone online.
member
Activity: 73
Merit: 13
October 29, 2017, 11:28:07 PM
#2
Interesting idea. One thing to note is that this would result in the loss of historical transaction data prior to the snapshot unless further stratification of the network occurred, i.e.:

* Full nodes with historical txs
* Full nodes with snapshots
* SPV nodes
* Web wallets

However, I guess this would allow a new type of fully verifying node to exist, which would be good.
member
Activity: 98
Merit: 26
October 29, 2017, 10:14:24 PM
#1
How to cull the blockchain

I would like to see this drafted into a BIP. I'm not offering to author a BIP at this point, but I would definitely like to get the conversation started. Fire away.
