Today, I perused Wuille's ECMH idea, followed his references and also searched bitcoin-dev for related threads. Not a comprehensive deep-dive, but a start.
While some of these ideas address the problem of full-node sync time, others are optimizing for different problem sets. I'll list some of the problem sets I've seen addressed (and which seem to be related):
- Full-node sync time
- UTXO set size growth
- UTXO memory footprint (mining)
- "Dormant" UTXOs (incl. dust, unspendable (non-provable), cold storage, etc.)
- Merklized UTXO constructs (UTXO "commit")
- Network bandwidth
- Storage costs
The concern I have with Wuille's approach is its complexity. The blockchain is not an especially performance-optimized structure; specifically, whenever security and performance are mutually-exclusive, the blockchain design chooses security over performance. Hence, 10 minutes per block, 1MB block size, etc. Of course, for the same security, it's always preferable to choose better performance, all else equal. But I don't agree that EC-based "128-bit equivalent" security is suitable for UTXO-set operations because the UTXO-set is built out of blocks which are directly secured by mining.
I think a
Merklix tree offers a superior alternative that captures any benefits of ECMH without requiring any leaps of faith in the theoretical bit-equivalent security. By inserting in txid-radix order and propagating the Merkle hash up to the root, only log(N) hashes have to be recalculated for a UTXO-set of size N.
I think the angst over how to serialize transaction data is unnecessary because the blockchain already just is a serialized structure containing all transaction data. For the leaves of the Merklix tree, we define the canonical form of each "leaf" as the serialized TXID+witness+UTXO balance, as these are represented in the block structure itself. In this way, there is no added burden, it's just a matter of juggling a few pointers (or offsets, if you're using a pointer-safe language).
Here is a very similar proposal that is basically an ad hoc/mis-shapen Merklix tree.
As explained in
this blog post, the Merklix tree could even provide horizontal scaling in transaction processing by allowing nodes to choose to only host part of the UTXO-set (with Merkle proofs) and sluff off requests for any missing parts of the set to another node. This would not require any nodes to run in "backup" or "archive" mode, so long as all the nodes, taken together, have a complete copy of the UTXO-set. Of course, we are a long ways away from the kind of scale where that would matter but it's always nice to have a solution that is both parallelizable and scalable at O(log(N)).
Settling on a definition for calculating a UTXO-commit Merkle root does not compel anybody to choose this or that internal implementation because (a) the blockchain already is serialized and (b)
any definition of a UTXO-commit structure will be, by its very nature, normative in the definition of its calculation but not in the manner that the node operator chooses to implement that calculation. Suppose we
define the UTXO Merkle root in terms of a radix-tree on txids - that does not actually compel anybody to store transactions in a radix-tree. A full-node could sync on the block-chain using any internal implementation, then it can calculate the UTXO Merkle root by iterating over its chosen UTXO representation in TXID-order with nothing more than a linked-list of hashes (log N). This is an added step for sync, but it pays for itself because the full-node only needs the latest UTXO-commit Merkle root and a correct copy of the UTXO-set from any other node that will serve it, in order to get synched. If desired, the full-node can then begin "working back" through the blockchain (at least to the point where UTXO-commit-fork occurred).
Description of Merklix tree