Author

Topic: Locally verifiable unspent transaction output commitments (Read 1769 times)

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: jl2012 on September 16, 2015, 05:56:47 AM

I don't quite understand your approach but I have another approach for this problem. Not sure if they are effectively the same.

The idea is that you provide + change + proof + and it is possible to validate that all the set updates are valid. The UTXO set is valid if each individual step is valid.

This can have a resolution down to a single transaction (or even a single output or input). In that case, the UTXO set commitment is actually the root of a merkle root of all UTXO set roots for each block.

Quote

root_merkle_sum_tree = hash(hash_5|hash_6|12|15)

I was just aiming for the basic set structure for now, but that should have included a sum-tree.

Quote

UTXO is presented as (txid|txoutindex|scriptPubkey|value|minimum_height_to_spend)

I would use hash(scriptPubKey) instead of scriptPubKey. That keeps the entries constant width without loss of security.

Quote

Two UTXO Merkle roots are calculated:
1. All UTXO sorted alphanumerically (root_all_utxo)
2. Spent UTXO in this block, sorted alphanumerically (root_spent_utxo)

With a tree, the sorting is automatic (assuming key is used to select the position in the tree). Under my scheme, there is no dedicated tree balancing. Since the keys are all hashes, they should have reasonable distribution.

An attack requires exponential effort to unbalance the tree. This means that there would be slight imbalances, but it is easier to just accept that.

Quote

Any one of the following proofs is sufficient to invalidate an illegal UTXO commitment, and an illegal UTXO commitment will necessarily produce at least one of the following proofs. (In other words, these conditions are collectively exhaustive for an illegal UTXO commitment)

---------------
Proof of wrong UTXO order: Merkle path of 2 misplaced UTXO records.

If the insertion proofs are valid, then you can't insert an incorrect leaf anyway. You can still prove an error using this proof, but that means that there is an earlier insertion proof that was not properly checked.

Quote

---------------
Proof of illegal deletion of an existing UTXO in root_all_utxo:

Merkle path of the deleted UTXO in last block
Merkle path of the 2 adjacent UTXOs in the root_all_utxo of this block, proving the UTXO is deleted
Merkle path of the 2 adjacent UTXOs in the root_spent_utxo of this block, proving the lack of spending record of this UTXO

The root_all_utxo is completely recalculated for each block?

The advantage of the proposal in the OP means is that you only need to include the parts that change.

"Extended" blocks could replace transactions with:

Code:







...






...

A 250 byte transaction with 2 inputs and 2 outputs is around 250 bytes. With 1 million UTXOs, that means proofs of around 20 hashes. That means 160 bytes per input or output. That increases the transaction size to 890.

Since the start of each proof would be common, it might be possible to decrease that to 80-100 bytes overhead.

jl2012

legendary

Activity: 1792

Merit: 1121

I don't quite understand your approach but I have another approach for this problem. Not sure if they are effectively the same.

I believe it's mathematically correct, but not sure about its computational efficiency

For every block, there will be a Merkle sum tree for the number of spent UTXO and new UTXO. For example, if we have 4 txs:

Tx1 (coinbase): 0 spent UTXO, 2 new UTXO
Tx2: 1 spent UTXO, 4 new UTXO
Tx3: 5 spent UTXO, 1 new UTXO
Tx4: 6 spent UTXO, 8 new UTXO

sum_hash_1 = hash(tx1_hash|0|2)
sum_hash_2 = hash(tx2_hash|1|4)
sum_hash_3 = hash(tx3_hash|5|1)
sum_hash_4 = hash(tx4_hash|6|8)
sum_hash_5 = hash(hash_1|hash_2|1|6)
sum_hash_6 = hash(hash_3|hash_4|11|9)
root_merkle_sum_tree = hash(hash_5|hash_6|12|15)

UTXO is presented as (txid|txoutindex|scriptPubkey|value|minimum_height_to_spend)

(For normal tx, minimum_height_to_spend is the block height. For coinbase tx, it is block height + 100)

Two UTXO Merkle roots are calculated:
1. All UTXO sorted alphanumerically (root_all_utxo)
2. Spent UTXO in this block, sorted alphanumerically (root_spent_utxo)

Any one of the following proofs is sufficient to invalidate an illegal UTXO commitment, and an illegal UTXO commitment will necessarily produce at least one of the following proofs. (In other words, these conditions are collectively exhaustive for an illegal UTXO commitment)

---------------
Proof of wrong UTXO order: Merkle path of 2 misplaced UTXO records.

---------------
Proof of incorrect # of UTXO in root_all_utxo: # of UTXO in the last block + # of new UTXO - # of spent UTXO != # of UTXO in this block

---------------
Proof of incorrect # of UTXO in root_spent_utxo: # of UTXO in root_spent_utxo != record in root_merkle_sum_tree

---------------
Spending of a UTXO not documented in root_spent_utxo:

A transaction with its Merkle path to the root_merkle_sum_tree
Merkle path of the 2 adjacent UTXOs in the root_spent_utxo, proving the lack of record

---------------
New UTXO not documented in root_all_utxo:

A transaction with its Merkle path to the root_merkle_sum_tree
Merkle path of the 2 adjacent UTXOs in the root_all_utxo, proving the lack of record

---------------
Proof of illegal deletion of an existing UTXO in root_all_utxo:

Merkle path of the deleted UTXO in last block
Merkle path of the 2 adjacent UTXOs in the root_all_utxo of this block, proving the UTXO is deleted
Merkle path of the 2 adjacent UTXOs in the root_spent_utxo of this block, proving the lack of spending record of this UTXO

Explanation:

By 1, we show that a UTXO exists in the last block
By 2, we show that the UTXO is removed in this block
By 3, we show that the removal of this UTXO is not documented in root_spent_utxo

If the removal is indeed documented in root_spent_utxo, some other spent UTXO must not be documented, given that the number of UTXO in root_spent_utxo matches the record in root_merkle_sum_tree. In that case, we could just prove with "Removal of UTXO not documented in root_spent_utxo"

---------------
Proof of illegal addition of UTXO in root_all_utxo / Spent UTXO not removed from root_all_utxo:

This could be indirectly proved by the proofs mentioned. There will be either too many UTXO in root_all_utxo, or some existing UTXOs got illegally deleted, or some new UTXOs are not documented

TierNolan

legendary

Activity: 1232

Merit: 1094

The RAM usage of this system could be reduced significantly by trading off CPU time against RAM.

Branch nodes with fewer than N (say 16) children could avoid storing their digests.

Since in a binary tree, almost all the nodes are in the last few levels of the tree, this would mean that almost all nodes would have no digest.

The digests would have to be regenerated on the fly.

TierNolan

legendary

Activity: 1232

Merit: 1094

A proposal to help SPV nodes verify blocks is to commit the root of an (unbalanced) Merkle tree containing all unspent (and spendable) transaction outputs.

It is possible to create proofs of validity for each modification of the set. The proof would prove that inserting an entry into a tree with a root of X will give a new tree with a root of Y. There would also be proofs for removing entries.

This means that each modification to the UTXO set could have a proof associated with it. A block could contain transactions and proofs. Each proof would be linked to an input or output of a transactions. For each input, there would be a proof of removal from the set and for each output, there would be a proof of insertion into the set. Unspendable outputs would not be inserted into the set. (This would require a formal definition of unspendable, probably outputs which spend to "OP_RETURN ...").

Thinking out loud about the proofs, this is what I was thinking.

Verifiable Merkle Set

There are 2 kinds of nodes, leaf and branch nodes. Each node has a digest associated with the node.

Branch Nodes

These nodes have exactly 2 children. The data storage for these nodes is as follows.

Code:

int256 digest;
MerkleNode *parent;
MerkleNode *left;
MerkleNode *Right;
uint16 height;

Assuming 4 byte pointers, each node requires 46 bytes (or 58 for 8 byte pointers).

The digest for branch nodes is hash( 0x00 | left->digest | right->digest | node->height ).

The height value counts from leaf level. 2 leaves that differed only in the LSB of their entry_digest would have a height 1 parent. The highest height is 256. This is for leaves which differ in the MSB of their entry_digest.

The leading 0x00 in the array passed to the hash function is to indicate a branch node.

Leaf Nodes

These nodes are the leaves of the tree. They have no children. The data storage for these nodes is as follows.

Code:

MerkleNode *parent;
int256 entry_digest;

The digest for the node is hash( 0x01 | hash(entry_value) ) but only hash(entry_value), i.e. the entry_digest, is stored in the tree. The node's digest can be recomputed, as needed, saving RAM.

The leading 0x01 indicates a leaf node. Leaf and branch nodes can be distinguished from each other due to the prefix.

Root Digest

If the set is empty, the digest is defined as all zeroes. Otherwise, it is defined as the digest of the root node of the tree.

Proof of Existence

A proof of existence proves that a particular entry is in the set. This proof is a byte array defined as follows.

Code:

uint16 length;
uint16[length] heights;
int256[length] path_digests;

The proof is a Merkle branch that starts at the leaf and follows the path to the root. It includes the height values for the path.

The height value is used to determine which bit in the entry_digest to use to determine if the path goes left or right at that point.

Since the height value is included as input into the branch node's hash, it is hash sealed.

If the root digest is zero, then it means that there are no entries in the set, so a proof of existence always fails.

Code:

if (root_digest == 0) {
    return false;
}
digest = hash(0x1 | entry_digest);
for (i = 0; i < length; i++) {
    boolean right_child = entry_digest & (((uint256) 1) << heights[i]) != 0;
    if (right_child) {
        digest = hash(0x01 | path_digests[i] | digest | height);
    } else {
        digest = hash(0x01 | digest | path_digests[i] | height);
    }
}
return digest == root_digest;

Removal Proofs

It is possible to compute the new root digest from the existence proof of the leaf node to be removed. The effect of node removal is that the leaf node and the node's (branch node) parent are removed from the tree. The node's sibling replaces the node's parent in the tree.

If the path length is zero, then the node is the only node in the tree.

Code:

< verify the existence proof >

if (length == 0) {
    return new_root_digest == 0; // tree should be empty after removal
}
digest = path_digests[0]; // Sibling's digest
for (i = 1; i < length; i++) {
    boolean right_child = entry_digest & (((uint256) 1) << heights[i]) != 0;
    if (right_child) {
        digest = hash(0x01 | path_digests[i] | digest | height);
    } else {
        digest = hash(0x01 | digest | path_digests[i] | height);
    }
}
return new_root_digest == digest;

Insertion Proofs

As with removal proofs, it is possible to compute the new root digest from an existence proof. At each branch, the proof should choose its path based on the entry_digest of the entry being inserted. The digest of this leaf is called the leaf_entry_digest.

Since the entry_digest is used to select the path, the leaf_entry_digest must have the same bit as the entry_digest for the height of each branch node along the path.

All entry_digests of leaf nodes that are decendants of a branch node have the same most significant bits, begining at a height one greater than the branch node's height. This is inherent due to the tree being a binary tree.

This means that the most significant bits of the leaf_entry_digest can be used to determine the prefix for each branch node. The bits starting at one higher than the height of the branch node indicate the prefix for that branch node.

The new branch node must be inserted directly below the lowest branch node that has a prefix that matches the entry_digest.

Consider the insertion of a node with an entry_digest of 01011010b (assuming the hash function produced single byte digests). The top branch node in the tree has a height of 7. Since bit 7 is one (LSB is bit 1) in the entry_digest, the right path is used. This leads to a branch node with a height of 3. Since bit 3 is 0, the left path is used. The next node is the leaf. It has a leaf_entry_digest of 01110000. This is guaranteed to match the entry digest for bits 7 and 3 but can be anything else for the other bits.

The prefix for the height 7 branch node is 0 (all bits in the leaf_entry_digest above bit 7). This matches the entry_digest, so the new node will be a descendant of that branch node.

The prefix for the height 3 branch node is 01110. This doesn't match the MSBs of the entry_digest. This means that the entry_digest is not a descendant of that branch node.

The new branch node must be inserted as a child of the height 7 branch node instead of the height 3 branch node.

Comparing the 2 digests:
01011010 (entry_digest)
01110000 (leaf_entry_digest)

The highest bit that is different is the bit with height 6. This means that the new branch node has a height of 6 (which is between heights 3 and 7).

A new leaf node is produced for the new entry. It is the left child of the new branch node. The height 3 branch node is the right child.

The pointer from the height 7 branch node that pointed to the height 3 branch node is redirected to the new branch node.

Code:

< verify the existence proof first >

if (old_root_digest == 0) {
    if (length != 0) {
        return false; // proofs must be zero length when inserting into an empty tree
    }
    return new_root_digest == entry_digest;
}

// Find the highest bit that is different between the leaf_entry_digest and the entry_digest
uint16 insertion_height = find_highest_bit_difference(leaf_entry_digest, entry_digest);

// Compute the branch from the leaf to below the insertion point
int256 digest = hash(0x01 | leaf_entry_digest);
int i;
for (i = 0; i < length; i++) {
    if (heights[i] > insertion_height) {
        break;
    }
    boolean right_child = entry_digest & (((uint256) 1) << heights[i]) != 0;
    if (right_child) {
        digest = hash(0x01 | path_digests[i] | digest | height);
    } else {
        digest = hash(0x01 | digest | path_digests[i] | height);
    }
}

int256 new_digest = hash(0x01 | entry_digest);

boolean right_child = entry_digest & (((uint256) 1) << insertion_height) != 0;
if (!right_child) {
    digest = hash(0x01 | new_digest | digest | height);
} else {
    digest = hash(0x01 | digest | new_digest | height);
}

// Finish the path from after the insertion point to the root
for (; i < length; i++) {
    boolean right_child = entry_digest & (((uint256) 1) << heights[i]) != 0;
    if (right_child) {
        digest = hash(0x01 | path_digests[i] | digest | height);
    } else {
        digest = hash(0x01 | digest | path_digests[i] | height);
    }
}
return new_root_digest == digest;

Replacement

Replacement proofs contain a deletion and insertion proof, since the new leaf node cannot be placed in the same location. This doubles the proof size.

In the reference client, the UTXO set is compressed. All the outputs for a single transactions are stored together. This could be supported by replacement. When an output is spent, the old version of the transaction's entry is removed and a new one is added with that output marked as cleared.

This trades off RAM verses hard drive space. The proofs associated with a block would need to be stored, but need not be stored in RAM. If replacement is used, then disk space is doubled, but RAM usage is reduced.

The number of branch nodes in a tree is equal to the number of leaves (minus 1). This means the amount of RAM would be roughly doubled (assuming the data associated with each leaf is around the same as the size of a digest).

In the reference client, the UTXO set is stored in a hash map that maps a 32 byte txid to a compact representation of the transaction outputs. Hard coding that compact represenation as a network rule might be overly limiting. If a new output type became popular, it would not compact it much.