Topic: Reduced storage mining nodes and compressing the UTXO set

legendary
Activity: 1232
Merit: 1094
These days it is a real hassle to verify the signature of a transaction while the inputs are god knows where.

Yes, and it is easy for the owner of coins to keep the input transactions. 
legendary
Activity: 2053
Merit: 1356
aka tonikt
All right.
Now I think I get it - thanks for explaining!

I personally like the "extended transactions" idea, where each tx is broadcast along with its input txs.
These days it is a real hassle to verify the signature of a transaction while the inputs are god knows where.
legendary
Activity: 1232
Merit: 1094
But what about the volume checks?
Each UTXO has a specific BTC value associated with it, and each transaction must not spend more than the sum of all its inputs.
Even if it is properly signed, and even if it got mined into a block with a valid POW, the node must still verify the transaction volumes, which I do not see as possible without the full database.

The amount is in the input transaction.  You would check that the signatures match and that the amounts all add up correctly.
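A minimal sketch of that amount check in Python, assuming each input has already been resolved to the value of the output it spends (all names here are illustrative):

def check_amounts(input_values_sat, output_values_sat):
    # A transaction must not create more value than it consumes;
    # any surplus is claimed by the miner as the fee.
    return sum(output_values_sat) <= sum(input_values_sat)

# 0.6 BTC + 0.5 BTC in, 1.0 BTC out -> 0.1 BTC fee (values in satoshis)
assert check_amounts([60_000_000, 50_000_000], [100_000_000])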

The idea is to allow blocks to be verified on a standalone basis.  You receive an extended block and it has all the info needed to verify that the block is valid.

The only extra info you need is the header chain and the UTXO set.

This reduces the load on each node. 

There would still need to be some nodes that convert transactions into extended transactions.  These could be relay nodes that have the entire database.

However, if the system was popular, then wallets would store the info so they could broadcast extended transactions directly.
legendary
Activity: 2053
Merit: 1356
aka tonikt
However, once you have all the inputs into the transaction, the node can do the verification without needing a database of all the transactions.
But what about the volume checks?
Each UTXO has a specific BTC value associated with it, and each transaction must not spend more than the sum of all its inputs.
Even if it is properly signed, and even if it got mined into a block with a valid POW, the node must still verify the transaction volumes, which I do not see as possible without the full database.
legendary
Activity: 1232
Merit: 1094
With pruning, all transactions with outputs in the UTXO set must still be stored. There is no point in all nodes storing all this info.
How do you see a node being able to verify the validity of a new block if it does not have the entire UTXO database?

It would need to have the extended block, rather than just a standard block.

A block at the moment is basically just a header + list of transactions:

header
tx[0]
tx[1]
...
Instead, the extended block would be something like

header
tx[0]
    tx for input 1; merkle path
    tx for input 2; merkle path
    ...
tx[1]
    tx for input 1; merkle path
    tx for input 2; merkle path
    ...
...
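Each "merkle path" here is what lets a node check, against nothing but the header chain, that an input transaction really is in some block.  A minimal sketch of that check in Python (illustrative names; hashes are in the internal byte order used inside blocks):

import hashlib

def dsha256(data: bytes) -> bytes:
    # Bitcoin's double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_merkle_path(txid: bytes, path, merkle_root: bytes) -> bool:
    # path is a list of (sibling_hash, sibling_is_right) pairs, one per
    # tree level; ~12 entries for a 4096-transaction block.
    node = txid
    for sibling, sibling_is_right in path:
        node = dsha256(node + sibling if sibling_is_right else sibling + node)
    return node == merkle_root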

The average transaction is 250 bytes and most transactions have 2 inputs.  If there are 4096 transactions in a block, then there are 12 levels of the merkle tree down to the transaction.  This gives 32 bytes * 12 = 384 bytes for the merkle path.  This means that each input would be around 250 + 384 = 634 bytes.

That gives 634 * 2 + 250 = 1518 bytes for each extended transaction.  This is 6 times larger than just including the transaction.
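The same arithmetic as a quick sanity check in Python:

TX_SIZE = 250        # bytes, average transaction
INPUTS = 2           # typical inputs per transaction
DEPTH = 12           # 2**12 = 4096 transactions per block
HASH = 32            # bytes per merkle tree node

merkle_path = HASH * DEPTH                  # 384 bytes
per_input = TX_SIZE + merkle_path           # 634 bytes
extended_tx = INPUTS * per_input + TX_SIZE  # 1518 bytes, ~6x a plain tx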

However, once you have all the inputs into the transaction, the node can do the verification without needing a database of all the transactions.

A soft fork could be added setting the maximum extended block size to 6MB (in addition to the normal block representation being limited to 1MB).

When broadcasting a transaction, the spender could include the extended form.  Some miners might refuse to admit transactions received in standard form into their memory pools.
legendary
Activity: 2053
Merit: 1356
aka tonikt
With pruning, all transactions with outputs in the UTXO set must still be stored. There is no point in all nodes storing all this info.
How do you see a node being able to verify the validity of a new block if it does not have the entire UTXO database?
legendary
Activity: 1232
Merit: 1094
I was thinking about the "ultimate compression" system and I wonder if it is overkill for the time being.  The whole point is to record the UTXO set, but that set isn't really that large anyway.

With pruning, all transactions with outputs in the UTXO set must still be stored. There is no point in all nodes storing all this info.

Each entry in the UTXO set just needs to record a block index, a transaction index and an output index.

The UTXO set entry needs a 3 byte block index (enough until block 16 million or so), a 2 byte transaction index and a 1 byte output index.  Formally, they could be variable-length integer types.

This works out to 6 bytes per UTXO set entry, rather than at least 250 bytes per transaction.
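A minimal sketch of such a 6-byte entry in Python, using fixed-width fields rather than the variable-length integers suggested above (names are illustrative):

import struct

def pack_utxo_ref(block_idx: int, tx_idx: int, out_idx: int) -> bytes:
    # 3-byte block index (good to block ~16.7 million),
    # 2-byte transaction index, 1-byte output index
    assert block_idx < 1 << 24 and tx_idx < 1 << 16 and out_idx < 1 << 8
    return block_idx.to_bytes(3, "little") + struct.pack("<HB", tx_idx, out_idx)

def unpack_utxo_ref(ref: bytes):
    tx_idx, out_idx = struct.unpack("<HB", ref[3:])
    return int.from_bytes(ref[:3], "little"), tx_idx, out_idx

assert unpack_utxo_ref(pack_utxo_ref(300_000, 1500, 2)) == (300_000, 1500, 2)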

When downloading the chain, the node would find the longest POW chain (headers first) and then scan that chain from genesis to the tip.  It could snapshot the UTXO set more and more densely as it nears the end of the chain (... 100000, 10000, 1000, 100, 20 blocks back, and then all of the last 20 blocks as diffs).

If a fork happens that is less than 20 blocks deep (almost all of them, barring bug-based forks), it could recalculate the set from the diffs.  The exact snapshot system isn't that critical.  Diffs would likely improve things, especially for the last 20 snapshots.
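As a sketch, the snapshot schedule could look something like this (offsets as given above, treated as tunable):

def snapshot_heights(tip: int):
    # Sparse checkpoints far back, then every one of the last 20
    # blocks (those would be kept as diffs rather than full sets).
    sparse = [tip - d for d in (100_000, 10_000, 1_000, 100) if d <= tip]
    return sparse + list(range(max(0, tip - 20), tip + 1))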

As it is, this node couldn't do signature verification, but it could make sure there is no double spending.

If this was combined with "expanded" transactions, then it could do signature checks too.  An expanded transaction contains all of the transaction's input transactions (each with its merkle path from a block header).  A wallet could store the transactions that pay to any of its addresses, which would allow it to construct the expanded transaction when spending.

If a node received the expanded transaction, it would check

- the merkle path for all input transactions (requires block headers)
- the signatures for all inputs (requires input transactions)
- the inputs aren't double spent (requires the UTXO set)

If those checks pass, then it knows the transaction is valid and it can be included in blocks.
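Put together, and reusing verify_merkle_path from the earlier sketch, the node's logic would be roughly this (check_signature and the field names are hypothetical stand-ins, not an actual implementation):

def verify_expanded_tx(tx, headers, utxo_set) -> bool:
    for inp in tx.inputs:
        root = headers[inp.block_height].merkle_root
        # 1. input tx really is in the chain (needs block headers)
        if not verify_merkle_path(inp.input_tx.txid, inp.merkle_path, root):
            return False
        # 2. signature spends the referenced output (needs the input tx);
        #    check_signature is a stand-in for script verification
        if not check_signature(tx, inp, inp.input_tx):
            return False
        # 3. output not already spent (needs the UTXO set)
        if (inp.input_tx.txid, inp.output_index) not in utxo_set:
            return False
    return True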

Expanded transactions put the burden of tx storage on the owners of the outputs rather than requiring all nodes to store the info.

It also increases total network bandwidth usage in order to reduce storage.  An "expanded" block would be a block that includes expanded transactions (and so would be larger).

This could be an extra bit in the services map.  An EXPANDED_STORE_NODE would make no statement with regard to verification.  It would just store expanded blocks.

A hard fork would make this even more efficient.  Each transaction output could be simply a MAST (Merkle abstract syntax tree) hash and it could be included directly into the block's Merkle tree.

The expanded transaction would include the path to the MAST root for each input, plus the sig script (which could be verified against the MAST root).  This means that even if an input comes from a very large transaction, the data needed to verify it stays small.
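A toy two-branch example of the idea, using a single SHA-256 and made-up scripts just to show why the unused branch never has to appear on the chain:

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

script_a, script_b = b"<sig check branch>", b"<timeout branch>"
root = h(h(script_a) + h(script_b))  # all the output commits to

def verify_branch(revealed: bytes, sibling_hash: bytes, is_right: bool) -> bool:
    # Recompute the root from the revealed script and its sibling hash.
    leaf = h(revealed)
    return h(leaf + sibling_hash if is_right else sibling_hash + leaf) == root

# Spending via script_a reveals only that script plus one 32-byte hash.
assert verify_branch(script_a, h(script_b), True)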

However, even without a hard fork, it would allow nodes to use much less disk space (but use up more bandwidth).

Nodes running on a virtual machine in a data centre are probably disk-limited.  Nodes running in someone's home are likely bandwidth-limited.

A full node could convert a standard transaction into an expanded transaction and act as a bridge.