First of all, it is important that we name the categories of UTXOs beforehand, which boils down to classifying them by script. Right now I can only think of coinbase scripts, OP_RETURN scripts, and everything else, so that makes three types.
My question is: how will you optimize UTXO storage for a specific kind of script, e.g. making OP_RETURN UTXOs more compact? More specifically, where in the merkle forest will this place each of these three script types, and are there more types you have in mind to categorize?
a record in the standard bitcoin-core UTXO set (chainstate db) is laid out roughly like this
(EG, in the older per-transaction format from before bitcoin core 0.15):
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
<-------------------------------------------------------------->
                               |
                              TXID

01 04 8358 00 816115944e077fe7c803cfa57f29b36bf87c1d35 8bb85e
<> <> <--> <> <--------------------------------------> <---->
|  |  |    |  |                                        |
|  |  |    |  address                                  blockheight
|  |  |    tx type
|  |  value
|  code
version
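to make the fields concrete, here is a minimal python sketch that decodes the example record. it assumes the record uses the legacy serialization (base-128 varints plus the compressed amount encoding) described in the comments of older bitcoin core source. the function names are my own, and it only handles the shape of this one example (a single pay-to-pubkey-hash output and no extra bitvector bytes).

```python
def read_varint(data, pos):
    """Decode one base-128 varint (legacy Bitcoin Core style).

    Returns (value, position of the next unread byte).
    """
    n = 0
    while True:
        b = data[pos]
        pos += 1
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:
            n += 1          # continuation byte: add one and keep reading
        else:
            return n, pos


def decompress_amount(x):
    """Undo the compact amount encoding; result is in satoshi."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10              # number of trailing decimal zeros
    x //= 10
    if e < 9:
        d = (x % 9) + 1     # last non-zero digit
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10 ** e


def parse_example(hex_record):
    """Split the example record into the fields named in the diagram."""
    data = bytes.fromhex(hex_record)
    version, pos = read_varint(data, 0)
    code, pos = read_varint(data, pos)
    compressed, pos = read_varint(data, pos)
    script_type = data[pos]              # 0x00 = pay-to-pubkey-hash
    pos += 1
    hash160 = data[pos:pos + 20]         # the 20 byte "address" field
    pos += 20
    height, pos = read_varint(data, pos)
    return {
        "version": version,
        "code": code,
        "amount_sat": decompress_amount(compressed),
        "script_type": script_type,
        "hash160": hash160.hex(),
        "height": height,
    }


if __name__ == "__main__":
    record = "0104835800816115944e077fe7c803cfa57f29b36bf87c1d358bb85e"
    print(parse_example(record))
```

for the record above this gives version 1, code 4, a value of 60,000,000,000 satoshi (600 BTC) and blockheight 203998.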
from what I can fathom of the topic creator's idea (pieced together from his posts across many forum subcategories),
instead of this one bloated database
there would be 2-3 databases. mainly:
1. the utxo set from over 80k blocks ago (older than 18 months): blocks 0-607600
2. the utxo set from under 80k blocks ago (less than 18 months old): blocks 607601-687601
the thinking being that old hoarded coins, older than 18 months, are less likely to move soon
and so do not need to be sifted through every time
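the split described above could look something like this. it is only a sketch of how I read the idea: the dict layout, the names and the exact boundary handling are my assumptions, not anything the topic creator has published.

```python
AGE_CUTOFF_BLOCKS = 80_000   # roughly 18 months at ~144 blocks per day


def split_by_age(utxos, tip_height, cutoff=AGE_CUTOFF_BLOCKS):
    """Partition a utxo dict into (recent, old) by creation height.

    utxos maps an outpoint (txid, vout) to a dict with at least a
    "height" entry. Heights at or above tip_height - cutoff are "recent".
    """
    boundary = tip_height - cutoff
    recent, old = {}, {}
    for outpoint, coin in utxos.items():
        target = recent if coin["height"] >= boundary else old
        target[outpoint] = coin
    return recent, old


if __name__ == "__main__":
    # hypothetical placeholder entries, not real transactions
    utxos = {
        ("aa" * 32, 0): {"height": 607_600},
        ("bb" * 32, 1): {"height": 607_601},
        ("cc" * 32, 0): {"height": 687_601},
    }
    recent, old = split_by_age(utxos, tip_height=687_601)
    print(len(recent), "recent,", len(old), "old")
```

with the tip at 687601 this reproduces the 0-607600 / 607601-687601 split from the list. note that the boundary moves with every new block, so entries would have to migrate from the recent set to the old set continuously.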
his next point, adding in the utreexo idea from his other topic,
is to organise the records.
I presume by
blockheight, which then branches off into the transactions within that block, which in turn branch off into the utxos of each transaction
I am going to predict that his next idea, in a few weeks, is to have 3 databases
where there are still the 2 databases of under/over 18 months
but instead of each holding the txid, version, code, type, value, address, blockheight
it's simply
blockheight:address:index
where that index then points into a third database holding the txid, version, code, type and value
so that when checking whether an input is actually a UTXO it just checks whether the address is listed under the blockheight
and if found it then fetches the extra data from the 3rd database
where blockheight+index finds the corresponding branch with the details of the utxo
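as a rough picture of that predicted layout, here is a toy python version using plain dicts in place of real databases. everything here (class name, method names, field names) is hypothetical and only illustrates the two-step lookup: first check the lean blockheight:address index, then fetch the details by blockheight+index.

```python
class ThreeDbSketch:
    """Toy model of the predicted layout: two lean indexes plus a details store."""

    def __init__(self, tip_height, cutoff=80_000):
        self.boundary = tip_height - cutoff
        self.recent = {}    # database 1: (height, address) -> [index, ...]
        self.old = {}       # database 2: same layout, older blocks
        self.details = {}   # database 3: (height, index) -> full utxo data

    def _index_for(self, height):
        return self.recent if height >= self.boundary else self.old

    def add(self, height, address, index, txid, vout, value):
        """Record a new utxo in the matching index and in the details store."""
        self._index_for(height).setdefault((height, address), []).append(index)
        self.details[(height, index)] = {"txid": txid, "vout": vout, "value": value}

    def lookup(self, height, address):
        """Step 1: is the address listed under this height? Step 2: fetch details."""
        indexes = self._index_for(height).get((height, address), [])
        return [self.details[(height, i)] for i in indexes]


if __name__ == "__main__":
    db = ThreeDbSketch(tip_height=687_601)
    # placeholder address and txid, not real data
    db.add(680_000, "addr1", 0, txid="aa" * 32, vout=1, value=50_000)
    print(db.lookup(680_000, "addr1"))
    print(db.lookup(680_000, "addr2"))   # [] means not a known utxo
```

the caller has to know the blockheight of the output being spent before it can look anything up, which a normal transaction input (txid + vout) does not tell you.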
..
in short, database 1 is always open (in ram)
and by only holding 80k blocks it is very lean:
4 byte blockheight * 80k
20 byte address * ~3000 * 80000
2 byte index * ~3000 * 80000
the other 2 databases are stored on the hard drive and are only opened when needed
thus reducing the amount of ram used, by not keeping the utxos of the whole blockchain in memory (EG 10 million instead of 80 million utxos)
and not using as many hard drive reads, thus less wear on hard drives
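multiplying out the size figures above gives a feel for the numbers. this is just arithmetic on the figures in this post (4 byte height, 20 byte address, 2 byte index, ~3000 outputs per block, 80k blocks); the function names are mine. the first function treats every one of the ~3000 outputs per block as still unspent, so it is an upper bound, and the second uses an actual entry count such as the 10 million mentioned above.

```python
BLOCKS = 80_000
OUTPUTS_PER_BLOCK = 3_000
HEIGHT_BYTES, ADDRESS_BYTES, INDEX_BYTES = 4, 20, 2


def upper_bound_bytes(blocks=BLOCKS, outputs_per_block=OUTPUTS_PER_BLOCK):
    """Size if every output of every block in the window is still unspent."""
    heights = HEIGHT_BYTES * blocks
    addresses = ADDRESS_BYTES * outputs_per_block * blocks
    indexes = INDEX_BYTES * outputs_per_block * blocks
    return heights + addresses + indexes


def size_for_entries(entries, blocks=BLOCKS):
    """Size for a given number of (address, index) entries in the window."""
    return HEIGHT_BYTES * blocks + (ADDRESS_BYTES + INDEX_BYTES) * entries


if __name__ == "__main__":
    print(f"upper bound:        {upper_bound_bytes() / 1e9:.2f} GB")
    print(f"10 million entries: {size_for_entries(10_000_000) / 1e6:.1f} MB")
```

so the worst case is about 5.28 GB, while 10 million entries come to roughly 220 MB before any database overhead.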
the problems with this:
utxos can be created and spent in the same block
though it is more organised to find a tx by its blockheight, it falls foul of duplication errors and other issues
some people still re-use addresses, so identifying each utxo independently is crucial
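a small python illustration of the duplication problem, assuming the existence check is keyed on blockheight and address only, as described above. the txids and address are placeholders and the helper names are my own.

```python
def height_address_key(coin):
    """Key used by the blockheight:address existence check."""
    return (coin["height"], coin["address"])


def outpoint_key(coin):
    """Key that identifies a utxo uniquely: txid plus output number."""
    return (coin["txid"], coin["vout"])


def count_collisions(coins, key_fn):
    """Count how many coins map to a key that an earlier coin already used."""
    seen = set()
    collisions = 0
    for coin in coins:
        key = key_fn(coin)
        if key in seen:
            collisions += 1
        seen.add(key)
    return collisions


if __name__ == "__main__":
    # two different transactions in the same block paying a re-used address
    coins = [
        {"txid": "aa" * 32, "vout": 0, "height": 680_000, "address": "addr1"},
        {"txid": "bb" * 32, "vout": 3, "height": 680_000, "address": "addr1"},
    ]
    print(count_collisions(coins, height_address_key))  # both coins share one key
    print(count_collisions(coins, outpoint_key))        # each coin has its own key
```

the two coins are indistinguishable under the blockheight:address key, so spending one of them cannot be told apart from spending the other without falling back to the txid and output number anyway.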
as for other methods of treeing utxos together: most utxos are independent of each other. you can't really cluster transactions together based on whatever social analysis the topic creator tries.
even with spends originating from exchanges, once the coins are distributed you can't really code if-statements to know whether a utxo is going to be hoarded for 2 years or spent sooner