Author

Topic: Reducing UTXO: users send parent transactions with their merkle branches (Read 2777 times)

full member
Activity: 200
Merit: 104
Software design and user experience.
I am a naughty thread participant. I just patterned matched what the OP was saying with something else I'd recently been thinking and talking about. Tongue  Not exactly fair of me.

No probs. I love all brainstorming going on here. Thanks for sharing ideas and links.
staff
Activity: 4326
Merit: 8951
I am a naughty thread participant. I just patterned matched what the OP was saying with something else I'd recently been thinking and talking about. Tongue  Not exactly fair of me.
legendary
Activity: 1526
Merit: 1134
I'm talking about my TXO commitments idea, which is what I replied to the OP with and is what gmaxwell was talking about.

Ah sorry, my confusion. I haven't really thought about it enough to comment, but it sounds somewhat plausible. A huge change though. I'd need to study it a whole lot more to have a useful opinion.
full member
Activity: 200
Merit: 104
Software design and user experience.
I didn't comment on the proposal because I didn't understand it. Showing that inputs are in the chain does not prove they are unspent. The point of the UTXO set is to check for double spending.

Oops, you are right. The whole scheme allows double spending easily. Thanks for pointing this out to me.
legendary
Activity: 1120
Merit: 1164
But is that what was actually proposed? It didn't seem like it:

Quote
3. Full node does not need to lookup UTXO to check if the parents are valid. This part of UTXO is already provided by the sender.

But the second part of the sentence is not implied by any previous steps.

My comment was just about the first post, not about alternative schemes.

I'm talking about my TXO commitments idea, which is what I replied to the OP with and is what gmaxwell was talking about.
legendary
Activity: 1526
Merit: 1134
But is that what was actually proposed? It didn't seem like it:

Quote
3. Full node does not need to lookup UTXO to check if the parents are valid. This part of UTXO is already provided by the sender.

But the second part of the sentence is not implied by any previous steps.

My comment was just about the first post, not about alternative schemes.
legendary
Activity: 1120
Merit: 1164
I didn't comment on the proposal because I didn't understand it. Showing that inputs are in the chain does not prove they are unspent. The point of the UTXO set is to check for double spending.

Picture taking a merkle tree of some data, computing the tip, and then changing some data and recomputing - only a subset of the intermediary digests need to be updated, and you can prove that the transition between unspent and spent was correct.

Merkle mountain ranges support efficient appends and updates; when a txout is spent the tree is updated, and the updated root is what is committed in the block. Same idea as UTXO commitments, except with a merkle mountain range not only can you can do secure updates, but because it's insertion ordered you can also add new txouts without actually having any of the data in the set. (other than the tips of the trees)
legendary
Activity: 1526
Merit: 1134
I didn't comment on the proposal because I didn't understand it. Showing that inputs are in the chain does not prove they are unspent. The point of the UTXO set is to check for double spending.

legendary
Activity: 1120
Merit: 1164
Mike: rather than platitudes, you got any useful comments on txin proofs from a wallet-side perspective?

Besides, the whole point of MMR TXO commitments is to allow you to verify fully and efficiently without the whole UTXO set; it's the same security as if you did have the full UTXO set. What particular bandwidth vs. local storage tradeoff best suits your needs can then be up to you.

As for hard numbers, so it's log2(n)*32bytes to get to the merkle root in the block header. In reality we can probably skip the Merkle Mountain Range part of the whole scheme for txin proofs, and only provide proofs to block headers, so if we had, say, 2^16 txouts a block you'd be looking at 512 byte proofs per txin roughly, worst case. That's not bad really - a txin takes on the order of ~140 bytes anyway and the bandwidth quickly drops by just storing some of the merkle trees associated with blocks.
legendary
Activity: 1526
Merit: 1134
Well, there has been much discussion about that, but you reach some fundamental limits - the nature of Bitcoin is every node is independent and doesn't trust the others. That means it has to be able to check transactions for itself, which means a full UTXO set.

Fortunately, many interesting scalability targets don't really pose problems with the current representations we're using. I often use VISA as an example because people understand that, and it's what Satoshi originally used to explain Bitcoin's scalability to me. But you can pick other targets and things still look OK, in that hobbyists could afford to run nodes without breaking the bank. The time when people start getting worried is when they pick "infinity" as a target, or simply don't pick any target at all (which is almost the same thing), and then of course you run into the problem of infinite resource usage.

staff
Activity: 4326
Merit: 8951
Wow, I'd just assumed that was a typo, not that you actually thought that. An index of all addresses is only about 1.5GB of additional data if efficiently stored.

If you're just cramming the whole blockchain into some off the shelf SQL database, then yea— 100GBytes wouldn't be surprising at all.
full member
Activity: 200
Merit: 104
Software design and user experience.
(Also, full efficient UTXO index today takes >100 Gb, so miners have to simply scan blockchain to find parents.)

By "UTXO index" you must be meaning something different to what I'd expect, as that's more like 100 Mb, not 100 Gb.

Yep, sipa already corrected me on that one.  A guy from Coinbase in San Jose told me that their index of all addresses is beyond 100 Gb (BitcoinQT indexes only your addresses). I mistakenly thought it was the same as UTXO.

Anyway, I've seen a lot of talk about growing UTXO and "dust" transactions and I don't like shifting technical problems into social ones (e.g. blaming users for "spamming" blockchain etc.) So I think we should simply find nonjudgmental technical solution to it instead of pointing fingers.

legendary
Activity: 1526
Merit: 1134
(Also, full efficient UTXO index today takes >100 Gb, so miners have to simply scan blockchain to find parents.)

By "UTXO index" you must be meaning something different to what I'd expect, as that's more like 100 Mb, not 100 Gb.
staff
Activity: 4326
Merit: 8951
Won't this increase the use of bandwidth, which is much more expensive than local storage?
It would enable a bandwidth/storage tradeoff. E.g. if you're willing to store all the data yourself, you could tell your peers not to send you all the proof data which you can simply construct on your own.

If all full nodes stored the data the bandwidth would be ~= to what we have now.

Bandwidth is more costly than storage, indeed, but it's "instant bandwidth" vs "integral storage" so the tradeoff isn't entirely clear.  Even without this idea the data must be sent to you, so the most additional bandwidth for storageless is the overhead of the proof vs actually sending the data.  This would be a small constant factor, I suppose.  Might be useful to actually reason out some of the details to get more concrete numbers.
legendary
Activity: 1792
Merit: 1121
Won't this increase the use of bandwidth, which is much more expensive than local storage?
staff
Activity: 4326
Merit: 8951
2. Users send not only a transaction, but all parent transactions and their merkle branches.
3. Full node does not need to lookup UTXO to check if the parents are valid. This part of UTXO is already provided by the sender. Node needs only to check that merkle branches are valid and point to a block that was already validated.
The trick here is that the UTXO needs to be constructed here in such a way that the information provided with transactions is always enough to update the new committed UTXO hash.

This is trickier than it might seem at first glance for a couple reasons.

First, a proof of UTXO existence must also carry enough data to perform a proof for removal. Some tree structures make these proofs one and the same, but in others they are not.

Secondly, users construct their proofs independently of each other.  So a user constructs a proof for find and remove A, and another user constructs a proof for find and remove B.  This means the block must contain a proof for remove A+B.   This requirement generally eliminates any UTXO scheme based on a self-balancing tree and any scheme with content adaptive level compression,  since the A+B proof may need to access additional data than the A or B alone in order to rebalanced or recompress the tree.  (Note, most UTXO discussion on this forum has been about tree types invalidated by this requirement. It's easily fixed, but I guess it's good we didn't run headlong into implementing them). In #bitcoin-dev we've been calling this a "composable" or "commutative" property.

Insertion of new utxo, in particular, is somewhat tricky: For any kind of binary search tree insert may need to happen at arbitrary location. Users can not write proofs for the insertions of their new UTXO sets because they have no clue what the state of the tree would be in at the time their insertion actually happens.

Petertodd's suggestion is to store the utxo as an authenticated insertion ordered binary tree which supports efficient inserts, a merkle mountain range. This addresses the above issues nicely. Proofs are still invalidated by updates, but anyone who has the update can correct any proof (even a proof originally written by a third party).

Most important about Petertodd's suggestion is that it completely eliminates the necessity of storing third party data.  In a practical system nodes that store everything would still exist, of course, but they aren't required in Petertodd's idea: The system would work fine so long as each person keeps track of only his own coins (plus headers, of course).

There are some tradeoffs in this scheme however:  Anyone storing the proof required to spend a coin must observe every block so that they can update their proofs as the coins surrounding their coins in the UTXO set change.  Proof sizes would be log2() in the size of the complete history, rather than in the size of the spendable coins because unless we nodes to store the complete history there may be no single party that has the data required to re-balance the tree as coins are spent.

The latter point is not really a killer, since log2() grows slowly and the universe is finite. Smiley It's also somewhat offset by the fact that spends of recently created coins would have smaller proofs. The first point can be addressed by the existence of nodes who do store the complete data, and unlike in the Bitcoin of today, those nodes could actually get compensated for the service they provide.  (E.g. I write a txn spending my coin, but I can't produce the proof because I've been has been offline for a long time.  Your node tells me that it'll provide the proof, so long as the transaction pays it some fee).

The cost of observation could potentially be reduced if nodes were required to store the N top levels of the tree, by virtue of not including them with transactions. Then you would only need to observe the blocks (or, rather, the parts of blocks) which made updates to branches where you have txouts.

The potential to completely eliminate storing third party data removes some of the hazards of the Bitcoin design. E.g. no more incentive to abuse the blockchain as a backup service. No need to worry about people stuffing child pornography into it to try to get the data censored.  However, that full vision also requires that new nodes be able to bootstrap without auditing the old history. This would be a substantial change from the current zero trust model, and I'm not sure if such a change would be viable in Bitcoin. At a minimum it would probably require the robust existence of fruad proofs, in order to make a persuasive argument to newcomers that the history they can't review doesn't contain violations of the rules.
legendary
Activity: 1120
Merit: 1164
We were discussing this issue the other day on IRC actually; I've got an idea called TXO commitments that lets full nodes and mining require no disk storage at all, and pushes the cost of storing the chain to those whose wallets are involved. I'll write it up better when I get a bit of time, but in the meantime you might find the IRC logs interesting reading: https://s3.amazonaws.com/peter.todd/bitcoin-wizards-13-10-17.log
full member
Activity: 200
Merit: 104
Software design and user experience.
Edit-2: this scheme is completely broken as pointed out by Mike Hearn below (allows double spending).


If it was already proposed somewhere, I'll appreciate the link.

Problem: as more people use Bitcoin, number of spendable transactions inevitably grows (UTXO set). Today full nodes must maintain full UTXO set ("previous transactions") to verify incoming transactions. This is a duplicated effort which would be nice to distribute more fairly. (Also, full efficient UTXO index today takes >100 Gb, so miners have to simply scan blockchain to find parents.)

Solution:

1. Each full node only stores all block headers that it considers valid (not simply most POW-ed, but really valid).

2. Users send not only a transaction, but all parent transactions and their merkle branches.

3. Full node does not need to lookup UTXO to check if the parents are valid. This part of UTXO is already provided by the sender. Node needs only to check that merkle branches are valid and point to a block that was already validated.

4. Mining nodes collect all incoming transactions this way. When the block is mined, they don't need to keep those transactions in UTXO.

5. Some nodes still need to store full blocks to send to users — so users know if their transactions are included and where. So they can further send parent txs for each new tx.

As a result, UTXO is proportionally distributed among all users with insignificant overhead for network bandwidth. (At least, now users pay for that overhead, not miners for scanning the blockchain.)

This does not address the blockchain storage. Someone should store all or some blocks so users can know about merkle branches for their spendable transactions. But block storage does not require high-performance indexing like UTXO.

Block storage can be further reduced by having nodes store random portions of old history or not having old blocks at all - only blocks for the last month (so that receiving clients have enough time to catch up and find their new spendable txs). Full history could be provided for payment by specialized services.


Edit: this behavior can be an extension to a protocol which can be promoted by lower transaction fees. If the miner needs to lookup your parent transactions, you'd have to pay for that. If you send them along your own, it'll be cheaper. In other words, your tx priority is lower if it incurs extra validation costs.

Edit-2: as pointed out by Mike Hearn below, this scheme allows double-spending as node does not check if the inputs were already spent.
Jump to: