
Topic: Proposal: including (UTXO) state hash in blocks (to eliminate IBD for new nodes) (Read 221 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
The problem with block size limit is more than storage capacity and IBD time. I'll just quote my older post

You're missing the point. The Bitcoin community mostly agrees that running a full node should be cheap, which means the block size is limited by the growth rate of hardware and internet connections.

IMO blockchain size and internet connection aren't the worst part, but rather CPU, RAM and storage speed:
1. Can the CPU verify transactions and blocks in real time?
2. How much RAM is needed to store all the cached data?
3. Can the storage handle intensive reads/writes? Ethereum already suffers from this problem.
The blockchain size will reach 500GB soon, requiring new nodes to download and process 500GB before they can trust anything... The number of full nodes is declining as well.

It's true, but that's not my point. My point is that running a full node (after completing IBD) will become more expensive if the block size limit is increased.

P.S. I'm not against a block size increase, as long as the limit isn't ridiculously high.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
So what? Once there is consensus about the state of the blockchain in a specific point of time, why should anybody care about the ledger details? It is established practice in banking and accounting systems,  they summarize accounts in balance sheets for using them as the basis for transactions thereafter, it is efficient and secure with no practical alternative available.

I like the idea of having a UTXO hash in each block to speed up verification but I'm against deleting the historic blocks because these blocks contain the transactions between different addresses and it will become impossible for wallets to query some of their transaction history.

Also it'll become impossible to spin up new block explorers because the blocks are not available to be processed. Some centralized explorers like Blockchain.info already have this data publicly available, but why bother with them when they rate-limit each request, which inhibits fast initial population of the database?

There is a paradox in what you are claiming here and in the proposal in your OP, and that is if you do not store the old blocks, you also cannot store their UTXO hashes in them. This means the entire guarantee that the block is valid is lost. Do you see the problem here? On paper the historic block can be assumed to be valid, but the only way you're going to get these blocks is from other peers, who might break the validity guarantee which is why it's important the old blocks are stored somewhere.
No I don't see the problem? Please try to give an example situation.

Let me sketch the following situation and tell me what would be an issue:
Bitcoin starts to incorporate UTXO state hashes (of the previous block) in new blocks.
I'm a new node and I want to join the bitcoin network. I download an UTXO set of 500 blocks ago, including the blockchain from that point to present (so the chain length is 500 blocks), from a random person I cannot trust blindly.
Because I cannot blindly trust the UTXO state is valid, what I can do to gain trust in the validity: I start verifying the chain of 500 blocks. The chain turns out to be valid.

To gain more trust I can do 2 things:
- Query a lot of nodes in the network if they also have the same UTXO state hash as me in their UTXO-state-hash-history. (trust the network)
- Progressively download an UTXO state+chain from longer ago, and keep verifying forward. This can be done in small steps or big steps, all the way until the genesis block, gaining more and more trust, the older the UTXO state. (verify myself)

What reason would there be not to trust its validity? Please sketch a scenario.

Sure, though my argument is less about the security of UTXO hashes themselves than the raw transactions in blocks.

Suppose I have an address that has a few mining rewards from 10 years ago, some old transactions from 5 years ago, and then more transactions from last week. Also assume each of these groups of transactions happen to be in different blocks. An SPV wallet or full node wants to query the transaction history for this address. Without older blocks being stored anywhere, neither raw transactions nor their hashes can be obtained from each block's UTXO hash alone. Only the newer transactions will be visible because their blocks are still available from their peers.
copper member
Activity: 909
Merit: 2301
Quote
What makes you trust the genesis block of bitcoin, and consequently all blocks after that?
Having it fully verified at least once by my PC.

Quote
It is the amount of PoW (difficulty) and the chain size.
Having all headers with the highest total PoW is one thing, fully validating all blocks is another thing. There were cases in the past when, at least for a while, there were some chains of invalid blocks with the highest PoW. Of course those chains were quite short, because full nodes quickly rejected them. But when most of the network relies on centralized pools and when they only check headers, they could potentially mine on top of invalid blocks. If you skip checking something, that could be abused if the potential gains were high enough.

Quote
Altcoins are altcoins because they use different algo's.
That doesn't have to be the case in the future. Imagine a fork with unlimited block size. One block bigger than 1 MB will fork the chain, but Proof of Work will be calculated in the same way. In practice, in all previous battles BTC finally won, but is creating such a security issue really worth it? The difference between BTC and some future altcoin can be small enough to pass heaviest-PoW validation, but big enough to break consensus rules. Also, as you only notarize the final UTXO states, it is possible to include a very long transaction chain and you won't be able to tell the difference unless you download the whole block. Imagine a block with Alice's coinbase output and Zack's transaction output. That's all from the UTXO perspective. But if there were thousands of transactions in this block, sending a single coin from Bob to Charlie, then to Daniel, then ..., and then to Zack, you wouldn't verify that the total block size is bigger than allowed! You would stop at checking the UTXO database, not even noticing that you are checking some altcoin with unlimited block size.
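The Bob-to-Zack example can be sketched in a few lines. This is only an illustration: the `Tx` type, the string outpoints and the `net_utxo_delta` helper are made up for this sketch and are not Bitcoin data structures. It shows that a block with one payment and a block with a long chain of hops leave the same net change in the UTXO set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    spends: tuple   # outpoints consumed (hypothetical string labels)
    creates: tuple  # outpoints produced

def net_utxo_delta(txs):
    """Return (outputs spent from the prior state, outputs left unspent) for one block."""
    spent_prior, created = [], []
    for tx in txs:
        for outpoint in tx.spends:
            if outpoint in created:
                created.remove(outpoint)      # created and spent inside the same block
            else:
                spent_prior.append(outpoint)  # existed before this block
        created.extend(tx.creates)
    return set(spent_prior), set(created)

names = ["bob", "charlie", "daniel", "erin", "frank", "zack"]
coinbase = Tx(spends=(), creates=("alice:coinbase",))

# Block A: Bob pays Zack directly.
direct = [coinbase, Tx(("bob:0",), ("zack:0",))]
# Block B: the same coin hops through every name before reaching Zack.
hops = [coinbase] + [Tx((f"{a}:0",), (f"{b}:0",)) for a, b in zip(names, names[1:])]

same_delta = net_utxo_delta(direct) == net_utxo_delta(hops)
print(same_delta, len(direct), len(hops))
```

Both blocks produce the same delta even though the second one carries more transactions, so a check that looks only at the resulting UTXO state cannot tell how large the block was.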

Quote
If not, they would just be bitcoin. The attack you describe is something like a 51% attack.
Now, a 51% attack can be used only when meeting all consensus rules. In your proposal, breaking some rules would be possible, because you seem to assume that the heaviest PoW always shows the truth, which could be different in the future. You assume that all previous UTXOs are correct and for example "trust" that the whole history up to some point in time is correct. Having all nodes do that would mean that some blocks would be lost forever, forcing new people to put more and more trust in the system, as more and more blocks would be missing.

Quote
What would stop an attacker currently to create a longer chain with extremely low mining difficulty from distributing that as the 'real' bitcoin?
Nothing. A single ASIC can produce a chain with minimal difficulty and faked timestamps, there is no problem with that. But currently, full nodes would notice that something is wrong and they would reject it. On the other hand, nodes from your proposal would take the last three years from the chain tip and check that "the difficulty was equal to one for three years" and they would accept such a chain, because the UTXO database would be valid.

Quote
The blockchain size will reach 500GB soon, requiring new nodes to download and process 500GB before they can trust anything... The number of full nodes is declining as well.
Erasing history is not possible, you can only compress it. When dealing with some sidechains or second layers, you can delete things when they settle on-chain. Here you cannot, because there is no higher layer to commit to.

Quote
So what? Once there is consensus about the state of the blockchain in a specific point of time, why should anybody care about the ledger details? It is established practice in banking and accounting systems,  they summarize accounts in balance sheets for using them as the basis for transactions thereafter, it is efficient and secure with no practical alternative available.
There is an alternative. If you have some sidechain, then you can delete transactions when they settle on-chain. The same goes for LN and any other second layer. As long as Bitcoin is the first layer, you cannot remove anything. But nothing stops you from building some coin on top of Bitcoin and adding UTXO commitments there, increasing block time to any value you want, for example that chain could produce blocks once per day. It would contain 144 Bitcoin headers and UTXO commitments from that day. Then, you could use that chain without forcing Bitcoin users to change anything. Also, you would probably see all security issues related to your proposal, especially if some altcoin adopted it.

Quote
As of my chat log, I don't find a serious argument, let alone a correct one, that justifies such an obsession with "history", it'll be highly appreciated if you'd kindly present me with just one example.
1. Backward compatibility. It is backward-incompatible, so nobody will merge that kind of change.
2. You have to do Initial Block Download once. Only once. Then, you can trust your own computer (if you cannot, then your coins are not safe anyway).
3. You don't have to do Initial Block Download to use Bitcoin Core. You can open your console window and create transactions by explicitly providing all needed data. As long as you can send it to some other full node, it would work without downloading the whole chain.
4. Bitcoin Core is software used by full nodes. If you want some SPV wallet, there are many other options. There is no reason to lower security by forcing other people to skip some data.
staff
Activity: 4326
Merit: 8951
If you're happy blindly trusting miners, Bitcoin has had a name for that security model for a long time: SPV.  Feel free to use it.

Lots of people running the SPV model has highly visible failure modes-- we see this already in Ethereum: major exchanges configure their nodes, if they get stuck, to wipe their state and fast sync from the miners.  This means a majority of miners can likely substantially override the system and ignore the rules with respect to some (if not almost all) of the major economic players.  The vulnerability is real although it hasn't been exploited yet.

Layering on stuff about querying multiple nodes just obfuscates the vulnerability it doesn't remove it.  If querying multiple nodes worked Bitcoin could have just used it instead of mining.  It's easy to spin up hundreds of thousands of fake nodes and spoof that kind of checking.

Assumeutxo doesn't blindly trust miners-- it requires that the software you're running is correct, which is already a requirement.  Like assumevalid, an assumeutxo hash is much easier to audit than most software changes (just check a newly proposed value against your already running node).  Though as you mentioned there were suggestions on including the hashes in blocks for an additional independent verification of it.  The downside of the consensus commitment is that the format becomes normative and the latency of computing it becomes critical.  This would stifle innovation and create a need to have everything mostly perfect the first time through... not ideal.

Quote
It also makes it possible to significantly increase block size (since nobody needs to store the full-chain anymore, everyone can be a pruned node).
Block size is a concern for a lot more than just IBD time, and if actually validating the chain becomes infeasible then Bitcoin's core security guarantees are essentially lost.  Part of the reason for the block size is to create a market for space to generate fees to pay for security, so even with all other considerations aside (propagation, at tip resource usage, utxo set growth rate, etc.) you would still be left with the fact that without the resource constraint there is no mechanism to pay for security other than introducing perpetual inflation.

Quote
While the UTXO set is 4GB or so, a solution can be made to make it easier for nodes and miners to hash the 4GB UTXO set. Merkle Tree?
You should read more of what's discussed.  Any kind of hash tree commitment is quite expensive to update.  E.g. increasing IO costs by a factor of log(txouts) is not great when you're talking about a billion txouts.

The assumeutxo work uses a rolling multiset hash which has no such cost, and if it's not committed it doesn't put anything in the latency-critical path.  Perhaps research it some more before dismissing it?  I think it's a lot more of a realistic near term improvement than anything you're suggesting!
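To illustrate what "rolling multiset hash" means, here is a toy version, not the construction used by the assumeutxo work. The modulus (the Mersenne prime 2^521 - 1), the element encoding and the class name are assumptions chosen to keep the sketch short. The point is that adding or removing one element costs one multiplication, regardless of set size, and that the result does not depend on insertion order.

```python
import hashlib

P = 2**521 - 1  # toy prime modulus (assumption, chosen only for brevity)

def element(data: bytes) -> int:
    """Map an element to a non-zero value below P."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") + 1

class MultisetHash:
    """Toy multiplicative multiset hash: the accumulator is the product of element hashes mod P."""

    def __init__(self):
        self.acc = 1

    def add(self, data: bytes) -> None:
        self.acc = self.acc * element(data) % P

    def remove(self, data: bytes) -> None:
        # Multiply by the modular inverse (three-argument pow with -1 needs Python 3.8+).
        self.acc = self.acc * pow(element(data), -1, P) % P

    def digest(self) -> str:
        return hashlib.sha256(self.acc.to_bytes(66, "big")).hexdigest()

first = MultisetHash()
for txout in (b"txid1:0", b"txid2:1", b"txid3:0"):
    first.add(txout)

second = MultisetHash()
for txout in (b"txid3:0", b"txid1:0", b"txid2:1"):  # same set, different order
    second.add(txout)

before = first.digest()
first.add(b"txid4:0")     # a new output appears...
first.remove(b"txid4:0")  # ...and is spent again
print(before == first.digest(), first.digest() == second.digest())
```

Each update touches only the accumulator, which is why this kind of hash avoids the per-update tree cost mentioned above.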
newbie
Activity: 4
Merit: 1
There is a paradox in what you are claiming here and in the proposal in your OP, and that is if you do not store the old blocks, you also cannot store their UTXO hashes in them. This means the entire guarantee that the block is valid is lost. Do you see the problem here? On paper the historic block can be assumed to be valid, but the only way you're going to get these blocks is from other peers, who might break the validity guarantee which is why it's important the old blocks are stored somewhere.
No I don't see the problem? Please try to give an example situation.

Let me sketch the following situation and tell me what would be an issue:
Bitcoin starts to incorporate UTXO state hashes (of the previous block) in new blocks.
I'm a new node and I want to join the bitcoin network. I download an UTXO set of 500 blocks ago, including the blockchain from that point to present (so the chain length is 500 blocks), from a random person I cannot trust blindly.
Because I cannot blindly trust the UTXO state is valid, what I can do to gain trust in the validity: I start verifying the chain of 500 blocks. The chain turns out to be valid.

To gain more trust I can do 2 things:
- Query a lot of nodes in the network if they also have the same UTXO state hash as me in their UTXO-state-hash-history. (trust the network)
- Progressively download an UTXO state+chain from longer ago, and keep verifying forward. This can be done in small steps or big steps, all the way until the genesis block, gaining more and more trust, the older the UTXO state. (verify myself)

What reason would there be not to trust its validity? Please sketch a scenario.
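A minimal sketch of the "verify forward" step, under stated assumptions: blocks are plain dicts, each block commits to the hash of the UTXO set as it stood before that block, and `utxo_hash` and `verify_forward` are hypothetical helpers. Signatures, amounts and proof of work are left out entirely.

```python
import hashlib

def utxo_hash(utxos: dict) -> str:
    """Hash a UTXO set given as {outpoint: amount}; sorted so the result is order independent."""
    h = hashlib.sha256()
    for outpoint, amount in sorted(utxos.items()):
        h.update(f"{outpoint}={amount};".encode())
    return h.hexdigest()

def verify_forward(snapshot: dict, blocks: list) -> dict:
    """Replay blocks on top of an untrusted snapshot, checking every commitment.

    Each block is a dict with "prev_utxo_hash", "spends" (list of outpoints)
    and "creates" ({outpoint: amount}). Raises ValueError on the first mismatch.
    """
    utxos = dict(snapshot)
    for height, block in enumerate(blocks):
        if block["prev_utxo_hash"] != utxo_hash(utxos):
            raise ValueError(f"block {height}: UTXO commitment mismatch")
        for outpoint in block["spends"]:
            if outpoint not in utxos:
                raise ValueError(f"block {height}: spends unknown output {outpoint}")
            del utxos[outpoint]
        utxos.update(block["creates"])
    return utxos

snapshot = {"a:0": 50}
b1 = {"prev_utxo_hash": utxo_hash(snapshot), "spends": ["a:0"], "creates": {"b:0": 50}}
b2 = {"prev_utxo_hash": utxo_hash({"b:0": 50}), "spends": [], "creates": {"c:0": 25}}

final = verify_forward(snapshot, [b1, b2])
print(final)

try:
    verify_forward({"a:0": 50, "evil:0": 1000}, [b1, b2])  # tampered snapshot
except ValueError as err:
    print("rejected:", err)
```

The check only shows that the snapshot matches what the downloaded blocks commit to. Whether those commitments were themselves built on a valid history is a separate question.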

As vjudeu pointed out, you could get a false UTXO state and chain being sent to you (from an altcoin for example), but you can query other nodes to quickly verify if you're dealing with a false UTXO state, to gain trust. You can also progressively download older UTXO states+chain from other nodes, to gain trust. You can also look at the difficulty of the chain, as altcoins have lower mining difficulty.
legendary
Activity: 1456
Merit: 1177
Always remember the cause!
If this works out, it will eliminate the need for a full-chain initial block download. It also makes it possible to significantly increase block size (since nobody needs to store the full-chain anymore, everyone can be a pruned node).

If everybody becomes a pruned node then nobody's going to have a copy of all of the blocks, so all older blocks will be lost.
So what? Once there is consensus about the state of the blockchain in a specific point of time, why should anybody care about the ledger details? It is established practice in banking and accounting systems,  they summarize accounts in balance sheets for using them as the basis for transactions thereafter, it is efficient and secure with no practical alternative available.

Unfortunately, despite its obvious decentralization advantages because of promoting full nodes, for some mysterious reason, UTXO commitment being used as a means for getting rid of unnecessary IBD, is considered nothing less than a blasphemy by most of the core devs.

As of my chat log, I don't find a serious argument, let alone a correct one, that justifies such an obsession with "history", it'll be highly appreciated if you'd kindly present me with just one example.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
If everybody becomes a pruned node then nobody's going to have a copy of all of the blocks, so all older blocks will be lost.
That is exactly the goal yes, to make blocks mined 10 years ago obsolete to download/verify.
Also if even a handful of nodes are non-pruned that's still going to be a problem since each node has on average 40 to 50 peers. With over 10000 nodes in the network, each node needs at least one peer that has the full blockchain. However, the nodes can be stopped anytime/have network or system failures and that'll cause other nodes to lose their only peer with all the blocks.
Nodes do not need at least one peer with the full blockchain.

There is a paradox in what you are claiming here and in the proposal in your OP, and that is if you do not store the old blocks, you also cannot store their UTXO hashes in them. This means the entire guarantee that the block is valid is lost. Do you see the problem here? On paper the historic block can be assumed to be valid, but the only way you're going to get these blocks is from other peers, who might break the validity guarantee which is why it's important the old blocks are stored somewhere.
newbie
Activity: 4
Merit: 1
If everybody becomes a pruned node then nobody's going to have a copy of all of the blocks, so all older blocks will be lost.
That is exactly the goal yes, to make blocks mined 10 years ago obsolete to download/verify.
Also if even a handful of nodes are non-pruned that's still going to be a problem since each node has on average 40 to 50 peers. With over 10000 nodes in the network, each node needs at least one peer that has the full blockchain. However, the nodes can be stopped anytime/have network or system failures and that'll cause other nodes to lose their only peer with all the blocks.
Nodes do not need at least one peer with the full blockchain.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
If this works out, it will eliminate the need for a full-chain initial block download. It also makes it possible to significantly increase block size (since nobody needs to store the full-chain anymore, everyone can be a pruned node).

If everybody becomes a pruned node then nobody's going to have a copy of all of the blocks, so all older blocks will be lost.

Also if even a handful of nodes are non-pruned that's still going to be a problem since each node has on average 40 to 50 peers. With over 10000 nodes in the network, each node needs at least one peer that has the full blockchain. However, the nodes can be stopped anytime/have network or system failures and that'll cause other nodes to lose their only peer with all the blocks.
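A rough back-of-the-envelope check of the peer argument, assuming each peer is picked uniformly at random from the network (a simplification; real peer selection is not uniform). The function name and the counts of archival nodes are made-up illustration values; 10000 nodes and about 45 peers come from the figures above.

```python
def p_has_archival_peer(total_nodes: int, archival_nodes: int, peers: int) -> float:
    """Probability that at least one of `peers` random peers stores all blocks."""
    fraction = archival_nodes / total_nodes
    return 1.0 - (1.0 - fraction) ** peers

for archival in (5, 100, 1000):
    p = p_has_archival_peer(10_000, archival, 45)
    print(f"{archival:>5} archival nodes -> {p:.1%} chance of having one as a peer")
```

With only a handful of archival nodes, the chance of being connected to one at any given moment comes out at about 2% under these assumptions.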
newbie
Activity: 4
Merit: 1
So you put your trust in other nodes. The whole point is that you should always be able to verify everything by yourself. Here, you trust that a group of nodes does not have some chain of invalid blocks created out of thin air. For example, some nodes could build a chain on top of some invalid block. If that block will be below your pruning point, you will trust that all blocks are valid, even if they would be after 30th halving and most of the coins will belong to some group of miners. You could even fall into some altcoin land in this way and be convinced that you use Bitcoin.
What makes you trust the genesis block of bitcoin, and consequently all blocks after that? It is the amount of PoW (difficulty) and the chain size. Imagine you download the UTXO set of roughly 3 years ago, with roughly 100GB of blocks since that moment to present. Why would a chain of 100GB of blocks, having a UTXO hash in every block, be less trustworthy? The block difficulty is so high. It would be the same as joining bitcoin 3 years after its launch.

Altcoins are altcoins because they use different algo's. If not, they would just be bitcoin. The attack you describe is something like a 51% attack.

What would stop an attacker currently to create a longer chain with extremely low mining difficulty from distributing that as the 'real' bitcoin?

No. If everyone is in pruned mode, then introducing new people to the system is impossible, because they can no longer verify everything. And that breaks backward-compatibility. The current version of Bitcoin Core requires downloading everything at least once. If all future version nodes will use pruning, then running the current version without any node with historical data will push you out of the network and will force you to trust other people, which is unacceptable. You can use pruning if you want, but you should never be forced to do so.
You are right about the backwards compatibility, so this proposal would indeed not eliminate the need for full nodes to exist, it just gives a possibility to gain some trust in your downloaded UTXO set without checking the full chain before you can get started.


The problem with block size limit is more than storage capacity and IBD time. I'll just quote my older post

You're missing the point. The Bitcoin community mostly agrees that running a full node should be cheap, which means the block size is limited by the growth rate of hardware and internet connections.

IMO blockchain size and internet connection aren't the worst part, but rather CPU, RAM and storage speed:
1. Can the CPU verify transactions and blocks in real time?
2. How much RAM is needed to store all the cached data?
3. Can the storage handle intensive reads/writes? Ethereum already suffers from this problem.

The blockchain size will reach 500GB soon, requiring new nodes to download and process 500GB before they can trust anything... The number of full nodes is declining as well.
copper member
Activity: 909
Merit: 2301
Quote
If in every block, a hash of the UTXO-Set / UTXO-state (after applying the TX's of the previous block) is included, this would eliminate the NEED for full-nodes (and an initial block download for new nodes). This way, only the latest N blocks and latest N UTXO-sets need to be stored & distributed among peers.
You cannot rely on that hash being correct and not being just a random hash or invalid hash, creating some UTXO's out of thin air. You have to validate everything at least once and then you can trust your own data.

Quote
Miners cannot include a false UTXO hash in a block because it will be rejected by nodes (This will require nodes and miners to hash the UTXO of the previous block for every new block to mine/verify).
So you put your trust in other nodes. The whole point is that you should always be able to verify everything by yourself. Here, you trust that a group of nodes does not have some chain of invalid blocks created out of thin air. For example, some nodes could build a chain on top of some invalid block. If that block will be below your pruning point, you will trust that all blocks are valid, even if they would be after 30th halving and most of the coins will belong to some group of miners. You could even fall into some altcoin land in this way and be convinced that you use Bitcoin.

Quote
If this works out, it will eliminate the need for a full-chain initial block download.
It does not. Initial block download should be done at least once; then you can trust your own computer and, for example, use pruning or move historical data to another drive. But you have to do it once, otherwise you would use some coins without making sure that they really exist. Even worse: if you were convinced that you had received some coins and signed some transactions, that could potentially be used to steal real coins from you.

Quote
It also makes it possible to significantly increase block size (since nobody needs to store the full-chain anymore, everyone can be a pruned node).
No. If everyone is in pruned mode, then introducing new people to the system is impossible, because they can no longer verify everything. And that breaks backward-compatibility. The current version of Bitcoin Core requires downloading everything at least once. If all future version nodes will use pruning, then running the current version without any node with historical data will push you out of the network and will force you to trust other people, which is unacceptable. You can use pruning if you want, but you should never be forced to do so.

Quote
But the primary method is that you don't need others to verify, you can verify yourself
You cannot verify everything by yourself, that's the main problem with your proposal. You are forced to trust miners and nodes.
newbie
Activity: 4
Merit: 1
I know this idea is known already, and some small discussions can be found on the mailing list and in previous topics, however I have not seen a strong conclusion that it is not possible or feasible.
I see AssumeUTXO is an experiment trying to achieve the same thing, also looking to MAYBE include UTXO hashes in blocks in the far future....

If in every block, a hash of the UTXO-Set / UTXO-state (after applying the TX's of the previous block) is included, this would eliminate the NEED for full-nodes (and an initial block download for new nodes). This way, only the latest N blocks and latest N UTXO-sets need to be stored & distributed among peers.

Most nodes can store the UTXO data, which does not grow rapidly. Verification of a freshly downloaded UTXO set (by new nodes) happens by looking at the UTXO-hash in the corresponding block on the chain. Miners cannot include a false UTXO hash in a block because it will be rejected by nodes (This will require nodes and miners to hash the UTXO of the previous block for every new block to mine/verify).
While the UTXO set is 4GB or so, a solution can be made to make it easier for nodes and miners to hash the 4GB UTXO set. Merkle Tree? Keeping most hashes the same every new block, submitting the tree-root UTXO hash of previous block in a new block. Some thoughts were already being exchanged on what the most useful format/data-structure for this would be: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/013590.html Note that a Merkle Tree does not need to be a Binary Tree.
In short, the data-structure should be specialized to hash a big file with minor (random?) local changes, I believe such a thing can be made.
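As a rough illustration of "keeping most hashes the same every new block", here is a sketch of a binary Merkle tree where changing one leaf rehashes only the path to the root. It assumes a fixed, power-of-two number of leaves and made-up leaf contents. A real UTXO set also needs insertions and deletions, which this toy ignores.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleTree:
    """Fixed-size binary Merkle tree stored as a flat array (leaf count must be a power of two)."""

    def __init__(self, leaves):
        n = len(leaves)
        assert n and n & (n - 1) == 0, "leaf count must be a power of two"
        self.n = n
        # nodes[1] is the root, nodes[n:] are the leaf hashes, nodes[0] is unused.
        self.nodes = [b""] * n + [h(leaf) for leaf in leaves]
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])

    @property
    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> int:
        """Replace one leaf, rehash only its path to the root, return the number of hashes computed."""
        i = self.n + index
        self.nodes[i] = h(leaf)
        hashes = 1
        i //= 2
        while i >= 1:
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
            hashes += 1
            i //= 2
        return hashes

leaves = [f"utxo-{k}".encode() for k in range(8)]
tree = MerkleTree(leaves)
cost = tree.update(3, b"utxo-3-spent")

rebuilt = MerkleTree(leaves[:3] + [b"utxo-3-spent"] + leaves[4:])
print(tree.root == rebuilt.root, cost)
```

The number of hashes per changed leaf grows with the logarithm of the leaf count, so every spent or created output costs a full root-to-leaf path of updates, and that cost scales with the size of the set.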
 
If this works out, it will eliminate the need for a full-chain initial block download. It also makes it possible to significantly increase block size (since nobody needs to store the full-chain anymore, everyone can be a pruned node).

Picture this: You are a new node joining the network (currently you need to download and process the full blockchain, starting from the genesis block, to verify your UTXO download).
The way to validate your UTXO download, is by looking at the size of the chain going forward. A freshly downloaded UTXO set from 1 block ago will not have much PoW spent on it, thus a new node can be less sure it wasn't modified by an evil node from which he downloaded it. An UTXO set of 500 blocks ago will have much more PoW spent on it. Trust comes from the age of the UTXO state. The older the UTXO state, the harder to modify, since the PoW chain is longer, and the chain contains UTXO state hashes which have to be correct each block.
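The "older state, more PoW on top" idea can be put into numbers by summing the expected work of the blocks built on a snapshot. The per-block estimate below is the usual expected hash count for a given target; the specific targets and block counts are made-up illustration values, and `work_on_top` is a hypothetical helper.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a block at this target."""
    return 2**256 // (target + 1)

def work_on_top(targets) -> int:
    """Total expected work of the blocks built after a snapshot."""
    return sum(block_work(t) for t in targets)

DIFF1_TARGET = 0xFFFF * 2**208          # target corresponding to difficulty 1
hard_target = DIFF1_TARGET // 10**12    # hypothetical high-difficulty target

one_block = work_on_top([hard_target] * 1)
five_hundred = work_on_top([hard_target] * 500)
long_easy_chain = work_on_top([DIFF1_TARGET] * 150_000)

print(five_hundred > one_block)          # an older snapshot has more work behind it
print(five_hundred > long_easy_chain)    # work matters, not the number of blocks
```

This measures how expensive it would be to forge the blocks on top of a snapshot. It says nothing about whether the rules were followed before the snapshot.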

Another, additional way of adding trust to a downloaded UTXO is to query other nodes if they have the same hash for an UTXO state, corresponding to a block's timestamp, the more nodes confirm, the more trust it would add. (But the primary method is that you don't need others to verify, you can verify yourself).
