
Topic: Methods for cutting old blocks from the blockchain (Read 427 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
~
But UTXO commitments have a niche advantage, such as enabling users to send Bitcoin while their full node wallet downloads and verifies the whole blockchain.

that specific issue has multiple other ways to be solved. for instance a user in a rush to spend their outputs could use bloom filters or other SPV-specific methods, such as the way Electrum nodes work, to fetch their UTXOs.

And that's exactly why I used the keyword "niche".

It's pretty similar, and it's called a UTXO commitment. The key differences are that miners only submit the hash of the UTXO set (or the root hash of a UTXO merkle tree) and don't need to publish the UTXO set on a specific website.

I just looked at the UTXO commitment technology. In my opinion, a UTXO dump should also contain the list of block headers, starting with the GENESIS block or at least 2 years prior to the set date. Otherwise an attacker could create a malicious UTXO dump, mine a relatively short blockchain with fake timestamps and publish this file in the future. Users should be able to estimate the amount of Proof-of-Work that has been performed both before and since the publication of the UTXO dump.

The latest UTXO dump can be publicly hosted on a specific website, in torrents or on the Bitcoin network.

I don't see the point, since a full node client downloads the block headers first, before downloading the whole blockchain.

Besides, dumping the UTXO set on a specific website or torrent would introduce a single point of failure. Ensuring there's a deterministic way to create the UTXO set and store it locally is a better option IMO.
legendary
Activity: 3472
Merit: 10611
I did not know about it. I thought that the full node client downloads the entire block #2, validates it, then downloads the entire block #3, and so on.
it depends on which block the node is downloading. if it is during the initial synchronization (downloading blocks from 1 up to today's) then the entire block has to be downloaded. if it is the new blocks that are created while the node is already synced and has a mempool, then there are, let's call them, "optimizations" in place to reduce the amount of data being downloaded. basically the node downloads the block header and then checks which transactions are not already in its mempool. usually all of them are there, so there is no need to re-download them.
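that header-first flow can be sketched roughly like this (the function names and the `mempool` dict here are made up for illustration; real nodes use the compact block relay protocol, which is more elaborate):

```python
import hashlib

def txid(raw_tx: bytes) -> str:
    # Bitcoin txids are the double-SHA256 of the serialized transaction
    return hashlib.sha256(hashlib.sha256(raw_tx).digest()).hexdigest()

def assemble_block(announced_txids, mempool):
    """Given the txids listed in a new block announcement and our local
    mempool (txid -> raw tx), return the txs we already have and the
    txids we still need to request from peers."""
    have, missing = [], []
    for t in announced_txids:
        if t in mempool:
            have.append(mempool[t])
        else:
            missing.append(t)
    return have, missing

# toy usage: two of three announced txs are already in the mempool
mempool = {txid(b"tx-a"): b"tx-a", txid(b"tx-b"): b"tx-b"}
announced = [txid(b"tx-a"), txid(b"tx-b"), txid(b"tx-c")]
have, missing = assemble_block(announced, mempool)
# only tx-c would have to be downloaded over the network
```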

Quote
~
Perhaps this is the main disadvantage of this feature. If most node clients are able to load and create a UTXO dump, far fewer people will share the old Bitcoin blocks on their computers.
maybe not. i pointed out some other disadvantages (having to trust third parties) so many bitcoin users will never use such a feature ever.
legendary
Activity: 2618
Merit: 2304
I don't see the point, since a full node client downloads the block headers first, before downloading the whole blockchain.

I did not know about it. I thought that the full node client downloads the entire block #2, validates it, then downloads the entire block #3, and so on. I wanted to point out that users who load a UTXO dump should validate all block headers starting with the GENESIS block.

To be honest, I do not believe that the second method will ever be implemented, because Bitcoin miners will not approve such a consensus change. Therefore, the first method seems more feasible to me.



some people will prefer to rely on a certain person (or persons) whom they trust,
you don't see the problem with that?

I see the problem, but I think it is inevitable. Imagine the Bitcoin blockchain has 5 million blocks. Network synchronization would take a huge amount of time and resources. I guess that people will personally download and validate new Bitcoin blocks once and then create a UTXO dump for themselves (or their close friends) to load as a snapshot in the future.



Quote
2) would the amount of full nodes be lower if UTXO commitments were introduced?
if any full node implementation of Bitcoin other than bitcoin core added such a feature, then no.

Perhaps this is the main disadvantage of this feature. If most node clients are able to load and create a UTXO dump, far fewer people will share the old Bitcoin blocks on their computers.
legendary
Activity: 2898
Merit: 1823
What would be the point of Bitcoin if you're not verifying/validating everything for yourself? For convenience, use an SPV wallet, but don't fool yourself into believing that you're actually/truly using Bitcoin.
legendary
Activity: 3472
Merit: 10611
What we would get basically with UTXO commitments is a third class of nodes besides full nodes (including pruned ones) and SPV nodes. This class would not be as useful to request blockchain data from, as they only host a part of it, and they cannot guarantee in the same way that their data is correct.
i guess if an implementation of Bitcoin protocol added this feature as an option then it wouldn't be the worst thing.

some people will prefer to rely on a certain person (or persons) whom they trust,
you don't see the problem with that?
legendary
Activity: 2618
Merit: 2304
instead of downloading the entire blockchain, you can use PRUNE mode
to prune means "reduce the extent of (something) by removing superfluous or unwanted parts." and that is exactly what it does, it "removes". it does NOT skip.
in other words you still have to download the entire blockchain, build the UTXO database and then remove the old blocks.

Quote
and online blockchain explorers,
you can not include this as an alternative because it is centralized.

I forgot that PRUNE mode simply deletes obsolete data which has already been downloaded from the Bitcoin network, so yes, this is not a proper solution to cut old blocks from the blockchain. I also agree that online blockchain explorers cannot be considered an alternative.


Quote
a certain trusted person who has an excellent reputation in the cryptocurrency community
this goes against everything that bitcoin stands for.
in bitcoin you are supposed to "verify" "everything" "yourself" instead of relying on any "third party". if people want to rely and trust someone else then they should use banks and fiat instead of bitcoin.

You're right, but the blockchain size is rapidly growing day by day, so I assume that some people will prefer to rely on a certain person (or persons) whom they trust, instead of downloading and verifying the entire multi-gigabyte blockchain, which takes much time and consumes a lot of computational resources.


Quote
If other miners do not agree that all the data included in this file is correct, they will not continue to mine this branch, so the mined block will become an orphan.
in other words every month we would get a huge amount of drama, with pools trying to damage other pools and take their block rewards, in a competition over hundreds of thousands of dollars, simply by rejecting their blocks!

A UTXO dump should be standardized in accordance with the format described. All tables are sorted and serialized by one algorithm, so it is impossible to create two different UTXO dumps for the same Bitcoin block. If one mining pool merely rejects a new block for no solid reason and tries to mine its own branch, it will lag behind other mining pools which have accepted this valid block containing the UTXO dump hash and continued to work on the main branch.

By the way, in such a scheme, miners should create the UTXO dump based on the block which was added to the blockchain, for example, 12 blocks back (i.e. about 2 hours ago), so they will have enough time to calculate the SHA256 hash in advance. Then, one of the lucky mining pools will write the hash of this UTXO dump in the input of the COINBASE transaction.



It's pretty similar, and it's called a UTXO commitment. The key differences are that miners only submit the hash of the UTXO set (or the root hash of a UTXO merkle tree) and don't need to publish the UTXO set on a specific website.

I just looked at the UTXO commitment technology. In my opinion, a UTXO dump should also contain the list of block headers, starting with the GENESIS block or at least 2 years prior to the set date. Otherwise an attacker could create a malicious UTXO dump, mine a relatively short blockchain with fake timestamps and publish this file in the future. Users should be able to estimate the amount of Proof-of-Work that has been performed both before and since the publication of the UTXO dump.
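For reference, the amount of Proof-of-Work behind a chain of headers can be estimated from the nBits field of each header alone. A minimal sketch of that calculation (using the standard expected-work formula; the header list here is made up):

```python
def bits_to_target(bits: int) -> int:
    # decode the compact nBits representation into the full 256-bit target
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))

def block_work(bits: int) -> int:
    # expected number of hashes to find a block at this target,
    # the same formula Bitcoin Core uses: 2^256 / (target + 1)
    return (1 << 256) // (bits_to_target(bits) + 1)

def chain_work(header_bits: list[int]) -> int:
    # total expected number of hashes behind the whole header chain
    return sum(block_work(b) for b in header_bits)

# 0x1d00ffff is the genesis difficulty: roughly 2^32 hashes per block,
# so ten such headers represent about 10 * 2^32 hash operations
work = chain_work([0x1D00FFFF] * 10)
```

An attacker with a short fake chain simply cannot make this sum large without actually performing the work.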

The latest UTXO dump can be publicly hosted on a specific website, in torrents or on the Bitcoin network.



The problem seems to be (I'm not an expert!) that they're not trivial to implement. The hard problem seems to be the size of the UTXO set, which is several GB large (as of today, 65 million UTXOs with 3.7 GB, see statoshi).

In my opinion, 3.7 gigabytes is not that much. I am wondering what format they used to serialize their UTXO set.
legendary
Activity: 3906
Merit: 6249
Decentralization Maximalist
there is a much bigger problem that is oftentimes neglected, and that is the fact that such a change goes against one of the main principles of bitcoin. in bitcoin the only thing we hardcode and "trust" is the genesis block. literally everything else from block #0 to the current block is verified by a full node.
Good point, from my interpretation it changes the security model from "trusting only the genesis block" into a concept more similar to the "finalization" model ("after X confirmations a transaction should be stable"), with X being the depth of the last block we don't verify ourselves but rely on an UTXO commitment.

What we would get basically with UTXO commitments is a third class of nodes besides full nodes (including pruned ones) and SPV nodes. This class would not be as useful to request blockchain data from, as they only host a part of it, and they cannot guarantee in the same way that their data is correct.

What we would have to ask then, is:

1) is this third class of nodes more useful for the network than SPV nodes are (whose contribution from my understanding is almost zero)?
2) would the amount of full nodes be lower if UTXO commitments were introduced?
3) would it give advantages regarding SPV to the users?

I think question 3 is almost sure to have a positive answer, as the "UTXO commitment-following" nodes (is there an accepted term for them?) are not subject to the well-known SPV weaknesses. For the average user, however, SPV clients may be enough.

To answer Question 1, you have to evaluate whether the potential danger these nodes could represent (which should be very low, because block headers are always verified by all nodes, but is not entirely zero) outweighs the potential advantage of having more nodes from which recent blocks and transactions can be queried. Question 2 will likely have a positive answer, but as long as there is no shortage of "real full nodes", it's perhaps not dangerous. The extent of the decrease of "real full nodes" also depends on the computational heaviness of verifying UTXO commitments, so this may be one of the main challenges to overcome.

(By the way: this is the original UTXO commitment discussion - will read a bit there)

All the potential problems would apply, obviously, also to the ideas the OP presented.
legendary
Activity: 3472
Merit: 10611
~
But UTXO commitments have a niche advantage, such as enabling users to send Bitcoin while their full node wallet downloads and verifies the whole blockchain.

that specific issue has multiple other ways to be solved. for instance a user in a rush to spend their outputs could use bloom filters or other SPV-specific methods, such as the way Electrum nodes work, to fetch their UTXOs.
legendary
Activity: 3472
Merit: 10611
The problem seems to be (I'm not an expert!) that they're not trivial to implement.

there is a much bigger problem that is oftentimes neglected, and that is the fact that such a change goes against one of the main principles of bitcoin. in bitcoin the only thing we hardcode and "trust" is the genesis block. literally everything else from block #0 to the current block is verified by a full node.
if we do a UTXO commitment then we can no longer call any node that downloads such a commitment and starts from it a real Full Node, because it will be skipping years of verification and is then trusting everyone else who has been around not to have conspired. so another question arises: if someone is willing to skip verification, then why don't they use SPV nodes? they also skip verification and rely on other nodes to have done it properly already. keep in mind that we have no shortage of full nodes.
legendary
Activity: 3906
Merit: 6249
Decentralization Maximalist
UTXO commitments (which are the decentralized variant of the OP's last method, as ETFBitcoin already wrote) are indeed something I would love to see in Bitcoin. They have been discussed for some years already (at least since 2015), so people could ask why they're not implemented yet.

The problem seems to be (I'm not an expert!) that they're not trivial to implement. The hard problem seems to be the size of the UTXO set, which is several GB large (as of today, 65 million UTXOs with 3.7 GB, see statoshi). It should be possible for full nodes, not only for miners, to verify UTXO commitments. A standard merkle tree would take a lot of resources to compute, according to this Stackexchange thread, which would basically limit the full verification process to nodes with strong computers, and thus not help the average Bitcoin full node owner.

So an efficient, incremental algorithm would be needed. I wonder if there has been some progress on that? A method is included in the Stackexchange answer (adding two commitments: added UTXOs and spent UTXOs), but it's difficult to find information about more recent discussions/developments.
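To illustrate the "two commitments" idea, here is a toy sketch: keep one running value for all UTXOs ever added and one for all UTXOs ever spent, each updatable in O(1) per output. I use XOR only to make the example short and order-independent; a real scheme would need a collision-resistant construction (e.g. a proper multiset hash), so this is emphatically not a secure design:

```python
import hashlib

def h(data: bytes) -> int:
    # hash an outpoint identifier to a 256-bit integer
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

class IncrementalCommitment:
    """Toy model of incremental UTXO commitments: one accumulator for
    added outputs, one for spent outputs. XOR is NOT collision-resistant;
    it only demonstrates the incremental-update structure."""
    def __init__(self):
        self.added = 0
        self.spent = 0

    def add_utxo(self, outpoint: bytes):
        self.added ^= h(outpoint)

    def spend_utxo(self, outpoint: bytes):
        self.spent ^= h(outpoint)

    def utxo_set_digest(self) -> int:
        # added XOR spent cancels out every spent output,
        # leaving a digest of exactly the unspent set
        return self.added ^ self.spent

c = IncrementalCommitment()
c.add_utxo(b"tx1:0")
c.add_utxo(b"tx2:1")
c.spend_utxo(b"tx1:0")   # tx1:0 cancels out of the digest
# the digest now commits only to tx2:1
```

The point is that each block only touches its own created and spent outputs, so the commitment never requires rehashing the whole multi-gigabyte set.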

While I've also read that it needs a hard fork, I'm not too sure about that. It should be possible to commit the hash in a way old clients can ignore, so it may be softfork-compatible.

A kind of "Poor Man's UTXO Commitment" with some minimal trust involved, to improve the security of SPV clients, would be interesting at first glance, but I have no idea how sybil attacks on it could be prevented.
legendary
Activity: 2898
Merit: 1823
instead of downloading the entire blockchain, you can use PRUNE mode

to prune means "reduce the extent of (something) by removing superfluous or unwanted parts." and that is exactly what it does, it "removes". it does NOT skip.
in other words you still have to download the entire blockchain, build the UTXO database and then remove the old blocks.


Plus I would like to highlight this for newbies. You are not merely downloading the blockchain, you are VALIDATING it for YOURSELF.

I believe that's the function newbies sometimes miss.
legendary
Activity: 3472
Merit: 10611
instead of downloading the entire blockchain, you can use PRUNE mode
to prune means "reduce the extent of (something) by removing superfluous or unwanted parts." and that is exactly what it does, it "removes". it does NOT skip.
in other words you still have to download the entire blockchain, build the UTXO database and then remove the old blocks.
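a toy model of that "download and validate everything, then delete" behavior (not Bitcoin Core's actual code; the data layout here is invented for illustration):

```python
PRUNE_DEPTH = 288  # keep roughly the last two days of raw blocks

def sync_with_pruning(blocks):
    """Toy model of pruned sync. Every block is still downloaded and
    applied to the UTXO set -- nothing is skipped -- but raw block data
    older than PRUNE_DEPTH is deleted afterwards.
    'blocks' is a list of (height, spent_outpoints, created_outpoints)."""
    utxo_set = set()
    stored_blocks = {}
    for height, spent, created in blocks:
        utxo_set -= set(spent)            # full validation still happens
        utxo_set |= set(created)
        stored_blocks[height] = (spent, created)
        # prune: drop raw blocks that are now too deep
        for old in [h for h in stored_blocks if h <= height - PRUNE_DEPTH]:
            del stored_blocks[old]
    return utxo_set, stored_blocks

# 400 toy blocks, each creating one output
chain = [(ht, [], ["tx%d:0" % ht]) for ht in range(1, 401)]
utxos, stored = sync_with_pruning(chain)
# the UTXO set covers all 400 blocks, but only the newest 288 stay on disk
```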

Quote
and online blockchain explorers,
you can not include this as an alternative because it is centralized.

Quote
a certain trusted person who has an excellent reputation in the cryptocurrency community
this goes against everything that bitcoin stands for.
in bitcoin you are supposed to "verify" "everything" "yourself" instead of relying on any "third party". if people want to rely and trust someone else then they should use banks and fiat instead of bitcoin.

Quote
If other miners do not agree that all the data included in this file is correct, they will not continue to mine this branch, so the mined block will become an orphan.
in other words every month we would get a huge amount of drama, with pools trying to damage other pools and take their block rewards, in a competition over hundreds of thousands of dollars, simply by rejecting their blocks! need i remind you of how long it took to reach >95% agreement for the SegWit fork in 2017? hint: it was a couple of years.

similar to the last method, this one also adds centralization to bitcoin.
legendary
Activity: 2618
Merit: 2304
Reserved.
legendary
Activity: 2618
Merit: 2304
The blockchain size is constantly increasing, and as a result, it is necessary to store a large amount of data on the computers of cryptocurrency holders, so I believe that minimizing the blockchain is a matter of current interest. Many users do not want to download the entire multi-gigabyte blockchain, which has been created over the years of the cryptocurrency's existence, for the sole purpose of sending several transactions securely. Therefore, I propose to publish here various methods for cutting old blocks from the blockchain.

Of course, in any case the entire blockchain is necessary for the full verification and validation of all transactions that have been performed since the mining of the GENESIS block, as well as for the analysis of coin transfers from one address to another. Alternatively, instead of downloading the entire blockchain, you can use PRUNE mode or online blockchain explorers, but these solutions introduce new vulnerabilities due to the possible inaccessibility of the website, phishing and hacking of DNS servers. I am talking about alternative methods for storing the current state of cryptocurrency addresses on a client-side computer.



The first method for Bitcoin

According to Bitcoin technology, to perform a transaction, you must know the SHA256D hash of the previous transaction and the index number of the unspent output in this transaction. Therefore, in order to minimize the blockchain size, it is only necessary to store the structures and data related to all unspent outputs (UTXOs) and the hashes of the Bitcoin transactions which contain these outputs, along with the outputs' index numbers within those transactions.


A sample dump contains three tables:


1. The list of Bitcoin transactions which contain at least one unspent output:
1) Structure of the transaction 1:
  • SHA256 hash of the transaction (32 bytes)
  • transaction parameters (lock_time, SegWit data, etc)
2) Structure of the transaction 2
...

The table is sorted by the ascending "SHA256 hash of the transaction" field which has the little-endian byte order.


2. The list of unspent outputs UTXO:
1) Structure of the output 1:
  • output script
  • amount of BTC coins in the output
  • index number of the transaction (according to the list above)
  • index number of the output in this transaction
2) Structure of the output 2
...

The table is sorted in ascending order, firstly by the field "index number of the transaction", secondly by the field "index number of the output". Both fields have the big-endian byte order.


3. The list of block headers:
1) Structure of the header 1:
  • index number of the block in the blockchain
  • block header
2) Structure of the header 2
...

The table is sorted by the ascending "index number of the block" field which has the big-endian byte order. In fact, this table is a chain of block headers.


In addition, the dump contains the following fields:
  • dump version
  • cryptocurrency name

The "dump version" field has the big-endian byte order.
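The tables above can be serialized deterministically; a hypothetical sketch follows (the exact field widths and layout are my own guesses, not a finalized format). The important property is that fixed sorting and serialization rules make two honest dumps for the same block byte-identical:

```python
import hashlib
import struct

def serialize_dump(txs, outputs, headers, version=1, coin=b"Bitcoin"):
    """Hypothetical serializer for the three tables described above.
    txs:     list of 32-byte txids (stored little-endian, sorted ascending)
    outputs: list of (tx_index, out_index, script, amount_sats)
    headers: list of (block_height, 80-byte header)"""
    blob = struct.pack(">I", version) + coin           # big-endian version
    for txid in sorted(txs):                           # table 1: by txid bytes
        blob += txid
    # table 2: by (tx index, output index), both big-endian
    for tx_i, out_i, script, amount in sorted(outputs, key=lambda o: (o[0], o[1])):
        blob += struct.pack(">II", tx_i, out_i)
        blob += struct.pack("<Q", amount) + script
    for height, header in sorted(headers):             # table 3: by height
        blob += struct.pack(">I", height) + header
    return blob

def dump_hash(blob: bytes) -> str:
    # the SHA256 hash that identifies the dump
    return hashlib.sha256(blob).hexdigest()
```

Because both the sorting and the serialization are fixed, feeding the same data in any input order yields the same bytes, and therefore the same hash.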


The essence of the method for cutting old blocks from the blockchain is that, for example, every month a certain trusted person, who has an excellent reputation in the cryptocurrency community, creates the list of unspent transaction outputs described above and calculates the SHA256 hash of this dump. Then he signs a standardized comment containing the size and the hash of the created file with his ECDSA secp256k1 key and publishes this dump on some well-known website. Users download this file, verify the digital signature and load the dump into a special version of Bitcoin Core. Afterwards they download new blocks, starting with the last block number in the dump.

Thus, users will not need to download the entire blockchain to their computers, and the correctness of the data will rest on trust in the person who created the list of unspent transaction outputs and signed it with his ECDSA key. Moreover, network synchronization will take minimal time.
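The client-side check could look something like this. The comment format is hypothetical, and verifying the ECDSA secp256k1 signature on the comment itself (which would need a library such as coincurve) is omitted; the sketch only shows the size/hash check against the downloaded file:

```python
import hashlib

def make_signed_comment(dump_bytes: bytes) -> str:
    # the standardized comment the trusted person would sign
    # (format invented for this example)
    return "utxo-dump size=%d sha256=%s" % (
        len(dump_bytes), hashlib.sha256(dump_bytes).hexdigest())

def check_dump_against_comment(dump_bytes: bytes, comment: str) -> bool:
    """Recompute the size and SHA256 hash of the downloaded file and
    compare them to the values stated in the signed comment."""
    fields = dict(part.split("=") for part in comment.split()[1:])
    return (int(fields["size"]) == len(dump_bytes)
            and fields["sha256"] == hashlib.sha256(dump_bytes).hexdigest())

dump = b"example dump contents"
comment = make_signed_comment(dump)
ok = check_dump_against_comment(dump, comment)            # genuine file
bad = check_dump_against_comment(dump + b"tampered", comment)  # altered file
```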



The second method for Bitcoin

It's the same scheme, but the difference is that, for example, every 4320 blocks (i.e. approximately every month) Bitcoin miners add a standardized comment to the input script of the COINBASE transaction. This comment should contain the size and the SHA256 hash of the file, which is a list of unspent UTXO outputs and is published on some well-known website. This will mean that the miner confirms that the published list is correct and can be safely used by users on their computers. If other miners do not agree that all the data included in this file is correct, they will not continue to mine this branch, so the mined block will become an orphan.
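A sketch of how such a commitment might be embedded in, and recovered from, a coinbase input script. The marker and layout are entirely hypothetical (a real proposal would specify them exactly); the sketch only shows the mechanics:

```python
import hashlib

MARKER = b"UTXOSET"  # invented marker prefix for the standardized comment

def embed_commitment(dump_bytes: bytes) -> bytes:
    """Build the coinbase-input comment described above:
    marker, dump size (8 bytes big-endian) and the dump's SHA256 hash."""
    return (MARKER
            + len(dump_bytes).to_bytes(8, "big")
            + hashlib.sha256(dump_bytes).digest())

def extract_commitment(coinbase_script: bytes):
    """Scan a coinbase input script for the marker; return (size, hash),
    or None if the block carries no commitment."""
    i = coinbase_script.find(MARKER)
    if i < 0:
        return None
    start = i + len(MARKER)
    size = int.from_bytes(coinbase_script[start:start + 8], "big")
    return size, coinbase_script[start + 8:start + 40]

# toy usage: a pool tag followed by the commitment
script = b"/pool-tag/" + embed_commitment(b"dump")
found = extract_commitment(script)
```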

In this case, users who have downloaded and verified the list of unspent transaction outputs will rely on the amount of Proof-of-Work operations that were performed after the miner confirmed that the data included in this file is correct.



Actually, although the second method looks more convincing, it is practically unrealizable, because all the Bitcoin miners would need to approve such a consensus. But in the current situation, miners are indifferent to the texts written in the input of the COINBASE transaction, so they will mine any valid branch and will not check the correctness of the published file which contains the list of unspent outputs. Therefore, in my opinion, the first method for Bitcoin looks preferable.

However, the second method could be implemented in another cryptocurrency whose consensus rules would require miners to check the size and the hash of the dump of unspent outputs written in the input script of the COINBASE transaction. In addition, a full node would not need to download a published dump, because it would be able to compose such a file itself by the algorithm described above.



If you wish, propose other methods for cutting old blocks from the blockchain here, with a full description of your algorithm, and not only for Bitcoin. I will edit the second post and add links to your ideas and solutions.