
Topic: Efficient Blockchain Data Management (Read 230 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
With the default assumeValid, it's not fully verifying from scratch, since it only verifies historical scripts and signatures of the last few years. You're trusting the developers' claim that all earlier scripts are valid too.
Ever since I read about that, I've felt it goes against "verify, don't trust". You're downloading all blocks anyway, so how hard can it be (for your computer) to verify everything it downloads?
legendary
Activity: 990
Merit: 1108
When you say "from scratch", do you mean that you changed assumeValid to 0 (instead of defaultAssumeValid)?
With "from scratch" I meant that no blocks had been stored; the blocks and chainstate folders were empty. I didn't touch or use assumeValid in my bitcoin.conf.
With the default assumeValid, it's not fully verifying from scratch, since it only verifies historical scripts and signatures of the last few years. You're trusting the developers' claim that all earlier scripts are valid too.
hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
January 09, 2025, 01:54:52 PM
#20
When you say "from scratch", do you mean that you changed assumeValid to 0 (instead of defaultAssumeValid)?
With "from scratch" I meant that no blocks had been stored; the blocks and chainstate folders were empty. I didn't touch or use assumeValid in my bitcoin.conf.

In this post I listed a few other configuration details; I'm not entirely sure whether I had blocksonly=1 in my config. I would use that too for an IBD.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
January 09, 2025, 04:02:39 AM
#19
@OP: you're proposing something based on an inaccurate prediction of blockchain growth for the next 200 years. When I started with Bitcoin, I had a 512 GB spinning disk. Now I have 12 times more storage, and blockchain growth is not a problem. When needed, I prune it. Your proposal could work for running a node for your own use, but it makes it impossible for a new user to verify all blocks from the genesis block. And since that's the very basis of "Verify, don't Trust", it shouldn't be removed. It's like downloading a snapshot from a pruned node:
don't do this
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
January 09, 2025, 03:13:14 AM
#18
There is already a proposal for something like this, involving compact block rollups: it would require a way for the blockchain to bundle all UTXOs up to a carefully chosen date threshold into a single list, which Bitcoin Core clients would then serve in a new version as an alternative to downloading all of the blocks in that range.
legendary
Activity: 990
Merit: 1108
January 09, 2025, 02:33:48 AM
#17
finished a full IBD from scratch up to block 796033 in about 95h
When you say "from scratch", do you mean that you changed assumeValid to 0 (instead of defaultAssumeValid)?
hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
January 08, 2025, 09:19:35 PM
#16
~~~
Due to SegWit, the block size isn't limited to 1MB anymore. The current average block size is around 1.7MB, see e.g. https://ycharts.com/indicators/bitcoin_average_block_size. Therefore the yearly growth is noticeably larger than your calculated ~51.36GB. With a "SegWit factor" of around 1.7 we'd land at about ~87.3GB per year, which is pretty close to what I read from the chart.
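
For reference, a minimal back-of-the-envelope sketch of that estimate (the 1.7MB average block size, i.e. the "SegWit factor" of ~1.7, is just the assumption from above, nothing more):

Code:
# Rough yearly blockchain growth with a "SegWit factor" (illustrative only)
avg_block_size_mib = 1.7    # assumed current average block size
blocks_per_day = 144        # roughly one block every 10 minutes
days_per_year = 365.25

yearly_growth_gib = avg_block_size_mib * blocks_per_day * days_per_year / 1024
print(f"Estimated yearly growth: ~{yearly_growth_gib:.1f} GiB")  # ~87.3 GiB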

My earlier value of around 91GB (not entirely sure whether it's actually 91GiB) is based on a graph of total blockchain size in GB over the last two years; see the change in slope in the 3y or 5y view at the link I provided. The chart shows a slight rise in the growth slope around the beginning of 2023, so instead of a larger observation window I used the slightly worse growth rate of the last two years (the reason for the worse growth rate is debatable).

I then simply calculated 200 years times 91GB/y, neglecting possible growth of the transaction indices and of the UTXO set (chainstate). People send numerous small coin pieces every day, mostly to the Genesis block address, which bloats the UTXO set, as I don't expect Satoshi to consolidate UTXOs "donated" to the Genesis block or block #9.

A larger UTXO set increases the RAM needed for a speedy IBD: it's beneficial to keep the UTXO set in RAM during IBD, otherwise speed suffers from heavy I/O to the storage medium (painfully slow on devices with low random IOPS, like mechanical hard drives).

I'm not too worried about the size and growth of the UTXO set over time.


Do you think the size of the blockchain will bloat more than my calculation suggests just 20 years from now?
You proposed a number for 200 years from now without explaining how it was calculated, and I highly doubt it's anywhere close. Why 200 years anyway? It's already hard to predict technological advances for the next few decades.

As we don't know how Bitcoin will evolve, I already find it difficult to make even roughly accurate predictions for the next few decades.


I started by asking whether the continuously growing blockchain would become a burden on nodes in the future.
So far I think it's manageable, and as computers evolve too, I don't yet see a real problem.

My last experiment with a low-power device, a Raspberry Pi 4B with 8GiB RAM, in June 2023 finished a full IBD from scratch up to block 796033 in about 95h, with a 1TB SATA SSD connected via a USB3-SATA adapter; the network connection was Tor-only over a stable 100MBit internet connection. I expected it to take longer and was positively surprised.
member
Activity: 73
Merit: 31
January 08, 2025, 08:06:36 AM
#15
Can the block size exceed 1 MB? I don’t know the details about this.


Bitcoin blocks were limited to just 1MB before the activation of SegWit. After SegWit, the limit is expressed in block weight: a block may use at most 4 million weight units, which corresponds to a theoretical maximum of close to 4MB of raw data, though typical blocks are much smaller.
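
To make the arithmetic concrete, here is a small sketch of the weight rule from BIP 141 (the byte counts are made up, purely to show how the 4-million-weight-unit limit works):

Code:
# Block weight per BIP 141: weight = base_size * 3 + total_size, limited to 4,000,000 WU
MAX_BLOCK_WEIGHT = 4_000_000

def block_weight(base_size_bytes: int, witness_size_bytes: int) -> int:
    total_size = base_size_bytes + witness_size_bytes
    return base_size_bytes * 3 + total_size

# Made-up examples: a block can only approach ~4 MB if almost all of it is witness data
print(block_weight(1_000_000, 0))        # 4,000,000 -> a full 1 MB legacy-style block hits the limit
print(block_weight(900_000, 400_000))    # 4,000,000 -> 1.3 MB total, also at the limit
print(block_weight(500_000, 2_000_000))  # 4,000,000 -> 2.5 MB total, still at the limit

So the theoretical maximum of close to 4MB only applies to blocks consisting almost entirely of witness data; in practice blocks average well below that.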
?
Activity: -
Merit: -
January 08, 2025, 07:00:38 AM
#14
Can the block size exceed 1 MB? I don’t know the details about this.

hero member
Activity: 714
Merit: 1298
January 08, 2025, 06:57:42 AM
#13


While older blocks are removed, the integrity and security of the blockchain remain intact. The summarization process ensures that all critical transaction data is still accessible and verifiable, and older data is still available in a summarized form for auditing or reference purposes.


This would require the existence of, let's call them, super nodes that keep those older blocks, which means the end of decentralization and likely inter-node propagation delays, as all other nodes would depend on a few super nodes whenever they need the relevant old data.

Storage technology is progressing with time, thus it is better to count the chickens after they are hatched.
member
Activity: 73
Merit: 31
January 08, 2025, 06:10:42 AM
#12
Do you think the size of the blockchain will bloat more than my calculation suggests just 20 years from now?

I started by asking whether the continuously growing blockchain would become a burden on nodes in the future.
If the blockchain becomes extremely large, affordability drops and in turn only a few nodes may be able to store and maintain it, which has a direct impact on decentralization. The block size limit and the average time between blocks directly determine the blockchain's growth rate. That is, if the block size remains constant and the block interval stays at the expected 10 minutes, then the growth can be estimated fairly accurately.

Code:
# Constants for blockchain size growth calculation
block_size_mb = 1  # Average block size in MB (can be adjusted for SegWit or future changes)
blocks_per_day = 144  # Total number of blocks generated per day
days_per_year = 365.25  # Average days per year, also accounting for leap years
years = 20  # Number of years for projection

# Calculate daily, yearly, and 20-year blockchain growth
daily_growth_mb = block_size_mb * blocks_per_day
yearly_growth_mb = daily_growth_mb * days_per_year

twenty_year_growth_mb = yearly_growth_mb * years

# Convert MB to GB for easier interpretation
daily_growth_gb = daily_growth_mb / 1024
yearly_growth_gb = yearly_growth_mb / 1024
twenty_year_growth_gb = twenty_year_growth_mb / 1024

# Printing the Output
print(f'Daily Growth: Approximately {daily_growth_gb}GB ({daily_growth_gb * 1000})MB Per day')
print(f'Yearly Growth: Aproximately {yearly_growth_gb}GB ({yearly_growth_gb} * 1000)MB per year' )
print(f'Twenty Years Growth: {twenty_year_growth_gb}GB ({twenty_year_growth_gb} * 1000) per twenty years')
 
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
January 08, 2025, 04:54:03 AM
#11
I believe the minimum amount of information that must be stored is the last block, the UTXO set, and the timestamps of the blocks since the last difficulty adjustment.

This UTXO set and timestamps could be secured by adding a hash of them to every block. Something like this has been proposed in the past.

I think you're talking about a "UTXO commitment", which would require miners to commit to the Merkle root of the UTXO set. It was never implemented in Bitcoin Core for various reasons (such as the amount of data that needs to be hashed), but Bitcoin Core has a similar feature called a "UTXO snapshot"[1].
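
Just to illustrate the general idea (this is not Bitcoin Core code and not any actual proposal's format; the leaf serialization and tree construction below are made-up simplifications): a UTXO commitment would be something like a Merkle root over the serialized UTXO set that miners include in their blocks.

Code:
import hashlib

def sha256d(data: bytes) -> bytes:
    # Double SHA-256, as commonly used in Bitcoin
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list) -> bytes:
    # Simple Merkle tree; duplicates the last node on odd levels, like Bitcoin's tx tree
    if not leaves:
        return b"\x00" * 32
    level = [sha256d(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Toy UTXO set: (txid hex, output index, amount in satoshis) -- purely illustrative
utxos = [
    ("aa" * 32, 0, 5_000_000_000),
    ("bb" * 32, 1, 12_345),
]
leaves = [bytes.fromhex(txid) + vout.to_bytes(4, "little") + amount.to_bytes(8, "little")
          for txid, vout, amount in utxos]
print("Toy UTXO commitment:", merkle_root(leaves).hex())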

Do you think the size of the blockchain will bloat more than my calculation suggests just 20 years from now?

Why don't you just show your calculation step by step?

[1] https://blog.lopp.net/bitcoin-node-sync-with-utxo-snapshots/
?
Activity: -
Merit: -
January 08, 2025, 03:20:47 AM
#10
Do you think the size of the blockchain will bloat more than my calculation suggests just 20 years from now?

I started by asking whether the continuously growing blockchain would become a burden on nodes in the future.

Then, I proposed an approach to this community to gather opinions and discuss it together.

hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
January 08, 2025, 01:17:04 AM
#9
I recalculated the storage requirement for a full node over 200 years, and the data size is only 105 TB. Satoshi's design is truly remarkable.
I've some doubts that your prediction of blockchain size is correct.

My full node shows this current space consumption:
Code:
666G ./blocks
12G ./chainstate
66G ./indexes
For a more accurate prediction it's necessary to look at the growth slope since blockchain data spammers started to congest mempools (Ordinals crap and similar nonsense). While there were periods of mempool congestion before Taproot for other reasons, mempools eventually emptied below the block size limit. In the few years since the inception of Ordinals and NFT crap on the Bitcoin blockchain, I haven't observed non-full blocks (I didn't check every block, though), with the exception of miners occasionally publishing "empty" blocks containing only the mandatory coinbase transaction.

With 16 years yielding 666GiB, and I'm aware those haven't been 16 years of full blocks, I can hardly believe your calculation landing at 105TB (I assume you meant TiB) after 200 years.

My rough estimate with a growth slope of about 91GiB/y for the last two years[1] gives a crude growth of 18,200 GiB over 200 years, which puts the blockchain size in the ballpark of 18 TiB. I'm pretty sure we'll have storage media and communication network technology capable of handling this easily by then.

[1] See graph with 3y- or 5y-view at https://ycharts.com/indicators/bitcoin_blockchain_size
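
For what it's worth, the naive linear extrapolation behind that ballpark figure is simply this (the 666GiB and 91GiB/y are just the numbers from this post, not a forecast):

Code:
# Naive linear extrapolation (illustrative only, using the figures quoted above)
current_blocks_gib = 666      # current ./blocks size of my node
growth_per_year_gib = 91      # rough growth slope of the last two years
years = 200

projected_gib = current_blocks_gib + growth_per_year_gib * years
print(f"Projected size after {years} years: ~{projected_gib:,} GiB (~{projected_gib / 1024:.1f} TiB)")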


But anyway, that's not my gripe with your proposal. You don't provide any solution for how to implement what you propose, so it's kind of empty and leaves out any picture of the required added complexity.

What about retrieval and verification of old transactions? Bitcoin's transaction model is not based on balances. It seems incompatible to me to suddenly introduce balances in some sort of "summary".


2. Archiving and Data Removal:

Once the summarized data is generated, older blocks are archived or removed from the active blockchain data. These older blocks are still available for reference, but no longer need to be part of the active ledger that full nodes must store.
And exactly how do you want to achieve this in a trustless and decentralized way? Remember some of the key design features of Bitcoin?


I don't want to repeat or go further into the details of your proposal for now, as I don't quite see the necessity, as criticized in prior replies.



Post edit: added reference to blockchain size graph
legendary
Activity: 4522
Merit: 3426
January 07, 2025, 05:12:33 PM
#8
I believe the minimum amount of information that must be stored is the last block, the UTXO set, and the timestamps of the blocks since the last difficulty adjustment.

This UTXO set and timestamps could be secured by adding a hash of them to every block. Something like this has been proposed in the past.
newbie
Activity: 8
Merit: 0
January 07, 2025, 12:19:59 PM
#7
While reducing storage requirements and ensuring decentralization are critical for Bitcoin’s long-term sustainability, it’s vital to consider the potential tax and compliance implications of transaction summarization and data expiration.

As experts at Crypto Accountants, we recognize that detailed transaction histories are often necessary for audits, tax reporting, and legal compliance. Archiving older data in summarized formats may complicate these processes, especially for jurisdictions requiring granular transaction records.

Balancing scalability with compliance will be key to ensuring that innovations like this can benefit users without creating additional regulatory hurdles.
?
Activity: -
Merit: -
January 07, 2025, 04:06:26 AM
#6
I recalculated the storage requirement for a full node over 200 years, and the data size is only 105 TB. Satoshi's design is truly remarkable.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
January 07, 2025, 03:28:45 AM
#5
"Thank you for the feedback! I'm aware of pruned mode. My proposal adds a mechanism for summarizing historical data and using backup nodes for optional access. This aims to balance scalability with data availability in the long term. Do you see any potential in this approach?"

No:
1. Pruned mode already offers all 5 benefits stated in your proposal, without the complexity.
2. We can rely on Bitcoin nodes which store the whole blockchain data, so there's no need to add a new category of node (backup nodes, in this case).
legendary
Activity: 990
Merit: 1108
January 06, 2025, 06:54:10 AM
#4
My proposal aims to address this challenge by reducing the data storage requirements without compromising security.
Your proposal is lacking the most essential bit of information: the amount of savings. If you can only save a few % of data storage, then the added complexity is not worth it. So please answer the question: by how much do you propose to reduce data storage requirements?

Btw, are you aware of the Mimblewimble protocol? That allows forgetting almost all data of spent outputs, except for a ~100 byte kernel per transaction, with no impact on full verifiability.
?
Activity: -
Merit: -
January 06, 2025, 04:21:07 AM
#3
"Thank you for the feedback! I'm aware of pruned mode. My proposal adds a mechanism for summarizing historical data and using backup nodes for optional access. This aims to balance scalability with data availability in the long term. Do you see any potential in this approach?"
