
Topic: Data compression (Read 609 times)

hero member
Activity: 912
Merit: 747
November 10, 2017, 11:11:57 AM
#10
Most of the data in Bitcoin blocks and mempool transactions is incompressible
This thread got me curious, so I've tested it for myself using bzip2 (options -z -9). Result (in kB):
Code:
130820  blk00400.dat
106864  blk00400.dat.bz
The compressed file is 18.3% smaller. Considering the current cost of disk space, and the complications it would give to read back data (for a wallet rescan), I see no reason to implement this.

I backed up the blockchain data recently (late October). It was 160 GB uncompressed; I used 7zip with normal compression and it came to 112 GB in 25 DVD-sized archive files.
Why would you do this? Unless you have a very slow and very expensive internet connection, loading 25 DVDs is much more work than just downloading the blockchain again.

Data retention. In case of loss or an unrecoverable error, you can use the backup instead of having to download the entire blockchain, so you can be back in sync within hours. The point of DVD-sized archives is ease of file transfer and integrity checking, not actually using DVD media as storage. Cheers!
hero member
Activity: 912
Merit: 747
November 10, 2017, 11:00:28 AM
#9
As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986]. One of the adaptive schemes to be discussed in Section 5, an adaptive Huffman coding technique, is the basis for the compact command of the UNIX operating system. One could expect to see even greater use of variable-length coding in the future.

Really, isn't it amazing?

lol bot users?

I'm sure Bitcoin could benefit similarly to websites using gzip to deliver content -if- it can be applied. 40+ GB is a pretty big difference, so I'm just offering up some data.
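For a rough sense of what gzip-style compression does to a single transaction, here is a minimal Python sketch using zlib (the DEFLATE algorithm behind gzip). It assumes you pass in the hex of a raw transaction yourself, for example the output of bitcoin-cli getrawtransaction <txid>; the script name is hypothetical:
Code:
import sys
import zlib

# Usage: python deflate_tx.py <rawtxhex>
# <rawtxhex> is a full raw transaction in hex, e.g. the output of
# bitcoin-cli getrawtransaction <txid>.
raw_tx = bytes.fromhex(sys.argv[1])

# Level 9 DEFLATE -- the same algorithm gzip uses for web content.
compressed = zlib.compress(raw_tx, 9)

saving = 100.0 * (1 - len(compressed) / len(raw_tx))
print(f"{len(raw_tx)} bytes raw, {len(compressed)} bytes deflated ({saving:.1f}% smaller)")
On a single transaction the gain is usually small, since most of the bytes are hashes and signatures that look random; the bigger numbers in this thread come from compressing whole block files.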
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
November 10, 2017, 10:44:59 AM
#8
Most of the data in Bitcoin blocks and mempool transactions is incompressible
This thread got me curious, so I've tested it for myself using bzip2 (options -z -9). Result (in kB):
Code:
130820  blk00400.dat
106864  blk00400.dat.bz
The compressed file is 18.3% smaller. Considering the current cost of disk space, and the complications it would give to read back data (for a wallet rescan), I see no reason to implement this.
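If anyone wants to repeat a similar measurement, a minimal Python sketch along these lines should do it (the filename is just an example; bz2 at compresslevel=9 corresponds to bzip2 -9):
Code:
import bz2

# Example path -- point this at a block file from your own Bitcoin Core blocks/ directory.
BLK_FILE = "blk00400.dat"

with open(BLK_FILE, "rb") as f:
    raw = f.read()

# compresslevel=9 corresponds to bzip2 -9
compressed = bz2.compress(raw, compresslevel=9)

saving = 100.0 * (1 - len(compressed) / len(raw))
print(f"{len(raw) // 1024} kB raw, {len(compressed) // 1024} kB compressed, {saving:.1f}% smaller")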

I backed up the blockchain data recently (late October). It was 160 GB uncompressed; I used 7zip with normal compression and it came to 112 GB in 25 DVD-sized archive files.
Why would you do this? Unless you have a very slow and very expensive internet connection, loading 25 DVDs is much more work than just downloading the blockchain again.
mda
member
Activity: 144
Merit: 13
November 10, 2017, 04:43:26 AM
#7
As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986]. One of the adaptive schemes to be discussed in Section 5, an adaptive Huffman coding technique, is the basis for the compact command of the UNIX operating system. One could expect to see even greater use of variable-length coding in the future.

Really, isn't it amazing?
newbie
Activity: 9
Merit: 0
November 10, 2017, 12:50:38 AM
#6
I backed up the blockchain data recently (late October). It was 160 GB uncompressed; I used 7zip with normal compression and it came to 112 GB in 25 DVD-sized archive files.

Data compression:

When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical.

As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986]. One of the adaptive schemes to be discussed in Section 5, an adaptive Huffman coding technique, is the basis for the compact command of the UNIX operating system. One could expect to see even greater use of variable-length coding in the future.
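Since the quoted survey text keeps coming back to Huffman coding, here is a small self-contained Python sketch of static Huffman coding, just to make "variable-length coding" concrete. It is a textbook toy on made-up sample text, not anything Bitcoin itself uses:
Code:
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict:
    """Build a Huffman code for the bytes in data: maps byte value -> bit string."""
    freq = Counter(data)
    # Heap items are (frequency, tie_breaker, node); a node is either a leaf byte
    # value or a (left, right) pair. The tie_breaker keeps comparisons on ints.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate input with a single distinct byte
        return {heap[0][2]: "0"}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a byte value
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

sample = b"this is a small sample of fairly compressible text, text, text"
codes = huffman_code(sample)
encoded_bits = sum(len(codes[b]) for b in sample)
print(f"{len(sample) * 8} bits fixed-length vs {encoded_bits} bits Huffman-coded")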
newbie
Activity: 9
Merit: 0
November 10, 2017, 12:25:35 AM
#5
I backed up the blockchain data recently (late October). It was 160 GB uncompressed; I used 7zip with normal compression and it came to 112 GB in 25 DVD-sized archive files.

Data compression:

When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical.
hero member
Activity: 912
Merit: 747
November 09, 2017, 09:49:11 PM
#4
I backed up the blockchain data recently (late October). It was 160 GB uncompressed; I used 7zip with normal compression and it came to 112 GB in 25 DVD-sized archive files.
hero member
Activity: 525
Merit: 531
November 09, 2017, 11:27:46 AM
#3
I "moved" the blk*.dat and rev*.dat files into squashfs images and remounted them back into Bitcoin Core. Here are the stats:

Code:
# du -hs bitcoin-blocks?.squashfs
53G     bitcoin-blocks0.squashfs
56G     bitcoin-blocks1.squashfs
# du -hs blocks-ro?
71G     blocks-ro0
71G     blocks-ro1

Both squashfs images contain 500 blk*.dat files each. Compressed, that is 53 + 56 = 109 GB versus 71 + 71 = 142 GB raw, roughly a 23% reduction, so transaction data can be compressed.
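As a rough cross-check of how compressible an individual block file is with the kind of codec squashfs can use (mksquashfs supports gzip, xz, zstd and others, depending on version and options), here is a minimal Python sketch using the standard-library lzma module; the filename is just an example:
Code:
import lzma

# Example path -- any blk*.dat file from Bitcoin Core's blocks/ directory will do.
BLK_FILE = "blk00400.dat"

with open(BLK_FILE, "rb") as f:
    raw = f.read()

# preset=6 is the xz default compression level.
compressed = lzma.compress(raw, preset=6)

saving = 100.0 * (1 - len(compressed) / len(raw))
print(f"{len(raw) // (1024 * 1024)} MiB raw -> {len(compressed) // (1024 * 1024)} MiB xz, {saving:.1f}% smaller")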
member
Activity: 98
Merit: 26
November 08, 2017, 11:19:36 PM
#2

Most of the data in Bitcoin blocks and mempool transactions is incompressible, while some of it is trivially compressible (certain format fields, etc.) I am guessing that the Core devs would be interested in more aggressive compression using Merkle hashes. Since it is basically the case that the entire blockchain is an immutable, read-only structure (except just when a new block arrives), the only time you need to transmit raw data is for new transactions and the latest block. I doubt that the bandwidth for these is a bottleneck, so that 25% is probably not worth the cost of optimization. For other kinds of synchronization between nodes, all you need to transmit are the hashes.
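To make the "just transmit the hashes" point concrete, here is a minimal Python sketch of how a Bitcoin-style Merkle root is computed from transaction ids: double SHA-256 over concatenated pairs, duplicating the last hash when a level has an odd count. Two nodes that agree on the root agree on the whole transaction set without exchanging the raw data. The txids below are dummy placeholders:
Code:
import hashlib

def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids: list) -> bytes:
    """Merkle root over a list of 32-byte txids (internal byte order)."""
    assert txids, "a block always contains at least the coinbase transaction"
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:               # odd count: duplicate the last hash
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Dummy "txids" derived from sample payloads -- placeholders, not real transactions.
fake_txids = [dsha256(f"tx-{i}".encode()) for i in range(5)]
print(merkle_root(fake_txids).hex())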