Author

Topic: Sending old blocks as compressed by default (Read 216 times)

staff
Activity: 4284
Merit: 8808
December 27, 2020, 08:00:05 AM
#10
Some people mention the trade-off between storage and CPU usage, but how about using LZ4 compression algorithm which is fastest compression algorithm with good compression ratio (some project even claim it's fast enough for real-time/on-the-fly decompression) and widely used by big companies/project?
Many (most?) users are currently CPU bottlenecked on validation, so any additional CPU will slow down their sync.  Changing the speed of the compression just changes the point at which the tradeoff moves.

[Aside, Lz4 doesn't do that much on transactions]

I hope I didn't make it sound like I think it shouldn't be done.  I think it should be done, and probably supported over P2P too-- as a negotiable feature.  Local usage could easily be optional, I expect most non-pruned users would want it on to save space unless their node is stored on a 10TB spinning disk and they don't care about the usage or they're on a small arm device and the extra cpu would be brutal while peers are syncing from them.

It's just not trade-off free, and AFAIK no one working on core is interested in working on it.
staff
Activity: 4284
Merit: 8808
December 27, 2020, 05:03:00 AM
#9
The blockstream satellite node software contains a compressor that shrinks transactions by about 20% operating a transaction at a time.  (More could have been achieved by working multiple txn at a time, but at a great increase in software complexity and some loss of functionality).

It exploits all the redundancies described by pooya87 above and quite a few additional ones. E.g. it knows how to reconstruct the hash in the redeemscript for P2SH embedded segwit, it knows how to convert 65 byte pubkeys to and from 32 byte pubkeys, etc.  I think, though my memory is kinda fuzzy, it exploits reused scriptpubkeys/redeemscripts within a transaction.

Though your network needs to be fairly slow or your CPU extremely fast before its worth the CPU cost (for satellite it's an obvious win).

Quote
Blocks are random data and can not be compressed with any algorithm other than one that focuses on bitcoin applications.
Technically thats not entirely true, due to things like pubkey and address reuse and common syntax elements, state of the art generic compressors do get some compression when working on one or several whole blocks at a time.  But not anywhere near as much compression as a data aware compressor can achieve. The two can be combined, of course.  But having to work one or more blocks at a time gets in the way of random access to data and potentially creates DOS vulnerabilities (e.g. from peers forcing you to decompress groups of blocks only to access one block in the group).

It's more efficient to use this stuff if you negotiated with peers sending the compressed form over the wire.  If all your peers were doing that, you could just keep the compressed form in memory and on disk and not have to decompress except for validation.

AFAIK no one is working on taking this work upstream, which is sad-- a 20% reduction in disk space would be nice, and post erlay this would also end up being nearly a 20% reduction in network bandwidth if used for transaction/block relay.

If it were just done for storage probably the best thing would be store every block using it,  and just keep a small cache of uncompressed blocks in memory for serving requests for recent blocks.  Any kind of rewriting of block data on disk is complicated due to the need to be able to recover from crashes.
legendary
Activity: 3472
Merit: 10611
December 27, 2020, 04:20:58 AM
#8
There probably aren't going to be much space savings in the smaller transactions that send one or two inputs with a total tx size of 100-250 bytes, which are the ones that need to be compressed the most because there are so many of them being made.
An example with 1 input 2 outputs with raw size of 380 bytes turning into 337 bytes which is 11.3% compression:
https://blockchair.com/bitcoin/transaction/af0bb1afcfbc5dca51ee0ce2b8ac55bdb44f57ee14880f9fb0458ae2cfe06d78

01 (compact int version)
01
9e7087f980ad31bfcdc9fee364b355d6487f9ea6931ef4ab10ca6fdfc366f0a4
0c (compact int index)
XX (a single byte code indicating type of the signature, here standard P2WSH multi-sig)
0203 (sig count and pubkey count, keep in mind this is standard based on XX)
06408e747ee916ab39935336f8bdc1140c483de92ab1fc2e8d2e01cdcdd5101f (r1)
09851440d9819f76796db8b146754476e5449a8d76be1f5845c14cd5c5f7e92d (s1)
01 (SigHashType)
756740dda9d136254b32d28d243d7f42023b2912116b0c0513f184b0c377ae85 (r2)
409acb15f52d12e2d32eb9901dd313c454c813db2d59c10d9864e7bf2bedf376 (s2)
01 (SigHashType)
0375e00eb72e29da82b89367947f29ef34afb75e8654f6ea368e0acdfd92976b7c (pub1)
03a1b26313f430c4b15bb1fdce663207659d8cac749a0e53d70eff01874496feff (pub2)
03c96d495bfdd5ba4145e3e046fee45e84a8a48ad05bd8dbb395c011a32cf9f880 (pub3)
ffffffff
02
fe10f92500 (compact int amount)
XX (a single byte code indicating type of the output, here a P2SH)
95e5f297804e7899d91e6bceb97d9251658c88ca (OP codes removed since we know them and 160 bit size)
fe4a9bdd0d (compact int amount)
XX (a single byte code indicating type of the output, here a P2WSH)
701a8d401c84fb13e6baf169d59684e17abd9fa216c8cc5b9fc63d622ff8c58d
00 (compact int locktime)


XX values above are like a flag, for example 00 could mean P2PK script, 01 P2PKH and so on.

Quote
This is unlikely to get implemented in Core,
Since this is optional and local, it doesn't have to be implemented by core.

Quote
but I had a thought of compressing the transaction with LZMA
Blocks are random data and can not be compressed with any algorithm other than one that focuses on bitcoin applications.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 27, 2020, 02:57:13 AM
#7
This can aggregate to a bigger value and 20% of 1,300,269 bytes is 260,053 bytes and doesn't seem that far.

There probably aren't going to be much space savings in the smaller transactions that send one or two inputs with a total tx size of 100-250 bytes, which are the ones that need to be compressed the most because there are so many of them being made.

This is unlikely to get implemented in Core, but I had a thought of compressing the transaction with LZMA and prepending magic bytes at the beginning of the message that indicate this is an LZMA-compressed tx message, and reserve another magic bytes for an LZMA-compressed block message and so on. LZMA can shrink even already small inputs and the cpu time needed to compress it shouldn't be a problem because most full nodes have capable processors. What do you think of this?
legendary
Activity: 3472
Merit: 10611
December 27, 2020, 12:29:32 AM
#6
Wouldn't the process only work for values within the blockchain which has a certain fixed pattern, like first few bytes of block hashes. But that wouldn't be the case with signatures, UTXOs, TXID etc within the blockchain which doesn't necessarily have a fixed value? I would think 20% is quite hard to achieve and minuscule optimizations within the block hash and more fixed values within the blockchain could possibly be compressed.
It would but a lot of things in each block already have a very fixed pattern. For example the block I posted above has 841 transactions and 702 of them use locktime=0 and that is taking 4 bytes instead of 1. So that's another 3*702=2106 bytes saved.
740 of these transactions are paying to a P2PKH address which is a defined standard script and instead of 26 bytes it can be 21 bytes => 3700 bytes saved.
The biggest transaction in this block contains 800 inputs most of which are using P2SH-P2WPKH input types. There is 23 bytes wasted in signature script, 2 bytes in SegWit flag, 3 bytes witness count, (2+6/7) bytes wasted per signature. About 22,000 bytes can be removed from this tx.
And so on...

This can aggregate to a bigger value and 20% of 1,300,269 bytes is 260,053 bytes and doesn't seem that far.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
December 27, 2020, 12:07:35 AM
#5
Each node receiving such "compressed" blocks has to "decompress" them first before verification which is a very cheap process. Eg. add 9 zeros in the hash above before searching for previous block header hash, add the input to coinbase tx.

I tested some ideas a while back and I believe the blocks could be compressed about 20% in some cases. I haven't had time to polish the idea though. A through research is needed to see what is the average amount of space saved to conclude whether this is worth it or not.
Wouldn't the process only work for values within the blockchain which has a certain fixed pattern, like first few bytes of block hashes. But that wouldn't be the case with signatures, UTXOs, TXID etc within the blockchain which doesn't necessarily have a fixed value? I would think 20% is quite hard to achieve and minuscule optimizations within the block hash and more fixed values within the blockchain could possibly be compressed.
legendary
Activity: 3472
Merit: 10611
December 26, 2020, 11:56:11 PM
#4
What you are describing doesn't need "validation" and it doesn't have to happen every X number of blocks but instead on every single block. It is just a communication contract and storage contract between full nodes. For example using your header idea (which is not nonce just the hash) the header of block 663148 could use
Code:
090160540a929f1c0147b1ce535f6e402a13e365cec0593d
instead of
Code:
0000000000000000000160540a929f1c0147b1ce535f6e402a13e365cec0593d
where 09 represents the number of zeros that were removed.

Of course we don't stop there since there is a lot more that can be "compressed". For example the coinbase tx can remove its input since we already know it is one input and it is an empty hash + max output index.
With 2 simple steps above we saved 44 bytes (0.002%).

Each node receiving such "compressed" blocks has to "decompress" them first before verification which is a very cheap process. Eg. add 9 zeros in the hash above before searching for previous block header hash, add the input to coinbase tx.

I tested some ideas a while back and I believe the blocks could be compressed about 20% in some cases. I haven't had time to polish the idea though. A through research is needed to see what is the average amount of space saved to conclude whether this is worth it or not.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
December 26, 2020, 09:28:16 PM
#3
Depends on whether you want to trade storage space for CPU and/or IO. In addition, the main bulk of the data doesn't come from the block headers but the transaction. Trying to save on block header wouldn't provide any size savings in the first place.

More often than not, the bottleneck doesn't lie in the storage space nor the bandwidth, those has gotten relatively cheaper throughout the years. IO speed and CPU are usually the main concern for most. Compression won't help in this way and it's a hassle to decompress the entire blockchain when synchronizing and/or rescanning.
full member
Activity: 260
Merit: 129
December 26, 2020, 06:18:15 PM
#2
I think the "full node" philosophy is to have all data uncompressed to be fully available. I don't know how it works specifically but maybe they need to check every inputs/outputs to broadcast transaction ?

Some altcoin are doing pruning or "light nodes" but I don't think it's very good for decentralisation.

And compressed data = less cold storage space, but more processor work when you want to access old data. So less data for more computing, and data are cheap...
newbie
Activity: 21
Merit: 16
December 26, 2020, 05:35:42 PM
#1
Now, all blocks are sent in uncompressed form. They can be compressed by each node on their own, there are many ways of shrinking that data that could always be made losslessly, for example nonce can always be placed as four bytes in previous block hash instead of zeroes (because there always will be zeroes, 32 zero bits are mandatory). Many things are repeated in the blockchain, so if we create some new nodes, they have to download everything and validate uncompressed data. As validation take some time anyway and sometimes storage or network bandwidth is limited, maybe it is a good idea to send old blocks in compressed form, for example by making such "compressed block dump" every 210,000 blocks. In some edge cases where halving is nearby, we can wait 2 weeks before we compress blocks from last halving, just to make sure we are safe from reorgs just before or just after halving.
Jump to: