Can't the blockchain be compressed? | Bitcointalksearch.org

scintill

sr. member

Activity: 448

Merit: 254

Quote from: JeromeS on April 30, 2013, 03:21:50 PM

I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:

1) Replace the previous block's hash with the current block's height in the block format. This saves 28 bytes/block, 6 megabytes overall: almost nothing.
2) Remove the Merkle root, which can be deduced from the transactions in the block. This saves 32 bytes/block, 7 megabytes overall: almost nothing.
3) Replace the previous output field in each input with the output's block number and its transaction's order in the block. This saves 26 bytes per input. Assuming 1 input/transaction, and going by blockchain.info's 17,000,000 transactions, this would save 431 megabytes.
4) For 99.999% of outputs, there are four bytes (OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). Again assuming 1 output/transaction, this would save 68 megabytes.
5) Reference repeat uses of the same address. If the average address receives 5 payments, then 4 of those 5 can be replaced by a reference to the first time the address appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5 of outputs), possibly another 172 megabytes.

1) I don't think this works. It needs a pointer to the previous block to form the "chain" of blocks.
2) I think the Merkle root is important to preserve for pruning, a theoretical way to cut down on blockchain size that hasn't been fully implemented. Why would there be a second hash if it were redundant?
3) If you do this, then it's not possible to construct a spend until the funding tx is in a block, and is deep enough that it will not be reorganized into another block. The person spending has to watch the network for reorgs and recreate their tx. With block-agnostic transactions, the network can hold their tx and just rebroadcast it for them if necessary.
4) I would rather see Script abandoned in favor of hard-coded transaction types than put in some hack like "if the script just pushes a pubkey-sized thing onto the stack, infer that it means DUP HASH160 EQUALVERIFY CHECKSIG."
5) You could also get everyone to use compressed public keys (saves ~32 bytes per spend, I'm looking at you SatoshiDice -- last I looked they are still using uncompressed addresses), and consider a scheme where the client can somehow infer the publickey to be used for a spend if it has been seen on the network before. This just feels dirty and could be difficult to make performant, though.
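
On point 2, the quoted idea relies on the Merkle root being recomputable from a block's transactions. A minimal sketch of that recomputation, assuming the transaction IDs are already available as 32-byte values in internal byte order; the function names are illustrative and not taken from any client:

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's hash function for txids and Merkle nodes: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Recompute a block's Merkle root from its ordered list of txids."""
    if not txids:
        raise ValueError("a block contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        # A level with an odd number of nodes pairs the last node with itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Being recomputable does not make the field useless: a node that has discarded some transactions can no longer rebuild the root, so the stored root is what lets pruned or lightweight nodes still check that a transaction belongs to a block.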

kjj

legendary

Activity: 1302

Merit: 1026

It really isn't hard to run the blockfiles through bzip2 to see how much compression is really possible. If I recall correctly, when I did it, I got about 30% compression on the block files.
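
A minimal sketch of that measurement in Python, using the standard `bz2` module instead of the command-line tool. The function name and the example path are hypothetical; point it at whatever block file your client stores.

```python
import bz2


def bzip2_ratio(path: str, chunk_size: int = 1 << 20) -> float:
    """Return compressed_size / original_size for the file at `path`.

    The file is streamed in chunks so a multi-gigabyte block file does not
    have to fit in memory.
    """
    compressor = bz2.BZ2Compressor(9)  # 9 is the strongest bzip2 level
    raw_bytes = 0
    packed_bytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            raw_bytes += len(chunk)
            packed_bytes += len(compressor.compress(chunk))
    packed_bytes += len(compressor.flush())  # emit whatever is still buffered
    if raw_bytes == 0:
        raise ValueError("file is empty")
    return packed_bytes / raw_bytes


# Hypothetical usage; the path depends on your client and data directory:
# print(bzip2_ratio("blocks/blk00000.dat"))
```

If "30% compression" means the output is 30% smaller, the function would return roughly 0.7 for such a file.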

JeromeS

newbie

Activity: 55

Merit: 0

I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:

1) Replace the previous block's hash with the current block's height in the block format. This saves 28 bytes/block, 6 megabytes overall: almost nothing.
2) Remove the Merkle root, which can be deduced from the transactions in the block. This saves 32 bytes/block, 7 megabytes overall: almost nothing.
3) Replace the previous output field in each input with the output's block number and its transaction's order in the block. This saves 26 bytes per input. Assuming 1 input/transaction, and going by blockchain.info's 17,000,000 transactions, this would save 431 megabytes.
4) For 99.999% of outputs, there are four bytes (OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). Again assuming 1 output/transaction, this would save 68 megabytes.
5) Reference repeat uses of the same address. If the average address receives 5 payments, then 4 of those 5 can be replaced by a reference to the first time the address appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5 of outputs), possibly another 172 megabytes.

With the current blockchain taking up 9 GIGABYTES of disk space and taking weeks to be downloaded and verified, this doesn't really change anything.
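
A rough sketch of the arithmetic behind the estimates above, for anyone who wants to plug in different numbers. The per-item byte savings and the transaction count come from the post. The block count is an assumption (approximate chain height in April 2013), and "9 gigabytes" is taken as decimal gigabytes.

```python
TX_COUNT = 17_000_000      # from the post (blockchain.info)
BLOCK_COUNT = 230_000      # ASSUMPTION: approximate chain height, April 2013
CHAIN_BYTES = 9 * 10**9    # "9 gigabytes" from the post, decimal GB assumed


def estimate_savings(tx_count: int, block_count: int) -> dict:
    """Bytes saved by each idea, assuming 1 input and 1 output per transaction."""
    return {
        "block height instead of previous hash": 28 * block_count,
        "Merkle root removed": 32 * block_count,
        "short previous-output reference": 26 * tx_count,
        "implied standard script opcodes": 4 * tx_count,
        # 13 bytes saved on the 4 out of 5 outputs that reuse an address
        "address back-references": 13 * tx_count * 4 // 5,
    }


savings = estimate_savings(TX_COUNT, BLOCK_COUNT)
for idea, saved in savings.items():
    print(f"{idea}: {saved / 1e6:.1f} MB")

total = sum(savings.values())
print(f"total: {total / 1e6:.1f} MB, {100 * total / CHAIN_BYTES:.1f}% of the chain")
```

With these inputs the total comes to about 0.7 GB, under 8% of the chain. The individual results differ slightly from the rounded figures quoted above (for example 26 × 17,000,000 is 442,000,000 bytes), possibly because of rounding or a different megabyte convention, but the conclusion is the same.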

The size of the blockchain is a fundamental flaw in bitcoin and nothing is going to change that. It just happens to be the best technology we have now, but the moment something else is invented that has the same advantages and fewer drawbacks, bitcoin will be wiped out.

oakpacific

hero member

Activity: 784

Merit: 1000

Quote from: CasinoBit on April 30, 2013, 04:41:16 AM

Quote from: oakpacific on April 30, 2013, 04:36:06 AM

Quote from: CasinoBit on April 30, 2013, 04:17:31 AM

Quote from: Frozenlock on April 29, 2013, 06:18:23 PM

I don't download the blockchain every time I log in on a light wallet to see the 5 dollars I have in my personal account.

http://en.wikipedia.org/wiki/Private_information_retrieval

No, I was actually genuinely saying that it is a valid point. People who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. I wonder what they will do in 3 years, when this will only be viable for servers.

Yup, fixed.

The idea is that Bitcoin enables you to either run your own bank or have someone else run it for you.

And running a bank/clearing house is never easy, despite how corrupt a lot of banksters are.

CasinoBit

sr. member

Activity: 364

Merit: 250

Quote from: oakpacific on April 30, 2013, 04:36:06 AM

Quote from: CasinoBit on April 30, 2013, 04:17:31 AM

Quote from: Frozenlock on April 29, 2013, 06:18:23 PM

I don't download the blockchain every time I log in on a light wallet to see the 5 dollars I have in my personal account.

http://en.wikipedia.org/wiki/Private_information_retrieval

No, I was actually genuinely saying that it is a valid point. People who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. I wonder what they will do in 3 years, when this will only be viable for servers.

oakpacific

hero member

Activity: 784

Merit: 1000

http://en.wikipedia.org/wiki/Private_information_retrieval

CasinoBit

sr. member

Activity: 364

Merit: 250

Quote from: Frozenlock on April 29, 2013, 06:18:23 PM

I don't download the blockchain every time I log in on a light wallet to see the 5 dollars I have in my personal account.

Frozenlock

sr. member

Activity: 434

Merit: 250

I don't download the blockchain every time I log in on a light wallet to see the 5 dollars I have in my personal account.

the founder

sr. member

Activity: 448

Merit: 251

Quote from: Frozenlock on April 29, 2013, 04:36:50 PM

What do you mean it's huge?

How large do you think your bank's databases are?

I don't download the bank's database each time I log in to see the 5 dollars I have in my personal account.

Frozenlock

sr. member

Activity: 434

Merit: 250

What do you mean it's huge?

How large do you think your bank's databases are?

cypherdoc

legendary

Activity: 1764

Merit: 1002

it can be pruned by lopping off all spent tx's.
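
A toy sketch of what "lopping off spent tx's" means: find the transactions whose outputs have all been spent by later transactions, since those are the ones a pruning node would no longer need in full. The `Tx` type and function name are hypothetical simplifications, not Bitcoin's actual data structures.

```python
from typing import List, NamedTuple, Set, Tuple


class Tx(NamedTuple):
    txid: str
    inputs: List[Tuple[str, int]]  # (funding txid, output index) pairs being spent
    n_outputs: int                 # number of outputs this transaction creates


def fully_spent_txids(chain: List[Tx]) -> Set[str]:
    """Return the txids whose outputs have all been spent somewhere in `chain`."""
    spent = set()
    for tx in chain:
        spent.update(tx.inputs)
    return {
        tx.txid
        for tx in chain
        if tx.n_outputs > 0
        and all((tx.txid, i) in spent for i in range(tx.n_outputs))
    }


chain = [
    Tx("a", [], 1),          # coinbase-like transaction, no inputs
    Tx("b", [("a", 0)], 2),  # spends a:0, creates two outputs
    Tx("c", [("b", 0)], 1),  # spends b:0; b:1 and c:0 remain unspent
]
print(fully_spent_txids(chain))  # only "a" can be pruned here
```

A real implementation would also need to keep enough of each block's Merkle tree for the block headers to remain verifiable.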

uyjulian

newbie

Activity: 55

Merit: 0

I thought about making a minimal system just for managing bitcoins... but the blockchain is HUGE... Any way to make it smaller?
