Author

Topic: Can't the blockchain be compressed? (Read 1188 times)

sr. member
Activity: 448
Merit: 254
April 30, 2013, 03:58:58 PM
#12
I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:
  • Replacing the previous block's hash with the current block's height in the block format, saves 28 bytes/block, 6 megabytes overall, almost nothing.
  • Removing the Merkle root, can be deduced from the transactions in the block, saves 32 bytes/block, 7 megabytes, almost nothing.
  • Replacing the previous output field in each input with the output's block number and its transaction's order in the block, saves 26 bytes per input. assuming 1 input/transaction, and going by blockchain.info's 17000000 transactions, this would save 431 megabytes.
  • For 99.999% of outputs, there are four bytes (OP_DUP,OP_HASH160,OP_EQUALVERIFY,OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). again, assuming 1 output/transaction, this would save 68 megabytes.
  • Repeat uses of the same address. If the average address receives 5 payments, then 4 of those five can be replaced by a reference to the first time it appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5th), possibly another 172 megabytes.

1) I don't think this works.  It needs a pointer to the previous block to form the "chain" of blocks.
2) I think the merkle root is important to preserve for pruning, a theoretical way to cut down of blockchain size that hasn't been fully implemented.  Why would there be a second hash if it were redundant?
3) If you do this, then it's not possible to construct a spend until the funding tx is in a block, and is deep enough that it will not be reorganized into another block.  The person spending has to watch the network for reorgs and recreate their tx.  With block-agnostic transactions, the network can hold their tx and just rebroadcast it for them if necessary.
4) I would rather see Script abandoned in favor of hard-coded transaction types, than to put in some hack like "if the script just pushes a pubkey-sized thing onto the stack, infer that it means DUP HASH160 EQUALVERIFY CHECKSIG."
5) You could also get everyone to use compressed public keys (saves ~32 bytes per spend, I'm looking at you SatoshiDice -- last I looked they are still using uncompressed addresses), and consider a scheme where the client can somehow infer the publickey to be used for a spend if it has been seen on the network before.  This just feels dirty and could be difficult to make performant, though.
kjj
legendary
Activity: 1302
Merit: 1026
April 30, 2013, 03:45:04 PM
#11
It really isn't hard to run the blockfiles through bzip2 to see how much compression is really possible.  If I recall correctly, when I did it, I  got about 30% compression on the block files.
newbie
Activity: 55
Merit: 0
April 30, 2013, 03:21:50 PM
#10
I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:
  • Replacing the previous block's hash with the current block's height in the block format, saves 28 bytes/block, 6 megabytes overall, almost nothing.
  • Removing the Merkle root, can be deduced from the transactions in the block, saves 32 bytes/block, 7 megabytes, almost nothing.
  • Replacing the previous output field in each input with the output's block number and its transaction's order in the block, saves 26 bytes per input. assuming 1 input/transaction, and going by blockchain.info's 17000000 transactions, this would save 431 megabytes.
  • For 99.999% of outputs, there are four bytes (OP_DUP,OP_HASH160,OP_EQUALVERIFY,OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). again, assuming 1 output/transaction, this would save 68 megabytes.
  • Repeat uses of the same address. If the average address receives 5 payments, then 4 of those five can be replaced by a reference to the first time it appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5th), possibly another 172 megabytes.

With the current blockchain taking up 9 GIGABYTES of disk space and taking weeks to be downloaded and verified, this doesn't really change anything.

The size of the blockchain is a fundamental flaw in bitcoin and nothing is going to change that. It just happens to be the best technology we have now, but the moment something else is invented that has the same advantages and fewer drawbacks, bitcoin will be wiped out.
hero member
Activity: 784
Merit: 1000
April 30, 2013, 04:42:39 AM
#9
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.



http://en.wikipedia.org/wiki/Private_information_retrieval

No I was actually genuinely saying that it is a valid point, people who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. Wonder what will they do in 3 years when this will only be viable for servers.

Yup, fixed.

The idea is Bitcoin enables you to either run your own bank or have someone else running it for you.

And running a bank/clearing house is never easy, despite how corrupt a lot of banksters are.
sr. member
Activity: 364
Merit: 250
April 30, 2013, 04:41:16 AM
#8
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.



http://en.wikipedia.org/wiki/Private_information_retrieval

No I was actually genuinely saying that it is a valid point, people who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. Wonder what will they do in 3 years when this will only be viable for servers.
sr. member
Activity: 364
Merit: 250
April 30, 2013, 04:17:31 AM
#6
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.

sr. member
Activity: 434
Merit: 250
April 29, 2013, 06:18:23 PM
#5
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.
sr. member
Activity: 448
Merit: 251
Bitcoin
April 29, 2013, 06:14:26 PM
#4
What do you mean it's huge?

How large do you think your bank's databases are?

I don't download the bank's database each time I login to see the 5 dollars I have in my personal account.

sr. member
Activity: 434
Merit: 250
April 29, 2013, 04:36:50 PM
#3
What do you mean it's huge?

How large do you think your bank's databases are?
legendary
Activity: 1764
Merit: 1002
April 29, 2013, 04:25:48 PM
#2
it can be pruned by lopping off all spent tx's.
newbie
Activity: 55
Merit: 0
April 29, 2013, 03:49:35 PM
#1
I thought about making a minimal system just for managing bitcoins... but the blockchain is HUGE... Anyway to make it smaller ?
Jump to: