
Topic: Where will the size of the Bitcoin database stop? 1GB+ (Read 5245 times)

legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
There may exist coins minted/created in (split/merge) transactions from before the checkpoint. You can only prune coins obliterated by a split/merge.
sr. member
Activity: 277
Merit: 250
As far as I know, there are hardcoded checkpoints; in the future they would just make sure you have the data after said checkpoint.

However this leaves the system open to a possible conspiracy situation perhaps?
full member
Activity: 392
Merit: 100
(...)For now, there is neither the need nor the resources to do any such things, but if Bitcoin is as successful as some of us predict, those nodes will be cheap insurance, largely inaccessible to most forms of destruction.  (one or another, but not all, depending on the kind of destruction being considered)

I understand now. I thank everyone for their cooperation and commitment to this topic.

Added much information to my knowledge.

 Smiley
legendary
Activity: 1708
Merit: 1010
The hash of a pruned block will not match the hash in the header, and thus a pruned block cannot be verified. 

It doesn't need to be verified.  The client downloads the entire block, verifies it, and then proceeds to prune it to its own liking.  That is what the Merkle tree block structure is for.

Right, but that means that the full block must exist somewhere so the node can download it and verify it.  So while any node can prune, not every node can prune.

And that is why it is reasonable to assume that there will always be at least one person willing to run a full node that does not prune.  It's also why I said that most full clients won't prune anything recent.  As already noted, light clients don't need the full chain anyway, and can start their chain when they create their first address.  Most full clients won't really need a full chain either, and can prune quite extensively and/or start from the most recent trusted checkpoint hash.  All clients do full chains currently, in part, because the network is still very small compared to the expectations, and the many-copies-keep-data-safe method is employed.  The concept of placing full "quiet" nodes in orbit, on Earth protected in a safe zone, and eventually on the Moon, as archival devices has been discussed already.  For now, there is neither the need nor the resources to do any such things, but if Bitcoin is as successful as some of us predict, those nodes will be cheap insurance, largely inaccessible to most forms of destruction.  (one or another, but not all, depending on the kind of destruction being considered)
kjj
legendary
Activity: 1302
Merit: 1026
The hash of a pruned block will not match the hash in the header, and thus a pruned block cannot be verified. 

It doesn't need to be verified.  The client downloads the entire block, verifies it, and then proceeds to prune it to its own liking.  That is what the Merkle tree block structure is for.

Right, but that means that the full block must exist somewhere so the node can download it and verify it.  So while any node can prune, not every node can prune.
legendary
Activity: 1708
Merit: 1010
There isn't any real need for a large number of nodes that keep a full copy of the blockchain.  There probably isn't any real need for any node to keep a full copy of the blockchain unpruned at all.  Pruning of transaction data that 1) is older than a certain period of time, say three months or 15K blocks or so and 2) has been referenced (spent) and the referencing transaction has been referenced (i.e. the transaction is at least two transactions long spent) will eventually result in a fairly stable blockchain size that mostly varies by transaction volumes over those three months.  Some clients won't keep spent transactions unpruned at all, and will thus have a much smaller data footprint, growing only by the size of the block headers, which amounts to about 4 megs per year.

That said, some nodes will keep full copies of the blockchain, if only for archival reasons.  It's not necessary that these nodes have high bandwidth or super powerful machines either.  I have a VPS that can keep up with the blockchain just fine, that I direct my other client(s) to bootstrap and update from, since I don't keep clients running on either my home machine or my Android phone and I don't want either to announce to the network or to the bootstrapping IRC channel that they exist.

The hash of a pruned block will not match the hash in the header, and thus a pruned block cannot be verified. 

It doesn't need to be verified.  The client downloads the entire block, verifies it, and then proceeds to prune it to its own liking.  That is what the Merkle tree block structure is for.
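
To make the Merkle tree point concrete, here is a rough Python sketch (not the client's actual code). It assumes Bitcoin's double-SHA256 hashing and uses made-up placeholder transactions (tx-a to tx-d); real transaction serialization and byte order are ignored. It only shows that a node which has already verified the full block can throw away two transactions, keep the single interior hash that covered them, and still arrive at the same Merkle root that the header commits to. It does not settle whether a peer receiving only the pruned form should trust it.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves):
    """Compute a Merkle root from a non-empty list of leaf hashes.

    When a level has an odd number of entries, the last one is paired
    with itself, as Bitcoin does.
    """
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical four-transaction block (placeholder bytes, not real transactions).
txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
leaves = [dsha256(t) for t in txs]
root = merkle_root(leaves)

# "Prune" tx-a and tx-b: discard them and keep only the interior hash above them.
kept_interior_hash = dsha256(leaves[0] + leaves[1])

# Later, rebuild the root from the kept hash plus the two unpruned transactions.
right_branch = dsha256(dsha256(b"tx-c") + dsha256(b"tx-d"))
root_after_pruning = dsha256(kept_interior_hash + right_branch)

print(root == root_after_pruning)  # True
```

The pruned form stores one 32-byte hash in place of two whole transactions, and the saving grows as larger subtrees become fully spent.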
kjj
legendary
Activity: 1302
Merit: 1026
There isn't any real need for a large number of nodes that keep a full copy of the blockchain.  There probably isn't any real need for any node to keep a full copy of the blockchain unpruned at all.  Pruning of transaction data that 1) is older than a certain period of time, say three months or 15K blocks or so and 2) has been referenced (spent) and the referencing transaction has been referenced (i.e. the transaction is at least two transactions long spent) will eventually result in a fairly stable blockchain size that mostly varies by transaction volumes over those three months.  Some clients won't keep spent transactions unpruned at all, and will thus have a much smaller data footprint, growing only by the size of the block headers, which amounts to about 4 megs per year.

That said, some nodes will keep full copies of the blockchain, if only for archival reasons.  It's not necessary that these nodes have high bandwidth or super powerful machines either.  I have a VPS that can keep up with the blockchain just fine, that I direct my other client(s) to bootstrap and update from, since I don't keep clients running on either my home machine or my Android phone and I don't want either to announce to the network or to the bootstrapping IRC channel that they exist.

The hash of a pruned block will not match the hash in the header, and thus a pruned block cannot be verified.  Checkpoints will help this a bit, in that you'll have a verification chain (by a signature on the source code or binary) that says that someone "trustworthy" claims to have verified up to a certain point.  You won't be able to prove it unless you have, or can get, the full chain, though you don't need to have all of it at once.
legendary
Activity: 1708
Merit: 1010
Correct me if I'm wrong but I believe that pruning the blockchain of fully spent transactions is "only" expected to yield about 70% savings as well.


That would be true about now, but that is also one reason pruning isn't high on the to-do list.  In a future wherein Bitcoin processes transactions on the scale of PayPal or Visa, the pruning of spent transactions will be a very useful thing.

Quote
There is a reason that every node keeps a copy of the blockchain and while we probably can get away with a lot of light clients, I'd rather not see the full history only be kept by large centralized institutions.


Yes, there is a reason; and because of that reason, and the sentiment that you have expressed above, not every node is going to prune.  Thus, the entire blockchain will continue to exist somewhere on the 'net forever.  However, this isn't particularly useful for a node processing transactions in a live production environment.  If you truly feel this way, then you can feel free to commit some of your own personal resources to make certain that a node with the complete and unabridged blockchain continues to exist.

Quote
Having more efficient network protocols and storage formats would be great but there are other more important issues right now IMHO.

Agreed.
legendary
Activity: 910
Merit: 1001
Revolutionizing Brokerage of Personal Data
Yeah ...
Not even an order of magnitude ... probably not worth the trouble.
I agree, but then again - with the total blockchain size growing steadily, saving 60% might one day be worth the trouble.

Correct me if I'm wrong but I believe that pruning the blockchain of fully spent transactions is "only" expected to yield about 70% savings as well.

There is a reason that every node keeps a copy of the blockchain and while we probably can get away with a lot of light clients, I'd rather not see the full history only be kept by large centralized institutions.

Having more efficient network protocols and storage formats would be great but there are other more important issues right now IMHO.
legendary
Activity: 910
Merit: 1001
Revolutionizing Brokerage of Personal Data
Data in blkchain isn't very compressible: here's a bzip2 run on it:
Code:
  blk0001.dat:   1.268:1,  6.310 bits/byte, 21.13% saved, 554632957 in, 437434096 out.
  blkindex.dat:  1.598:1,  5.007 bits/byte, 37.41% saved, 236568576 in, 148074565 out.

For reference, here's the result of an lzma run:
Code:
  blk0001.dat:   32.17% saved
  blkindex.dat:  57.47% saved

And that's just a "dumb" entropy compressor, so I'd guess in combination with some low-level optimization of the data format we could easily get the size down to about 40-50%.
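
If anyone wants to repeat the comparison on their own files, here is a minimal Python sketch using the standard-library bz2 and lzma modules. The helper names are my own, and default compression settings are assumed, so the results will not necessarily match the command-line tools exactly. The last two lines just recompute the "% saved" figures from the bzip2 byte counts quoted above.

```python
import bz2
import lzma


def percent_saved(original_size: int, compressed_size: int) -> float:
    """Share of the original size removed by compression, in percent."""
    return 100.0 * (1 - compressed_size / original_size)


def compare(data: bytes) -> dict:
    """Compress the same bytes with bzip2 and lzma and report % saved for each."""
    return {
        "bzip2": percent_saved(len(data), len(bz2.compress(data))),
        "lzma": percent_saved(len(data), len(lzma.compress(data))),
    }


# Recompute the figures from the bzip2 byte counts posted in this thread.
print(round(percent_saved(554632957, 437434096), 2))  # 21.13 (blk0001.dat)
print(round(percent_saved(236568576, 148074565), 2))  # 37.41 (blkindex.dat)
```

To try it on a real file, something like `compare(open(path, "rb").read())` would do, where `path` is wherever your own blk0001.dat lives; note that this reads the whole file into memory.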
legendary
Activity: 1708
Merit: 1010

How about payment monthly for having the full blockchain with a decent upload speed? Nothing HUGE, but something to make them WANT to have all the blocks.

There isn't any real need for a large number of nodes that keep a full copy of the blockchain.  There probably isn't any real need for any node to keep a full copy of the blockchain unpruned at all.  Pruning of transaction data that 1) is older than a certain period of time, say three months or 15K blocks or so and 2) has been referenced (spent) and the referencing transaction has been referenced (i.e. the transaction is at least two transactions long spent) will eventually result in a fairly stable blockchain size that mostly varies by transaction volumes over those three months.  Some clients won't keep spent transactions unpruned at all, and will thus have a much smaller data footprint, growing only by the size of the block headers, which amounts to about 4 megs per year.

That said, some nodes will keep full copies of the blockchain, if only for archival reasons.  It's not necessary that these nodes have high bandwidth or super powerful machines either.  I have a VPS that can keep up with the blockchain just fine, that I direct my other client(s) to bootstrap and update from, since I don't keep clients running on either my home machine or my Android phone and I don't want either to announce to the network or to the bootstrapping IRC channel that they exist.
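
For anyone checking the "about 4 megs per year" figure, the arithmetic can be sketched in a few lines of Python. The two constants are not stated in this thread: they assume the standard 80-byte block header and the 10-minute target block interval, and the function names are just for illustration.

```python
HEADER_BYTES = 80            # assumed: size of one Bitcoin block header
BLOCK_INTERVAL_MINUTES = 10  # assumed: target time between blocks


def blocks_in_days(days: int) -> int:
    """Expected number of blocks produced in the given number of days."""
    return days * 24 * 60 // BLOCK_INTERVAL_MINUTES


def header_bytes_per_year() -> int:
    """Yearly growth of a headers-only chain, in bytes."""
    return blocks_in_days(365) * HEADER_BYTES


print(blocks_in_days(365))      # 52560 blocks per year
print(header_bytes_per_year())  # 4204800 bytes, i.e. roughly 4.2 MB
print(blocks_in_days(90))       # 12960 blocks in about three months
```

So three months comes out nearer 13K blocks than 15K at the target rate, which is close enough for the "or so" above.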
legendary
Activity: 1708
Merit: 1010
Why not this:

If the client is run for the first time, only grab the most recent block. And, from there on out...grab the latest blocks.
-There is no need for new clients to go back and sift through all the blocks to see if they have any transactions.

BitcoinJ clients do this, because once they are started for the first time, they create their own addresses and have no logical need to assume that such addresses have existed prior to themselves.  They download the most recent blocks, and then keep up with the chain, but then I think that they only keep the block headers and discard the data in all blocks that do not relate to coins sent to themselves.  It's certainly possible, there just isn't a real need for a regular client that can do this just yet.
hero member
Activity: 560
Merit: 500
Remember that, due to the decentralized nature of Bitcoin, we need as many people as possible with the FULL blockchain, to relay all the blocks and transactions to others

1GB is almost nothing, you can buy 3TB hard disks for cheap... so as long as it doesn't become a real problem, it's better we keep the full blockchain
I see your point, and I raise you space detection. Smiley

If their HDD (the one that has the Bitcoin client on it) has enough space to leave at LEAST 10% free afterwards, go ahead and download the full blockchain.
Another option would be to ask the user if they would be so kind and help grow the Bitcoin economy by downloading the full blockchain.

How about payment monthly for having the full blockchain with a decent upload speed? Nothing HUGE, but something to make them WANT to have all the blocks.
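
The space-detection idea could look something like the Python sketch below. This is not an existing client feature; the function names and the way the chain size is passed in are assumptions for illustration. The rule is simply: download the full chain only if at least 10% of the disk would still be free afterwards.

```python
import shutil


def should_download_full_chain(total_bytes: int, free_bytes: int,
                               chain_bytes: int,
                               min_free_fraction: float = 0.10) -> bool:
    """True if storing the chain still leaves the required share of the disk free."""
    remaining = free_bytes - chain_bytes
    return remaining >= min_free_fraction * total_bytes


def check_datadir(path: str, chain_bytes: int) -> bool:
    """Apply the rule to the disk that holds the given data directory."""
    usage = shutil.disk_usage(path)
    return should_download_full_chain(usage.total, usage.free, chain_bytes)


# Example with round numbers: 100 GB disk, 50 GB free, 1 GB chain.
GB = 10**9
print(should_download_full_chain(100 * GB, 50 * GB, 1 * GB))  # True
```

If the check fails, the client could fall back to asking the user, as suggested above.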
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
Remember that, due to the decentralized nature of Bitcoin, we need as many people as possible with the FULL blockchain, to relay all the blocks and transactions to others

1GB is almost nothing, you can buy 3TB hard disks for cheap... so as long as it doesn't become a real problem, it's better we keep the full blockchain
hero member
Activity: 560
Merit: 500
Yes, can't the client "trust" the network with the majority of the block chain as a default, and only download the whole thing when specifically requested to do so?
Personally, I couldn't care less how big the client is. However, following this method would allow users with little space to start a new client and send all Bitcoins to it.
legendary
Activity: 1106
Merit: 1001
Why not this:

If the client is run for the first time, only grab the most recent block. And, from there on out...grab the latest blocks.

If the client has been run before, only grab the blocks from when their client first started.

If the user so-happens to want all the blocks, they would easily just be able to do -rescan.



^ How about that?

Yes, can't the client "trust" the network with the majority of the block chain as a default, and only download the whole thing when specifically requested to do so?
hero member
Activity: 560
Merit: 500
Why not this:

If the client is run for the first time, only grab the most recent block. And, from there on out...grab the latest blocks.
-There is no need for new clients to go back and sift through all the blocks to see if they have any transactions.

If the client has been run before, only grab the blocks from when their client first started.
-Again, no need to go ALL the way back.

If the user so-happens to want all the blocks, they would easily just be able to do -rescan.
-Sometimes people want to have all the blocks.



^ How about that?
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
The Roaming/Bitcoin folder is 837MB here, with blk0001.dat at 530MB and blkindex.dat at 217MB

legendary
Activity: 1666
Merit: 1057
Marketing manager - GO MP
Very simple: future clients might use a binary data format and/or compression; there are large amounts of redundant information.

Mmmh.

Data in blkchain isn't very compressible: here's a bzip2 run on it:

Code:
  blk0001.dat:   1.268:1,  6.310 bits/byte, 21.13% saved, 554632957 in, 437434096 out.
  blkindex.dat:  1.598:1,  5.007 bits/byte, 37.41% saved, 236568576 in, 148074565 out.
oops was a shot in the dark then  Embarrassed
sr. member
Activity: 677
Merit: 250
It probably isn't going to become a priority until the blockchain is around the 10-20 GB range.  Any computer newer than two years old and with a decent broadband connection can handle this kind of database.

The most common SSD drives are 64 to 80GB  Cry



you don't have to store it in %appdata%

-datadir=[old 2TB spinning disk]:\



That's what I'm doing on my desktop, but unfortunately the technique is not applicable for my laptop.

It's not a big concern for me though. I'm sure the standard client will implement a "recent blockchain only" option or blockchain pruning soon. Either of these new features will solve the problem instantly, along with the slow client start-up problem.