
Topic: Will the block chain size ever become too large? (Read 20405 times)

legendary
Activity: 1708
Merit: 1010
Isn't there some way of limiting blockchain download so that each node only needs to download a portion? It's so obvious that it must be what you developers are working on.

Short answer, yes.

Longer answer, it's complicated, but yes.

One method currently available is to use a 'light client' that only downloads and keeps a copy of the block headers, plus the blocks that contain confirmed transactions related to its own wallet.dat balance.  The wallet.dat keeps a copy of the relevant transactions anyway, so it's relatively easy to do, though I know of no desktop clients that don't use a full-client model.  The problem here is that, while light clients can prove that they owned those coins at one time, they depend on the vendor's client being a full client (or having some other method of full verification) in order to spend coins; and light clients have issues with acceptance of coins, because they can't independently confirm that a received transaction is still valid.  Only that the coins were once owned by the sender, and (usually, but not absolutely) that the transactions offered fit in the block the sender claims they came from.
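
The header-plus-relevant-blocks idea above boils down to a merkle branch check: given a transaction hash, a handful of sibling hashes, and the merkle root from a block header, a light client can confirm inclusion without the full block. A minimal sketch with toy data (real Bitcoin hashing has byte-order details this ignores):

```python
import hashlib

def dhash(b: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin's merkle tree."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_merkle_branch(tx_hash, branch, index, merkle_root):
    """Walk up the tree: at each level, hash together with the sibling,
    on the side implied by the transaction's index at that level."""
    h = tx_hash
    for sibling in branch:
        h = dhash(h + sibling) if index % 2 == 0 else dhash(sibling + h)
        index //= 2
    return h == merkle_root

# Toy four-transaction tree.
txs = [dhash(bytes([i])) for i in range(4)]
level = [dhash(txs[0] + txs[1]), dhash(txs[2] + txs[3])]
root = dhash(level[0] + level[1])

# Prove txs[2] is in the tree using just two sibling hashes.
assert verify_merkle_branch(txs[2], [txs[3], level[0]], 2, root)
```

The branch grows only logarithmically with the number of transactions in the block, which is what makes keeping just headers practical.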

Another method is what I've been calling the "naked block transmission protocol," though I'm sure that is not what Gavin and the developers are calling it.  In it, a new block, even for a full client, is broadcast with only the header and the full merkle tree, stripped of the actual transaction data, on the assumption that all full clients will have already seen the vast majority of the transaction data when it was released loose on the network up to 10 minutes prior.  This would improve network throughput by between 2 and 10 times, or about 6 for the average block.  It wouldn't improve a full client's bootstrapping time at all, but a kind of vendor's client could be developed from this that keeps more data than a light client and less than a full client.
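
A rough sketch of what such naked relay could look like on the receiving side: fill in transaction bodies from the loose transactions already seen, and fetch only the stragglers. The names and structure here are my own illustration, not an actual protocol:

```python
def reconstruct_block(header, txids, mempool):
    """Rebuild a block from its header and transaction ids, filling in
    bodies from the local pool of already-seen loose transactions.
    Returns the partial block and the ids that still must be fetched."""
    body, missing = [], []
    for txid in txids:
        tx = mempool.get(txid)   # None if we never saw this one loose
        if tx is None:
            missing.append(txid)
        body.append(tx)
    return {"header": header, "txs": body}, missing

# Two of the three transactions were already seen loose on the network.
mempool = {"a": "tx-a", "b": "tx-b"}
block, missing = reconstruct_block("header", ["a", "b", "c"], mempool)
assert missing == ["c"]   # only one follow-up fetch is needed
```

The bandwidth saving comes from the common case where `missing` is empty and the block costs only a header plus a list of ids.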

There is also the parallel network method, that Stratum & a couple of android clients are using.

And for full clients that no longer need old data locally, there is the pruning protocol that has always been in Bitcoin but never implemented, which would allow full clients to discard long-spent transactions.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Isn't there some way of limiting blockchain download so that each node only needs to download a portion? It's so obvious that it must be what you developers are working on.

If the blockchain is to be handled only by miners or other limited users, then Bitcoin becomes centralised and open to attack like any centralised entity.  How about clients only acting on the first three characters' worth of the blockchain, randomly?  I.e., my client will only process transactions that begin with XYZ.  If this were hidden from users and done randomly, there'd be no way for anyone to know who was verifying which portions of the blockchain, and the blockchain itself would then effectively be decentralised.  So the idea is: have each client only download transactions that begin with XYZ, and only verify transactions that begin with XYZ.

It only means that when you send bitcoin, the transaction sends out a verification request... ahh, but how does it know where to send the request?
It would have to trawl the entire net to find a node that will verify its XYZ transaction, and so would slow down verifications.
So ideally, any node should be able to verify a transaction, but of course it can't, because it doesn't have the entire blockchain.  (Why do I get the feeling I'm painting myself into a corner here?)
What about this, then: when you boot up the client, it actively seeks out only those nodes which will deal with and verify XYZ transactions.  It might mean a sync delay initially while it finds enough XYZ nodes to verify your transaction, but once it's found them... Anyway, those are my thoughts on the blockchain size problem: a distributed blockchain.  I'm sure that's what they are working on anyway, since it's obvious, so I've just wasted 5 minutes of my (and your) life.  Sorry about that  Roll Eyes

No.  What you describe is a (poorly designed) DHT, but a DHT in general is a poor security structure for the blockchain.  A full node is a full node: it has to be independent.  If it needs to rely on other nodes then it isn't independent, and if nobody is independent then that is a huge problem.  Not everyone needs to be a full node, but the network needs independent full nodes.  If you don't want to run a full node, then look to an SPV client.

The blockchain can be pruned.  The pruned size is about one third of the full blockchain, and it will decline as a percentage of the full size over time.  That is the solution being worked on.  A pruned blockchain doesn't reduce the security of a full node; pruned nodes can continue to operate independently.
full member
Activity: 474
Merit: 111
Isn't there some way of limiting blockchain download so that each node only needs to download a portion? It's so obvious that it must be what you developers are working on.

If the blockchain is to be handled only by miners or other limited users, then Bitcoin becomes centralised and open to attack like any centralised entity.  How about clients only acting on the first three characters' worth of the blockchain, randomly?  I.e., my client will only process transactions that begin with XYZ.  If this were hidden from users and done randomly, there'd be no way for anyone to know who was verifying which portions of the blockchain, and the blockchain itself would then effectively be decentralised.  So the idea is: have each client only download transactions that begin with XYZ, and only verify transactions that begin with XYZ.

It only means that when you send bitcoin, the transaction sends out a verification request... ahh, but how does it know where to send the request?
It would have to trawl the entire net to find a node that will verify its XYZ transaction, and so would slow down verifications.
So ideally, any node should be able to verify a transaction, but of course it can't, because it doesn't have the entire blockchain.  (Why do I get the feeling I'm painting myself into a corner here?)
What about this, then: when you boot up the client, it actively seeks out only those nodes which will deal with and verify XYZ transactions.  It might mean a sync delay initially while it finds enough XYZ nodes to verify your transaction, but once it's found them... Anyway, those are my thoughts on the blockchain size problem: a distributed blockchain.  I'm sure that's what they are working on anyway, since it's obvious, so I've just wasted 5 minutes of my (and your) life.  Sorry about that  Roll Eyes
sr. member
Activity: 434
Merit: 252
youtube.com/ericfontainejazz now accepts bitcoin
It's interesting.  I have a background in computer architecture (multicore CPU design), and the whole concept of Bitcoin as a distributed timestamp network reminds me of the problem of multi-processor memory coherency (http://en.wikipedia.org/wiki/Cache_coherence).  Basically, when many processors share the same memory space, there is the same issue of determining the correct global ordering of memory transactions.  For instance, if processor A writes to address X at about the same time that processor B writes to address X, what should be the final value of memory address X?  The problem arises because each processor keeps frequently accessed memory addresses in its local cache (and additionally each processor is most likely speculative and executes out of order), so it is difficult to determine the "correct" memory transaction ordering without some shared arbitrator.  But for the typical shared-bus multicore CPU that most of us are probably using right now, there is a shared bus connecting all the processors' private caches to the last-level shared cache or main memory, so it is relatively easy: the shared bus can act as a global arbitrator, determining the "correct" global order of memory transactions based on the order in which they appear on the bus.  And the individual caches have special cache coherence protocols (e.g. MESI) that snoop the shared bus and update cached memory values efficiently.

However, shared buses are not scalable, since they become very slow as the number of connected components increases, making them impractical for more than 16 cores or so.  Larger shared-memory multiprocessor systems must use slightly more clever schemes for memory coherency, since the interconnect fabric is much more complex and may not have any centralized arbitrator.  There are hundreds of research papers from the past 30 or so years dealing with this.  One such solution is distributed directory-based cache coherency, whereby each processor's cache controller is responsible for handling the transaction ordering for a certain memory range.  For instance, if there are 256 cores on a chip arranged in a 2-D grid of 16x16 cores, then the memory address space is divided into 256 sections.  Whenever a CPU generates a memory transaction, that transaction is forwarded to the cache responsible for tracking ordering and sharing, based on some bits of the particular memory address.
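
The address-to-directory mapping is simple in principle: a few address bits above the cache-line offset select the home tile. A toy sketch for the 16x16 grid described above (the specific bit choices here are illustrative, not from any real chip):

```python
def home_core(addr: int, grid: int = 16) -> tuple:
    """Directory-based coherence: bits above the 64-byte line offset
    pick which of the grid*grid tiles holds the line's directory entry."""
    line = addr >> 6                    # drop the 64-byte line offset
    home = line % (grid * grid)         # interleave lines across tiles
    return (home % grid, home // grid)  # (x, y) position in the mesh

# Consecutive cache lines land on consecutive tiles of the 16x16 grid.
assert home_core(0x0000) == (0, 0)
assert home_core(0x0040) == (1, 0)
assert home_core(0x0400) == (0, 1)
```

Interleaving at cache-line granularity spreads directory traffic evenly, at the cost of no locality between a core and the homes of the lines it uses.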

Regarding Bitcoin, it seems the current design has the same drawback as shared-bus cache coherence, since miners must process every single transaction (well, technically not "every" transaction, since if a transaction is not put in the block chain then that transaction never *really* happened Smiley ).  It seems to me that as the Bitcoin network reaches the scale of millions of users, it will become impractical to broadcast every transaction to every single miner.  A better solution would be along the lines of distributed directory-based cache coherency, whereby a particular miner is only responsible for handling transactions from coins belonging to a certain address range.  Basically, the ordering of two transactions that use different coins is not important to the problem of double-spending.  Rather, it is only important to maintain the proper ordering for transactions that use the same coins.  Two transactions that use entirely mutually exclusive sets of coins can be reordered with respect to one another in any manner without any issue of double spending (of course, if two transactions share some coins but not all, then care will need to be taken to ensure proper global ordering).  So I would be interested in some modification of the Bitcoin protocol whereby different miners are responsible for hashing transactions from a subset of the coin address space.  Of course, selecting which address space I am responsible for mining, and broadcasting this info to all Bitcoin clients, is not trivial.  And eventually these mini block chains (which only handle part of the coin address space) would need to be reincorporated with the rest of the global block chain whenever there are transactions involving coins from multiple address spaces.  But anyway, once this network bandwidth problem becomes a real issue, I am confident that such a fork could be implemented.
(One possible "solution" that doesn't involve *any* code rewriting would be to simply allow multiple block chains to exist simultaneously, and then perform a standard currency exchange between the competing bitcoin block chains whenever a transaction involves coins from different chains.  This would effectively divide the coin address space into mutually non-overlapping chunks that could be processed entirely independently.  Over time, the exchange rates between the different block chains would stabilize, and they would become effectively one currency as far as the layman is concerned.  Of course, the downside is that each individual block chain would be weaker than a single, globally united block chain.)
member
Activity: 308
Merit: 10
I don't see how that's not compressing random data, which is impossible.

It's taking an idea and finding all the random letters and all the random words in the correct order to define it, which is possible. The random letters come from the blockchain. "Randomness" is a perception.

So how do you encode the locations of the data in such a way that it doesn't take up more space than just storing the data?
hero member
Activity: 630
Merit: 500
By the time it's a problem, Bitcoin will have thousands and thousands of hardcore enthusiasts, possibly small businesses. The network will be fine even if Joe Public doesn't need the full client.
legendary
Activity: 1526
Merit: 1134
Lightweight clients do exist. BitcoinJ is one such implementation. It's not quite ready for everyday use yet though, that is true. Gavin is talking about upgrading the C++ client to do lightweight mode as his next priority, so don't worry, it'll happen.
hero member
Activity: 717
Merit: 501
Just look at how big your blkindex.dat and blk0001.dat grow every day.  That is 2 MB per day.  They will be 1 GB at the end of the year, so a fresh install might take you 3 hours to download the whole index.  That is already too big IMHO, and it will double soon after that.  Downloading one day's blocks already takes 10 minutes.  The idea of letting the big companies run the show could work, but 1) it is not done yet, and 2) it centralizes Bitcoin; might as well use PayPal.  Then you talk about light clients: they don't exist yet.  Then you say disk space will increase faster than Bitcoin: it does not matter.  The reality is, if you take today's Bitcoin economy and 10x it, you are now talking 20 MB per day, making Bitcoin unusable for most people, and most would never download the massive "virus".

You can clean out all the history, as you only really need to know the last owner.  You can require people to maintain the whole block chain if they are to get transaction fees.  If you go to amounts smaller than 0.01, the blocks grow even faster.  Most responses so far just sweep it under the carpet.
member
Activity: 107
Merit: 10
I don't see how that's not compressing random data, which is impossible.

It's taking an idea and finding all the random letters and all the random words in the correct order to define it, which is possible. The random letters come from the blockchain. "Randomness" is a perception.
sr. member
Activity: 280
Merit: 252
I've been thinking about this problem as well.  If bitcoin is to go mainstream, there needs to be a way to deal with a huge number of transactions.  Here is an example of the math...

In a population of 300 million, estimate that everyone does one transaction per day on average.  The current block generation rate is about one block every 10 minutes, or 6 an hour, or 144 per day.  If you divide 300 million by 144, you get 2.08 million transactions per block.  If the minimum size of each transaction is around 100 bytes (I'm just guessing here), then each block is going to be around 200 million bytes.

So, I see the following issues:

1. Huge bandwidth problems.  (144 × 200 million bytes is about 28 gigs per day.)
2. Storage issues.  (28 gigs per day is about 10 terabytes per year.)
3. Computational issues.  (The current block size is around 1-2 KB.  Computing a hash over 200 MB is going to be much different.)
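
The arithmetic behind these numbers, restated (the 100-byte transaction size is the post's own guess):

```python
POP = 300_000_000            # users, one transaction per day each
BLOCKS_PER_DAY = 24 * 6      # one block every 10 minutes = 144/day
TX_BYTES = 100               # rough guess at minimum transaction size

tx_per_block = POP / BLOCKS_PER_DAY     # ~2.08 million transactions
block_bytes = tx_per_block * TX_BYTES   # ~208 million bytes per block
daily_bytes = POP * TX_BYTES            # ~30 GB per day
yearly_tb = daily_bytes * 365 / 1e12    # ~11 TB per year
```

Using the post's rounded 200 MB per block gives the quoted 28 gigs per day; the unrounded figure is closer to 30 GB per day and 11 TB per year, the same order of magnitude either way.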

Here are a few other things I've thought about:

* What happens in the future when you issue a request to send someone funds and do not attach a fee for processing?  If no-one takes your request to process the funds, will the funds stay in "limbo"?
* I was thinking a good way to deal with the huge number of transactions is to mix bitcoin transactions with something like the ripple project (http://ripple-project.org/), or perhaps semi-centralized bitcoin clearing entities.
* Given the huge number of potential transactions per day, perhaps the block size needs to be limited to 2-4k.  This would force the above idea and keep the block chain manageable.

With my home built computer and a 100mbps internet connection, I can (and do) easily download 28 gigs in a matter of minutes.

I also own a few terabytes of hard disk space. It's cheap, and prices are falling faster than the US dollar.

By the time we ever reach 200 million transactions every 10 minutes, the smallest hard drives will probably be a few terabytes in size - and we'll all be rocking 1Gbps dedicated connections!

I don't think the block chain size will ever really be an issue, even for the average joe.
administrator
Activity: 5222
Merit: 13032
I don't see how that's not compressing random data, which is impossible.
member
Activity: 107
Merit: 10
The block chain does not compress well because it is mostly keys and hashes (i.e. random data).

I'm not saying compress the block chain itself. Let me see if relating it to Boggle helps. At least somebody can see where my analogy breaks down. If you are an auditory learner, say what I'm writing out loud to yourself so you can "hear" me speaking it to you.

1) boggle board = block chain (seemingly random data)
2) word = data of interest (a pattern found within seemingly random data)
3) tracing the word on the board = an algorithm to encode or decode data (manipulation of random data)
4) a sentence using the words found = a complete data file (information of interest)
5) a word defining the sentence = a compressed file (an abstract idea)

Items 2-5 do not have to hold the entire information in one big data set. They can be utilized recursively to take a very complex idea and reduce it down to a small word.

Try describing something. What is "love"? What words did you use to describe it? What do those words mean? To get down to the real meaning of "love", you have to recursively define each word using other words. At the end of the recursion, you have a collection of random data that only has meaning because we both have access to the same data and a common knowledge to describe it. We understand this intuitively as a language to transfer an idea from one person to another. So what is the natural Bitcoin block chain language? Where are the linguists?
legendary
Activity: 1526
Merit: 1134
The block chain does not compress well because it is mostly keys and hashes (i.e. random data).
member
Activity: 107
Merit: 10
When transaction blocks start getting really big, this solution may ease the burden on the network.
http://programming.witcoin.com/p/515/If-Sun-can-do-it-wit-Java-why-cant-we-do-it-wit-Bitcoin

Theoretically, new transaction blocks could be encoded against the existing block chain to yield small keys that contain unpacking instructions. Clients only need to unpack it and check the block against a hash of itself.

I'll be able to present the idea more clearly as I learn the right words to use. My best analogy is that we would be creating a new binary language that lets entities communicate large and complex ideas in an efficient manner, like the word "love".
legendary
Activity: 1526
Merit: 1134
Quote
Of course this is easily solvable using GPUs. So not a bottleneck anymore, I guess.

Hashing != ECDSA verification. I don't know of any algorithms that will make ECDSA much faster using GPUs. The best bet for speeding this up (assuming we care) is to exploit the special properties of the secp256k1 curve:

http://portal.acm.org/citation.cfm?id=1514739

http://www.hyperelliptic.org/tanja/KoblitzC.html
donator
Activity: 826
Merit: 1060
I cannot foresee a time when Bitcoin is out of reach for (well-paid) hobbyists.
If the time comes when it has become impractical for a (non-well-paid) individual to process the block chain, this can only be because Bitcoin has become wildly successful, and that will have opened up millions of new niches for the Bitcoin hobbyist.  Really cool things that haven't even been thought of yet.
legendary
Activity: 1708
Merit: 1010
There will be some centralization, but what you really fear is a de facto monopoly.  This is, for all practical purposes, impossible.  The size of the blocks is irrelevant, since the only part of the block that is permanent is the 80-byte header.  The rest can be purged eventually, after the transactions have been referenced.

Perhaps I've not followed the block idea correctly.  Where are the transactions stored?  If I send someone a bitcoin, its history has to be recorded someplace to verify that it was not created out of thin air. (Ha... I just realized that they are created out of thin air...)

Zerbie

Transactions are stored within a block in a merkle hash tree, so the block can be pruned of long-spent transactions and still be verified as authentic by its merkle root hash within the block header.  It only takes a dozen or so kept hashes to be certain that a transaction is real.
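
A sketch of that pruning: replace a long-spent pair of transactions with their interior hash, and the merkle root in the header still checks out (toy data, ignoring Bitcoin's byte-order details):

```python
import hashlib

def dhash(b: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin's merkle tree."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(hashes):
    """Fold a (power-of-two) list of leaf hashes up to the root."""
    while len(hashes) > 1:
        hashes = [dhash(hashes[i] + hashes[i + 1])
                  for i in range(0, len(hashes), 2)]
    return hashes[0]

# A block with four transactions; tx0 and tx1 are long spent.
leaves = [dhash(t) for t in (b"tx0", b"tx1", b"tx2", b"tx3")]
root = merkle_root(leaves)

# Prune the spent pair: keep only their interior hash, drop the bodies.
stub = dhash(leaves[0] + leaves[1])

# The root is still recomputable from the stub plus the live transactions.
assert dhash(stub + dhash(leaves[2] + leaves[3])) == root
```

Each pruned subtree costs one kept hash, which is why discarding spent history doesn't weaken the header chain.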
legendary
Activity: 1526
Merit: 1134
Zerbie, the transactions are stored in the block chain, but once all the outputs of a transaction are used up that tx can be safely deleted (or stored for historical interest). So disk space can be reclaimed if you are tight. Given that 1T drives are pretty cheap today though, it's unlikely anyone will ever actually need to do this. Storage capacity doesn't seem to be a bottleneck anytime soon.

Short story - don't worry about corporate takeover at large scale because of technical limitations. The places where corporates will dominate in future will be mining conglomerates, currency exchanges, fractional reserve banking and most of the places where they exist today. The companies that will have to find new roles in a hypothetical post BitCoin world will be MasterCard and VISA.
legendary
Activity: 1526
Merit: 1134
Even if we assume Bitcoin grows to VISA scale, it'd be possible to run a full network node using only a small number of computers.  I cannot foresee a time when Bitcoin is out of reach for (well-paid) hobbyists.  See my previous posts on this topic for why.
member
Activity: 72
Merit: 10
There will be some centralization, but what you really fear is a de facto monopoly.  This is, for all practical purposes, impossible.  The size of the blocks is irrelevant, since the only part of the block that is permanent is the 80-byte header.  The rest can be purged eventually, after the transactions have been referenced.

Perhaps I've not followed the block idea correctly.  Where are the transactions stored?  If I send someone a bitcoin, its history has to be recorded someplace to verify that it was not created out of thin air. (Ha... I just realized that they are created out of thin air...)

Zerbie