
Topic: storage scalability via divide and conquer

legendary
Activity: 1022
Merit: 1033
November 27, 2012, 08:49:48 PM
#8
Quote
Hm, I never thought of this. I guess this is solvable in a color-friendly alt; it would have a mechanism to associate a colored output with an uncolored output that will pay upkeep on its behalf.

Why not simply use the UTXO owner's resources for upkeep, as I proposed here? It isn't particularly complex, and it is probably just as implementable as a mechanism for associating colored outputs with uncolored ones.

It can work like this:

To connect transaction inputs, we look up outputs in the unspent transaction output database.

Instead of performing this lookup against a full local database, we first check a local cache and, if the output is not found there, perform a lookup in a DHT. (A "cloud database".)

However, we have to verify the results we get from the DHT; to do this we pull additional proving information and check that it is consistent with the current state we derived from the blockchain.

So now unspent transaction outputs can be stored in the DHT (they are in the cloud now :D), and locally we only have to store enough information to confirm the results we get from the DHT.

Of course, a DHT is unreliable, but people who want to keep their UTXOs alive will save a local copy, and later they can repopulate the DHT or act as DHT nodes themselves to make sure the transactions they make pass validation.

What's interesting is that this requires no changes to the protocol, since all of it can be done in a separate layer. I.e. nodes which wish to save on local storage (at the expense of network traffic) will start populating the DHT and maintaining local check state.
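To make the lookup order concrete, here is a minimal Python sketch: local cache first, then the DHT, with every DHT answer checked against locally maintained check state before it is trusted. The names (UtxoResolver, verify_unspent, the DHT key scheme) are made up for illustration; this is a sketch of the idea above, not part of any actual protocol or codebase.

```python
# Hypothetical sketch: cache -> DHT -> verify against local check state.
# Nothing here is part of the real Bitcoin protocol or codebase.
import hashlib
from typing import Optional


def utxo_key(txid: bytes, vout: int) -> bytes:
    """Key under which an output would be stored in the DHT (assumed scheme)."""
    return hashlib.sha256(txid + vout.to_bytes(4, "little")).digest()


class UtxoResolver:
    def __init__(self, dht, check_state):
        self.cache = {}                 # (txid, vout) -> serialized output
        self.dht = dht                  # assumed interface: get(key) -> bytes | None
        self.check_state = check_state  # built locally from the block chain

    def lookup(self, txid: bytes, vout: int) -> Optional[bytes]:
        # 1. Local cache.
        if (txid, vout) in self.cache:
            return self.cache[(txid, vout)]
        # 2. DHT lookup (the "cloud database").
        blob = self.dht.get(utxo_key(txid, vout))
        if blob is None:
            return None
        # 3. Verify the proving information against the local check state;
        #    an unverified DHT answer must not be trusted.
        if not self.check_state.verify_unspent(txid, vout, blob):
            return None
        self.cache[(txid, vout)] = blob
        return blob
```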
donator
Activity: 2058
Merit: 1054
November 27, 2012, 11:44:37 AM
#7
Quote
Some people are interested in microtransactions, and we have to say that Bitcoin doesn't really support microtransactions due to the blockchain bloat problem. (You can certainly send tiny amounts, but txn fees will eat you alive.)
When people mention microtransactions (or instant confirmation) I tend to refer them to my Trustless, instant, off-the-chain Bitcoin payments suggestion, which I believe is a foundation for solving both problems. Given this, I have no qualms about saying Bitcoin supports microtransactions.

Quote
Also, Meni's proposal of pruning away outputs which are still alive would be a major problem for colored coins/smart property ("oh, you cannot sell your car anymore, it was pruned away, miners won't approve this transfer now"). So I want to find some alternative to it :)
Hm, I never thought of this. I guess this is solvable in a color-friendly alt; it would have a mechanism to associate a colored output with an uncolored output that will pay upkeep on its behalf.
legendary
Activity: 1526
Merit: 1134
October 11, 2012, 10:44:15 AM
#6
So, like a lot of discussions around scalability, this one boils down to differences in base assumptions.

I don't believe that VISA+ traffic levels would centralize the network into a handful of commercial operators. You don't need to be a big company to run a single server. To me this is like saying the web should be built on a DHT because how can the web possibly scale to more and more users? Everyone with a website would have to be a giant corporation! Except it didn't work out like that. It's quite feasible to run even very popular websites cheaply. The web is still a decentralized system and we obviously benefit dramatically from its simple architecture.

Will you be running a node on a crappy Linode VM or your laptop if Bitcoin becomes a global-scale currency? No. Of course not. These VPS systems are often less powerful than a mid-range desktop computer. Buy a real server if you want to take part; it won't break the bank. If there are ever millions of end users there will likely be hundreds of thousands of nodes run by merchants, exchanges, financial services companies, miners, hobbyists and so on, just like there are lots of independent web hosts today. If need be, micropayments can be used to incentivize running high-availability nodes, but I doubt it'll ever be an issue.

In short, I disagree that anything needs to be done here beyond incremental optimization. Attempting to come up with a design that scales indefinitely is usually the wrong approach, and it is definitely the wrong approach for Bitcoin.

legendary
Activity: 1022
Merit: 1033
October 11, 2012, 09:45:35 AM
#5
By the way, I forgot to mention that this is sort of relevant to the "colored coins" use case...

Some people are interested in microtransactions, and we have to say that Bitcoin doesn't really support microtransactions due to the blockchain bloat problem. (You can certainly send tiny amounts, but txn fees will eat you alive.)

I even asked coblee, the Litecoin maintainer, whether they would consider lowering transaction fees to allow microtransactions. He said no, it's not an option right now, because they had a problem with uncontrolled blockchain growth due to dust transactions in the past.

So I'd say that hypothetical blockchain growth already scares away potential users.

But if we had at least a theoretical solution for blockchain growth, we could tell people that microtransactions are possible, just not implemented right now. That might make a difference.

Also, Meni's proposal of pruning away outputs which are still alive would be a major problem for colored coins/smart property ("oh, you cannot sell your car anymore, it was pruned away, miners won't approve this transfer now"). So I want to find some alternative to it :)
legendary
Activity: 1022
Merit: 1033
October 11, 2012, 09:34:23 AM
#4
Quote
There have been many discussions on scalability over the years here. It's an old topic.

Yes, I'm aware of the general scalability discussions... I wonder if there are schemes which can fundamentally reduce storage requirements, particularly through the use of divide and conquer.

Quote
Have you read the wiki page here? It's out of date - scalability numbers got a lot better since:

  http://en.bitcoin.it/wiki/Scalability

Yes, I've read it.

A problem with this article is that it considers scalability from the point of view of a large commercial operator.

But scalability woes can easily prevent end users from running their own full nodes, which means we could end up with just a handful of full nodes belonging to commercial operators, which are profit-motivated and will walk away as soon as costs exceed profits...

This might create security problems. It's much easier to pull off a 51% attack if you can disable the majority of full nodes via DDoS.

E.g. consider this scenario: there are only 5 mining pools. The operator of one pool disables the other 4 via DDoS. Now he can easily do whatever he wants, as miners will migrate to his pool.

And p2pool can't help here because very few enthusiasts can afford to run a high-end cluster for a node just for the hell of it.



So I think it's important to consider whether solutions to scalability problems, particularly to storage requirements, exist at least theoretically for blockchain-based cryptocurrencies.

If they do not exist, it would make sense to look for alternatives to the blockchain, e.g. a list of account balances, as described in Ben Laurie's paper.

Pruning is definitely not a fundamental solution, because the number of unspent transaction outputs is likely to remain proportional to the number of transactions for quite a while, until we reach saturation.

There are some proposals for more aggressive forms of pruning, where we would cut outputs which can still be spent:
https://bitcointalksearch.org/topic/output-upkeep-costs-80435

This might help, but I wonder whether there is a better way to do this.

Quote
Storage is never the bottleneck. Terabyte hard drives are cheap

It depends on how you look at it... Consider somebody running a web store. A VPS with 1 GB of RAM is usually quite adequate to run a web store.

But, for example, a VPS with 1 GB of RAM from Linode comes with only 40 GB of storage, so the blockchain might eat a significant portion of it, which would discourage people from running full nodes.

Quote
Given the simplicity of the existing design, yet the fact that implementation issues are still present, I can't foresee a time when a dramatically more complex design will be desirable vs simply optimizing what we have.

The design sketch I presented above is definitely not much more complex. It is largely theoretical, but I'm fairly sure it could be implemented right now. (That is, unless there is a major hole in my reasoning.)

Tracking spent transactions in separate chains and linking them from the main blockchain is pretty much trivial to implement.

Using this information is also fairly easy and straightforward. It doesn't even require a hard fork: nodes that have pruned away too much of their blockchain copy can simply start using the spent-transaction chains for verification, and that's all.

legendary
Activity: 1526
Merit: 1134
October 11, 2012, 06:58:09 AM
#3
There have been many discussions on scalability over the years here. It's an old topic. In fact back in 2009 I asked Satoshi about this very topic and he said this:

Quote
The existing Visa credit card network processes about 15 million Internet purchases per day worldwide.  Bitcoin can already scale much larger than that with existing hardware for a fraction of the cost.  It never really hits a scale ceiling.  If you're interested, I can go over the ways it would cope with extreme size.

By Moore's Law, we can expect hardware speed to be 10 times faster in 5 years and 100 times faster in 10.  Even if Bitcoin grows at crazy adoption rates, I think computer speeds will stay ahead of the number of transactions.

Have you read the wiki page here? It's out of date - scalability numbers got a lot better since:

  http://en.bitcoin.it/wiki/Scalability

Updating it with the latest statistics is on my ever-growing todo list.

I don't believe any fancy schemes or protocol changes are required. After some more software optimizations, a single server-class machine with today's hardware should be capable of keeping up with traffic loads similar to all of VISA. But if that ever happens (which is a big if), we won't be using today's hardware, we'll be using tomorrow's hardware.

Storage is never the bottleneck. Terabyte hard drives are cheap and pruning nodes need to keep only the unspent output set. Only a small number of nodes need to keep the entire block chain around so they can be used to bootstrap other nodes, though in practice I anticipate quite a few people will keep the entire block chain just because they can.

Even if you assume that usage patterns change such that we need to process many more transactions per second than VISA, there's plenty of scope for software optimization to change the game. As an example, by the time we reach such huge traffic levels we'll have needed to hard-fork the system to remove the 1MB per block limit. At the same time as we do that, modern signature algorithms like ed25519 will have matured and become trustable; ed25519 can provide orders-of-magnitude speedups vs standard ECDSA on secp256k1. Another example that doesn't require a hard fork is batched verification.

While it's interesting to consider alternative network designs, simplicity is really important. Bitcoin is already a very simple network, and yet we sometimes struggle to maintain its integrity; e.g., we had problems last year with running out of sockets, and at the moment a big problem is that the software doesn't detect and avoid overloaded/obsolete nodes. Given the simplicity of the existing design, yet the fact that implementation issues are still present, I can't foresee a time when a dramatically more complex design will be desirable vs simply optimizing what we have.
legendary
Activity: 1022
Merit: 1033
October 11, 2012, 01:49:27 AM
#2
Ah, OK: bin integrity can be verified if bin entries are linked into a hash chain, so that each element includes the hash of the previous one. Then it is enough for a verifying node to keep the last hash of each bin in order to verify the integrity of any bin.
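A minimal sketch of this, assuming SHA-256 links and treating each bin entry (e.g. a spending notice) as an opaque byte string; the class and method names are illustrative, not a specification:

```python
# Per-bin hash chain: each entry commits to the hash of the previous one, so a
# verifying node only needs to remember the latest hash ("tip") of every bin to
# check the integrity of any bin contents handed to it later.
import hashlib
from typing import Iterable, List

GENESIS_TIP = b"\x00" * 32  # starting value for an empty bin (an assumption)


def chain_tip(entries: Iterable[bytes], start: bytes = GENESIS_TIP) -> bytes:
    """Fold bin entries into a single 32-byte tip hash."""
    tip = start
    for entry in entries:
        tip = hashlib.sha256(tip + entry).digest()
    return tip


class BinTracker:
    """A verifying node's per-bin check state: one 32-byte hash per bin."""

    def __init__(self, num_bins: int):
        self.tips = [GENESIS_TIP] * num_bins

    def append(self, bin_id: int, entry: bytes) -> None:
        # Called when a new block adds a spending notice to this bin.
        self.tips[bin_id] = hashlib.sha256(self.tips[bin_id] + entry).digest()

    def verify_bin(self, bin_id: int, claimed_entries: List[bytes]) -> bool:
        # Replaying the claimed bin contents must reproduce the stored tip.
        return chain_tip(claimed_entries) == self.tips[bin_id]
```

With N = 1,000,000 bins this check state is 1,000,000 × 32 bytes = 32 MB, which is where the 32 MB figure below comes from.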

Thus we have the following space trade-off:

1. Normally a verifying node stores the whole blockchain. For example, with 1 billion transactions and 250 bytes per transaction, a verifying node needs 250 GB of storage (minus whatever pruning can save). To send his money, a user only needs to broadcast his transaction, which is about 250 bytes.

2. On the other hand, if we move the burden of proof to the client using the binning divide-and-conquer strategy, a verifying node needs only the block headers and the most recent hash of each bin. Suppose we use N = 1,000,000 bins. Then bin hash storage requires 32 MB (one 32-byte hash per bin), and 10 years' worth of block headers requires ~42 MB of storage (≈525,600 headers at 80 bytes each). So 74 MB total.

The client needs to store the part of the Merkle tree which links his transaction to the root (this depends on the number of transactions in a block, but is likely something on the scale of kilobytes) plus all invalidation messages from the bin the transaction belongs to. For example, with one billion transactions and one million bins, there are approximately 1,000 messages per bin. One message is around 100 bytes, so the total amount of information stored on the client is around 100 KB per transaction. This information needs to be submitted to the verifying node (unless it already has it cached).

So we have (verifying-node storage / data the client must store and submit):

1. 250 GB / 250 bytes in the first case
2. 74 MB / 100 KB in the second case

If we choose a larger N, we reduce the amount of information the client needs to store, but increase the storage requirements on the verifying node.

For a very large N we'd have approximately one transaction per bin, so we'd be back in a situation where the verifying node keeps track of every transaction. But for a smaller N we essentially cluster many transactions into each bin, thus reducing the storage requirements of the verifying node.

As I said, this trade-off is similar to the ones used in hash tables and Bloom filters.

This scheme helps only with storage space, not with bandwidth. But a modest 10 Mbit/s connection can transfer the entire 250 GB blockchain in ~55 hours, so I'd say bandwidth won't be the bottleneck in the scenarios considered here.
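For what it's worth, the figures in this post can be double-checked with a few lines of arithmetic; all inputs are the assumptions stated above (1 billion transactions, 250 bytes per transaction, N = 1,000,000 bins, 80-byte block headers, 100-byte invalidation messages), not measured values.

```python
# Back-of-the-envelope check of the storage and bandwidth figures above.
TXS         = 1_000_000_000       # total transactions
TX_SIZE     = 250                 # bytes per transaction
BINS        = 1_000_000           # N
HASH_SIZE   = 32                  # bytes per bin tip hash
HEADER_SIZE = 80                  # bytes per block header
BLOCKS_10Y  = 10 * 365 * 24 * 6   # ~6 blocks/hour for 10 years = 525,600
MSG_SIZE    = 100                 # bytes per invalidation message

# Case 1: full verifying node.
full_node_storage = TXS * TX_SIZE              # 250 GB
client_data_1     = TX_SIZE                    # ~250 bytes (just the transaction)

# Case 2: binning scheme.
bin_hashes    = BINS * HASH_SIZE               # 32 MB
headers       = BLOCKS_10Y * HEADER_SIZE       # ~42 MB
node_storage  = bin_hashes + headers           # ~74 MB
client_data_2 = (TXS // BINS) * MSG_SIZE       # 1,000 messages -> ~100 KB
                                               # (plus a Merkle branch of a few KB)

# Bandwidth sanity check: 250 GB over a 10 Mbit/s link.
hours = full_node_storage * 8 / 10e6 / 3600    # ~55.6 hours

print(full_node_storage, node_storage, client_data_2, round(hours, 1))
```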
legendary
Activity: 1022
Merit: 1033
October 10, 2012, 05:49:12 PM
#1
Disclaimer: I have not followed all developments in this area, so I apologize if this has been discussed before.

It is expected that in the long term the blockchain size will stabilize via pruning, but what if it stabilizes at a really high value, so high that few nodes will be able to chew through it? Blockchain size is already a problem now...

I've heard that perhaps we could prune it harder, i.e. remove even those outputs which are still spendable. But then this aggressive pruning must be synchronized; otherwise, if somebody prunes blocks before others do, he won't be able to verify blocks which mention UTXOs he has already pruned away...

Quite often performance problems can be solved through a divide-and-conquer strategy. For example, in distributed hash tables large amounts of data can be spread among many nodes in such a way that each node carries only a small amount of it.

But that won't work for a blockchain, because the whole point of a blockchain is to prove the absence of double spends, and apparently the only way to prove an absence is to show the whole thing... Or is it?

I think some trade-off is possible here, remotely similar to trade-offs used in hash tables and Bloom filters...

So the problem we are trying to solve is giving a person the ability to prove that his transaction output is unspent, i.e. that there has been no transaction spending it since it was introduced into the blockchain. With the existing approach such a proof would consist of the full, non-pruned blockchain since the transaction in question... which might be a bit too big.

Can we make it smaller? Well, why not. Let's define N bins such that each transaction hash falls into exactly one of them. (E.g. if N = 2^20, a bin would be identified by the first 20 bits of the txn hash.) With each block we'll include extra information which notes, for each bin, which transaction outputs were spent in this block. This information must be verified by miners in the same way they verify transactions.

Now, to prove that my output is not spent, I only need to show the contents of one bin, which is presumably something on the scale of 1/N of the total blockchain size... which might be manageable for large N.

E.g. suppose we have 1 billion transactions, one spending notice is encoded in 40 bytes, and we have N = 1,000,000. Then a bin holds roughly 1,000 notices on average, so the proof that an output wasn't spent is just about 40,000 bytes... which is definitely manageable.
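As an illustration of how this could look, a small sketch that maps a transaction hash to a bin by its leading bits (mirroring the 2^20 example above) and works out the expected proof size from the numbers just stated; the constants are this post's assumptions, not protocol values.

```python
# Assign a transaction to a bin by the first bits of its hash, and estimate the
# expected size of a "this output was never spent" proof: the contents of one bin.
import hashlib

BIN_BITS    = 20                  # N = 2**20, roughly the 1,000,000 bins above
N_BINS      = 1 << BIN_BITS
NOTICE_SIZE = 40                  # bytes per spending notice (assumption above)
TOTAL_TXS   = 1_000_000_000


def bin_of(txid: bytes) -> int:
    """Bin index = the first BIN_BITS bits of the transaction hash."""
    return int.from_bytes(txid[:4], "big") >> (32 - BIN_BITS)


# Expected notices per bin, assuming transaction hashes are uniformly distributed.
notices_per_bin = TOTAL_TXS / N_BINS             # ~954 with 2**20 bins
proof_size      = notices_per_bin * NOTICE_SIZE  # ~38 KB, i.e. the ~40,000 bytes above

example_txid = hashlib.sha256(b"some transaction").digest()
print(bin_of(example_txid), round(proof_size))
```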

However, I did not account for one thing: we need to know which blocks have data for which bins, and this requires a certain number of bytes per block. A higher N will generally require more bytes... So there is likely some optimal N for a given amount of transactional activity.
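One way to see this trade-off concretely, under the assumption (mine, not part of the scheme above) that each block simply lists the IDs of the bins it touches at log2(N) bits per ID:

```python
# Rough per-block cost of recording "which bins does this block touch", under an
# assumed encoding: a list of touched bin IDs, log2(N) bits each. A higher N
# means more bits per ID, hence more bytes per block.
from math import ceil, log2


def per_block_index_bytes(n_bins: int, spends_per_block: int) -> float:
    bins_touched = min(n_bins, spends_per_block)     # at most one ID per spend
    return bins_touched * ceil(log2(n_bins)) / 8     # bits -> bytes


for n in (2**16, 2**20, 2**24):
    print(n, per_block_index_bytes(n, spends_per_block=2000))
```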
