Dan Kaminskys thoughts on scalability

finway

hero member

Activity: 714

Merit: 500

The Bitcoin database is 1GB now,
Bitcoin.exe is somehow pushing me to close it.

As we can see,
number of listening nodes shrinks to less than 10k,
number of total nodes shrinks to less than 50k.

Scalability is a real problem.
More and more people choose to close there clients.

mndrix

vip

Activity: 447

Merit: 258

Technical scalability optimizations, like Mike Hearn's and Steve's, buy us a lot of growing room. There are also social scalability optimizations. Most distributed systems naturally develop some centralized institutions. For example, payment with physical currency is a distributed system, yet many people prefer the convenience of banks and credit cards. In the US, about 90% of people have a bank account. The remaining 10% enjoy the benefits of anonymous cash.

As Bitcoin becomes more popular, a larger percentage of transactions will occur through centralized institutions and therefore take place outside the block chain. Large Bitcoin "banks" will likely settle payments with a single transaction at the end of each business day. The block chain is unlikely to bear the entire burden of Bitcoin's popularity.

I suspect that transaction volumes will level off as Bitcoin centralizes slightly, leaving plenty of room for amateurs to participate directly in the network as peers.

bcearl

full member

Activity: 168

Merit: 103

I thought about it once again:

Dan, it is not like banks. Even if you end up with large companies managing the block chain, the user has the ECC private keys - which means the user has the power over his money. This empowers him to dump a bank and go to another one in minutes.

It will not be fully decentralized in the current sense, and it may cost some fees. But it's still totally different from the current system, where banks can just burn your money.

markmark

newbie

Activity: 7

Merit: 0

I also see a problem with the amount of data that has to be stored on the file system
200MB might not be much right now, but a few GB are. especially on mobile systems etc.
one solution to this would probably be "trusted" servers that host the complete chain and offer the ability to look up what you need in this foreign chain.
so that the data is stored on a webservice base and you don't need them locally.
don't know if that makes sense, though.
please correct me if i'm wrong.

Steve

hero member

Activity: 868

Merit: 1008

I agree with the OP's analysis. I would further add that should a scalability limitation prove practically insurmountable (or practically insurmountable in relatively short time frame) you always have the possibility of off block chain transactions backed by bitcoin. And these off block chain transactions don't have to utilize database account entries (which we know are all too vulnerable). They could instead use privately issued block chains. These would introduce new coins when real bitcoins are deposited and destroy coins when they are redeemed. A central authority would be issuing the blocks for such chains (and wouldn't require the distributed proof of work...instead you rely on the central authority to ensure the chain is secure). Such chains would be inherently less trustworthy than the bitcoin chain, but would also be less costly to administer and address both scaling and transaction cost issues. For day to day purchases and micro transactions, I think such private chains would work quite well and no one would have to trust them with very much money for very long periods of time (they could settle on the public bitcoin block chain once a day or even hourly).

bcearl

full member

Activity: 168

Merit: 103

Quote from: Mike Hearn on August 05, 2011, 07:53:32 AM

Yes, obviously the amount of data you have to store goes up with the number of users. That's not exponential unless you have a very optimistic of Bitcoins future! At some point usage will stabilize and working set storage along with it.

I don't say that it must be a problem, as computer capacities grow exponentially as well.

I just wanted to say that pruning does no magic.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Yes, obviously the amount of data you have to store goes up with the number of users. That's not exponential unless you have a very optimistic of Bitcoins future! At some point usage will stabilize and working set storage along with it.

bcearl

full member

Activity: 168

Merit: 103

Quote from: Mike Hearn on August 05, 2011, 07:19:13 AM

Bitcoin doesn't track balances of addresses. That's not how it works.

I know that. But you have to, if you want to prune the chain.

Quote

You need to store unspent outputs. A transaction doesn't say "spend X coins from address Y". It says "import the value from output N of transaction T using script S". If an address has nothing on it anymore, that's another way of saying there are no unspent outputs containing it. If a new address gets some coins then spends them it can disappear entirely from the pruned chain, such that there's no evidence it ever existed (except in the records of nodes that deliberately choose to not prune).

Yes, but that does not happen. The number of addresses will still go up for a number of reasons, e. g. the increasing number of users and the BTC deflation.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote

Pruning doesn't help you that much, because not just the number of transactions but the number of addresses also goes up exponentially. You can get a different expon entially growing parameter, but it's still a problem. You are required to store at least the balance of each address at some point.

Bitcoin doesn't track balances of addresses. That's not how it works.

You need to store unspent outputs. A transaction doesn't say "spend X coins from address Y". It says "import the value from output N of transaction T using script S". If an address has nothing on it anymore, that's another way of saying there are no unspent outputs containing it. If a new address gets some coins then spends them it can disappear entirely from the pruned chain, such that there's no evidence it ever existed (except in the records of nodes that deliberately choose to not prune).

bcearl

full member

Activity: 168

Merit: 103

Quote from: Mike Hearn on August 05, 2011, 06:22:38 AM

with pruning and other efficiency improvements it's probably OK to store the entire chain either in RAM or on flash.

Pruning doesn't help you that much, because not just the number of transactions but the number of addresses also goes up exponentially. You can get a different expon entially growing parameter, but it's still a problem. You are required to store at least the balance of each address at some point.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote from: JoelKatz on August 05, 2011, 06:20:40 AM

That would help a lot if, and only if, someone could come up with a clever and secure way to move bitcoins to another block chain and them move them back so that they could be temporarily traded in another block chain.

Trading between chains automatically and securely is quite possible. There was somebody who proposed a secrets based protocol that allows you to trade between Bitcoin-like alternative chains. The coins become traded simultaneously so trust in your counterparty is unnecessary.

That said I'm not sure multiple independent chains for Bitcoin is a good idea, but it's possible.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote

You have to verify that the transaction really came from that block, which requires even more crypto than checking the transaction itself.

No, transactions are checked when they are broadcast. If a block is announced, you already checked most of the transactions. There might be a few you didn't see, you can request and check them then.

Verifying a merkle branch is very fast. The slowness in tx verification comes from ECDSA signature checking and from the need to load the dependencies (database lookups).

Quote

Maybe it is also time to specify a "light" protocol where mobile devices do not store a blockchain but only their private keys and cache their balance locally, then to spend and to check for incoming transactions they talk to such a supernode of their choice (without giving away any keys of course) I believe something like this has already been started, i don't remember where that was.

There are different ways to do it, depending on the level of lightness you want. BitCoinJ implements "simplified payment verification" where difficulty is used as proxy for validity, as described in Satoshis paper. You can connect directly to the P2P network that way, you have to download the full chain but don't have to verify it. With protocol improvements, in future you may need to only download parts of the chain rather than all of it (the parts that are relevant to you).

WebCoin implements an even lighter protocol than that, where you keep only your keys and talk to a server that indexs the chain more heavily than usual. When you want to spend you provide your pubkeys and ask for a transaction. The server provides you with one which you then sign, after checking it's what you asked for. The server can be relatively untrusted and stateless, ie, you could theoretically switch between such servers at will or have that functionality integrated into the full Bitcoin node software. The disadvantage is, the server needs to know all of your addresses, unless you keep track of what the address balances should be using some out of band mechanism.

Quote

Agreed that the signature checking burden can certainly be distributed. But, doesn't this clustering of the bitcoin node add a synchronization burden? How would it avoid race conditions with transactions sent to different nodes trying to connect the same inputs?

The backend nodes still have to update the spent flags for the connected outputs, even for transactions from trusted nodes. Otherwise they can't verify transactions from untrusted nodes in future. The part that is skippable is running the scripts (ie, checking the signatures).

To be clear, what I gave here is just a proposal. You'd want to experiment with different tradeoffs and work-sharing schemes to find one that is ideal. For instance, should the backend nodes all maintain their own copies of the block chain, or should they share a single copy? If they share it, do you keep it in a sharded SQL database or some other scheme? Or perhaps what you want is a single Bitcoin node that does almost everything itself, but passes the ECDSA checking off onto worker pools, much as mining is done. The bottleneck then would be disk/db IO - with pruning and other efficiency improvements it's probably OK to store the entire chain either in RAM or on flash.

Exploring these design tradeoffs might make for an interesting university project.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: twobits on August 05, 2011, 06:18:10 AM

I am curious what the thoughts are on merged mining and alternative blocks. Would they help with scalabilty issues by moving some of the load to other blockchains?

That would help a lot if, and only if, someone could come up with a clever and secure way to move bitcoins to another block chain and them move them back so that they could be temporarily traded in another block chain.

That way, you could have a 'fast' block chain that only ran for, say, six months that handled small, fast transactions. Bitcoins would leave the main hash chain to enter the fast chain, could be quickly traded, and then anyone could 're-import' their bitcoins into the main hash chain. But, alas, nobody has yet come up with a way to make this work.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: samr7 on August 05, 2011, 06:12:28 AM

Agreed that the signature checking burden can certainly be distributed. But, doesn't this clustering of the bitcoin node add a synchronization burden? How would it avoid race conditions with transactions sent to different nodes trying to connect the same inputs?

I think the fix is to have a node be trusted only to have checked the signatures. So the other (cheap) checks, such as for conflicting transactions claiming the same outputs, are still done.

twobits

sr. member

Activity: 574

Merit: 250

I am curious what the thoughts are on merged mining and alternative blocks. Would they help with scalabilty issues by moving some of the load to other blockchains?

samr7

full member

Activity: 140

Merit: 430

Firstbits: 1samr7

Quote from: Mike Hearn on August 05, 2011, 05:15:38 AM

Here's one way you could distribute tx checking load: add a new command line switch, call it -trustednode=1.2.3.4:8333. Transactions received from a trusted node would not be checked, they'd be implicitly assumed valid and included in the current block. Obviously it's highly dangerous to mark any node other than another one you run yourself as trusted. Bring up some nodes on different machines and set them all to connect to each other as trusted nodes. These backends are not exposed directly to the p2p network. Instead a load balancer is put in front of them. The load balancer is not a trusted node. It may not even be based on Satoshis code. Its job is to take a transaction and then send it to one of the backends based on some sharding function, eg, dividing by the tx hash. On receiving a tx from the load balancer, a backend machine verifies it, then broadcasts it out to the other backends which then skip verification. Nodes should stay in sync as they all see the same transactions.

Agreed that the signature checking burden can certainly be distributed. But, doesn't this clustering of the bitcoin node add a synchronization burden? How would it avoid race conditions with transactions sent to different nodes trying to connect the same inputs?

bcearl

full member

Activity: 168

Merit: 103

Quote from: Mike Hearn on August 05, 2011, 05:15:38 AM

Here's one way you could distribute tx checking load: add a new command line switch, call it -trustednode=1.2.3.4:8333. Transactions received from a trusted node would not be checked, they'd be implicitly assumed valid and included in the current block. Obviously it's highly dangerous to mark any node other than another one you run yourself as trusted.

Do you think an IP-Adress does the job? You have to verify that the transaction really came from that block, which requires even more crypto than checking the transaction itself.

apetersson

hero member

Activity: 668

Merit: 501

good read. would give +1

It would be interesting to try out your load balancing approach in Testnet, and simulating really high transaction load.

Maybe it is also time to specify a "light" protocol where mobile devices do not store a blockchain but only their private keys and cache their balance locally, then to spend and to check for incoming transactions they talk to such a supernode of their choice (without giving away any keys of course) I believe something like this has already been started, i don't remember where that was.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

I'm going to respond to the scalability sections of Dan Kaminskys slides. Maybe now Dan is whitelisted he can comment here.

http://www.slideshare.net/dakami/bitcoin-8776098

I believe he presented this at Blackhat. As the author of the Scalability page, I was amused to read the "epic scalability quote" that came from me Wink

OK, first up, I really don't think building a distributed supernode would be very difficult. Perhaps my views on building distributed systems have been warped by spending ~5 years at Google, but Bitcoin nodes are one of the easier things I've seen to spread over a bunch of machines. In fairness, I didn't describe how to do that on the wiki page, and the page is itself a mishmash of things put together at different times. I'll try and clean it up a bit next week. In particular it mixes together stuff that's true today with stuff that could be trivially true in future, and it's not always clear which is which.

The core bottleneck in a Bitcoin node is verifying transactions as they come in from the broadcast network. Other things nodes do are either rare (handling re-orgs) or already distributed (mining) or potentially expensive in a few unusual situations, like if you're sending and receiving tons of spends from your wallet specifically (a big BitBank like InstaWallet). If you're a regular merchant, services provider or even hobbyist your node will spend 99% of its time checking transactions.

Here's one way you could distribute tx checking load: add a new command line switch, call it -trustednode=1.2.3.4:8333. Transactions received from a trusted node would not be checked, they'd be implicitly assumed valid and included in the current block. Obviously it's highly dangerous to mark any node other than another one you run yourself as trusted. Bring up some nodes on different machines and set them all to connect to each other as trusted nodes. These backends are not exposed directly to the p2p network. Instead a load balancer is put in front of them. The load balancer is not a trusted node. It may not even be based on Satoshis code. Its job is to take a transaction and then send it to one of the backends based on some sharding function, eg, dividing by the tx hash. On receiving a tx from the load balancer, a backend machine verifies it, then broadcasts it out to the other backends which then skip verification. Nodes should stay in sync as they all see the same transactions.

Would the load balancer become a bottleneck? I doubt it. Servers at Google are capable of handling thousands of RPCs per second per core. A single 16-core server could balance all the traffic Bitcoin needs for years of growth. If one day the traffic is so huge a single machine can't even load balance them, filtering can be added to the protocol so a connection can opt to receive only a subset of all traffic (eg selected by hash again).

This doesn't scale infinitely, but there's no such thing as a distributed system that has zero scaling bottlenecks at all. After doing that the next problem might become network bandwidth or processing addr broadcasts or handling giant wallets.

On storage, Dan doesn't mention the ability to do tx pruning (perhaps because it is, like so much else, not implemented yet). The savings were calculated to be around 70% of disk space given the current state of the economy, though changes in how Bitcoin is used would change that figure as well. Tx pruning makes outstanding storage linear in the number of unspent outputs, rather than the total history of the economy.

Dan asserts that supernodes are basically banks. I disagree. The word "bank" combines many different things together. A high capacity Bitcoin node has no properties of traditional banks - they don't make loans, take on customers, etc. They just track and verify transactions. Dans assumption is that supernodes would be often combined with other functions like hosting peoples wallets and mining. I don't see any technical reason that should be so, though it might end up that way for other reasons.

How would supernodes be funded? This is like asking how web servers can be funded. Some companies choose to run their own, others choose to outsource. Some people rent personal servers for personal reasons. Whatever route is chosen, there is enough scope for optimization that running a supernode is unlikely to tax any but the poorest of organizations.

Topic: Dan Kaminskys thoughts on scalability (Read 4231 times)