Topic: Scaling the bitcoin protocol (Read 2665 times)

legendary
Activity: 1708
Merit: 1011
September 28, 2010, 01:30:33 PM
#5
Quote

So to restate the constraints:
  • We do not want to allow network segmentation
  • A node that generated a hash does have to know all transactions
  • Transaction fetching can be delayed until a hash has been generated
  • A block (hash + transactions) has to be broadcast to all nodes in the network

Advantages of a Hypercube with node clusters:
  • Highly redundant transaction tracking (depending on the size of each cluster)
  • Flexible dimension (should a cluster become too big we just increase the dimension)
  • Very efficient routing
  • Very efficient broadcast
  • Logarithmic storage need

And yes, I think the network topology should be documented and evolved on its own Wiki page :-)

Implementing a structured topology, and particularly anything similar to a hypercube topology, requires that nodes connected to each other can ask for the nodes that their peers are connected to, and then decide how to restructure the connections to fit the model.  This would not only be difficult and prone to malicious tampering; the data could also, theoretically, be used to associate particular transactions with particular clients, resulting in loss of autonomy.  Suddenly, an agent with the right resources could actually prove which transactions originated from a particular client.

Also, a true hypercube is not ideal on the Internet, as the advantages of a hypercube are muted by the fact that (most) nodes only have one physical connection to the Internet, and all virtual circuits must share that bandwidth.  The largely random manner in which Bitcoin forms a network is probably not ideal, but it's simple and effective.  Intentional network connections can be forced upon a client with the -addpeer switch, so major players (think Mybitcoin.com in another 3 years) can connect together, forming a core topology however they choose.  However, I doubt that a hypercube is what they would choose, since they are not constrained by a physical limit on connections.  They are most likely to simply maintain connections to all of the major peers, and let the transactions flow as they may.
administrator
Activity: 5222
Merit: 13032
September 28, 2010, 01:07:54 PM
#4
Non-generating nodes only need the Merkle tree and block headers, which is not very much data. (Use of the abbreviated block chain isn't implemented yet, though.) Maybe someday generators will need a strong Internet connection, but since the average size of a transaction is 214 bytes and the average number of transactions per block is 2, this will not be any time soon.
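To put the figures above in perspective, here is a rough back-of-the-envelope sketch. The 214-byte transaction size and 2 transactions per block come from the post; the 80-byte header size and the 10-minute target interval (144 blocks per day) are standard Bitcoin parameters. The function name and the larger example loads are only illustrative assumptions.

```python
# Back-of-the-envelope traffic estimate using the figures quoted in the post.
AVG_TX_BYTES = 214        # average transaction size, from the post
AVG_TX_PER_BLOCK = 2      # average transactions per block, from the post
HEADER_BYTES = 80         # size of a serialized Bitcoin block header
BLOCKS_PER_DAY = 24 * 6   # assumes the 10-minute target block interval


def daily_bytes(tx_per_block: int = AVG_TX_PER_BLOCK) -> dict:
    """Return the approximate bytes per day of transaction data and headers."""
    tx_bytes = AVG_TX_BYTES * tx_per_block * BLOCKS_PER_DAY
    header_bytes = HEADER_BYTES * BLOCKS_PER_DAY
    return {"transactions": tx_bytes, "headers": header_bytes}


if __name__ == "__main__":
    # The larger loads are hypothetical, just to show how the numbers grow.
    for load in (2, 2_000, 200_000):
        estimate = daily_bytes(load)
        print(load, estimate["transactions"], estimate["headers"])
```

At the quoted averages that is roughly 60 KB of transaction data per day, while the header-only stream stays at about 11 KB per day no matter how many transactions each block carries.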

Generators will eventually become the network's backbone. They will connect to each other with strong connections, and clients will connect to only a few generators. Like ultrapeers in Gnutella, but normal people will not typically run generators.

If it ever becomes difficult for even members of the backbone to download the entire block chain, most of the transactions can be forgotten. Only transaction end-points need to be kept.
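A minimal sketch of what "only transaction end-points need to be kept" could mean in practice: walk the history once and keep just the outputs that nothing later spends. The dictionary layout (`txid`, `inputs`, `n_outputs`) is a made-up format for illustration, not the client's actual data structure.

```python
from typing import List, Set, Tuple

OutPoint = Tuple[str, int]  # (transaction id, output index)


def unspent_outpoints(transactions: List[dict]) -> Set[OutPoint]:
    """Return the outputs that are created but never spent.

    Each transaction is a dict in a hypothetical format:
      {"txid": str, "inputs": [(txid, index), ...], "n_outputs": int}
    """
    created: Set[OutPoint] = set()
    spent: Set[OutPoint] = set()
    for tx in transactions:
        for outpoint in tx["inputs"]:
            spent.add(tuple(outpoint))
        for index in range(tx["n_outputs"]):
            created.add((tx["txid"], index))
    # Everything that was spent can be forgotten; only the ends remain.
    return created - spent


if __name__ == "__main__":
    history = [
        {"txid": "a", "inputs": [], "n_outputs": 1},          # generation
        {"txid": "b", "inputs": [("a", 0)], "n_outputs": 2},  # spends a:0
        {"txid": "c", "inputs": [("b", 0)], "n_outputs": 1},  # spends b:0
    ]
    print(sorted(unspent_outpoints(history)))
```

In this toy history only b:1 and c:0 remain spendable, so those are the only entries a node would have to keep to validate future spends.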

Segmentation is not a serious problem unless it lasts for longer than 100 blocks (the network-enforced maturation time for generations). The network will heal itself perfectly for any shorter segmentations.
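A simplified sketch of how a split resolves once the two halves reconnect. It assumes the branch with more blocks wins (the real rule compares accumulated proof-of-work, and a tie is broken arbitrarily here), and it models a block as a plain list of ordinary transaction ids, leaving generation transactions out. All names are hypothetical.

```python
MATURATION_BLOCKS = 100  # maturation time for generated coins, per the post


def resolve_split(branch_a, branch_b):
    """Pick the winning branch and list transactions that must be re-included.

    Each branch is a list of blocks since the fork point; each block is a
    list of ordinary transaction ids (hypothetical format).
    """
    if len(branch_a) >= len(branch_b):
        winner, loser = branch_a, branch_b
    else:
        winner, loser = branch_b, branch_a
    confirmed = {tx for block in winner for tx in block}
    # Transactions only seen on the losing side go back to the pending pool.
    orphaned = [tx for block in loser for tx in block if tx not in confirmed]
    return winner, orphaned


def heals_cleanly(split_length: int) -> bool:
    """True if no generated coins could have matured during the split."""
    return split_length <= MATURATION_BLOCKS


if __name__ == "__main__":
    side_a = [["t1"], ["t2"]]
    side_b = [["t1", "t3"]]
    winner, pending = resolve_split(side_a, side_b)
    print(len(winner), pending, heals_cleanly(len(side_b)))
```

The point of the 100-block limit is that coins generated on the losing side cannot have been spent yet, so no ordinary transaction depends on them and everything else can simply be re-included.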
legendary
Activity: 1708
Merit: 1011
September 28, 2010, 01:07:18 PM
#3
Quote

Another beginner question.

Is there a wiki article somewhere discussing how the bitcoin system is going to scale to a large economy?  I have seen the stuff about coins being divisible so it can be used for any volume of traffic, but what about the blocks?

Currently each client has to collect signed blocks of transactions that are distributed across the network.  These blocks contain every transaction made by every user.  I understand that like bittorrent a large amount of data can be sent to every user, but every user seeing every transaction isn't going to scale.  Moreover, if you are in generating-mode your client needs to receive all pending transactions to see if they can be combined to create a new block.  (I assume the pending transactions are not sent to clients that are not generating blocks.) That is twice the volume and is fragmented into many different pieces.  What is the plan for this to scale?


This is actually not required, and not all clients, generating or not, always see every transaction.  Mostly because this isn't a requirement, Bitcoin can scale very well.  The target block interval was chosen as a balance between future network latency and timeliness of confirmations.  The amount of data that a given client sees is partly related to the number of connections that it maintains, as the same transactions can be sent to a given client from multiple peers.  But a full client doesn't need more than two or three trusted peers, just enough to verify that the blocks it receives from a single source aren't fake.  A light client needs only one trusted connection, and may be able to do without the blockchain at all.  Clients not in the business of generating don't need transaction traffic, and light clients wouldn't see much traffic that is not their own.  Granted, the block size could grow very large when the number of transactions it records is in the hundreds of thousands, but by that time the business of block generation would likely be confined to the computing resources of major financial institutions, which have a deep vested interest in blockchain security.

Quote

My plan: (only half baked)

Here is how I would do it.  Really a client only needs to examine transactions that might potentially be directed to itself, so it really doesn't need to track all transactions, just the subset of transactions that could contain the client's address. (Multiple addresses are considered later.) So we allow the block chain to fragment into different regions depending on the hash of the recipient of the transactions.   This is done in a decentralized fashion, just like the current target stuff is handled.  Every N blocks the transaction volume is examined, and if the volume exceeds some threshold then the block chain is split into two by the hash.  Then we have two parallel block chains depending on the receiver's hash.   Then clients only need to track those blocks that can potentially contain transactions for themselves.  A client may also be tracking multiple chains, but only generating blocks on one chain, so the detailed transactions are only needed in that region.



Division of the blockchain seems to come up regularly, but it won't work.  There is no way to maintain the security of the blockchain if there is any way to permit more than one concurrent blockchain.  There is no way around this without breaking the system as it is, and without losing the autonomy of cash-like currency.

Quote

This does mean that ideally when a client creates multiple addresses it should be able to make them all cluster around the same region of the hash so only a fraction of the total block chains need to be tracked.


Introducing a bias into the encryption is also unwise, for entirely different reasons.


hero member
Activity: 489
Merit: 505
September 28, 2010, 09:54:06 AM
#2
The main problem right now is that the network is completely unstructured: there is no guarantee that the network will not split and then continue on two different chains. This is a problem since any coins generated after the split will not be accepted by the other networks. If this happens often enough the whole system becomes pointless (what's the point of having 1m coins if you have nowhere you can spend them?).

Also the fact that each transaction is broadcast to all clients will never scale. For Bitcoin to work properly the node that signs the block must have all transactions.

What I'd like to see is a structured network (think torus or hypercube), at whose joints we create highly connected clusters of machines to be redundant; each of these clusters then tracks a certain prefix. Once a hash has been found, the node that wishes to announce it just fetches all unsigned transactions (echo-algorithm), signs them and floods the block (hypercubes have incredibly efficient flooding, remember?).

So to restate the constraints:
  • We do not want to allow network segmentation
  • A node that generated a hash does have to know all transactions
  • Transaction fetching can be delayed until a hash has been generated
  • A block (hash + transactions) has to be broadcast to all nodes in the network

Advantages of a Hypercube with node clusters:
  • Highly redundant transaction tracking (depending on the size of each cluster)
  • Flexible dimension (should a cluster become too big we just increase the dimension)
  • Very efficient routing
  • Very efficient broadcast
  • Logarithmic storage need
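To make the routing, broadcast and storage claims in the list above concrete, here is a small sketch of a plain hypercube. It assumes one node per corner (no clusters) and integer node ids whose bits are the coordinates; the function names are just for illustration.

```python
def neighbors(node: int, dim: int) -> list:
    """A node's neighbors differ from it in exactly one bit: dim = log2(N) links."""
    return [node ^ (1 << bit) for bit in range(dim)]


def route(src: int, dst: int) -> list:
    """Greedy routing: flip each differing bit in turn, lowest bit first."""
    path = [src]
    current = src
    diff = src ^ dst
    bit = 0
    while diff:
        if diff & 1:
            current ^= 1 << bit
            path.append(current)
        diff >>= 1
        bit += 1
    return path


def broadcast(origin: int, dim: int) -> list:
    """Flood from origin in dim rounds; returns (sender, receiver) pairs per round."""
    informed = [origin]
    rounds = []
    for bit in range(dim):
        # Every node that already has the block forwards it along one dimension.
        sends = [(node, node ^ (1 << bit)) for node in informed]
        informed += [receiver for _, receiver in sends]
        rounds.append(sends)
    return rounds


if __name__ == "__main__":
    print(neighbors(0, 3))
    print(route(0b000, 0b101))
    for number, sends in enumerate(broadcast(0, 3), start=1):
        print(number, sends)
```

With N = 2^dim nodes, each node stores only dim neighbors, any route takes at most dim hops, and the broadcast reaches every node exactly once in dim rounds using N - 1 messages.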

And yes, I think the network topology should be documented and evolved on its own Wiki page :-)
newbie
Activity: 20
Merit: 0
September 28, 2010, 08:59:44 AM
#1
Another beginner question.

Is there a wiki article somewhere discussing how the bitcoin system is going to scale to a large economy?  I have seen the stuff about coins being divisible so it can be used for any volume of traffic, but what about the blocks?

Currently each client has to collect signed blocks of transactions that are distributed across the network.  These blocks contain every transaction made by every user.  I understand that like bittorrent a large amount of data can be sent to every user, but every user seeing every transaction isn't going to scale.  Moreover, if you are in generating-mode your client needs to receive all pending transactions to see if they can be combined to create a new block.  (I assume the pending transactions are not sent to clients that are not generating blocks.) That is twice the volume and is fragmented into many different pieces.  What is the plan for this to scale?

My plan: (only half baked)

Here is how I would do it.  Really a client only needs to examine transactions that might potentially be directed to itself, so it really doesn't need to track all transactions, just the subset of transactions that could contain the client's address. (Multiple addresses are considered later.) So we allow the block chain to fragment into different regions depending on the hash of the recipient of the transactions.   This is done in a decentralized fashion, just like the current target stuff is handled.  Every N blocks the transaction volume is examined, and if the volume exceeds some threshold then the block chain is split into two by the hash.  Then we have two parallel block chains depending on the receiver's hash.   Then clients only need to track those blocks that can potentially contain transactions for themselves.  A client may also be tracking multiple chains, but only generating blocks on one chain, so the detailed transactions are only needed in that region.

This does mean that ideally when a client creates multiple addresses it should be able to make them all cluster around the same region of the hash so only a fraction of the total block chains need to be tracked.
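A rough sketch of the region idea described above, under loud assumptions: the region is taken to be the leading bits of a SHA-256 hash of the recipient address, and "addresses" are stand-in byte strings made from a seed plus a counter. Real addresses are derived from public keys, so a real client would have to generate key pairs until one lands in the wanted region. Every name here is hypothetical.

```python
import hashlib


def region_of(address: bytes, split_bits: int) -> int:
    """Map an address to one of 2**split_bits regions (split_bits <= 32)."""
    if split_bits == 0:
        return 0  # chain has not been split yet
    digest = hashlib.sha256(address).digest()
    # Use the leading bits of the hash as the region number.
    return int.from_bytes(digest[:4], "big") >> (32 - split_bits)


def address_in_region(region: int, split_bits: int, seed: bytes = b"wallet") -> bytes:
    """Try candidates until one hashes into the wanted region (stand-in addresses)."""
    counter = 0
    while True:
        candidate = seed + counter.to_bytes(8, "big")
        if region_of(candidate, split_bits) == region:
            return candidate
        counter += 1


if __name__ == "__main__":
    first = address_in_region(region=3, split_bits=2)
    second = address_in_region(region=3, split_bits=2, seed=b"savings")
    # Both stand-in addresses land in the same region, so one chain covers them.
    print(region_of(first, 2), region_of(second, 2))
```

Because the region is a hash prefix, raising `split_bits` by one divides region r into regions 2r and 2r+1, which matches the "split into two by the hash" step. Finding an address in a given region takes about 2**split_bits attempts on average.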

Then when a transaction is completed the DHT is used to find some other client to accept the transaction if the receiver is in a different region than the sender.  This does open up some security concerns because then clients might not get direct confirmation that a transaction has been accepted.  Hmmm OK, so I do have some issues making sure money moving between chains is matched up.

This seems like it should scale nicely to very large transaction volumes without running into network traffic issues.

Hopefully my picture is clear.   Is there an official plan like this?  Or is this not really the problem I think it is, and if so, why not?