Author

Topic: Numerically finding an optimal block size using decentralisation-utility (Read 789 times)

hero member
Activity: 714
Merit: 500
Martijn Meijering
I haven't personally chased this kind of approach because there will be so many judgements that the result won't really be interesting to anyone but the person who created it... but perhaps I'm wrong.

Making an interactive tool out of this could be very useful, because then it could be interesting to anyone who played with it. It will remain somewhat subjective, but at least it could inject some rational analysis into the debate, and has the potential to make some people change / refine their opinions.
staff
Activity: 4284
Merit: 8808
Assuming a user will use all of their bandwidth is probably not good, You probably need to assign a marginal utlity to running a node and then scaling their tolerance based on their marginal utility and the available bandwidth.

I haven't personally chased this kind of approach because there will be so many judgements that the result won't really be interesting to anyone but the person who created it... but perhaps I'm wrong.

sr. member
Activity: 433
Merit: 267
There's a lot of arbitrary assumptions here.

1.) All potential nodes are able and willing to use 100% of their upload speed for block propagation.

2.) 10,000 nodes is the ideal number of nodes.

3.) Western broadband speeds are the only networks we should consider. ("I was unable to find a good enough dataset for anywhere else.")

4.) The issue of funding the network post inflation will be solved in the future.

5.) Nodes are willing to download 20GB/month cumulatively ad infinitum.

6.) We should consider forks to Bitcoin that we suspect will probably require more forks in the future.

7.) Bad assumptions won't lead to a Bitcoin block size that is ultimately unsustainable.
member
Activity: 78
Merit: 11
Chris Chua

This is not true.  On average every node has to receive each transaction and block only once.  This means that on average each node has to send each transaction once.  Some (fast) nodes may end up sending transactions lots of times.  The closer your node is to miners, the more likely it is to have to send blocks multiple times.

Thank you for helping me find a better value for the bandwidth multiplication factor. My initial guesstimate of 10 was really handwavy and I agree with your methodology. But I do like to use real-world data, so:
  • Number of peers: 20 seems like a good minimum value for a healthy Bitcoin network - 8 for outbound connections, 8 for inbound connections, and a few for crawlers/leechers.
  • tx size: the average tx size of the last 20160 blocks, beginning from block 363900, was 560 bytes.
  • inv messages: My sniffer logged 13743 packets with 18374 inv messages. Including the various protocol overheads, the result was 98 bytes per inv.
  • tx messages: Overhead (as determined from sniffer) was about 90 bytes per tx message - so an average of 650 bytes per tx.
  • block messages: I'll assume overhead is negligible here, since blocks are bulk data. Thus block messages contribute an average of 560 bytes per tx.
The result is (20 * 98 + 2 * 650 + 2 * 560) / (560) = 7.8
I shall update my original post/graphs with this factor.

I'm including overhead because ISPs do count overhead towards caps.

Interesting that nearly half of bandwidth attributable to a single tx is due to inv messages. Looks like there's room for optimisation there.
legendary
Activity: 1232
Merit: 1094
and that block data needs to be transferred a total of 10 times. This is reasonable, as Bitcoin is a P2P gossip network and transactions/blocks may need to be retransmitted to peers.

This is not true.  On average every node has to receive each transaction and block only once.  This means that on average each node has to send each transaction once.  Some (fast) nodes may end up sending transactions lots of times.  The closer your node is to miners, the more likely it is to have to send blocks multiple times.

If you are connected to 20 peers and a new transaction is broadcast.  On average 10 of those peers will receive the transaction once.  Each of those peers will send you an inv message with the hash of the transaction.  You will then ask for one of them to send you the transaction.  Once you add the transaction into your memory pool, you will inform the other 10 peers about it and one of them will (on average) ask you to send it to them.

That means that you need to send or receive a hash digest to every peer.  Either they inform you that they got the transaction or you inform them that you have it.  Due to propagation delays, sometimes you will both send transaction digests at each other.

This means that a transaction costs

peers * digest_size + 2 * transaction size

The same logic applies to blocks, but less often.

With 20 peers and a 250 byte transaction, that costs

20 * 32 + 2 * 250 = 1140

You also receive the transaction a second time when you receive the block and you have to also forward the block to one peer.

That is another 500 bytes.

This means that a 250 byte transaction costs 1640 bytes or 6.56 times its base size.

If blocks are 1MB each, then the node needs a total bandwidth of 6.56MB to keep up.

If the node falls behind, then that reduces the chances that it needs to send transactions and blocks to its peers.  That reduces the bandwidth slightly.  Blocks and transactions may not have to be forwarded.
member
Activity: 78
Merit: 11
Chris Chua
Finding an optimal block size is about tradeoffs - too big, and you have centralisation, too small, and Bitcoin cannot support as many users. How can this be quantified? Here is a simple and general figure-of-merit, "decentralisation-utility":

decentralisation-utility = potential number of full nodes * potential number of users

Decentralisation is represented by the potential number of full nodes. Utility is represented by the potential number of users. The question of finding an optimal block size is then: how can decentralisation-utility be maximised? For example, it's better to reduce potential number of users by 20%, if that would mean 30% more potential full nodes.

Potential number of full nodes

For most residential full node operators, the limiting factors will probably be data caps and upload speed. It's difficult to get combined statistics on this, but I did find one excellent data set. The FCC's Measuring Broadband America Report (see https://www.fcc.gov/reports/measuring-broadband-america-2014) contains a wealth of data about broadband Internet connections in the US. Commendably, they have also made the raw data available from https://www.fcc.gov/measuring-broadband-america/2014/raw-data-fixed-2013. The data is supposedly representative of the general consumer population.

The raw data contains cross-referenced upload speeds and total usage (downloaded and uploaded bytes). I will use total usage to represent "proven" capacity - if a user is able to transfer 20 GB/month of any data, this demonstrates that they are capable of transferring 20 GB/month of Bitcoin block data, and that 20 GB/month fits within their data cap. Furthermore, I calculated the maximum amount that could be transferred at their upload speed, and if this was lower than their total usage, then that becomes their proven capacity.

Here are the results:

This graph represents the proportion of users who are capable of transferring the amount on the x-axis. For example, if we set block size limits so that Bitcoin needs 10 GB/month, then 98% of users can run full nodes. But if we set block size limits so that Bitcoin needs 100 GB/month, then only 67% of users can run full nodes.

I'll assume that there are 10,000# people out there wishing to run full nodes. As block size is increased, an increasing proportion of those people will be unable to run full nodes due to lack of data capacity.

Potential number of users

I'll assume that each "user" of Bitcoin requires 1000# bytes of on-chain space each day. This corresponds to 2 - 4 transactions per day, per user. Each MB of block size then supports 144,000 Bitcoin users. Layer-2 networks like Lightning will increase the number of potential users, as users will (on average) require less on-chain space

Combining these

There is still one more question: how does block size (in MB) influence data usage (in MB/month)? I'll assume completely full blocks every 10 minutes, and that block data needs to be transferred a total of 10 times. This is reasonable, as Bitcoin is a P2P gossip network and transactions/blocks may need to be retransmitted to peers. Transferring transactions/blocks a total of 10 times means that a full node can relay to at least 9 peers1 and that results in fast enough propagation (< 5 hops for 10,000 nodes). So for example, 1 MB blocks would require 44.6 GB/month of data capacity (1 MB * 4464 blocks/month * 10). Edit: see https://bitcointalksearch.org/topic/m.11793460, the bandwidth multiplication factor is probably closer to 7.8.

Combining all that, here is what decentralisation-utility looks like:


The optimal block size is about 3.8 MB.

Some issues with this analysis:
  • The dataset is for the US. I was unable to find a good enough dataset for anywhere else.
  • The dataset is from September 2013. Bandwidth capacity and usage has grown since then, and will continue to grow.
  • Proven capacity (i.e. usage) will underestimate actual maximum capacity. This figure of 3 MB is quite conservative.
  • The optimal block size is inversely proportional to the bandwidth multiplication factor. For example, assuming that the factor is 2 instead of 7.8 leads to the conclusion that the optimal block size is 15 MB. A bandwidth multiplication factor of 2 is the theoretical minimum that could be obtained via. improvements to the Bitcoin P2P protocol.

tl;dr: Trading off decentralisation and number of users in the US broadband environment results in an optimal block size of 3.8 MB. With P2P protocol improvements this could be up to 15 MB.

1I'm assuming a future where IBLT is part of the Bitcoin P2P protocol, so that transaction data doesn't need to be sent "twice". I'm also assuming that most non-relaying SPV nodes use bloom filtering, so that not every transaction needs to be relayed to them.
#All parameters marked with "#" are irrelevant to the optimisation. It turns out that tweaking these parameters ends up rescaling the y-axis of the decentralisation-utility graph (which we don't care about), but will leave the x-axis untouched (we do care about that as it has block size in it).
Jump to: