
Topic: Refreshed the scalability wiki page (Read 1539 times)

legendary
Activity: 1072
Merit: 1189
October 20, 2012, 09:32:10 AM
#18
I probably should have changed the name ultraprune a long time ago, as it is somewhat confusing. It does not prune blocks or transactions, and it implements a full node. What it does is use an ultra-pruned copy of the block chain (in addition to the normal blk000*.dat files) for almost all operations, making it significantly faster. It also removes the need for a transaction index (so no blkindex.dat anymore). For serving blocks to other nodes, for rescanning, and for reorganisations it still needs the normal blocks to be present.

That said, this model of working will allow block pruning to be implemented relatively easily. It's almost trivial to do - just delete the block files you think you won't need anymore for serving, rescanning or reorganising - but having such nodes on the network may have a very significant effect on the system. This will be implemented eventually, but needs some discussion first.
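To make that concrete, here is a minimal sketch of what such pruning could look like, assuming sequentially numbered blk000*.dat files and a hypothetical keep_files parameter; this is just an illustration, not how the client does it today:

import os
import re

def prune_block_files(datadir, keep_files=2):
    # Hypothetical sketch only: delete all but the newest block files.
    # Assumes blk0001.dat, blk0002.dat, ... are filled in order, so the
    # highest-numbered files contain the most recent blocks, which are
    # still needed for reorganisations. keep_files is assumed >= 1.
    pattern = re.compile(r"^blk(\d+)\.dat$")
    numbered = []
    for name in os.listdir(datadir):
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()
    for _, name in numbered[:-keep_files]:
        os.remove(os.path.join(datadir, name))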

Somewhat longer term, I think we'll see a split between "fully validating nodes" and "archive nodes", where only the latter would serve arbitrary blocks to the network. This may be a problem (because fewer nodes serve all blocks), or it may improve things (as the nodes who still do serve the blocks are those who choose to and have the bandwidth for it).
hero member
Activity: 991
Merit: 1011
October 20, 2012, 09:18:16 AM
#17
one tiny question regarding ultraprune: are clients using ultraprune able to share their dataset at all? do they share with other clients using ultraprune? or, for example, do they store recent blocks and transmit them to full clients as well?
legendary
Activity: 1072
Merit: 1189
October 20, 2012, 08:58:59 AM
#16
Ultraprune's unspent transaction output database is around 120 MB in size now (including LevelDB indexes/overhead). Compressed, something around 80-85 MB.

Around block 204149:

LevelDB database:

$ du -sh ~/.bitcoin/coins/
117M /home/pw/.bitcoin/coins/


Raw UTXO data (not directly usable):

$ ls -1hs utxo*
101M utxo.dat
 72M utxo.dat.7z
 94M utxo.dat.bz2
 98M utxo.dat.gz
 70M utxo.dat.lzma

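If you want to reproduce a comparison like the one above on your own dump, something along these lines works (assuming a local utxo.dat file; 7z isn't in the Python standard library, so it's left out):

import bz2, gzip, lzma

with open("utxo.dat", "rb") as f:   # hypothetical raw UTXO dump
    raw = f.read()

sizes = {
    "raw": len(raw),
    "gzip": len(gzip.compress(raw)),
    "bz2": len(bz2.compress(raw)),
    "lzma": len(lzma.compress(raw)),
}
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print("%-5s %6.1f MiB" % (name, size / 2.0**20))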
legendary
Activity: 1106
Merit: 1004
October 19, 2012, 05:11:22 AM
#15
You still haven't explained why it's misleading. Why don't you put a proposed rewrite of the bandwidth section here so we can read it?

I think he means that it's misleading because it only considers the download side of it. If you're a full node you're also expected to be a relay, so you might have to upload what you download too.

It's hard to estimate how much you'll need to upload since you don't know how many of your peers will receive the data before you're ready to send them. Assuming only pool operators are full nodes and they're all interconnected, then you'll only have to upload transactions when the sender is directly connected to you and not to other full nodes. In this case you'd upload it as many times as you have full node peers. If neither the sender nor the receiver is connected to you (I'm assuming every thin client is using bloom filters), you may not need to relay the transaction at all.

But honestly, if nothing is done to create monetary incentives for relays, I believe those Microsoft researchers might be right and eventually full nodes will not relay transactions between themselves. They have no interest, actually a negative interest, in doing so. In such a scenario (which is not that bad btw), thin clients would be better off attempting to connect to every full node they manage to find.
Assuming this is the actual scenario, then perhaps we can estimate the total upload a full node would have to handle as the number of transactions times the average rate of false positives thin clients request. Say, if everybody requests 99 false positives for each relevant transaction, then a full node would likely upload 100 times what it downloads. But we should also consider that thin clients have an interest in dispersing their bloom filters so that no single full node sees the entire set. That would reduce the per-node upload rate accordingly.
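A rough back-of-the-envelope check of that 100x figure, reusing the wiki's 2000 tps and 512 bytes/tx assumptions (these numbers are assumptions, not measurements):

tx_rate = 2000          # downloaded transactions per second (the wiki's VISA-level figure)
avg_tx_size = 512       # bytes per transaction, same assumption as the wiki
false_positives = 99    # false-positive matches served per relevant transaction

upload_multiplier = 1 + false_positives                        # 100x what is downloaded
upload_bytes_per_sec = tx_rate * avg_tx_size * upload_multiplier
print("upload multiplier: %dx" % upload_multiplier)
print("upload: %.0f MB/s (about %.0f megabits/s)" % (
    upload_bytes_per_sec / 1e6, upload_bytes_per_sec * 8 / 1e6))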
legendary
Activity: 1372
Merit: 1008
1davout
October 19, 2012, 05:03:06 AM
#14
Ultraprune's unspent transaction output database is around 120 MB in size now (including LevelDB indexes/overhead). Compressed, something around 80-85 MB.
You're a rockstar.
legendary
Activity: 1526
Merit: 1134
October 19, 2012, 04:44:16 AM
#13
You still haven't explained why it's misleading. Why don't you put a proposed rewrite of the bandwidth section here so we can read it?
hero member
Activity: 798
Merit: 1000
October 17, 2012, 04:00:15 PM
#12
I meant to say a 70-80% reduction, or 20-30% of the current size. Flipperfish's link would be pretty much all that is necessary to avoid doubt in the future.

The bandwidth section is still misleading though.
legendary
Activity: 1072
Merit: 1189
October 17, 2012, 03:56:27 PM
#11
Ultraprune's unspent transaction output database is around 120 MB in size now (including LevelDB indexes/overhead). Compressed, something around 80-85 MB.
jr. member
Activity: 39
Merit: 1
October 17, 2012, 03:37:37 PM
#10

"As of October 2012 (block 203258) there have been 7,979,231 transactions, however the size of the unspent output set is less than 100MiB"

You need to back this up, because from what I recall estimates were between 70-80% of the current block chain's size, which, even today, is definitely not 100MB.

You probably recall the correct percentages (70-80%) but that's the percentage of spent outputs, which could be forgotten.

I can confirm the following around block 202287 (using my BiRD client which only keeps track of unspent transactions):
  • 2,443,854 unspent transaction outputs, which is about 30% of the number of all transactions (cf. the 7.9M txs at block 203258)
  • the MySQL database containing all the data necessary to spend those outputs, i.e. to create a valid (unsigned) tx, is about 316 MB in size when converted to an uncompressed CSV dump (a simple text file)
  • compressing this CSV file yields only 110 MB of data.

You can download the client and some CSV's here.
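A quick check of the two ratios above (the block heights differ slightly, so the 30% figure is only approximate):

unspent_outputs = 2443854       # around block 202287 (BiRD)
total_transactions = 7979231    # around block 203258 (the wiki's figure)
csv_mb, csv_compressed_mb = 316, 110

print("outputs / transactions: %.1f%%" % (100.0 * unspent_outputs / total_transactions))
print("compressed / raw CSV:   %.1f%%" % (100.0 * csv_compressed_mb / csv_mb))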
hero member
Activity: 991
Merit: 1011
October 17, 2012, 12:36:15 PM
#9
with 0.5gb blocks, the blockchain will grow by 25tb every year.
that's 75 million 5 1/4 floppy disks, more than even the most modern c64s can handle.
therefore, bitcoin is doomed to fail.
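for the record, the arithmetic behind the joke, assuming 144 blocks per day and 360 KB per 5 1/4 double-density floppy:

block_size_gb = 0.5
blocks_per_year = 144 * 365
growth_tb = block_size_gb * blocks_per_year / 1024     # roughly 26 TB/year
floppies = growth_tb * 1024**3 / 360                   # TB -> KB, 360 KB per disk
print("%.0f TB/year, roughly %.0f million floppies" % (growth_tb, floppies / 1e6))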
sr. member
Activity: 350
Merit: 251
Dolphie Selfie
October 17, 2012, 09:30:24 AM
#8
At least here's a link to a post from Pieter with some graphs about these stats: https://bitcointalksearch.org/topic/m.1257750
legendary
Activity: 1526
Merit: 1134
October 16, 2012, 05:41:28 AM
#7
There is no link. The stats come from measurements taken from the software.
hero member
Activity: 798
Merit: 1000
October 16, 2012, 05:38:12 AM
#6
How about because I don't know the link?
legendary
Activity: 1372
Merit: 1008
1davout
October 16, 2012, 05:37:15 AM
#5
It is a wiki, why not put a link in?
Go for it.
hero member
Activity: 798
Merit: 1000
October 16, 2012, 05:30:00 AM
#4
I don't understand what you're talking about here.

Is it a peer to peer network, or is it bitcoin visa?

Quote
It's backed up by reality. Check out Pieter's ultraprune branch and dump the stats from it yourself. Or you could just ask Pieter himself.

It is a wiki, why not put a link in?
legendary
Activity: 1526
Merit: 1134
October 16, 2012, 05:02:36 AM
#3
I made the transactions/sec -> tps notation more consistent. Of course you could have done that yourself, it being a wiki.

Quote
If the network were to fail miserably, this is accurate. Otherwise everyone needs more than just a single downstream connection.

I don't understand what you're talking about here.

Quote
You need to back this up, because from what I recall estimates were between 70-80% of the current block chain's size, which, even today, is definitely not 100MB.

It's backed up by reality. Check out Pieter's ultraprune branch and dump the stats from it yourself. Or you could just ask Pieter himself.

hero member
Activity: 798
Merit: 1000
October 16, 2012, 01:47:11 AM
#2
Can you fix the "18 different people edited this" format?

"VISA handles on average around 2,000 transactions/sec, so call it a daily peak rate of 4,000/sec"

"Let's take 4,000 tps as starting goal."

"Let's assume an average rate of 2000tps, so just VISA."



"That means that you need to keep up with around 8 megabits/second of transaction data (2000tps * 512 bytes) / 1024 bytes in a kilobyte / 1024 kilobytes in a megabyte = 0.97 megabytes per second * 8 = 7.8 megabits/second.

This sort of bandwidth is already common for even residential connections today, and is certainly at the low end of what colocation providers would expect to provide you with."

If the network were to fail miserably, this is accurate. Otherwise everyone needs more than just a single downstream connection.
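For reference, reproducing the quoted arithmetic (which, as noted, covers the download side only):

tps = 2000          # the wiki's assumed sustained transaction rate
tx_bytes = 512      # the wiki's assumed average transaction size

bytes_per_sec = tps * tx_bytes
megabytes_per_sec = bytes_per_sec / 1024.0 / 1024.0
megabits_per_sec = megabytes_per_sec * 8
print("%.2f MB/s = %.1f megabits/s downstream" % (megabytes_per_sec, megabits_per_sec))
# upload to however many peers you relay to comes on top of this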

"As of October 2012 (block 203258) there have been 7,979,231 transactions, however the size of the unspent output set is less than 100MiB"

You need to back this up, because from what I recall estimates were between 70-80% of the current block chain's size, which, even today, is definitely not 100MB.

"Only a small number of archival nodes need to store the full chain going back to the genesis block. These nodes can be used to bootstrap new fully validating nodes from scratch but are otherwise unnecessary."

Yeah, unnecessary if there's a monopoly on mining and validating. Roll Eyes
legendary
Activity: 1526
Merit: 1134
October 15, 2012, 05:14:27 PM
#1
I rewrote the scalability wiki page with the results of the latest work:

http://en.bitcoin.it/wiki/Scalability

I also simplified it by taking into account some very simple optimizations that have already been done or partially prototyped, and removed the discussion of sharded supernodes: with ultraprune and more up-to-date OpenSSL performance figures, it no longer seems necessary to shard a node across multiple machines.