
Topic: The Long Wait for Block Chain Download... (Read 5661 times)

legendary
Activity: 1596
Merit: 1100
February 20, 2013, 11:16:59 AM
#20
Update: the torrent moved to this thread.
legendary
Activity: 1120
Merit: 1164
February 20, 2013, 08:17:20 AM
#19
Startbitcoin.com is now offering the blockchain on DVDs that can be shipped, for those who don't want the hassle of downloading it or who have data caps or bandwidth issues. It's a great way to get going if you lose your blockchain and would rather use your bandwidth for other things.

http://startbitcoin.com/blockchain-on-dvd/
 

Dammit, http://blockchainbymail.com was going to be my April Fools prank... :P

Anyway, if the startbitcoin.com guys want the domain, I'll happily give it to them for the 0.5BTC it cost me to register.
newbie
Activity: 42
Merit: 0
February 19, 2013, 10:33:38 PM
#18
Startbitcoin.com is now offering the blockchain on DVDs that can be shipped, for those who don't want the hassle of downloading it or who have data caps or bandwidth issues. It's a great way to get going if you lose your blockchain and would rather use your bandwidth for other things.

http://startbitcoin.com/blockchain-on-dvd/
 
legendary
Activity: 1204
Merit: 1015
January 04, 2013, 12:41:28 AM
#17
Where can I read a summary of proposed solutions? I had an idea where a special block is created that contains the balance of all accounts and the next block in the chain (for old clients to maintain compatibility), and together those form the genesis of a new blockchain. It wouldn't affect old clients because old clients could choose to begin with either data source, or people could just upgrade. I can't see how that idea is faulty (it probably is), but I wanted to read a summary of it and other ideas, since it's surely been proposed before.
Of all of these types of suggestions, this is the most promising:
Ultimate blockchain compression w/ trust-free lite nodes
You'll have plenty of reading to do with just that, trust me.
staff
Activity: 4284
Merit: 8808
January 03, 2013, 01:06:12 PM
#16
I can't see how that idea is faulty (it probably is)
It is faulty in that it would undermine the unique security properties of Bitcoin. In Bitcoin you don't have to trust that the other participants (currently or before you joined) followed the rules— your software verifies that the rules were followed for itself— and because the software is open you can audit the software and verify for yourself that the software correctly enforces the rules you care about. (there is, perhaps, no end to this rabbit hole— but you can be as sure as you like).  This has many important implications, including the fact that it makes violation of the rules very unattractive to attempt— even by powerful parties— because verification by default means that they would almost certainly fail.
vip
Activity: 812
Merit: 1000
January 03, 2013, 12:03:05 PM
#15
Where can I read a summary of proposed solutions? I had an idea where a special block is created that contains the balance of all accounts and the next block in the chain (for old clients to maintain compatibility), and together those form the genesis of a new blockchain. It wouldn't affect old clients because old clients could choose to begin with either data source, or people could just upgrade. I can't see how that idea is faulty (it probably is), but I wanted to read a summary of it and other ideas, since it's surely been proposed before.
staff
Activity: 4284
Merit: 8808
December 31, 2012, 03:01:59 PM
#14
You will, however, need "plenty" of cache, otherwise database seeks will be an orders-of-magnitude bigger problem than signature validation. The assumption that the UTXO set fits into memory is in danger for small-footprint implementations, or if fragmentation of coins increases exponentially through privacy options that avoid aggregating.
Fragmenting your coins into additional TXOUTs can actually  be very poisonous for privacy because you end up cross contaminating all your payments with linked sets of scriptpubkeys.  None of the clients I'm aware of make _any_ effort to aggregate TXOUTs at all, aggregation which intentionally duplicates common-scriptpubkey-input payments would both reduce txout set size and increase privacy.   Though a significant fraction of the current bloat has nothing to do with spending/aggregation patterns— it's mostly due to a single party using effectively-unspendably small (1e-8) txouts for "messaging".

On my own main wallet— (which does none of that crappy bloaty stuff, just regular real economic activity), I was able to reduce my unspent txout set from about 1000 in total to about 91 through taint aware aggregation while simultaneously greatly increasing the privacy of my future transactions.
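A minimal sketch of one possible reading of "taint aware aggregation": only merge outputs that already pay the same scriptPubKey, since reusing one script has already linked them for any observer. This is an assumption about the idea, not the procedure used on the wallet described above; the `TxOut` type and all function names are hypothetical, and building and signing the consolidating transaction is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOut:
    """An unspent output in a wallet (illustrative fields only)."""
    txid: str
    vout: int
    script_pubkey: str  # the output script, e.g. as hex; hypothetical representation
    value: int          # amount in satoshis


def group_by_script(utxos):
    """Bucket unspent outputs by the scriptPubKey they pay to."""
    groups = defaultdict(list)
    for u in utxos:
        groups[u.script_pubkey].append(u)
    return dict(groups)


def aggregation_candidates(utxos):
    """Scripts holding more than one output. Spending these together links
    nothing new: reuse of the same script already ties them to one owner."""
    return {s: outs for s, outs in group_by_script(utxos).items() if len(outs) > 1}


def utxo_count_after_aggregation(utxos):
    """Wallet UTXO count if every candidate group were merged into one output."""
    return len(group_by_script(utxos))
```

Restricting merges to same-script groups is the conservative choice: it shrinks the txout set without mixing coins that an outside observer could not already associate.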

On the subject of cache,

The UTXO set does not need to fit in memory to have high performance. Even without a large cache, the current implementation should have O(log(N_UTXO))*seek_delay scaling per txin queried, and could have ~O(1)*seek_delay if the backend db weren't ordered. Even if memory is not enough to fit the whole working set in RAM, having part of it cached (especially the upper tree levels in the ordered database) is quite helpful. For further evidence of this, on many systems leveldb alone (without ultraprune) got most of the speedup of leveldb+ultraprune (as did ultraprune alone without leveldb)... much of the slowness of pre-0.8 reference clients is just due to BDB.
legendary
Activity: 2128
Merit: 1073
December 31, 2012, 11:38:38 AM
#13
You will need "nearly infinite bandwidth" anyway, since you have to download the entire chain to verify it. 0.5GB consisting of 10MB data blocks means you store 50 blocks; with the right algorithm behind it (for example, look at the way BitTorrent distributes data) it shouldn't be that much of a drama. The choices are to let everybody store everything (recycling my previous example, this would mean a replication count of 60,000), distributed storage (replication count 5000), or a few (how many? 10? 100?) history nodes. Less replication means easier attacks: a combined government operation spanning a few countries could take down the history nodes, or just DDoS them.

As for your exploit: that's already common in many p2p networks and they still survive.
Possible solution: a client requesting a data block sends a random string together with the request, and the sender has to deliver the block along with a checksum of block+string. The client verifies the checksum after it receives the block and then requests only the checksum from one or two other random nodes. If all checksums match, the seeding client is good. If the seeding client turns out to serve bad data, the requester stops communicating with it completely (or for x days, to get around dynamic IPs). Obviously, it'd be better to have smaller data blocks, like 1MB instead of 10MB. Together with timeouts and a minimum bandwidth requirement, this could sort out the bad apples.
I'm not disagreeing with you. I was meekly trying to point out that this type of argument keeps reappearing in the Bitcoin milieu: the turtle is getting too big, so replace it with a stack of smaller turtles.

It all originates in the forward-delta implementation of Bitcoin and the desire to avoid a hard fork. Until the paradigm shift happens (a hard fork to a reverse-delta implementation) there will not be much meaningful progress, but there will be a lot of activity.

I'm not the first to observe it; I won't be the last. I'll give you the link to my post from about half-a-year ago on this very subject; so you can search your own references using different vocabulary than "forward-delta" and "reverse-delta".

https://bitcointalksearch.org/topic/m.965877
hero member
Activity: 576
Merit: 514
December 31, 2012, 11:13:48 AM
#12
Now you traded "nearly infinite storage" problem with "nearly infinite bandwidth" problem. Your distributed storage scheme would need a protocol that is resistant to the common exploit: pretend to have block X; but stall (or disappear) when someone asks for it.

It is turtles all the way down, I tell y'all.
You will need "nearly infinite bandwidth" anyway, since you have to download the entire chain to verify it. 0.5GB consisting of 10MB data blocks means you store 50 blocks; with the right algorithm behind it (for example, look at the way BitTorrent distributes data) it shouldn't be that much of a drama. The choices are to let everybody store everything (recycling my previous example, this would mean a replication count of 60,000), distributed storage (replication count 5000), or a few (how many? 10? 100?) history nodes. Less replication means easier attacks: a combined government operation spanning a few countries could take down the history nodes, or just DDoS them.

As for your exploit: that's already common in many p2p networks and they still survive.
Possible solution: a client requesting a data block sends a random string together with the request, and the sender has to deliver the block along with a checksum of block+string. The client verifies the checksum after it receives the block and then requests only the checksum from one or two other random nodes. If all checksums match, the seeding client is good. If the seeding client turns out to serve bad data, the requester stops communicating with it completely (or for x days, to get around dynamic IPs). Obviously, it'd be better to have smaller data blocks, like 1MB instead of 10MB. Together with timeouts and a minimum bandwidth requirement, this could sort out the bad apples.
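The challenge-response check proposed above can be sketched roughly as follows. SHA-256 as the checksum, a 16-byte nonce, the 3-day ban period and every function and class name are assumptions for illustration; the post does not fix any of them. Because the nonce is fresh per request, a peer cannot answer with a digest it saw earlier: it has to actually hold the chunk to compute the checksum, and that applies to the cross-checking peers too.

```python
import hashlib
import secrets

BAN_SECONDS = 3 * 24 * 3600  # hypothetical value for "x days"


def make_nonce() -> bytes:
    """Random string sent along with each chunk request."""
    return secrets.token_bytes(16)


def chunk_digest(chunk: bytes, nonce: bytes) -> str:
    """Checksum of chunk+nonce; SHA-256 is an assumed choice of checksum."""
    return hashlib.sha256(chunk + nonce).hexdigest()


def seeder_is_honest(chunk: bytes, claimed: str, nonce: bytes, cross_checks: list[str]) -> bool:
    """Accept the seeder only if its digest matches the bytes it actually sent
    and agrees with the digests other random peers computed for the same nonce."""
    if chunk_digest(chunk, nonce) != claimed:
        return False
    return all(d == claimed for d in cross_checks)


class BanList:
    """Temporary bans, so that dynamic IPs are not blocked forever."""

    def __init__(self) -> None:
        self._banned_until: dict[str, float] = {}

    def ban(self, peer: str, now: float) -> None:
        self._banned_until[peer] = now + BAN_SECONDS

    def is_banned(self, peer: str, now: float) -> bool:
        return self._banned_until.get(peer, 0.0) > now
```

Timeouts and the minimum bandwidth requirement mentioned in the post would sit in the networking layer around these checks and are not modelled here.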
legendary
Activity: 2128
Merit: 1073
December 31, 2012, 09:41:35 AM
#11
Instead of relying on a few nodes which act as history nodes (which increases the risk of data loss), Bitcoin could move to a distributed and replicated storage method: each node provides some storage and the protocol balances data blocks across all online nodes to guarantee that each of those blocks is replicated at least n times across the entire network. So when a fresh client joins, it will download the chain, store the ~200MB it needs plus, e.g., a flat file of 0.5GB in which the blocks with the lowest replication count get stored. Btw, I mean blocks as in chunks of data, e.g. 10MB blocks, not Bitcoin blocks, because with a fixed size it's easy to swap blocks out.

Let's say there are 60,000 active clients on the network with 0.5GB provided by each client. This would create 30TB of storage. With a blockchain size of 6GB the network could offer a replication count of 5000. If the minimum replication count were 100, the blockchain could grow to 300GB, assuming the number of clients doesn't change. The currently required space of ~5GB would offer 300TB in this model (yes, I did round here and there a bit). Of course there could still be the option to work as a full history node that stores all blocks and acts as a seeder, simply by overriding the 0.5GB with x GB.
Now you traded "nearly infinite storage" problem with "nearly infinite bandwidth" problem. Your distributed storage scheme would need a protocol that is resistant to the common exploit: pretend to have block X; but stall (or disappear) when someone asks for it.

It is turtles all the way down, I tell y'all.
hero member
Activity: 576
Merit: 514
December 31, 2012, 05:39:05 AM
#10
It uses a new database layout that does support pruning (= not storing the entire blockchain history), but this will not yet be available in 0.8 probably because of potential effects on the network if too many nodes do not provide history anymore. That would indeed mean storing something like 200 MB - but you'd still need to download the history to verify it, and build the database.
Instead of relying on a few nodes which act as history nodes (which increases the risk of data loss), Bitcoin could move to a distributed and replicated storage method: each node provides some storage and the protocol balances data blocks across all online nodes to guarantee that each of those blocks is replicated at least n times across the entire network. So when a fresh client joins, it will download the chain, store the ~200MB it needs plus, e.g., a flat file of 0.5GB in which the blocks with the lowest replication count get stored. Btw, I mean blocks as in chunks of data, e.g. 10MB blocks, not Bitcoin blocks, because with a fixed size it's easy to swap blocks out.
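The "store the blocks with the lowest replication count" rule can be sketched as a simple greedy pick. This assumes a node somehow knows how many peers hold each fixed-size chunk; the post leaves open how that count would be gathered, and `choose_chunks` is a hypothetical name.

```python
MB = 1_000_000  # decimal megabytes, matching the round figures in the post


def choose_chunks(replication: dict[int, int], budget_bytes: int, chunk_size: int) -> list[int]:
    """Pick the chunk ids to store locally: the least-replicated first, as many as fit.

    `replication` maps a chunk id to how many nodes currently hold it.
    Ties are broken by chunk id so the choice is deterministic.
    """
    slots = budget_bytes // chunk_size
    ranked = sorted(replication, key=lambda cid: (replication[cid], cid))
    return ranked[:slots]


# With a 0.5GB budget and 10MB chunks a node fills 50 slots:
# choose_chunks(counts, budget_bytes=500 * MB, chunk_size=10 * MB)
```

Fixed-size chunks are what make this greedy step workable: every slot is interchangeable, so a node can later drop a chunk that has become well replicated and take one that has fallen below n.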

Let's say there are 60,000 active clients on the network with 0.5GB provided by each client. This would create 30TB of storage. With a blockchain size of 6GB the network could offer a replication count of 5000. If the minimum replication count were 100, the blockchain could grow to 300GB, assuming the number of clients doesn't change. The currently required space of ~5GB would offer 300TB in this model (yes, I did round here and there a bit). Of course there could still be the option to work as a full history node that stores all blocks and acts as a seeder, simply by overriding the 0.5GB with x GB.
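The storage arithmetic above, spelled out in decimal units (1TB = 1000GB). We read the 300TB figure as the total space used if every client kept the full ~5GB chain (60,000 x 5GB); that interpretation is an assumption, and the function names are illustrative.

```python
def network_storage_gb(clients: int, per_client_gb: float) -> float:
    """Total space contributed if each client donates per_client_gb."""
    return clients * per_client_gb


def replication_count(clients: int, per_client_gb: float, chain_gb: float) -> float:
    """How many full copies of the chain that space can hold."""
    return network_storage_gb(clients, per_client_gb) / chain_gb


def max_chain_gb(clients: int, per_client_gb: float, min_replication: int) -> float:
    """Largest chain that still meets the minimum replication count."""
    return network_storage_gb(clients, per_client_gb) / min_replication


def full_copy_total_gb(clients: int, chain_gb: float) -> float:
    """Space used network-wide if every client stores the whole chain."""
    return clients * chain_gb


print(network_storage_gb(60_000, 0.5))      # 30000.0 GB, i.e. 30TB
print(replication_count(60_000, 0.5, 6))    # 5000.0
print(max_chain_gb(60_000, 0.5, 100))       # 300.0 GB
print(full_copy_total_gb(60_000, 5))        # 300000 GB, i.e. 300TB
```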
hero member
Activity: 836
Merit: 1030
bits of proof
December 31, 2012, 01:21:08 AM
#9
Going forward, verification time, not network download, will likely dominate the bootstrap.
The bottleneck in verification is also not signature checking but seeking the referenced inputs as the database grows.

On my own system, with plenty of cache, and git head (to-be-0.8 ) code, signature checking is 1-2 orders of magnitude slower than the rest. Parallel signature checking improves a lot upon this, but it remains a limiting factor: on my laptop, 12 minutes to reindex/verify 210000 blocks (without signature checking), 6 more minutes to fully verify 4000 more blocks.

You will, however, need "plenty" of cache, otherwise database seeks will be an orders-of-magnitude bigger problem than signature validation. The assumption that the UTXO set fits into memory is in danger for small-footprint implementations, or if fragmentation of coins increases exponentially through privacy options that avoid aggregating.


Real remedies in my opinion are
1. pruning the database of spent transactions

Which is what the 0.8 code does.
Yes, it prepares for that. Pruning can, however, only be turned on at network scale if nodes specialize, so that we have archivers that allow bootstrapping, since you cannot (yet) transfer the UTXO set without trust.

Not to lessen your effort, which I admire, but I believe my implementation will be easier to tailor to behave differently at different sites, or to extend the protocol, e.g. to transfer the UTXO set using the ideas developed in this forum.
legendary
Activity: 1072
Merit: 1189
December 30, 2012, 06:23:07 PM
#8
Going forward, verification time, not network download, will likely dominate the bootstrap.
The bottleneck in verification is also not signature checking but seeking the referenced inputs as the database grows.

On my own system, with plenty of cache, and git head (to-be-0.8 ) code, signature checking is 1-2 orders of magnitude slower than the rest. Parallel signature checking improves a lot upon this, but it remains a limiting factor: on my laptop, 12 minutes to reindex/verify 210000 blocks (without signature checking), 6 more minutes to fully verify 4000 more blocks.
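A back-of-envelope comparison of the two timings quoted above, as blocks per second. Treat it as rough only: the 4000 most recent blocks contain far more transactions than the average early block, so the per-block ratio mixes the cost of signature checking with the growth in block size.

```python
def blocks_per_second(blocks: int, minutes: float) -> float:
    """Average processing rate over a timed run."""
    return blocks / (minutes * 60)


no_sigs = blocks_per_second(210_000, 12)   # reindex/verify without signature checks
with_sigs = blocks_per_second(4_000, 6)    # full verification including signatures
print(f"{no_sigs:.1f} vs {with_sigs:.1f} blocks/s, ratio {no_sigs / with_sigs:.2f}")
# prints: 291.7 vs 11.1 blocks/s, ratio 26.25
```

A ratio of roughly 26 sits inside the "1-2 orders of magnitude" range described in the post, even before correcting for the larger recent blocks.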

Real remedies in my opinion are
1. pruning the database of spent transactions

Which is what the 0.8 code does.
hero member
Activity: 836
Merit: 1030
bits of proof
December 30, 2012, 12:56:27 PM
#7
You may download from bittorrent and import via -loadblock, if you'd like:

     [BETA] Bitcoin blockchain torrent



Going forward, verification time, not network download, will likely dominate the bootstrap.
The bottleneck in verification is also not signature checking but seeking the referenced inputs as the database grows.

Real remedies in my opinion are
1. pruning the database of spent transactions
2. containment of dust (e.g. mandatory or even predatory transaction fees on them or lower transaction fees for aggregation).
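Remedy 2 could take many forms; the sketch below is one hypothetical fee rule that surcharges dust outputs and discounts transactions that consume more outputs than they create. It is not any client's actual policy, and every constant is a made-up illustration value in satoshis.

```python
# All parameters below are hypothetical illustration values, in satoshis.
DUST_THRESHOLD = 10_000        # outputs below this count as dust
BASE_FEE = 10_000              # flat fee for an ordinary transaction
DUST_SURCHARGE = 50_000        # extra fee per dust output created
AGGREGATION_DISCOUNT = 1_000   # rebate per net output removed from the UTXO set


def suggested_fee(input_count: int, output_values: list[int]) -> int:
    """Fee under the hypothetical rule: penalize dust, reward aggregation."""
    dust_outputs = sum(1 for v in output_values if v < DUST_THRESHOLD)
    net_outputs_removed = max(0, input_count - len(output_values))
    fee = (BASE_FEE
           + DUST_SURCHARGE * dust_outputs
           - AGGREGATION_DISCOUNT * net_outputs_removed)
    return max(fee, 0)  # never negative


# suggested_fee(5, [500_000]) merges five outputs into one and costs less
# than suggested_fee(1, [100_000]); creating a 1-satoshi output costs much more.
```

Pricing by the net change in UTXO count ties the fee to the lookup cost discussed above: every output left unspent is an entry that later verifiers must keep and seek.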
legendary
Activity: 1596
Merit: 1100
December 30, 2012, 12:04:59 PM
#6
You may download from bittorrent and import via -loadblock, if you'd like:

     [BETA] Bitcoin blockchain torrent

sr. member
Activity: 286
Merit: 251
December 30, 2012, 09:41:14 AM
#5
Peter, I understand and applaud the efforts you are making, and that this has to be done gently and slowly. I also agree that this is the right long-term solution.

But in the meantime, would there not be some merit in an optional automated backup?

It's very low risk, ought to be easy to implement, and effectively does not change the feature set offered.

[ And yes, I need to clean my computer, or something. I already did that though, and it's only crashed once in the last 36 hours, so it's difficult to pin down!! ]
full member
Activity: 150
Merit: 100
December 30, 2012, 09:29:34 AM
#4
Bitcoin 0.8 will help new users to bootstrap the network. You will need to download about 200MB instead of several gigabytes.

Ps: your computer may need some cleaning if it crashes (remove the dust, or add some thermal paste under your CPU)

How can that be?

From what I've read, you will still need the entire blockchain to do a full validation. The primary difference in 0.8 was supposed to be that you only store a small DB with ONLY unspent outputs in memory, to make the entire validation MUCH faster. This small DB is built as you download the blockchain and is all that is needed to verify new blocks that you download. However, that does not mean that the transaction information in the blockchain will be deleted, otherwise there would be no way to verify that your small DB in RAM is valid.
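The "small DB with ONLY unspent outputs" described above can be illustrated with a toy in-memory model: each block removes the outputs its transactions spend and adds the outputs they create. This is a sketch under heavy assumptions, not the 0.8 code: a plain dict stands in for the on-disk database, and script/signature checks, coinbase maturity and fee rules are all omitted. The `Tx` and `UtxoSet` names are hypothetical.

```python
from dataclasses import dataclass

Outpoint = tuple[str, int]  # (txid, output index)


@dataclass
class Tx:
    txid: str
    inputs: list[Outpoint]   # outpoints being spent
    outputs: list[int]       # output values in satoshis
    coinbase: bool = False


class UtxoSet:
    def __init__(self) -> None:
        self.coins: dict[Outpoint, int] = {}

    def apply_block(self, txs: list[Tx]) -> None:
        """Spend inputs and add outputs; reject the whole block on any error."""
        coins = dict(self.coins)  # work on a copy so a bad block leaves state untouched
        for tx in txs:
            if not tx.coinbase:
                spent = 0
                for op in tx.inputs:
                    if op not in coins:
                        raise ValueError(f"missing or already-spent input {op}")
                    spent += coins.pop(op)
                if spent < sum(tx.outputs):
                    raise ValueError(f"{tx.txid} creates more value than it spends")
            for i, value in enumerate(tx.outputs):
                coins[(tx.txid, i)] = value
        self.coins = coins
```

The set only ever holds unspent outputs, which is why it stays small, but it can only be trusted because every block that shaped it was checked in order from the genesis block, which is the point made in the post.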
legendary
Activity: 1072
Merit: 1189
December 30, 2012, 09:23:11 AM
#3
Bitcoin 0.8 will help new users to bootstrap the network. You will need to download about 200MB instead of several gigabytes.

This is not true. It will still require downloading and storing the full blockchain, but the speed of processing should be an order of magnitude faster.

It uses a new database layout that does support pruning (= not storing the entire blockchain history), but this will not yet be available in 0.8 probably because of potential effects on the network if too many nodes do not provide history anymore. That would indeed mean storing something like 200 MB - but you'd still need to download the history to verify it, and build the database.
hero member
Activity: 540
Merit: 500
December 30, 2012, 09:18:13 AM
#2
Bitcoin 0.8 will help new users to bootstrap the network. You will need to download about 200MB instead of several gigabytes.

Ps: your computer may need some cleaning if it crashes (remove the dust, or add some thermal paste under your CPU)
sr. member
Activity: 286
Merit: 251
December 30, 2012, 09:08:01 AM
#1
Having had the machine running my bitcoin client shut down due to a thermal problem, I am sitting here watching the download happen. Yes, it's slow, we all know that. And actually I haven't yet solved the thermal issue.

The annoying thing is that if I had another client on another machine on my local wired network, presumably this would be very fast indeed.

What's also annoying (when the machine crashed again!!) is that you are almost guaranteed a problem if the client is running when something like this happens, and if you are in the middle of downloading the blockchain it's nearly 100% likely.

Got me thinking that a backup once a day of the block chain info would be easy to do and very useful. I could script that I suppose.

Ok, it would use double the space, but for many people 4GB is still not a lot of space.

So here's a feature request for the Satoshi client. The client could incorporate an optional once-per-24-hours backup of the downloaded blockchain data. In the event that, on startup, the client detects invalid blockchain data (which it mostly does, since it reports it) or if the user requests it, the backup would be used, and only a bit of catching up would then be needed.

Another advantage of incorporating this in the client is that the format of this data is client stuff and may change between client versions, so a new client could just upgrade the backup at the same time it upgrades the data.
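Until something like this exists in the client, the "I could script that" route might look roughly like the sketch below. The directory layout, the `.last_backup` stamp file and all function names are hypothetical, and it assumes the client is stopped (or has flushed its databases) while copying; copying live database files can produce an inconsistent backup, which is one more reason the feature belongs inside the client.

```python
import shutil
from pathlib import Path

BACKUP_INTERVAL = 24 * 3600  # once per 24 hours, in seconds
STAMP = ".last_backup"       # hypothetical marker file holding the backup time


def backup_due(backup_dir: Path, now: float) -> bool:
    """True if no backup exists yet or the last one is at least a day old."""
    stamp = backup_dir / STAMP
    if not stamp.exists():
        return True
    return now - float(stamp.read_text()) >= BACKUP_INTERVAL


def make_backup(data_dir: Path, backup_dir: Path, now: float) -> None:
    """Copy into a temporary directory first, then swap it in, so a crash
    mid-copy never destroys the previous good backup."""
    tmp = backup_dir.with_name(backup_dir.name + ".tmp")
    if tmp.exists():
        shutil.rmtree(tmp)
    shutil.copytree(data_dir, tmp)
    (tmp / STAMP).write_text(str(now))
    if backup_dir.exists():
        shutil.rmtree(backup_dir)
    tmp.rename(backup_dir)


def restore(backup_dir: Path, data_dir: Path) -> None:
    """Replace damaged blockchain data with the last backup."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_dir, data_dir, ignore=shutil.ignore_patterns(STAMP))
```

After a restore the client would only need to catch up on the blocks since the last backup, as described in the feature request.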

I am still sitting here watching the little orange arrows spin...
