Synchronizing with Blockchain I/O bound

Mushoz

hero member

Activity: 686

Merit: 500

Bitbuy

Quote from: Pieter Wuille on March 22, 2012, 12:40:28 PM

Quote from: Mushoz on March 22, 2012, 12:33:43 PM

Validating creates quite a bit of load on the CPU, so it will most likely be bottlenecked by the CPU. A fast connection should easily be able to download the entire blockchain within 30 minutes, as long as the client is well connected with a working UPnP setup. We'll have to wait and see, I guess.

Does anyone know how I can check whether this Pull is included in 0.6RC1? I would like to give this a shot, but I'm not capable of compiling Bitcoin myself. Thanks

It was only merged today. 0.6.0rc1 is 1.5 months old. 0.6.0rc4 is 6 days old. It will be included in 0.6.0rc5.

You shouldn't run outdated release candidates by the way - there's a reason a newer rc was created: the old one had (too many) bugs.

Ah, ok, thanks for the tip! Looking forward to RC5 then

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

Quote from: Mushoz on March 22, 2012, 12:33:43 PM

Validating creates quite a bit of load on the CPU, so it will most likely be bottlenecked by the CPU. A fast connection should easily be able to download the entire blockchain within 30 minutes, as long as the client is well connected with a working UPnP setup. We'll have to wait and see, I guess.

Does anyone know how I can check whether this Pull is included in 0.6RC1? I would like to give this a shot, but I'm not capable of compiling Bitcoin myself. Thanks

It was only merged today. 0.6.0rc1 is 1.5 months old. 0.6.0rc4 is 6 days old. It will be included in 0.6.0rc5.

You shouldn't run outdated release candidates by the way - there's a reason a newer rc was created: the old one had (too many) bugs.

Mushoz

Quote from: Pieter Wuille on March 22, 2012, 12:30:13 PM

Quote

This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry

Note that this does not include the downloading, only the processing: I imported the blocks from a local file. I assume that in normal circumstances it will still take an hour or two (and, if you have bad luck, a lot more) to download.

Validating creates quite a bit of load on the CPU, so it will most likely be bottlenecked by the CPU. A fast connection should easily be able to download the entire blockchain within 30 minutes, as long as the client is well connected with a working UPnP setup. We'll have to wait and see, I guess.

Does anyone know how I can check whether this Pull is included in 0.6RC1? I would like to give this a shot, but I'm not capable of compiling Bitcoin myself. Thanks

Pieter Wuille

Quote

This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry

Note that this does not include the downloading, only the processing: I imported the blocks from a local file. I assume that in normal circumstances it will still take an hour or two (and, if you have bad luck, a lot more) to download.

Mushoz

Quote from: Gavin Andresen on March 22, 2012, 09:54:19 AM

Quote from: Pieter Wuille on March 21, 2012, 11:21:53 PM

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.

I pulled #964 for 0.6 this morning.

I had played with database settings several months ago and saw no speedup because there was another bug causing a bottleneck. That bug was fixed a while ago, but nobody thought to try tweaking the db settings again until a few days ago.

Pieter and Greg did all the hard work of doing a lot of benchmarking to figure out which settings actually matter.

PS: the database settings are run-time configurable for any version of Bitcoin; Berkeley DB reads a file called 'DB_CONFIG' (if it exists) in the "database environment" directory (aka -datadir).

This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Quote from: Pieter Wuille on March 21, 2012, 11:21:53 PM

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.

I pulled #964 for 0.6 this morning.

I had played with database settings several months ago and saw no speedup because there was another bug causing a bottleneck. That bug was fixed a while ago, but nobody thought to try tweaking the db settings again until a few days ago.

Pieter and Greg did all the hard work of doing a lot of benchmarking to figure out which settings actually matter.

PS: the database settings are run-time configurable for any version of Bitcoin; Berkeley DB reads a file called 'DB_CONFIG' (if it exists) in the "database environment" directory (aka -datadir).
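
For anyone who wants to try the DB_CONFIG route, here is a minimal Python sketch that writes such a file into a data directory. The helper names and the sizes (100 MiB cache, 1 MiB log buffer) are illustrative assumptions, not the values from pull #964; `set_cachesize` and `set_lg_bsize` are standard Berkeley DB DB_CONFIG directives.

```python
from pathlib import Path


def render_db_config(cache_mib=100, log_buffer_mib=1):
    """Return DB_CONFIG text. Sizes are illustrative assumptions only."""
    # set_cachesize takes: <gigabytes> <additional bytes> <number of caches>
    gbytes, rem_mib = divmod(cache_mib, 1024)
    lines = [
        f"set_cachesize {gbytes} {rem_mib * 1024 * 1024} 1",
        # set_lg_bsize takes the in-memory log buffer size in bytes
        f"set_lg_bsize {log_buffer_mib * 1024 * 1024}",
    ]
    return "\n".join(lines) + "\n"


def write_db_config(datadir, **kwargs):
    """Write DB_CONFIG into datadir (hypothetical helper) and return its path."""
    path = Path(datadir) / "DB_CONFIG"
    path.write_text(render_db_config(**kwargs))
    return path
```

Calling `write_db_config("/path/to/datadir")` while the client is stopped would be the intended use; which settings actually help is exactly what the benchmarking mentioned above was for, so treat the numbers as placeholders.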

Pieter Wuille

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Quote from: randomproof on March 21, 2012, 10:27:33 PM

What about changing how the database is stored on disk? It seems to me that using the Berkeley DB library might be causing the problem, but I don't know enough about how that works to be sure. Maybe using some other database library (like SQLite) would result in less disk I/O.

The blockchain is actually stored in a flat binary file. It's just one raw block after another, serialized into blk0001.dat. It's the wallet file that is stored using Berkeley DB.

randomproof

member

Activity: 61

Merit: 10

What about changing how the database is stored on disk? It seems to me that using the Berkeley DB library might be causing the problem, but I don't know enough about how that works to be sure. Maybe using some other database library (like SQLite) would result in less disk I/O.

finway

hero member

Activity: 714

Merit: 500

Quote from: antares on March 18, 2012, 03:36:12 PM

When I need a new BC, I usually create a RAM disk of 3 or so gigabytes and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shut down bitcoind and move the blockchain out. Seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

I should try this.

etotheipi

Quote from: Mushoz on March 20, 2012, 05:21:04 PM

Quote from: notme on March 19, 2012, 11:32:22 AM

mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the satoshi client looking at this? I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the bitcoin client code or libdb (which may need to be altered if it doesn't already provide mmapability for databases).

That implementation sounds fantastic! Exactly what we need. Would be great if this could be implemented, I really think this is quite a high priority. Getting started with bitcoin should be as painless, fast and easy as possible for new users. Good luck with this Notme

I'm pretty sure the Satoshi client already uses something along these lines. In fact, I know I saw mmap in the source code for opening wallets...

Perhaps the difference is that Armory does a full rescan on every load (but does not do full verification, that would take a while), whereas the Satoshi client (I think) only re-scans the last 2500 blocks. This gives me an opportunity to get the whole blockchain into cache, whereas the Satoshi client won't get it into cache until the first time a scan is done. If I'm right and it does use mmap for the blockchain, then a second scan (for instance, on an address import) will go much faster if your computer has 4-8 GB of RAM.

Mushoz

Quote from: notme on March 19, 2012, 11:32:22 AM

Quote from: etotheipi on March 19, 2012, 10:46:13 AM

Quote from: Mushoz on March 18, 2012, 07:08:47 PM

Very good post, and you're right! Back to the initial point then.

Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of its I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.

Btw, just for reference, I started writing Armory about 9 months ago when the blockchain was a few hundred MB. I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory. It has been remarkably successful for those that have enough RAM, but it's going to become unusable very soon. The blockchain has more than doubled in size since I started, and it's increasing in speed. I'm scrambling to get something in there so that systems with less than 4GB of RAM can use it...

Instead, I'm switching to an mmap-based solution which seems to give the best of both worlds. It treats disk space like memory, and a memory access retrieves the data from disk if it's not in the cache. The nice thing about this is, if you have a system with 8GB+ RAM, it will just cache the whole blockchain and you get the benefits of the original implementation. But if you have less RAM, it will cache as much as it can, and supposedly intelligently. The caching is OS-dependent, but fairly optimized, as it's something that's actually implemented at the kernel layer. The only consideration there is that if you are going to use some kind of structured access pattern on the file, then you can "advise" the mmap'd memory about it and it will optimize itself for it (e.g., if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be. I don't know exactly what the blockindex holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat. Then thousands of users who've been using the program for months suddenly can't load the client anymore. Even with blockchain pruning, there are no guarantees.

So, my lessons from Armory were that you should never count on anything being held entirely in RAM. And I like gmaxwell's solution of having an SPV node until synchronization completes, then switching. I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.

mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the satoshi client looking at this? I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the bitcoin client code or libdb (which may need to be altered if it doesn't already provide mmapability for databases).

That implementation sounds fantastic! Exactly what we need. Would be great if this could be implemented, I really think this is quite a high priority. Getting started with bitcoin should be as painless, fast and easy as possible for new users. Good luck with this Notme

notme

legendary

Activity: 1904

Merit: 1002

Quote from: etotheipi on March 19, 2012, 10:46:13 AM

Quote from: Mushoz on March 18, 2012, 07:08:47 PM

Very good post, and you're right! Back to the initial point then.

Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of its I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.

Btw, just for reference, I started writing Armory about 9 months ago when the blockchain was a few hundred MB. I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory. It has been remarkably successful for those that have enough RAM, but it's going to become unusable very soon. The blockchain has more than doubled in size since I started, and it's increasing in speed. I'm scrambling to get something in there so that systems with less than 4GB of RAM can use it...

Instead, I'm switching to an mmap-based solution which seems to give the best of both worlds. It treats disk space like memory, and a memory access retrieves the data from disk if it's not in the cache. The nice thing about this is, if you have a system with 8GB+ RAM, it will just cache the whole blockchain and you get the benefits of the original implementation. But if you have less RAM, it will cache as much as it can, and supposedly intelligently. The caching is OS-dependent, but fairly optimized, as it's something that's actually implemented at the kernel layer. The only consideration there is that if you are going to use some kind of structured access pattern on the file, then you can "advise" the mmap'd memory about it and it will optimize itself for it (e.g., if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be. I don't know exactly what the blockindex holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat. Then thousands of users who've been using the program for months suddenly can't load the client anymore. Even with blockchain pruning, there are no guarantees.

So, my lessons from Armory were that you should never count on anything being held entirely in RAM. And I like gmaxwell's solution of having an SPV node until synchronization completes, then switching. I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.

mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the satoshi client looking at this? I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the bitcoin client code or libdb (which may need to be altered if it doesn't already provide mmapability for databases).

etotheipi

Quote from: Mushoz on March 18, 2012, 07:08:47 PM

Very good post, and you're right! Back to the initial point then.

Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of its I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.

Btw, just for reference, I started writing Armory about 9 months ago when the blockchain was a few hundred MB. I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory. It has been remarkably successful for those that have enough RAM, but it's going to become unusable very soon. The blockchain has more than doubled in size since I started, and it's increasing in speed. I'm scrambling to get something in there so that systems with less than 4GB of RAM can use it...

Instead, I'm switching to an mmap-based solution which seems to give the best of both worlds. It treats disk space like memory, and a memory access retrieves the data from disk if it's not in the cache. The nice thing about this is, if you have a system with 8GB+ RAM, it will just cache the whole blockchain and you get the benefits of the original implementation. But if you have less RAM, it will cache as much as it can, and supposedly intelligently. The caching is OS-dependent, but fairly optimized, as it's something that's actually implemented at the kernel layer. The only consideration there is that if you are going to use some kind of structured access pattern on the file, then you can "advise" the mmap'd memory about it and it will optimize itself for it (e.g., if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be. I don't know exactly what the blockindex holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat. Then thousands of users who've been using the program for months suddenly can't load the client anymore. Even with blockchain pruning, there are no guarantees.

So, my lessons from Armory were that you should never count on anything being held entirely in RAM. And I like gmaxwell's solution of having an SPV node until synchronization completes, then switching. I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.
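
The mmap-plus-advise idea described in this post can be sketched in a few lines of Python. This is not Armory's code: the function name is hypothetical, and the record framing (4 magic bytes, then a 4-byte little-endian length, then the raw block) is an assumption about how a blk0001.dat-style file is laid out. `madvise` is only available on some platforms (and Python 3.8+), so the call is guarded.

```python
import mmap
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # assumed main-network magic bytes


def iter_raw_blocks(path):
    """Yield (offset, raw_bytes) for each record in a blk-style file."""
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS page cache decides what stays in RAM.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Hint that access will be sequential, where the platform supports it.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            pos = 0
            while pos + 8 <= len(mm):
                if mm[pos:pos + 4] != MAGIC:
                    break  # unknown data; stop rather than guess
                (size,) = struct.unpack_from("<I", mm, pos + 4)
                start = pos + 8
                if start + size > len(mm):
                    break  # truncated final record
                yield start, mm[start:start + size]
                pos = start + size
```

With plenty of RAM, a second pass over the same file is served mostly from the page cache; with less RAM, the kernel evicts pages as needed, so nothing in the program has to know how big the file has become.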

Mushoz

Very good post, and you're right! Back to the initial point then.

Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of its I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: Mushoz on March 18, 2012, 03:50:29 PM

Can't the initial download be handled in a torrent-like way? What we have to do is hardcode data that's usually included in a .torrent file in the client. Instead of downloading the blocks from the clients and manually verifying every transaction, why not hash the blocks and check those hashes against the hashes hardcoded in the client?

Because this violates the design of Bitcoin in an extreme way. Bitcoin is, for the most part, a zero trust system. You don't trust the developers to tell you about the right transactions, you trust only that software on your system (that you, or your agents, can audit) has independently validated that the rules have all been followed.

The fact that you and a great many other independent people running full nodes are doing this independent validation is also what enables things like SPV nodes (which don't do this checking) to also be fairly trustworthy.

This is all pretty important because if Bitcoin is to achieve its goal of removing trust from money then it's not okay to replace state trust with a gaggle of developers and big bitcoin sites (e.g. Deepbit, Mtgox). I say this not to insult them, because they are trustworthy folks, but why would you trust a tiny cabal when you won't trust democratically elected states and regulated free market chosen banks?

In any case, this validation doesn't have to get in the way of using the software: Bitcoin could start up as an SPV node and become a full node at its leisure (and lapse back to SPV mode if it falls behind). It's just that the software for this hasn't been written yet. The fact that the validation will happen 'soon' confers almost all of the decentralization benefits, while providing all of the performance benefits.

Of course, none of this has anything to do with the OP's point, which was that the synchronization is currently needlessly slow. He's absolutely right. If you run bitcoin in tmpfs on a fast machine you can do a full blockchain sync in only a half hour. There is no fundamental reason that it couldn't be just as fast while writing to disk, at least on systems with reasonable amounts of RAM. This must be fixed, can be fixed, and the Bitcoin-using community shouldn't allow the current brokenness to be used as an excuse to degrade the trust model of Bitcoin. Unfortunately, fixing it doesn't appear to be trivial, and so far everything that has been tried has not been successful (though improvements have been made).

Mushoz

Can't the initial download be handled in a torrent-like way? What we have to do is hardcode data that's usually included in a .torrent file in the client. Instead of downloading the blocks from the clients and manually verifying every transaction, why not hash the blocks and check those hashes against the hashes hardcoded in the client? If they match, there's no need to manually verify every single transaction again. Only blocks created after the last block of which the hash was included in the client have to be checked the regular way. We could even create new ".torrent" like files, only a few kb in size, which include newer blocks. That way, if the client hasn't been updated in a while, we can still easily and quickly catch up with the chain by downloading that small file and opening it with the client. Thoughts/comments?
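
To make the proposal concrete, here is a rough Python sketch of the hash check it describes. The function names and the checkpoint table are hypothetical and nothing here is taken from the client's source; the one fixed fact is that a Bitcoin block hash is the double SHA-256 of the 80-byte block header, shown byte-reversed as hex.

```python
import hashlib

HEADER_SIZE = 80  # a Bitcoin block header is 80 bytes


def block_hash(header: bytes) -> str:
    """Double SHA-256 of the header, in the usual byte-reversed hex form."""
    if len(header) != HEADER_SIZE:
        raise ValueError("expected an 80-byte block header")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


def matches_checkpoint(height, header, checkpoints):
    """True if a hash is listed for this height and the header matches it.

    `checkpoints` is a hypothetical {height: hex_hash} table that would ship
    with the client under this proposal.
    """
    expected = checkpoints.get(height)
    return expected is not None and block_hash(header) == expected
```

A match only shows that the block is the one the table's author expected. It says nothing about whether the transactions inside follow the rules, so the trust moves to whoever compiled the table.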

notme

Quote from: antares on March 18, 2012, 03:36:12 PM

When I need a new BC, I usually create a RAM disk of 3 or so gigabytes and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shut down bitcoind and move the blockchain out. Seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

+1

But, it needs to be handled by the client. We can't expect everyone to be able to set up a ramdisk.

I still think there should be downloads that include the blockchain (up to the latest lock-in block) available alongside the client-only downloads.

Mushoz

Quote from: antares on March 18, 2012, 03:36:12 PM

When I need a new BC, I usually create a RAM disk of 3 or so gigabytes and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shut down bitcoind and move the blockchain out. Seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

Yes, but this proves that it's entirely possible to speed up the process by leaps and bounds. It should automatically be cached to RAM, instead of having to do it manually with a RAM disk.

antares

hero member

Activity: 518

Merit: 500

When I need a new BC, I usually create a RAM disk of 3 or so gigabytes and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shut down bitcoind and move the blockchain out. Seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.
