Any way to get client startup faster?

Mike Hearn

legendary

Activity: 1526

Merit: 1134

It rechecks blocks so if a new client has a rule change/bugfix, the chain will be revalidated and blocks that no longer match the rules will be kicked out, triggering a re-org.

A simple optimization would be to only perform this rechecking process when the version has changed.
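
A minimal sketch of that idea in Python (not the client's actual code): remember which client version last validated the chain, and only recheck when it differs. The state file, the `validated_with` key and both function names are hypothetical, chosen only for illustration.

```python
import json


def needs_recheck(state_path, current_version):
    """Return True when the chain was last validated by a different version."""
    try:
        with open(state_path) as f:
            last = json.load(f).get("validated_with")
    except (FileNotFoundError, ValueError):
        # No record, or an unreadable one: be conservative and recheck.
        return True
    return last != current_version


def record_validated(state_path, current_version):
    """Store the version that just finished validating the chain."""
    with open(state_path, "w") as f:
        json.dump({"validated_with": current_version}, f)
```

On startup the client would call `needs_recheck(...)`, run the full recheck only when it returns True, and call `record_validated(...)` afterwards.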

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: 2112 on October 17, 2011, 07:51:41 PM

Quote from: JoelKatz on October 17, 2011, 07:19:29 PM

I did some benchmarking of client startup.

Did your profiling tool allocate some time to I/O wait? Seems like you've enumerated a CPU-time profile, not a wall-time profile.

It's wall time, but I made sure all the files were hot in the cache. At least on my machine, if no data is hot in the cache, disk I/O is about 12% of startup time.
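
To illustrate the distinction being discussed, here is a small Python sketch that measures both clocks around the same call. `fake_io_wait` is a hypothetical stand-in that sleeps instead of reading from disk; the sleep counts toward wall time but uses almost no CPU time.

```python
import time


def measure(fn):
    """Run fn and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall clock, includes waiting
    cpu_start = time.process_time()    # CPU time of this process only
    result = fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


def fake_io_wait():
    """Stand-in for a blocking disk read: waits without using the CPU."""
    time.sleep(0.2)
    return "done"
```

If the two numbers are close, the work is CPU bound; a large gap between wall time and CPU time suggests the process spent its time waiting, for example on disk I/O.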

2112

legendary

Activity: 2128

Merit: 1074

Quote from: JoelKatz on October 17, 2011, 07:19:29 PM

I did some benchmarking of client startup.

Did your profiling tool allocate some time to I/O wait? Seems like you've enumerated a CPU-time profile, not a wall-time profile.

Thanks.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

I did some benchmarking of client startup. About 75% of it is validating the block chain (CTxDB::LoadBlockIndex). About 15% of it is loading addresses (CAddrDB::LoadAddresses).

Looking at the block chain load/verify process, about 40% of the time is CBlock::BuildMerkleTree. About 10% is SHA256. The rest is ReadFromDisk, GetBlockHash, InsertBlockIndex, and so on.

Looking at the loading of addresses, it's distributed over many functions. The only standouts are CDB::ReadAtCursor and std::_Rb_tree::_M_insert_unique.

Just not checking that the transaction's hashes are correct and match the block header would halve the startup time.
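
For readers unfamiliar with the check in question, the following Python sketch shows how a Bitcoin merkle root is computed from a block's transaction hashes. It is an illustration only, not a port of CBlock::BuildMerkleTree (which keeps the whole tree, not just the root). The function names are my own, and the hashes are assumed to be raw 32-byte values in internal byte order.

```python
import hashlib


def dsha256(data):
    """Double SHA-256, as used for Bitcoin block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes):
    """Compute the merkle root of a non-empty list of 32-byte tx hashes."""
    if not tx_hashes:
        raise ValueError("a block has at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: the last one is paired with itself.
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Validation then amounts to comparing the computed root with the merkle root stored in the block header. Every transaction has to be hashed to get the leaves, which is consistent with this step dominating the profile.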

phorensic

hero member

Activity: 630

Merit: 500

Quote from: Alex Zee on September 20, 2011, 11:51:07 AM

Quote from: wumpus on July 24, 2011, 01:25:45 PM

It'd be better if it displayed some kind of startup screen instead of just staying silent

It would be better if it just created the UI first and did the work later.

Bitcoin-qt does this, and very well. It will be merged with main client in v0.5.0. I wish it was sooner, but I don't mind compiling bitcoin-qt every once in a while.

Alex Zee

member

Activity: 112

Merit: 10

Quote from: wumpus on July 24, 2011, 01:25:45 PM

It'd be better if it displayed some kind of startup screen instead of just staying silent

It would be better if it just created the UI first and did the work later.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Sorry to revive an old thread, but I finally got my technique implemented for reading the blockchain, and I believe it represents a lower-bound on computational time for reading in the blockchain. I am posting because this is empirical data on what I believe to be the most efficient blockchain scanning technique possible.

Note: I recognize that a full-RAM implementation is not ideal for the general user, but it's feasible for modern computers (at current tx volume for the next couple years) and this was for the purpose of finding the lower-bound on processing time.

Allocate 500MB contiguous space in RAM, do a single file-read of the entire blk0001.dat file into this memory
Create maps of blockheader pointers and tx pointers indexed by hash (pointers reference memory locations in RAM).
Organize all blockheaders into a chain, calculate the longest one, create a headersByHeight vector of pointers
Read in a wallet of a few addresses and do a fresh rescan for relevant transactions
Does not do any ECDSA verification in the scan
Does not check merkle roots/trees
Does not check validity of anything -- assumes if it's in the blockchain, it's valid
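
The first few steps above can be sketched roughly as follows in Python. This is not the code described in the post, only an illustration under stated assumptions: each record in the file is taken to be 4 magic bytes, a 4-byte little-endian length, then the block, whose first 80 bytes are the header. "Longest" here means the most blocks, and no validity checks are done. The function names are hypothetical.

```python
import hashlib
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # main-network message start bytes
HEADER_LEN = 80


def dsha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def index_headers(buf):
    """Map block hash -> (previous block hash, offset of the header in buf)."""
    headers = {}
    pos = 0
    while pos + 8 + HEADER_LEN <= len(buf) and buf[pos:pos + 4] == MAGIC:
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        start = pos + 8
        header = buf[start:start + HEADER_LEN]
        # Bytes 4..36 of the header hold the previous block's hash.
        headers[dsha256(header)] = (header[4:36], start)
        pos = start + size
    return headers


def longest_chain(headers):
    """Return block hashes from the first block to the tip of the longest chain."""
    heights = {}
    for block_hash in headers:
        # Walk back until a block with a known height or an unknown parent.
        path = []
        cur = block_hash
        while cur in headers and cur not in heights:
            path.append(cur)
            cur = headers[cur][0]
        height = heights.get(cur, -1)
        for item in reversed(path):
            height += 1
            heights[item] = height
    if not heights:
        return []
    chain = []
    cur = max(heights, key=heights.get)
    while cur in headers:
        chain.append(cur)
        cur = headers[cur][0]
    chain.reverse()
    return chain
```

With the whole file read into memory, for example `buf = open("blk0001.dat", "rb").read()`, `longest_chain(index_headers(buf))` gives the equivalent of the headersByHeight vector as a list of hashes.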

I have a quad-core AMD CPU (64-bit), and ran this method on a WinXP virtual machine. I applied the above process to 135,000 blocks.
-- Read blockchain into memory: 4s (120MB/s HDD sounds about right)
-- Construct the header and tx maps from scratch in 9s.
-- Blockchain organization and finding longest chain from scratch: 0.5s.
-- The full wallet scan takes about 15s.

So, making the assumption that the longest chain contains only valid data, I can do a full rescan in about 30 seconds. With 145k blocks, I assume this will be more like 40-45s. I believe this is a very favorable result. I recognize there is no fair comparison to the existing client, because I believe the existing client does all the ECDSA signature verifications.

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: vector76 on August 06, 2011, 12:45:03 PM

Ok just now I started it and CTxDB::LoadBlockIndex took 30 seconds to complete.

It seems to take longer if I haven't been doing anything with bitcoin for a while, perhaps because the disk data is not cached by the OS.

Don't most profiling tools have options to explicitly flush read caches before runs, for just this reason?

vector76

member

Activity: 70

Merit: 18

Ok just now I started it and CTxDB::LoadBlockIndex took 30 seconds to complete.

It seems to take longer if I haven't been doing anything with bitcoin for a while, perhaps because the disk data is not cached by the OS.

It looks like CTxDB::LoadBlockIndex scans through ALL the records within blkindex.dat looking for blockindex records (block headers) which it extracts.

It then sorts them by block height, and calculates the cumulative work for each block and establishes which one is the best chain.
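
As a rough illustration of that step, the Python sketch below derives per-block work from the compact difficulty field and picks the tip with the most cumulative work. It is a simplified sketch, not the client's code: the `blocks` layout (name -> (parent, height, bits)) and the function names are assumptions made for the example, and the sign bit of the compact format is ignored.

```python
def target_from_bits(bits):
    """Expand the compact 'bits' field of a header into the full target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def block_work(bits):
    """Expected number of hashes needed to find a block at this target."""
    return (1 << 256) // (target_from_bits(bits) + 1)


def best_tip(blocks):
    """blocks: name -> (parent name, height, bits). Returns the best tip."""
    total = {}
    # Process parents before children by visiting blocks in height order.
    for name in sorted(blocks, key=lambda n: blocks[n][1]):
        parent, _height, bits = blocks[name]
        total[name] = total.get(parent, 0) + block_work(bits)
    return max(total, key=total.get)
```

Note that the best chain is the one with the most accumulated work, which is not necessarily the one with the most blocks: a short fork mined at a higher difficulty can outweigh a longer one.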

Then it loads the full block data and verifies it using CBlock::CheckBlock().
[edit: it appears that it only loads the most recent 2500 blocks, not all of them.]

CheckBlock() checks size limits, proof of work, the coinbase, and the merkle root. It also checks the transactions within the block, but only with simple sanity checks: it does not check whether the inputs have been spent before or whether the signatures are valid.

It's amazing that all that can be done in so little time.

I guess the next question is why...

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

I doubt it checks everything as I think that would take much longer than 12 seconds. But I don't really know what else it would be doing... unless it's loading the entire blockchain into memory. Seems like that could be done after the GUI loads, though...

On client startup speed: Mike Hearn mentioned that the biggest thing that can be done to improve speeds of... everything... is for everyone to upgrade. Now, this is a very difficult task with no automatic update/notify code in the client. Certainly, if it's that critical to the network for smooth operation, we should at least have the client notify the user they are out of date. But this doesn't work until we put the patch in and get most people to upgrade to that one, so there's a Catch-22.

I just emailed the owner of bitcoinwatch.com, and requested that he put a persistent, high-visibility link on his page that advertises the current client version with a link to bitcoin.org. He probably gets a high proportion of all BTC users as visitors to his site. However, I'm not too familiar with the other sites where BTC users go, besides the exchanges. If you know of such sites, please send them an email asking them to advertise as such, perhaps something along the lines of:

"Current Bitcoin version is 0.3.24. Please upgrade now!"

It sounds like this is the only chance we have of getting users to upgrade. I wouldn't have even known about the upgrade myself if I hadn't checked Bitcoinwatch on the day the update was released and saw it on "current news."

BitcoinBug

full member

Activity: 196

Merit: 100

Quote from: vector76 on August 05, 2011, 02:17:55 PM

No way it checks everything in all the blocks.

I'm guessing everything but signatures. I think loading time would be explained by this. Never looked at source though...

vector76

member

Activity: 70

Merit: 18

Quote from: Mike Hearn on August 05, 2011, 01:23:06 PM

I think you haven't implemented all the checks Bitcoin is doing. It doesn't just verify headers but full blocks in the best chain.

On startup?

No way it checks everything in all the blocks.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

I think you haven't implemented all the checks Bitcoin is doing. It doesn't just verify headers but full blocks in the best chain.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

So, as an update: I finally got some of my tools together and I have implemented the blockchain loading. Reading all the headers from blk0001.dat, and organizing the chain, with orphan checking, etc, takes approximately 1 second on my system. I am running an AMD Phenom II X4 840, which is a modest quad-core system, though I suspect only one core is being used since I didn't implement any threading.

So it concerns me that Mike's post says it takes approx 13s to load the headers. Is there a bug in the client code for executing this process? Is it just simply inefficient? According to Mike's post, this is the biggest bottleneck to getting the GUI open, so perhaps it's worth looking at what is going on here. Perhaps Mike has the capability to profile what part of loading the headers is slow...?

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Thanks. I've mentioned repeatedly in other posts before -- I can't grok the C++ code. Despite my experience with C++, it is completely incomprehensible to me, and I was hoping someone already knew the answer to this. Yesterday I did try to look at it: I saw a CDB class using a "Db*" and all its derived classes, with a large web of inheritance. I never actually found where the data was stored, or how it's accessed, or where various methods are defined (probably all over the inheritance graph). Perhaps data is stored indexed by pair objects, but I can't tell for sure.

The even more frustrating part is that even if I develop a solution that's remarkably faster (I'm just about there, I'm reading and organizing headers nearly instantaneously from file), there's no way I can contribute to that code base, because there's no way I could ever get in that code and make a meaningful patch. I'm not sure what to even do with my code, besides keep it to myself and try to develop my own client...

Sorry my C++ skills aren't up to par. But that's why we have forums to discuss things. So in case all you were suggesting was to look at the code... I've done that. I've tried countless times to read the code and never understand what I find. Perhaps you can offer suggestions for how to understand it better...

-Eto

error

hero member

Activity: 588

Merit: 500

Quote from: etotheipi on July 28, 2011, 06:04:04 PM

Does anyone know how the headers and tx data are stored in RAM? If everything is referenced by 32-byte hashes, accessing blocks/tx's by hash could be very slow if we're not using a good data structure. A binary search tree probably isn't even good enough. A radix/patricia tree would be ideal, since you can access any object in about 40 clock cycles or less.

If someone is already doing some kind of speed profiling of the code, I'm wondering how long it takes for block/tx access based on hash? Maybe you could pick out a block hash, and put in a for loop to get the version of that block a million times and time how long it takes. I thrive on data structures, pointers, and low-level optimizations like this, I'd love to contribute.

https://github.com/bitcoin/bitcoin

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Does anyone know how the headers and tx data are stored in RAM? If everything is referenced by 32-byte hashes, accessing blocks/tx's by hash could be very slow if we're not using a good data structure. A binary search tree probably isn't even good enough. A radix/patricia tree would be ideal, since you can access any object in about 40 clock cycles or less.

If someone is already doing some kind of speed profiling of the code, I'm wondering how long it takes for block/tx access based on hash? Maybe you could pick out a block hash, and put in a for loop to get the version of that block a million times and time how long it takes. I thrive on data structures, pointers, and low-level optimizations like this, I'd love to contribute.
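
The kind of micro-benchmark proposed here could look like the Python sketch below. It times repeated lookups of one 32-byte key in a Python dict, which is a hash table, so the numbers say nothing about the container the C++ client actually uses. The keys are synthetic SHA-256 digests rather than real block hashes, and the function names are made up for the example.

```python
import hashlib
import time


def build_index(n):
    """Build a dict keyed by n synthetic 32-byte hashes."""
    return {hashlib.sha256(i.to_bytes(4, "little")).digest(): i
            for i in range(n)}


def time_lookups(index, key, repeats=1_000_000):
    """Look up the same key `repeats` times; return (seconds, last value)."""
    value = None
    start = time.perf_counter()
    for _ in range(repeats):
        value = index[key]
    return time.perf_counter() - start, value
```

For example, `time_lookups(build_index(150_000), key)` with a key taken from the index gives the total time for a million lookups; dividing by `repeats` gives a per-lookup figure, which in Python also includes the interpreter's loop overhead.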

Mike Hearn

legendary

Activity: 1526

Merit: 1134

The loading of wallets, block chains etc should happen after the GUI is brought up. If your Python script takes the same amount of time as the C++ all that shows is that the process is IO bound, which is what you'd expect.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

When you say "forward the port 8333," do you mean port forwarding on my router? I tried that (any traffic to my DMZ IP address at port 8333 will go directly to my computer's port 8333). It doesn't make much sense to me, but maybe that's what you meant. I'll try it when I get home.

If you look back to Mike Hearn's post, he suggests that my 0.3.24 client won't be able to get the blockchain except from other 0.3.24 clients... which is how I thought it worked, too (but I haven't gotten to the networking part of the protocol yet, so what do I know?)

And after all this discussion, have we determined if there is a bottleneck that can be addressed? Assume everyone magically upgrades to 0.3.24 today, and the throttling bug disappears. Do we still expect speed issues? Should we be brainstorming better ideas for multi-threading, make GUI more accessible while your system is working in the background?

And finally, any reason you know why it takes 12 seconds to read the headers from blk0001.dat? That's how long it takes my Python script to do it (with hashing and blockchain organization), so I assumed it could be done dramatically faster in C++ code...

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

Quote from: etotheipi on July 27, 2011, 12:45:18 PM

Rather than disconnecting clients with different versions immediately, maybe give the opportunity to exchange getdata() requests.

I don't think any forced disconnection happens, ever. At least all the recent clients can communicate with each other, due to protocol versioning.

The bug Mike Hearn is talking about is unrelated to the version difference itself, but it is a throttling bug present in older versions.

Quote

P.S. - Perhaps I'm an outlier... does anyone else have these kinds of problems getting blocks from the network?

No problems getting blocks on any of my (Linux) hosts at the moment. A month ago it was problematic sometimes. Do you forward the port 8333?
