Bitcoin Core 0.12 full initial sync: results and analysis

mkabatek

newbie

Activity: 7

Merit: 0

Quote from: belcher on February 24, 2016, 08:00:03 AM

Did you use the -dbcache parameter? It can speed up sync by storing parts of the UTXO set in RAM instead of on hard disk. You have 4GB so there is space to keep at least some of the UTXO set.

I tried increasing dbcache and it definitely improved the speed. I'm seeing a similar time frame as OP without dbcache cranked, however with dbcache=4096, and prune=550, it was more like 13 hours. I was doing this over wifi, also I have another fully synced node running on the same network.

I'm running on a ubuntu VM:
CPU: i7-3720QM @ 2.6GHz
RAM: 5GB

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: dooglus on February 26, 2016, 11:29:15 PM

Quote from: jl777 on February 25, 2016, 08:05:50 PM

without the complete blockchain preceding it, you can be sure it is already spent

You mean you can't be sure it isn't already spent I think?

correct, i need a new keyboard!

you cant be sure that it is still a valid input without knowing all prior tx.

dooglus

legendary

Activity: 2940

Merit: 1333

Quote from: jl777 on February 25, 2016, 08:05:50 PM

For SIGHASH_ALL this is exactly the case as each signature is based on the modified transaction that is composed of the entire transaction, plus the output script for the vin substituted in for the input script, with the other blanked out.

I didn't realise the vin's output script was used. That changes things.

Quote from: jl777 on February 25, 2016, 08:05:50 PM

without the complete blockchain preceding it, you can be sure it is already spent

You mean you can't be sure it isn't already spent I think?

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: achow101 on February 25, 2016, 07:39:07 PM

Quote from: dooglus on February 25, 2016, 07:35:29 PM

Quote from: jl777 on February 24, 2016, 09:09:49 AM

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

So you could verify a significant subset as it comes in during the first pass, but until the blockchain is fully there you cant verify all the sigs.

Are you sure that you need to know anything about the input transactions other than the txid and vout in order to verify a signature? I didn't think that was the case.

I don't see any reason why you couldn't verify the signatures out of blockchain order.

The sigs themselves can be verified out of order, but verifying the entire transaction cannot. The inputs of a transaction reference an output of a previous transaction and to verify that an input correctly spends the output, that output needs to be known so that its script can be combined with the input script to verify that input.

Pretty close.
For SIGHASH_ALL this is exactly the case as each signature is based on the modified transaction that is composed of the entire transaction, plus the output script for the vin substituted in for the input script, with the other blanked out. Yes, it is creating the big headaches. The wiki explains this.

Now yes, you can verify that the signature is correct and if you have the corresponding output you can even verify that it is validly signed. However, without the complete blockchain preceding it, you can be sure it is already spent. So it only would be a sig valid for that sig and not a tx is valid.

This would create even more partial states to track and things are already pretty complicated and I dont feel it makes sense to add such extra tracking for the incremental overlapping. In fact, it could end up slowing things down as things are already pretty close to saturating both the bandwidth and CPU usage, so additional computing offsets the usage of the idle CPU time (assumes we can use all available CPU without detriment).

The sigs that could be verifified without the rest would be the SIGHASH_SINGLE, but that is still a partial verification and it is so little used, it can be ignored in overall throughput calculations.

When performance optimizing, it is important to always keep the overall goal in mind and not zoom in on specific tasks that are "obviously" needing to be done. At full speed parallel sync, things like context switching from system calls have significant impact, so things behave in "illogical" ways. Just going to onetime memory allocation and reusing the buffers got a 30% performance boost.

Everything affects everything else as they are all sharing the same bandwidth, CPU and especially the HDD. A random seek pattern even just a few per second is all it takes to kill HDD bandwidth an order of magnitude.

James

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: dooglus on February 25, 2016, 07:35:29 PM

Quote from: jl777 on February 24, 2016, 09:09:49 AM

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

So you could verify a significant subset as it comes in during the first pass, but until the blockchain is fully there you cant verify all the sigs.

Are you sure that you need to know anything about the input transactions other than the txid and vout in order to verify a signature? I didn't think that was the case.

I don't see any reason why you couldn't verify the signatures out of blockchain order.

The sigs themselves can be verified out of order, but verifying the entire transaction cannot. The inputs of a transaction reference an output of a previous transaction and to verify that an input correctly spends the output, that output needs to be known so that its script can be combined with the input script to verify that input.

dooglus

legendary

Activity: 2940

Merit: 1333

Quote from: jl777 on February 24, 2016, 09:09:49 AM

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

So you could verify a significant subset as it comes in during the first pass, but until the blockchain is fully there you cant verify all the sigs.

Are you sure that you need to know anything about the input transactions other than the txid and vout in order to verify a signature? I didn't think that was the case.

I don't see any reason why you couldn't verify the signatures out of blockchain order.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: pereira4 on February 25, 2016, 10:54:08 AM

For me it was really fast compared to the older version, the verification I mean, but the download speed is the same, rather slow I think.
Is there a way to see at what kb/s are you downloading the blockchain? I think it would be cool if it shows somewhere so you aren't as desperate about it. It's really confusing to not know if you are downloading at a decent speed or not. I think its downloading really slow compared to the maximum bandwidth of my connection. I have opened the required port and i get around 10 incoming connections so I don't now whats up.

In the Qt-app Debug window, network traffic

if you arent doing anything else bandwidth hungry, you can run bmon on unix systems

osx has bandwidth chart in the Activity Monitor

pereira4

legendary

Activity: 1610

Merit: 1183

For me it was really fast compared to the older version, the verification I mean, but the download speed is the same, rather slow I think.
Is there a way to see at what kb/s are you downloading the blockchain? I think it would be cool if it shows somewhere so you aren't as desperate about it. It's really confusing to not know if you are downloading at a decent speed or not. I think its downloading really slow compared to the maximum bandwidth of my connection. I have opened the required port and i get around 10 incoming connections so I don't now whats up.

_biO_

full member

Activity: 174

Merit: 102

Hardware: CPU: Intel Core i7-4790 @ 3.6GHz Disk: Samsung EVO 850 SSD 512MB Cache SATA 6.0Gb/s RAM: 16GB (-dbcache=4096) Software: Bitcoin Core v0.12 (official x64 Windows Release) Internet connection: 25 Mbyte/sec Sync time: 03h 54m

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: jl777 on February 24, 2016, 11:43:56 AM

It doesnt take too long to process that far, but it starts really slowing down at around 350,000. I think that is when p2sh started being used a lot more and tx just got bigger and more complicated

P2SH should actually speed up parallel verification. You can fully validate a P2SH transaction without looking at any other transactions.

The only serial part is connecting it to the inputs. That requires verifying that the inputs >= outputs and also that running a Hash160 for each input to make sure that the sub-script matches the hash.

Quote

User veqtrus on Reddit informs me that checkpoints were phased out after v10.0 (headers-first). If that's so, then they're basically irrelevant now. Block #295000 is really ancient history.

Block 295000 has a difficulty of 6119726089. With a 4 GH/s miners, you could find a header that builds on that block every 200 years or so. For that effort, you can force everyone in the network to store 80 bytes of data.

Before headers first, you could force everyone on the network to store a 1MB of data, since you could force them to store a full block. With headers first, they will accept and store your header [*], but won't download the full block, since it isn't on the longest chain.

It is a cost tradeoff. The latest block has a difficulty of 1880739956. Even if that was the checkpoint, it would only be around 3 times harder to create fake blocks. Moving to headers first improved things by 12,500 (1MB / 80 bytes).

Headers first combined with 295000 as a checkpoint gives 3841 times better protection than blocks-only and 399953 as a checkpoint.

[*] Actually, I am not sure if they even commit it to disk. It might just waste 80 bytes of RAM.

mmgen-py

member

Activity: 112

Merit: 27

Quote from: jl777 on February 24, 2016, 11:43:56 AM

Quote from: mmgen-py on February 24, 2016, 11:39:00 AM

Quote from: jl777 on February 24, 2016, 11:29:00 AM

Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.

Actually, I was running rc5 as it turns out, which had the checkpoint way back at block #295000. That's been updated in the final release, which of course will make things go much faster.

majority of tx since 295000
It doesnt take too long to process that far, but it starts really slowing down at around 350,000. I think that is when p2sh started being used a lot more and tx just got bigger and more complicated

User veqtrus on Reddit informs me that checkpoints were phased out after v10.0 (headers-first). If that's so, then they're basically irrelevant now. Block #295000 is really ancient history.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: mmgen-py on February 24, 2016, 11:39:00 AM

Quote from: jl777 on February 24, 2016, 11:29:00 AM

Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.

Actually, I was running rc5 as it turns out, which had the checkpoint way back at block #295000. That's been updated in the final release, which of course will make things go much faster.

majority of tx since 295000
It doesnt take too long to process that far, but it starts really slowing down at around 350,000. I think that is when p2sh started being used a lot more and tx just got bigger and more complicated

mmgen-py

member

Activity: 112

Merit: 27

Quote from: jl777 on February 24, 2016, 11:29:00 AM

Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.

~~Actually, I was running rc5 as it turns out, which had the checkpoint way back at block #295000. That's been updated in the final release, which of course will make things go much faster.~~

EDIT: Not so, as it turns out. Checkpoints are no longer being updated (since v0.10), so the last one remains at #295000.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: mmgen-py on February 24, 2016, 11:09:57 AM

Quote from: jl777 on February 24, 2016, 09:09:49 AM

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

Which places rather tight constraints on parallelism.

I have the blockchain validating serially all throughout the parallel sync, so all the sig validations can take 30 minutes using N cores without affecting the final completion time. Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.

As long as users can see balances, even if the validation isnt complete yet, just need to prevent any spends until all the inputs for the tx are validated. My guess is for most of the people most of the time, it can be done so the time for sig validation is not noticeable. Also for user tx before fully validated, I could select the oldest inputs and specificallly validate the things needed to make sure those are valid.

James

mmgen-py

member

Activity: 112

Merit: 27

Quote from: jl777 on February 24, 2016, 09:09:49 AM

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

Which places rather tight constraints on parallelism.

LiteCoinGuy

legendary

Activity: 1148

Merit: 1014

In Satoshi I Trust

35.5 hours to download and verify the blockchain now? very good results.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: mmgen-py on February 24, 2016, 09:04:18 AM

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

Those pauses aren't pauses. You're going by log entries, which only note when verification has completed.

Of course that explains it. Otherwise there would be out-of-order entries in the log, which there never are. It doesn't explain the lack of CPU activity though during the pauses. Sigs could be checked in the order received, couldn't they?

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

If you were to run with a larger dbcache setting you would likely see vastly improve performance

I may try that -- and post the results for comparison purposes.

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

It can't really set it automatically, because, unless you had much more ram than you do, bitcoin would end up hogging most of your memory when you wanted to use your computer for other things

A dynamic dbcache setting might be OK if the limit were set conservatively. A default setting should never be seriously sub-optimal, in my opinion.

while you can verify sigs a lot of times from the recent blocks, there are many times where you just dont know what the txid the vin refers to is, since you havent seen it yet during a parallel sync.

So you could verify a significant subset as it comes in during the first pass, but until the blockchain is fully there you cant verify all the sigs.

as the highest block is advanced, whatever sigs that are not verified yet could be verified. But with the checkpointing, there isnt the sig verification needed until the last checkpoint anyway

mmgen-py

member

Activity: 112

Merit: 27

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

Those pauses aren't pauses. You're going by log entries, which only note when verification has completed.

Of course that explains it. Otherwise there would be out-of-order entries in the log, which there never are. It doesn't explain the lack of CPU activity though during the pauses. Sigs could be checked in the order received, couldn't they?

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

If you were to run with a larger dbcache setting you would likely see vastly improve performance

I may try that -- and post the results for comparison purposes.

Quote from: gmaxwell on February 24, 2016, 08:38:30 AM

It can't really set it automatically, because, unless you had much more ram than you do, bitcoin would end up hogging most of your memory when you wanted to use your computer for other things

A dynamic dbcache setting might be OK if the limit were set conservatively. A default setting should never be seriously sub-optimal, in my opinion.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: mmgen-py on February 24, 2016, 08:42:53 AM

Quote from: jl777 on February 24, 2016, 08:22:34 AM

The first 200K blocks are mostly empty and represents about 10% of total size, maybe less
The big increase in size happens around block 250k and then I think around 350k

So it is normal for it to zoom through the first 50% to 75% and then slow down.

Bitcoind's progress estimation is transaction-based, not block-based.

Quote from: jl777 on February 24, 2016, 08:22:34 AM

The pauses are most likely due to waiting for data for a specific block. Depending on how many blocks are requested to a peer, it could be a long pipeline and since the order of the blocks being returned is at the discretion of the other peer

Thanks, that probably explains the pauses. And thanks for the other points in your informative post.

most of the early blocks have very few transactions. Just not much crypto going on in 2010

~~And the estimate for the total number of transactions is probably updated based on the blocks it has seen so far, so it wont know they are blocks full of transactions until it gets there.~~

I take the above back, I am seeing the progress bar at block 350K which is 85%+ done but the progress bar is nowhere close to that, so it must be estimating the number of tx somehow.

mmgen-py

member

Activity: 112

Merit: 27

Quote from: jl777 on February 24, 2016, 08:22:34 AM

The first 200K blocks are mostly empty and represents about 10% of total size, maybe less
The big increase in size happens around block 250k and then I think around 350k

So it is normal for it to zoom through the first 50% to 75% and then slow down.

Bitcoind's progress estimation is transaction-based, not block-based.

Quote from: jl777 on February 24, 2016, 08:22:34 AM

The pauses are most likely due to waiting for data for a specific block. Depending on how many blocks are requested to a peer, it could be a long pipeline and since the order of the blocks being returned is at the discretion of the other peer

Thanks, that probably explains the pauses. And thanks for the other points in your informative post.

Topic: Bitcoin Core 0.12 full initial sync: results and analysis (Read 5264 times)