I've been glancing through the Bitcoin Core code, but it'll take me a while to piece everything together. Hoping someone can drop some knowledge to help speed things up.
My question is: during initial block sync, Bitcoin Core connects to a bunch of nodes. Okay. Now it needs to find the main chain. So say we send a headers request to a bunch of them. Say some of them are lying and have fake chains, so the header chain you get back from different peers ends up being different.
Which chain do you choose to download? Suppose you get presented with several fake chains and 1 real chain. How do you tell them apart without downloading all of them? Maybe the fake chains have more work for a while, and then (eventually) less work than the real chain? What if the real chain has more work for a while, then less for a bit, and only overtakes again at a much later height? What's the logic that allows you to find the real chain in those scenarios?
One thing I've thought of so far is to keep track of, say, 8 header chains. Request headers repeatedly from all 8 of your peers, potentially forming a variety of header chains. Do that until you've exhausted all potential chains. This should only consume ~320MB of disk space right now, so easy enough. Find the longest (most-work) chain and start downloading its blocks. If you ever encounter an invalid block, store that block's hash in an invalid-block database, and ban the peer that gave it to you along with any peers that sent that header as part of their header responses. Delete that header chain. Now begin downloading the next-longest chain, and so on until you find the longest valid chain. You should eventually find it, because each time you find an invalid block you ban at least 1 peer and thus connect to at least 1 new peer. So you'll sniff around the network until you find a trustworthy peer.
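To make the idea above concrete, here's a rough Python sketch of the selection loop I have in mind. Everything in it is hypothetical: `HeaderChain`, `select_valid_chain`, the peer names, and the `block_is_valid` callback (which stands in for "download the block and fully validate it") are made up for illustration, and the per-header work values are toy numbers. This is not how Bitcoin Core does it, just the logic described in the previous paragraph.

```python
from dataclasses import dataclass


@dataclass
class HeaderChain:
    peers: set      # hypothetical ids of the peers that advertised this chain
    headers: list   # (block_hash, work) tuples in height order; work is a toy value

    @property
    def total_work(self):
        # Cumulative work is what "longest" really means here.
        return sum(work for _, work in self.headers)


def select_valid_chain(chains, block_is_valid):
    """Try candidate chains from most to least cumulative work.

    Returns (best_chain_or_None, banned_peers, invalid_hashes).
    """
    banned = set()
    invalid = set()
    for chain in sorted(chains, key=lambda c: c.total_work, reverse=True):
        bad_hash = None
        for block_hash, _ in chain.headers:
            # A hash already known to be invalid needs no second download.
            if block_hash in invalid or not block_is_valid(block_hash):
                bad_hash = block_hash
                break
        if bad_hash is None:
            return chain, banned, invalid
        # Remember the bad block, ban every peer that advertised this chain,
        # then fall through to the next-best candidate.
        invalid.add(bad_hash)
        banned |= chain.peers
    return None, banned, invalid


# Toy example: one fake chain with more claimed work, one honest chain.
chains = [
    HeaderChain(peers={"peer_a"}, headers=[("g", 1), ("fake1", 5), ("fake2", 5)]),
    HeaderChain(peers={"peer_b", "peer_c"}, headers=[("g", 1), ("b1", 2), ("b2", 2)]),
]
best, banned, invalid = select_valid_chain(chains, lambda h: not h.startswith("fake"))
```

In the toy example the fake chain is tried first because it claims more work, its first bad block gets recorded, `peer_a` gets banned, and the honest chain wins. The sketch skips the part where a banned peer is replaced by a new one that might advertise yet another chain.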
The only failure in that logic I see is if all 8 peers have totally valid, but truncated chains. Presumably you could continually swap at least 1 peer every hour or so, and eventually find the real chain.
But watching Bitcoin Core's network logs, it looks like it begins requesting blocks immediately after it gets its first headers response back. So is it doing more or less what I describe above, but just downloading the current best header chain optimistically? I suppose the worst case there is that it'll have wasted bandwidth and have to throw away a bunch of blocks.
Thanks!