P2P block download - use bit torrent like method

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: JoelKatz on July 15, 2011, 09:00:39 AM

Right, but so what? Just because the header is valid you cannot conclude the block is valid.

The further from the end of the chain that a block occurs, the more likely it is to be valid. If you accept a transactions after 6 confirms, then you should accept a block that is 6+ back along the block chain.

To fake it, you need > 50% of the hashing power of the network since the point where you fork it.

Quote

The "longest chain" test only applies among valid chains. A longer chain that is not valid is not strongest.

The chain "timestamp"/certification process is secured by the hashing power of the network not by the digital signatures on the transactions.

If you were able to generate an alternative chain that was longer than the main chain, then you could just generate a chain which is 100% coinbase transactions all payable to you.

As long as a chain starts from the genesis block, has valid timestamps (increasing properly and not more than 2 hours in the future), correctly calculated difficulty updates and all hashes meet the difficulty target, then it can be assumed to be a valid chain. This can be confirmed entirely by looking at the headers.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: TierNolan on July 14, 2011, 07:31:57 PM

Quote from: JoelKatz on July 14, 2011, 07:21:33 PM

That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

The block chain can be verified first without verifying the transactions contained in it. Only the 80 byte block header is required to verify that each header is part of a chain.

Right, but so what? Just because the header is valid you cannot conclude the block is valid.

Quote

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

If you receive 100 different chains from 100 different peers, you can still check which one is the longest chain. That gets you all the block headers for the strongest chain.

The "longest chain" test only applies among valid chains. A longer chain that is not valid is not strongest.

Quote

If someone sends you money you can verify that it hasn't been double spent by seeing if it is included in the block chain, with 6 confirms, it isn't likely to be a double spend.

Again, chicken and egg problem. You can't tell if it's included in the block chain with 6 confirms until you know which chain is the valid one, and that requires checking the transactions.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: deepceleron on July 14, 2011, 07:58:16 PM

I'm suggesting having a deep queue of the next blocks ready for verification so the only bottleneck is processing them. If after verifying block #100 I have to hit the p2p network again, request block #101, and wait for the client I request a block from to respond (or not) and send it to me, that's adding that much more latency per block. Block #101 from the network is as likely to be valid if I get it on-demand, or if I've already downloaded blocks 102-500 too.

The current system requests blocks in groups of up to 500 blocks at a time. It possibly does download -> verify -> download -> verify.

The key point is that once you have the block headers, you can download the full blocks in any order. You can verify any block by checking that the header matches and that the merkle root is correct. This proves that it is a valid part of the chain. This can be done without checking any of the signatures.

deepceleron

legendary

Activity: 1512

Merit: 1036

Quote from: JoelKatz on July 14, 2011, 07:21:33 PM

Quote from: TierNolan on July 14, 2011, 05:58:20 PM

Quote from: JoelKatz on July 14, 2011, 05:16:35 PM

Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.

That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

Quote

If security is really required, then you could do the verification slowly once the entire chain has been downloaded.

So then why bother to download the blocks really quickly? There's nothing you can do with an unverified block -- you don't know it's valid.

I'm suggesting having a deep queue of the next blocks ready for verification so the only bottleneck is processing them. If after verifying block #100 I have to hit the p2p network again, request block #101, and wait for the client I request a block from to respond (or not) and send it to me, that's adding that much more latency per block. Block #101 from the network is as likely to be valid if I get it on-demand, or if I've already downloaded blocks 102-500 too. If I've filled my queue and still have network speed to burn, I could have even requested block #101 three times and MD5'd them or have a pool of potential blocks in case I need to sort out which is the longest block chain (although this is silly, the chance of picking up in-the-wild invalid early blocks seems so low that we can almost assume everything we get is good and ignore optimizing the queue around this possibility).

A basic proof to see if there is utility would be to bootstrap a fresh client on a local-only network with several nodes and see how close you can peg CPU to 100% (this still has the client idle requesting the next block). How long does it take to get all the blocks? Then do the same on the P2P network. If it takes longer, there is room for improvement.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: JoelKatz on July 14, 2011, 07:21:33 PM

That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

The block chain can be verified first without verifying the transactions contained in it. Only the 80 byte block header is required to verify that each header is part of a chain.

Quote

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

If you receive 100 different chains from 100 different peers, you can still check which one is the longest chain. That gets you all the block headers for the strongest chain.

You can use that to verify where each block slots into the sequence of blocks.

If someone sends you money you can verify that it hasn't been double spent by seeing if it is included in the block chain, with 6 confirms, it isn't likely to be a double spend.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: TierNolan on July 14, 2011, 05:58:20 PM

Quote from: JoelKatz on July 14, 2011, 05:16:35 PM

Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.

That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

Quote

If security is really required, then you could do the verification slowly once the entire chain has been downloaded.

So then why bother to download the blocks really quickly? There's nothing you can do with an unverified block -- you don't know it's valid.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: JoelKatz on July 14, 2011, 05:16:35 PM

Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.

All other blocks could be verified by just checking that the hashes match.

If security is really required, then you could do the verification slowly once the entire chain has been downloaded.

JoelKatz

legendary

Activity: 1596

Merit: 1012

Democracy is vulnerable to a 51% attack.

Quote from: deepceleron on July 14, 2011, 04:55:50 PM

Bittorrent can get me data as fast as my connection, why should it take half a day to get 200MB on Bitcoin?

Because you are performing multiple cryptographic operations to validate every single byte of those 200MB. For each block received, you must at least:

1) Check that the hash of the block meets the target.
2) Check that the block correctly links to the preceding block.
3) Calculate the Merkle tree for all the transactions and verify it against the block header.
4) Retrieve every transaction whose outputs are claimed by transactions in this block.
5) Check the difficulty against the block count and timestamps.
6) For every input of most transactions, you must check that the public key has the correct hash and that the signature is correct.

I picked a random block (136,301) and did the math, it's about 500 signatures and 800 hashes for that one block. A typical mid-range modern CPU can verify 800 signatures a second and compute 300,000 hashes per second. (Including the overhead to set up each operation. You can do them faster in a tight brainless loop where the code stays in caches, that's unrealistic.) So processing this one block would take about half a second. (In fairness, this was an unusually large block.)

Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

deepceleron

legendary

Activity: 1512

Merit: 1036

I'm going to bump this because I used search and knew I couldn't be the first to think of this.

Bitcoin does need aggressive torrent-like out-of-order block downloading, especially on new client install or a restart after not being used for a while. If instead of requesting block 1, then block 2, etc, I request a simultaneous 500 blocks from 500 nodes I am going to get them much quicker. Think of this as a torrent-like look ahead. My client can start verifying it's way up the block chain as it starts assembling them and adding them to the block database, and if it doesn't get a few blocks after an expected time or finds invalid blocks, it can request them again from several clients.

Alternately, it could get bitcoin_blockchain_120000.zip or similar from a trusted source upon new installation.

Bittorrent can get me data as fast as my connection, why should it take half a day to get 200MB on Bitcoin?

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: just_someguy on June 30, 2011, 05:11:36 PM

It will rescan for transactions.... but if the chain download lies to you and gives you an alternate chain you won't know.
It will become apparent though as soon as you get no new blocks.

Ahh, ok. I think there needs to be a -verifychain command.

just_someguy

full member

Activity: 125

Merit: 100

It will rescan for transactions.... but if the chain download lies to you and gives you an alternate chain you won't know.
It will become apparent though as soon as you get no new blocks.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: just_someguy on June 30, 2011, 04:54:46 PM

If you aren't verifying anything or looking for transactions that are relevant to you then you might as well just download the block chain all at once:
http://bitcoin.bluematt.me/bitcoin-nightly/blockchain-nightly/

Does the -rescan parameter just scan for transactions, or does it check the chain for validity?

just_someguy

full member

Activity: 125

Merit: 100

Quote

True, however, for the purposes of verifying that the headers form a valid chain, the Merkle root might as well be a random number.

It is used to confirm that the information contained in the blocks is valid.

If you aren't verifying anything or looking for transactions that are relevant to you then you might as well just download the block chain all at once:
http://bitcoin.bluematt.me/bitcoin-nightly/blockchain-nightly/

TierNolan

legendary

Activity: 1232

Merit: 1094

I am downloading the chain again, and the process is at 90% CPU, so I guess I was wrong about the bottle-neck.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: kjj on June 30, 2011, 02:02:21 PM

I'm pretty sure that verification time is the dominant factor when fetching a large number of blocks. For each block, the client verifies each transaction. That is a lot of work.

I could see that being true in the future, but it shouldn't be a massive deal at the moment. The client should still try to download the blocks as fast as possible anyway.

I have noticed that the client uses a lot of swap space though.

kjj

legendary

Activity: 1302

Merit: 1026

I'm pretty sure that verification time is the dominant factor when fetching a large number of blocks. For each block, the client verifies each transaction. That is a lot of work.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: just_someguy on June 30, 2011, 12:15:22 PM

That's a bit misleading.
Part of the 80 bytes is the merkle root of the transactions which is essentially the contents of the block.

True, however, for the purposes of verifying that the headers form a valid chain, the Merkle root might as well be a random number.

It is used to confirm that the information contained in the blocks is valid.

just_someguy

full member

Activity: 125

Merit: 100

Quote

The hash is calculated solely based on the 80 bytes of the block header (which contains the hash of the previous block), and not the contents of the block

That's a bit misleading.
Part of the 80 bytes is the merkle root of the transactions which is essentially the contents of the block.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: patvarilly on June 30, 2011, 10:41:48 AM

As is pointed out in the posts, it might be much faster to download lots of small parts of the chain from lots of peers separately, and then join them together on the client side, because the individual peer-to-peer speed might be quite limited.

Also, it means that there is no sharing, which is key to p2p (at least bit torrent).

The only people who want blocks from you are behind you in the chain. You give them blocks, but they never have any blocks you need.

This means that it is best to just be a pure leecher.

patvarilly

Activity: -

Merit: -

Quote from: XIU on June 30, 2011, 06:21:33 AM

Isn't the reason that the blocks have to verified linear anyway? As in part of the hash is the result of the previous hash?

The hash is calculated solely based on the 80 bytes of the block header (which contains the hash of the previous block), and not the contents of the block (https://en.bitcoin.it/wiki/Protocol_specification#Block_Headers). So you can verify the integrity of the block chain using only the headers. If you get block headers for which you don't have the previous block's header, e.g., not linearly, they'll be stored in memory as orphan chains. Eventually, you'll get a block that allows you to link the orphan chains to the main block chain. As is pointed out in the posts, it might be much faster to download lots of small parts of the chain from lots of peers separately, and then join them together on the client side, because the individual peer-to-peer speed might be quite limited.

Topic: P2P block download - use bit torrent like method (Read 4730 times)