Improved Block Relaying and Validation

MatthewLM

legendary

Activity: 1190

Merit: 1004

Thanks Pieter. The purpose of this proposal is not focused upon detecting invalid blocks. It's focus is upon the parrellel downloading of blocks from multiple peers. Providing that the decrease in time that it takes to transfer the segments from multiple peers, as opposed to the entire block from one peer, is more than the additional latency added (for obtaining merkle tree data and whatever else), this can be beneficial. The proposal was suggested in response to fears about "What if blocks get really big". Of-course, this proposal becomes more and more beneficial, the bigger blocks get.

I asked for some network data about latency times and transfer rates to try and determine just how big blocks would have to be to benefit from such a proposal but I do not have that data. There will be variation within the network, I know.

The main criticism I have got is "Blocks will never get really big because we will always limit them" so it's a waste of time to consider this. Well, that is a major issue: what do we do about the block size limit?

For larger pieces of data parrellel downloading is certainly beneficial in p2p networks as can be seen with bittorrent.

So yes, I do not think I properly explained the purpose of this proposal, which is to enable parrellel downloading of blocks in smaller data segments such as with bittorrent. I call them segments because calling them "data blocks" would cause confusion with "bitcoin blocks".

And I do not suggest that this proposal is important right now. I was thinking about the future and the increasing demand for block-space.

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

The question is really weighing the risks and benefits.

Some methods for block relaying will have less data transfer in case of invalid data, and other methods will have fewer round-trips.

An extreme of the first case is sending every node of the merkle tree separately, verifying it, and requesting its children. Eventually, when the entire merkle tree is transferred and verified, request the transactions one by one, and verify their hash matches what is claimed by the merkle leaf nodes. This requires 2*N-1 round-trips, and will never download more than a single invalid transaction.

An extreme of the second case is what we do now: send the entire block at once. There is at most 1 round trip, and we'll never download more than a single invalid full block.

What you are suggesting is a compromise, which is very close to the first example. You risk downloading one invalid level of the merkle tree, but at the cost of log(N) round trips.

Time for some math: with a very slow EDGE (2G GSM mobile data) connection, you have up to 600ms of latency, and a bandwidth of 20-60 kbit/s. That means that during one additional round trip, you could have transferred 1.5-4.5 kbyte. Unless the actual amount of data traffic is important, you're most likely trying to optimize transfer time. The calculation above means that an additional round trip is not worth it if it saves you less than a few kilobytes (and in more modern networks, much more). As we don't expect to receive many invalid blocks/trees/transactions, the actual amount saved by systems with more round trips is almost nothing, but they can protect against a worst case.

The technique proposed by BIP 37 involves sending just the necessary part of the merkle tree + transactions not known to be known to the receiver. This saves some trivial bandwidth (the transactions already known to the receiver), but doesn't introduce additional latency. I believe this will in practice be significantly faster than what you're proposing.

MatthewLM

legendary

Activity: 1190

Merit: 1004

I see that many people misunderstood this proposal: http://bitcoinstats.com/irc/bitcoin-dev/logs/2012/09/13

You do not need to download the entire merkle tree. That would be stupid. You download a level of the tree. You verify this level, and then you can verify the subtrees for each hash at this level. Why not just download all the transactions and verify later? Well you could do this but if one of the nodes can you a bad set of transactions then the merkle tree will be incorrect. To determine the incorrect hashes you could perhaps request the merkle level afterwards. The way I wrote the proposal the level is downloaded before the transactions so validation can be completed for each segment received.

Of-course someone could still create an invalid block but that is more unlikely than someone giving bad transactions for a block, so the way I suggest to do it is better as it protects against bad nodes which would give bad transactions for otherwise valid blocks. With my suggestion you can validate each segment as you receive it, and you can relay them immediately with confidence they belong to a block.

But please explain why I'm wrong or a better way of doing things. I did say I woudl improve my proposal as I know it is far from perfect but I'm too bus right now, so I'll come back to it later.

Matt Corallo

hero member

Activity: 755

Merit: 515

Quote from: Mike Hearn on August 26, 2012, 09:08:23 AM

This isn't a new idea and Matt has already got it implemented on a branch (along with his bloom filtering work). It adds a new CRelayBlock type that contains only hashes. Nodes request transactions that they didn't already see, but of course most of the time those transactions were already received and verified.

The branch Mike is referring to is at https://github.com/TheBlueMatt/bitcoin/commits/bloom%2Brelayblock but its largely dormant.

MatthewLM

legendary

Activity: 1190

Merit: 1004

No, it is not 10 times the amount. It is 3 times the amount of latency. And you could skip gettreelevel (Do merkle tree validation at the end with an issue relating to dishonest nodes), so it is only 2 times the amount then.

Does anyone have data for the average latency on the bitcoin network? The average data transfer rate from single nodes? Lastly when up-to-date how many transactions does a node on average have that exist in new blocks?

That information will help explain if there is benefit to this protocol change today. But don't forget about tomorrow. The bitcoin network needs room to expand.

jgarzik

legendary

Activity: 1596

Merit: 1100

Quote from: MatthewLM on August 28, 2012, 01:02:19 PM

MT sig verf?

Multi-threaded signature verification

Quote

This is the first draft of the proposal: https://en.bitcoin.it/wiki/User:MatthewLM/ImprovedBlockRelayingProposal

Consider the number of P2P commands needed for your proposal, versus today. It's like 10x the number, with associated additional latency.

MatthewLM

legendary

Activity: 1190

Merit: 1004

MT sig verf?

This is the first draft of the proposal: https://en.bitcoin.it/wiki/User:MatthewLM/ImprovedBlockRelayingProposal

jgarzik

legendary

Activity: 1596

Merit: 1100

Quote from: MatthewLM on August 25, 2012, 08:06:56 PM

I was talking on the #bitcoin-dev chat (http://bitcoinstats.com/irc/bitcoin-dev/logs/2012/08/25#l4357862). On there some discussion about block relaying came up. In fact this is a very important discussion and I'm surprised I haven't seen it come up very much.

Here is an alternative solution to block relaying...

1. Nodes ask for block hashes though inventory messages as normal.
2. Nodes download the headers from several peers.
3. Nodes will validate headers as they come and fit them together.
4. Nodes will record download speeds for different peers and stop downloading from relatively slow peers and to download more important data from faster peers.
5. Nodes will download blocks in segments. How exactly is up to discussion but the merkle tree allows for download and validation of blocks segments. Blocks can be downloaded by multiple peers and blocks can be relayed before they are fully validated.

It would be nice to improve block relay latency but the above proposal introduces a large amount of additional latency to the optimal case (bitcoin on tmpfs w/ MT sig verf). We don't want to slow things down like that.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Edit: I'm changing the proposal so that you can specify the index of a starting transaction and the length of the segment you wish to acquire. So now you can ask for any arbitrary segment.

Realpra

hero member

Activity: 815

Merit: 1002

Any swarm communication protocol needs to relay block segments that are connected to the root - a branch basically.

Further branches must be able to be requested so that a swarm client can request a specific part of a block.
Something like "right-left-left-right->all below" for instance.

Something like subscribing to updates about certain addresses also need to be possible.

Can't give this my full attention at the moment, but it seems more important than multi-threading though that is also very cool.

MatthewLM

legendary

Activity: 1190

Merit: 1004

That is nice, though I cannot find the CRelayBlock. I tried searching github for it as I assumed it would be on there but nothing came up. But basically it just provides transactions hashes for a block?

The improvement with my idea is that nodes can ask for an arbitrary level of the merkle hash. It could be that if level 0 is specificed in "gettreelevel", the deepest level (transactions) are returned but a node may just want to split the block into segments/chunks that is considered most efficient. For instance when downloading the block chain when the node is out-of-date it might not have many transactions in the blocks, so sharing transaction hashes may not be efficient until the block-chain is up-to-date, when the node can start asking for transaction hashes.

So allowing for specifying arbitrary depths in the merkle tree would make the protocol more powerful and specifying the merkle tree level depth would only need one byte.

Quote from: Mike Hearn on August 26, 2012, 09:08:23 AM

A while ago I added a ping/pong protocol so you can measure the responsiveness of a remote node more easily.

I think it would be more advantageous to measure the download speeds additionally to be able to calculate most efficient nodes. And it may be that nodes without the faster download times are more efficient for small messages since they respond quicker, in which case measuring the data download speeds as well as the initial response could be a good thing.

Quote

It's not used yet but as the Satoshi client is basically single-threaded at the network level.

From what I remember it uses several threads for network IO. I think it should use one IO thread for everything and then create worker threads if needed. Maybe it has changed since to one thread, in which case good thing.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

This isn't a new idea and Matt has already got it implemented on a branch (along with his bloom filtering work). It adds a new CRelayBlock type that contains only hashes. Nodes request transactions that they didn't already see, but of course most of the time those transactions were already received and verified.

A while ago I added a ping/pong protocol so you can measure the responsiveness of a remote node more easily. It's not used yet but as the Satoshi client is basically single-threaded at the network level, ping/pong timing gives you a good approximation to remote load.

MatthewLM

legendary

Activity: 1190

Merit: 1004

I was talking on the #bitcoin-dev chat (http://bitcoinstats.com/irc/bitcoin-dev/logs/2012/08/25#l4357862). On there some discussion about block relaying came up. In fact this is a very important discussion and I'm surprised I haven't seen it come up very much.

Here is an alternative solution to block relaying...

1. Nodes ask for block hashes though inventory messages as normal.
2. Nodes download the headers from several peers.
3. Nodes will validate headers as they come and fit them together.
4. Nodes will record download speeds for different peers and stop downloading from relatively slow peers and to download more important data from faster peers.
5. Nodes will download blocks in segments. How exactly is up to discussion but the merkle tree allows for download and validation of blocks segments. Blocks can be downloaded by multiple peers and blocks can be relayed before they are fully validated.

Segments could be individual transactions, where you could only ask for transactions you do not already have (removing redundancy) but in this case all the transactions hashes would have to be advertised.

The bitcoin protocol could be changed to have four new messages "gettreelevel", "treelevel", "getsegment" and "segment"... or whatever people would like the command strings to be.

"gettreelevel" would ask for a level of the merkle tree at some depth. "treelevel" returns the set of hashes at that depth. "getsegment", would ask for a segment of a block that corresponds to a hash in the merkle tree and "segment" would contain the transactions for that section of the merkle tree.

So now nodes can divide the download of blocks as they see fit across different peers. Nodes can validate sections of blocks and relay them without waiting to fully validate a block. Nodes can compare peers and download from the highest quality peers. As blocks get bigger and bigger, this would be more and more beneficial.

All networking code should be on a single thread and block segment validation is done on a second thread. The network thread can access a list of segments in the validation thread and add segments, prioritising important segments (Earlier blocks, more complete blocks or whatever). If the list becomes too large, download may be temporarily halted. The validation thread continually validates the segments until they are all complete (This thread can also do work for relayed transactions). You only ever need two threads for downloading and validating the block chain (IO thread and processing thread with some synchronisation between the two).

Nodes should stop relaying segments when one of the segments contains a bad transaction.

I understand the way bitcoin uses merkle trees is a bit awkward so people would need to think about how it would technically work.

This is also a raw concept. Please discuss.

Thank you.

PS: If you would like to support the development of cbitcoin (and including the client that will be build upon it) and further development of new bitcoin ideas please donate here: 1D5A1q5d192j5gYuWiP3CSE5fcaaZxe6E9

Topic: Improved Block Relaying and Validation (Read 2321 times)