A bit of criticism on how the bitcoin client does it - page 2.

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

Quote from: piotr_n on May 15, 2013, 03:18:12 AM

I have been looking at BIT37 and it seems that "merkleblock" is exactly what I need in order to divide a new block's download into small chunks and then distribute the block's download among different peers, using a bunch of "getdata 1" instead of one "getdata 2".

I don't think you can perfectly partition blocks using filtered blocks, as there are several matching criteria, and one match is enough to include a transaction. So if you send disjunct Bloom filters (nHashes=1, and disjunct bitsets to each peer), you'll still get doubles. It is perhaps an interesting extension to BIP37 to support such partitioning, especially for larger blocks.

However, one case that is explicitly (and intentionally) supported by BIP37 is requesting blocks (full-node wise) with the transactions already known to the peer filtered out. So contrary to what Mike says, I'm pretty sure you can have a full node that uses BIP37 to fetch blocks, and save download bandwidth using that.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

I have been looking at BIT37 and it seems that "merkleblock" is exactly what I need in order to divide a new block's download into small chunks and then distribute the block's download among different peers, using a bunch of "getdata 1" instead of one "getdata 2".

The only problem I see is that:

Quote

If no filter has been set on the connection, a request for filtered blocks is ignored

So I guess I will need to setup some dummy filter first, just to be able to receive this nice and useful messages.
But... now I wonder: if I setup such a filter, will I still be receiving invs, for freshly mined blocks...? Or what will be other side effects...

I will appreciate any advise here - I really want to implement it. Preferably today

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: 2112 on May 14, 2013, 06:36:44 PM

Quote from: Mike Hearn on May 14, 2013, 06:07:13 PM

You're just confused, sorry. You have to download all data in every block to run a full node. This is fundamental. You can't reduce bandwidth usage by downloading parts of each block from different peers. This might reduce the upload bandwidth on their side, but it doesn't reduce the download bandwidth on your side. If you're talking about hosting blocks on Cloudfare then you're talking about download bandwidth. So your proposed change wouldn't impact anything.

The above is just another example how Mike Hearn spreads misinformation about Bitcoin protocol.

For normal operation of the Bitcoin network the majority of the "block" has already been previously transferred as a separate "transactions".

The obvious optimization of the bandwidth usage is for the clients to inspect the Merkle tree and ask the peer only for the transactions that weren't previously broadcast.

This gives close to 50% bandwidth savings for free.

I'm writing this to underline the fact that whenver you read something written by Mike Hearn you have to double check and verify for yourself the 3 broad possibilities:

1) he was correct;
2) he was incorrect because he didn't understand the question or doesn't understand the underlying tehnology;
3) he was incorrect intentionally to disparage other developers and spread misinformation.

Thankfully for now he still doesn't belong to the core development team.

You forgot #4, technically correct, about something pointless and intentionally ignoring the distinction to give the illusion of superiority. Odd that you'd forget it, since your post was a perfect example.

Your memory pool won't help you during the initial download, which is the time when people care about traffic. No one gives a shit about 8 gigs over 4 years. Everyone cares about 8 gigs today. Using a piecewise downloading system will save some traffic* for people that don't give a shit about traffic, and save not even a single byte for people that do.

* "close to 50%" savings needs a lot of assumptions to be true at the same time, when some of them are never true at all. A partial list: bitcoin traffic is evenly divided between block bodies and transactions, the list of transactions in a block can be transmitted for free, every node knows about every transaction.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: jgarzik on May 14, 2013, 06:56:55 PM

heh, nothing is free. This proposal would add additional round-trips with associated latency, slowing block validation and block propagation.

As such, miners could lose their 25 BTC due to orphaning, if their block is slowed.

Actually you are wrong. The average block propagation latency would improve, both in case of honest miners and secretive miners.

Firstly, let me explain honest/secretive miner distinction. Honest miner broadcasts all the transactions that he's going to put in the block ahead of time. Then when he wins the block he sends all the transactions once again just to comply with the classic/inefficient block propagation protocol, wasting up to 50% of overall bandwidth.

On the other hand secretive miner tries to play games to disadvantage other miners and raise their orphan rate. He omits broadcasting private transactions separately and when he wins the block he sends those private transactions for the first time. It saves his bandwidth, but disadvantages the competing miners because they see the transactions late and have to spend time to verify them.

So for the network of mostly honest miners there simply isn't any additional round trips and the win is 100% clear. Sending just the block header and the Merkle tree is a no-brainer.

To understand why even the secretive miners case is a net win for the global network is somewhat more complicated. Firstly, the "round-trips" should be singular: only one additional round trip is required to ask for the private transactions of the uncooperative miner.

Secondly, you'll have to understand how the bandwidth is getting sold nowadays. It is very rare to actually linearly limit the bandwith sold to eg. 1ms per bit for 1 Megabit per second. Nearly every modern IP transport uses some sort of statistical limiter/multiplexer. When you buy e.g. 10Mbits per second your effective bandwidth may be 1Gigabit per second for the first, say 4kilobytes and the statistical multiplexer/limiter will throttle your remaining kilobytes to maintain the long-term average. When asked some ISPs will plainly state the setting for e.g. "fair-queue" command in Cisco IOS. Most ISPs will however consider those setting a trade secret; and frequently they change them according to the time-of-day, day-of-the-week or even to quell the packet storms after "network events".

The interested reader can either read up about the above: e.g. it is called PowerBoost with the DOCSIS cable modems. Anyone with two machines, two GPS receivers to NTP-sync the time on them and a copy of Wireshark can verify what I wrote above running their own experiments and as a side effect reverse-engineer the setting used by their ISP.

To quickly summarize the above two paragraphs: the latency to send e.g. first 4kB of 1MB block is much less than 4/1000 of the latency of the whole block.

I understand that implementing the changes suggested by piotr_n isn't trivial. But if implemented they will give at least two benefits to the global Bitcoin network:

1) information theoretic benefit of saving the bandwith, reducing the block propagation latency and thus reducing the orphan rate

2) motivate more miners to be honest about broadcasting transactions. This will probably have further domino-effect of improving overall game-theorethic strength of the Bitcoin network and may allow in the future to detect some attacks which we even haven't thought through.

In summary I wanted to stress that I enjoyed this brief, but honest exchange of technical arguments with you. Thanks.

jgarzik

legendary

Activity: 1596

Merit: 1100

Quote from: 2112 on May 14, 2013, 06:36:44 PM

Quote from: Mike Hearn on May 14, 2013, 06:07:13 PM

You're just confused, sorry. You have to download all data in every block to run a full node. This is fundamental. You can't reduce bandwidth usage by downloading parts of each block from different peers. This might reduce the upload bandwidth on their side, but it doesn't reduce the download bandwidth on your side. If you're talking about hosting blocks on Cloudfare then you're talking about download bandwidth. So your proposed change wouldn't impact anything.

The above is just another example how Mike Hearn spreads misinformation about Bitcoin protocol.

For normal operation of the Bitcoin network the majority of the "block" has already been previously transferred as a separate "transactions".

This is true. The current bitcoind client uses this knowledge in a signature cache, to avoid validating signatures twice (once upon TX reception, once upon block reception).

Quote

The obvious optimization of the bandwidth usage is for the clients to inspect the Merkle tree and ask the peer only for the transactions that weren't previously broadcast.

This gives close to 50% bandwidth savings for free.

heh, nothing is free. This proposal would add additional round-trips with associated latency, slowing block validation and block propagation.

As such, miners could lose their 25 BTC due to orphaning, if their block is slowed.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: Mike Hearn on May 14, 2013, 06:07:13 PM

You're just confused, sorry. You have to download all data in every block to run a full node. This is fundamental. You can't reduce bandwidth usage by downloading parts of each block from different peers. This might reduce the upload bandwidth on their side, but it doesn't reduce the download bandwidth on your side. If you're talking about hosting blocks on Cloudfare then you're talking about download bandwidth. So your proposed change wouldn't impact anything.

The above is just another example how Mike Hearn spreads misinformation about Bitcoin protocol.

For normal operation of the Bitcoin network the majority of the "block" has already been previously transferred as a separate "transactions".

The obvious optimization of the bandwidth usage is for the clients to inspect the Merkle tree and ask the peer only for the transactions that weren't previously broadcast.

This gives close to 50% bandwidth savings for free.

I'm writing this to underline the fact that whenver you read something written by Mike Hearn you have to double check and verify for yourself the 3 broad possibilities:

1) he was correct;
2) he was incorrect because he didn't understand the question or doesn't understand the underlying tehnology;
3) he was incorrect intentionally to disparage other developers and spread misinformation.

Thankfully for now he still doesn't belong to the core development team.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

You're just confused, sorry. You have to download all data in every block to run a full node. This is fundamental. You can't reduce bandwidth usage by downloading parts of each block from different peers. This might reduce the upload bandwidth on their side, but it doesn't reduce the download bandwidth on your side. If you're talking about hosting blocks on Cloudfare then you're talking about download bandwidth. So your proposed change wouldn't impact anything.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: Peter Todd on May 14, 2013, 02:37:34 PM

Quote from: piotr_n on May 14, 2013, 02:17:19 PM

And the "block propagation" is eating up a hell lot of the poor's bitcoins users bandwidth - it might not be a problem for you, but it is a problem.

If you don't have enough bandwidth to be CPU limited, stop trying to run a node. SPV clients are just fine for any users needs unless you want to run a mining pool or maybe operate a big business. If you really want, go get a VPS server; $20-$100/month should buy a fast enough one, at least for another year or two.

so your advise is: don't run a bitcoin node.
?

because, you know, I would actually like to run a bitcoin node, just to support this fine network, but the current protocol is wasting my bandwidth - and that is my issue.
though, I understand that an unnecessary bandwidth usage is not something that is easy to be noticed, so I am not even surprised that nobody gives a shit about it

Peter Todd

legendary

Activity: 1120

Merit: 1164

Quote from: piotr_n on May 14, 2013, 02:17:19 PM

And the "block propagation" is eating up a hell lot of the poor's bitcoins users bandwidth - it might not be a problem for you, but it is a problem.

If you don't have enough bandwidth to be CPU limited, stop trying to run a node. SPV clients are just fine for any users needs unless you want to run a mining pool or maybe operate a big business. If you really want, go get a VPS server; $20-$100/month should buy a fast enough one, at least for another year or two.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: Mike Hearn on May 14, 2013, 02:14:36 PM

You're trying to solve a non-existant problem: block propagation is not upload bandwidth limited today so why would anyone add such a protocol feature? That's why I'm confused. You're asking for something that just wouldn't speed anything up.

Well man, if that problem is non-existant to you, then I can only envy you the internet connection that you have at home

But even having such a great connection - if you want to create a software that would be able to import the entire block chain, from a scratch, within a few minutes - then I would rather suggest you looking into developing a hardware that supports elliptic curve math, because IMO that seems to be the weakest link in this process - not the network protocol.

And the "block propagation" is eating up a hell lot of the poor bitcoin users' bandwidth - it might not be a problem for you, but it is a problem.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

You're trying to solve a non-existant problem: block propagation is not upload bandwidth limited today so why would anyone add such a protocol feature? That's why I'm confused. You're asking for something that just wouldn't speed anything up.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: Mike Hearn on May 13, 2013, 06:08:23 PM

Quote

I still think a simple solution, like "give me this part of this block/transaction", would have a much better chance of success in a short term.

I don't understand your point - that is exactly what Bloom filtering provides. It is deployed and working for SPV clients for some months already. There have been no issues with it. You can't use it as a full node because a full node, by definition, must download full blocks as it must know about all transactions.

Well, you have just said it yourself: this Bloom filtering does not help at all, if you want to have a full node.
And I do want to have a full node - probably as well as most of you guys out there.
So how does it help us?

Quote

Incidentally, if you're going to make sarcastic comments implying Bitcoin hasn't improved, you should actually know what you're talking about. Bloom filtering launched at the start of this year, it's not something that was originally a part of the protocol - so there have been big improvements quite recently.

I think you were the one who did not know what I was talking about.

Improvements are worthless if there is no actual software, which people want to use, that gets advantage of them.

What I meant by "give me this part of this block/transaction", is literally "give me X bytes of block Y, starting at offset Z".
So, when a new block appears in the network and I need to download it, while being connected to a number of peers, I don't ask each one of them for the same megabyte of data - instead I can just split the work into, let's say 32KB chunks, and this way fetch the entire new block from my peers much quicker.

But that would only be useful before making the protocol to support fetching blocks from HTTP servers - which is the ultimate solution, which IMO should be implemented ASAP, if you guys really care about all these small bitcoin users and their internet connections. The mining pools should help here, because it is in their very best interest to propagate the blocks they have mined as quickly as possible across the network - and what could be quicker than a static file, served via http, from the pool's domain, through a clodflare-like infrastructure?

Quote

For distributing the block chain you can as well use Bittorrent or some other large file distribution mechanism rather than HTTP serving, it's already possible and there are already torrents distributing the chain in this way. They aren't designed for end users because end users should eventually all end up on SPV wallets which already only download partial blocks.

As I said before: I would prefer to focus on improving a behavior of a node that already is synchronized, rather than focusing on making it faster to setup a new one from a scratch.
Initial chain download, and the fact that it takes so long, is inconvenient, but it is not really such a big issue for the actual network.
Besides when you setup a node from scratch, and so need to re-parse the 236+k blocks, the network communication does not seem to me as much of an issue, as all the hashing and elliptic math that your PC needs to go through.

There is another thing that came to my mind, so I will just add it here, to this post.
I believe nodes should not relay transactions that use any inputs which exist only in the memory pool. It should only relay transactions that use inputs from an actually mined blocks.
This IMHO would improve the network's traffic pretty much. A regular user never needs (usually is not never able) to spend an output that has not been mined yet - while, at the other hand, relaying of such transactions takes a huge part of his network connection.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Yes, Bloom filtering is a significant improvement to the core protocol.

In addition to serving SPV clients it is used to optimize the BOP server's communication to lightweight clients connected to its message bus.

The BOP message bus also offers an API to get blocks.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote

I still think a simple solution, like "give me this part of this block/transaction", would have a much better chance of success in a short term.

I don't understand your point - that is exactly what Bloom filtering provides. It is deployed and working for SPV clients for some months already. There have been no issues with it. You can't use it as a full node because a full node, by definition, must download full blocks as it must know about all transactions.

Incidentally, if you're going to make sarcastic comments implying Bitcoin hasn't improved, you should actually know what you're talking about. Bloom filtering launched at the start of this year, it's not something that was originally a part of the protocol - so there have been big improvements quite recently.

For distributing the block chain you can as well use Bittorrent or some other large file distribution mechanism rather than HTTP serving, it's already possible and there are already torrents distributing the chain in this way. They aren't designed for end users because end users should eventually all end up on SPV wallets which already only download partial blocks.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Maybe we should not be focusing so much on the initial blockchain download, but rather on limiting the bandwidth usage of a completely synchronized node.
As for relaying transactions, I would even go crazy enough to implement a web of trust - where you don't verify every transaction, but only random ones and then build a trust to the node that sends them to you - then you check them randomly, but less frequent.

But also for transactions - the data should be kept on WWW servers. There is no economic reason to fetch them from China

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

I agree.
But in reality, the logic of what to fetch is only important during the initial chain download.
Later you just fetch whatever new there is...

So it is not really so important, is it? Wink

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

Quote from: piotr_n on May 13, 2013, 03:29:50 PM

Quote

What protocol is used to actually fetch blocks is pretty much orthogonal to the logic of deciding what to fetch, and how to validate it, IMHO.

I disagree. If you are a node behind DSL, and you have a very limited upload bandwidth, you do not want to serve blocks, unless it is really necessary.
There are servers out there, connected to the fastest networks in the world - these you should use, as much as you can. Who is going to stop you?

I agree completely. But it still has nothing to do with your logic of deciding what to fetch and how to validate it. It's just using a different protocol to do it.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: Pieter Wuille on May 13, 2013, 03:22:30 PM

Using Bloom filtering may not be entirely viable yet, I'll have to check. The big changes is first downloading and validating headers, and then downloading and validating the blocks itself. IMHO, it's the only way to have a sync mechanism that is both fast, stable and understandable (I have no doubt that there are other emchanisms that share two of those three properties...).

I still think a simple solution, like "give me this part of this block/transaction", would have a much better chance of success in a short term.
And I also think that it would be nice to have something in a short term

Quote

What protocol is used to actually fetch blocks is pretty much orthogonal to the logic of deciding what to fetch, and how to validate it, IMHO.

I disagree. If you are a node behind DSL, and you have a very limited upload bandwidth, you do not want to serve blocks (and maybe even transactions), unless it is really necessary.
There are servers out there, connected to the fastest networks in the world - these you should use, as much as you can. Who is going to stop you?

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

Quote from: piotr_n on May 13, 2013, 03:13:27 PM

Why to ask 500 blocks back?

It doesn't, as far as I know. It asks for "up 500 blocks starting at hash X", where X is the last known block.

Quote

There is one strategy however that's pretty much accepted as the way to go, but of course someone still has to implement it, test it, ... and it's a pretty large change. The basic idea is that downloading happens in stages as well, where first only headers are fetched (using getheaders) in a process similar to how getblocks is done now, only much faster of course. However, instead of immediately fetching blocks, wait until a long chain of headers is available and verified. Then you can start fetching individual blocks from individual peers, assemble them, and validate as they are connected to the chain. The advantage is that you already know which chain to fetch blocks from, and don't need to infer that from what others tell you.

I saw getheaders and I was thinking about using it.
Now I think if you really want to combine the data you got from getheaders, with the parts of blocks acquired from you peers after they have implemented BIP37 (otherwise it won't be much faster) - then good luck with that project, man! Wink

Using Bloom filtering may not be entirely viable yet, I'll have to check. The big changes is first downloading and validating headers, and then downloading and validating the blocks itself. IMHO, it's the only way to have a sync mechanism that is both fast, stable and understandable (I have no doubt that there are other emchanisms that share two of those three properties...).

Quote

I mean, I would rather prefer baby steps - even extreme, like having a central sever from which you can fetch a block, by its hash. I mean: how expensive would be that? But how much bandwidth would it save for these poor people..

What protocol is used to actually fetch blocks is pretty much orthogonal to the logic of deciding what to fetch, and how to validate it, IMHO.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: Pieter Wuille on May 13, 2013, 02:49:39 PM

Not sure what you mean by "as deep as possible". We always send getdata starting at whatever block we already know. The reason for starting from early blocks and moving forward is because validation is done is stages, and at each point as much as possible is already validated (as a means to prevent DoS attacks, mostly). As most checks can only be done when you have the entire chain of blocks from genesis to the one being verified, you need them more or less in order.

What I mean is when the client has already downloaded the full chain - and is basically waiting for a new block.
Why to ask 500 blocks back?

Quote

That's not true, we only ask for each block once (and retry after a timeout), but it is done to a single peer (not to all, and not balanced across nodes). That's a known badness, but changing isn't trivial, because of how validation is done.

OK - than I'm sorry.
It only proves how little I know about the bitcoin client, so I should not be changing it

Quote

There is one strategy however that's pretty much accepted as the way to go, but of course someone still has to implement it, test it, ... and it's a pretty large change. The basic idea is that downloading happens in stages as well, where first only headers are fetched (using getheaders) in a process similar to how getblocks is done now, only much faster of course. However, instead of immediately fetching blocks, wait until a long chain of headers is available and verified. Then you can start fetching individual blocks from individual peers, assemble them, and validate as they are connected to the chain. The advantage is that you already know which chain to fetch blocks from, and don't need to infer that from what others tell you.

I saw getheaders and I was thinking about using it. But it would basically only help for the initial chain download.
Now I think if you really want to combine the data you got from getheaders, with the parts of blocks acquired from you peers after they have implemented BIP37 (otherwise it won't be much faster) - then good luck with that project, man! Wink

I mean, I would rather prefer baby steps - even extreme, like having a central sever from which you can fetch a block, by its hash. I mean: how expensive would be that? But how much bandwidth would it save for these poor people..

Quote

BIP37 actually introduced a way to fetch parts of blocks, and it can be used to fetch a block with just the transactions you haven't heard about, so it avoids resending those that have already been transferred as separate transactions (though I don't know of any software that uses this mechanism of block fetching yet; once BIP37 is available on more nodes, I expect it will be).

Interesting.. thank you, then maybe that should be the next thing that I will add to my client.

Quote

There are many ideas about how to improve historic block download. I've been arguing for a separation between archive storage and fresh block relaying, so nodes could be fully verifying active nodes on the network without being required to provide any ancient block to anyone who asks. Regarding moving to other protocols, there is the bootstrap.dat torrent, and there's recently been talk about other mechanism on the bitcoin-development mailinglist.

I was talking more about single blocks being available via HTTP - at the very moment when they have been mined.
I think all you need is an URL - so it should be up to a peer to choose which URL to give you. As long as the hash of the data you download from there matches what you needed, you have no reason to question his method. Otherwise, just ban the bastard Wink

Topic: A bit of criticism on how the bitcoin client does it - page 2. (Read 2815 times)