
Topic: A bit of criticism on how the bitcoin client does it

legendary
Activity: 2053
Merit: 1356
aka tonikt
Oh, I just figured out what I was doing wrong handling these "getblocks" requests with 29 different locator hashes up to the genesis block.

Originally I had thought that the only limits, when building the "inv" reply, were reaching the hash_stop or the maximum of 500 records.
But now I've figured out that I should stop at the first locator from the list that matches a block in my chain, ignoring the remaining ones - and that first one almost always points to the tip of my tree, so in most cases the reply should just be empty, not 500 hashes long.
At least that's how the Satoshi client does it...
So after fixing this in my code, the upload bandwidth usage indeed looks much better - and the whole idea of 29 locator hashes makes much more sense as well. :)
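For the record, the fixed handler boils down to something like this - a rough Go sketch, where have() and next() are placeholders for my node's own chain lookups, not any real bitcoind API:
Code:
// Hash is a 32-byte block hash.
type Hash [32]byte

// handleGetblocks builds the "inv" reply. have() reports whether we know
// a block; next() returns the main-chain successor of a hash (next of the
// zero hash is genesis; next of the tip is the zero hash).
func handleGetblocks(locators []Hash, hashStop Hash,
	have func(Hash) bool, next func(Hash) Hash) (inv []Hash) {

	var start Hash // zero value: no match => walk from genesis
	for _, h := range locators {
		if have(h) {
			start = h // first match wins - ignore the remaining locators
			break
		}
	}

	for h := next(start); h != (Hash{}); h = next(h) {
		inv = append(inv, h)
		if h == hashStop || len(inv) == 500 {
			break
		}
	}
	return inv
}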

Anyway, sorry - my bad.
Mike's node might indeed be in a data center ;)
legendary
Activity: 1526
Merit: 1134
What I meant is: if you believe it's a bug and not intended behaviour, then feel free to investigate and figure out the cause. It's expected that remote nodes will sometimes send you block locators when new blocks are mined (if the block is an orphan to them). I haven't seen any evidence that this isn't the cause here.

As we've already pointed out, it doesn't seem like many people are complaining about bandwidth usage at the moment - that's why it hasn't been worked on. If they were, the best improvement available right now would be full-match Bloom filters. Perhaps optimising addr/inv broadcasts would help too; it's hard to know exactly where all the bandwidth goes right now, as the software lacks instrumentation.
legendary
Activity: 2053
Merit: 1356
aka tonikt
They were probably on a fork or had missed a block. I don't know why else it would happen - there might be a bug; if you spot such a thing, let us know.
You mean: let you know, even more than I have been trying to let you know so far?
Sorry, no - I think pushing it any further would just be rude and completely unproductive.
legendary
Activity: 1526
Merit: 1134
They were probably on a fork or had missed a block. I don't know why else it would happen - there might be a bug; if you spot such a thing, let us know.
legendary
Activity: 2053
Merit: 1356
aka tonikt
You don't send a full locator every time a new block is mined. Nodes are supposed to send a getdata and only do a getblocks if what they get back is an orphan.
Of course I don't.
But I am saying that this is what I get from a bunch of nodes running the official client, often after a new block has been mined.
I have no idea why they do it, but feel free to explain it to me..


Again: this wastes a hell of a lot of bandwidth, generating lots of traffic in the network each time a new block is mined.
No. That's not how it works.
Are you saying that I am sending all these screwed up "getblocks" to myself, from all the different IPs - or are you saying that I am lying about what I see?
Or maybe I'm just crazy and what I see is not real... :)

Quote
If you do this after the last checkpoint, you're in for a surprise. Good luck.
Thanks, but I don't use any checkpoints, so I don't think I'm going to need any luck here.

It's very simple (see the sketch below):
1) measure the time difference between now and when you received your last block, and divide it by whatever period you like to get a number (I use minutes)
2) then walk back along the chain, starting from the head, by as many blocks as the number from point 1
3) pass the hash of the block you reached in point 2 to "getblocks" - and voila.
This simple method lets you recover from basically any fork, as long as it isn't longer than a couple of hundred blocks. And if a fork ever happens to be longer, I will surely have enough time to develop an improvement before it's actually needed :)
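In Go it comes down to something like this (a sketch; the Block struct is just a minimal stand-in for whatever chain structure your node keeps):
Code:
import "time"

// Block is a minimal view of the chain; Parent is nil at the genesis block.
type Block struct {
	Hash     [32]byte
	Parent   *Block
	Received time.Time // when this node received the block
}

// pickLocator walks back from the head by as many blocks as minutes have
// passed since the last block arrived; that single hash becomes the only
// locator sent in "getblocks".
func pickLocator(head *Block) [32]byte {
	n := int(time.Since(head.Received).Minutes()) // step 1
	b := head
	for i := 0; i < n && b.Parent != nil; i++ { // step 2
		b = b.Parent
	}
	return b.Hash // step 3
}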
full member
Activity: 154
Merit: 100
Again: this wastes a hell of a lot of bandwidth, generating lots of traffic in the network each time a new block is mined.
No. That's not how it works.

I believe that I understand the purpose of the locators very well - and that is why my implementation always sends only one locator
If you do this after the last checkpoint, you're in for a surprise. Good luck.
legendary
Activity: 1526
Merit: 1134
You don't send a full locator every time a new block is mined. Nodes are supposed to send a getdata and only do a getblocks if what they get back is an orphan.

No, the increase in bandwidth wouldn't faze me. 10GB per day would be 300GB per month and I get 500 on the cheapest plan available via BitVPS, which costs hardly anything. But regardless, in future I anticipate lower bandwidth usage because nodes will become more stable. Right now a lot of regular users end up installing Bitcoin-Qt and sucking down the chain without really intending to run a full node, because they don't know about SPV clients. After we get a new MultiBit release out, I'm hoping we can point people towards that instead and the amount of chain that is served to new users will go down.
legendary
Activity: 2053
Merit: 1356
aka tonikt
Actually, if you look at this article it even clearly advises:
Quote
To create the block locator hashes, keep pushing hashes until you go back to the genesis block. After pushing 10 hashes back, the step backwards doubles every loop
Yeah, you are only at block 236k+, so just keep pushing all the hashes, starting from the top, until you reach the genesis block - a brilliant idea ;)

What's wrong with this? It sounds to me like you don't understand the purpose of the locators.
Again: this wastes a hell of a lot of bandwidth, generating lots of traffic in the network each time a new block is mined.

I believe that I understand the purpose of the locators very well - and that is why my implementation always sends only one locator... which certainly does not point to the genesis block while I'm already at #236475. The only reason to send a locator pointing to the genesis block, in such a case, would be to "recover" from a fork lasting 4+ years... so whoever does that, I don't think they understand the purpose of the locators. :)
full member
Activity: 154
Merit: 100
Actually, if you look at this article it even clearly advises:
Quote
To create the block locator hashes, keep pushing hashes until you go back to the genesis block. After pushing 10 hashes back, the step backwards doubles every loop
Yeah, you are only at block 236k+, so just keep pushing all the hashes, starting from the top, until you reach the genesis block - a brilliant idea ;)

What's wrong with this? It sounds to me like you don't understand the purpose of the locators.
legendary
Activity: 2053
Merit: 1356
aka tonikt
Why ask 500 blocks back?

It doesn't, as far as I know. It asks for "up to 500 blocks starting at hash X", where X is the last known block.

I have just checked it again, putting a debug line into my getblocks handler.
Each time my node receives "getblocks", there are tens of different locator hashes in it, followed by a zero-filled hash_stop.
From what I see, this always forces my node to return the maximum of 500 blocks.
For instance:
Code:
getblocks  hashes_inside=29  => returned 500 invs / 18003 bytes
and these are the locator hashes that were inside (note the genesis hash at the end):
Code:
00000000000000b1ffa04c08a1372309ee0f38d99a6ea7ac3d98c6bc76747b4d
00000000000000ebddd1d499c21ffe41a99c2fbefe809aecc1e31b7f5eb2a39d
000000000000003372bd8cb3d9cf78b86a0f64b2c30f8c0ce1f4e46f983d091c
000000000000017c700b2179c55343830a5ff07041d2155654c8ba77ef00677f
00000000000001365066267c0084347b180df7070ed112d2cac316c6565ccd2b
0000000000000013b9efa409b459baf11b4ccb43a60f853bdd8d25fb9b867ec3
00000000000000b8e0d6f2f5437cb2e03e973eac6653b1e2ee7c4bfd822ae1f5
000000000000002031feb915bfb85bfd138a0c9abaa53d3d577e022ca01ddacf
00000000000000a15f169467fbbab6bcf9ce7359fce414215e353b31aa7f17bc
00000000000000b705393b66eb3adf92e69cbbb04753357955b5b8dbaf9d273e
0000000000000127933cc0ad1e6a23a852bd391daa843e457ab40d5c226fed38
000000000000009d39a568e0bf5984fe5f3e9882813b686651b691bdfa07ea39
0000000000000142ad32a203b1627bee8126fa4bcd940b0da3f32bf1b5b07a24
000000000000001aef6e9c1ff8e5833c5f38bb1b660d3a3060169f10ba1a293f
00000000000000f5710fe34e22a0791afe7a817044f9c7da7a139295da995b06
0000000000000168cb9711469ccfa6704398c82fca5dd42bb700f11970946be1
00000000000000d9955f9fe2161524121f0e21f3e2b1a57f97578c98dc31d636
000000000000011f76c57b0f941acf03da10cce12cb989ed5c4b7b6722b89cc4
00000000000000d8c733f7bc4295a8af6e0b95f2bafc901076444505572353c1
000000000000009be2ca7e5f878c2235ac7b2ebf653d77c9d4d57b359e071ca2
000000000000014158e599c40a2e3fef433cf70cc1caedd9099c21028e4d97f7
00000000000000f8cc33dbc8c3c37f88598f33373fbb03b96a1eaf6426da7898
0000000000000008c3e7b684e2aa56bebba3beb267a8d932fb84217becc0406b
0000000000000057c46a1306febe518cdb6c1fc882d856bea07e5927a5f15dc3
0000000000000381683c0b95b17233f8683fb89060d6e4eb0d70f96bd51a539f
0000000000000201892dd6ed76236eb8f42e93bf0dad2606cc5aaa246f253bf1
00000000000003d8e22ddc9a6b3b2226dfe57830d793be012720e74db28bc6be
000000000001dbc957fbe83e9d38aed40c9e083b830c36890d538726b24ae1d0
000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
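The byte count adds up, by the way: an "inv" payload for 500 entries is a 3-byte var_int count plus 500 inventory vectors of 36 bytes each (4 bytes of type + 32 bytes of hash), which gives exactly 18,003 bytes.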


Actually, if you look at this article it even clearly advises:
Quote
To create the block locator hashes, keep pushing hashes until you go back to the genesis block. After pushing 10 hashes back, the step backwards doubles every loop
Yeah, you are only at block 236k+, so just keep pushing all the hashes, starting from the top, until you reach the genesis block - a brilliant idea ;)
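For reference, the scheme the wiki describes comes down to something like this (a Go sketch, assuming chain holds the main-chain hashes ordered from genesis to tip) - at height ~236k it yields roughly 29 hashes, matching the dump above:
Code:
// buildLocator follows the wiki's scheme: hashes of the ~10 most recent
// blocks, then a backwards step that doubles every iteration, always
// finishing with the genesis block hash.
func buildLocator(chain [][32]byte) (locator [][32]byte) {
	step := 1
	for i := len(chain) - 1; i > 0; i -= step {
		locator = append(locator, chain[i])
		if len(locator) >= 10 {
			step *= 2 // sparse part: double the step each time
		}
	}
	return append(locator, chain[0]) // the genesis hash
}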
legendary
Activity: 2053
Merit: 1356
aka tonikt
May I ask if this is a real problem for you today or just theoretical? The node I run uploaded around 2.7 gigabytes yesterday, spread out over the full 24 hours. I certainly wouldn't want to run this off my mobile connection but it's not a bank breaker for a node run in a datacenter.
The problem is real - trust me. Not everyone lives in a datacenter; some people just have homes, you know. :)

As for my measurements, 2.7 GB per day seems fairly low (that's about 32 KB/s on average: 2.7 GB divided by 86,400 seconds), so I tend to disbelieve that you actually run your node from a datacenter.
Unless you only have 8 outgoing connections, in which case that number would make more sense to me.

But anyway: let's say that it is 2.7 GB per day now - while blocks are still far below the 1 MB limit. But they will obviously keep growing, like they have in the past. So expect 10 GB per day pretty soon. Still makes no impression? :)
legendary
Activity: 1526
Merit: 1134
What I wrote was correct - you have to download all data in every block. Yes, you don't have to download it twice in the steady state if you set a full-match Bloom filter (which Matt short-circuited already), but as kjj notes, that isn't solving a problem that's important today, so it was never finished. It might be useful to optimise block propagation amongst miners, but if you aren't mining it wouldn't have much impact.

I still don't understand what piotr_n is trying to do. Downloading block contents in parallel only helps if remote peers are upload constrained and you are network constrained. That isn't the case for any real nodes today, so why are you trying to optimise it? It won't reduce bandwidth usage one bit.

Edit: OK, I reread the thread and now I see what Piotr is getting at. You want to minimize upload bandwidth for yourself running a full node, not minimize download bandwidth. The references to requesting blocks from different peers made me think you wanted to optimise download. May I ask if this is a real problem for you today or just theoretical? The node I run uploaded around 2.7 gigabytes yesterday, spread out over the full 24 hours. I certainly wouldn't want to run this off my mobile connection but it's not a bank breaker for a node run in a datacenter. Given that there are 144 blocks per day and each one yesterday was less than half a megabyte, even downloaded twice for each node that's downloading from me that's only 144 megabytes of transaction data. If we suppose most bandwidth usage is tx data then perhaps I distributed data to 19 nodes for the day - not bad, I could have supported much more even on a budget VPS.

The problem is that just sharding block download doesn't change the aggregate steady-state bandwidth usage for the network. Finishing the full-match Bloom filter work would, but Piotr already said he isn't going to do anything with bitcoind, just his own Go node. Well, as most nodes today are Satoshi nodes and I doubt that will change, implementing full-match filters only in the Go node won't change upload bandwidth, because remote peers will still request full blocks with redundant tx data in them. To reduce upload bandwidth you have to optimise the other end.
legendary
Activity: 2053
Merit: 1356
aka tonikt
you can't just fetch individual transactions from it, as that would require the peer to have full transaction index.
Exactly - and it's not that I didn't know that... it just somehow slipped my mind :)
legendary
Activity: 2053
Merit: 1356
aka tonikt
But what I wanted to achieve was downloading a block's payload in fragments (from different peers in parallel), using the current protocol: "merkleblock" followed by a bunch of "getdata 1".
And for a moment (or rather for the entire morning) I thought that it would be possible... :)

I explained in #28 why that is not possible: even with disjoint filters, you'll get transactions matched by both.
I was assuming that all I needed was the list of hashes returned by "merkleblock" - because this command seems to return all the transaction hashes for a requested block, without any filtering.

BTW, the spec on the wiki differs from the actual format of this message.
There is an extra var_int length field between "total_transactions" and "hashes", carrying the same value as "total_transactions".
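For reference, this is the layout as I actually see it on the wire (field names loosely after the wiki; the HashCount line is the one the wiki omits, and the var_int fields are shown as uint64 only for readability):
Code:
// merkleblock message payload, as observed on the wire:
type MerkleBlock struct {
	Header            [80]byte   // block header: version through nonce
	TotalTransactions uint32     // number of txs in the full block
	HashCount         uint64     // var_int - the field missing from the wiki
	Hashes            [][32]byte // partial merkle tree hashes, 32 bytes each
	FlagCount         uint64     // var_int: number of flag bytes
	Flags             []byte     // depth-first traversal bits
}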
legendary
Activity: 1072
Merit: 1189
Oh, I think I see your confusion.

The only way to request blocks is using getdata with type block or merkleblock; you can't just fetch individual transactions from one, as that would require the peer to have a full transaction index. So what you hoped to do was send a merkleblock request to one peer, but without transactions, and then fetch the transactions themselves from separate peers. That won't work.

Anyway, as said, in the future this may become a useful idea for an extension to the filtering protocol: adding an "only match the 3rd 1/5 of all transactions" rule to the filter specification.
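Something along these lines, say - purely hypothetical, as no such rule exists in the filtering protocol today:
Code:
// matchesSlice sketches the hypothetical partition rule: a peer asking
// for the k-th 1/n of a block's transactions (by position) would match
// only indices in [totalTx*k/n, totalTx*(k+1)/n).
func matchesSlice(txIndex, totalTx, k, n int) bool {
	lo := totalTx * k / n
	hi := totalTx * (k + 1) / n
	return txIndex >= lo && txIndex < hi
}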
legendary
Activity: 1072
Merit: 1189
But what I wanted to achieve was downloading a block's payload in fragments (from different peers in parallel), using the current protocol: "merkleblock" followed by a bunch of "getdata 1".
And for a moment (or rather for the entire morning) I thought that it would be possible... :)

I explained in #28 why that is not possible: even with disjoint filters, you'll get transactions matched by both.
legendary
Activity: 2053
Merit: 1356
aka tonikt
But if I make a long bloom filter, then it should never match, and so I should only be getting "merkleblock" not followed by any "tx"...
Haven't given up yet :)

If you don't want any transactions at all, just use getheaders.
Yes, this I know, thanks.
But what I wanted to achieve was downloading a block's payload in fragments (from different peers in parallel), using the current protocol: "merkleblock" followed by a bunch of "getdata 1".
And for a moment (or rather for the entire morning) I thought that it would be possible... :)
legendary
Activity: 1072
Merit: 1189
But if I make a long bloom filter, then it should never match, and so I should only be getting "merkleblock" not followed by any "tx"...
Haven't given up yet :)

If you don't want any transactions at all, just use getheaders.
legendary
Activity: 2053
Merit: 1356
aka tonikt
Oh, I'm stupid. I forgot that "tx" only works for not-yet-mined transactions, so how did I expect to acquire a block's payload with them? :)

Now I have given up :)
legendary
Activity: 2053
Merit: 1356
aka tonikt
I see what you mean now.
I can trick the node into sending me "merkleblock" messages, but they are always followed by a bunch of "tx" messages, which completely ruins my concept of distributing requests for these "tx" among all the connected peers.

But if I make a long bloom filter, then it should never match, and so I should only be getting "merkleblock" not followed by any "tx"...
Haven't given up yet :)
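The trick, if it works, would be a "filterload" whose bit field is all zeros - such a filter can never match anything, so a filtered-block "getdata" should come back as a bare "merkleblock" with no "tx" messages after it. Roughly (fields after BIP 37; message framing omitted):
Code:
// filterload payload with a zeroed bit field: no element ever matches.
type FilterLoad struct {
	Filter    []byte // the bloom filter bit field
	HashFuncs uint32 // number of hash functions to use
	Tweak     uint32 // random value added to the hash seed
	Flags     uint8  // 0 = BLOOM_UPDATE_NONE
}

func neverMatchingFilter() FilterLoad {
	return FilterLoad{
		Filter:    make([]byte, 8), // all-zero bits
		HashFuncs: 1,
		Tweak:     0,
		Flags:     0,
	}
}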