
Topic: Bitcoin Binary Data Protocol, for mining, monitorblocks, etc. - page 2. (Read 26065 times)

legendary
Activity: 1596
Merit: 1100
Protocol buffers for the P2P networking protocol have already been discussed.  Not really realistic now, but "it would have been nice."

For the purposes of this thread, protocol buffers have three disadvantages:
  • Duplicates data structure layout of a C/C++ data structure.  You wind up encoding C/C++ data structure -> another C/C++ data structure.  Python hides this from the programmer, but it is the same net result.
  • Less flexible than JSON in describing dynamic, multi-dimensional data structures, because the structure of every data field is compiled into the application.  Simple JSON is much more flexible, and data structures may be changed at will.
  • Like current JSON, requires a native -> encoded -> native encoding for each miner client

Basically, any packetized protocol -- let us say JSON-over-TCP (rather than JSON-over-HTTP) -- requires some amount of low-level raw binary parsing to locate message boundaries.  Simply passing data through unchanged, rather than decoding compressed JSON, comes for free in such protocols.

Therefore, logically, any JSON-over-TCP protocol already has the ability to skip the pointless native->encoded->native step that each miner client performs, by using this pass-through method.  Passing raw binary work directly from bitcoind to the client eliminates the possibility of bugs and problems related to the unnecessary encoding+decoding an additional JSON step would require.
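The framing-plus-pass-through argument above can be sketched in a few lines of Python. The 4-byte big-endian length prefix is an illustrative choice, not the thread's actual wire format:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Prefix each message with a 4-byte big-endian length so the receiver
    # can locate message boundaries on the TCP stream.
    return struct.pack(">I", len(payload)) + payload

def messages(buf: bytes):
    # Walk the stream, yielding each payload untouched -- the pass-through
    # case: the body is never decoded and re-encoded.
    off = 0
    while off + 4 <= len(buf):
        (length,) = struct.unpack_from(">I", buf, off)
        off += 4
        yield buf[off:off + length]
        off += length

stream = frame(b'{"cmd":"config"}') + frame(b"\x00\x01raw-binary-work")
print(list(messages(stream)))
```

Note that the receiver already had to unpack four binary bytes just to find the boundary, so handing the raw payload straight to the miner costs nothing extra.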
adv
full member
Activity: 168
Merit: 100
From this summary, I prefer solutions in this order: 1. Protocol buffers 2. JSON 3. Binary protocol

We are talking only about mining. The only thing that matters there is speed. There is no need for readability or understandability. You can use heavy program optimization, goto statements, or even assembler. : ^)
So please forget about JSON-RPC. If someone really needs it, it is always possible to use the standard getwork interface over HTTP. And creating a push protocol over JSON is a completely separate problem, with its own advantages and disadvantages. I think the choice is only between protocol buffers and a pure binary protocol.
legendary
Activity: 1386
Merit: 1097
Or perhaps we should use something like protocol buffers, even for core bitcoin messages?

+1 for use of standardized binary protocol everywhere.

Short summary:

JSON-RPC (over TCP):
+ Already used in bitcoin
+ Standard protocol
+ Extremely easy to use
- Missing support for binary data
- General overhead (field names repeated in every message)

Protocol buffers:
+ Easy to use
+ Standard protocol
+ Support for binary data
+ Bandwidth efficient
+ May replace current binary protocol on P2P side
+ No external dependency needed (protocol compiler produces plain C++ code)

Proprietary binary protocol:
+ Extremely bandwidth efficient
+ No external dependency needed
+ Support for binary data (noted for completeness of this summary)
- Hard to learn & use (everybody has to implement support in their preferred language)
- Not reusable (the P2P and push interfaces need two different protocols)

From this summary, I prefer solutions in this order: 1. Protocol buffers 2. JSON 3. Binary protocol
hero member
Activity: 489
Merit: 505
Or perhaps we should use something like protocol buffers, even for core bitcoin messages?
You mean the P2P protocol? I'd already be happy using JSON, but it won't work because block hashing and transaction hashing rely on the binary representation of the message itself...
full member
Activity: 171
Merit: 127
Or perhaps we should use something like protocol buffers, even for core bitcoin messages?
hero member
Activity: 489
Merit: 505
I have to say that I agree with slush, after implementing the Bitcoin protocol in Java I hate binary protocols.

Encoding it in JSON would allow easier implementations, since most languages have some way of understanding JSON. Compression is not needed, but we might simply add another port that compresses everything that goes through it (think OutputStream chaining in Java). The size of the messages will double, yes, but as far as I'm concerned that's the only downside to using JSON.
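The "compress the whole port" idea can be sketched in Python, with zlib's streaming objects standing in for Java's OutputStream chaining; the application writes plain JSON and the wrapper handles compression transparently:

```python
import zlib

# Sender side: the application writes plain JSON; the compressor object
# wraps the whole stream, like a chained OutputStream in Java.
compressor = zlib.compressobj()
wire = compressor.compress(b'{"method":"getwork"}\n') + compressor.flush()

# Receiver side: the matching decompressor recovers the original bytes.
decompressor = zlib.decompressobj()
plain = decompressor.decompress(wire)
print(plain)
```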
legendary
Activity: 1596
Merit: 1100
You are being too literal.  Even python must do this step:  work = json_result['data']

OK, I didn't say that the pointer lookup is not needed -- it is just performed internally and automatically by the standard libraries present in every language, so there is no need to implement binary stuff again.

There is no difference between this and struct.unpack() in python, except that the JSON decode is more work, more data, and less efficient.

Quote
Quote
Doing all the extra, pointless work of binary->text->compression->text->binary also increases the chances for programmer error.

Well, we are talking about personal opinions. My opinion is that high-level programming is much easier and less error-prone than low-level implementation. Thanks to this attitude, we're programming in high-level languages and not in assembler.

And you still haven't given a calculation of the bandwidth savings versus JSON-RPC over TCP.

The precise packet sizes were given in the first post.  The WORK message can easily be made smaller, too.

Quote
Quote
Once you have a binary, packetized protocol, the easiest

Correct. But I hope you don't want to say that

import json
work = json.loads(sock.recv(4096))  # json.loads(), not json.decode()

is harder to do than creating your own parsing library for every language (C, Python, Java, PHP?) and unpacking binary data, right?

I don't want to be personal in any way; I think it's great that you opened this topic. I'm just looking for some equilibrium between hardcore low-level stuff and an almost-standard protocol implemented everywhere. I'm simply not convinced that this protocol needs such heavy over-optimization.

If you are packetizing data for JSON-over-TCP, you are unpacking binary data to obtain the size of each message sent via TCP.  Furthermore, inside each work unit, existing miner clients unpack binary data in order to perform hashing (miners hash binary data!).  The high-level Python language happily unpacks binary data -- we have miners written in Python and Java today.   Or see ArtForz's bitcoin client:  http://pastebin.com/ZSM7iHZw

"don't do additional, redundant, unneeded work" is not heavy optimization.  It is reducing complexity in the miner client.
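The "miners hash binary data" point can be made concrete: both transport paths end at the same bytes, the JSON/hex path just takes a detour. A minimal sketch, using an all-zero 80-byte header as a stand-in for real work:

```python
import hashlib

# A raw 80-byte block header; all zeros here as a stand-in for real work.
header = bytes(80)

# Binary path: hash the bytes exactly as received -- no transcoding.
h_binary = hashlib.sha256(hashlib.sha256(header).digest()).digest()

# JSON/hex path: the same bytes detour through a hex string (as in
# getwork's "data" field) before being turned back into binary.
hex_work = header.hex()              # 80 bytes -> 160 hex characters
recovered = bytes.fromhex(hex_work)
h_json = hashlib.sha256(hashlib.sha256(recovered).digest()).digest()

print(h_binary == h_json)
```

The hex round trip doubles the payload and adds two conversions, yet produces an identical hash input.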
legendary
Activity: 1386
Merit: 1097
The protocol supports multiple use cases:
  • getwork polling (i.e. how every single miner is written today).  C:GETWORK  S:WORK  C:GETWORK  S:WORK ...
  • push mining   C:CONFIG(push mining)  S:WORK  S:WORK  S:WORK  S:WORK  ...
  • monitorblocks   C:CONFIG(monitor blocks)   S:BLOCK   S:BLOCK   S:BLOCK   ...

The protocol supports LAN or WAN, bitcoind or pool server.

If the miner client prefers polling over push mining, they may choose to do so.

None of which argues against JSON over TCP; you can do all of this with my proposal too. I just mentioned that the biggest saving (for mining) will come from the step from the current getwork() over HTTP to TCP communication plus push mining together, not from saving single bytes with a binary protocol.
adv
full member
Activity: 168
Merit: 100
What about increasing the time interval between requests? And banning whoever makes requests too frequently.
What about many NATed users?
I think you'll say "they must use different usernames/passwords". But they don't HAVE to do this, and could write a miner that uses different accounts to make frequent requests...
legendary
Activity: 1386
Merit: 1097
You are being too literal.  Even python must do this step:  work = json_result['data']

OK, I didn't say that the pointer lookup is not needed -- it is just performed internally and automatically by the standard libraries present in every language, so there is no need to implement binary stuff again.

Quote
Doing all the extra, pointless work of binary->text->compression->text->binary also increases the chances for programmer error.

Well, we are talking about personal opinions. My opinion is that high-level programming is much easier and less error-prone than low-level implementation. Thanks to this attitude, we're programming in high-level languages and not in assembler.

And you still haven't given a calculation of the bandwidth savings versus JSON-RPC over TCP.

Quote
Once you have a binary, packetized protocol, the easiest

Correct. But I hope you don't want to say that

import json
work = json.loads(sock.recv(4096))  # json.loads(), not json.decode()

is harder to do than creating your own parsing library for every language (C, Python, Java, PHP?) and unpacking binary data, right?

I don't want to be personal in any way; I think it's great that you opened this topic. I'm just looking for some equilibrium between hardcore low-level stuff and an almost-standard protocol implemented everywhere. I'm simply not convinced that this protocol needs such heavy over-optimization.
legendary
Activity: 1596
Merit: 1100
One request: (approx) 20 bytes of request, 300? bytes of response EVERY MINUTE ==> 320 bytes per minute per worker.

Well, I know that I'm again mixing the protocol and the getwork implementation. But there is no big point in supporting getwork over TCP while still sending a job every 5 seconds. So I'm talking about the real situation: using the TCP protocol and a real pushwork implementation together.

The protocol supports multiple use cases:
  • getwork polling (i.e. how every single miner is written today).  C:GETWORK  S:WORK  C:GETWORK  S:WORK ...
  • push mining   C:CONFIG(push mining)  S:WORK  S:WORK  S:WORK  S:WORK  ...
  • monitorblocks   C:CONFIG(monitor blocks)   S:BLOCK   S:BLOCK   S:BLOCK   ...

The protocol supports LAN or WAN, bitcoind or pool server.

If the miner client prefers polling over push mining, they may choose to do so.



legendary
Activity: 1596
Merit: 1100
Because sending work as compressed JSON involves
  • encoding binary data to hexadecimal

Absolutely marginal overhead. An average CPU core can encode megabytes of data to hexadecimal per second.

Quote
  • storing that hexadecimal string in JSON structure

So 2x more data (two bytes for one raw byte) for the payload itself. For one mining job, only a few bytes are really required; most of the data currently sent to the client is not used (source: m0mchil). A much more effective way is to change the job payload itself.

Quote
  • compressing JSON

Which is also in your proposal, for storing the message payload. Again, I don't see a real problem here.

Quote
  • receiving pointer to hex string

That's why I said I'm probably too high-level. I really don't care about finding a pointer in a hex string. It is much more cost-effective to leave this job to computers and high-level libraries than to fiddle with bits in a low-level protocol. Don't forget that this protocol has to be reimplemented in many languages, so by using a standard protocol you save tens of hours of labour on programming and bug fixing.

You are being too literal.  Even python must do this step:  work = json_result['data']


Quote
Quote
It is obviously more simple -- less CPU usage and less bandwidth usage -- to send binary work data directly.  Remember, binary data is the common case for mining.

You are right that a raw binary protocol is really the most efficient. But let's find some reasonable level of optimization. It does not need to be _perfect_. It needs to be efficient AND easy to handle/debug. Don't forget that you are optimizing nanoseconds of CPU work and then performing one SQL request, which is 100x slower than any protocol parsing.

Doing all the extra, pointless work of binary->text->compression->text->binary also increases the chances for programmer error.

Once you have a binary, packetized protocol, the easiest, least error-prone thing to do is receive (or create, in bitcoind's case) a raw binary packet, and pass that directly to a connected miner.

legendary
Activity: 1386
Merit: 1097
One request: (approx) 20 bytes of request, 300? bytes of response EVERY MINUTE ==> 320 bytes per minute per worker.

Well, I know that I'm again mixing the protocol and the getwork implementation. But there is no big point in supporting getwork over TCP while still sending a job every 5 seconds. So I'm talking about the real situation: using the TCP protocol and a real pushwork implementation together.
legendary
Activity: 1386
Merit: 1097
    Because sending work as compressed JSON involves
    • encoding binary data to hexadecimal

    Absolutely marginal overhead. An average CPU core can encode megabytes of data to hexadecimal per second.

    Quote
    • storing that hexadecimal string in JSON structure

    So 2x more data (two bytes for one raw byte) for the payload itself. For one mining job, only a few bytes are really required; most of the data currently sent to the client is not used (source: m0mchil). A much more effective way is to change the job payload itself.

    Quote
    • compressing JSON

    Which is also in your proposal, for storing the message payload. Again, I don't see a real problem here.

    Quote
    • receiving pointer to hex string

    That's why I said I'm probably too high-level. I really don't care about finding a pointer in a hex string. It is much more cost-effective to leave this job to computers and high-level libraries than to fiddle with bits in a low-level protocol. Don't forget that this protocol has to be reimplemented in many languages, so by using a standard protocol you save tens of hours of labour on programming and bug fixing.

    Quote
    It is obviously more simple -- less CPU usage and less bandwidth usage -- to send binary work data directly.  Remember, binary data is the common case for mining.

    You are right that a raw binary protocol is really the most efficient. But let's find some reasonable level of optimization. It does not need to be _perfect_. It needs to be efficient AND easy to handle/debug. Don't forget that you are optimizing nanoseconds of CPU work and then performing one SQL request, which is 100x slower than any protocol parsing.

    Rough calculation:

    Now:
    --------
    One request: 300 bytes of HTTP request, 700 bytes of data ==> ~1 kB of data every 5 seconds for each worker. It is 12kB per minute per worker.

    Json over TCP:
    --------
    One request: (approx) 20 bytes of request, 300? bytes of response EVERY MINUTE ==> 320 bytes per minute per worker.

    With a very simple optimization, you cut bandwidth to ~2.5% of the original size, without any binary fiddling or proprietary stuff. What percentage will the savings be between JSON over TCP and binary over TCP?
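The rough calculation above can be checked in a few lines, using only the byte counts quoted in the post; the result lands close to the ~2.5% figure:

```python
# Reproduce the rough bandwidth numbers from the post.
http_per_request = 300 + 700                    # HTTP request + response, bytes
http_per_minute = http_per_request * (60 // 5)  # one getwork poll every 5 s
tcp_per_minute = 20 + 300                       # one request + response per minute

print(http_per_minute, tcp_per_minute,
      round(100 * tcp_per_minute / http_per_minute, 1))
```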
    legendary
    Activity: 1596
    Merit: 1100
    I just read the sources, and mixing a binary protocol with JSON-compressed data looks weird to me. Why not simply use (compressed) JSON-RPC over TCP and define only RPC methods?

    Because sending work as compressed JSON involves
    • encoding binary data to hexadecimal
    • storing that hexadecimal string in JSON structure
    • compressing JSON
    • (sent to client)
    • decompressing JSON
    • receiving pointer to hex string
    • decoding hex string to binary data

    It is obviously more simple -- less CPU usage and less bandwidth usage -- to send binary work data directly.  Remember, binary data is the common case for mining.

    JSON is in the protocol for flexible feature negotiation and configuration.  But we must avoid today's binary->text->compressed->text->binary redundant data encoding, because the miners work on binary data.
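The seven-step pipeline listed above can be written out end to end; the payload below is an arbitrary stand-in for real work bytes. Every step round-trips back to the bytes the miner needed in the first place:

```python
import json
import zlib

work = bytes(range(76))  # stand-in binary work payload (hypothetical)

# The compressed-JSON path described above:
hex_str = work.hex()                          # binary -> hex (2x the size)
doc = json.dumps({"data": hex_str})           # hex string -> JSON structure
wire = zlib.compress(doc.encode())            # JSON -> compressed bytes
# ... sent to the client ...
decoded = json.loads(zlib.decompress(wire))   # decompress -> JSON
recovered = bytes.fromhex(decoded["data"])    # hex -> binary again

print(recovered == work)
```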
    legendary
    Activity: 1386
    Merit: 1097
    I just read the sources, and mixing a binary protocol with JSON-compressed data looks weird to me. Why not simply use (compressed) JSON-RPC over TCP and define only RPC methods? This would be way easier to implement in any language, more standard, more readable, etc. But it still enables push features and will be more efficient because we get rid of the HTTP overhead. Please don't reinvent the wheel.

    Maybe I'm too high-level oriented, but encapsulating JSON-RPC into a proprietary binary protocol is very unusual.

    For example, the method 'login' could look like {id:'xxx',method:'login',params:['username','sha256 of username+password']}. Each command can be terminated by a newline; or better, almost every language has support for streaming JSON (well, I know the Java and Python libraries do), because it is very easy to detect that a message is complete.
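The newline-delimited login message can be sketched as follows. The credentials are hypothetical, and the auth scheme (sha256 of username+password) is only the one proposed in the post above, not any real pool's API:

```python
import hashlib
import json

# Hypothetical credentials; the sha256(username+password) scheme is the
# one sketched in the post, not any real pool's API.
username, password = "worker1", "secret"
msg = {
    "id": "xxx",
    "method": "login",
    "params": [username,
               hashlib.sha256((username + password).encode()).hexdigest()],
}

# One message per line: the trailing newline marks the end of each JSON
# document, so a receiver can split the TCP stream on line boundaries.
line = json.dumps(msg) + "\n"
print(json.loads(line)["method"])
```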
    legendary
    Activity: 1596
    Merit: 1100
    What about increasing the time interval between requests? And banning whoever makes requests too frequently.

    This is not a network protocol issue.  You can easily add these rules to your pool server or bitcoind, once the protocol is deployed.
    newbie
    Activity: 45
    Merit: 0
    What about increasing the time interval between requests? And banning whoever makes requests too frequently.

    That might cause problems for those using a high number of CPU cores though?
    sr. member
    Activity: 350
    Merit: 252
    probiwon.com
    What about increasing the time interval between requests? And banning whoever makes requests too frequently.
    legendary
    Activity: 1596
    Merit: 1100
    1) Neither the login request nor the solution is lost in my proposal.

    If they cannot be lost, then by definition they must be retransmitted.  And you must build logic to determine how often to retransmit.  When to stop retransmitting and give up.  Reinventing TCP, in other words.

    Quote
    2) I don't think that the loss of a WORK message will seriously impact performance. If I understand the principle, a new WORK will be broadcast with every transaction received by the server. And that's often enough.
    3) I mean only the "push" protocol, which does not use GETWORK.

    The loss of a WORK message can mean the loss of money, due to not working on the latest block etc.  No miner will stand for this, therefore, WORK must be acknowledged by client, and retransmitted by server.  TCP does this for us automatically.

    Quote
    4) UDP, as far as I know, is as NAT-friendly as TCP. In both cases the NAT box just maps the source port.

    UDP has no notion of connections, so a heavily loaded NAT box must rely on timeouts and other hacks, unlike TCP.  But in focusing on NAT you ignored "firewall";   TCP far more readily passes through firewalls than UDP.  I've seen this at plenty of large corporate sites especially.  They'll do a local DNS server, and no UDP traffic will traverse the firewall into the outside world.  If you want universality, UDP is not the way to go.  TCP is simply more likely to succeed.


    Quote
    6) Some features of TCP are real overhead when building a low-latency service. So it is sometimes better to reimplement some TCP features than to use the full version. For example, in this case we don't need an acknowledgement for every message.

    Only if you don't mind losing money :)
