
Topic: Proposal for lightweight mining protocol over UDP (Read 2954 times)

hero member
Activity: 558
Merit: 500
I think it's a valid idea and worth doing...

If you know Node.js, you can build it on top of https://github.com/bitcoinjs/bitcoinjs-server instead of messing with C++. It will be much faster.

sr. member
Activity: 266
Merit: 251
Those two quotes from your recent posts are mutually contradictory.

Packet loss is ALWAYS handled on the miner's side - if no reply is received, the request is retried after some delay,
with exponential backoff.

Bitcoind / pool NEVER retransmits packets on its own.
There is no such flaw if you look carefully at the protocol.
The server CAN respond with msg_opcode == 12 (work is invalidated)
BEFORE the miner has sent his work.

I understand that you are trying to reinvent the wheel like ZeroMQ or DCOM, but specialized for Bitcoin mining. Based on my previous discussions here, I'm more or less convinced that you are going to either fail or rediscover why 0MQ and DCOM are seemingly too complex, and in the process reimplement most of them.

I had a long and productive discussion with Mr. slush here regarding a more general protocol, but with the same requirement of two-way asynchronous messaging. I'm not going to reiterate my points here, just give a link:

https://bitcointalksearch.org/topic/stratum-overlay-network-protocol-over-bitcoin-55842

Thank you for your understanding.

I have used ZeroMQ... It had some influence on my frame design as well. I would also recommend reading about it to anyone who does not know what it is:
http://www.zeromq.org/

And also its evolution, Crossroads.IO (http://www.crossroads.io/), which came about when the designer of ZeroMQ ran into trouble with C++ and stability issues.

The general problem: ZeroMQ is based on TCP transport. When I ran a trial of work delivery to deepbit (200 Gh/s), I got about 195 Gh/s instead, and the amount of work also jumped up and down, up and down, in the range 165 - 180 Gh/s. That happened mostly because, when the 3G internet connection reconnects, ZeroMQ is slow to reconnect its underlying sockets. Bitcoind, by the way, has the same problem with the peers it connects to, which is why I get a lot of orphans.

That does not work well when you still have data to transfer and the internet works for, say, 20 seconds, then reconnects for another 20 seconds, giving you a new IP address every time, and when the service is available only with significant lag. This happens rarely, but when it does, TCP performance drops significantly, because TCP detects network congestion by measuring packet loss. That is another point AGAINST TCP: connections usually just stall, and then some time passes until things reconnect.

As for the discussion with slush - I see that you wanted to invent a more generic protocol there. But this is what I dislike; I prefer the approach where one thing solves one simple task. So one protocol for mining, etc... They may be compatible, and messages may be aggregated over the same pipes, but the parser of each protocol should be clean and simple. Otherwise things are likely to be EXPLOITABLE and to show unexpected behavior. I strongly believe that the best feature is the feature that is not implemented. By that logic it would be best not to implement a protocol like the one I describe here. But because the other ways do not work for me (going through ZeroMQ, for example), and I still need a solution and am going to implement it, I will do it in the simplest and most efficient form.

legendary
Activity: 2128
Merit: 1073
Those two quotes from your recent posts are mutually contradictory.

Packet loss is ALWAYS handled on the miner's side - if no reply is received, the request is retried after some delay,
with exponential backoff.

Bitcoind / pool NEVER retransmits packets on its own.
There is no such flaw if you look carefully at the protocol.
The server CAN respond with msg_opcode == 12 (work is invalidated)
BEFORE the miner has sent his work.

I understand that you are trying to reinvent the wheel like ZeroMQ or DCOM, but specialized for Bitcoin mining. Based on my previous discussions here, I'm more or less convinced that you are going to either fail or rediscover why 0MQ and DCOM are seemingly too complex, and in the process reimplement most of them.

I had a long and productive discussion with Mr. slush here regarding a more general protocol, but with the same requirement of two-way asynchronous messaging. I'm not going to reiterate my points here, just give a link:

https://bitcointalksearch.org/topic/stratum-overlay-network-protocol-over-bitcoin-55842

Thank you for your understanding.
sr. member
Activity: 266
Merit: 251
Sorry Mr. V, but your design has a serious flaw. There isn't an equivalent of the existing long-poll. The miner has to be able to register itself with the mining pool to receive notification that it must abandon the current work because a new block has been found.

Therefore any strict master-slave, request-response, pool-server-always-passive protocol will end up being inefficient.

Full two-way communication is essential for any effective design.

There is no such flaw if you look carefully at the protocol.
The server CAN respond with msg_opcode == 12 (work is invalidated)
BEFORE the miner has sent his work.

So the miner would instantly request new work using getwork.

However, there is a flaw in getwork itself: it should send some credentials!

So the getwork request should look like:

struct msg_getwork {
  struct msg_header hdr;
  unsigned char cred[32];
};

where cred may contain a 25-byte bitcoin address, or some kind of hash-sum that the server can recognize,
or a 25-byte bitcoin address plus some anti-DDoS challenge that is easy to verify but difficult
to forge.
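
A minimal C sketch of filling in such a request, for the simplest case where cred carries only a raw 25-byte address (version byte, 20-byte hash, 4-byte checksum) padded with zeros. The function name getwork_init, the zero padding, and the reading of msg_size as the full message length are assumptions made for illustration; the post does not specify them.

```c
#include <string.h>

#define OP_GETWORK 1   /* opcode value from the proposal */
#define UUID_LEN 16
#define ADDR_LEN 25    /* version byte + 20-byte hash + 4-byte checksum */

struct msg_header {
  unsigned char msg_size;
  unsigned char msg_opcode;
  unsigned char msg_context_uuid[UUID_LEN];
};

struct msg_getwork {
  struct msg_header hdr;
  unsigned char cred[32];
};

/* Build a getwork request that carries a raw 25-byte address in cred.
 * Assumption: msg_size counts the whole message, header included.
 * The 7 unused bytes of cred are left as zero. */
void getwork_init(struct msg_getwork *m,
                  const unsigned char uuid[UUID_LEN],
                  const unsigned char addr[ADDR_LEN]) {
  memset(m, 0, sizeof *m);
  m->hdr.msg_size = (unsigned char)sizeof *m;
  m->hdr.msg_opcode = OP_GETWORK;
  memcpy(m->hdr.msg_context_uuid, uuid, UUID_LEN);
  memcpy(m->cred, addr, ADDR_LEN);
}
```

With these definitions the request is 50 bytes long, and the 7 spare bytes of cred would be the natural place for a challenge response if one were added.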


To kano:
About UDP vs TCP: UDP is _stateless_, just like IP. This is an important feature if you want to send packets over multiple links to multiple servers for reliability. The reliability of such transmissions is far better than with TCP. Work and answers simply do not require all the features of TCP, like ordered and guaranteed delivery, because those features come at a cost: server and client both have to maintain TCP connection state. Conditions are even worse with long-poll over HTTP, as long-polling is not something the HTTP designers had in mind. They originally wanted HTTP to be stateless, but people later re-used HTTP against its original design idea to implement AJAX, etc.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Well ... I was going to do a simple TCP/UDP pool/LP protocol with cgminer many months ago but never got around to it due to very little interest in it :P
(if you look hard you'll find my post on the subject in the cgminer thread way back some time)

The cgminer RPC API (which I wrote) uses simple TCP/IP sockets, not curl/http

But the term "JSON" isn't actually what is relevant in this, it's the simple socket TCP or UDP vs the http protocol overhead
UDP of course has the lower overhead of the two but is not as reliable ... but I don't think that is much of an issue these days

JSON sux IMO also (I only added it as a last resort option to the cgminer RPC API), but the JSON overhead is small compared to the http protocol overhead.

Back then, one of the pools (can't remember which) also implemented LP over UDP, which would easily be better than the crap way LP is done at the moment.
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Generally a very good idea, I've been asking for this for a long time already. JSONRPC is just awfully inefficient for this purpose.
However, I don't quite like some of the implementation details yet. This will have to be thought through very carefully :)
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
Wrong thread, sorry.
legendary
Activity: 2128
Merit: 1073
Sorry Mr. V, but your design has a serious flaw. There isn't an equivalent of the existing long-poll. The miner has to be able to register itself with the mining pool to receive notification that it must abandon the current work because a new block has been found.

Therefore any strict master-slave, request-response, pool-server-always-passive protocol will end up being inefficient.

Full two-way communication is essential for any effective design.
sr. member
Activity: 266
Merit: 251
Dear Bitcoin developers and mining pool owners,

Looking at the work at http://www.tricone-mining.com/limp.html, and having experience with high-performance systems, I would like to point out that many of the current ways to deliver shares are highly suboptimal. Getwork and share submission basically do not need TCP facilities such as ORDERED delivery of the packets exchanged between the miner and bitcoind or the pool. Individual packet acknowledgement would work better, especially when the miner sits behind a poor internet connection, as I do.

So I propose that we stop re-inventing the wheel all the time and do this once, for many years to come. A lightweight protocol would also ease the burden of DDoS mitigation. Unlike an HTTP implementation, this protocol, even when implemented by a not very skilled programmer, could easily withstand gigabits of incoming bandwidth: just put a better network card into your server.

So - lightweight getwork protocol.

A single UDP packet may contain multiple messages, up to 1400 bytes in total. 1400 was chosen because when you connect through several VPNs, your MTU is cut down from 1500, so 1400 is a safe compromise. The mining client may change this value to one of its own choice. Moreover, the server shall not reply with more messages in a single UDP packet than the client requested! So a client gets 8 works back only if it sent 8 getwork requests at once.

So, a UDP packet is basically a concatenation of several messages of the following format:
struct msg_header {
  unsigned char msg_size;
  unsigned char msg_opcode;
  unsigned char msg_context_uuid[16];
};

struct msg_work {
  struct msg_header hdr; /* This is basically the C way of inheritance... in C++ this would be msg_work : msg_header */
  unsigned char work[80]; /* the last 32 bits are the nonce, but can contain a bitstream unlock sequence (special for EldenTyrell) */
};
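
A minimal C sketch of how a receiver might walk one such datagram. It assumes that msg_size holds the total length of a message including its 18-byte header; the post does not say whether the header is counted. The names count_messages, MAX_DATAGRAM, HDR_LEN and IP_UDP_OVERHEAD are illustrative only.

```c
#include <stddef.h>

#define MAX_DATAGRAM 1400   /* default payload limit suggested in the post */
#define HDR_LEN 18          /* msg_size + msg_opcode + 16-byte uuid */
#define IP_UDP_OVERHEAD 28  /* IPv4 header without options (20) + UDP header (8) */

struct msg_header {
  unsigned char msg_size;
  unsigned char msg_opcode;
  unsigned char msg_context_uuid[16];
};

struct msg_work {
  struct msg_header hdr;
  unsigned char work[80];
};

/* Count the well-formed messages in one datagram payload.
 * Assumption: msg_size is the full message length, header included.
 * Returns -1 if the payload is oversized, truncated, or carries a bad length. */
int count_messages(const unsigned char *buf, size_t len) {
  size_t off = 0;
  int n = 0;
  if (len > MAX_DATAGRAM) return -1;
  while (off < len) {
    size_t size;
    if (len - off < HDR_LEN) return -1;   /* not enough bytes for a header */
    size = buf[off];                      /* first byte of a message is msg_size */
    if (size < HDR_LEN || size > len - off) return -1;
    off += size;
    n++;
  }
  return n;
}
```

Both structs contain only unsigned char members, so in practice they are laid out without padding: 18 bytes for the header and 98 bytes for msg_work. Adding 28 bytes of IPv4 and UDP headers gives 46 and 126 bytes, which matches the figures quoted further down in this post, if those figures are meant to include the IP and UDP headers.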

Opcodes:
REQUESTING WORK:
msg_opcode == 1  - getwork request, typically answered with 10, 11 or 12;
msg_opcode == 10 - msg_work structure returned, miner shall process that work;
msg_opcode == 11 - getwork negative acknowledgement - too many requests; this opcode may be sent to friendly nodes
                   when there are too many getwork requests, so that the miner stops flooding the server;
msg_opcode == 12 - work is invalidated (when new block calculations started);

GETTING ANSWERS:
msg_opcode == 20 - answer message type is msg_work;
msg_opcode == 30 - successfully accepted msg_work;
msg_opcode == 31 - share rejected - stale;
msg_opcode == 32 - share rejected - invalid;
msg_opcode == 33 - share rejected - unknown;
msg_opcode == 34 - share rejected - duplicate;

FEDERATION OF SERVERS INTERNAL COMMUNICATION:
msg_opcode == 40 - I want to possess the right to process this getwork request;
msg_opcode == 50 - I have no objection, proceed with that getwork;
msg_opcode == 51 - I have already processed that getwork, drop it;

TYPICAL MINER <--> BITCOIND OR POOL COMMUNICATION:
msg_opcode == 1  - sent from miner to bitcoind, taking about 46 bytes to request work;
msg_opcode == 10 - 126 bytes (just 126! not 2 KB!) returned from pool to miner as msg_work;
msg_opcode == 20 - msg_work with the found nonce returned from miner to bitcoind;
msg_opcode == 30 - share is good - returned from bitcoind to miner;
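
The opcode tables above can be collected into a C enum, as in the sketch below. Only the numeric values come from the lists; every identifier (OP_GETWORK, on_getwork_reply, and so on) is made up for illustration. The reaction to opcode 12 (request new work at once) follows the clarification given elsewhere in this thread.

```c
/* Numeric values are taken from the opcode lists; the names are hypothetical. */
enum msg_opcode {
  OP_GETWORK = 1,
  OP_WORK = 10,
  OP_GETWORK_NACK = 11,
  OP_WORK_INVALIDATED = 12,
  OP_ANSWER = 20,
  OP_SHARE_ACCEPTED = 30,
  OP_SHARE_STALE = 31,
  OP_SHARE_INVALID = 32,
  OP_SHARE_UNKNOWN = 33,
  OP_SHARE_DUPLICATE = 34,
  OP_FED_CLAIM = 40,
  OP_FED_PROCEED = 50,
  OP_FED_ALREADY_DONE = 51
};

enum miner_action {
  ACT_START_MINING,   /* got work, process it */
  ACT_BACK_OFF,       /* server is overloaded, slow down */
  ACT_REQUEST_WORK,   /* current work is dead, send getwork again */
  ACT_IGNORE          /* not a getwork-related reply */
};

/* One possible miner-side reaction to the three getwork replies. */
enum miner_action on_getwork_reply(int opcode) {
  switch (opcode) {
  case OP_WORK:             return ACT_START_MINING;
  case OP_GETWORK_NACK:     return ACT_BACK_OFF;
  case OP_WORK_INVALIDATED: return ACT_REQUEST_WORK;
  default:                  return ACT_IGNORE;
  }
}

/* Returns 1 for the four "share rejected" opcodes (31..34), 0 otherwise. */
int is_share_rejected(int opcode) {
  return opcode >= OP_SHARE_STALE && opcode <= OP_SHARE_DUPLICATE;
}
```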

Packet loss is ALWAYS handled on the miner's side - if no reply is received, the request is retried after some delay,
with exponential backoff.

Bitcoind / pool NEVER retransmits packets on its own.
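
The miner-side retry schedule could look roughly like the C sketch below. The post gives no concrete timings, so the base delay and the cap are parameters, and the function name retry_delay_ms is hypothetical.

```c
/* Delay in milliseconds before retry number `attempt` (0-based).
 * The delay starts at base_ms and doubles on every further attempt
 * until it reaches cap_ms. Both values are assumptions of this sketch. */
unsigned retry_delay_ms(unsigned attempt, unsigned base_ms, unsigned cap_ms) {
  unsigned delay = base_ms;
  while (attempt-- > 0 && delay < cap_ms) {
    if (delay > cap_ms / 2) {   /* doubling would pass the cap */
      delay = cap_ms;
      break;
    }
    delay *= 2;
  }
  return delay < cap_ms ? delay : cap_ms;
}
```

With a base of 250 ms and a cap of 8000 ms, the successive delays are 250, 500, 1000, 2000, 4000 and then 8000 ms for every later attempt. Because the server never retransmits, this timer is the only recovery mechanism in the exchange.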

For miners behind a firewall, a special proxy can be launched that converts messages from the TCP to the UDP protocol. However,
such proxies will be subject to DDoS attacks because of TCP itself... beware...

THEN - how DDoS mitigation can be implemented in this protocol (I don't know whether this is good for a pool or for bitcoind):

The server collects a frame of requests for, say, 100 milliseconds. On 1 Gbps internet that can be as many as 6'000'000 incoming getwork messages. The server should then take only the 100 to 1000 getworks it considers best from that buffer and silently drop the rest. How does the server choose? Quite simply, by priority sorting: first it ranks higher those who sent more shares within the last 60 seconds, then it sorts by IP address, ranking higher the IPs that requested fewer getworks, to cut off botnets. Then it drops the remaining 5'999'000 requests. And if such a protocol is used through all stages, it can easily process 10'000 getworks per second on a simple VPS!
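
A minimal C sketch of that priority sort, using the standard qsort. The struct pending_req, its fields, and the function select_requests are hypothetical; the sketch assumes the per-miner share count and the per-IP getwork count have already been tallied for the current frame.

```c
#include <stdlib.h>

/* One buffered getwork request; all field names are hypothetical. */
struct pending_req {
  unsigned long ip;            /* sender address */
  unsigned shares_60s;         /* shares sent by this miner in the last 60 s */
  unsigned getworks_from_ip;   /* getworks seen from this IP in the frame */
};

/* Order: more recent shares first, then fewer getworks per IP first. */
static int by_priority(const void *a, const void *b) {
  const struct pending_req *x = a;
  const struct pending_req *y = b;
  if (x->shares_60s != y->shares_60s)
    return x->shares_60s > y->shares_60s ? -1 : 1;
  if (x->getworks_from_ip != y->getworks_from_ip)
    return x->getworks_from_ip < y->getworks_from_ip ? -1 : 1;
  return 0;
}

/* Sort the frame in place and return how many leading entries to serve.
 * Everything past the returned count is silently dropped by the caller. */
size_t select_requests(struct pending_req *reqs, size_t n, size_t keep) {
  qsort(reqs, n, sizeof *reqs, by_priority);
  return n < keep ? n : keep;
}
```

A full sort is the simplest way to express the ranking. For very large frames, a partial selection of the top entries would do less work, but that is an optimization beyond what the post describes.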

Then - why UDP ?! Is it more complex ?

Actually not... because with UDP it is EASY to set up replicated communication. Say you add GSM or 3G internet to a mining rig that is connected over ADSL. Then you can transmit the same getwork request over 2 channels simultaneously, and to 3 servers of your beloved pool. That would increase traffic, but it would also increase reliability by an order of MAGNITUDE. So while pools are fighting over 1% more or 1% less, many miners lose about 2-3% to internet outages, and some with bad internet connectivity lose about 10-12% (like me).

With TCP, on the other hand, you always have the hassle of maintaining connection state, etc. With UDP, the server only has to remember _multiple_ miner "return addresses" for about 60 seconds, for as long as packets keep arriving from that miner, and it broadcasts its answers to all of these addresses.
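
The "return address" bookkeeping could be as small as the C sketch below (C99). The table size, the struct layout, and the function names addrs_remember and addrs_live are assumptions made for illustration; times are plain seconds supplied by the caller.

```c
#include <stddef.h>

#define MAX_ADDRS 8    /* hypothetical per-miner limit */
#define ADDR_TTL 60    /* seconds an address stays valid, per the post */

struct return_addr {
  unsigned long ip;
  unsigned short port;
  long last_seen;      /* time of the last packet from this address */
  int used;
};

struct miner_addrs {
  struct return_addr slot[MAX_ADDRS];
};

/* Note a packet from ip:port at time `now`. Refreshes a known address,
 * otherwise reuses a free or expired slot. Returns -1 if the table is full. */
int addrs_remember(struct miner_addrs *m, unsigned long ip,
                   unsigned short port, long now) {
  int i, free_slot = -1;
  for (i = 0; i < MAX_ADDRS; i++) {
    struct return_addr *s = &m->slot[i];
    if (s->used && s->ip == ip && s->port == port) {
      s->last_seen = now;
      return 0;
    }
    if (free_slot < 0 && (!s->used || now - s->last_seen > ADDR_TTL))
      free_slot = i;
  }
  if (free_slot < 0) return -1;
  m->slot[free_slot] = (struct return_addr){ip, port, now, 1};
  return 0;
}

/* Copy the addresses heard from within the last ADDR_TTL seconds into
 * `out` (room for MAX_ADDRS entries) and return how many there are. */
size_t addrs_live(const struct miner_addrs *m, long now,
                  struct return_addr *out) {
  size_t n = 0;
  int i;
  for (i = 0; i < MAX_ADDRS; i++)
    if (m->slot[i].used && now - m->slot[i].last_seen <= ADDR_TTL)
      out[n++] = m->slot[i];
  return n;
}
```

A reply would then be sent once to each address returned by addrs_live, which is the only per-miner state the server keeps in this sketch.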

I am going to implement this protocol soon for my own purposes, but it is ridiculous that in the end I have to convert all of that into HTTP/JSON-based getwork.

So I wanted to ask:

1. Who else would support such an initiative or has similar problems?

2. How should code be submitted to bitcoind and approved? Is this addition a candidate for inclusion in the existing bitcoind? Maintaining a separate patch-set could be quite boring... There may also be questions about how best to implement this (say, bitcoind mainly relies on boost::asio).

Kind regards,
V. // BitFury :-)