Pages:
Author

Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind - page 11. (Read 31477 times)

sr. member
Activity: 403
Merit: 250
I think i just discovered the issue with dying bitcoind, read the log below Smiley

I this when bitcoind unexpectedly  died, and i tried to restart it
Quote
11:44:15 <@jine> EXCEPTION: N5boost16exception_detail10clone_implINS0_19error_info_injectorINS_6system12sys tem_errorEEEEE
11:44:15 <@jine> Address already in use
11:44:15 <@jine> bitcoin in ThreadRPCServer()
11:44:15 <@jine> terminate called after throwing an instance of 'boost::exception_detail::clone_impl >'

What actually happend:
Quote
11:48:12 <@jine> I can't start bitcoind while 3000+ connections tries to connect to :8332
(This is due to the fact that pushpoold is using up all ports, including tcp/8332 to try to connect to bitcoind)

11:48:44 <@jine> When 3000+ people tries to connect to pushpoold
11:48:52 <@jine> I cant stop it on the "correct" way
11:48:57 <+cereal7802> so i take it your going to kill pushpool, try the patched bitcoind again, then restart push?
11:49:11 <@jine> Have to kill it with; kill -9 (forcefully)
(So all my previous test have been unsuccuessfull due to pushpoold still running, still using up sockets, which makes bitcoind not start / dying while initializing)

Will try that in a moment, but it looks promising Smiley
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Actually, it proved to be really easy! Here's a diff:
http://davids.webmaster.com/~davids/rpc.diff.txt

If it doesn't patch cleanly against some version you need to patch against, let me know. I should be able to rebase it without too much trouble. By the way, I was using 0.3.23.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Looks good! Trying it with a small private pool and no errors to report so far.

Pushpool is happily re-using connections and work is still being dished out - TIME_WAITs have disappeared Smiley

For anyone struggling to get this to build, by the way, use 0.3.22, not .23 or git. Any chance of a diff rather than the full file JoelKatz?
Glad to hear it. Unfortunately, the changes are really invasive. It will be hard to produce a clean diff. I'll see what I can do.
hero member
Activity: 546
Merit: 500
Looks good! Trying it with a small private pool and no errors to report so far.

Pushpool is happily re-using connections and work is still being dished out - TIME_WAITs have disappeared Smiley

For anyone struggling to get this to build, by the way, use 0.3.22, not .23 or git. Any chance of a diff rather than the full file JoelKatz?
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Thanks for the idea of using curl to test. I put up a new build. I didn't realize I was still sending 'Connection: close' in the success replies. Doh!

Now I ran this test:

curl -u MyUser:MyPass --keepalive --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getwork", "params": [] }' -H 'content-type: text/plain;' "http://test:[email protected]:8332/[00-99]"

Without my patch, 100 connections, .2 seconds.
With my patch, 1 connection, .09 seconds.

The [00-99] makes it retrieve 100 URLs -- bitcoind ignores the URL.

The only change between the version you have and my latest is this:

@@ -1571,7 +1571,7 @@ string HTTPReply(int nStatus, const stri
     return strprintf(
             "HTTP/1.1 %d %s\r\n"
             "Date: %s\r\n"
-            "Connection: close\r\n"
+            "Connection: keep-alive\r\n"
             "Content-Length: %d\r\n"
             "Content-Type: application/json\r\n"
             "Server: bitcoin-json-rpc/%s\r\n"
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
If you start a new curl process for each request, there is no way it can reuse the connection. You'd have to ask a single instance of curl to make multiple requests.

If you try this:
   curl --keepalive-time 60 --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getwork", "params": [] }'
 -H 'content-type: text/plain;' "http://test:[email protected]:8332/[00-99]"

You'll see it retrieve 100 times without using 100 connections.

Without my patch: 100 connections, 5 seconds.
With my patch: 1 connection, .067 seconds.
They were all 'access denied' though. I didn't send a password. I'll retest with correct credentials.
sr. member
Activity: 403
Merit: 250
Suggestions? I'll try this one in my personal dev-environment and see if i can replicate the issue.
I'm guessing that there's some path that doesn't release the RPC mutex. I put up a new build that might fix it, but it's hard to be sure since I can't replicate the problem.

Update: If that fails, I can just pull the multi-threading stuff and only fix keep-alives. That's a no-brainer (two lines change in trivial ways) and is very unlikely to cause any problems.

Tried out the latest version in my dev-environment - couldn't get the keep-alive to work tho.
It seems to create a new connection for each request.

Tried with the following script...

Code:
jine@srv:~/bitcoin-0.3.23/src/src$ cat test.sh
while [ 1 ]
do
        curl --keepalive-time 60 --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getwork", "params": [] }' -H 'content-type: text/plain;' http://test:[email protected]:8332/
done

Directly against bitcoind that is, but i think curl should handle keep-alives well...
Any suggestions? :/ Could NOT get it to crash tho. Tried multiple times with huge number of requests etc... (not tried pushpoold yet tho)
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Suggestions? I'll try this one in my personal dev-environment and see if i can replicate the issue.
I'm guessing that there's some path that doesn't release the RPC mutex. I put up a new build that might fix it, but it's hard to be sure since I can't replicate the problem.

Update: If that fails, I can just pull the multi-threading stuff and only fix keep-alives. That's a no-brainer (two lines change in trivial ways) and is very unlikely to cause any problems.
sr. member
Activity: 403
Merit: 250
Something goes wrong when running your current patched bitcoind in our prod. environment.
Now the process just hung, couldn't get any coredump out of it or similar Sad

Suggestions? I'll try this one in my personal dev-environment and see if i can replicate the issue.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
The "right way" seems pretty awesome to me... Smiley
Okay. If I get the existing asio work for bitcoind, I'll work on that. I've grabbed the source code to the mining daemon and am looking at how it implements long polling.

Update: It looks like pushpoold already has a way to do this, with blkmond. So the only issue is the connection buildup, which I think I've already fixed.
sr. member
Activity: 403
Merit: 250
Lets try this out right now. Compiling the code as we speak.
Give you an update and/or coredump in a while Smiley

The "right way" seems pretty awesome to me... Smiley
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Even so it would make much more sense to do it properly (so we end up having a useful pull request against bitcoin); the asio route appears to be the way to go (patch is there, yet bugged) instead of spawning multiple threads.
Oh, do you know where the patch is? There's a good chance I could debug it. I've been meaning to learn about boost anyway. (My day job involves high-performance, multi-threaded TCP server code.)

Asio won't actually gain you much over my patch. The main advantage of asio is when you have large numbers of connections, most of which aren't very active or when large numbers of them become active at the same time. You would want it in a mining controller. Think about the large number of connections, the long periods of inactivity, and the sudden burst when a new block comes out. Without asio, you have to have a context switch for each connection. With asio, you do not.

That said, there's basically no downside, and it's also the right thing to do.
ius
newbie
Activity: 56
Merit: 0
In truth, none of these are the right solution. I have some ideas for the 'right' solution (bitcoind should push changes to the mining controller so it doesn't have to poll), and I'll try to get them thought out and proposed as modifications to the official source. (Think of it as extending long polling back one more link in the chain.)

Even so it would make much more sense to do it properly (so we end up having a useful pull request against bitcoin); the asio route appears to be the way to go (patch is there, yet bugged) instead of spawning multiple threads.
newbie
Activity: 28
Merit: 0
JoelKatz seems to be a Hero...
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
There's another new version up. I realized the JSON code wasn't re-entrant, which creates a problem when you try to use it in more than one thread. Unfortunately, the 'getblock' code isn't re-entrant, and the simplest way to deal with that is to wrap all the RPC handlers in a big mutex. It doesn't seem to have any effect on performance though, so I'll leave it that way because it's much safer.

This version makes the JSON code re-entrant, but single threads calls to do the actual RPC. This is slightly less than optimum, but in all of my tests it made no difference. Multi-threading the actual RPC calls would carry a significant risk that some part of that code would break for no significant benefit and unless the 'getblock' code was pessimized for the most common case, it wouldn't benefit anyway. (Plus, there would have to be invasive changes to the code that handles when you successfully find a block, and that scares me because it's so critical and so hard to test.)

Please test this version. It should solve the problem.

I also have a version with UNIX domain sockets available if anyone's interested (it's not up at the moment, but PM me if you want it). It's very ugly right now because I haven't had time to polish it, but it does work. It supports a
'-unixsocket=' option. The protocol is a single line query and a single line response, no headers, no authentication (so put the socket in a directory only the authorized user can access). There is also code to issue RPC calls over the UNIX-domain socket, so you can see how to do it and see that it works. The biggest issue with it right now is that if you make any errors, it just closes the socket. You can issue multiple requests over a single connection though, and of course there are no stale socket issues.

The biggest ugliness is that I couldn't figure out how to bind a basic_istream to a local::stream_protocol::socket. So I had to use a 'receive' call instead of 'getline'. If anyone knows how to do that, I'd appreciate a PM. (I always meant to learn boost.)

In truth, none of these are the right solution. I have some ideas for the 'right' solution (bitcoind should push changes to the mining controller so it doesn't have to poll), and I'll try to get them thought out and proposed as modifications to the official source. (Think of it as extending long polling back one more link in the chain.)
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
It must be a problem that only appears under load or under some particular combination of requests. I'll try to do some more troubleshooting. I don't have much time left today, but I should have a few hours tomorrow that I can dedicate.

Update1: Do you compile bitcoind with any non-standard settings? And do you start it with any command line flags other than '-daemon'? Do you enable RPC over SSL?

Update2: I made a few cleanups and fixed a few very minor issues. But I can't replicate your issue, which means either I solved it or it requires something unique to your setup to replicate. If you get a chance to try my latest (same place) please do. If you can compile it with '-g' (I believe that's provided by default), make sure not to strip the executable, do a 'ulimit -c unlimited' before you execute it, and if it crashes, run 'gdb' on the core file like this: 'gdb /path/to/bitcoind /path/to/core.filename' and then message me the output of a 'where' command. (You may have to hit 'enter' a few times to get the full output. It'll be the last few lines that will be the most helpful.)

As soon as you have the core file, you can restart the original bitcoind. You don't need to have any downtime. Just make sure to pass 'gdb' the path to the bitcoind executable that generate the core file.
sr. member
Activity: 403
Merit: 250
Same URL? Will try it out in a moment.
Yes, same URL.

Quote
Thats actually better, BUT it would require modifications to both bitcoind and pushpoold.
This was the previous main target, but i couldn't find anyone able to implement it.
It's better in some ways and worse in others. Alone, it won't help the fact that bitcoind can't respond while it's waiting for another connection to respond. But it would eliminate the TIME_WAIT states that pile up because UNIX domain sockets don't have them. You'd need a multi-listening patch like mine to listen on a UNIX-domain socket and a TCP socket at the same time (or you could call select, but that's ugly). I haven't seen the code on the other side yet, so I don't know what's involved in making it use a UNIX domain socket.

Of course, if you have multi-threaded listening, keep alives, and UNIX domain sockets, that should *really* solve the problem.


Unfortunately, i could not get this version to work in our production environment either.
The strange thing is that this DO work if i run it separately... Downloaded nightly build of blockchain, tried it out on port 1337 and it started, i could issue getwork-requests to it and it seems to have reused the socket to.

On the other hand, when i replace our bitcoind with this patched version - it starts, runs for 4 seconds and then silently dies. I was able to issue one getinfo requests from it's client - but then it died.

Straaaaange... but i really think we're onto something here.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
best would be to remove the network way from bitcoind to pushpoold.

on one hand shitty to force both running on one machine, but the network bottleneck would be gone by sharing that data via ram instead of network
It is shared by RAM. Connections between two processes on the same machine don't actually flow over any network. It just emulates a network, and with that comes both good things and bad things.
hero member
Activity: 826
Merit: 500
best would be to remove the network way from bitcoind to pushpoold.

on one hand shitty to force both running on one machine, but the network bottleneck would be gone by sharing that data via ram instead of network
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Same URL? Will try it out in a moment.
Yes, same URL.

Quote
Thats actually better, BUT it would require modifications to both bitcoind and pushpoold.
This was the previous main target, but i couldn't find anyone able to implement it.
It's better in some ways and worse in others. Alone, it won't help the fact that bitcoind can't respond while it's waiting for another connection to respond. But it would eliminate the TIME_WAIT states that pile up because UNIX domain sockets don't have them. You'd need a multi-listening patch like mine to listen on a UNIX-domain socket and a TCP socket at the same time (or you could call select, but that's ugly). I haven't seen the code on the other side yet, so I don't know what's involved in making it use a UNIX domain socket.

Of course, if you have multi-threaded listening, keep alives, and UNIX domain sockets, that should *really* solve the problem.
Pages:
Jump to: