
Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind - page 9. (Read 31406 times)

legendary
Activity: 1750
Merit: 1007
1 TH/sec is where idles started coming back for my DE server, which is an i7-920 system.  I believe the new bottleneck is the getwork processing.
legendary
Activity: 1750
Merit: 1007
800 GH/sec and counting on the BTC Guild stress test.
UPDATE:  900 and counting.

BTC Guild, where test runs are done on production servers!
sr. member
Activity: 406
Merit: 250
I have a new version out at:
http://davids.webmaster.com/~davids/bitcoin-3diff.txt

It has the turbo RPC changes, the new hub mode to reduce stale/lost blocks, and native long polling support. Please read the documentation at the top of the file.

CAUTION: This code was just finished and has not been well-tested. Test results are appreciated. Bug reports and success reports are equally welcome.

Update: Version 0.4 is now up. It fixes a bug that could cause bitcoind to hang when directed to shutdown.

Currently testing v0.5 on live server.... (a whopping 3 users since relaunch). Will let you know how it turns out.

So far so good. LP notifications seem faster than a certain unnamed big pool ;) It also seems to favor my pool on new getwork, so my pool gets a higher share of hashrate than said big pool when using hashkill.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
I have a new version out at:
http://davids.webmaster.com/~davids/bitcoin-3diff.txt

It has the turbo RPC changes, the new hub mode to reduce stale/lost blocks, and native long polling support. Please read the documentation at the top of the file.

CAUTION: This code was just finished and has not been well-tested. Test results are appreciated. Bug reports and success reports are equally welcome.

Update: Version 0.4 is now up. It fixes a bug that could cause bitcoind to hang when directed to shutdown.
sr. member
Activity: 403
Merit: 250
Forgot to mention, thanks a lot for your hard work. :o
We'll remember this in the future, that you can be sure of.

sr. member
Activity: 403
Merit: 250
Compiled it and pushed it live now.
No segfault or errors in the beginning anyway ;)

We'll see in a few hours. I'll post here if anything happens.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
It's a resource leak, and it's not in my changes! It's in the 'CreateThread' function. I guess I use that more heavily than anything else, so I exposed the resource leak. Here's the fix:

--- orig/util.h 2011-06-28 08:28:03.070006598 -0700
+++ new/util.h  2011-06-28 19:51:51.186449938 -0700
@@ -624,7 +624,10 @@ inline pthread_t CreateThread(void(*pfn)
         return (pthread_t)0;
     }
     if (!fWantHandle)
+    {
+        pthread_detach(hthread);
         return (pthread_t)-1;
+    }
     return hthread;
 }

Sorry about the rocky road. This one was just bad luck though. This change should be merged with the official client -- it's an obviously-correct, obviously-safe change.
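
For readers who want to see the leak outside bitcoind, here is a minimal standalone sketch of the same idea. It is not the bitcoind source: `create_thread`, `worker`, and `spawn_many` are hypothetical names, and the integer casts on `pthread_t` assume a Linux-style pthreads implementation, as the patched helper does. A thread that is never joined and never detached keeps its stack and bookkeeping allocated after it exits, so a process that starts many fire-and-forget threads slowly runs out of resources.

```cpp
#include <atomic>
#include <cstdio>
#include <pthread.h>
#include <sched.h>

// Hypothetical stand-in for the patched helper (not the bitcoind source).
// Returns the handle if want_handle is true, (pthread_t)-1 for a
// fire-and-forget thread, or (pthread_t)0 if the thread could not be created.
inline pthread_t create_thread(void* (*fn)(void*), void* arg, bool want_handle)
{
    pthread_t handle;
    int ret = pthread_create(&handle, nullptr, fn, arg);
    if (ret != 0) {
        std::fprintf(stderr, "Error: pthread_create() returned %d\n", ret);
        return (pthread_t)0;
    }
    if (!want_handle) {
        // Nobody will ever pthread_join() this thread, so detach it;
        // otherwise its resources stay allocated after it exits.
        pthread_detach(handle);
        return (pthread_t)-1;
    }
    return handle;
}

static std::atomic<int> g_finished{0};

// Trivial thread body: record that the thread ran, then exit.
static void* worker(void*)
{
    g_finished.fetch_add(1);
    return nullptr;
}

// Starts `count` fire-and-forget threads one after another, waiting for
// each to run before starting the next. Returns how many failed to start.
int spawn_many(int count)
{
    int failures = 0;
    for (int i = 0; i < count; ++i) {
        int before = g_finished.load();
        if (create_thread(worker, nullptr, false) == (pthread_t)0) {
            ++failures;
            continue;
        }
        while (g_finished.load() == before)
            sched_yield();
    }
    return failures;
}
```

With the `pthread_detach` call in place, `spawn_many` can be called with a large count and only about one thread's worth of resources is held at any moment. Removing that call makes every exited thread linger until the process ends.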
sr. member
Activity: 403
Merit: 250
Quote
root@bitcoins:/usr/local/sbin# file bitcoind
bitcoind: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
(Running on a 32-bit debian 6 release, latest stable)

root@bitcoins:/usr/local/sbin# ulimit -u
unlimited

I do have some requests to bitcoind from PHP (php-fpm), but not so many. A few requests per minute - MAX (payouts, some stats which are cached in memcache etc)

EDIT:
Code:
root@bitcoins:/usr/local/sbin# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
It sounds like you either ran out of threads or ran out of address space. :(

Do you have your 'ulimit -u' value set to something unusually low? Or do you have lots of things other than pushpoold that also talk to bitcoind? Is the bitcoind a 32-bit executable or 64-bit? The 'file bitcoind' command will tell you.
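
To make the address-space possibility concrete, here is a rough back-of-the-envelope sketch. On Linux, error 11 from pthread_create() is EAGAIN, which means the thread could not be created for lack of resources. The numbers used here are the 8192 KB default stack size from the 'ulimit -a' output posted in this thread and an assumed 3 GiB of usable user address space, which is a common split for 32-bit Linux but is an assumption about this particular server. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough upper bound on the number of live (or leaked) threads when each
// one reserves a full default-sized stack. This ignores the heap, shared
// libraries and other mappings, so the real limit is lower.
std::uint64_t max_threads_by_address_space(std::uint64_t usable_bytes,
                                           std::uint64_t stack_bytes)
{
    assert(stack_bytes > 0);
    return usable_bytes / stack_bytes;
}
```

With 3 GiB usable and 8 MiB per stack, `max_threads_by_address_space(3ULL << 30, 8ULL << 20)` gives 384. A few hundred threads that are never cleaned up would be enough to exhaust a 32-bit process, which fits the 3052m VIRT that top showed for bitcoind.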

sr. member
Activity: 403
Merit: 250
I think this was from when bitcoind hung:

Quote
sending: addr (31 bytes)
sending: addr (31 bytes)
sending: addr (31 bytes)
sending: addr (31 bytes)
received: addr (61 bytes)
received: addr (31 bytes)
Error: pthread_create() returned 11

*restarted*

Bitcoin version 0.3.23-beta
Default data directory /root/.bitcoin
Bound to port 8333
Loading addresses...
sr. member
Activity: 403
Merit: 250
A couple of minutes ago bitcoind became unresponsive, used up 25-45% of the dedicated 8 cores (2xE5504 quad) and halted.

Code:
Mem:   3085764k total,  1642032k used,  1443732k free,   154552k buffers
Swap:   240632k total,        0k used,   240632k free,   955448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7573 root      20   0 3052m 135m 7584 S   47  4.5  46:18.87 bitcoind
 7646 root      20   0 31180  23m 2116 S   10  0.8  20:21.11 pushpoold
  866 www-data  20   0 23204  14m 1692 S    4  0.5   2:06.18 nginx
  851 nobody    20   0 67752  23m  796 S    2  0.8  21:02.75 memcached
13181 www-data  20   0 25724 6512 3244 S    2  0.2   0:00.02 php5-fpm
13192 root      20   0  2468 1100  796 R    2  0.0   0:00.01 top

Nothing out of the ordinary in dmesg, syslog, or netstat as far as I can see...
EDIT: This is with the ordinary rpc.cpp patch for keep-alive & threading - not the long-polling one.
sr. member
Activity: 403
Merit: 250
Compiled it and started it, seems to run fine so far... I'll keep you updated.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
I believe I found your bug. All builds have been updated. It occurs when a JSON function times out. We try to close the underlying stream to force the HTTP parser to time out, but that doesn't work for SSL, and even if you aren't using SSL, if you compile with SSL support, all your API streams are SSL streams with SSL off. With the bug, when an RPC connection times out in bitcoind and then later times out in the caller, we fault.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
Oh no!

You can run bitcoind in a script, instead of daemon mode. That way, if it dies, it will restart immediately and automatically. That's not great, but pain minimization is important. Basically:

while true; do bitcoind parameters go here; date >> /path/to/crash.log; done

If you do a 'ulimit -c unlimited' before running bitcoind, and make sure you aren't running a service that eats core files (like 'abrt'), you should be able to get a core dump you can analyze. Use 'gdb ' followed by the 'where' command. (Keep hitting enter to get the full output; the last few lines are usually the most important.)
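
The restart loop above can also be sketched as a tiny supervisor program, which may be handy if you want the exit status in the crash log as well as the time. This is only an illustration under assumptions: `run_once` and `supervise` are hypothetical names, and `max_runs` is a safety cap added for the example (the shell loop runs forever).

```cpp
#include <cstdio>
#include <ctime>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs the command once in the foreground and returns its raw wait status,
// or -1 if fork() or waitpid() failed.
int run_once(char* const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

// Restarts the command every time it exits and appends a timestamped line
// to log_path after each exit, like the `date >> crash.log` step.
// Stops after max_runs runs and returns the number of runs performed.
int supervise(char* const argv[], const char* log_path, int max_runs)
{
    int runs = 0;
    while (runs < max_runs) {
        int status = run_once(argv);
        ++runs;
        if (std::FILE* log = std::fopen(log_path, "a")) {
            std::time_t now = std::time(nullptr);
            char stamp[64];
            std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S",
                          std::localtime(&now));
            std::fprintf(log, "%s exited, wait status %d\n", stamp, status);
            std::fclose(log);
        }
        if (status == -1)
            break;  // could not even start the child; stop retrying
    }
    return runs;
}
```

The command must stay in the foreground (not daemon mode) for this to work, same as with the shell loop. The logged status can be decoded with WIFSIGNALED and WTERMSIG to tell a crash from a clean exit.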
sr. member
Activity: 403
Merit: 250
/me cries.

At 23:13 bitcoind died. I'm not sure why; I can't find anything more than this:
Quote
Jun 28 23:09:01 bitcoins /USR/SBIN/CRON[31658]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$($
Jun 28 23:09:35 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:09:58 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:10:01 bitcoins /USR/SBIN/CRON[8590]: (jine) CMD (cd /var/www; php cron.php >> cron.log)
Jun 28 23:10:50 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:11:19 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:12:08 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:12:42 bitcoins pushpoold[988]: mysql pwdb query failed at fetch
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host
Jun 28 23:13:02 bitcoins pushpoold[988]: HTTP request failed: couldn't connect to host

Quote
sending: inv (109 bytes)
received: inv (37 bytes)
  got inventory: tx e9c02552aa536253625a  have
sending: addr (31 bytes)
received: inv (37 bytes)
  got inventory: tx 5a643f15315d475b8136  have
sending: inv (37 bytes)
received: addr (31 bytes)
sending: inv (37 bytes)
received: addr (31 bytes)
received: addr (31 bytes)
sending: addr (31 bytes)
received: addr (31 bytes)
received: inv (37 bytes)
  got inventory: tx e9c02552aa536253625a  have
sending: addr (31 bytes)
received: addr (61 bytes)
AddAddress(88.122.132.102:8333)
received: addr (31 bytes)
received: addr (31 bytes)

(I then restarted it)

Bitcoin version 0.3.23-beta
Default data directory /root/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/root/.bitcoin/database strErrorFile=/root/.bitcoin/db.log
Loaded 440713 addresses
 addresses              2626ms
Loading block index...
LoadBlockIndex(): hashBestChain=000000000000072182e9  height=133738
 block index            5793ms
Loading wallet...
...
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
However (as mentioned before), I believe the proper way of solving the issue is by implementing keepalive on top of the asio [1] pull request (that's what Jeff suggested). I've looked into the issue reported there - turns out to be a trivial fix (send buffer goes out of scope, hence large tx'es fail).
I agree 100%. Using asio would be much better. My changes to implement the HTTP protocol properly with respect to keepalives should work fine with that code. (And my changes to rpc.cpp are definitely not mergeable as they are.)

I'll see what I can do.
ius
newbie
Activity: 56
Merit: 0
Test reports, suggestions, and donations are all welcome.

Great job.

However (as mentioned before), I believe the proper way of solving the issue is by implementing keepalive on top of the asio [1] pull request (that's what Jeff suggested). I've looked into the issue reported there - turns out to be a trivial fix (send buffer goes out of scope, hence large tx'es fail).

Implementing keepalive on top of that should only take a few lines (header handling and timeout), and is probably a better merge candidate. Seems to work for me after fixing the asio patch and adding a quick hack to accept additional requests.

[1]: https://github.com/bitcoin/bitcoin/pull/214
sr. member
Activity: 403
Merit: 250
We're up and running with the new patch.
Some fast stats:

Quote
root@bitcoins:/usr/local/sbin# netstat -an | grep 8332
tcp        0      0 127.0.0.1:8332          0.0.0.0:*               LISTEN
tcp        0    769 127.0.0.1:8332          127.0.0.1:42015         ESTABLISHED
tcp        0      0 127.0.0.1:42015         127.0.0.1:8332          ESTABLISHED

Quote
21:09:31 <@jine> damn, bitcoind is really fast
21:09:35 <@jine> getinfo takes about 0.01 sec now
21:09:39 <@jine> from 0.5s
21:09:45 <@jine> real    0m0.097s

Quote
21:04:56 <@jine> root@bitcoins:/usr/local/sbin# netstat -an | grep TIME | wc -l
21:04:56 <@jine> 833


By the way, here's the promised reward to our hero JoelKatz:
http://blockexplorer.com/tx/263dccc9f3899b4118a8f8568064ba2b417868ecff51329efad83b6446f22fed

I'm holding my breath and watching all the stats live atm... We'll see if this runs stable with our number of connections.
So far - It's rocking.
sr. member
Activity: 406
Merit: 250
Okay, all builds have now been updated. (rpc.diff.txt, bitcoin-3diff.txt, rpc.cpp)

It now does keepalives correctly if the client specified HTTP/1.1 and didn't ask for the connection to be closed, or the client specified HTTP/1.0 and asked for the connection to be kept alive. If this doesn't fix your connection hanging issue, make sure you are sending either "HTTP/1.0" or "Connection: close".

That did it! Thanks!
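
The keep-alive rule quoted above can be sketched as a small decision function. This is a simplified illustration, not the code from the patch: the function names are hypothetical, and it treats the Connection header as a single value, although real clients may send a comma-separated list of tokens.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy, used to compare header values case-insensitively.
static std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// http_version: "HTTP/1.0" or "HTTP/1.1", taken from the request line.
// connection_header: value of the Connection header, empty if absent.
bool should_keep_alive(const std::string& http_version,
                       const std::string& connection_header)
{
    const std::string conn = to_lower(connection_header);
    if (http_version == "HTTP/1.1")
        return conn != "close";       // persistent unless the client opts out
    if (http_version == "HTTP/1.0")
        return conn == "keep-alive";  // closed unless the client opts in
    return false;                     // unknown version: close to be safe
}
```

So a client that cannot handle persistent connections gets the old behaviour by sending either "HTTP/1.0" without a keep-alive request or "Connection: close", which matches the advice in the quoted post.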
sr. member
Activity: 403
Merit: 250
We're pushing this live in 40min. As soon as we have a connection to bitcoind again, I'm paying out the promised 20 BTC.
I seriously love you right now :)

EDIT: To what address would you like to receive the payout? Feel free to PM me.
Or do you have a BitLC account I could credit? The choice is yours :)

/ Jim