Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind - page 5. (Read 31453 times)

legendary
Activity: 1260
Merit: 1000
Ok, well, with your preview.diff, bitcoind won't start.  Here is the debug.log output:

Bitcoin version 0.3.24-beta
Default data directory /home/xxx/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/home/xxx/.bitcoin/database strErrorFile=/home/xxx/.bitcoin/db.log
Loaded 356873 addresses
 addresses              1032ms
Loading block index...
LoadBlockIndex(): hashBestChain=00000000000002b99ddf  height=136616
 block index            1914ms
Loading wallet...
nFileVersion = 32400
fGenerateBitcoins = 0
nTransactionFee = 0
addrIncoming = 255.255.255.255:8333
fMinimizeToTray = 0
fMinimizeOnClose = 0
fUseProxy = 0
addrProxy = 127.0.0.1:9050
 wallet                  118ms
Done loading
mapBlockIndex.size() = 136636
nBestHeight = 136616
mapKeys.size() = 172
setKeyPool.size() = 101
mapPubKeys.size() = 172
mapWallet.size() = 190
mapAddressBook.size() = 2
Loading addresses from DNS seeds (could take a while)
AddAddress(84.49.174.161:8333)
48 addresses found from DNS seeds
sending: version (85 bytes)
ThreadRPCServer started
ipv4 eth0: x.x.x.x
addrLocalHost = x.x.x.x:8333
ThreadSocketHandler started
ThreadIRCSeed started
ThreadOpenConnections started
ThreadMessageHandler started
trying connection 67.172.181.225:8333 lastseen=-0.1hrs lasttry=-364126.2hrs
IRC :irc.lechat.ir NOTICE AUTH :*** Looking up your hostname...
connected 67.172.181.225:8333
sending: version (85 bytes)



bitcoind exits at this point
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
When you say ready soon do you mean today or within a couple days?  I don't want to start digging around in the code  and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.
1 should be ready today. All that's left is a final audit and testing to make sure I can't break it. There are unlikely to be any significant changes and there's a good chance there will be no changes at all. Testing and reports are very helpful, so don't worry about making a nuisance of yourself. "It worked for me" is extremely helpful because it helps me get closer to the confidence level needed for release. "It didn't work for me" is extremely helpful because I hate to find issues after I release.
legendary
Activity: 1260
Merit: 1000
Cool, I will try it out and see what happens.  You might consider removing the upnp junk in the diff as well, since it's an extra dependency that is a) a pain in the ass and b) completely useless in this application.  I remove it from the makefile before compiling.

Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?

When you say ready soon do you mean today or within a couple days?  I don't want to start digging around in the code  and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.
legendary
Activity: 1596
Merit: 1012
Ahh, okay. I think there's a locking bug in those updates. I'm working on two things:

1) A new 4-diff based on 0.3.24 that includes all the updates that are believed to be safe.

2) A new diff based on 0.3.24 that includes even the getwork pre-compute update that is suspected to be responsible for deadlocks, but hopefully with the deadlock issue fixed.

1 should be ready soon. I still haven't done final auditing of the diff and testing to make sure it works.
I just put up a preview here:
http://davids.webmaster.com/~davids/preview.diff

2 may take a bit longer.
legendary
Activity: 1596
Merit: 1012
Well, I tried a fresh copy of .23 and applied the update and diff4, and I still get the same problem as with somebadger's repository.  The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.
What is this update?
legendary
Activity: 1260
Merit: 1000
Well, I tried a fresh copy of .23 and applied the update and diff4, and I still get the same problem as with somebadger's repository.  The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.

How can I troubleshoot this?
legendary
Activity: 1596
Merit: 1012
I'm sorry for wasting your time, but really appreciate what you have taught me about debugging/troubleshooting these issues.
No problem. That's par for the course. I've made more stupid mistakes that have wasted other people's time than I can count.
newbie
Activity: 20
Merit: 0
This is one of the annoying things about C++ exception handling. The exception was caught, and that obscured the information needed to find the code that generated it.

All we can tell from that is that the RPC code threw an exception. This could be for reasons that really aren't the code's fault, such as running out of memory at a critical point, or (more likely) they could be due to bugs in the code.

One thing you can try -- use the 'up' command until you get to level 7, the 'ThreadRPCServer' call. And type 'print e'. If for some reason that doesn't work, you can try level 6, 'PrintException' and the command 'print pszMessage'.


So it turns out I made a stupid mistake.  As part of a test some days ago, I had created a cron job that stopped and restarted bitcoind on an hourly basis.  It appears that I wasn't waiting long enough for the bitcoind stop command to finish shutting bitcoind down.  Starting bitcoind while the previous instance was still shutting down seems to have caused the core dumps.

I'm sorry for wasting your time, but really appreciate what you have taught me about debugging/troubleshooting these issues.
legendary
Activity: 1596
Merit: 1012
Made the changes, I will see how it goes.

I get this when doing a kill on bitcoind BTW:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl >'
  what():  mutex: Invalid argument

It looks like bitcoind is not exiting cleanly when it gets a SIG to die.
The most likely issue is in this code:

 { // CAUTION: Raising the delay will slow connection accept
     boost::posix_time::time_duration wait_duration = boost::posix_time::millisec(250);
     boost::unique_lock<boost::mutex> lock(mWorkNotification);
     if(!fWorkFound)
         cvWorkNotification.timed_wait(lock, wait_duration); // ** HERE **
 }

I wonder whether I have to enclose that in a try/catch block, or whether it fails to release the mutex if the timed wait is interrupted. I'll try to track it down.

You can safely replace that entire block of code with 'Sleep(250);' if you want. Network latency (the time between when you get a block or transaction and the time you can pass it to neighboring nodes) will be slightly higher -- but still better than without the hub patches. But if nothing else, it would tell me if that block is causing the issue.
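
For reference, here is a minimal sketch of the same timed-wait pattern. It uses the C++11 standard library instead of Boost so that it compiles on its own, which is a substitution and not what the patch does. The variable names mirror the block above; WaitForWork and NotifyWork are hypothetical helpers made up for the illustration. The lock object releases the mutex in its destructor even when an exception propagates, so the try/catch here only keeps an exception from escaping the thread. This does not establish what causes the error on shutdown.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <system_error>

// Names mirror the snippet above; the types are std:: stand-ins (assumption).
std::mutex mWorkNotification;
std::condition_variable cvWorkNotification;
bool fWorkFound = false;

// Hypothetical helper: wait up to `ms` milliseconds for work to be flagged.
// Returns true if work was found, false on timeout or on a locking error.
bool WaitForWork(int ms)
{
    try
    {
        std::unique_lock<std::mutex> lock(mWorkNotification);
        // The predicate overload re-checks fWorkFound after every wakeup,
        // so spurious wakeups do not end the wait early.
        cvWorkNotification.wait_for(lock, std::chrono::milliseconds(ms),
                                    [] { return fWorkFound; });
        bool found = fWorkFound;
        fWorkFound = false; // consume the notification
        return found;
    }
    catch (const std::system_error&)
    {
        // The unique_lock destructor has already released the mutex if it
        // was held; report "no work" instead of letting the thread die.
        return false;
    }
}

// Hypothetical helper: flag that work is available and wake one waiter.
void NotifyWork()
{
    {
        std::lock_guard<std::mutex> lock(mWorkNotification);
        fWorkFound = true;
    }
    cvWorkNotification.notify_one();
}
```

Calling WaitForWork(250) in the loop corresponds to the 250 ms delay above. A thread that calls NotifyWork() wakes the waiter early, which is the latency benefit that a plain Sleep(250) gives up.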
legendary
Activity: 1260
Merit: 1000
Made the changes, I will see how it goes.

I get this when doing a kill on bitcoind BTW:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl >'
  what():  mutex: Invalid argument

It looks like bitcoind is not exiting cleanly when it gets a SIG to die.

legendary
Activity: 1596
Merit: 1012
I put in some debug code, why is this failing?

[2011-07-16 04:22:57.656676] JSON-RPC call failed: (null)
[2011-07-16 04:22:57.656694] submit_work json_rpc_call failed
[2011-07-16 04:22:57.656701] curl: P*, srv.rpc_url: http://127.0.0.1:8332/, srv.rpc_userpass: xxx:xxx, s: {"method": "getwork", "params": [ "000000017389ec1cfb159619a8182312807a8d7a77041d1d835f6377000008f00000000020988cf8191cab5a828a8bd44b73c7310ab418f1c469f9d836d66d8dd6cf139c4e2112071a0abbcf6a64c9c4000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" ], "id":1}

The above submit_work call is what's failing with JoelKatz's patch and update applied.  It works fine without the patch.  This comes from the submit_work() function in msg.c

Nice catch!!!! This is a bug in the code that processes a found block. Fortunately, we are correctly processing the block, so you are getting credit for it. However, we bungle getting the information back to the caller and return nothing. Here's the fix:

--- rpc.cpp~    2011-07-10 04:37:16.000000000 -0700
+++ rpc.cpp     2011-07-15 21:47:58.097050116 -0700
@@ -1461,7 +1461,7 @@ Value getwork(const Array& params, bool
             npblock->vtx[0].vin[0].scriptSig = CScript() << pblock->nBits << CBigNum(nExtraNonce);
             npblock->hashMerkleRoot = npblock->BuildMerkleTree();
 
-            Value ret = CheckWork(npblock, reservekey);
+            ret = CheckWork(npblock, reservekey);
         }
         return ret;
     }
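
The one-line change above is a variable-shadowing fix: writing `Value ret = ...` inside the inner braces declares a second `ret` that goes away at the closing brace, so the outer `ret` that gets returned is never assigned. Here is a minimal sketch of that shape, assuming std::string in place of Value and a made-up CheckWorkStub in place of CheckWork (neither is from rpc.cpp):

```cpp
#include <string>

// Hypothetical stand-in for CheckWork(); always reports success.
std::string CheckWorkStub() { return "true"; }

// Buggy shape: the inner declaration shadows the outer `ret`.
std::string GetWorkShadowed(bool fSubmit)
{
    std::string ret;
    if (fSubmit)
    {
        std::string ret = CheckWorkStub(); // new variable, gone at the brace
        (void)ret;                         // silence the unused-variable warning
    }
    return ret; // outer `ret` was never assigned, so this is empty
}

// Fixed shape: assign to the outer `ret` instead of declaring a new one.
std::string GetWorkFixed(bool fSubmit)
{
    std::string ret;
    if (fSubmit)
    {
        ret = CheckWorkStub();
    }
    return ret;
}
```

GetWorkShadowed(true) returns an empty string while GetWorkFixed(true) returns "true", which matches the symptom of the work being processed but nothing coming back to the caller. Building with g++ -Wshadow reports this kind of redeclaration.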

legendary
Activity: 1260
Merit: 1000
I put in some debug code, why is this failing?

[2011-07-16 04:22:57.656676] JSON-RPC call failed: (null)
[2011-07-16 04:22:57.656694] submit_work json_rpc_call failed
[2011-07-16 04:22:57.656701] curl: P*, srv.rpc_url: http://127.0.0.1:8332/, srv.rpc_userpass: xxx:xxx, s: {"method": "getwork", "params": [ "000000017389ec1cfb159619a8182312807a8d7a77041d1d835f6377000008f00000000020988cf8191cab5a828a8bd44b73c7310ab418f1c469f9d836d66d8dd6cf139c4e2112071a0abbcf6a64c9c4000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" ], "id":1}

The above submit_work call is what's failing with JoelKatz's patch and update applied.  It works fine without the patch.  This comes from the submit_work() function in msg.c

legendary
Activity: 1596
Merit: 1012
This is one of the annoying things about C++ exception handling. The exception was caught, and that obscured the information needed to find the code that generated it.

All we can tell from that is that the RPC code threw an exception. This could be for reasons that really aren't the code's fault, such as running out of memory at a critical point, or (more likely) they could be due to bugs in the code.

One thing you can try -- use the 'up' command until you get to level 7, the 'ThreadRPCServer' call. And type 'print e'. If for some reason that doesn't work, you can try level 6, 'PrintException' and the command 'print pszMessage'.
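
If the debugger cannot show the exception, one alternative is to record what it was at the point where it is caught. A minimal sketch, assuming a made-up DescribeException helper and RunGuarded wrapper (neither is the actual PrintException code from util.cpp):

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical helper: build a description of a caught exception.
// A null pointer means the exception was not derived from std::exception.
std::string DescribeException(const std::exception* pex)
{
    if (!pex)
        return "unknown exception";
    // typeid on a polymorphic object yields its dynamic type; the name
    // is implementation-defined (mangled on GCC).
    return std::string(typeid(*pex).name()) + ": " + pex->what();
}

// Hypothetical wrapper with the same try/catch shape as a thread entry point.
std::string RunGuarded(void (*fn)())
{
    try
    {
        fn();
        return "ok";
    }
    catch (const std::exception& e)
    {
        return DescribeException(&e);
    }
    catch (...)
    {
        return DescribeException(nullptr);
    }
}

// Sample functions for trying it out.
void ThrowRuntimeError() { throw std::runtime_error("rpc failure"); }
void ThrowInt() { throw 42; }
void DoNothing() {}
```

This only tells you the type and message, not where the exception was thrown. To find the throw site in a live process, the gdb command 'catch throw' stops at the point where an exception is raised, before any handler has run.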
newbie
Activity: 20
Merit: 0
Can you paste the code from your 'rpc.cpp' file around line 1897 (say five lines before and five after). And please identify exactly which line is 1897.


Here is a snip from rpc.cpp:
Code:

void ThreadRPCServer(void* parg)
{
    IMPLEMENT_RANDOMIZE_STACK(ThreadRPCServer(parg));
    try
    {
        vnThreadsRunning[4]++;
        ThreadRPCServer2(parg);
        vnThreadsRunning[4]--;
    }
    catch (std::exception& e) {
        vnThreadsRunning[4]--;
        PrintException(&e, "ThreadRPCServer()");
    } catch (...) {
        vnThreadsRunning[4]--;
        PrintException(NULL, "ThreadRPCServer()");
    }
    printf("ThreadRPCServer exiting\n");
}

Line 1897 is: PrintException(&e, "ThreadRPCServer()");



Just to verify that I am doing things right, to create this file I:

Code:
git clone http://github.com/bitcoin/bitcoin/ davids
cd davids
git checkout v0.3.23
cd src
patch  < ~/src/bitcoin/bitcoin-4diff.txt
patch  < ~/src/bitcoin/updates.diff.txt
then built bitcoind normally.
legendary
Activity: 1260
Merit: 1000
Anyone got any ideas what's going on on my end?

Can't really run a backtrace since it's not crashing exactly.
legendary
Activity: 1596
Merit: 1012
Can you paste the code from your 'rpc.cpp' file around line 1897 (say five lines before and five after). And please identify exactly which line is 1897.
newbie
Activity: 20
Merit: 0
Hello,

I grabbed bitcoin v0.3.23 and applied bitcoin-4diff.txt and updates.diff.txt.

This is on 64bit CentOS 5.5.

I am now generating bitcoind cores every hour on the hour.  I am fairly new to troubleshooting this type of thing, but a backtrace shows:

Code:
Core was generated by `/home/bitcoin/bitcoind -testnet -conf=/etc/bitcoin.conf -daemon -pollpidfile=/v'.
Program terminated with signal 6, Aborted.
#0  0x00000039f9e30265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000039f9e30265 in raise () from /lib64/libc.so.6
#1  0x00000039f9e31d10 in abort () from /lib64/libc.so.6
#2  0x00000039fd2bed14 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x00000039fd2bce16 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x00000039fd2bce43 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x00000039fd2bcec5 in __cxa_rethrow () from /usr/lib64/libstdc++.so.6
#6  0x000000000040a40a in PrintException (pex=, pszThread=) at util.cpp:659
#7  0x00000000004adbee in ThreadRPCServer (parg=0x0) at rpc.cpp:1897
#8  0x00000039faa0673d in start_thread () from /lib64/libpthread.so.0
#9  0x00000039f9ed44bd in clone () from /lib64/libc.so.6

my bitcoind command line is:

/usr/local/sbin/bitcoind -conf=/etc/bitcoin.conf -daemon -pollpidfile=/var/run/pushpoold/pushpoold.pid

with bitcoin.conf only containing rpc user and pass.

Does anyone have any suggestions as to what I can do to resolve this issue?

Thanks much,
   btcmonkey

legendary
Activity: 1260
Merit: 1000
It's my understanding Somebadger backed off to .23 in the git repository.  Is that not correct?
hero member
Activity: 630
Merit: 500
I think it's because somebadger is patching against 0.3.24 and JoelKatz started with 0.3.23.  Correct me if I'm wrong.