coin daemon becomes completely unresponsive over time NOMP/MPOS based pool

jonnybravo0311

legendary

Activity: 1344

Merit: 1024

Mine at Jonny's Pool

Hi barrysty1e,

Thanks for the reply. As the thread title states, I'm running NOMP/MPOS. NOMP is the stratum implementation, MPOS is the front end.

I took psycodad's advice and upped my thread count, which seemed to have had some effect; however, it has not solved the problem completely. I've also taken some steps to use better memory allocation (apparently there's an issue with fragmentation and glibc). Using a better memory allocation manager like jemalloc also seems to have had an effect; however, it too has not solved the problem.

Unfortunately, I'm still stuck with doing a restart every day. That keeps the memory manageable and things running relatively smoothly. I've been reading through a number of mail threads and comments on GitHub related to this issue, but so far nobody has produced a working solution.

barrysty1e

hero member

Activity: 636

Merit: 516

just ran into a similar issue, hope it helps. i assume you're using python-stratum instead of mpos (as a dropin); what are your USE_COINDAEMON_DIFF and DIFF_UPDATE_FREQUENCY set to?

i find that enabling USE_COINDAEMON_DIFF with DIFF_UPDATE_FREQUENCY set quite short results in the symptoms you've mentioned. mining starts fine, but while it checks for blocks, it's as if no work is being submitted.

jonnybravo0311

I fully get what you're stating. I can't imagine running a process like a coin daemon without at least some rudimentary monitoring. I guess what I'm getting at is that the answer shouldn't just be "restart it when it acts up" and that's the end of it. Yes, that might be necessary, but it would be far more valuable to figure out why it was acting up in the first place so that the underlying problem can be addressed. It's kind of like just constantly replacing oil in your car when you notice it's low, instead of looking into it to find out your seals are shot or the oil pan is leaking like a sieve.

psycodad

legendary

Activity: 1612

Merit: 1608

精神分析的爸

Quote from: jonnybravo0311 on December 03, 2015, 09:43:13 AM

Will do. I've been trying to avoid writing a "restart coin daemon" script. Yes, I know restarting it helps, and is something a number of people do... but it feels like we're treating the symptoms, rather than trying to find a cure. It also seems counterproductive for a pool to be constantly restarting the coin daemon as that would flush the mempool and render worker shares invalid on the restart... one of those shares might have solved a block.

I see your point; however, I was not proposing a regular restart. I was talking about restarting the coin daemon in case it does not respond anymore (which is a good thing to monitor if you run a pool, even if you are sure it runs stable).
If it is unresponsive and you have a block-winning share, that share is useless/lost anyway. I agree, though, that the ideal setup should run stable on its own and not rely on being restarted regularly, and I am certainly not suggesting to just restart the coind without good reason.
I check all my coinds every 10 minutes and automatically restart them IF they hang; otherwise they just log being ok to syslog.
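For illustration, one pass of such a check could be sketched in Python as below. This is not the actual script (it is not shown in the thread): the name `watchdog_pass` and the `probe` and `restart` callables are hypothetical placeholders, and scheduling the pass every 10 minutes (for example from cron) is an assumption. The `syslog` module is only available on Unix-like systems.

```python
import syslog


def watchdog_pass(name, probe, restart, log=syslog.syslog):
    """Run one check: restart the daemon only if the probe reports a hang.

    name    -- label used in log messages (hypothetical, e.g. "bitcoind")
    probe   -- callable returning True when the daemon answers in time
    restart -- callable that restarts the daemon (left to the operator)
    log     -- callable taking one message string; defaults to syslog
    """
    if probe():
        # Healthy: leave the daemon alone and just record that it is ok.
        log(f"{name}: ok")
        return "ok"
    # Unresponsive: record the event, then restart.
    log(f"{name}: unresponsive, restarting")
    restart()
    return "restarted"
```

Passing the probe and restart actions in as callables keeps the decision logic separate from how a particular coin daemon is queried or restarted, so the same pass can be reused for several daemons.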

HTH

jonnybravo0311

Will do. I've been trying to avoid writing a "restart coin daemon" script. Yes, I know restarting it helps, and is something a number of people do... but it feels like we're treating the symptoms, rather than trying to find a cure. It also seems counterproductive for a pool to be constantly restarting the coin daemon as that would flush the mempool and render worker shares invalid on the restart... one of those shares might have solved a block.

psycodad

Hey jonnybravo0311,

You're welcome. In any case, it shouldn't harm you to increase rpcthreads. getinfo not being responsive rang a bell with me: that was the moment when I wrote my coind watchdog shell script, which automatically restarts a daemon if it does not respond to getinfo within 5s, something that reliably happened after a few hours of mining.
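The "does not respond within 5s" test can be sketched in Python with a subprocess timeout. This is a rough illustration rather than the shell script mentioned above; the helper name `responds_within` is hypothetical, and the command to run (for example `bitcoin-cli getinfo`) is an assumption that depends on the coin and its client version.

```python
import subprocess


def responds_within(cmd, timeout_s=5.0):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # only the exit status matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The call hung past the deadline; run() has already killed the child.
        return False
    except OSError:
        # The client binary could not be started at all.
        return False
    return result.returncode == 0
```

A call such as `responds_within(["bitcoin-cli", "getinfo"], timeout_s=5.0)` would then return False both when the daemon hangs and when the RPC call fails outright, so a caller may want to log the two cases differently before deciding to restart.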

I am curious if it helps, please let us know the outcome.

Cheers

jonnybravo0311

Hi psycodad,

I appreciate the response. I have not set the RPC threads previously, just relying upon the default value. I will give your suggestion a try to see if it helps mitigate the issues.

Thanks

psycodad

Hi,

I have no experience with bitcoind and this is just a shot in the dark, but the below helped me with a similar problem with different alts.
Try adding the following to your bitcoin.conf:

Code:

rpcthreads=XX

and make XX a number between 16 and 50. I use 20 and have been a happy miner ever since, YMMV.

HTH

jonnybravo0311

Hey everyone,

I asked this in the Technical Support forums as well, but thought I'd try my luck here, too. Since a lot of alt coin pools run on the NOMP/MPOS combination, perhaps some of you pool operators have run into a similar issue, and have a solution. Below is the text from my post:

All,

Very recently I have run into an issue with my bitcoin daemon. I am wondering if it might have to do with the pool software I'm running on top of it, and will likely ask this same question if I can find a forum thread for the pool. Here's what I notice:

1) I start up the bitcoin daemon process and all is happy.
2) After about 24 hours, the process becomes completely unresponsive. Typing:

Code:

bitcoin-cli getpeerinfo

Or any other command just hangs without returning.
3) This causes all kinds of havoc on my pool. Website runs exceptionally slow. I get bombarded with mail from the cron jobs saying they're already running. Things like that. When I look at my debug.log, all I see is it trying to create new blocks:

Code:

2015-12-03 12:26:06 CreateNewBlock(): total size 998979
2015-12-03 12:26:12 CreateNewBlock(): total size 998945
2015-12-03 12:26:18 CreateNewBlock(): total size 998993
2015-12-03 12:26:24 CreateNewBlock(): total size 998883
2015-12-03 12:26:30 CreateNewBlock(): total size 998978
2015-12-03 12:26:36 CreateNewBlock(): total size 998837
2015-12-03 12:26:42 CreateNewBlock(): total size 998842
2015-12-03 12:26:48 CreateNewBlock(): total size 998971
2015-12-03 12:26:54 CreateNewBlock(): total size 998913
2015-12-03 12:26:58 keypool reserve 3
2015-12-03 12:26:58 keypool return 3
2015-12-03 12:27:00 CreateNewBlock(): total size 998914

The fact that it's trying to create a new block every 6 seconds seems a bit odd, but looking at the log shows that is pretty consistently happening, even when things are running smoothly.

My only recourse is to kill the process and restart it. Doing so makes everything run smoothly once again. Once restarted, I can see typically "normal" things in the log:

Code:

2015-12-03 13:00:45 CreateNewBlock(): total size 696619
2015-12-03 13:00:48 ERROR: AcceptToMemoryPool: nonstandard transaction: dust
2015-12-03 13:00:48 ERROR: AcceptToMemoryPool: free transaction rejected by rate limiter
2015-12-03 13:00:52 CreateNewBlock(): total size 701939
2015-12-03 13:00:53 ERROR: AcceptToMemoryPool: free transaction rejected by rate limiter
2015-12-03 13:00:56 ERROR: AcceptToMemoryPool: free transaction rejected by rate limiter
2015-12-03 13:00:58 CreateNewBlock(): total size 707323
2015-12-03 13:00:59 ERROR: AcceptToMemoryPool: free transaction rejected by rate limiter
2015-12-03 13:01:03 CreateNewBlock(): total size 719860

By the way, I'm running 0.11.2 compiled and built from source on Ubuntu 15.10. Pool is running MPOS/NOMP. Processes are running on a 4-core box with 8 GB RAM. Looking at CPU usage, bitcoind is at about 150% or more when it is unresponsive. Typically it ranges anywhere from 30% to 80%.

I'm wondering if having a max block size of 1M is causing problems. If you look at the two log snippets I included above, when it was completely unresponsive, all it was doing was trying to create new blocks of max size (or very close to it). When things are running smoothly, the blocks it's trying to create are considerably smaller.
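One way to check this hunch against a longer stretch of debug.log is to extract the CreateNewBlock lines and compare block sizes and the time between attempts for the hung and healthy periods. A minimal Python sketch, assuming the log lines look exactly like the snippets above; `summarize` is a hypothetical helper name:

```python
import re
from datetime import datetime

# Matches lines such as:
# 2015-12-03 12:26:06 CreateNewBlock(): total size 998979
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) CreateNewBlock\(\): total size (\d+)$"
)


def summarize(log_text):
    """Return (count, mean size in bytes, mean seconds between attempts)."""
    stamps, sizes = [], []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip keypool, AcceptToMemoryPool and other lines
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
        sizes.append(int(match.group(2)))
    if not sizes:
        return 0, 0.0, 0.0
    # Gaps between consecutive CreateNewBlock entries, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return len(sizes), sum(sizes) / len(sizes), mean_gap
```

Running `summarize` separately on a window from before a hang and a window from after a restart would show whether the near-1M sizes really line up with the unresponsive periods, or whether they also occur while the daemon is healthy.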

Thanks in advance for any suggestions.
