And now, I announce the first ever automated block checking and restart script for your masternodes.
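#!/bin/bash
# Restart dashd when the block height stops advancing.

# Print the current block height, or nothing if dash-cli gives no output.
get_blocks() {
    /home/dash/dash-cli getinfo | grep "blocks" | grep -Eo '[0-9]{6,100}'
}

blockold=$(get_blocks)
while true
do
    sleep 1000
    blocknew=$(get_blocks)
    # An empty value makes the test fail, so a dead dashd is restarted too.
    if [ "$blocknew" -gt "$blockold" ] 2>/dev/null
    then
        blockold=$blocknew
    else
        # Keep the tail of the debug log for later inspection.
        tail -n 30 /home/dash/.dash/debug.log >> crash.log
        /home/dash/dash-cli stop    # try a clean shutdown first
        sleep 30
        pkill -9 dashd              # then force-kill anything left
        sleep 10
        /home/dash/dashd            # returns right away only if dash.conf has daemon=1
        sleep 60
        blockold=$(get_blocks)
    fi
done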
This program grabs the current block height, waits 1000 seconds (just under 17 minutes; the longest block I found took about 13 minutes), and checks the height again. If the new height is bigger, it stores the new height as the old one and waits another 1000 seconds. If the new height is not bigger, we assume dashd is locked up. The script first logs the last 30 lines of debug.log and tries to shut down nicely. If that doesn't work, it kills the process. Then it restarts dashd and reads the block height again. This should also restart dashd if dash-cli returns no output (for example, if dashd isn't running).
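The decision described above can be pulled out into two small helpers, which makes the "no output means restart" case explicit. This is only a sketch: the names parse_blocks and is_stalled are made up for illustration, and it assumes getinfo prints a line containing "blocks" followed by the height, as the grep in the script implies.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of the original script.

# Extract the block height from `dash-cli getinfo` output supplied on stdin.
parse_blocks() {
    sed -n 's/.*"blocks"[^0-9]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) when the node looks stalled: either height is missing
# or not a number, or the new height is not greater than the old one.
is_stalled() {
    local old=$1 new=$2 h
    for h in "$old" "$new"; do
        case $h in
            ''|*[!0-9]*) return 0 ;;   # no usable output, e.g. dashd is not running
        esac
    done
    [ "$new" -le "$old" ]
}
```

In the main loop this would be used roughly as: blocknew=$(/home/dash/dash-cli getinfo | parse_blocks), followed by "if is_stalled "$blockold" "$blocknew"; then" and the restart steps.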
For those with more than one masternode on a server: rename dashd to dashd1, dashd2, etc., then use pkill -9 dashd1, pkill -9 dashd2, etc. in the matching copy of the script. If you just use pkill -9 dashd, it will kill every process with dashd in its name. If there is any interest, I could make a script for multiple nodes.
Start from crontab or with screen. Go back a dozen pages for more info on those options.
As of last night, I found 122 recent blocks (since 300000) over 1000 seconds apart.
(ranging from 17 to 40 minutes)
(block number, seconds since last, pasted here):
https://www.zerobin.net/?4ac778770e4c182a#sBpr6QYqHJSVW8tgPgTegOKgsgoVYpYqQ5/9Tvy9U9s=
Your 1000 second sleep isn't long enough. You'll end up restarting unnecessarily.
I'm sure there's better test criteria to use, but I don't have any
suggestions at this time.
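For anyone who wants to repeat that count against a different sleep value, something along these lines should work. It is a rough sketch: count_slow_blocks is a made-up name, and it assumes the input is one "block number, seconds since last" pair per line, separated by a comma and/or spaces.

```shell
#!/bin/bash
# Sketch only: count_slow_blocks is a hypothetical helper.
# $1    = threshold in seconds (e.g. the sleep value in the script)
# stdin = lines of "block_number, seconds_since_last"
count_slow_blocks() {
    awk -F'[, ]+' -v limit="$1" '$2 + 0 > limit { n++ } END { print n + 0 }'
}
```

Usage would be something like: count_slow_blocks 1000 < gaps.txt, where gaps.txt is a placeholder name for a saved copy of the paste. Every block it counts is one the script would have treated as a hang if a check window happened to line up with it.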
Also, during shutoff 'dashd' renames its process to 'dash-shutoff'. I'm not
sure if pkill will pick that up; killall doesn't. I do 'dash-cli stop ;
sleep 20 ; killall -9 dashd dash-shutoff' to allow the daemon time to
shut down and to be sure I've killed the (probably hung-on-closing) dashd
(now named dash-shutoff).
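Wrapped up as a function, that stop sequence might look like the sketch below. The function name stop_dashd and the DASH_CLI and STOP_WAIT variables are made up for illustration; the dash-cli path is the one used in the script earlier in the thread.

```shell
#!/bin/bash
# Sketch only: stop_dashd, DASH_CLI and STOP_WAIT are hypothetical names.
DASH_CLI=${DASH_CLI:-/home/dash/dash-cli}   # path from the script earlier in the thread
STOP_WAIT=${STOP_WAIT:-20}                  # seconds to wait for a clean shutdown

stop_dashd() {
    "$DASH_CLI" stop            # ask the daemon to shut down cleanly
    sleep "$STOP_WAIT"          # give it time to exit
    # Force-kill whatever is left; a closing daemon is named dash-shutoff.
    killall -9 dashd dash-shutoff 2>/dev/null
    return 0                    # killall exits non-zero when nothing matched; that's fine
}
```

Calling stop_dashd would then stand in for the "dash-cli stop / sleep / pkill" lines in the restart branch of the script.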
HTH
BTW, I run several hundred instances of dashd, and haven't found a need to
force-restart due to hangs. Version 12.0.53 has proven very stable for me.
If you are having recurring hangs, check your hosting/build environment.
I use the gitian-built distributed files from
https://www.dashpay.io/downloads/
I use DigitalOcean to host my masternodes
Thanks Moocowmoo. I figured there would be tweaks on this.
It makes sense to change the sleep to 2500 (about 41 minutes). That should be good enough for 99.9% of the blocks. It is still under the 70 minutes after which you get kicked off the pay queue if you don't restart in time. Thanks for the info on killing dash-shutoff.
I haven't had any problems with .53 either. But .51, .49, ... didn't do so well. Usually, they just drop off. When we have new releases, it is a very good idea to have something like this running. Actually, using the block number makes determining whether dashd is actually running much easier and more robust. Instead of looking at this program as a crutch for the current version, think of it as a safety net so you can really throw out innovative and groundbreaking features.
And just like the other restart scripts, you can start it with screen so it keeps running after you log out.
To start with screen, type:
screen -dm /blockcheck.sh
This will run forever, so if you want it to stop, type
screen -ls, get the session number, and use it in place of the 11111 below:
screen -X -S 11111 kill
You could also email yourself whenever a restart happens by adding this command near the last /home/dash/dashd line:
ssmtp [email protected] < /home/dash/m.txt
This assumes you have already set up the script from this post:
https://dashtalk.org/threads/v12-release.5888/page-32#post-64582