Topic: [Avalon] How to automate restarting of Avalon/cgminer when it stops mining?

sr. member
Activity: 266
Merit: 250
Hi all, it seems there is a problem with the cgminer connect timeout when the primary pool's port is in a filtered state. cgminer just fails to start and hangs waiting for the connection to the first pool to time out. I have noticed this a number of times, but I can't set a connect timeout in cgminer or install tools like nmap to check the port, because the image space is already used for other things.

So I coded this "permanent temporary" solution to switch the first pool until it becomes possible to set a connect timeout in the cgminer options. Insert it in /etc/init.d/cgminer after the checks of the user-supplied pool strings.

Please note that you will need another, external host with HTTP + PHP, where you should put simple-portscan.php.
Maybe it's not the best solution, but it's working for me.

Quote
# this is to check the checker host
scan_host="freepc";
scan_host_test=`ping -w 1 -c 1 ${scan_host} | grep "64 bytes from ${scan_host}"`;

# set first pool to working one
if [ -n "$scan_host_test" ]
then
        CHECK_POOL1=`wget "http://${scan_host}/simple-portscan.php?url=${_pool1url}" -q -O - | grep open`;
        CHECK_POOL2=`wget "http://${scan_host}/simple-portscan.php?url=${_pool2url}" -q -O - | grep open`;
        CHECK_POOL3=`wget "http://${scan_host}/simple-portscan.php?url=${_pool3url}" -q -O - | grep open`;

        if [ -z "$CHECK_POOL1" ] && [ -n "$CHECK_POOL2" ]
        then
                POOL1=$POOL2;
                POOL2="";
        else
                if [ -z "$CHECK_POOL1" ] && [ -n "$CHECK_POOL3" ]
                then
                        POOL1=$POOL3;
                        POOL3="";
                fi
        fi
fi

echo "USING FIRST POOL: $POOL1";

simple-portscan.php
Quote
<?php
// Expects ?url=host:port
$url_string = $_GET["url"];
// explode() replaces split(), which was removed in PHP 7
$url_arr = explode(":", $url_string);
$host = $url_arr[0];
$port = $url_arr[1];
$timeout = 1; // seconds

$fp = fsockopen($host, $port, $errno, $errstr, $timeout);
if ($fp)
{
    echo "port " . $port . " open on " . $host . "\n";
    fclose($fp);
}
else
{
    echo "port " . $port . " closed on " . $host . "\n";
}
flush();
?>
sr. member
Activity: 332
Merit: 250
Fixed it by flashing to 3-21, restoring the backup, then flashing to 3-25 with "keep settings".

also fixed the wwan not connecting issue:
- thanks go out to "senseless" and "\\\" from #avalon for this fix:

The IP for wifi CANNOT be set in 192.168.0.xxx, because it will conflict with the Avalon's internal IP, 192.168.0.100.  Your subnet must be 10.x.x.x, or you can change it to 192.168.1.x like I did.

Once I made that change, all the other settings could stay at their defaults and I could connect over local wifi/wwan.
legendary
Activity: 1764
Merit: 1002
Leaving it plugged in does not work... well I haven't tried that actually, maybe I need to use a dummy plug ;)

No this one unit is my problem child for sure.  Now it is getting average 50 GH/s with long idle times of 1 to 3 minutes every hour or so, so it is successfully restarting itself but cgminer is stalling quite often which reduces the average hash rate.  Not sure what the cause is.

I have not opened this case yet, could this problem be caused by a faulty tp-link or by usb hub problems? 

I would definitely check all internal connections.  A number of people have reported cords that came loose or disconnected during shipping.
legendary
Activity: 3080
Merit: 1080
Check the firewall rules - yes the router has iptables rules set. It may be programmed to ignore ssh/telnet connection on the wireless interface.

sr. member
Activity: 332
Merit: 250
It just happened on a different unit as well.  It stalls out, but as soon as you log in to it via a direct connection it hashes again.

BTW, I still can't seem to log in over WWAN; I have to plug in directly with an ethernet cable.
sr. member
Activity: 332
Merit: 250
Leaving it plugged in does not work... well I haven't tried that actually, maybe I need to use a dummy plug ;)

No this one unit is my problem child for sure.  Now it is getting average 50 GH/s with long idle times of 1 to 3 minutes every hour or so, so it is successfully restarting itself but cgminer is stalling quite often which reduces the average hash rate.  Not sure what the cause is.

I have not opened this case yet, could this problem be caused by a faulty tp-link or by usb hub problems? 
hero member
Activity: 607
Merit: 500
I have it connected over ethernet.
After some days I can say the latest firmware, 3.25, is working great. Sometimes I see cgminer restarting,
or the machine rebooting, I can't tell which, but cgminer is never idle. Also, lately the shares are counted correctly, so
utility is back at 990 ;)
legendary
Activity: 1764
Merit: 1002
Still having trouble with 3.25 and the cgminer-monitor script fix both installed.  It happens about once per day with the more troubled units (some have been rock solid since day 1, others seem more trouble-prone).

After running fine for several hours, unit stops mining and goes idle.  When I plug in the LAN cable to check the miner status, it re-starts cgminer and gets right back to it like nothing happened.

So something about the LAN cable being plugged into the ethernet port on the tp-link "wakes up" the system, or perhaps it causes some kind of process to run.  I want to report this as a bug, but I'm not sure if anyone else has had this problem.

Leave it plugged in? :D
sr. member
Activity: 332
Merit: 250
Still having trouble with 3.25 and the cgminer-monitor script fix both installed.  It happens about once per day with the more troubled units (some have been rock solid since day 1, others seem more trouble-prone).

After running fine for several hours, unit stops mining and goes idle.  When I plug in the LAN cable to check the miner status, it re-starts cgminer and gets right back to it like nothing happened.

So something about the LAN cable being plugged into the ethernet port on the tp-link "wakes up" the system, or perhaps it causes some kind of process to run.  I want to report this as a bug, but I'm not sure if anyone else has had this problem.
legendary
Activity: 1764
Merit: 1002
Thanks for the script mills!  Worked like a charm for my perpetually stalling avalon unit.  I can confirm that this is not included in 3.25 firmware, because I updated to 3.25 and that did not fix it, but overwriting the cgminer in usr/bin did

Fantastic
sr. member
Activity: 332
Merit: 250
Thanks for the script mills!  Worked like a charm for my perpetually stalling avalon unit.  I can confirm that this is not included in 3.25 firmware, because I updated to 3.25 and that did not fix it, but overwriting the cgminer in usr/bin did
sr. member
Activity: 266
Merit: 250
I was using the following line just to get number of shares.
Code:
B=`cgminer-api | grep "\[Accepted\]" | cut -f2 -d">" | cut -f2 -d" "`;
(If you replace cut -f2 -d" " with sed "s/ //g", it does essentially the same.)
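
A minimal sketch of the two variants side by side, run against a hard-coded sample line instead of a live miner. The sample text `   [Accepted] => 1234` is an assumption based on the format quoted in this thread, and the function names are made up for illustration; real `cgminer-api` output may look different.

```shell
#!/bin/sh
# Sample line in the format discussed in this thread (assumed, not captured from a miner).
sample='   [Accepted] => 1234'

# Variant 1: keep what follows ">" and take the second space-separated field.
accepted_cut() {
    printf '%s\n' "$1" | grep "\[Accepted\]" | cut -f2 -d">" | cut -f2 -d" "
}

# Variant 2: keep what follows ">" and strip every space.
accepted_sed() {
    printf '%s\n' "$1" | grep "\[Accepted\]" | cut -f2 -d">" | sed "s/ //g"
}

accepted_cut "$sample"
accepted_sed "$sample"
```

Both functions should print the bare share count for this sample. The sed variant is a little more forgiving if the number of spaces after `=>` ever changes.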

But the biggest problem begins when a pool goes offline in a strange way. For example,
nmap shows the pool's connection port as neither closed nor open, but filtered.
cgminer then gets stuck for a very long time on startup trying to test this pool, and
I can't configure the pool test timeout values in the cgminer options. It seems the only options
are to build my own image with better timeout values or to add an option to cgminer.
sr. member
Activity: 388
Merit: 250
You should check the status of cgminer.
You will see a restart of cgminer each time the cron job runs.
full member
Activity: 155
Merit: 100
Quasi fixed the issue with the miner quitting. The cgminer-monitor script has an error in it which writes out "   [ACCEPTED] => X" in the file it's comparing against "[ACCEPTED] => X". These extra spaces caused the files to not match which causes the script to think that cgminer is still mining correctly. This script below removes all spaces from the files when they are created and makes the checking accurate. Replace the contents of /usr/bin/cgminer-monitor with the script below and the cron job should once again be able to properly reset cgminer when it stops mining.



#!/bin/sh
# This file is for cron job

C=`pidof cgminer | wc -w`
if [ "$C" != "1" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi

A=`cat /tmp/cm.log | sed "s/ //g"`
B=`cgminer-api  | grep "^   \[Accepted\]" | sed "s/ //g"`
echo $B > /tmp/cm.log
if [ "$A" == "$B" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi


This is a good catch.  I've changed mine like this as well and will see if this does the trick.  Thanks!
This will restart cgminer each time the cron job runs.

Not true. It's working as intended and has actually saved me twice today on one of my Avalons. The first part of the script stops and starts cgminer if it can't detect a pid for it. The second part compares the accepted shares from five minutes ago (if that's how your cron is scheduled) to the current count. If they differ, it's assumed that everything is working. If they're the same, it's assumed the miner has stalled but not quit, and a restart is initiated. The only difference between my script and the one already in there is the sed commands that remove spaces from the values being written out and compared, so there is no false negative.
+1
newbie
Activity: 30
Merit: 0
This will restart cgminer each time the cron job runs.

Not true. It's working as intended and has actually saved me twice today on one of my Avalons. The first part of the script stops and starts cgminer if it can't detect a pid for it. The second part compares the accepted shares from five minutes ago (if that's how your cron is scheduled) to the current count. If they differ, it's assumed that everything is working. If they're the same, it's assumed the miner has stalled but not quit, and a restart is initiated. The only difference between my script and the one already in there is the sed commands that remove spaces from the values being written out and compared, so there is no false negative.
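
To make the second check concrete, here is a small sketch of just the compare step, with the miner and the log file left out so it can be run anywhere. The function names `normalize` and `needs_restart` and the sample share counts are hypothetical; only the idea (strip spaces, then compare the previous line to the current one) comes from the script in this thread.

```shell
#!/bin/sh
# Strip all spaces so leading whitespace cannot cause a mismatch.
normalize() {
    printf '%s\n' "$1" | sed "s/ //g"
}

# Succeeds (exit status 0) when the accepted-share line has not changed
# between two runs, which the monitor treats as "stalled".
needs_restart() {
    prev=$(normalize "$1")
    curr=$(normalize "$2")
    [ "$prev" = "$curr" ]
}

# Same count as last run, different leading whitespace: treated as stalled.
if needs_restart '[Accepted] => 100' '   [Accepted] => 100'; then
    echo "stalled: restart cgminer"
fi

# Count went up since last run: treated as healthy.
if ! needs_restart '[Accepted] => 100' '   [Accepted] => 117'; then
    echo "still hashing: leave it alone"
fi
```

How quickly a stall is caught depends entirely on how often cron runs the monitor; with a five-minute schedule, the miner can sit idle for up to five minutes before the unchanged count is noticed.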
sr. member
Activity: 388
Merit: 250
This will restart cgminer each time the cron job runs.
legendary
Activity: 3080
Merit: 1080
Hmm, so I guess that is indeed the fix. The latest testing firmware includes this fix:

http://downloads.qi-hardware.com/people/xiangfu/avalon/next-testing/

I think I shall wait until it's officially released out of the testing phase before updating. For now I've noticed no restarts.

The latest testing firmware does not include this fix.

"Update cgminer-monitor, fix [Accept] give null at the first few seconds of cgminer start
Fix a typo on /usr/bin/cgminer-monitor, which make it cannot restart cgminer when no Accept"

I understood that mention to mean that they fixed it, but to be honest I did not look at the code in the "NEXT" firmware.

There is a mention on how to fix the cgminer-monitor script for those that updated to 2013/03/21

Code:
sed -i 's/ $B / "$B" /' /usr/bin/cgminer-monitor
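
As far as I can tell, that one-liner turns `echo $B > /tmp/cm.log` into `echo "$B" > /tmp/cm.log`. A minimal sketch of why the quotes matter, using a made-up sample value (only the leading spaces, taken from the grep pattern in the script, are important here):

```shell
#!/bin/sh
# Hypothetical sample; the three leading spaces mirror the pattern "^   \[Accepted\]".
B='   [Accepted] => 42'

unquoted=$(echo $B)     # word splitting drops the leading spaces
quoted=$(echo "$B")     # quoting keeps the line exactly as it was

if [ "$unquoted" = "$B" ]; then echo "unquoted: same"; else echo "unquoted: differs"; fi
if [ "$quoted" = "$B" ]; then echo "quoted: same"; else echo "quoted: differs"; fi
```

With the unquoted form, the line saved to /tmp/cm.log can never equal the freshly grepped line, so the `"$A" == "$B"` test never matches and the restart never fires. Quoting `$B` and stripping spaces on both sides with sed are two routes to the same result.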

hero member
Activity: 607
Merit: 500
How about the wrong share statistics in the 'Cgminer Status' window and the silly utility of 15?
Also, the need to enter the password whenever you change tabs is annoying :)
legendary
Activity: 1890
Merit: 1003
I am running the test firmware (3/25/2013 Next) as well. Nothing seems amiss at the moment.
full member
Activity: 155
Merit: 100
Hmm, so I guess that is indeed the fix. The latest testing firmware includes this fix:

http://downloads.qi-hardware.com/people/xiangfu/avalon/next-testing/

I think I shall wait until it's officially released out of the testing phase before updating. For now I've noticed no restarts.

The latest testing firmware does not include this fix.
full member
Activity: 155
Merit: 100
This is a good catch.  I've changed mine like this as well and will see if this does the trick.  Thanks!
legendary
Activity: 3080
Merit: 1080
Hmm, so I guess that is indeed the fix. The latest testing firmware includes this fix:

http://downloads.qi-hardware.com/people/xiangfu/avalon/next-testing/

I think I shall wait until it's officially released out of the testing phase before updating. For now I've noticed no restarts.
legendary
Activity: 3080
Merit: 1080
Ok, I'd like to hear from BitSyncom as to which is the proper content to have in the cron job. At the moment I have this:

Code:
#!/bin/sh
# This file is for cron job

C=`pidof cgminer | wc -w`
if [ "$C" != "1" ]; then
        /etc/init.d/cgminer stop
        /etc/init.d/cgminer start
        exit 0;
fi

A=`cat /tmp/cm.log`
B=`cgminer-api  | grep "^   \[Accepted\]"`
echo $B > /tmp/cm.log
if [ "$A" == "$B" ]; then
        /etc/init.d/cgminer stop
        /etc/init.d/cgminer start
        exit 0;
fi


Is the code Mills00013 posted what we should have?
newbie
Activity: 30
Merit: 0
Quasi fixed the issue with the miner quitting. The cgminer-monitor script has an error in it which writes out "   [ACCEPTED] => X" in the file it's comparing against "[ACCEPTED] => X". These extra spaces caused the files to not match which causes the script to think that cgminer is still mining correctly. This script below removes all spaces from the files when they are created and makes the checking accurate. Replace the contents of /usr/bin/cgminer-monitor with the script below and the cron job should once again be able to properly reset cgminer when it stops mining.



#!/bin/sh
# This file is for cron job

C=`pidof cgminer | wc -w`
if [ "$C" != "1" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi

A=`cat /tmp/cm.log | sed "s/ //g"`
B=`cgminer-api  | grep "^   \[Accepted\]" | sed "s/ //g"`
echo $B > /tmp/cm.log
if [ "$A" == "$B" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi
legendary
Activity: 1610
Merit: 1000
I had the same settings/firmware as you; at 1 day 23h it stopped hashing.
Stopping and starting cgminer fixed it for about 3h-5h.
After a reset it seems to go on until it stops again.
It is as if no new work is submitted to the worker.  The Alive tab shows 0 MH/s.
I get random results for when the hashing stops:
at 6h
at 4h
at about 3h
after 13h

It seems that maybe the monitor cron job is going in a loop; the only thing in the log is about the monitor job.
Thorvald

The cron job is not looping. A power off/on is sometimes needed to recover.
PS: latest FW, 1 day 12h 42m 04s uptime so far.
sr. member
Activity: 388
Merit: 250
I had the same settings/firmware as you; at 1 day 23h it stopped hashing.
Stopping and starting cgminer fixed it for about 3h-5h.
After a reset it seems to go on until it stops again.
It is as if no new work is submitted to the worker.  The Alive tab shows 0 MH/s.
I get random results for when the hashing stops:
at 6h
at 4h
at about 3h
after 13h

It seems that maybe the monitor cron job is going in a loop; the only thing in the log is about the monitor job.
Thorvald
legendary
Activity: 1176
Merit: 1001
Has anyone experienced a hard crash that required a manual machine reboot to fix? Or is respawning cgminer enough?
legendary
Activity: 3080
Merit: 1080
How often does this happen? I too am using the latest firmware but for me it's stable. 1 day and 13 hours uptime.
With the latest firmware, it only happened once, around the 23 hour mark, but when it does happen it can be costly.

Ok, that is odd. As far as I am aware the latest firmware is supposed to fix this issue. Perhaps there is something peculiar to your unit alone that causes the bug to still manifest itself.

Also, did you make sure you're really running the latest firmware, 3/21/2013?

Ssh into the box and go into /etc
and then run: "cat avalon_version"
it should read:

20130321
cgminer-7c1428a
luci-46afd4a
openwrt-package-10ee304
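
If you would rather script that check than eyeball it, something like the sketch below could work. It assumes the firmware date is the first line of the file, as in the listing above; the function name `check_fw` is made up for illustration.

```shell
#!/bin/sh
# Compare the first line of a version file against an expected firmware date.
# Usage: check_fw FILE EXPECTED_DATE
check_fw() {
    file=${1:-/etc/avalon_version}
    want=$2
    # Read only the first line; a missing file leaves "have" empty.
    have=$(head -n 1 "$file" 2>/dev/null)
    if [ "$have" = "$want" ]; then
        echo "firmware $have: ok"
    else
        echo "firmware ${have:-unknown}: expected $want"
        return 1
    fi
}
```

On the unit itself this would be called as `check_fw /etc/avalon_version 20130321`; a non-zero exit status means the box is on some other build or the file could not be read.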

Mine is still going, 1 day 18 hrs 22 min

I don't think this should make any difference but I have mine set to Failover mode. It will mine on btcguild and if that fails it will switch over to a backup pool. BTCguild with vardiff setting at 32 (I debated increasing it to 64 but I don't think it will make that huge of a difference).

sr. member
Activity: 388
Merit: 250
Hello, below you have the content of the monitor script from the latest version, 321.
This will not fix the not-hashing issue.

#!/bin/sh
# This file is for cron job

C=`pidof cgminer | wc -w`
if [ "$C" != "1" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi

A=`cat /tmp/cm.log`
B=`cgminer-api  | grep "^   \[Accepted\]"`
echo $B > /tmp/cm.log
if [ "$A" == "$B" ]; then
   /etc/init.d/cgminer stop
   /etc/init.d/cgminer start
   exit 0;
fi
legendary
Activity: 1610
Merit: 1000
Dude,

This is my way and it works 100%
https://bitcointalksearch.org/topic/m.1603942
However, there is a chance that the problem is fixed with the latest Avalon FW: no restarts since the upgrade, 1 day and 10 hours ago.
But we can only know for sure once uptime reaches at least a week.
hero member
Activity: 607
Merit: 500
It's the same for me; it stopped after 4h.
Do you have the network pool connections set to failsafe or balanced?
I have it set to "failover". In "balance", cgminer used all 3 pools at the same time! (I wonder if this was a bug or the pool's responsibility at the time.)

Edit: forget it, this is how balance works (I am sooooo newbie :p )
Do you think that balance or load balance is better for the Avalon, even if I choose a zero-fee PPS first pool and the rest have 2% fees?
Is the first pool's lag worse than the fees of the other 2 pools?!
legendary
Activity: 1890
Merit: 1003
From time to time, I see that it stops hashing.  I would like to automate detection and restart and was wondering if anyone had any pointer.
Thanks in advance!

Update to latest firmware that was released recently, which fixes some "stuck" issue and will continue to automatically restart.

https://en.bitcoin.it/wiki/Avalon#20130321

or you can try the NEXT firmware which is in testing.
Is there a change log for the NEXT firmware?

By the way, it is running very well (3/21/2013).

Edit: My only complaint (not a serious one either) is that the web interface tends to timeout quite often if you are refreshing it frequently.
sr. member
Activity: 388
Merit: 250
It's the same for me; it stopped after 4h.
Do you have the network pool connections set to failsafe or balanced?
hero member
Activity: 607
Merit: 500
The same for me, also after 21-22 hours. cgminer just stopped, with the fans going to full and then to low periodically; that is how
I noticed it. Then it needs a reboot :)
full member
Activity: 155
Merit: 100
How often does this happen? I too am using the latest firmware but for me it's stable. 1 day and 13 hours uptime.
With the latest firmware, it only happened once, around the 23 hour mark, but when it does happen it can be costly.
legendary
Activity: 3080
Merit: 1080
How often does this happen? I too am using the latest firmware but for me it's stable. 1 day and 13 hours uptime.
full member
Activity: 155
Merit: 100
Thanks for the reply, BitSyncom. I am using the 20130321 firmware, and, yes, I noticed the system restart (fan noise), but it still stayed idle.  I will definitely give the next firmware update a try when it comes out.
sr. member
Activity: 336
Merit: 251
Avalon ASIC Team
From time to time, I see that it stops hashing.  I would like to automate detection and restart and was wondering if anyone had any pointer.
Thanks in advance!

Update to latest firmware that was released recently, which fixes some "stuck" issue and will continue to automatically restart.

https://en.bitcoin.it/wiki/Avalon#20130321

or you can try the NEXT firmware which is in testing.
full member
Activity: 155
Merit: 100
From time to time, I see that it stops hashing.  I would like to automate detection and restart and was wondering if anyone had any pointer.
Thanks in advance!