I just got my rig set up a few days ago, and the one thing that really bugged me, especially while I was playing around with different settings, was that if a card was unstable and crashed, the only way to fix it was a coldreboot.
A card might only crash once every few days, but when it does you still have to issue a coldreboot. What if you're away from the computer? Think of all those precious coins you could be losing!
Anyway, this script will take care of that for you. I'm using LTCrabbit's customized SMOS-Linux, but it should work for any other similar distro.
First we'll need to make a script. You can use nano, vim, or whatever you prefer; I'll write the tutorial using nano, since if you're a Linux newbie it's probably the easiest way to go. Fire up a root terminal, then run:
nano /root/autoRebooter.sh
Paste the following contents into that file (make sure to edit your targetMinTemp accordingly!!!):
#!/bin/bash
# Set your targeted minimum temp here. The system will issue a cold
# reboot if a card's temp falls below this number.
targetMinTemp=50
i=0
# Log to the home directory of the user with UID 1000.
logFile="/home/$(awk -F: '$3 == 1000 { print $1; exit }' /etc/passwd)/autoRebooter.log"

# Write a dated log line, then reboot. If coldreboot hasn't taken the
# box down after 30 seconds, sync the disks and force a reboot via SysRq.
logAndReboot() {
    echo "$(date +%m-%d-%Y) $(uptime | awk -F, '{sub(".*ge ",x,$1);print $1}') $1" >> "$logFile"
    /sbin/coldreboot &
    sleep 30
    echo s > /proc/sysrq-trigger
    sleep 10
    echo b > /proc/sysrq-trigger
}

# Read the temps in the background so a stuck viewgpu can't hang the script.
(/opt/bamt/viewgpu | awk '{ print $2; }' | cut -c -2 > /tmp/viewgpu) & pid=$!
# Give viewgpu 10 seconds, then kill it if it is still running.
(sleep 10 && kill $pid 2>/dev/null)
sleep 15
array=($(cat /tmp/viewgpu))
if [ ${#array[@]} -eq 0 ]; then
    logAndReboot "viewgpu command failed to run, rebooting"
fi
for temp in "${array[@]}"; do
    if [ "$temp" -lt "$targetMinTemp" ]; then
        logAndReboot "card number $i has stopped, its current temp is $temp, coldrebooting"
    fi
    i=$((i+1))
done
Use ctrl+o to write the file out, then ctrl+x to exit nano.
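If you want to see what the temp check will do before letting it reboot anything, here's a rough dry-run sketch of the same per-card comparison with the reboot swapped out for a plain echo. coldCards is a name I made up for illustration (it's not part of BAMT or SMOS), and the sample temps are invented.

```shell
#!/bin/bash
# Dry-run sketch: coldCards is a hypothetical helper, not part of autoRebooter.sh.
# Usage: coldCards <targetMinTemp> <temp> [<temp> ...]
# Prints the index of every card whose temp is below the target.
coldCards() {
    local targetMinTemp=$1
    shift
    local i=0
    local temp
    for temp in "$@"; do
        if [ "$temp" -lt "$targetMinTemp" ]; then
            echo "$i"
        fi
        i=$((i+1))
    done
}

# Invented sample readings: card 1 has dropped to 38, so this prints 1.
coldCards 50 72 38 69
```

Feed it the temps you normally see while mining; if it prints a card number when everything is healthy, your targetMinTemp is too high.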
Next you'll need to make the script executable:
chmod a+x /root/autoRebooter.sh
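Pasting from a forum can mangle quotes or drop a line, so it may be worth a quick sanity check at this point. This is only a sketch: checkScript is a made-up helper name, and all it does is confirm the file is executable and that bash can parse it (bash -n reads a script without running any of it, so nothing gets rebooted).

```shell
#!/bin/bash
# checkScript is a hypothetical helper, not part of autoRebooter.sh.
# It fails if the file isn't executable or if bash can't parse it.
checkScript() {
    local script=$1
    if [ ! -x "$script" ]; then
        echo "$script is not executable"
        return 1
    fi
    # bash -n only checks syntax; it does not execute the script.
    if ! bash -n "$script"; then
        echo "$script has a syntax error"
        return 1
    fi
    echo "$script looks OK"
}
```

You'd call it as checkScript /root/autoRebooter.sh from the same root terminal.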
Lastly, we'll need to add a cronjob to periodically check in. I set it to run every hour.
crontab -e
Add the following line to the end of the crontab:
0 */1 * * * /root/autoRebooter.sh
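If you'd rather not edit the crontab by hand (say you're setting up several rigs), something along these lines should work too. Treat it as a sketch: addCronLine is a helper name I invented, and it assumes your cron's crontab command accepts a new crontab on stdin via "crontab -", which the common implementations do.

```shell
#!/bin/bash
# addCronLine is a hypothetical helper. It reads an existing crontab on stdin
# and prints it back out, appending the given line only if it isn't there yet.
addCronLine() {
    local line=$1
    local current
    current=$(cat)
    # Echo the existing entries unchanged (skip if the crontab was empty).
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -x matches whole lines, -F treats the cron line as a fixed string.
    if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}
```

Usage would look like: crontab -l 2>/dev/null | addCronLine '0 */1 * * * /root/autoRebooter.sh' | crontab -
Running it twice won't add a duplicate entry.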
ctrl+o to write it out, ctrl+x to exit.
There you go, now you never have to worry about a crashed GPU bringing down your hashrate ever again!
If you found this helpful, I'm currently at a whopping 2.3 LTC and 0.01 BTC, and I would love to have a few fractions more!
LTC: Lhb3yJGPL9dsUZ2tt5KrbNMm3pVmmA1fkb
BTC: 1NKkGEsY5UwkzSmD63yBcJj9hkrS4YWsbX
edit: I've made improvements in case viewgpu gets stuck or coldreboot fails, tested and verified to work!