OK, I have a remote rig inside a self-made metal box, with 12 fans pushing fresh air into it.
The fans are powered by an external PSU that I cannot monitor.
My concern was that the external PSU could fail and the system would overheat and burn, so I have a Raspberry Pi "attached" to the rig
that can read the temperature inside the box and hard reset or hard switch the rig on/off.
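If you haven't used a DS18B20 before, here is how the reading works. Once 1-Wire is enabled on the Pi, the kernel's w1-therm driver exposes each sensor as /sys/bus/w1/devices/28-*/w1_slave. The first line of that file ends in YES when the reading passed its CRC check. The second line ends in t= followed by the temperature in thousandths of a degree Celsius. The script below takes the t= value without looking at the CRC flag.

Here is a minimal sketch of a reader that checks the flag as well. It assumes that standard two-line format. read_ds18b20_int is a hypothetical helper name and is not part of the script:

```bash
#!/bin/bash
# Sketch (hypothetical helper, not part of the script): read one DS18B20
# w1_slave file and print the temperature in whole degrees Celsius,
# truncated like TEMP_INTEGER in the script.
# Returns 1, printing nothing, if the CRC check failed or no t= value was found.
read_ds18b20_int() {
    local file="$1" crc_line temp_line milli
    # Line 1 ends in "YES" only when the reading passed its CRC check
    crc_line=$(head -n 1 "$file") || return 1
    [[ $crc_line == *YES ]] || return 1
    # Line 2 ends in "t=<thousandths of a degree>", e.g. t=23125
    temp_line=$(sed -n '2p' "$file")
    milli=${temp_line##*t=}
    [[ $milli =~ ^-?[0-9]+$ ]] || return 1
    # Bash integer division truncates toward zero, so no bc is needed here
    echo $(( milli / 1000 ))
}

# Example: print every DS18B20 (1-Wire family code 28) without a temporary file
for dev in /sys/bus/w1/devices/28-*; do
    [[ -e $dev/w1_slave ]] || continue
    if t=$(read_ds18b20_int "$dev/w1_slave"); then
        echo "$(basename "$dev"): ${t}ºC"
    else
        echo "$(basename "$dev"): bad reading"
    fi
done
```

Inside the script's sensor loop, a call like this could take over the TEMP_INTEGER computation. A failed read would then return non-zero instead of yielding an empty or bogus value. Whether a bad read should count as "over the threshold" (fail safe) is a judgment call that this sketch leaves open.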
I have written a script for this task that I would like to share.
If the temperature threshold is reached, the script powers off the rig and sends a Telegram alert. It then waits 2 minutes and checks whether the rig responds to ping. If it does, the script sends one last Telegram message, because there shouldn't be an answer to the ping. Either way, it keeps logging the result of a ping to the rig every two minutes.
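The script sends its Telegram alerts with curl in two places, calling the Bot API's sendMessage method. One caveat applies to both: curl -s exits 0 whenever it gets any HTTP response, including an error from Telegram (for example, a wrong API key). A plain exit-code check can therefore report success for a message that was never delivered.

Here is a minimal sketch of a reusable helper. It assumes the same APIKEY and CHATID placeholders the script uses. send_telegram and log_send are hypothetical names:

```bash
#!/bin/bash
# Sketch of a reusable Telegram sender (hypothetical helpers, not from the script).
# APIKEY and CHATID are the same placeholders the script defines; replace or export them.
APIKEY="${APIKEY:-Your telegram api key here}"
CHATID="${CHATID:-Your telegram chat id here}"

# send_telegram TEXT
# Returns curl's exit status: 0 on success, 22 if the Telegram API answered
# with an HTTP error (e.g. a wrong API key), other codes for network errors.
send_telegram() {
    local text="$1"
    # --data-urlencode escapes characters such as &, + and % in the text;
    # --fail turns HTTP 4xx/5xx answers into a non-zero exit code;
    # --max-time keeps a hung connection from blocking the alert forever.
    curl -s --fail --max-time 20 --output /dev/null \
        "https://api.telegram.org/bot${APIKEY}/sendMessage" \
        --data-urlencode "chat_id=${CHATID}" \
        --data-urlencode "text=${text}"
}

# log_send TEXT: send the message and print the same log lines the script uses
log_send() {
    if send_telegram "$1"; then
        echo "Msg sent ok!!"
    else
        local rc=$?   # exit status of send_telegram
        echo "Got a non zero exit code from curl : $rc . The message probably has not reached its destination"
        return "$rc"
    fi
}
```

Because log_send returns curl's status, the ping loop could use it to decide when to set TELEGRAM_PING_WARNING=1. It would also remove the duplicated curl/CODE_CURL blocks.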
The code contains plenty of explanatory comments, which I hope will be useful.
Here are some pictures:
#!/bin/bash
# by: kk003
# Rig name: AyM
# What's this about?
# This script is for rigs that have a Raspberry Pi (a Pi 3 in my case) "attached" with:
# 1. A DS18B20 temperature sensor
# 2. The ability to hard reset the rig from the Raspberry Pi
# 3. The ability to hard power off/power on the rig from the Raspberry Pi
# 4. The Raspberry Pi running ubuntu-mate-16.04.2-desktop-armhf-raspberry-pi
# Hardware notes:
# Mind that the SRD-05VDC-SL-C relay behaves differently from other relays
# Read "The reset bit" and "The poweroff bit" further down in the code for more info
# Helpful documentation:
# http://bernaerts.dyndns.org/linux/75-debian/351-debian-send-telegram-notification
# https://www.ibm.com/developerworks/community/blogs/aixpert/entry/Computer_Room_Temperature_Monitoring_with_a_Raspberry_Pi?lang=en
# https://bitcointalk.org/index.php?topic=1854250.msg22615430#msg22615430
# Script name: check_health_system.sh
# Powers off the rig if the room temperature is too high
# If the room temperature threshold is reached, the script hard resets and then hard powers off the rig. Then it enters a loop
# which keeps it running "forever" until you manually kill it (kill -9 <PID of check_health_system.sh>)
# You can get the PID of this running script like this: ps aux | grep -v grep | grep check_health_system.sh
# You will see two lines with the script name. Once you have solved the temperature problem, kill both using the numbers in the second column
# The loop is needed because otherwise, if the temperature has not dropped by the next check, the rig would be turned on again (keep in mind that the power switch does both jobs, on and off)
# Installation:
# Log in as root, or run su -l from your user
# Save the script in root's home directory and make it executable like this:
# chmod 700 /root/check_health_system.sh
# Run the script from cron every 2 minutes (or whatever interval you want) like this:
# */2 * * * * /root/check_health_system.sh >> /root/temp_ambiente.log
# Log file name: /root/temp_ambiente.log
# Some vars
RIG_NAME="AyM" # Your rig's name
IP_RIG="192.168.0.243" # The IP of the rig "attached" to your Raspberry Pi; the Pi must be able to ping it
TEMP_THRESHOLD=47 # MUST be an INTEGER: the maximum room temperature (ºC) you allow. DON'T set it too low or the rig will be powered off
THRESHOLD_REACHED=0 # 0=threshold NOT reached, 1=threshold reached ALERT!!!!. WARNING: If you set this to 1 here, your rig will be turned off
LF=$'\n' # New line
NOMBRES_DIR="/root/nombres" # A file (not a directory) holding the list of sensor names ("nombres" is Spanish for "names")
# Telegram vars
CHATID="Your telegram chat id here"
APIKEY="Your telegram api key here"
TELEGRAM_TEMP_LOG_FILE="/root/telegram_temp_warning.log"
START_TIME=`date "+%Y-%m-%d %H:%M:%S"`
echo "Start time: " $START_TIME
# Make sure the script is not already running: I only want one instance at a time.
# If it is already running, just exit
if pidof -x "$(basename "$0")" > /dev/null; then
    for p in $(pidof -x "$(basename "$0")"); do
        if [ "$p" -ne $$ ]; then
            echo "$0 Already running. Exiting..."
            echo
            exit
        fi
    done
fi
# Clear the file's content if it exists and is not empty
if [[ -s $TELEGRAM_TEMP_LOG_FILE ]]; then
    > "$TELEGRAM_TEMP_LOG_FILE"
fi
# Get the raspi's hostname
SYSTEM_PI=$(hostname)
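# List the DS18B20 sensors (1-Wire family code 28, so their names start with "28-")
# The pattern is quoted and anchored: an unquoted 28* could be glob-expanded by the shell,
# and as a regex it means "a 2 followed by any number of 8s", i.e. any name containing a 2
ls /sys/bus/w1/devices/ | grep '^28-' > "$NOMBRES_DIR"
N=$(wc -l < "$NOMBRES_DIR")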
echo "Number of sensors : " $N | tee -a $TELEGRAM_TEMP_LOG_FILE
for ((LINEAS=1; LINEAS <= $N ; LINEAS=LINEAS+1))
do
    NOMBRE_FILE=`sed -n -e "${LINEAS}p" $NOMBRES_DIR`
    STRING_TEMP=$(cat /sys/bus/w1/devices/$NOMBRE_FILE/w1_slave | grep t= | cut -d"=" -f2)
    TEMP=$(echo "scale=2; $STRING_TEMP/1000" | bc)
    TEMP_INTEGER=$(echo $TEMP | cut -d. -f1)
    echo -n "Sensor$LINEAS : $NOMBRE_FILE - Temp " | tee -a $TELEGRAM_TEMP_LOG_FILE
    if [[ $TEMP_INTEGER -ge $TEMP_THRESHOLD ]]; then
        echo -n $TEMP | tee -a $TELEGRAM_TEMP_LOG_FILE
        echo -n "ºC" | tee -a $TELEGRAM_TEMP_LOG_FILE
        echo -n " ----> threshold : $TEMP_THRESHOLD" | tee -a $TELEGRAM_TEMP_LOG_FILE
        echo "ºC" | tee -a $TELEGRAM_TEMP_LOG_FILE
        THRESHOLD_REACHED=1 # 1= reached temp limit ALERT!!!!!!
    else
        echo -n $TEMP | tee -a $TELEGRAM_TEMP_LOG_FILE
        echo "ºC" | tee -a $TELEGRAM_TEMP_LOG_FILE
    fi
done
if [[ $THRESHOLD_REACHED -eq 1 ]]; then # 1=reached threshold temp, 0=we are under threshold limit
    # Send a Telegram warning first
    PUBLIC_IP=$(curl -s checkip.dyndns.org | sed -e 's/.*Current IP Address: //' -e 's/<.*$//')
    WARNING="WARNING from $SYSTEM_PI $LF Warning: room temperature limit reached ($TEMP_THRESHOLD ºC) $LF Raspi IP: $PUBLIC_IP $LF Attempting to turn off the rig.$LF"
    CONTENT=$(cat $TELEGRAM_TEMP_LOG_FILE)
    MSG="$WARNING$CONTENT"
    # --data-urlencode escapes characters such as & or + in the text; --fail makes curl exit
    # non-zero when the Telegram API answers with an HTTP error (e.g. a wrong API key)
    curl -s --fail -X POST --output /dev/null "https://api.telegram.org/bot${APIKEY}/sendMessage" --data-urlencode "text=${MSG}" --data-urlencode "chat_id=${CHATID}"
    CODE_CURL=$?
    # Check if the msg went out ok
    if [[ $CODE_CURL -eq 0 ]]; then
        echo "Msg sent ok!!"
    else
        echo "Got a non zero exit code from curl : $CODE_CURL . The message probably has not reached its destination"
    fi
    # I do a hard reset first and then a hard poweroff directly. Don't risk trying a soft poweroff or anything else via software
    # Reset the rig using the relay's channel 2
    # Relays don't all behave the same way
    # With the SONGLE SRD-05VDC-SL-C, writing "0" to the GPIO shorts the reset switch pins and "1" opens the circuit
    # So you must check this before you put it into production
    #
    # The reset bit
    echo "Resetting the rig..."
    GPIO_RESET=23
    gpio -g mode $GPIO_RESET out
    sleep 1
    # Reset the rig
    gpio -g write $GPIO_RESET 0
    sleep 1 # Keep the short circuit for 1 sec
    gpio -g write $GPIO_RESET 1
    sleep 1
    # Power off the rig using the relay's channel 1
    # Relays don't all behave the same way
    # With the SONGLE SRD-05VDC-SL-C, writing "0" to the GPIO shorts the power switch pins and "1" opens the circuit
    # So you must check this before you put it into production
    #
    # The poweroff bit
    echo "Powering off the rig..."
    GPIO_POWER=24
    gpio -g mode $GPIO_POWER out
    sleep 2
    # Press the power switch: this powers the rig off, assuming it is currently powered on
    gpio -g write $GPIO_POWER 0
    sleep 1 # Keep the short circuit for 1 sec
    gpio -g write $GPIO_POWER 1
    # I keep the script in an infinite loop at this point because, if the temp does not drop,
    # the next run would press the power switch again and start the rig up. So this script
    # needs a manual kill, after which cron will automatically run it again
    #
    sleep 120 # I stop here for 2 minutes before entering the loop
    TELEGRAM_PING_WARNING=0 # 0=I have not yet sent a Telegram warning that the rig responds to ping (it shouldn't), 1=the warning has been sent
    while true # Loop forever
    do
        # For testing I do a ping just to make sure the rig is offline and I can see it in the log file later
        ping -n -c 3 -i 2 -W 3 $IP_RIG
        if [[ $? -eq 0 ]]; then
            echo "WARNING: Rig $RIG_NAME is responding to ping. It shouldn't. The rig should be powered off"
            if [[ $TELEGRAM_PING_WARNING -eq 0 ]]; then # I send the Telegram warning only once
                WARNING="WARNING from $SYSTEM_PI $LF WARNING: room temperature limit reached ($TEMP_THRESHOLD ºC). $LF The rig responds to ping and should not. $LF The rig should be powered off. $LF This is the last warning."
                MSG="$WARNING"
                # --data-urlencode escapes the text; --fail reports HTTP errors from the API as a non-zero exit code
                curl -s --fail -X POST --output /dev/null "https://api.telegram.org/bot${APIKEY}/sendMessage" --data-urlencode "text=${MSG}" --data-urlencode "chat_id=${CHATID}"
                CODE_CURL=$?
                # Check if the msg went out ok
                if [[ $CODE_CURL -eq 0 ]]; then
                    echo "Msg sent ok!!"
                    TELEGRAM_PING_WARNING=1
                else
                    echo "Got a non zero exit code from curl : $CODE_CURL . The message probably has not reached its destination"
                fi
            fi
        else
            echo "OK: rig $RIG_NAME is NOT responding to ping. Its current state should be poweroff"
        fi
        CURRENT_TIME=`date "+%Y-%m-%d %H:%M:%S"`
        echo "Current time: " $CURRENT_TIME
        echo "*"
        sleep 120 # 2 minutes delay until the loop starts again
    done
else
    echo -n "Ok, temp is under threshold limit ($TEMP_THRESHOLD"
    echo "ºC)"
fi
END_TIME=`date "+%Y-%m-%d %H:%M:%S"`
echo "End time: " $END_TIME
echo "****"
echo
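The script uses pidof to make sure only one copy runs at a time. That matters because cron keeps starting it every 2 minutes while the alert loop is running. An alternative is to hold an exclusive lock on a file for the lifetime of the script. The sketch below assumes the flock utility from util-linux is available; it is part of a standard Ubuntu install. LOCK_FILE and acquire_single_instance_lock are hypothetical names:

```bash
#!/bin/bash
# Sketch: single-instance guard with flock(1) instead of pidof.
# LOCK_FILE is a hypothetical path; any file writable by root works.
LOCK_FILE="${LOCK_FILE:-/var/lock/check_health_system.lock}"

acquire_single_instance_lock() {
    # Open the lock file on file descriptor 9 for the rest of the script
    exec 9>"$LOCK_FILE" || return 1
    # -n: fail immediately instead of waiting if another instance holds the lock
    flock -n 9
}

# At the top of the script, in place of the pidof block:
# if ! acquire_single_instance_lock; then
#     echo "$0 Already running. Exiting..."
#     exit 0
# fi
```

The kernel releases the lock once the script and any child it started (such as a running sleep 120) have exited, so a leftover lock file can't block later cron runs forever. The same idea also works directly in the crontab line:

`*/2 * * * * flock -n /var/lock/check_health_system.lock /root/check_health_system.sh >> /root/temp_ambiente.log`

With that variant, skipped runs leave no "Already running" line in the log.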