Topic: How to reboot a rig if it stops mining? (Read 11917 times)

member
Activity: 135
Merit: 10
March 13, 2018, 10:58:52 PM
#36
Wouldn't it be easier if the NiceHash miner just executed a command when a GPU is lost? It knows it lost one when it shuts down and restarts excavator. Just check that you have settings for X GPUs and only X-? are currently available. Hmm. Then execute the user's shell command, which could be an attempt to recover the GPU followed by a reboot, or just a reboot of the system.
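
Here is a rough sketch of that idea in Python. It assumes the miner (or a wrapper around it) can report how many GPUs it currently sees. The function names and the `shutdown /r /t 0` recovery command are my own illustration, not anything NiceHash Miner or excavator provides.

```python
import subprocess

def missing_gpus(configured, detected):
    """Return how many configured GPUs are currently not detected (never negative)."""
    return max(configured - detected, 0)

def on_gpu_check(configured, detected, recovery_command, run=subprocess.call):
    """Run the user's shell command when at least one GPU has gone missing.

    `run` is injectable so the logic can be tried without executing anything.
    Returns True if the recovery command was run.
    """
    if missing_gpus(configured, detected) == 0:
        return False
    run(recovery_command, shell=True)
    return True

if __name__ == "__main__":
    # Hypothetical values: 6 GPUs configured, 5 detected, reboot Windows as recovery.
    # A printing stand-in is used as the runner, so nothing is actually executed.
    on_gpu_check(6, 5, "shutdown /r /t 0",
                 run=lambda cmd, shell: print("would run:", cmd))
```

Passing the runner in as a parameter makes it easy to dry-run the check before letting it reboot a real rig.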
newbie
Activity: 210
Merit: 0
March 13, 2018, 03:30:00 PM
#35
I'm using modded RX480 8GB with Samsung memory and out of 24 cards, 2 on different rigs make problems. Yes, one option is to revert to stock BIOS, but I'm looking for another solution since that happens maybe once per day. Win 10 Pro.

Claymore's Dual Eth+Dcr miner shows 28.4MH/s for all 5 cards in the rig, and randomly, as I said, once per day, one card in the setup starts to drop to 21MH/s and eventually, after a few minutes, to 0MH/s. Then the miner's watchdog tries to restart mining, which goes well, but it gets stuck on the list of GPUs and doesn't start mining. The solution is to manually close the miner and just start it again.

EDIT: To make it clear, the rig doesn't freeze. Only the miner stops mining. So I can manually restart the miner WITHOUT rebooting the rig, but that was one option to restart the miner as well.

Now I'm searching for a solution which would reboot the whole rig (or just kill the miner process and start it again) if the GPU temperature falls below, say, 60°C ... something opposite to the overheating protection :)

Any ideas?

Might not be the ideal solution for you, but my rigs are plugged into the wall via a WiFi socket. If I'm not at home, I check the power consumption once an hour, and if it's below what the rig normally consumes, I know it's not working as it should. I then press OFF in my phone's app, wait 20 seconds, then press ON. I'm using SMOS, which starts mining automatically once you apply power to the rig, so there is nothing else I need to do.

Example:
https://www.tp-link.com/us/products/details/cat-5516_HS110.html
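
If you wanted to automate the same check instead of doing it by hand, the logic could look roughly like this Python sketch. The 0.8 tolerance is an assumption you would tune for your rig, and `switch_off` / `switch_on` are placeholders for whatever your particular socket offers. No real smart-plug API is used here.

```python
import time

def needs_power_cycle(current_watts, normal_watts, tolerance=0.8):
    """True when the measured draw is below `tolerance` times the normal draw."""
    return current_watts < normal_watts * tolerance

def power_cycle(switch_off, switch_on, wait_seconds=20, sleep=time.sleep):
    """Turn the socket off, wait (20 seconds, as in the manual routine), then turn it on."""
    switch_off()
    sleep(wait_seconds)
    switch_on()

if __name__ == "__main__":
    # Hypothetical readings: the rig normally draws 900 W and now reports 300 W.
    if needs_power_cycle(300, 900):
        power_cycle(lambda: print("socket OFF"),
                    lambda: print("socket ON"),
                    sleep=lambda s: print("waiting", s, "seconds"))
```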
newbie
Activity: 25
Merit: 0
March 13, 2018, 02:42:52 PM
#34
If a mining rig crashes, you can use a WiFi plug to force a power cycle.

like these products

Wifi Plug
full member
Activity: 238
Merit: 100
Borderless for the People, Frictionless for the Ba
March 13, 2018, 11:07:46 AM
#33
If your miner has not been mining for more than an hour, the pool will usually email you. You can also apply for SMS.
jr. member
Activity: 50
Merit: 3
Searchin` perfection!
March 13, 2018, 09:27:36 AM
#32
If you need a tool to reset the whole rig, like pushing the power button, you can use my tool: https://bitcointalksearch.org/topic/diy-auto-hard-reset-mining-rigs-with-raspberry-pi-1933467 . It's free, and you only need a Raspberry Pi, a relay and some wires.
jr. member
Activity: 64
Merit: 4
March 13, 2018, 07:55:33 AM
#31
Just a thought regarding fully managing processes on the OS (if Windows): leverage the power of the Sysinternals suite of tools.
Especially for remote tasks, they could be used as an alternative to the "OpenHardwareMonitor" && PowerShell script/method contained in this thread (which works great, btw).

https://docs.microsoft.com/en-us/sysinternals/downloads/pskill

To this day they are as useful as they were back in the day!
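
As a rough illustration, here is how a script could build a PsKill call and then restart the miner, sketched in Python. It assumes `pskill` is on the PATH. The process name `EthDcrMiner64.exe`, the machine name `rig2` and `start.bat` are only examples, so substitute your own.

```python
import subprocess

def pskill_args(process, computer=None, kill_tree=False):
    """Build the argument list for PsKill.

    process:   image name or process ID to kill
    computer:  optional remote machine name (passed with two leading backslashes)
    kill_tree: add -t to also kill the process's descendants
    """
    args = ["pskill"]
    if kill_tree:
        args.append("-t")
    if computer:
        args.append("\\\\" + computer)
    args.append(str(process))
    return args

def restart_miner(process, start_command, run=subprocess.call, start=subprocess.Popen):
    """Kill the miner (and its children), then launch it again."""
    run(pskill_args(process, kill_tree=True))
    start(start_command)

if __name__ == "__main__":
    # Only prints the command line that would be used for a remote rig.
    print(pskill_args("EthDcrMiner64.exe", computer="rig2", kill_tree=True))
```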
newbie
Activity: 5
Merit: 0
February 21, 2018, 09:35:43 PM
#30
Hmmm... I am not having problems there. Have you reviewed this information? https://help.yahoo.com/kb/SLN4724.html
Have you tried port 465 (the SSL port)?
othgeek
newbie
Activity: 4
Merit: 0
February 20, 2018, 03:04:23 PM
#29
newbie
Activity: 4
Merit: 0
February 19, 2018, 12:44:29 PM
#28
I don’t use 2FA for gmail so I’m not sure why it’s not working. I’ll try yahoo and see if that works. Thanks for posting the new script. Sick today but I’ll try it out soon. This could be a game changer for monitoring my rigs.

Email setup is tricky. If you are using two-factor authentication on Gmail, you cannot send an email without a special app password (search for this in your Google account settings). See this link for more information: https://www.digitalocean.com/community/tutorials/how-to-use-google-s-smtp-server. I avoided Gmail for this very reason and use Yahoo. You could easily set up a new Yahoo account just to send you these texts.

othgeek
newbie
Activity: 5
Merit: 0
February 18, 2018, 06:36:14 PM
#27
Below is the updated PowerShell script to monitor for GPUs being too cool (indicating a miner hang) or too hot.
It uses OpenHardwareMonitor, and is based on not.you's original script.

The file, as noted above, should be called MonitorMining.ps1 and reside in the same directory as OpenHardwareMonitor.
Code:
$LogFile     = "MiningMonitor.log"
$verboseLogInfo = $false
$ProcessName = "openhardwaremonitor"
$Date        = Get-Date
$tooLow      = 40     # This is the Celsius temperature you wish to "flag" as an offline GPU
$reportIftooHigh = $true
$tooHigh     = 60           # This temperature triggers a warning text
$wayTooHigh  = 82           # At this temperature or above, the computer is shut down after a single reading
$lowestReportedTemp  = 1000 # Remember to check that HardwareMonitor is reporting in Celsius
$highestReportedTemp = 0
$numIterations = 9          # Set the number of iterations to test before considering a "failed" miner process (too low).
$rebootIfLow = $true        # Set to true to send text and reboot (production) when low temps are found. Set to false to only send text.

# Textual elements
$textBodyReboot = "At $Date, Miner rebooted - GPU temperature dropped to:"
$textBodyWarning = "At $Date, GPU temperature dropped to:"
$textBodyHotWarning = "At $Date, a GPU reached warning temp of:"

# Constants for email message
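$senderEmail = 'sender@example.com'   # placeholder: the address of the account that sends the alerts
$senderPassword = 'yourpassword'
# you must set this for your phone number and carrier. The list is here: http://www.emailtextmessages.com/
$textNumberAndCarrier = '5551234567@carrier.example'   # placeholder: <your number>@<your carrier's email-to-text gateway>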
$textSubject =  'Miner Message';
$smtpServer = 'smtp.mail.yahoo.com'
$port = '587'
$useSsl = $true


if ($rebootIfLow) {
    Write-Host "Monitor will send Message and Reboot if any GPU is consistently below $tooLow"
    $textBody = $textBodyReboot
    }
    else {
    Write-Host "Monitor will send Text Message if any GPU is consistently below $tooLow"
    $textBody = $textBodyWarning
    }
if ($reportIftooHigh) { Write-Host "Monitoring for hot GPUs. Message sent at $tooHigh, Shutdown if any GPU reaches $wayTooHigh" }

# [console]::TreatControlCAsInput = $true
try
{
#Test if openhardwaremonitor is running and if not, start it
if((get-process $ProcessName -ErrorAction SilentlyContinue) -eq $Null)
    { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }
elseif ($verboseLogInfo )
    { echo "OpenHardwareMonitor is running" }

$countOfIterationsLow = 0
$countOfIterationsHigh = 0
$hotGPU = 99
#if the computer just started or OpenHardwareMonitor is slow starting we will have problems, so insert wait here
Start-Sleep 90

For ($i=1; $i -le $numIterations; $i++) {
# Query GPU temperature from OpenHardwareMonitor
$GPUTempObj = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "temperature" -and $_.Identifier -like "*gpu*"}
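    $GPU_Num = 0
    $lowThisIteration  = $false   # becomes true if any GPU reads below $tooLow in this pass
    $highThisIteration = $false   # becomes true if any GPU reads at or above $tooHigh in this pass
    ForEach ($GPU In $GPUTempObj)
    {
        $GPUTemp = $GPU.Value
        if ($GPUTemp -lt $tooLow)
        {
            Write-Host "GPU$GPU_Num is $GPUTemp, below cutoff of $tooLow"
            "$Date - GPU$GPU_Num is $GPUTemp, below cutoff of $tooLow" | Out-File $LogFile -Append
            $lowThisIteration = $true
            if ($GPUTemp -lt $lowestReportedTemp) { $lowestReportedTemp = $GPUTemp }
        }
        else
        {
            Write-Host "GPU$GPU_Num is OK at $GPUTemp degrees (above $tooLow)"
        }
        # The high-temperature checks run inside the loop so that every GPU is tested, not only the last one
        if ($reportIftooHigh)
        {
            if ($GPUTemp -ge $tooHigh)
            {
                Write-Host "GPU$GPU_Num is $GPUTemp, hotter than the cutoff of $tooHigh degrees"
                "$Date - GPU$GPU_Num is $GPUTemp, hotter than cutoff of $tooHigh" | Out-File $LogFile -Append
                $highThisIteration = $true
                if ($GPUTemp -gt $highestReportedTemp) { $highestReportedTemp = $GPUTemp; $hotGPU = $GPU_Num }
            }
            if ($GPUTemp -ge $wayTooHigh) { Stop-Computer -Force }
        }
        $GPU_Num++
    }
    # Count each pass at most once, so the totals can be compared directly with $numIterations
    if ($lowThisIteration)  { $countOfIterationsLow++ }
    if ($highThisIteration) { $countOfIterationsHigh++ }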
         if ($Host.UI.RawUI.KeyAvailable -and (3 -eq [int]$Host.UI.RawUI.ReadKey("AllowCtrlC,IncludeKeyUp,NoEcho").Character))
            {
                Write-Host "You pressed CTRL-C. Do you want to continue Mining Monitor (Y/N)?"
                $key = $Host.UI.RawUI.ReadKey("NoEcho, IncludeKeyDown")
                if ($key.Character -eq "N") { break; }
            }   
           
#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results, so we wait between tests
if ($i -lt $numIterations) {
        Write-Host "- - - - - - - - - - -"
Start-Sleep 20
}
    }

# All tests (the number is set by $numIterations) must report a low result to trigger a restart
if($countOfIterationsLow -ge $numIterations)
     {"$Date - $textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $tooLow - restarting" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = $textSubject
Body       = "$textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $tooLow"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
if ($rebootIfLow) {
Restart-Computer -force
}
     }
elseif ($lowestReportedTemp -lt $tooLow)
     {"$Date - $countOfIterationsLow results below cutoff, lowest was $lowestReportedTemp with report cutoff of $tooLow" | Out-File $LogFile -Append}
elseif ($verboseLogInfo)
    {"$Date - No concerning results, cutoff temperature currently: $tooLow" | Out-File $LogFile -Append}


# All tests (the number is set by $numIterations) reported a GPU at or above the warning temperature
if($countOfIterationsHigh -ge $numIterations) {
     "$Date - GPU$hotGPU reported $highestReportedTemp degrees $countOfIterationsHigh times with warning set to $tooHigh" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = "Hot GPU - $textSubject"
Body       = "$Date - GPU$hotGPU reached $highestReportedTemp degrees $countOfIterationsHigh times. Current warning set to $tooHigh"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
}
}
finally
{
    if ($verboseLogInfo) { "$Date - Mining Monitor Stopped" | Out-File $LogFile -Append }
}

Here is the batch file, which should be in the same directory and can be named anything (well, anything.bat).
Code:
@echo off
set counter=0
:start
cls
set /a counter=counter+1
echo Mining Monitor loop %counter%
powershell.exe .\MonitorMining.ps1
TIMEOUT /T 30
goto start
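
For anyone who wants to reason about the thresholds without rebooting a rig, here is the decision logic as I understand it, boiled down to a small Python sketch. It is a simplification rather than a port: the real script shuts down in the middle of the loop as soon as it sees a reading at or above `$wayTooHigh`, and it can send both a low and a high message, while this sketch evaluates all passes afterwards and returns a single result. The defaults mirror the values in the script above.

```python
def evaluate(passes, too_low=40, too_high=60, way_too_high=82):
    """Decide what the monitor should do after a series of temperature passes.

    passes: list of passes, each a list of GPU temperatures in Celsius.
    Returns one of "shutdown", "reboot", "warn", "ok".
    """
    if any(t >= way_too_high for temps in passes for t in temps):
        return "shutdown"   # a single dangerously hot reading is enough
    low_passes = sum(1 for temps in passes if any(t < too_low for t in temps))
    high_passes = sum(1 for temps in passes if any(t >= too_high for t in temps))
    if passes and low_passes == len(passes):
        return "reboot"     # some GPU was too cool on every pass
    if passes and high_passes == len(passes):
        return "warn"       # some GPU was hot on every pass
    return "ok"

if __name__ == "__main__":
    # Hypothetical readings: one GPU stuck at 30 C for all nine passes.
    print(evaluate([[55, 30]] * 9))
```

Requiring a low reading on every pass is what keeps a short driver recovery or a work restart from triggering a reboot.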
newbie
Activity: 5
Merit: 0
February 18, 2018, 05:52:50 PM
#26
Email setup is tricky. If you are using two-factor authentication on Gmail, you cannot send an email without a special app password (search for this in your Google account settings). See this link for more information: https://www.digitalocean.com/community/tutorials/how-to-use-google-s-smtp-server. I avoided Gmail for this very reason and use Yahoo. You could easily set up a new Yahoo account just to send you these texts.
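
If you want to test your mail settings separately from the monitor, something like this Python sketch can help. It uses only the standard library, connects on port 587 and upgrades the connection with STARTTLS. The addresses are placeholders, and for an account with two-factor authentication the password would be the app password rather than your normal one.

```python
import smtplib
from email.message import EmailMessage

def build_alert(sender, recipient, subject, body):
    """Create the alert message (recipient can be an email-to-text gateway address)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_alert(msg, password, server="smtp.mail.yahoo.com", port=587):
    """Send the message over SMTP with STARTTLS, logging in as the sender."""
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(msg["From"], password)
        smtp.send_message(msg)

if __name__ == "__main__":
    # Placeholder addresses; nothing is sent unless send_alert() is called.
    alert = build_alert("sender@example.com", "5551234567@carrier.example",
                        "Miner Message", "GPU temperature dropped to: 31")
    print(alert)
```

If `send_alert` fails at the login step, the problem is with the account settings and not with the monitoring script.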

othgeek
newbie
Activity: 4
Merit: 0
February 18, 2018, 04:36:11 PM
#25
Awesome, I’ll try it out. When I tested the too-low case it restarted, but I didn’t get an email. Not sure what I’m doing wrong: I used the Gmail SSL info and put it in there, but it didn’t work. Is there some other email setup I need to do?

samsoccer7: The tooHigh trigger is not yet implemented -- that code will be posted in the next few days. Sorry for the teaser.

The ShutDown is a simple process -- look in the code for "Restart-Computer" and replace it with "Stop-Computer".

othgeek
newbie
Activity: 5
Merit: 0
February 18, 2018, 09:34:30 AM
#24
samsoccer7: The tooHigh trigger is not yet implemented -- that code will be posted in the next few days. Sorry for the teaser.

The ShutDown is a simple process -- look in the code for "Restart-Computer" and replace it with "Stop-Computer".

othgeek
newbie
Activity: 4
Merit: 0
February 17, 2018, 11:07:43 PM
#23
full member
Activity: 434
Merit: 107
February 15, 2018, 02:08:22 PM
#22
I'm using modded RX480 8GB with Samsung memory and out of 24 cards, 2 on different rigs make problems. Yes, one option is to revert to stock BIOS, but I'm looking for another solution since that happens maybe once per day. Win 10 Pro.

Claymore's Dual Eth+Dcr miner shows 28.4MH/s for all 5 cards in the rig, and randomly, as I said, once per day, one card in the setup starts to drop to 21MH/s and eventually, after a few minutes, to 0MH/s. Then the miner's watchdog tries to restart mining, which goes well, but it gets stuck on the list of GPUs and doesn't start mining. The solution is to manually close the miner and just start it again.

EDIT: To make it clear, the rig doesn't freeze. Only the miner stops mining. So I can manually restart the miner WITHOUT rebooting the rig, but that was one option to restart the miner as well.

Now I'm searching for a solution which would reboot the whole rig (or just kill the miner process and start it again) if the GPU temperature falls below, say, 60°C ... something opposite to the overheating protection :)

Any ideas?

Seems like you've narrowed down the problem to 2 rigs.
How did you initially tune the cards?
Did you use HWiNFO to set the right clock, core and power settings?
How many memory errors are you getting over an hour / a day?
Might be better if you stabilise these two rigs before adding watchdogs etc.
newbie
Activity: 5
Merit: 0
February 15, 2018, 12:49:36 PM
#21
I have modified not.you's code, cleaning it up a bit and adding the ability to either just text you when your miner has stopped, or text you and automatically reboot. It has been working for about 2 weeks with no false alarms and one reboot that was actually needed.

Save the code below with the file name MonitorMining.ps1
Code:
$LogFile     = "MiningMonitor.log"
$ProcessName = "openhardwaremonitor"
$Date        = Get-Date
$TooLow      = 40   # This is the Celsius temperature you wish to "flag" as an offline GPU
$TooHigh     = 70         # Intended to trigger a warning text (not yet implemented in this version)
$lowestReportedTemp = 1000  # Remember to check that HardwareMonitor is reporting in Celsius
$numIterations = 9        # Set the number of iterations to test before considering a "failed" miner process.
$rebootIfLow = $true     # Set to true to send text and reboot (production) when low temps are found. Set to false to only send text.

# Textual elements
$textBodyReboot = "At $Date, Miner rebooted - GPU temperature dropped to:"
$textBodyWarning = "At $Date, GPU temperature dropped to:"

# Constants for email message
$senderEmail = '[email protected]'
$senderPassword = 'emailpassword'
# you must set this for your phone number and carrier. The list is here: http://www.emailtextmessages.com/
$textNumberAndCarrier = '[email protected]'
$textSubject =  'Miner Message';
$smtpServer = 'smtp.mail.yourprovider.com'
$port = '587'
$useSsl = $true


if ($rebootIfLow) {
    Write-Host "Monitor will send Message and Reboot if Needed"
    $textBody = $textBodyReboot
    }
    else {
    Write-Host "Monitor is in Message Only Mode"
    $textBody = $textBodyWarning
    }

# [console]::TreatControlCAsInput = $true
try
{
#Test if openhardwaremonitor is running and if not, start it
if((get-process $ProcessName -ErrorAction SilentlyContinue) -eq $Null)
    { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }
else
    { echo "OpenHardwareMonitor is running" }

$countOfIterationsLow = 0
#if the computer just started or OpenHardwareMonitor is slow starting we will have problems, so insert wait here
Start-Sleep 120

For ($i=1; $i -le $numIterations; $i++) {
# Query GPU temperature from OpenHardwareMonitor
$GPUTempObj = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "temperature" -and $_.Identifier -like "*gpu*"}
    $GPU_Num = 0
    $lowThisIteration = $false   # becomes true if any GPU reads below $TooLow in this pass
    ForEach ($GPU In $GPUTempObj)
    {
        $GPUTemp = $GPU.Value
        if ($GPUTemp -lt $TooLow)
        {
            Write-Host "GPU$GPU_Num is $GPUTemp, below cutoff of $TooLow"
            "$Date - GPU$GPU_Num is $GPUTemp, below cutoff of $TooLow" | Out-File $LogFile -Append
            $lowThisIteration = $true
            if ($GPUTemp -lt $lowestReportedTemp) { $lowestReportedTemp = $GPUTemp }
        }
        else
        {
            Write-Host "GPU$GPU_Num is OK at $GPUTemp degrees (above $TooLow)"
        }
        $GPU_Num++
    }
    # Count each pass at most once, so the total can be compared directly with $numIterations
    if ($lowThisIteration) { $countOfIterationsLow++ }
         if ($Host.UI.RawUI.KeyAvailable -and (3 -eq [int]$Host.UI.RawUI.ReadKey("AllowCtrlC,IncludeKeyUp,NoEcho").Character))
            {
                Write-Host "You pressed CTRL-C. Do you want to continue Mining Monitor (Y/N)?"
                $key = $Host.UI.RawUI.ReadKey("NoEcho, IncludeKeyDown")
                if ($key.Character -eq "N") { break; }
            }   
           
#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results, so we wait between tests
if ($i -lt $numIterations) {
        Write-Host "- - - - - - - - - - -"
Start-Sleep 20
}
    }

# All tests (the number is set by $numIterations) must report a low result to trigger a restart
if($countOfIterationsLow -ge $numIterations)
     {"$Date - $textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $TooLow - restarting" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = $textSubject
Body       = "$textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $TooLow"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
if ($rebootIfLow) {
Restart-Computer -force
}
     }
elseif ($lowestReportedTemp -lt $TooLow)
     {"$Date - $countOfIterationsLow results below cutoff, lowest was $lowestReportedTemp with report cutoff of $TooLow" | Out-File $LogFile -Append}
else
    {"$Date - No concerning results, cutoff temperature currently: $TooLow" | Out-File $LogFile -Append}
}

finally
{
    "$Date - Mining Monitor Stopped" | Out-File $LogFile -Append
}   

Below are the contents of batch file to start it. I named mine "StartMiningMonitor.bat", but it can be named anything you want
Code:
@echo off
set counter=0
:start
cls
set /a counter=counter+1
echo Mining Monitor loop %counter%
powershell.exe .\MonitorMining.ps1
TIMEOUT /T 30
goto start

NOTE: This script has to be placed in the same folder as OpenHardwareMonitor. If you want to place it somewhere else, the following line has to be modified:
Code:
   { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }

You are free to use and adapt to your needs. I would appreciate improvement suggestions and feedback.

othgeek
newbie
Activity: 1
Merit: 0
January 22, 2018, 04:55:08 PM
#20
newbie
Activity: 7
Merit: 0
September 12, 2017, 02:01:47 AM
#19
Hi all!

I hope I'm not breaking any rule with this.

I am pretty new to mining, so I have one question. If anyone can help me, I would be really happy :)

I have the same question as the creator of this thread, but my MAIN question is:

Can I somehow tell Claymore's miner to restart when the temperature of all GPUs falls to, I don't know, 50?

Is there a command where I can set this, and if there is, can someone please explain how and where to write it?


Thanks a lot for any help!
member
Activity: 112
Merit: 10
September 06, 2017, 06:07:40 AM
#18
If you decrease intensity and still get hangs, try reflashing your custom BIOS on those faulty cards via the command line.
Worked for me on a 480.
newbie
Activity: 1
Merit: 0
September 06, 2017, 03:36:55 AM
#17
As of V9.6, Claymore has a -minspeed option that will reboot the system if the speed hasn't been reached for > 5 mins.

Thanks, didn't know that. Great option if a single GPU crashes.

If the whole system freezes, I'm using the free mining-rig-resetter. Simple, configurable and it just works... Be sure to use Python 2.7.x though.
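
For anyone curious how a minimum-speed watchdog of this kind works in principle, here is a small Python sketch. This is not Claymore's code, just the general idea: remember when the hashrate first dropped below the minimum and trigger once it has stayed there for more than 5 minutes. The class name and the sample numbers are made up for illustration.

```python
class MinSpeedWatch:
    """Track how long the hashrate has stayed below a minimum."""

    def __init__(self, min_speed_mhs, grace_seconds=300):
        self.min_speed_mhs = min_speed_mhs
        self.grace_seconds = grace_seconds
        self.below_since = None   # time of the first sample in the current low streak

    def update(self, speed_mhs, now):
        """Feed one sample (speed in MH/s, time in seconds); return True when a reboot is due."""
        if speed_mhs >= self.min_speed_mhs:
            self.below_since = None   # speed recovered, so the streak is reset
            return False
        if self.below_since is None:
            self.below_since = now
        return now - self.below_since > self.grace_seconds

if __name__ == "__main__":
    # Hypothetical samples: total speed falls from 142 to 113 MH/s and stays there.
    watch = MinSpeedWatch(min_speed_mhs=130)
    for t, speed in [(0, 142.0), (60, 113.0), (240, 113.0), (400, 113.0)]:
        print(t, speed, watch.update(speed, now=t))
```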