Topic: How to reboot a rig if it stops mining? (Read 11968 times)

member
Activity: 135
Merit: 10
March 13, 2018, 09:58:52 PM
#36
Wouldn't it be easier if the NiceHash miner executed a command when a GPU is lost? It knows it lost one when it shuts down and restarts excavator: it just has to see that you have settings for X GPUs and only some of them are currently available. Hmm. Then it could execute the user's shell command, which could attempt to recover the GPU and then reboot, or just reboot the system.
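
The idea above boils down to comparing the configured GPU count with the number currently detected and running a user-supplied command on a shortfall. A minimal Python sketch of that hook follows. It is not NiceHash code; the function name `on_gpu_lost`, its parameters, and the injectable `run` callable are all hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def on_gpu_lost(expected: int, detected: int, command: Sequence[str],
                run: Callable[[Sequence[str]], object] = subprocess.run) -> bool:
    """Run the user's recovery command when fewer GPUs are detected than configured.

    Returns True if the command was run, False if all GPUs are present.
    """
    if detected >= expected:
        return False
    run(command)  # e.g. a recovery script, or a reboot as the last resort
    return True


if __name__ == "__main__":
    # Demo with a stand-in runner that only prints, so nothing is rebooted.
    on_gpu_lost(expected=6, detected=5,
                command=["shutdown", "/r", "/t", "0"],
                run=lambda cmd: print("would run:", " ".join(cmd)))
```

Passing the runner in as a parameter keeps the check easy to dry-run before pointing it at a real reboot command.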
newbie
Activity: 210
Merit: 0
March 13, 2018, 02:30:00 PM
#35
I'm using modded RX480 8GB with Samsung memory and out of 24 cards, 2 on different rigs make problems. Yes, one option is to revert to stock BIOS, but I'm looking for another solution since that happens maybe once per day. Win 10 Pro.

Claymore's Dual Eth+Dcr miner shows 28.4MH/s for all 5 cards in the rig, and randomly, as I said, once per day, one card in the setup starts to drop to 21MH/s and eventually after few minutes 0MH/s. Then miner's watchdog tries to restart mining which goes well, but is stuck on the list of GPU's and doesn't start mining. The solution is to manually close the miner and just start it again.

EDIT: To make it clear, the rig doesn't freeze. Only the miner stops mining. So I can manually restart the miner WITHOUT rebooting the rig, but that was one option to restart the miner as well.

Now I'm searching for a solution which would reboot the whole rig (or just kill the miner process and start it again) if the GPU temperature falls below say 60°C ... something opposite to the overheating protection :)

Any ideas?

This might not be the ideal solution for you, but my rigs are plugged into the wall via a Wi-Fi socket. If I'm not at home, I check the power consumption once an hour, and if it's below what the rig normally draws, I know it's not working as it should. I then press OFF in my phone's app, wait 20 seconds, then press ON. I'm using SMOS, which starts mining automatically once you apply power to the rig, so there is nothing else I need to do.

Example:
https://www.tp-link.com/us/products/details/cat-5516_HS110.html
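
The manual routine above (check the draw, switch off, wait 20 seconds, switch on) could in principle be automated. The Python sketch below shows only the decision and the off/wait/on sequence. `read_watts` and `switch` are hypothetical callables standing in for whatever interface the smart plug exposes, and the 80% threshold is an assumption, not a value from the post.

```python
import time
from typing import Callable


def power_cycle_if_idle(read_watts: Callable[[], float],
                        switch: Callable[[bool], None],
                        normal_watts: float,
                        tolerance: float = 0.8,
                        off_seconds: float = 20.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Power-cycle the rig when its draw falls below tolerance * normal_watts.

    read_watts and switch are hypothetical wrappers around the plug's own API.
    Returns True if a power cycle was performed.
    """
    if read_watts() >= normal_watts * tolerance:
        return False          # consumption looks normal, leave the rig alone
    switch(False)             # cut power
    sleep(off_seconds)        # same 20 second pause as the manual routine
    switch(True)              # restore power; the rig must boot and mine on its own
    return True
```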
newbie
Activity: 25
Merit: 0
March 13, 2018, 01:42:52 PM
#34
If the mining rig crashes, you can use a Wi-Fi plug to force a power cycle.

For example, products like this:

Wifi Plug
full member
Activity: 238
Merit: 100
Borderless for the People, Frictionless for the Ba
March 13, 2018, 10:07:46 AM
#33
If your miner has not been mining for more than an hour, the pool will usually email you. You can also apply for SMS.
jr. member
Activity: 50
Merit: 3
Searchin` perfection!
March 13, 2018, 08:27:36 AM
#32
If you need a tool to reset the whole rig, like pushing the power button, you can use my tool: https://bitcointalksearch.org/topic/diy-auto-hard-reset-mining-rigs-with-raspberry-pi-1933467 (it's free; you need only a Raspberry Pi, a relay, and some wires).
jr. member
Activity: 64
Merit: 5
March 13, 2018, 06:55:33 AM
#31
Just a thought regarding managing processes on the OS (if Windows): leverage the Sysinternals suite of tools, especially for remote tasks. It could be used as an alternative to the OpenHardwareMonitor and PowerShell script method described in this thread (which works great, by the way).

https://docs.microsoft.com/en-us/sysinternals/downloads/pskill

These utilities are as useful today as they were back in the day.
newbie
Activity: 5
Merit: 0
February 21, 2018, 08:35:43 PM
#30
Hmmm... I am not having problems there. Have you reviewed this information: https://help.yahoo.com/kb/SLN4724.html ?
Have you tried port 465 (the implicit-SSL port) instead of 587?
othgeek
newbie
Activity: 4
Merit: 0
February 20, 2018, 02:04:23 PM
#29
newbie
Activity: 4
Merit: 0
February 19, 2018, 11:44:29 AM
#28
I don’t use 2FA for gmail so I’m not sure why it’s not working. I’ll try yahoo and see if that works. Thanks for posting the new script. Sick today but I’ll try it out soon. This could be a game changer for monitoring my rigs.

newbie
Activity: 5
Merit: 0
February 18, 2018, 05:36:14 PM
#27
Below is the updated PowerShell script to monitor for GPUs being too cool (indicating a miner hang) or too hot.
It uses OpenHardwareMonitor and is based on not.you's original script.

The file, as noted above, should be named MonitorMining.ps1 and reside in the same directory as OpenHardwareMonitor.
Code:
$LogFile     = "MiningMonitor.log"
$verboseLogInfo = $false
$ProcessName = "openhardwaremonitor"
$Date        = Get-Date
$tooLow      = 40     # This is the Celsius temperature you wish to "flag" as an offline GPU
$reportIftooHigh = $true
$tooHigh     = 60           # This temperature triggers a warning text
$wayTooHigh  = 82           # This temperature and above shuts down the computer (Stop-Computer) if reached on ONE occasion
$lowestReportedTemp  = 1000 # Remember to check that HardwareMonitor is reporting in Celsius
$highestReportedTemp = 0
$numIterations = 9          # Set the number of iterations to test before considering a "failed" miner process (too low).
$rebootIfLow = $true        # Set to true to send text and reboot (production) when low temps are found. Set to false to only send text.

# Textual elements
$textBodyReboot = "At $Date, Miner rebooted - GPU temperature dropped to:"
$textBodyWarning = "At $Date, GPU temperature dropped to:"
$textBodyHotWarning = "At $Date, a GPU reached warning temp of:"

# Constants for email message
$senderEmail = '[email protected]'
$senderPassword = 'yourpassword'
# you must set this for your phone number and carrier. The list is here: http://www.emailtextmessages.com/
$textNumberAndCarrier = '[email protected]'
$textSubject =  'Miner Message';
$smtpServer = 'smtp.mail.yahoo.com'
$port = '587'
$useSsl = $true


if ($rebootIfLow) {
    Write-Host "Monitor will send Message and Reboot if any GPU is consistently below $tooLow"
    $textBody = $textBodyReboot
    }
    else {
    Write-Host "Monitor will send Text Message if any GPU is consistently below $tooLow"
    $textBody = $textBodyWarning
    }
if ($reportIftooHigh) { Write-Host "Monitoring for hot GPUs. Message sent at $tooHigh, Shutdown if any GPU reaches $wayTooHigh" }

# [console]::TreatControlCAsInput = $true
try
{
#Test if openhardwaremonitor is running and if not, start it
if((get-process $ProcessName -ErrorAction SilentlyContinue) -eq $Null)
    { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }
elseif ($verboseLogInfo )
    { echo "OpenHardwareMonitor is running" }

$countOfIterationsLow  = 0
$countOfIterationsHigh = 0   # same name as the counter incremented and tested further down
$hotGPU = 99
#if the computer just started or OpenHardwareMonitor is slow starting we will have problems, so insert wait here
Start-Sleep 90

For ($i=1; $i -le $numIterations; $i++) {
    # Query GPU temperatures from OpenHardwareMonitor
    $GPUTempObj = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "temperature" -and $_.Identifier -like "*gpu*"}
    $GPU_Num = 0
    ForEach ($GPU In $GPUTempObj) {
        $GPUTemp = $GPU.value
        if ($GPUTemp -lt $tooLow) {
            Write-Host "GPU$GPU_Num is $GPUTemp, below cutoff of $tooLow"
            "$Date - GPU$GPU_Num is $GPUTemp, below cutoff of $tooLow" | Out-File $LogFile -Append
            $countOfIterationsLow = $countOfIterationsLow + 1
            if ($GPUTemp -lt $lowestReportedTemp) { $lowestReportedTemp = $GPUTemp }
        }
        else {
            Write-Host "GPU$GPU_Num is OK at $GPUTemp degrees (above $tooLow)"
        }
        # The high-temperature check runs inside the per-GPU loop so that every card is tested,
        # not only the last one reported
        if ($reportIftooHigh) {
            if ($GPUTemp -ge $tooHigh) {
                Write-Host "GPU$GPU_Num is $GPUTemp, hotter than the cutoff of $tooHigh degrees"
                "$Date - GPU$GPU_Num is $GPUTemp, hotter than cutoff of $tooHigh" | Out-File $LogFile -Append
                $countOfIterationsHigh++
                if ($GPUTemp -gt $highestReportedTemp) { $highestReportedTemp = $GPUTemp; $hotGPU = $GPU_Num }
            }
            if ($GPUTemp -ge $wayTooHigh) { Stop-Computer -force }
        }
        $GPU_Num++
    }
    if ($Host.UI.RawUI.KeyAvailable -and (3 -eq [int]$Host.UI.RawUI.ReadKey("AllowCtrlC,IncludeKeyUp,NoEcho").Character)) {
        Write-Host "You pressed CTRL-C. Do you want to continue Mining Monitor (Y/N)?"
        $key = $Host.UI.RawUI.ReadKey("NoEcho, IncludeKeyDown")
        if ($key.Character -eq "N") { break; }
    }

    # If we have bad timing on a driver crash (and recovery) or work restarts we may get low results, so we wait between tests
    if ($i -lt $numIterations) {
        Write-Host "- - - - - - - - - - -"
        Start-Sleep 20
    }
}

# At least $numIterations low readings have to be collected to trigger a restart
if($countOfIterationsLow -ge $numIterations)
     {"$Date - $textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $tooLow - restarting" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = $textSubject
Body       = "$textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $tooLow"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
if ($rebootIfLow) {
Restart-Computer -force
}
     }
elseif ($lowestReportedTemp -lt $tooLow)
     {"$Date - $countOfIterationsLow results below cutoff, lowest was $lowestReportedTemp with report cutoff of $tooLow" | Out-File $LogFile -Append}
elseif ($verboseLogInfo)
    {"$Date - No concerning results, cutoff temperature currently: $tooLow" | Out-File $LogFile -Append}


# At least $numIterations readings reported a GPU as too hot
if($countOfIterationsHigh -ge $numIterations) {
     "$Date - GPU$hotGPU reported $highestReportedTemp degrees $countOfIterationsHigh times with warning set to $tooHigh" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = "Hot GPU - $textSubject"
Body       = "$Date - GPU$hotGPU reached $highestReportedTemp degrees $countOfIterationsHigh times. Current warning set to $tooHigh"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
}
}
finally
{
    if ($verboseLogInfo) { "$Date - Mining Monitor Stopped" | Out-File $LogFile -Append }
}

Here is the batch file, which should be in the same directory and can be named anything (well anything.bat).
Code:
@echo off
set counter=0
:start
cls
set /a counter=counter+1
echo Mining Monitor loop %counter%
powershell.exe .\MonitorMining.ps1
TIMEOUT /T 30
goto start
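
To make the thresholds easier to reason about, here is a small Python sketch of the decision the script above is aiming for: one critical reading shuts down, enough low readings reboot, enough hot readings only warn. It is an illustration and not a port of the PowerShell. The function name `decide` and its return strings are hypothetical. As in the script, each low or hot reading from any GPU counts toward `num_iterations`.

```python
from typing import Iterable, List


def decide(readings: Iterable[List[float]], too_low: float, too_high: float,
           way_too_high: float, num_iterations: int) -> str:
    """Return 'shutdown', 'reboot', 'warn' or 'ok' for a series of temperature polls.

    readings holds one list of per-GPU temperatures (Celsius) per polling iteration.
    """
    low_count = 0
    high_count = 0
    for temps in readings:
        for temp in temps:
            if temp >= way_too_high:
                return "shutdown"      # a single critical reading is enough
            if temp < too_low:
                low_count += 1         # card looks idle, i.e. probably not mining
            elif temp >= too_high:
                high_count += 1        # hot, but not yet critical
    if low_count >= num_iterations:
        return "reboot"
    if high_count >= num_iterations:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # One card stuck at 30 C across nine polls, with the script's default thresholds.
    print(decide([[55, 30]] * 9, too_low=40, too_high=60, way_too_high=82, num_iterations=9))
```

Because readings are counted per GPU, several cold cards reach the reboot threshold in fewer polls than a single cold card does.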
newbie
Activity: 5
Merit: 0
February 18, 2018, 04:52:50 PM
#26
Email setup is tricky. If you are using 2 factor authorization on Gmail you cannot send an email without getting a special code (search for this in google account). See this link for more information: https://www.digitalocean.com/community/tutorials/how-to-use-google-s-smtp-server. I avoided this with Gmail for this very reason and use yahoo. You could easily set up a new yahoo account just to send you these texts.

othgeek
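
For anyone debugging the mail step separately from the monitor, a minimal Python sketch of the same kind of alert is shown below. It uses STARTTLS, the usual mode on port 587. The function names and the example addresses are hypothetical. On accounts with 2-factor authentication, the password passed in would normally be the provider's app-specific password rather than the account password.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text alert message (no network access needed)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send msg over SMTP, upgrading the connection with STARTTLS before logging in."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


if __name__ == "__main__":
    # Placeholder addresses; nothing is sent in this demo.
    alert = build_alert("rig@example.com", "5551234567@carrier.example",
                        "Miner Message", "GPU temperature dropped to: 31")
    print(alert)
```

Keeping message construction apart from sending means the text can be checked without touching the mail server, and any login error then points at the credentials or port rather than at the monitor script.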
newbie
Activity: 4
Merit: 0
February 18, 2018, 03:36:11 PM
#25
Awesome I’ll try it out. When I tested the too low it restarted but I didn’t get an email. Not sure what I’m doing wrong I used gmail Ssl info and put it in there but it didn’t work. Is there some other email setup I need to do?

newbie
Activity: 5
Merit: 0
February 18, 2018, 08:34:30 AM
#24
samsoccer7: The tooHigh trigger is not yet implemented -- that code will be posted in the next few days. Sorry for the teaser.

The ShutDown is a simple process -- look in the code for "Restart-Computer" and replace it with "Stop-Computer".

othgeek
newbie
Activity: 4
Merit: 0
February 17, 2018, 10:07:43 PM
#23
full member
Activity: 434
Merit: 107
February 15, 2018, 01:08:22 PM
#22
It seems you've narrowed the problem down to 2 rigs.
How did you initially tune the cards?
Did you use HWiNFO to set the right clock, core, and power settings?
How many memory errors are you getting per hour or per day?
It might be better to stabilise these two rigs before adding watchdogs etc.
newbie
Activity: 5
Merit: 0
February 15, 2018, 11:49:36 AM
#21
I have modified not.you's code, cleaning it up a bit and adding the ability to just text you when your miner has stopped or text you and automatically reboot. It has been working for about 2 weeks without false alarms and 1 needed reboot.

Save the code below as MonitorMining.ps1
Code:
$LogFile     = "MiningMonitor.log"
$ProcessName = "openhardwaremonitor"
$Date        = Get-Date
$TooLow      = 40   # This is the Celsius temperature you wish to "flag" as an offline GPU
$TooHigh     = 70         # This temperature triggers a warning text
$lowestReportedTemp = 1000  # Remember to check that HardwareMonitor is reporting in Celsius
$numIterations = 9        # Set the number of iterations to test before considering a "failed" miner process.
$rebootIfLow = $true     # Set to true to send text and reboot (production) when low temps are found. Set to false to only send text.

# Textual elements
$textBodyReboot = "At $Date, Miner rebooted - GPU temperature dropped to:"
$textBodyWarning = "At $Date, GPU temperature dropped to:"

# Constants for email message
$senderEmail = '[email protected]'
$senderPassword = 'emailpassword'
# you must set this for your phone number and carrier. The list is here: http://www.emailtextmessages.com/
$textNumberAndCarrier = '[email protected]'
$textSubject =  'Miner Message';
$smtpServer = 'smtp.mail.yourprovider.com'
$port = '587'
$useSsl = $true


if ($rebootIfLow) {
    Write-Host "Monitor will send Message and Reboot if Needed"
    $textBody = $textBodyReboot
    }
    else {
    Write-Host "Monitor is in Message Only Mode"
    $textBody = $textBodyWarning
    }

# [console]::TreatControlCAsInput = $true
try
{
#Test if openhardwaremonitor is running and if not, start it
if((get-process $ProcessName -ErrorAction SilentlyContinue) -eq $Null)
    { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }
else
    { echo "OpenHardwareMonitor is running" }

$countOfIterationsLow = 0
#if the computer just started or OpenHardwareMonitor is slow starting we will have problems, so insert wait here
Start-Sleep 120

For ($i=1; $i -le $numIterations; $i++) {
    # Query GPU temperatures from OpenHardwareMonitor
    $GPUTempObj = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "temperature" -and $_.Identifier -like "*gpu*"}
    $GPU_Num = 0
    ForEach ($GPU In $GPUTempObj) {
        $GPUTemp = $GPU.value
        if ($GPUTemp -lt $TooLow) {
            Write-Host "GPU$GPU_Num is $GPUTemp, below cutoff of $TooLow"
            "$Date - GPU$GPU_Num is $GPUTemp, below cutoff of $TooLow" | Out-File $LogFile -Append
            $countOfIterationsLow = $countOfIterationsLow + 1
            if ($GPUTemp -lt $lowestReportedTemp) { $lowestReportedTemp = $GPUTemp }
        }
        else {
            Write-Host "GPU$GPU_Num is OK at $GPUTemp degrees (above $TooLow)"
        }
        $GPU_Num++
    }
    if ($Host.UI.RawUI.KeyAvailable -and (3 -eq [int]$Host.UI.RawUI.ReadKey("AllowCtrlC,IncludeKeyUp,NoEcho").Character)) {
        Write-Host "You pressed CTRL-C. Do you want to continue Mining Monitor (Y/N)?"
        $key = $Host.UI.RawUI.ReadKey("NoEcho, IncludeKeyDown")
        if ($key.Character -eq "N") { break; }
    }

    # If we have bad timing on a driver crash (and recovery) or work restarts we may get low results, so we wait between tests
    if ($i -lt $numIterations) {
        Write-Host "- - - - - - - - - - -"
        Start-Sleep 20
    }
}

# At least $numIterations low readings have to be collected to trigger a restart
if($countOfIterationsLow -ge $numIterations)
     {"$Date - $textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $TooLow - restarting" | Out-File $LogFile -Append
$MailArgs = @{
From       = $senderEmail
To         = $textNumberAndCarrier
Subject    = $textSubject
Body       = "$textBody $lowestReportedTemp on $countOfIterationsLow occasions with cutoff of $TooLow"
SmtpServer = $smtpServer
Port       = $port
UseSsl     = $useSsl
Credential = New-Object pscredential $senderEmail,$($senderPassword |ConvertTo-SecureString -AsPlainText -Force)
}
Send-MailMessage @MailArgs
if ($rebootIfLow) {
Restart-Computer -force
}
     }
elseif ($lowestReportedTemp -lt $TooLow)
     {"$Date - $countOfIterationsLow results below cutoff, lowest was $lowestReportedTemp with report cutoff of $TooLow" | Out-File $LogFile -Append}
else
    {"$Date - No concerning results, cutoff temperature currently: $TooLow" | Out-File $LogFile -Append}
}

finally
{
    "$Date - Mining Monitor Stopped" | Out-File $LogFile -Append
}   

Below are the contents of batch file to start it. I named mine "StartMiningMonitor.bat", but it can be named anything you want
Code:
@echo off
set counter=0
:start
cls
set /a counter=counter+1
echo Mining Monitor loop %counter%
powershell.exe .\MonitorMining.ps1
TIMEOUT /T 30
goto start

NOTE: This script has to be placed in the same folder as OpenHardwareMonitor. If you want to place it somewhere else, the following line has to be modified:
Code:
   { Start-Process -FilePath ".\OpenHardwareMonitor.exe" -WorkingDirectory ".\"} # -WindowStyle Minimized; echo "Starting OpenHardwareMonitor..." }

You are free to use and adapt to your needs. I would appreciate improvement suggestions and feedback.

othgeek
newbie
Activity: 1
Merit: 0
January 22, 2018, 03:55:08 PM
#20
newbie
Activity: 7
Merit: 0
September 12, 2017, 01:01:47 AM
#19
Hi all!

I hope I'm not breaking any rule with this.

I am pretty new to mining, so I have one question, and I would be really happy if anyone could help me :)

I have the same question as the creator of this thread, but my MAIN question is:

Can I somehow tell Claymore's miner to restart when the temperature of all GPUs falls to, say, 50?

Is there a command for this, and if so, can someone please explain how and where to write it?


Thanks a lot for any help!
member
Activity: 112
Merit: 10
September 06, 2017, 05:07:40 AM
#18
If you decrease the intensity and still get hangs, try reflashing your custom BIOS on the faulty cards via the command line.
That worked for me on a 480.
newbie
Activity: 1
Merit: 0
September 06, 2017, 02:36:55 AM
#17
As of v9.6, Claymore has a -minspeed option: if the miner cannot reach the given speed for 5 minutes, it restarts the miner (or runs reboot.bat if -r 1 is set, which can reboot the system).

Thanks didn't know that. Great option if a single GPU crashes.

If the whole system freezes I'm using the free mining-rig-resetter. Simple, configurable and it just works... Be sure to use Python 2.7x though.
newbie
Activity: 14
Merit: 0
September 03, 2017, 07:35:08 PM
#16
As of v9.6, Claymore has a -minspeed option: if the miner cannot reach the given speed for 5 minutes, it restarts the miner (or runs reboot.bat if -r 1 is set, which can reboot the system).
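
The behaviour described here (act only after the hashrate has stayed under a floor for a sustained period) can be illustrated with a short Python sketch. This is not Claymore's implementation. The class name `MinSpeedWatchdog` and its fields are hypothetical, and the 300-second default simply mirrors the 5 minutes mentioned above.

```python
from typing import Optional


class MinSpeedWatchdog:
    """Trip once the reported speed has stayed below a floor for a grace period."""

    def __init__(self, min_speed: float, grace_seconds: float = 300.0) -> None:
        self.min_speed = min_speed
        self.grace_seconds = grace_seconds
        self._below_since: Optional[float] = None  # time of the first low sample in the current run

    def update(self, speed: float, now: float) -> bool:
        """Record a sample taken at time `now` (seconds); return True when action is due."""
        if speed >= self.min_speed:
            self._below_since = None   # speed recovered, so the timer resets
            return False
        if self._below_since is None:
            self._below_since = now
        return now - self._below_since >= self.grace_seconds
```

A brief dip never trips the watchdog, because any sample at or above the floor resets the timer.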
legendary
Activity: 1078
Merit: 1011
September 01, 2017, 08:29:44 AM
#15
Most of these issues stem from rig instability. Instead of trying to find these elaborate workarounds to reboot your rig when the instability causes it to lockup or crash, you may be better off tracking down the reason it is unstable in the first place. Most of the time this usually comes down to heat and power, and occasional bad drivers or some other reason. Heat and power can be either direct, the cards running too hot or being on a borderline power issue, such as weak PSU, loose or undersized wires or connectors. It can also be indirect as a result of too much overclocking or too much undervolting.

Anyway, I think that if you can track down the culprit, your issues will go away, and any loss of hashrate (for instance, if you need to back down on the overclock) would be more than offset by your increased stability and uptime. I, and I am sure many others, have rigs that go for weeks or months without needing a reboot, and when they do need one it is usually because of planned maintenance (software upgrade, scheduled cleaning, etc.) rather than something unexpected.
hero member
Activity: 1274
Merit: 556
September 01, 2017, 07:59:11 AM
#14
You can also use speedfan.
I've got it running on one of my rigs. It's set to monitor the temps of the GPUs, and if either goes below 55C for at least 5 minutes, the rig is rebooted.
newbie
Activity: 11
Merit: 0
September 01, 2017, 06:13:23 AM
#13
hero member
Activity: 552
Merit: 500
Anyone? It seems to have gotten worse in the last few updates..
hero member
Activity: 552
Merit: 500
I'm also having a lot of issues lately. The rig ran fine for days, and lately it's been getting the open gl gpu hang issue.

I tried a few things.

I created a reboot script that calls another script, which then launches the miner, but for some reason, while relaunching, the miner gets stuck creating the DAG for one of the cards.

I'm slowly adjusting the cards, but they ran fine for days. Here is a bit of my config, which I'm also tweaking. It's becoming pretty annoying. I wish someone had a program to watch the miner, kill it properly, wait, and then relaunch it. I see the script above, but I'm not going that route yet.

reboot.bat file

Code:
timeout 5 > NUL
start "" "c:\Claymore\reboot-miner.bat"
exit 0

reboot-miner.bat

Code:
taskkill /IM EthDcrMiner64.exe /F
timeout 15 > NUL
start "" "c:\Claymore\start-classic.bat"
exit 0

This makes sure the miner is killed (forced with /F) and then waits 15 seconds before relaunching.

The issue is that it kills the miner, but 9 times out of 10 the relaunched miner gets stuck creating the DAG. I'm assuming some GPU memory or the driver hasn't reset yet.

Also, the reboot.bat command prompt doesn't close even though exit 0 is in it. Any thoughts?

Here are some config vars I'm using

Code:
EthDcrMiner64.exe -wd 0 -mport 0 -r 1 -dcri 26 -tt 70 -eres 0

I'm now testing with -wd 0 so that the watchdog at least doesn't restart the miner and get stuck, until I can figure out how to properly reset everything and not have it hang on DAG creation after a restart.

-r 1 calls the reboot.bat file.
-dcri is a value I also play with.

Temps are fine.

I'm also going to test -ethi 6 (the default is 8).

Does anyone have any other thoughts or suggestions? I'm using the latest miner. I'm sure Claymore is aware of this, and it would be good to fix the miner hanging on restart.

Cheers!
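
The "kill it properly, wait, then relaunch" program wished for above can be sketched in a few lines of Python. This is only an outline of the sequence in the two batch files, not a fix for the DAG hang. The function name `restart_miner` and the injectable `run`, `launch`, and `sleep` parameters are hypothetical, and the 15-second default is the timeout from reboot-miner.bat.

```python
import subprocess
import time
from typing import Callable, Sequence


def restart_miner(image_name: str, start_cmd: Sequence[str], wait_seconds: float = 15.0,
                  run: Callable[..., object] = subprocess.run,
                  launch: Callable[..., object] = subprocess.Popen,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Force-kill the miner, pause so the driver can settle, then start it again."""
    # check=False: a missing process (miner already gone) is not treated as an error
    run(["taskkill", "/IM", image_name, "/F"], check=False)
    sleep(wait_seconds)
    launch(list(start_cmd))   # returns immediately; the miner keeps running on its own
```

On Windows this might be called as `restart_miner("EthDcrMiner64.exe", ["cmd", "/c", r"c:\Claymore\start-classic.bat"])`. The wait is a parameter so a longer pause can be tried if the relaunch still sticks.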
legendary
Activity: 4354
Merit: 9201
'The right to privacy matters'
March 29, 2017, 06:33:07 PM
#10
Yeah, set Claymore to -ethi 6 and -dcri 20 and see if it holds.

This would go in your bat file.

If you look at page 1, post 1, Claymore has many options. The best thing is to get it stable.

Auto reboot is the last choice.

If you read through all the settings it has, you can isolate a single card and do a lot of other things.

That one card is an underperforming card.
sr. member
Activity: 689
Merit: 253
March 29, 2017, 06:25:40 PM
#9
I use Simple Rig Resetter for this; it's on this forum and on simplemining.net.
newbie
Activity: 53
Merit: 0
March 29, 2017, 05:59:10 PM
#8
Thank you all. I'll first try -r 1 with reboot.bat, and if that doesn't work I'll try the PowerShell script!

Hopefully the simpler solution works well :)
legendary
Activity: 1726
Merit: 1018
March 29, 2017, 04:42:39 PM
#7
I have a DIY solution for this that I made up that you can use.  Basically I made a powershell script that checks GPU usage percentage and based on results it reboots the computer, although you can modify that to restart the mining program as well.  I also have something I use to restart the mining program instead of reboot the computer since sometimes that is enough.

The difficulty with restarting the mining app is shutting it down.  I use something called closeprog.exe to shut down the mining application.  I set a scheduled task that runs a script when the video driver crashes.  That script uses closeprog.exe to kill the mining app and then rerun it.  I can't remember where I got this closeprog.exe, I have had it for years and used it for various scripted things.  You can probably figure out how to use taskkill to do that also.

To monitor GPU usage and reboot when a specific GPU drops out I use openhardwaremonitor.  You can get that here: http://openhardwaremonitor.org/
That thing basically exposes the GPU sensor stats to windows management instrumentation which powershell can work with.  The powershell script runs from a scheduled task every 10 minutes.  It cycles through all of the GPU's looking for low usage results.  If it gets 9 low results in a row it reboots the whole rig.  The reason I do 9 results is because if a driver crash causes the other scheduled task to restart the mining application while this script is testing GPU usage then I can sometimes wind up with some low readings while things are being restarted by the other task.  Also the GPU usage will dip normally during certain work restarts from the pool.  So I want to be sure I am really seeing a consistent low result before I reboot.  The PS script is this:

Code:
$Log = "LogFile.log"
$Date = Get-Date
$TestValue = 0

#Test if openhardwaremonitor is running and if not, start it
$ProcessName = "openhardwaremonitor"

    if((get-process $ProcessName -ErrorAction SilentlyContinue) -eq $Null)
    { Start-Process -FilePath ".\OpenHardwareMonitor\OpenHardwareMonitor.exe" -WindowStyle Minimized}
else
    { echo "Process is already running" }

#if the computer just started it will get zeros while the miner is still getting the dag file ready so we wait
Start-Sleep 120

#Check GPU load

$FirstGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $FirstGPULoads)
     {if($GPU.value -lt 10)
        {$FirstGPULoadValue = $GPU.value
        Write-Host $FirstGPULoadValue "seems low"
        #"$Date - Low result obtained $FirstGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$FirstGPULoadValue = $GPU.value
        Write-Host $FirstGPULoadValue "seems fine"  
        }
     }

#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20

$SecondGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $SecondGPULoads)
     {if($GPU.value -lt 10)
        {$SecondGPULoadValue = $GPU.value
        Write-Host $SecondGPULoadValue "seems low"
        #"$Date - Low result obtained $SecondGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$SecondGPULoadValue = $GPU.value
        Write-Host $SecondGPULoadValue "seems fine"  
        }
     }

#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$ThirdGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $ThirdGPULoads)
     {if($GPU.value -lt 10)
        {$ThirdGPULoadValue = $GPU.value
        Write-Host $ThirdGPULoadValue "seems low"
        #"$Date - Low result obtained $ThirdGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$ThirdGPULoadValue = $GPU.value
        Write-Host $ThirdGPULoadValue "seems fine"  
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$FourthGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $FourthGPULoads)
     {if($GPU.value -lt 10)
        {$FourthGPULoadValue = $GPU.value
        Write-Host $FourthGPULoadValue "seems low"
        #"$Date - Low result obtained $FourthGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$FourthGPULoadValue = $GPU.value
        Write-Host $FourthGPULoadValue "seems fine"  
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$FifthGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $FifthGPULoads)
     {if($GPU.value -lt 10)
        {$FifthGPULoadValue = $GPU.value
        Write-Host $FifthGPULoadValue "seems low"
        #"$Date - Low result obtained $FifthGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$FifthGPULoadValue = $GPU.value
        Write-Host $FifthGPULoadValue "seems fine"  
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$SixthGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $SixthGPULoads)
     {if($GPU.value -lt 10)
        {$SixthGPULoadValue = $GPU.value
        Write-Host $SixthGPULoadValue "seems low"
        #"$Date - Low result obtained $SixthGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$SixthGPULoadValue = $GPU.value
        Write-Host $SixthGPULoadValue "seems fine"  
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$SeventhGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $SeventhGPULoads)
     {if($GPU.value -lt 10)
        {$SeventhGPULoadValue = $GPU.value
        Write-Host $SeventhGPULoadValue "seems low"
        #"$Date - Low result obtained $SeventhGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$SeventhGPULoadValue = $GPU.value
        Write-Host $SeventhGPULoadValue "seems fine"  
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$EighthGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $EighthGPULoads)
     {if($GPU.value -lt 10)
        {$EighthGPULoadValue = $GPU.value
        Write-Host $EighthGPULoadValue "seems low"
        #"$Date - Low result obtained $EighthGPULoadValue" >> $Log
        $TestValue = $TestValue + 1
        }
      else
        {$EighthGPULoadValue = $GPU.value
        Write-Host $EighthGPULoadValue "seems fine"
        }
     }


#if we have bad timing on a driver crash (and recovery) or work restarts we may get low results so we wait between tests
Start-Sleep 20


$NinthGPULoads = Get-WmiObject -namespace root\openhardwaremonitor -class sensor | Where-Object {$_.SensorType -Match "load" -and $_.Identifier -like "*gpu*"}

ForEach($GPU In $NinthGPULoads)
     {if($GPU.value -lt 10)
        {$NinthGPULoadValue = $GPU.value
        Write-Host $NinthGPULoadValue "seems low"
        #"$Date - Low result obtained $NinthGPULoadValue" >> $Log
        $TestValue = $TestValue + 1                                      
        }
      else
        {$NinthGPULoadValue = $GPU.value
        Write-Host $NinthGPULoadValue "seems fine"  
        }
     }

#more than 8 low readings across the nine tests triggers a restart
if($TestValue -gt 8)
     {"$Date - Obtained $TestValue low results - Seems dead - restarting" >> $Log
      Restart-Computer -force
     }
else
     {"$Date - Obtained $TestValue low results - Seems ok" >> $Log}
        

Each of the nine GPU tests checks all GPUs, and if any of them reports a load below 10% it increments the TestValue variable. If TestValue reaches nine by the end, the intent is that each of the nine tests saw at least one GPU reporting a usage of under 10%. Since the script also pauses between tests, that works pretty well for me in terms of only rebooting when a GPU has well and truly crashed.

The openhardwaremonitor app has to be in a subdirectory called openhardwaremonitor underneath where this powershell script runs (or you need to change the path). The script first checks whether that app is running and starts it if it isn't, so you never have to bother with making sure it is running yourself. The script also has a long pause at the beginning, so that if the computer did get rebooted and the script runs right after the reboot, it waits a minute for the miner apps to get started (because you have those start automatically at boot, right?)
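The nine copy-pasted tests above can also be written as one loop. This is only a sketch: the function names Get-GpuLoadValues and Get-LowReadingCount and their parameters are made up for the example, and it assumes Windows PowerShell 5.1 (where Get-WmiObject is available) with openhardwaremonitor already running. The WMI query is the same one the script uses, and like the script it counts every single low reading.

```powershell
function Get-GpuLoadValues {
    # Returns the current load percentage of every GPU load sensor.
    Get-WmiObject -Namespace root\openhardwaremonitor -Class sensor |
        Where-Object { $_.SensorType -match "load" -and $_.Identifier -like "*gpu*" } |
        ForEach-Object { $_.Value }
}

function Get-LowReadingCount {
    param(
        [scriptblock]$Sampler = { Get-GpuLoadValues },  # takes one round of readings
        [int]$Rounds = 9,         # number of test rounds
        [int]$Threshold = 10,     # load % below which a reading counts as low
        [int]$PauseSeconds = 20   # wait between rounds
    )
    $low = 0
    for ($round = 1; $round -le $Rounds; $round++) {
        foreach ($value in @(& $Sampler)) {
            if ($value -lt $Threshold) { $low++ }
        }
        # No need to wait after the last round.
        if ($round -lt $Rounds) { Start-Sleep -Seconds $PauseSeconds }
    }
    return $low
}
```

With that in place, the tail of the script would reduce to something like $TestValue = Get-LowReadingCount followed by the existing if($TestValue -gt 8) check. The -Sampler parameter is only there so the counting can be tried with fake readings, without real GPUs.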

To run a powershell script from a scheduled task I use a batch file to run the PS script (convoluted, I know, but it works). The batch file sits in the same directory as the PS script and looks like this:
powershell.exe .\GPU_Monitor.ps1
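
If you would rather skip the batch file, the scheduled task can call powershell.exe directly. A rough sketch, assuming Windows 10 with the built-in ScheduledTasks cmdlets and an elevated prompt. The path C:\Mining\GPU_Monitor.ps1, the task name and the 10-minute interval are placeholders I picked, not values from the post above. Older Windows versions may also require -RepetitionDuration on the trigger.

```powershell
# Placeholders: adjust the path, task name and interval to your own setup.
$scriptPath = "C:\Mining\GPU_Monitor.ps1"
$workDir    = Split-Path -Parent $scriptPath

# The working directory matters because the script looks for the
# openhardwaremonitor subdirectory relative to where it runs.
$action = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$scriptPath`"" `
    -WorkingDirectory $workDir

# Start one minute from now, then repeat every 10 minutes.
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date).AddMinutes(1) `
    -RepetitionInterval (New-TimeSpan -Minutes 10)

Register-ScheduledTask -TaskName "GPU_Monitor" -Action $action -Trigger $trigger -RunLevel Highest
```

The interval should be comfortably longer than one run of the script, which takes a few minutes because of the pauses between tests.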

EDIT: I was just looking over my log file for this and I see that the script actually increments the TestValue variable for every single low reading, so if you have 6 GPUs (I don't) it could hit 9 low results quite easily if the test ran at an inopportune moment (like during a video driver crash). You can change the value in the line towards the end to set how many low results are needed to trigger a reboot. The line that determines that is
Code:
if($TestValue -gt 8)
Just change 8 to whatever you like. You can also comment out the restart by putting a # in front of it. I ran this for a couple of days with the restart command commented out to be sure it was really only going to restart when I wanted it to. Once I was satisfied that it wasn't going to cause a lot of unnecessary reboots, I removed the comment on that line so it could restart the rig. But looking at my log for the past few days, it looks like the script could use some refinement.
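One possible refinement, sketched here rather than tested on a real rig, is to count test rounds instead of individual readings, so the threshold no longer depends on how many GPUs are installed. The function names Test-RoundIsLow and Get-LowRoundCount are made up for this example, and the load values are passed in as plain arrays.

```powershell
function Test-RoundIsLow {
    param([double[]]$Loads, [double]$Threshold = 10)
    # A round counts as low if at least one GPU is under the threshold.
    foreach ($load in $Loads) {
        if ($load -lt $Threshold) { return $true }
    }
    return $false
}

function Get-LowRoundCount {
    param([object[]]$Rounds, [double]$Threshold = 10)
    # $Rounds holds one array of GPU load readings per test round.
    $count = 0
    foreach ($loads in $Rounds) {
        if (Test-RoundIsLow -Loads $loads -Threshold $Threshold) { $count++ }
    }
    return $count
}

# Example with made-up readings: three rounds on a 3-GPU rig.
$rounds = @( @(95, 2, 1), @(96, 97, 98), @(0, 0, 0) )
Get-LowRoundCount -Rounds $rounds
```

In the script itself the same effect could be had by setting a flag inside each ForEach and incrementing TestValue at most once per test. A result of nine would then really mean that all nine tests saw a low GPU.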
newbie
Activity: 1
Merit: 0
March 29, 2017, 02:48:48 PM
#6
I believe there is an option inside Claymore for this problem.
Code:
-r   Restart miner mode. "-r 0" (default) - restart miner if something wrong with GPU. "-r -1" - disable automatic restarting. -r >20 - restart miner if something
   wrong with GPU or by timer. For example, "-r 60" - restart miner every hour or when some GPU failed.
   "-r 1" closes miner and execute "reboot.bat" file ("reboot.bash" or "reboot.sh" for Linux version) in the miner directory (if exists) if some GPU failed.
   So you can create "reboot.bat" file and perform some actions, for example, reboot system if you put this line there: "shutdown /r /t 5 /f".
newbie
Activity: 53
Merit: 0
March 29, 2017, 02:35:11 PM
#5
But the miner doesn't quit; if it exited, there would be no problem. The internal watchdog restarts it without killing the process. It just stops everything and tries to start again in the same process, which fails and leaves it stuck on this screen:

https://i.imgur.com/JCOpGQy.jpg

It just remains stuck on that and doesn't continue mining till I close it manually by clicking the X button and starting it again...
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
March 29, 2017, 02:00:16 PM
#4
But the thing is the miner doesn't freeze the rig, I can TeamViewer into it and just restart the miner. So no need for IP power switches... the problem is that I only notice the mining has stopped 10 hours after it stopped. I need something to do the resetting for me automatically...

Run the miner in a loop. On Windows, create a bat file such as mining.bat:

:start
miner.exe....
goto start


That way when miner.exe stops, it will start again.


If the miner stops but the process doesn't end, you can use a second batch file (e.g. miner_restart.bat) with something like this:

:start
timeout /t 600
taskkill /t /f /im miner.exe
goto start


This kills miner.exe every 600 seconds, and the first bat file then starts it again.
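
The two bat files could also be folded into a single PowerShell loop. This is a minimal sketch, not a drop-in replacement: Invoke-MinerLoop and its parameters are names invented for the example, and unlike taskkill with the tree switch, Stop-Process only ends the process it started, not any child processes.

```powershell
function Invoke-MinerLoop {
    param(
        [string]$FilePath,             # miner executable, e.g. miner.exe
        [string[]]$Arguments = @(),    # command-line arguments for the miner
        [int]$MaxRunSeconds = 600,     # kill and restart after this long
        [int]$MaxStarts = 0            # 0 = loop forever
    )
    $starts = 0
    while ($MaxStarts -eq 0 -or $starts -lt $MaxStarts) {
        # Start-Process rejects an empty -ArgumentList, hence the branch.
        if ($Arguments.Count -gt 0) {
            $proc = Start-Process -FilePath $FilePath -ArgumentList $Arguments -PassThru
        } else {
            $proc = Start-Process -FilePath $FilePath -PassThru
        }
        $starts++
        # WaitForExit returns $false if the process is still running at the timeout.
        if (-not $proc.WaitForExit($MaxRunSeconds * 1000)) {
            Stop-Process -Id $proc.Id -Force -ErrorAction SilentlyContinue
            $proc.WaitForExit()
        }
    }
    return $starts
}
```

Calling Invoke-MinerLoop -FilePath "miner.exe" with the usual miner arguments restarts the miner whenever it exits on its own and also after every 600 seconds, the same as the two bat files together. Like them, it restarts on a timer whether or not anything is actually wrong.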
newbie
Activity: 53
Merit: 0
March 29, 2017, 11:55:37 AM
#3
But the thing is the miner doesn't freeze the rig, I can TeamViewer into it and just restart the miner. So no need for IP power switches... the problem is that I only notice the mining has stopped 10 hours after it stopped. I need something to do the resetting for me automatically...
newbie
Activity: 31
Merit: 0
March 29, 2017, 11:52:13 AM
#2
Dakky,

Get yourself an IP power switch and set your BIOSes to power on after power loss. Then it's just a matter of logging into the power switch and power cycling.

We actually know each other from another forum :)

If you want to see what it looks like, poke me
newbie
Activity: 53
Merit: 0
March 29, 2017, 11:45:04 AM
#1
I'm using modded RX480 8GB with Samsung memory and out of 24 cards, 2 on different rigs make problems. Yes, one option is to revert to stock BIOS, but I'm looking for another solution since that happens maybe once per day. Win 10 Pro.

Claymore's Dual Eth+Dcr miner shows 28.4MH/s for all 5 cards in the rig, and randomly, as I said, once per day, one card in the setup starts to drop to 21MH/s and eventually, after a few minutes, to 0MH/s. Then the miner's watchdog tries to restart mining, which goes well at first, but it gets stuck on the list of GPUs and doesn't start mining. The solution is to manually close the miner and just start it again.

EDIT: To make it clear, the rig doesn't freeze. Only the miner stops mining. So I can manually restart the miner WITHOUT rebooting the rig, but that was one option to restart the miner as well.

Now I'm searching for a solution which would reboot the whole rig (or just kill the miner process and start it again) if the GPU temperature falls below, say, 60°C ... something opposite to the overheating protection :)

Any ideas?