Hi Fullzero,
I want to share a GPU failure that the watchdog is not able to detect.
wdog screen:
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:09:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Tue Jul 25 16:57:01 CEST 2017 - All good! Will check again in 60 seconds
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:09:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Tue Jul 25 16:58:01 CEST 2017 - All good! Will check again in 60 seconds
The miner detects only 6 of the 7 GPUs,
and nvidia-smi doesn't work:
$ nvidia-smi
Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
temp screen:
Provided power limit 75.00 W is not a valid power limit which should be between 115.00 W and 291.00 W for GPU 00000000:0A:00.0
Terminating early due to previous errors.
Tue Jul 25 17:01:07 CEST 2017 - All good, will check again soon
GPU 0, Target temp: 61, Current: 60, Diff: 1, Fan: 75, Power: 123.46
GPU 1, Target temp: 61, Current: 60, Diff: 1, Fan: 63, Power: 124.62
GPU 2, Target temp: 61, Current: 59, Diff: 2, Fan: 77, Power: 119.23
GPU 3, Target temp: 61, Current: 60, Diff: 1, Fan: 68, Power: 120.72
GPU 4, Target temp: 61, Current: 59, Diff: 2, Fan: 57, Power: 124.26
GPU 5, Target temp: 61, Current: Unable, Diff: 61, Fan: to, Power: determine
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 125: [: Unable: integer expression expected
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 158: [: the: integer expression expected
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 171: [: to: integer expression expected
GPU 6, Target temp: 61, Current: 55, Diff: 6, Fan: 50, Power: 126.76
Tue Jul 25 17:01:37 CEST 2017 - Restoring Power limit for gpu:6. Old limit: 125 New limit: 75 Fan speed: 50
Provided power limit 75.00 W is not a valid power limit which should be between 115.00 W and 291.00 W for GPU 00000000:0A:00.0
Terminating early due to previous errors.
Tue Jul 25 17:01:37 CEST 2017 - All good, will check again soon
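On the "not a valid power limit" message in the temp screen: one way a script could avoid it is to clamp the requested limit to the range the card reports before applying it. This is only a rough sketch, not the code in Maxximus007_AUTO_TEMPERATURE_CONTROL. The function name clamp_power_limit is made up for illustration, and it assumes the bounds are whole watts as in the log (115.00 and 291.00). On a real rig the bounds could probably be read with nvidia-smi --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits.

```shell
#!/bin/bash
# Hypothetical helper (not from the original scripts): clamp a requested
# power limit, in whole watts, to the range the card accepts.
# Assumption: min and max have no fractional part beyond ".00", as in the
# log above; the decimals are simply stripped.
clamp_power_limit() {
    local requested="$1"
    local min="${2%.*}"   # 115.00 -> 115
    local max="${3%.*}"   # 291.00 -> 291
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Values from the log above: 75 W requested, 115.00 W to 291.00 W allowed.
clamp_power_limit 75 115.00 291.00   # prints 115
```

With something like this, the restore step for gpu:6 would ask for 115 W instead of 75 W and nvidia-smi would not terminate early.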
I believe this is the exact problem that Maxximus007 recently made a new code block to resolve.
Fullzero,
I'm getting this error as well, and it looks like the watchdog is not rebooting the system.
I believe I have the latest bash files.
Are Maxximus007's changes to resolve this issue in the current bash files?
Thank you.
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:01:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Sat Jul 29 21:07:09 PDT 2017 - All good! Will check again in 60 seconds
The newest watchdog download link is at the top of the OP in purple. It resolves this problem, and is more effective; it should not have a false positive reboot at all.
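For anyone curious why the old watchdog printed "All good!" in the logs above: the "integer expression expected" lines show that each word of the nvidia-smi error message was fed to a numeric test, which fails without triggering the reboot branch. Below is a minimal sketch of the general idea of checking that the reading is numeric first. It is not Maxximus007's actual fix; the function names are hypothetical, and it assumes the reading is the utilization output from nvidia-smi with one number per GPU.

```shell
#!/bin/bash
# Hypothetical helper (not the actual watchdog code): succeeds only when
# every whitespace-separated token in the reading is a non-negative integer.
readings_are_numeric() {
    local reading="$1" token
    # An empty reading means nvidia-smi printed nothing, which is also a fault.
    [ -n "$reading" ] || return 1
    for token in $reading; do
        case "$token" in
            *[!0-9]*) return 1 ;;   # any non-digit character: not a number
        esac
    done
    return 0
}

# Hypothetical wrapper: report what the watchdog should conclude.
check_gpu_reading() {
    if readings_are_numeric "$1"; then
        echo "ok"
    else
        echo "gpu_fault"
    fi
}

# First call mimics healthy output for 7 GPUs; second is the message from the logs.
check_gpu_reading "98 97 99 100 98 99 97"
check_gpu_reading "Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU"
```

The first call prints ok and the second prints gpu_fault, so a watchdog built this way could treat the "GPU is lost" message as a reason to reboot instead of falling through to "All good!".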