Hey guys,
Is anyone else seeing their rig crash or freeze while mining ETH (and now ZCL)?
It seems that something happens in the ETH miner, the watchdog thinks it lost a GPU, and it tries to restart 3main. But on both of my rigs, when this happens the system freezes and I have to restart it manually.
Errors I have seen -
GPU Utilization is low: restarting 3main...
Thread exited with code: 29
Is there a way to disable 3main restarting without disabling watchdog? The miner would start mining again once connected, but this freezing is my issue.
I'm mining hush temporarily with zero issues. Was on ZCL before the fork without issues as well, same settings and whatnot. And was mining ETH for about 3 weeks with no issues. Now it won't stop crashing after being up 12-24 hours each time. It's driving me nuts.
Two separate rigs, 13 cards each (1070s and 1070 Tis).
The only time I have ever had a rig freeze was ultimately due to OC being too high. Lowering the GPU OC by just a bit (5 or 10) fixed my issue.
I am not sure I totally understand your question, but if you want to disable the 3main restart, the only way to do this is to not run the watchdog at all. To do that, change this in 1bash:
MINER_WATCHDOG="YES"
to
MINER_WATCHDOG="NO"
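If you would rather script that change than edit the file by hand, something like the following might work. It is only a sketch: `set_watchdog` is a made-up helper name, not part of nvOC, and the `/home/m1/1bash` path in the example comment is an assumption about where your copy of 1bash lives.

```shell
#!/usr/bin/env bash
# Sketch only: set_watchdog is a hypothetical helper, not an nvOC command.
# Usage: set_watchdog FILE YES|NO
set_watchdog() {
    file="$1"
    value="$2"
    case "$value" in
        YES|NO) ;;
        *) echo "value must be YES or NO" >&2; return 1 ;;
    esac
    # Keep a backup copy, then rewrite only the MINER_WATCHDOG assignment.
    # Anything after the closing quote (e.g. a trailing comment) is left alone.
    cp -- "$file" "$file.bak" || return 1
    sed -i -E "s/^MINER_WATCHDOG=\"(YES|NO)\"/MINER_WATCHDOG=\"$value\"/" "$file"
}

# Example (the path is an assumption; point it at your own 1bash):
#   set_watchdog /home/m1/1bash NO
```

The backup is written next to the original as `1bash.bak`, so the change can be undone by copying it back.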
Shouldn't be an overclock issue. Running Hush right now for two days straight no issue.
Running 50 core and 200 mem at 80% TDP.
Def do not want to turn off watchdog. I've run with these settings fine for multiple coins and long periods of time. ETH was running for over a month with zero issues, and now all of a sudden, with the same settings and the same miner, two different rigs crash every 12ish hours within minutes of each other. Even switched servers and same issue. Something else is at play here.
Like I mentioned, on my personal desktop, which also mines, I see ethminer restart randomly at exactly the time the rigs go down. It doesn't make much sense.
I had a similar issue to this a few months ago and was told it was an issue within NVOC. That when the miner switched to "donation" mode it would freeze and the solution was to switch miners. I've tried ETHMINER, GENOIL and CLAYMORE. All the same issue.
Oh, the miner you are referring to was an older version of the DSTM miner. The dev only had one pool configured for his donations and a network issue in Europe hosed a bunch of us for hours early one morning. That was fixed in a newer DSTM miner version, several versions ago. That was not a freeze. It was just the miner trying over and over to connect to something that it could not. When the watchdog saw that the GPUs were idle, it would restart the miner a few times and ultimately the box. This went on for hours and hours and even destroyed the boot USB drive for some folks running an older nvOC version.
Have you checked to see what the system logs (/var/log) say? I am assuming when you say "freeze" that the entire host becomes unresponsive and has to be hard rebooted.
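As a starting point for digging through those logs, here is a rough sketch that pulls out lines where the NVIDIA kernel driver (NVRM) reported an Xid error or a GPU falling off the bus. `find_gpu_errors` is a hypothetical helper name, and the file locations in the comments are the usual Ubuntu ones, which is an assumption about your install.

```shell
# Sketch only: find_gpu_errors is a hypothetical helper.
# Prints log lines that mention an NVRM Xid error or a GPU dropping off the bus.
# Reads the files given as arguments, or standard input if none are given.
find_gpu_errors() {
    grep -i -E 'NVRM: Xid|fallen off the bus' "$@"
}

# Typical locations on Ubuntu (assumed; adjust for your system):
#   find_gpu_errors /var/log/kern.log /var/log/syslog
# Kernel messages from the boot before the hard reset, if the journal is persistent:
#   journalctl -k -b -1 | find_gpu_errors
```

If nothing shows up around the time of the freeze, that does not rule out the GPUs, since a hard lockup can happen before anything is flushed to disk.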
Yeah, it completely freezes and becomes unresponsive. Here is the error after turning on the logs.
m1@m1-desktop:~$ tail -f /home/m1/nvoc_logs/watchdog-screenlog.0
Watchdog for nvOC v0019-2.0 - Community Release
Version: v0019-2.0.011
LOG FILE: (Showing the last 10 recorded entries)
| 7 | 112W | 3.87 Sol/W |
| 8 | 116W | 4.26 Sol/W |
| 9 | 98W | 3.82 Sol/W |
| 10 | 120W | 3.65 Sol/W |
| 11 | 121W | 3.68 Sol/W |
| 12 | 118W | 3.58 Sol/W |
+-----+-------------+--------------+
CRITICAL: Sun Apr 29 09:37:52 MST 2018 - GPU Utilization is too low: restarting 3main...
WARNING: Mon Apr 30 02:25:19 MST 2018 - Internet is down, checking...
WARNING: Mon Apr 30 09:38:05 MST 2018 - Internet is down, checking...
Mon Apr 30 22:34:23 MST 2018 - No mining issues detected.
GPU UTILIZATION: 100 96 100 100 100 99 97 100 100 98 100 99 99
GPU_COUNT: 13
Mon Apr 30 22:34:43 MST 2018 - GPU 2 under threshold found - GPU UTILIZATION: 59
Mon Apr 30 22:34:43 MST 2018 - GPU 3 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:44 MST 2018 - GPU 4 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:44 MST 2018 - GPU 5 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:45 MST 2018 - GPU 6 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:45 MST 2018 - GPU 7 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:46 MST 2018 - GPU 8 under threshold found - GPU UTILIZATION: 10
Mon Apr 30 22:34:46 MST 2018 - GPU 9 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:46 MST 2018 - GPU 10 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:46 MST 2018 - GPU 11 under threshold found - GPU UTILIZATION: 0
Mon Apr 30 22:34:47 MST 2018 - GPU 12 under threshold found - GPU UTILIZATION: 0
Connection to google.com 443 port [tcp/https] succeeded!
Connection to google.com 443 port [tcp/https] succeeded!
WARNING: Mon Apr 30 22:34:47 MST 2018 - Found no miner, jumping to 3main restart
WARNING: Mon Apr 30 22:34:47 MST 2018 - Problem found: See diagnostics below:
Percent of GPUs bellow threshold: 84 %
name, pstate, temperature.gpu, fan.speed [%], utilization.gpu [%], power.draw [W], power.limit [W]
GeForce GTX 1070 Ti, P0, 63, 60 %, 27 %, 40.54 W, 120.00 W
GeForce GTX 1070 Ti, P0, 65, 75 %, 0 %, 32.57 W, 120.00 W
GeForce GTX 1070 Ti, P0, 68, 65 %, 0 %, 42.96 W, 120.00 W
GeForce GTX 1080 Ti, P0, 66, 60 %, 0 %, 60.21 W, 175.00 W
GeForce GTX 1070 Ti, P0, 60, 60 %, 0 %, 31.63 W, 120.00 W
GeForce GTX 1070 Ti, P0, 59, 60 %, 0 %, 27.96 W, 120.00 W
GeForce GTX 1070 Ti, P0, 65, 80 %, 0 %, 35.29 W, 120.00 W
GeForce GTX 1070 Ti, P0, 67, 90 %, 0 %, 41.77 W, 120.00 W
GeForce GTX 1080, P0, 61, 60 %, 0 %, 44.37 W, 120.00 W
GeForce GTX 1070, P0, 67, 80 %, 0 %, 36.41 W, 120.00 W
GeForce GTX 1070 Ti, P0, 67, 65 %, 0 %, 42.16 W, 120.00 W
GeForce GTX 1070 Ti, P0, 62, 60 %, 0 %, 34.22 W, 120.00 W
GeForce GTX 1070, P0, 60, 60 %, 0 %, 37.31 W, 120.00 W
+-----+-------------+--------------+
| 0 | 121W | 3.50 Sol/W |
| 1 | 122W | 3.59 Sol/W |
| 2 | 120W | 3.69 Sol/W |
| 3 | 138W | 4.42 Sol/W |
| 4 | 117W | 3.83 Sol/W |
| 5 | 118W | 3.81 Sol/W |
| 6 | 113W | 3.81 Sol/W |
| 7 | 125W | 3.55 Sol/W |
| 8 | 115W | 4.23 Sol/W |
| 9 | 120W | 3.32 Sol/W |
| 10 | 122W | 3.49 Sol/W |
| 11 | 118W | 3.69 Sol/W |
| 12 | 121W | 3.40 Sol/W |
+-----+-------------+--------------+
CRITICAL: Mon Apr 30 22:34:47 MST 2018 - GPU Utilization is too low: restarting 3main...
Mon Apr 30 22:37:07 MST 2018 - Back 'on watch' after miner restart
GPU UTILIZATION: 95 98 99 100 99 96 99 100 100 100 99 100 100
GPU_COUNT: 13
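For anyone wondering where the "84 %" in that log comes from: 11 of the 13 GPUs (GPU 2 through GPU 12) were flagged as under threshold, and 11 * 100 / 13 truncates to 84. A minimal sketch of that arithmetic is below. `percent_below` is a hypothetical helper, not the watchdog's own code, and the threshold of 70 is an assumption (the log only shows that 59 counts as under threshold and 95 does not).

```shell
# Sketch only: percent_below is a hypothetical helper, not taken from nvOC.
# Usage: percent_below THRESHOLD UTIL...
# Prints the whole-number percentage of values that are below THRESHOLD.
percent_below() {
    threshold="$1"
    shift
    total=$#
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    below=0
    for util in "$@"; do
        if [ "$util" -lt "$threshold" ]; then
            below=$((below + 1))
        fi
    done
    # Shell integer division truncates, so 11 of 13 GPUs gives 84.
    echo $((below * 100 / total))
}

# Utilization values from the 22:34 log entries. GPU 0 and GPU 1 were not
# flagged, so 100 is used as a stand-in for their unreported values.
percent_below 70 100 100 59 0 0 0 0 0 10 0 0 0 0
```

On a live rig the values could be fed in from `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` instead of being typed by hand.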