I'm not sure how dead device detection decides that a device is sick. I think AM (Awesome Miner) is rebooting my rigs without cause.
Rig: 6x Vega 64 running TeamRedMiner
Log from Miner:
[2019-08-06 12:41:44] Pool europe.cryptonight-hub.miningpoolhub.com share accepted. (GPU0) (a:1690 r:0) (378 ms)
[2019-08-06 12:41:46] Pool europe.cryptonight-hub.miningpoolhub.com share accepted. (GPU0) (a:1691 r:0) (377 ms)
[2019-08-06 12:43:56] Initializing GPU 0.
[2019-08-06 12:43:57] Initializing GPU 1.
[2019-08-06 12:43:58] Initializing GPU 2.
[2019-08-06 12:44:00] Initializing GPU 3.
[2019-08-06 12:44:01] Initializing GPU 4.
[2019-08-06 12:44:02] Initializing GPU 5.
[2019-08-06 12:44:03] Watchdog thread starting.
The miner has a built-in watchdog, and I'm pretty sure that if a GPU had crashed, the miner's watchdog would have caught it before AM did. Note that shares are submitted until 12:41:46, then the rig reboots, so there is nothing in the log for about 2 minutes until the miner starts again.
Log from Awesome Miner:
06/08/2019 12:41:56 PM.132 [022] [ S][ManagedMiner#7 - Vega Rig] Dead device detection: Sick
06/08/2019 12:41:56 PM.132 [022] [ S]Rule execute: Dead device detection, miners: [Vega Rig, 192.168.100.33]
06/08/2019 12:41:56 PM.144 [022] [ S][ManagedMiner#7 - Vega Rig] Run Miner Command: Reboot
06/08/2019 12:41:56 PM.145 [022] [ S]Rule execution done: Dead device detection
Based on the above logs, the rig is marked as sick 10 seconds after the last accepted share.
Log from Remote Agent:
06-08-2019 12:41:52 PM.950 [009] [ManagedMiner#7 - Vega Rig] : ProcessMiner
06-08-2019 12:41:52 PM.950 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"config"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"summary"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"privileged"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"devs"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"pools"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"coin"}
06-08-2019 12:41:52 PM.957 [009] Execute API, Hostname: 192.168.100.33:4028, Command: {"command":"stats"}
06-08-2019 12:41:54 PM.841 [020] MinerService.AddMinerCommand
06-08-2019 12:41:54 PM.841 [020] [ S]MinerService.AddMinerCommand from client: 192.168.100.29:33439, Reboot
06-08-2019 12:41:54 PM.841 [020] [ S]Command request: Reboot
06-08-2019 12:41:54 PM.841 [007] [ S][ManagedMiner#7 - Vega Rig] Execute command: Reboot
06-08-2019 12:41:54 PM.841 [007] [ S]Reboot of Managed Miner initiated
06-08-2019 12:41:54 PM.904 [007] [ S]Miner commands left to process: 0
Last share by miner: 12:41:46
Last stats query by AM: 12:41:52
Declared sick and reboot initiated (Remote Agent log): 12:41:54
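To make the gaps easier to see, here is a small Python sketch that lines up the timestamps from the three logs above and prints how many seconds each one falls after the last accepted share. The event labels are my own wording, and it assumes all three logs use the same clock, which may not be exactly true (the Remote Agent logs the reboot command about 2 seconds before AM logs the Sick status).

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the logs above; labels are my own descriptions.
events = [
    ("last share accepted (miner log)", "12:41:46"),
    ("last API poll (Remote Agent log)", "12:41:52"),
    ("reboot command received (Remote Agent log)", "12:41:54"),
    ("marked Sick (Awesome Miner log)", "12:41:56"),
    ("miner initializing GPU 0 again (miner log)", "12:43:56"),
]


def seconds_since_first(events):
    """Return (label, seconds elapsed since the first event) for each event."""
    t0 = datetime.strptime(events[0][1], FMT)
    return [
        (label, int((datetime.strptime(ts, FMT) - t0).total_seconds()))
        for label, ts in events
    ]


for label, delta in seconds_since_first(events):
    print(f"+{delta:>3}s  {label}")
```

So the whole sequence from last share to reboot command fits inside roughly 8 to 10 seconds, and the miner is back about 130 seconds after the last share.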
I have no clue how this is happening, since I am pretty sure the TRM watchdog would catch a dead GPU and log it before AM does!
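One way to check what AM is actually seeing would be to send the same "devs" command the Remote Agent sends (see its log above) and look at the per-GPU status by hand. Below is a rough Python sketch of that. It assumes the miner API on port 4028 behaves like the cgminer/sgminer-style JSON API (JSON request in, JSON reply terminated by a null byte, connection closed by the miner) and that each entry under "DEVS" carries "GPU" and "Status" fields with "Alive" meaning healthy. I have not confirmed that TRM's replies look exactly like this, so treat the field names as assumptions; the function names are my own.

```python
import json
import socket


def parse_reply(raw):
    """Decode an API reply; cgminer-style APIs end the JSON with a null byte."""
    return json.loads(raw.rstrip(b"\x00").decode())


def query_miner(host, port, command, timeout=5.0):
    """Send one API command and return the decoded JSON reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the miner closes the socket after replying
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks))


def unhealthy_devices(reply):
    """List (GPU index, status) for every device not reported as 'Alive'.

    The 'DEVS', 'GPU' and 'Status' keys are assumed from the cgminer-style API.
    """
    return [
        (dev.get("GPU"), dev.get("Status"))
        for dev in reply.get("DEVS", [])
        if dev.get("Status") != "Alive"
    ]


# Example, to be run from a machine on the same LAN as the rig:
#   print(unhealthy_devices(query_miner("192.168.100.33", 4028, "devs")))
```

Running this in a loop every few seconds and logging the result might show whether the miner itself ever reports a GPU as anything other than Alive right before AM triggers the reboot.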