Any idea how to correct this?
Oh believe me, I have thought about this one too, and have yet to arrive at what I feel is a good solution. Having the reboot time grow proportionally to the number of GPUs is not ideal, IMO. I am all ears if somebody wants to chime in. The best I can come up with is an upper limit on the count instead of just 6 x GPU. There has to be a better way.
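To make the "upper limit" idea concrete, here is a rough Python sketch, not the actual watchdog code. The function name, the `per_gpu` and `max_count` parameters, and the cap of 24 are all made up for illustration; only the 6 x GPU rule comes from the discussion above.

```python
def reboot_threshold(gpu_count, per_gpu=6, max_count=24):
    """Failed checks tolerated before a reboot (hypothetical helper).

    per_gpu=6 mirrors the 6 x GPU rule discussed in the thread;
    max_count=24 is an arbitrary cap chosen only for this example.
    """
    return min(per_gpu * gpu_count, max_count)


if __name__ == "__main__":
    # Small rigs keep the 6 x GPU behaviour; large rigs stop growing at the cap.
    for gpus in (2, 4, 13):
        print(gpus, "GPUs ->", reboot_threshold(gpus), "checks")
```

With a cap like this the wait stops growing once the rig passes 4 GPUs, so a 13 GPU rig would wait for 24 checks rather than 78. What the right cap is would still need tuning per rig.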
The watchdog needs a complete overhaul; some of its logic has no logic at all.
I had cases where one GPU would hang and pretty much bring the whole rig to a crawl. Its 6x logic (x 13 GPUs for me) caused 3-4 hours of slow ~10% performance before it realized it's time to reboot. The quick solution for me was to scratch its logic and change the code to reboot on any detected problem. I'd rather lose 1 minute rebooting than 4 hours of mining at 10%.
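For anyone wanting to try the same "reboot on any detected problem" approach, the check could look something like the Python sketch below. This is only an illustration of the idea, not the change described above: the function name, the `min_util` threshold of 80, and the assumption that the watchdog already has a list of per-GPU utilization percentages are all hypothetical.

```python
def should_reboot(utilizations, expected_gpus, min_util=80):
    """Return True on the first sign of trouble (hypothetical check).

    utilizations  -- per-GPU utilization percentages, however the watchdog reads them
    expected_gpus -- number of GPUs the rig is supposed to have
    min_util      -- assumed threshold; anything below counts as a problem
    """
    # A GPU that dropped off the bus shows up as a missing reading.
    if len(utilizations) < expected_gpus:
        return True
    # A hung or crawling GPU shows up as low utilization.
    return any(u < min_util for u in utilizations)


if __name__ == "__main__":
    print(should_reboot([99, 98, 5], expected_gpus=3))   # one GPU crawling
    print(should_reboot([99, 98, 97], expected_gpus=3))  # all healthy
```

The trade-off is that a single bad reading triggers a reboot, so a rig with noisy readings might reboot more often than it needs to.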
I will revisit the watchdog soon and see if I can come up with some new logic for it.