I noticed that Claymore sometimes fails.
When it fails, it restarts and then fails again at the same card.
So sometimes I see:
04:56:49:166 f04 em hbt: 0, fm hbt: 16,
04:56:49:166 f04 watchdog - thread 0, hb time 250
04:56:49:182 f04 watchdog - thread 1, hb time 78
04:56:49:182 f04 watchdog - thread 2, hb time 79172
04:56:49:182 f04 WATCHDOG: GPU 1 hangs in OpenCL call, exit
04:56:49:182 f04 watchdog - thread 3, hb time 79016
04:56:49:197 f04 WATCHDOG: GPU 1 hangs in OpenCL call, exit
04:56:49:197 f04 watchdog - thread 4, hb time 78
04:56:49:197 f04 watchdog - thread 5, hb time 250
04:56:49:197 f04 watchdog - thread 6, hb time 250
04:56:49:197 f04 watchdog - thread 7, hb time 94
04:56:49:213 f04 watchdog - thread 8, hb time 78
04:56:49:213 f04 watchdog - thread 9, hb time 250
04:56:49:213 f04 watchdog - thread 10, hb time 16
04:56:49:213 f04 watchdog - thread 11, hb time 188
04:56:50:353 f04 Restarting OK, exit...
Then it restarts, and I get this:
04:56:54:572 f14 OpenCL platform: AMD Accelerated Parallel Processing
04:56:54:572 f14 OpenCL initializing...
04:56:54:588 f14 AMD Cards available: 6
04:56:54:588 f14 GPU #0: Ellesmere, 4096 MB available, 36 compute units
04:56:54:588 f14 GPU #0 recognized as Radeon RX 480
04:56:54:588 f14 GPU #1: Ellesmere, 4096 MB available, 36 compute units
04:56:54:588 f14 GPU #1 recognized as Radeon RX 480
04:56:54:604 f14 GPU #2: Ellesmere, 4096 MB available, 36 compute units
04:56:54:604 f14 GPU #2 recognized as Radeon RX 480
04:56:54:604 f14 GPU #3: Ellesmere, 4096 MB available, 36 compute units
04:56:54:604 f14 GPU #3 recognized as Radeon RX 480
04:56:54:619 f14 GPU #4: Ellesmere, 4096 MB available, 36 compute units
04:56:54:619 f14 GPU #4 recognized as Radeon RX 480
04:56:54:619 f14 GPU #5: Ellesmere, 4096 MB available, 36 compute units
04:56:54:619 f14 GPU #5 recognized as Radeon RX 480
04:56:54:635 f14 POOL/SOLO version
04:56:54:635 f14 b255
04:56:54:635 f14 Platform: Windows
04:56:54:901 f14 start building OpenCL program for GPU 0...
04:57:03:792 f14 done
04:57:03:917 f14 start building OpenCL program for GPU 1...
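So GPU 1 is definitely the problematic one: the watchdog reports it hanging, and after the restart the log stops at the OpenCL build for GPU 1. However, if I wait a few minutes and run it again, it works.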
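To check whether it is always the same card, the watchdog lines can be counted per GPU across a log file. This is a minimal sketch that assumes the log format matches the lines pasted above; the function name and the way the file is passed in are my own choices, not part of the miner.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# 04:56:49:182 f04 WATCHDOG: GPU 1 hangs in OpenCL call, exit
# (pattern is an assumption based on the log excerpt in this post)
HANG_RE = re.compile(r"WATCHDOG: GPU (\d+) hangs")


def count_hangs(lines):
    """Return a Counter mapping GPU index to number of watchdog hang reports."""
    hangs = Counter()
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs[int(match.group(1))] += 1
    return hangs


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is given as the first command-line argument.
    with open(sys.argv[1], errors="replace") as log_file:
        for gpu, count in count_hangs(log_file).most_common():
            print(f"GPU {gpu}: {count} hang report(s)")
```

If one GPU index dominates the counts over several days of logs, that points at a single card (or its riser, power, or clocks) rather than a general driver problem.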
I wonder what the problem is. Is it temperature? Is there a way to log the temperature? Is there a way to tell Claymore to wait 1-2 minutes before restarting?
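For the restart delay, one possible workaround is an external wrapper that relaunches the miner only after a pause. The sketch below is not a Claymore feature: it assumes the miner process exits with a non-zero code when it fails, and it only helps if the miner's own automatic restart is turned off so that the process simply exits (whether and how that can be done is something to check in the miner's readme). The function name, the defaults, and the script name in the usage line are hypothetical.

```python
import subprocess
import sys
import time


def run_with_restart_delay(cmd, delay_seconds=120, max_restarts=5, sleep=time.sleep):
    """Run cmd, and after each non-zero exit wait delay_seconds before rerunning.

    Returns the list of exit codes, one per launch. Stops early on exit code 0
    or after max_restarts relaunches.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)  # blocks until the process exits
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing to restart
        if attempt < max_restarts:
            # Give the card time to recover before the next launch.
            sleep(delay_seconds)
    return exit_codes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the miner command line.
    run_with_restart_delay(sys.argv[1:])
```

It would be started as `python restart_delay.py <miner command and its arguments>`. The delay of 120 seconds matches the "1-2 minutes" I mentioned; if waiting helps, that hints that the card needs time to recover.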
I am not the only one with this problem:
https://forum.ethereum.org/discussion/10243/claymore-crashing-watchdog-hang-error