Hello,
When we get a watchdog alert, is there an easy way to determine which GPU caused the error? In the example below, it looks as if all 10 GPUs fail at the same time, which seems unlikely.
Do you have any advice?
WARNING: Tue Feb 27 09:35:24 CET 2018 - Problem found: See diagnostics below:
Percent of GPUs bellow threshold: 100 %
name, pstate, temperature.gpu, fan.speed [%], utilization.gpu [%], power.draw [W], power.limit [W]
GeForce GTX 1060 6GB, P5, 63, 60 %, 1 %, 12.96 W, 80.00 W
GeForce GTX 1060 6GB, P2, 45, 50 %, 0 %, 27.97 W, 80.00 W
GeForce GTX 1060 6GB, P5, 58, 50 %, 0 %, 13.95 W, 80.00 W
GeForce GTX 1060 6GB, P5, 59, 50 %, 0 %, 11.65 W, 80.00 W
GeForce GTX 1060 6GB, P5, 49, 50 %, 0 %, 12.42 W, 80.00 W
GeForce GTX 1060 6GB, P5, 56, 50 %, 0 %, 10.57 W, 80.00 W
GeForce GTX 1060 6GB, P5, 54, 50 %, 0 %, 12.23 W, 80.00 W
GeForce GTX 1060 6GB, P5, 59, 50 %, 0 %, 9.47 W, 80.00 W
GeForce GTX 1060 6GB, P5, 50, 50 %, 0 %, 12.10 W, 80.00 W
GeForce GTX 1060 6GB, P5, 41, 50 %, 0 %, 13.65 W, 80.00 W
✘ 09:35:03|cuda-4 Error CUDA mining: an illegal memory access was encountered
CUDA error in func 'search' at line 489 : an illegal memory access was encountered.
✘ 09:35:03|cuda-6 Error CUDA mining: an illegal memory access was encountered
m 09:35:04|ethminer Speed 231.75 Mh/s gpu/0 22.52 gpu/1 24.14 gpu/2 24.06 gpu/3 22.84 gpu/4 22.84 gpu/5 22.84 gpu/6 22.84 gpu/7 22.84 gpu/8 22.76 gpu/9 24.06 [A338+0:R0+0:F0] Time: 01:36
m 09:35:06|ethminer Speed 213.04 Mh/s gpu/0 20.73 gpu/1 22.13 gpu/2 22.13 gpu/3 20.99 gpu/4 20.99 gpu/5 20.99 gpu/6 20.99 gpu/7 20.99 gpu/8 20.99 gpu/9 22.13 [A338+0:R0+0:F0] Time: 01:36
ℹ 09:35:06|stratum Received new job #ff63903c seed: #1261dfe17d0bf58cb2861ae84734488b target: #0000000112e0be826d694b2e
m 09:35:08|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:10|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:12|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
ℹ 09:35:12|stratum Received new job #76b48990 seed: #1261dfe17d0bf58cb2861ae84734488b target: #0000000112e0be826d694b2e
m 09:35:14|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:16|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:18|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:20|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
m 09:35:22|ethminer Speed 208.73 Mh/s gpu/0 20.36 gpu/1 21.62 gpu/2 21.62 gpu/3 20.57 gpu/4 20.57 gpu/5 20.57 gpu/6 20.57 gpu/7 20.57 gpu/8 20.57 gpu/9 21.72 [A338+0:R0+0:F0] Time: 01:36
CRITICAL: Tue Feb 27 09:35:25 CET 2018 - GPU Utilization is too low: restarting 3main...
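
For what it's worth, the only device-specific lines in the log above are the two "Error CUDA mining" entries, tagged cuda-4 and cuda-6. Below is a rough Python sketch of how those tags could be pulled out of a saved miner log. The function name `failing_gpus` and the assumption that the `cuda-N` tag is the miner's own device index are mine, not something the watchdog or ethminer documents; that index may also not match the row order of the nvidia-smi table unless both use the same device ordering.

```python
import re

# ANSI color sequences such as "[35m" or "[1;30m"; the leading ESC byte is
# optional because pasted logs often lose it.
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")

# Assumption: error lines look like "...|cuda-4 Error CUDA mining: <message>".
ERROR_RE = re.compile(r"\|\s*cuda-(\d+)\s+Error CUDA mining: (.+)")


def failing_gpus(log_text):
    """Return {device index: [error messages]} for every CUDA error line found."""
    errors = {}
    for line in log_text.splitlines():
        clean = ANSI_RE.sub("", line)  # drop color codes before matching
        match = ERROR_RE.search(clean)
        if match:
            index = int(match.group(1))
            errors.setdefault(index, []).append(match.group(2).strip())
    return errors


# Three lines copied from the log in this post.
SAMPLE = """\
[101m[1;30m ✘ [35m09:35:03[0m[30m|[34mcuda-4 [0m Error CUDA mining: an illegal memory access was encountered
CUDA error in func 'search' at line 489 : an illegal memory access was encountered.
[101m[1;30m ✘ [35m09:35:03[0m[30m|[34mcuda-6 [0m Error CUDA mining: an illegal memory access was encountered
"""

if __name__ == "__main__":
    for index, messages in sorted(failing_gpus(SAMPLE).items()):
        print(f"cuda-{index}: {messages[0]}")
```

If this reading is right, the "100 %" in the alert would only mean that every card was idle when the watchdog sampled nvidia-smi, not that every card raised an error.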