Hi,
Can anyone please help me and give me some info on this error:
18:42:06:498 12d8 GPU 4, GpuMiner kx failed 1
18:42:06:514 12d8 Set global fail flag, failed GPU4
18:42:06:404 11c8 Set global fail flag, failed GPU0
18:42:06:404 177c Set global fail flag, failed GPU1
18:42:06:404 f0c Set global fail flag, failed GPU2
18:42:06:654 f0c GPU 2 failed
18:42:06:733 11c8 GPU 0 failed
18:42:06:811 12d8 GPU 4 failed
18:42:06:858 17fc GPU 5, GpuMiner kx failed 1
18:42:06:858 17fc Set global fail flag, failed GPU5
18:42:06:905 177c GPU 1 failed
18:42:06:951 1390 GPU 3 failed
18:42:06:983 17fc GPU 5 failed
18:42:08:655 498 srv_thr cnt: 1, 192.168.XX.XX
18:42:08:670 498 recv: 51
18:42:08:670 498 srv pck: 50
18:42:08:670 498 NVML: cannot get current temperature, error 15
18:42:08:670 498 NVML: cannot get fan speed, error 15
18:42:08:670 498 NVML: cannot get current temperature, error 15
18:42:08:670 498 NVML: cannot get fan speed, error 15
18:42:08:670 498 srv bs: 0
18:42:08:670 498 sent: 196
18:42:09:514 1070 NVML: cannot get current temperature, error 15
18:42:09:514 1070 NVML: cannot get fan speed, error 15
18:42:09:514 1070 NVML: cannot get current temperature, error 15
18:42:09:514 1070 NVML: cannot get fan speed, error 15
18:42:12:530 1070 NVML: cannot get current temperature, error 15
18:42:12:530 1070 NVML: cannot get fan speed, error 15
18:42:12:530 1070 NVML: cannot get current temperature, error 15
18:42:12:530 1070 NVML: cannot get fan speed, error 15
18:42:13:718 1094 srv_thr cnt: 1, 192.168.XX.XX
18:42:13:718 1094 recv: 51
18:42:13:718 1094 srv pck: 50
18:42:13:718 1094 NVML: cannot get current temperature, error 15
18:42:13:718 1094 NVML: cannot get fan speed, error 15
18:42:13:718 1094 NVML: cannot get current temperature, error 15
18:42:13:718 1094 NVML: cannot get fan speed, error 15
18:42:13:718 1094 srv bs: 0
18:42:13:718 1094 sent: 196
I had the same issue and did the following things:
a) Shut everything down and make sure no riser is loose, meaning the GPUs are fully seated in the risers and all the cables are pushed in all the way.
b) If you are overclocking the GPUs, reduce the OC a bit, maybe by 10 each (core and memory), and see if things improve.
I did lower the OC, but it's the same or even worse. The first time it ran for 3 days and failed. Then it ran for 1 day and failed again. And now, 6 hours after the third failure (the one that made me ask here), it has failed again. I did check the risers, but they look good. And what are the chances that 2 risers would be loose like that?
Right now I'm doing a clean uninstall of the NVIDIA driver and reinstalling it to see if the problem goes away. My next step is the PSU, since both cards are getting power from the same 700W PSU and nothing else is connected to that PSU. (P.S. I said the PSU is 1050W. That was a mistake; that's another miner. On this rig there are two 700W PSUs: one for the MB and four 1050 cards, and one for the two 1060 cards which are failing.)
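Not a fix, but it may help narrow things down. In the log from the first post, only GPU 4 and GPU 5 print a "GpuMiner kx failed" line; the other GPUs only report failing once the global fail flag is set. Here is a rough Python sketch for pulling that out of a longer log. It assumes the line format matches what was posted, and the `summarize` name and the regexes are my own, not part of the miner.

```python
import re
from collections import Counter

# A few lines copied from the log in the first post.
SAMPLE_LOG = """\
18:42:06:498 12d8 GPU 4, GpuMiner kx failed 1
18:42:06:514 12d8 Set global fail flag, failed GPU4
18:42:06:404 11c8 Set global fail flag, failed GPU0
18:42:06:654 f0c GPU 2 failed
18:42:06:858 17fc GPU 5, GpuMiner kx failed 1
18:42:06:858 17fc Set global fail flag, failed GPU5
18:42:08:670 498 NVML: cannot get current temperature, error 15
18:42:08:670 498 NVML: cannot get fan speed, error 15
"""

# Assumed patterns, based only on the lines shown in this thread.
KERNEL_FAIL = re.compile(r"GPU (\d+), GpuMiner kx failed")
NVML_ERROR = re.compile(r"NVML: cannot get (.+?), error (\d+)")


def summarize(log_text):
    """Return (GPUs with a 'GpuMiner kx failed' line, NVML error counts)."""
    kernel_failures = []      # GPU indices, in order of first appearance
    nvml_errors = Counter()   # (what could not be read, error code) -> count
    for line in log_text.splitlines():
        match = KERNEL_FAIL.search(line)
        if match:
            gpu = int(match.group(1))
            if gpu not in kernel_failures:
                kernel_failures.append(gpu)
            continue
        match = NVML_ERROR.search(line)
        if match:
            nvml_errors[(match.group(1), int(match.group(2)))] += 1
    return kernel_failures, nvml_errors


if __name__ == "__main__":
    gpus, errors = summarize(SAMPLE_LOG)
    print("GPUs with a 'GpuMiner kx failed' line:", gpus)
    for (what, code), count in errors.items():
        print(f"NVML could not read {what}: error {code} (seen {count} times)")
```

To run it on a real log, read the file and pass its text to `summarize` in place of `SAMPLE_LOG`. As far as I know, NVML return code 15 is `NVML_ERROR_GPU_IS_LOST` in NVIDIA's `nvml.h`, which would fit a card dropping off the bus, but check that against the header shipped with your driver.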