Hi,
I'm facing a problem and don't know how to approach it correctly.
I'm running SimpleMining with 6 x XFX RX470 8GB Hynix cards, and two things are happening:
1. After about 10 minutes of mining, the hashrate of GPU3 drops from 28 MH/s to about 20-22 MH/s and stays there. After some time it eventually crashes with
the following message in dmesg:
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x05708801
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00E0983C
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06088002
There are lots of these messages.
2. Sometimes the hashrate of all GPUs drops to about 10 MH/s and CPU usage goes to 100% (the claymore process). Again, the same GPU fault error appears in dmesg.
I shut off GPU3 and identified the physical card by checking which one cooled down. The problem is that the dmesg errors are for PCI ID 03, not for PCI ID 04 (the card that drops in hashrate). Here is my lspci output:
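To see which PCI address the faults are reported against, the dmesg lines can be tallied per device. This is only a minimal sketch in Python: the sample text is copied from the three lines quoted above, and `count_faults` is a made-up helper name, not part of any tool.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg excerpt in this post.
SAMPLE = """\
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x05708801
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00E0983C
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06088002
"""

# Captures the PCI address that precedes "GPU fault detected".
FAULT_RE = re.compile(r"amdgpu (\S+): GPU fault detected")


def count_faults(dmesg_text):
    """Return a Counter mapping PCI address -> number of 'GPU fault detected' lines."""
    return Counter(m.group(1) for m in FAULT_RE.finditer(dmesg_text))


if __name__ == "__main__":
    for address, count in count_faults(SAMPLE).items():
        print(address, count)
```

On the rig itself, the full output of `dmesg` could be fed to `count_faults` instead of the sample string.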
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
Now, I believe claymore sees the GPUs in PCI order, right? I mean, GPU0 is 01:00, GPU1 is 02:00, etc. Is this correct?
Because the card that is dropping in hashrate is GPU3, but the errors in dmesg are for PCI ID 03:00, which should be GPU2 in claymore (GPU0, 1, 2: the 3rd card).
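To make the mapping I'm assuming explicit, here is a small Python sketch that numbers the VGA devices from my lspci output in ascending bus order. Whether claymore really numbers cards this way is exactly what I'm unsure about, so the ordering rule is an assumption, and `gpu_index_by_bus` is just a name I made up.

```python
# VGA lines copied from the lspci output in this post.
LSPCI = """\
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
"""


def gpu_index_by_bus(lspci_text):
    """Map GPU0, GPU1, ... to PCI addresses, ASSUMING numbering follows bus order."""
    addresses = sorted(
        line.split()[0]
        for line in lspci_text.splitlines()
        if "VGA compatible controller" in line
    )
    return {f"GPU{i}": addr for i, addr in enumerate(addresses)}


if __name__ == "__main__":
    for name, addr in gpu_index_by_bus(LSPCI).items():
        print(name, "->", addr)
```

Under that assumption, GPU2 would be 03:00.0 and GPU3 would be 04:00.0, which is the mismatch I'm describing.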
So, which card is faulty? The one that drops in hashrate, or the one that is referred to by the dmesg errors?
Has anyone else encountered this type of problem?
Thanks!