I just built a rig with 2 RX480 8G (Sapphire) cards and an ASRock H81 Pro BTC v2 (4GB DRAM, +20GB swap), running Ubuntu 16.04 desktop and amdgpu-pro-17.10-414273 (reinstalled from scratch with a clean procedure).
I don't yet have all my risers, so GPU 0 is in my 16x PCIe slot.
The ROMs are stock (I suppose; the cards are reconditioned).
I used Claymore 9.3 to mine ETH only.
I optimised ethi and dcri;
my config.txt contains:
-tt 80 -ttli 85 -tstop 90 -fanmin 80
-etht 200 -ftime 10 -r 1 -retrydelay 5
-gser 1
GPU_MAX_HEAP_SIZE=100
GPU_USE_SYNC_OBJECTS=1
GPU_MAX_ALLOC_PERCENT=100
GPU_SINGLE_ALLOC_PERCENT=100
All seemed fine: never any invalid shares, and stable throughput at 24.535 MH/s per card (total >49 MH/s ...). The rig draws 395W at the plug (the motherboard consumes about 50W).
At the beginning there were no messages other than:
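For reference, the power figures above can be turned into a rough per-card estimate. This is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, the 50W motherboard draw is my own estimate, and because the reading is taken at the plug the result includes PSU conversion losses.

```python
# Figures quoted in the post; BOARD_POWER_W is an estimate, not a measurement.
WALL_POWER_W = 395.0          # total draw measured at the plug
BOARD_POWER_W = 50.0          # approximate motherboard/CPU share
HASHRATE_PER_GPU_MHS = 24.535
NUM_GPUS = 2


def per_gpu_wall_power(wall_w, board_w, gpus):
    """Wall power attributable to each GPU (PSU losses included)."""
    return (wall_w - board_w) / gpus


def efficiency_mhs_per_watt(total_mhs, wall_w):
    """Whole-rig efficiency in MH/s per watt at the plug."""
    return total_mhs / wall_w


if __name__ == "__main__":
    total = HASHRATE_PER_GPU_MHS * NUM_GPUS
    print("per-GPU wall power (W):",
          per_gpu_wall_power(WALL_POWER_W, BOARD_POWER_W, NUM_GPUS))
    print("rig efficiency (MH/s per W):",
          round(efficiency_mhs_per_watt(total, WALL_POWER_W), 4))
```

With these numbers each card accounts for roughly 170W at the wall, which may be worth comparing against what the card is expected to draw at stock settings.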
kernel: gmc_v8_0_process_interrupt: 36 callbacks suppressed
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00504802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001E920A
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 2) at page 2003466, read from 'TC4' (0x54433400) (72)
...
WATCHDOG: GPU 0 hangs in OpenCL call, exit
...
gmc_v8_0_process_interrupt: 36 callbacks suppressed
amdgpu 0000:01:00.0: GPU fault detected: 146 0x0600480c
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001032C0
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4) at page 1061568, read from 'TC4' (0x54433400) (72)
amdgpu 0000:01:00.0: GPU fault detected: 147 0x06004808
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001032C0
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048008
amdgpu 0000:01:00.0: VM fault (0x08, vmid 4) at page 1061568, read from 'TC4' (0x54433400) (72)
amdgpu 0000:01:00.0: GPU fault detected: 146 0x0678040c
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001432CF
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0800400C
amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4) at page 1323727, read from 'TC1' (0x54433100) (4)
...
amdgpu 0000:01:00.0: GPU fault detected: 147 0x06004808
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001432C0
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048008
amdgpu 0000:01:00.0: VM fault (0x08, vmid 4) at page 1323712, read from 'TC4' (0x54433400) (72)
...
WATCHDOG: GPU 0 hangs in OpenCL call, exit
...
kernel: gmc_v8_0_process_interrupt: 36 callbacks suppressed
kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x06404404
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001832C8
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04044004
kernel: amdgpu 0000:01:00.0: VM fault (0x04, vmid 2) at page 1585864, read from 'TC5' (0x54433500) (68)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0668440c
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001832CD
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404400C
kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 2) at page 1585869, read from 'TC5' (0x54433500) (68)
...
WATCHDOG: GPU 0 hangs in OpenCL call, exit
...
ethdcrminer64[1166]: ETH - Total Speed: 49.078 Mh/s, Total Shares: 41, Rejected: 0, Time: 01:03
ethdcrminer64[1166]: ETH: GPU0 24.539 Mh/s, GPU1 24.539 Mh/s
ethdcrminer64[1166]: GPU0 t=77C fan=72%, GPU1 t=75C fan=72%
gmc_v8_0_process_interrupt: 36 callbacks suppressed
amdgpu 0000:01:00.0: GPU fault detected: 147 0x06004808
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001032C0
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04048008
amdgpu 0000:01:00.0: VM fault (0x08, vmid 2) at page 1061568, read from 'TC4' (0x54433400) (72)
amdgpu 0000:01:00.0: GPU fault detected: 146 0x0678480c
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001832CF
...
amdgpu 0000:01:00.0: GPU fault detected: 146 0x06780804
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001432CF
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04008004
amdgpu 0000:01:00.0: VM fault (0x04, vmid 2) at page 1323727, read from 'TC0' (0x54433000) (8)
...
WATCHDOG: GPU 0 hangs in OpenCL call, exit
...
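To make the log above easier to compare across runs, here is a minimal Python sketch that extracts the "VM fault" lines and tallies them. It is not part of the driver or of Claymore; the regular expression and the function name `parse_vm_faults` are my own and only assume the line format seen in this log. Two things can be checked directly from the excerpt: the page number on each "VM fault" line is the preceding VM_CONTEXT1_PROTECTION_FAULT_ADDR value written in decimal, and the hex value after the client name is simply its ASCII bytes ('TC4' is 0x54433400).

```python
import re
from collections import Counter

# A few lines copied verbatim from the log above.
SAMPLE = """\
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001E920A
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 2) at page 2003466, read from 'TC4' (0x54433400) (72)
amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4) at page 1323727, read from 'TC1' (0x54433100) (4)
kernel: amdgpu 0000:01:00.0: VM fault (0x04, vmid 2) at page 1585864, read from 'TC5' (0x54433500) (68)
"""

# Pattern written against the format seen in this log only (an assumption).
VM_FAULT = re.compile(
    r"amdgpu (?P<dev>[0-9a-f:.]+): VM fault \(0x[0-9a-f]+, vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<rw>read|write) from '(?P<client>[^']+)' "
    r"\(0x(?P<tag>[0-9a-fA-F]{8})\)"
)


def parse_vm_faults(text):
    """Return one dict per 'VM fault' line found in text."""
    faults = []
    for m in VM_FAULT.finditer(text):
        page = int(m["page"])
        faults.append({
            "dev": m["dev"],              # PCI address of the faulting GPU
            "vmid": int(m["vmid"]),
            "page": page,
            "page_hex": hex(page),        # same value as FAULT_ADDR
            "rw": m["rw"],
            "client": m["client"],
            # Decode the hex tag back to ASCII, dropping the trailing NUL.
            "tag": bytes.fromhex(m["tag"]).rstrip(b"\0").decode("ascii"),
        })
    return faults


if __name__ == "__main__":
    faults = parse_vm_faults(SAMPLE)
    print(Counter(f["dev"] for f in faults))
    print(Counter(f["client"] for f in faults))
```

Run against the full journal (for example the saved output of `dmesg`), this shows at a glance whether every fault comes from 0000:01:00.0 and which clients are involved.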
I have several hypotheses.
- One is that the card does not work well in the 16x slot. Is that a known problem?
- One is that it is related to the card overheating (I've pushed the fan to test). Is this a usual symptom?
- One is overclocking (I did not change the ROM, but maybe the card is modded). Is this a usual symptom?
- One is my DCRI/ETHI settings being "abusive" for the card. Is this a typical symptom?
- One is a software problem. NB: I've migrated to 9.2. Maybe there is some option to tweak? Since GPU1 is OK (maybe it simply does not get the time to show problems), I'm not convinced.
The good point is that the watchdog works well...