Need help, guys. My rig keeps crashing and I don't know why. Running rxoc v0012.
Mobo is a Biostar TB250-BTC.
GPU 1 is a Sapphire 580 4GB Nitro+.
GPU 2 is an XFX 480 4GB (with the stupid white LEDs).
If I run both GPUs at stock clocks on onebash, I get PCIe bus errors, and they fill up the USB. I know it's not the risers, because I tested them individually (after receiving replacements from the seller) on both GPUs under smOS and rxoc and got no PCIe errors.
I'm mining ZEC, and it doesn't matter whether I use optiminer or claymore: eventually the system hangs due to some error. I checked the error logs, and this is a snippet of what I found.
from the xorg.0.log: [ 2402.679] (WW) AMDGPU(0): amdgpu_dri2_flip_event_handler: Pageflip completion event has impossible msc 143245 < target_msc 143246 (a crapload of these)
from the syslog.log (only the 580 GPU running):
Jul 14 21:14:30 m1-desktop systemd[1]: dev-disk-by\x2duuid-d06ff735\x2d6872\x2d4264\x2daa59\x2dd42811d47b35.swap: Job dev-disk-by\x2duuid-d06ff735\x2d6872\x2d4264\x2daa59\x2dd42811d47b35.swap/start failed with result 'dependency'.
Jul 14 21:14:30 m1-desktop systemd[1]: dev-disk-by\x2duuid-d06ff735\x2d6872\x2d4264\x2daa59\x2dd42811d47b35.device: Job dev-disk-by\x2duuid-d06ff735\x2d6872\x2d4264\x2daa59\x2dd42811d47b35.device/start failed with result 'timeout'.
Jul 14 21:17:01 m1-desktop CRON[4102]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
(the miner crashed and restarted at 21:15)
Jul 14 21:32:31 m1-desktop kernel: [ 2752.674236] gmc_v8_0_process_interrupt: 39 callbacks suppressed
Jul 14 21:32:31 m1-desktop kernel: [ 2752.674247] amdgpu 0000:01:00.0: GPU fault detected: 147 0x09020402
Jul 14 21:32:31 m1-desktop kernel: [ 2752.674253] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00018920
Jul 14 21:32:31 m1-desktop kernel: [ 2752.674257] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03004002
Jul 14 21:32:31 m1-desktop kernel: [ 2752.674263] amdgpu 0000:01:00.0: VM fault (0x02, vmid 1) at page 100640, write from 'TC1' (0x54433100) (4)
Jul 14 21:32:31 m1-desktop kernel: [ 2752.731128] amdgpu 0000:01:00.0: IH ring buffer overflow (0x00081AD0, 0x000016C0, 0x00001AE0)
(I had never seen these before by the way, just pcie bus errors)
from the terminal running the claymore miner and throwing a fit:
ZEC: 07/14/17-21:13:42 - New job from zec-us-west1.nanopool.org:6666
ZEC - Total Speed: 311.807 H/s, Total Shares: 9, Rejected: 0, Time: 00:13
ZEC: GPU0 311.807 H/s
GPU0 t=57C fan=66%
DevFee: ZEC: Stratum - connecting to 'zec.suprnova.cc' <46.105.114.185> port 2242
ZEC: 07/14/17-21:14:09 - New job from zec-us-west1.nanopool.org:6666
ZEC - Total Speed: 309.570 H/s, Total Shares: 9, Rejected: 0, Time: 00:14
ZEC: GPU0 309.570 H/s
GPU0 t=57C fan=66%
GPU0 t=57C fan=66%
ZEC: 07/14/17-21:15:10 - New job from zec-us-west1.nanopool.org:6666
ZEC - Total Speed: 310.868 H/s, Total Shares: 9, Rejected: 0, Time: 00:15
ZEC: GPU0 310.868 H/s
ZEC: 07/14/17-21:15:15 - SHARE FOUND - (GPU 0)
ZEC: Share accepted (166 ms)!
GPU0 t=57C fan=66%
GPU0 t=57C fan=66%
Miner thread hangs, need to restart miner!
What's going on? Bad GPUs? Bad mobo? The risers have already been swapped out twice. I am a total Linux noob, by the way.
First I would try setting:
and seeing if that is what is causing the problem.
Ensure your fan speed is set high enough to keep your GPUs cool:
I would try 200
It also looks like there are some disk errors occurring. If disabling the overdrive and increasing the fan speed doesn't solve the problem, I would reimage the USB or use another USB.
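To see which problem shows up more often, here is a rough Python sketch that tallies the two kinds of lines from your syslog snippet: the amdgpu "GPU fault detected" messages (counted per PCI address) and the systemd jobs that failed with 'timeout' or 'dependency'. The regexes are based only on the lines you pasted, and the name summarize is made up for illustration, so treat it as a starting point rather than a proper diagnostic tool.

```python
import re
import sys
from collections import Counter

# Patterns modelled only on the syslog lines quoted in this thread (assumption:
# other kernel or systemd versions may word these messages differently).
GPU_FAULT = re.compile(r"amdgpu (?P<pci>[0-9a-f:.]+): GPU fault detected")
JOB_FAILED = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Job \S+ failed with result '(?P<result>\w+)'"
)


def summarize(lines):
    """Count GPU faults per PCI address and failed systemd jobs per result."""
    gpu_faults = Counter()
    failed_jobs = Counter()
    for line in lines:
        match = GPU_FAULT.search(line)
        if match:
            gpu_faults[match.group("pci")] += 1
            continue
        match = JOB_FAILED.search(line)
        if match:
            failed_jobs[match.group("result")] += 1
    return gpu_faults, failed_jobs


# Only reads a file when a path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        faults, jobs = summarize(log_file)
    for pci, count in faults.items():
        print(f"GPU faults on {pci}: {count}")
    for result, count in jobs.items():
        print(f"systemd jobs failed with '{result}': {count}")
```

Save it under any name and pass the path to your syslog file as the only argument. If the failed disk jobs keep appearing after you reimage or swap the USB, the stick is less likely to be the cause. If the GPU fault count stays high on one PCI address, that points at that card or its slot.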
Let me know how it goes.