Topic: Diagnose this

hero member
Activity: 1274
Merit: 556
October 17, 2017, 06:39:18 PM
#7
So I have managed to clean up drivers, install blockchain drivers, install newest Claymore and run it.

One card is definitely lagging in terms of core speed it can sustain compared to the others (talking 25-30 MHz lower while dual mining). At least I've identified it now... will have to let it run a while at these lower clocks and see if that solves it. If not, I'll switch riser. If that doesn't do it, I'll know the GPU is probably on its way out.

Hopefully it's just a little degradation.
hero member
Activity: 935
Merit: 1001
I don't always drink...
October 17, 2017, 06:33:05 PM
#6
I'm going to say risers.  Those things always cause problems and are the weakest link.  Since it reboots in 30 minutes, it shouldn't be too hard to track down which one.  Just use the -di option to disable, say, 3 cards.  If it crashes, then enable the 3 you disabled and disable a different set.  Keep going until you narrow it down to just one card that causes the system to crash.  It's either that card or that riser.

^THIS!
Been there, done that.
hero member
Activity: 1274
Merit: 556
October 17, 2017, 03:48:33 PM
#5
Event log says system rebooted without cleanly shutting down first... so essentially some sort of crash/failure happens.

Using -di seems like a good idea, albeit potentially long and annoying/costly to diagnose. But that actually gave me another idea. Since that watchdog version of Claymore is pretty crap, I might just switch to mining ETH and use that Claymore monitoring instead. It's usually a lot better at reporting GPU failures.

Means I'm gonna have to update drivers and everything. Oh well. It was overdue anyway...
legendary
Activity: 1096
Merit: 1021
October 17, 2017, 03:29:20 PM
#4
I'm going to say risers.  Those things always cause problems and are the weakest link.  Since it reboots in 30 minutes, it shouldn't be too hard to track down which one.  Just use the -di option to disable, say, 3 cards.  If it crashes, then enable the 3 you disabled and disable a different set.  Keep going until you narrow it down to just one card that causes the system to crash.  It's either that card or that riser.
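
A rough sketch of that narrowing-down procedure, in case it helps with planning the test runs. It assumes exactly one faulty card or riser, that a crash shows up reliably within each test window, and that -di takes the indexes of the GPUs to enable as a string of single digits (e.g. "3456"); check the miner's readme for the exact syntax. The names di_argument, find_faulty and crashes are made up for illustration; in practice "crashes" is you running the rig and watching whether it reboots.

```python
def di_argument(enabled):
    """Build a -di value from enabled GPU indexes (assumes single-digit indexes)."""
    return "".join(str(i) for i in sorted(enabled))


def find_faulty(cards, crashes):
    """Narrow the suspects down to one card by disabling part of them each round.

    cards   -- all GPU indexes in the rig, e.g. range(7)
    crashes -- hypothetical callback: given the enabled indexes, returns True
               if the rig rebooted during the test run
    """
    cards = list(cards)
    suspects = list(cards)
    while len(suspects) > 1:
        # Disable roughly half of the remaining suspects for this run.
        disabled = suspects[: len(suspects) // 2]
        enabled = [c for c in cards if c not in disabled]
        print("run with -di", di_argument(enabled))
        if crashes(enabled):
            # Still crashing: the fault is among the suspects left enabled.
            suspects = [c for c in suspects if c not in disabled]
        else:
            # Stable: the fault is among the cards that were disabled.
            suspects = disabled
    return suspects[0]
```

With 7 cards this takes at most 3 runs, but each run has to last comfortably longer than the usual 30 minutes to crash before you can call it stable.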
hero member
Activity: 1498
Merit: 597
October 17, 2017, 03:25:10 PM
#3
Do the cards have a modded BIOS?
Are the cards overclocked?
If yes, were they overclocked one by one, or does one setting apply to all cards?
Do the cards have pretty much the same ASIC quality?
Have you tried mining with default settings, with all overclocking/undervolting reset?
full member
Activity: 490
Merit: 105
October 17, 2017, 03:14:35 PM
#2
What does the Windows event log say about the issue?
hero member
Activity: 1274
Merit: 556
October 17, 2017, 02:11:09 PM
#1
Two identical rigs of 7x RX570 on MSI Z170A Pro Carbon.

Both mining XMR using Claymore v9.6 (iirc). 1220/1925@975mV, custom timings.

One rig mining for 60 days with only 1 interruption (on Windows I find that quite remarkable).

The other used to chug along pretty well but recently is just not having it anymore. It will work for 30 min, then reboot and start mining again, with Wattman stating that default clocks were restored after some sort of error (so it is not reapplying my application-specific clocks and voltages).

Claymore watchdog-enabled logs say absolutely nothing.

HWiNFO64 shows only a handful of memory errors (maybe 10 in the space of 5 minutes).
I bumped core voltage by 20 mV. No change.

I'm now lowering target temp from 74C to 69C. Not expecting miracles.

Any suggestions?