DEAD CARD DETECTION--
My Vega64 rig was stable for 2 days mining CNr on NiceHash with TeamRedMiner (2.2kH/s per GPU), but I noticed that the poolside hashrate varied a lot, both on the NiceHash dashboard (wild swings up and down) and on the TRM console (steadier, but low). So I decided to experiment with ETH.
Switching from CNr to ETH, and from TRM to PhoenixMiner, I initially achieved 48MH/s ETH per card. Very shortly afterward, though, 1 GPU was reported dead and not hashing. The rig generally began behaving badly, and I eventually pulled that GPU. I tried ETH overnight with the remaining 3 cards but woke to a zero hash rate.
Unplugging and replacing the "dead" GPU a few times brought no better results, and switching back to TRM mining CNr still wound up with a dead GPU detected and failed launches.
So, I commented out my call to AMDmemtweak in my "custom.sh" file (a script file run at boot by ethOS to launch utilities and such). After that I configured the rig to mine Vertcoin (lyra2v3) with TRM. Lyra2v3 is a compute-heavy algorithm, and I have my memory clocked low (500 MHz). TRM booted just fine, both without and with my "undead" GPU. The cards are hashing VTC at about 100MH/s each.
I sincerely expected to RMA the card last night. Today it works fine mining Lyra2v3, but at a higher wattage (250+W/GPU) than when mining CNr (200W/GPU) and ETH (150W/GPU).
I need to do some more reading about the strap values, I think. Any tips or comments left on the matter will be appreciated. --scryptr
If your cards aren't adequately cooled, a super high REF can cause hw errors. My vegas @ 40 are fine w/ 15600. My old, cranky, and hotter ref 56 doesn't seem to like that (it's temperamental even w/o timing mods.) That's what I'd look at first. After that, I've found RCDRD and RCDWR to be the most common culprits of hw errors.
As for poolside h/r variability - it is a fact of life w/ CN in my experience. 24hr h/r reports on the pool are helpful, but generally, those are not even long enough avgs (unless your diff is super low.) Best bet is to look at miner reported pool rate, after at least a couple days, if you can keep your rig up that long.
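To put a rough number on why short poolside averages swing so much, here is a minimal sketch. It assumes shares arrive as a Poisson process and that share difficulty equals the expected number of hashes per share, so the pool's estimate over a window has a relative standard deviation of about 1/sqrt(N), where N is the number of shares expected in that window. The difficulty value and the function names are hypothetical, not taken from NiceHash or TRM; plug in whatever your pool actually assigns.

```python
import math


def expected_shares(hashrate_hs, share_difficulty, window_s):
    """Expected share count in a window.

    Assumption: share_difficulty is the expected number of hashes per share.
    """
    if hashrate_hs <= 0 or share_difficulty <= 0 or window_s <= 0:
        raise ValueError("all inputs must be positive")
    return hashrate_hs * window_s / share_difficulty


def relative_std(hashrate_hs, share_difficulty, window_s):
    """Approximate relative std dev of the pool's hashrate estimate.

    For Poisson share arrivals this is 1 / sqrt(expected share count).
    """
    return 1.0 / math.sqrt(expected_shares(hashrate_hs, share_difficulty, window_s))


# Four Vega64 cards at 2.2 kH/s each, as in the post above.
RIG_HASHRATE_HS = 4 * 2200.0
# Hypothetical share difficulty; substitute the value your pool assigns.
SHARE_DIFFICULTY = 200_000.0

for hours in (1, 24, 72):
    window_s = hours * 3600.0
    n = expected_shares(RIG_HASHRATE_HS, SHARE_DIFFICULTY, window_s)
    err = relative_std(RIG_HASHRATE_HS, SHARE_DIFFICULTY, window_s)
    print(f"{hours:>3} h: ~{n:.0f} shares, estimate noise about +/-{err * 100:.1f}%")
```

The takeaway is that the noise only shrinks with the square root of the window: a 4x longer average cuts the swing in half, and a higher share difficulty makes every window noisier.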