What I tried to underline is that:
a) I have no idea what caused my Saturns' problems; my Jupiter, running on the same line and the same PSU type, was not affected
b) The Saturns came with all boards affected by the die #0 issue (same as my Jupiter); they are running with die #0 off on 0.98
c) Both Saturns ran pretty well on 0.98.1 beta, one of them easily powering all dies; the other required more heat, and it was really hard to keep all its dies powered on
d) Once 0.99 was released, I switched all my machines to it, and for a day or two they were running the same as on 0.98.1 beta
e) All of a sudden, both Saturns became affected with disabled cores, but in different ways: one of them simply has all cores disabled at startup (and keeps them like that) on 0.98.1 (beta or official) and on 0.99. The other one starts normally, but after a while almost all cores on the same ASIC (coincidence?), the lower one on the web interface, get disabled, with only 0-5 cores remaining enabled on each die. Again, disabled cores are present on just one ASIC, the upper one in the web interface page.
This started a day or two after 0.99, and the problem remained even after a hard reset; it affects 0.98.1 as well, beta or official.
f) Both Saturns run normally (with their die #0 issue) on 0.98
My guess was that 0.99 FW somehow caused this issue, but it just persists now on 0.98.1 as well.
Hmm, that is indeed strange, and I had a similar incident... which is why I stay on the beta.... I had to reboot/reflash one of the machines a few times...
I'm clueless on the actual problem, but as I stated yesterday, that very same machine/hashing board was fixed in a "desperation move" that I can't recommend...
It hasn't acted up in over 12 hours now so far.... (The tower grab incident)
but...
one of the other main techniques I employ is keeping the temps at a specific "Sweetspot", which also seems to help stability and helps avoid the "Dwindledown" (also linked to the mem issue). The "Spot" is also in a tight 5-degree range. Moreover, a board with troubles is likely to stay fully operational for longer periods, with a lower error rate.
I probably shouldn't mention this, but I've moved my target temps even higher on every board, which is how I achieved a full 12v & 50amp on almost every vrm, on every single board, which I agree... doesn't really make sense, but it is what it is... I discovered it fiddling with the Die0, or bad/unop vrm, whichever you deem it. The board with the "Dead vrm" became the best one when I accidentally reached ~~80C~~ a certain temp on that board. I thought that it may be specific to that board, but decided to try to duplicate it on the other boards, on that machine, to see if they behaved the same way.... they did.... so I did them all. And before ~~Edgar~~ someone starts screaming I'm giving "Bad advice", stop right there... I'm NOT giving advice here. I'm simply telling you what I'm doing, and what the results are.
I wonder if there's a way I could get ahold of some of the "Bad" boards... RMA'd ones.... to try this on.... but I doubt it... they probably repair them... lol