Topic: s17+ Simultaneous failure of all three hashboards (Read 92 times)

newbie
Activity: 2
Merit: 0
Thank you BitMaxz and JonnyRocket for the replies.

As suggested, I removed (and air cleaned) all three boards and tried them one by one. All three work. Plug them all back in and 2 out of 3 work. I swapped the ordering in the unit and found that the same board fails. I then swapped that one board into the other unit that was working fine, and was not surprised to find that unit now hashes on 2 of 3 (incidentally, I put the board I pulled from the working unit into the 'bad' unit and that runs 3 of 3). So it would appear that I do indeed have one bad board, and having cleaned and fiddled with all the connectors whilst testing, I am now in a position where it does not prevent the other 2 boards from working.
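The reasoning behind those swap tests can be written down as a small check: if the failure shows up in more than one unit and more than one slot but always on the same board, the fault follows the board. This is only a sketch in Python. The unit names, slot numbers and board labels (A1, B1 and so on) are made up for illustration, and it only covers runs with all three boards fitted, not the solo runs.

```python
def fault_follows(observations):
    """Work out whether failures track a unit, a slot or a board.

    observations: list of (unit, slot, board, hashed) tuples, one per board
    per test run. All labels are hypothetical, chosen by whoever runs the test.
    Returns a dict such as {"board": "A3"}; empty if nothing stands out.
    """
    failures = [obs for obs in observations if not obs[3]]
    verdict = {}
    for name, idx in (("unit", 0), ("slot", 1), ("board", 2)):
        failing_values = {obs[idx] for obs in failures}
        # The fault "follows" this dimension only if a single value accounts
        # for every failure and that value never hashed in any run.
        if len(failing_values) == 1:
            value = next(iter(failing_values))
            if all(not obs[3] for obs in observations if obs[idx] == value):
                verdict[name] = value
    return verdict


# Example data loosely mirroring the tests described above (labels invented).
observations = [
    # run 1: 'bad' unit A, original order
    ("A", 1, "A1", True), ("A", 2, "A2", True), ("A", 3, "A3", False),
    # run 2: unit A, boards reordered
    ("A", 1, "A3", False), ("A", 2, "A1", True), ("A", 3, "A2", True),
    # run 3: suspect board moved into the good unit B
    ("B", 1, "A3", False), ("B", 2, "B2", True), ("B", 3, "B3", True),
    # run 4: board from unit B fitted in unit A
    ("A", 1, "A1", True), ("A", 2, "A2", True), ("A", 3, "B1", True),
]

print(fault_follows(observations))  # -> {'board': 'A3'}
```

Because the failures appear in both units and in two different slots, neither the chassis nor the slot position explains them; only the board label is common to every failure.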

I found it odd that the 'bad' board did work when it was the only board in use, however I only let it run for 5 or 10 mins before continuing testing with the other boards. I want to run that test again for a longer period, then I will get some logs on here once I'm happy I have done everything possible from a physical point of view.

T
newbie
Activity: 12
Merit: 1
I'm so glad I sold all my S17+ last year. They are all garbage quality, but that's not what you want to hear...

I would carefully try each hashboard one by one in the working miner to see if any hash. If they ALL hash, it is certainly a power supply/controller issue. Buy extra PSUs/controllers for these machines.... If 2/3 hash, the bad board is killing the good ones and not allowing the unit to work. Could also be a controller failure. You could try the good controller with the good hashboards (after testing) and see if that works. Swapping controllers sometimes works.
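For anyone who wants that rule of thumb spelled out, here is a rough Python sketch of it. The function name and board labels are invented for illustration, and the "nothing hashes" branch is an assumption on my part, since the rule above does not cover that case.

```python
def diagnose(results):
    """Apply the one-by-one test rule to boards tried in a known-good miner.

    results: dict mapping a board label (hypothetical, e.g. "board1") to True
    if that board hashed when run alone in the working miner.
    """
    if not results:
        raise ValueError("no boards tested")
    failed = sorted(board for board, hashed in results.items() if not hashed)
    if not failed:
        # Every board is fine elsewhere, so the original chassis is suspect.
        return "all boards hash: suspect the PSU or controller of the original unit"
    if len(failed) == len(results):
        # Not covered by the rule above; treated as inconclusive here.
        return "no board hashes: inconclusive, recheck cabling and the test miner"
    return "bad board(s): " + ", ".join(failed)


print(diagnose({"board1": True, "board2": True, "board3": False}))
```

A result naming a single board still leaves the controller as a possibility, which is why trying the good controller with the good hashboards is the suggested follow-up.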
legendary
Activity: 3234
Merit: 2943
Block halving is coming.
Yes, if one hashboard is failing it could affect the other 2 hashboards. If you want a better analysis for this unit, I suggest you post the whole kernel logs here.

I guess that the hashboard has a connection issue. Replugging all the cables (the PSU cables and the ribbon cables between the control board and the hashboards) should fix this issue.
You can also try running them one by one. If you suspect the PSU, just disconnect two of the hashboards and run only one hashboard at a time to test which ones run fine.
Also, check the PSU connectors and test them one by one, and then post the results here, including logs.
newbie
Activity: 2
Merit: 0
Good evening all

I recently purchased 2 s17+ boxes, in used, but good condition (cosmetically anyway).
Each was checked on collection and everything seemed to be absolutely fine (30 min test, hashrate as expected, kernel logs 'normal', temps and fans again normal).

Having moved the boxes to their new home (200km drive - well packed) one of the boxes stopped hashing after approx 2 hours and despite best efforts, it has not recovered.

Having restarted the problematic machine, logs indicated 0 asics found on all 3 boards.
Reinstalled firmware (June 2020) no change.
Contacted Bitmain support and got a link to factory firmware, no change.
Updated to latest firmware, no change.
Now running Braiins (from SD), no change.

The other s17+ (which this one is installed next to) is running perfectly, so I don't think this can be an environmental issue (box ~58, chips ~72, fans all 3k)

I could understand 1 board failing, but not all 3, and not at the same time?

I do have the opportunity to switch out the hashboards between the 2 units (in case there is a controller or PSU issue) however I do not have much (any) experience with this hardware, and although the process looks relatively straightforward, I am somewhat apprehensive about interfering with the one that's working.

I did try disabling all combinations of hashboards whilst running Braiins, but this made no difference.

Is anyone able to suggest any possible cause for a systemic failure of this nature, or any other practical steps I could take to troubleshoot further?

I can provide current logs if they might be of use (but with my basic understanding, it finds 0 asics on each board and then powers them down).  Unfortunately I do not have the log from the first failure.

Thanks

T