Topic: S9 hash board failing (Read 271 times)

newbie
Activity: 7
Merit: 3
January 09, 2018, 11:43:43 PM
#13
There is one other thing. Once or twice an hour, every couple of hours, sometimes more, I get a red fault light on all three miners at the same time; sometimes it'll just be two miners. I thought maybe my Ethernet switch might be too cold, as I have them in my garage. It's a 5-port Netgear ProSafe FS105. It states that its operating temperature is between 32F and 104F. It's been freezing outside, but I seriously doubt that it's actually been down to 32F in my garage, especially with three miners throwing off so much heat. Nevertheless, I set up a small 250 watt heater pointed at the Ethernet switch. It sufficiently warms the switch without adding too much heat for the miners. But I'm still getting fault lights on all three miners, and the kernel logs say, "Fatal Error: network connection lost!" They all eventually go back to green on their own, but this can't be normal.
What makes me curious is that all this started when I switched from Slush to Bitcoin.com. Before switching to Bitcoin.com I never had any of these issues, which is why I ask whether any of this could be caused by the pool.
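One way to narrow down whether the dropouts are on the local network or on the pool side is to log reachability of both from a PC plugged into the same switch. Below is a minimal sketch, not a definitive tool: the router address and the pool host/port in the comments are placeholders I made up, so substitute your own. If the router and the pool both go "DOWN" at the same moment, that points at the switch or LAN; if only the pool drops, that points at the pool or the internet link.

```python
import socket
import time
from datetime import datetime


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def watch(targets, interval=30.0, rounds=None):
    """Poll each (label, host, port) target; print a line when its state changes.

    rounds=None polls until interrupted. Returns the last known state per label.
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for label, host, port in targets:
            up = can_connect(host, port)
            if last.get(label) != up:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {label}: {'up' if up else 'DOWN'}")
                last[label] = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return last


# Example call (placeholder addresses, replace with your own):
# watch([("router", "192.168.1.1", 80),
#        ("pool", "stratum.example.com", 3333)])
```

Comparing the timestamps it prints against the "network connection lost" lines in the kernel log should show whether the two line up.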
newbie
Activity: 7
Merit: 3
January 09, 2018, 11:24:12 PM
#12
So, I have three S9s from the November batch; all three are 14 TH/s. I haven't had any serious problems to speak of, other than that in the last 10 hours one of the boards has racked up 121,295 HW errors, or 0.1076%. The percentage has actually dropped; it was at 0.12%. The board continues to hash at 4,733.24 GH/s and everything seems normal with the pool I'm mining in, Bitcoin.com.
The temps for all three boards are 74, 71, 67. The board with the errors is the one at 67.
On the miner status page there are no "x" marks in the ASIC status section. Can something within the pool do this, or would this be a board problem?
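For anyone wanting to sanity-check numbers like these, here is a rough sketch of the arithmetic. It assumes the percentage is simply HW errors divided by the total results the miner has counted; I have not confirmed that this is exactly how the firmware computes it, and the function names are my own. The last helper computes the rate over just the interval between two snapshots, which is a better indicator of whether errors are still piling up than the lifetime percentage.

```python
def hw_error_percent(hw_errors, total_results):
    """HW errors as a percentage of total results (assumed definition)."""
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    return 100.0 * hw_errors / total_results


def implied_total(hw_errors, percent):
    """Total results implied by an error count and its reported percentage."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return hw_errors / (percent / 100.0)


def recent_percent(hw_then, total_then, hw_now, total_now):
    """Error percentage over only the interval between two snapshots."""
    return hw_error_percent(hw_now - hw_then, total_now - total_then)


# Figures from the post: 121,295 HW errors reported as 0.1076%.
total = implied_total(121295, 0.1076)  # roughly 113 million results
```

A falling lifetime percentage with a still-rising error count just means the errors are arriving more slowly than they were earlier.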
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
January 06, 2018, 10:38:53 PM
#11
Looks like the board is overheating.

These boards run over 110c, I would not call 82c anywhere close to overheating nor is it anywhere near the shutoff temp built into the firmware.

When you say "These boards..." are you talking about the board temperature being 110 degrees, or the chip temperatures?
Awesome Miner shows one unit, for example, running at 70 deg C, but the chips are running 83/72/78.

Is it okay for these chip temperatures to be in the 80s? Do lower temps increase unit life?
Doesn't the firmware have a 90 degree unit shutdown?
100c is when things like FETs start to short and cut through. Likewise, I would not normally run chips at higher than 80c; I prefer to run things at 70 or less. The higher the temperature, the more unstable the chips will be in terms of current draw, and the more likely a brief airflow fluctuation would burn them out.

Your mileage may vary of course, but this is my opinion.
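
To make those rules of thumb concrete, here is a small sketch that sorts chip readings into bands. The thresholds (70, 80, 100) are just the opinion stated above, not firmware limits, and the band names are made up for illustration.

```python
def classify_chip_temp(temp_c, preferred_max=70.0, caution_max=80.0, danger=100.0):
    """Label a chip temperature (deg C) using the rule-of-thumb bands above."""
    if temp_c >= danger:
        return "danger"      # where things like FETs reportedly start to fail
    if temp_c > caution_max:
        return "hot"         # above the 80c the poster would not normally run
    if temp_c > preferred_max:
        return "acceptable"  # between the preferred 70c and 80c
    return "preferred"


# Chip readings quoted earlier in the thread: 83/72/78.
labels = [classify_chip_temp(t) for t in (83, 72, 78)]
# labels == ["hot", "acceptable", "acceptable"]
```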
newbie
Activity: 19
Merit: 0
January 06, 2018, 08:44:58 PM
#10
Looks like the board is overheating.

These boards run over 110c, I would not call 82c anywhere close to overheating nor is it anywhere near the shutoff temp built into the firmware.

When you say "These boards..." are you talking about the board temperature being 110 degrees, or the chip temperatures?
Awesome Miner shows one unit, for example, running at 70 deg C, but the chips are running 83/72/78.

Is it okay for these chip temperatures to be in the 80s? Do lower temps increase unit life?
Doesn't the firmware have a 90 degree unit shutdown?
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
January 04, 2018, 10:11:42 PM
#9
Looks like the board is overheating.

These boards run over 110c, I would not call 82c anywhere close to overheating nor is it anywhere near the shutoff temp built into the firmware.
82c is a bit unusual when the other boards are significantly cooler. There might be a reason why.

jr. member
Activity: 112
Merit: 4
January 04, 2018, 08:05:54 PM
#8
Power down and unplug/replug all the ribbon cables.  Switch them to different boards, even.  Looks like flaky comms as it found all the chips and they passed singleboardtest.
hero member
Activity: 756
Merit: 560
January 04, 2018, 12:02:29 PM
#7
Looks like the board is overheating.

These boards run over 110c, I would not call 82c anywhere close to overheating nor is it anywhere near the shutoff temp built into the firmware.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
January 03, 2018, 10:53:31 PM
#6
Looks like the board is overheating. Can you try it at a lower frequency?

Or check to see if something is blocking the heat sinks. Maybe it ingested something that is blocking airflow.

Or now that I think about it, maybe a heat sink fell off a chip. That would cause a pretty quick trip to 80c. Does it rattle when you shake it gently by chance?
hero member
Activity: 1610
Merit: 538
I'm in BTC XTC
January 03, 2018, 05:00:00 PM
#5
You're running the APW3++ on 220V?
hero member
Activity: 756
Merit: 560
January 03, 2018, 04:57:36 PM
#4
You need to look at/post kernel logs if you want any kind of diagnosis. Since there are no user-serviceable parts inside, the answer will always be the same, unfortunately: warranty time.
newbie
Activity: 9
Merit: 0
January 03, 2018, 03:07:13 PM
#3
APW3++
jr. member
Activity: 126
Merit: 1
January 03, 2018, 03:00:01 PM
#2
Which PSU are you using?
newbie
Activity: 9
Merit: 0
January 03, 2018, 02:04:41 PM
#1

https://imgur.com/a/j5MzV

As you can see, one of the 3 boards on this relatively new miner isn't hashing. I rebooted the machine this morning when the problem surfaced, and all three boards hashed normally. But this one went out again and won't start hashing on reboot.

Any suggestions on why this may be happening and how to fix it?

Thanks.