Topic: Auto-detecting burned chains on antminer s9? (Read 178 times)

full member
Activity: 538
Merit: 175
Yank the bad board from the miner and just mine at 2/3rd speed?

You do not want to pull bad boards from the system if at all possible. If you run a chassis with a board missing, the air will travel along the path of least resistance and not blow properly through the heatsinks, creating hot spots and other issues for the remaining boards.

Excellent point.  Simply unplug power to them.
Good advice, they are getting swapped in this case anyway in order to bring the machines back up to full speed.
copper member
Activity: 658
Merit: 101
Math doesn't care what you believe.
Yank the bad board from the miner and just mine at 2/3rd speed?

You do not want to pull bad boards from the system if at all possible. If you run a chassis with a board missing, the air will travel along the path of least resistance and not blow properly through the heatsinks, creating hot spots and other issues for the remaining boards.

Excellent point.  Simply unplug power to them.
hero member
Activity: 756
Merit: 560
Yank the bad board from the miner and just mine at 2/3rd speed?

You do not want to pull bad boards from the system if at all possible. If you run a chassis with a board missing, the air will travel along the path of least resistance and not blow properly through the heatsinks, creating hot spots and other issues for the remaining boards.
full member
Activity: 538
Merit: 175
I deleted my previous post, since I needed to fix what I said.

The max used to be 115C according to Bitmain,
but now the S9 user guide mentions that 125C is the absolute max.

Ref. page 12 in the S9 manual.

https://file.bitmain.com/shop-bitmain/download/AntMiner%20S9%20Installation%20Guide.pdf

Quote from: Bitmain
Note:
The S9 miner with automatic frequency adjustment firmware will stop running when the
Temp (Chips) reach to 125-135°C, there will be a error message “Fatal Error: Temperature is too high!”
shown in the bottom of kernel log page.
Cool, thanks for taking the time to re-check on that. Running them hotter seems to be the only viable option currently. I appreciate all the help guys.
legendary
Activity: 2506
Merit: 1714
Electrical engineer. Mining since 2014.
I deleted my previous post, since I needed to fix what I said.

The max used to be 115C according to Bitmain,
but now the S9 user guide mentions that 125C is the absolute max.

Ref. page 12 in the S9 manual.

https://file.bitmain.com/shop-bitmain/download/AntMiner%20S9%20Installation%20Guide.pdf

Quote from: Bitmain
Note:
The S9 miner with automatic frequency adjustment firmware will stop running when the
Temp (Chips) reach to 125-135°C, there will be a error message “Fatal Error: Temperature is too high!”
shown in the bottom of kernel log page.
full member
Activity: 538
Merit: 175
Thanks for confirming that. Do you know if 115C is the maximum for any single board, or just the average for the miner? Sometimes I see miners above 14 TH/s with a single board >120C, for example.
copper member
Activity: 658
Merit: 101
Math doesn't care what you believe.
I've heard 115C.
full member
Activity: 538
Merit: 175
Yank the bad board from the miner and just mine at 2/3rd speed?
Going as fast as possible, but it's hard to keep up due to local air temp spiking up recently.

Does anyone know what the hard temperature limit is for the chips? It's possible that they could run hotter, but I wouldn't want to shorten their lifespan unnecessarily.
copper member
Activity: 658
Merit: 101
Math doesn't care what you believe.
Yank the bad board from the miner and just mine at 2/3rd speed?
full member
Activity: 538
Merit: 175
So these S9 miners are auto-configured (using the bmminer API) to stop hashing once temps exceed 120C or so, while keeping the fans running so they still contribute to airflow.

Unfortunately, it seems that the miner retains the last temp from boards that burn out from loose heatsinks, melting, etc. The kernel log seems to retain the temperature offset of chains whose thermistors it can't read anymore. This means that some machines that have since cooled down after these temp spikes still display temps above 180C, which prevents them from hashing because they stay in cooldown mode. Even averaging across the boards usually gives a temp above 120C.

Is there any way to automatically ignore these faulty chains, or would it be better to roll the average of the boards and check if the hottest board deviates too much from the average?
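
For what it's worth, here is a rough sketch of the "ignore the faulty chains" option. It is not bmminer code and does not talk to the miner; it only takes per-chain chip temps you have already collected and drops readings that look stale. The names `split_chains` and `should_cool_down`, the chain numbers, and the 150C plausibility ceiling are all assumptions made for illustration. The ceiling was picked only because it sits above the 125-135°C stop range in the Bitmain note and below the >180C stale readings described above.

```python
COOLDOWN_LIMIT_C = 120.0  # cooldown threshold described in the post above
PLAUSIBLE_MAX_C = 150.0   # ASSUMPTION: readings above this are treated as stale/dead sensors


def split_chains(temps):
    """Split {chain_id: chip_temp_C} into (trusted, ignored) dicts.

    A reading is ignored when it is non-positive or above PLAUSIBLE_MAX_C,
    i.e. it looks like a leftover value from a burned-out chain.
    """
    trusted, ignored = {}, {}
    for chain, temp in temps.items():
        if 0 < temp <= PLAUSIBLE_MAX_C:
            trusted[chain] = temp
        else:
            ignored[chain] = temp
    return trusted, ignored


def should_cool_down(temps):
    """Decide on cooldown using only the hottest trusted chain."""
    trusted, _ = split_chains(temps)
    if not trusted:
        # No usable sensor at all: fail safe and stay in cooldown.
        return True
    return max(trusted.values()) >= COOLDOWN_LIMIT_C


if __name__ == "__main__":
    # Hypothetical readings: chain 8 burned out and still reports 183C.
    readings = {6: 88.0, 7: 91.0, 8: 183.0}
    print(split_chains(readings))
    print(should_cool_down(readings))
```

The decision uses the hottest trusted chain rather than an average, so one stale 180C+ value can't drag the result over the limit, and a board that really is at 123C still triggers cooldown. A deviation-from-average check could be layered on top, but it risks masking a single board that is genuinely overheating, so it is left out here.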