Author

Topic: Ask for help on the failure of the antminer 17 series temperature sensor (Read 177 times)

legendary
Activity: 3472
Merit: 3217
Playbet.io - Crypto Casino and Sportsbook
Does anyone know what kind of process the firmware will read the temperature sensor chip model? I don’t know why the ANT 17 series mining machines have the problem of reading the temperature sensor failure. I am a little confused whether it is a software problem or a hardware problem.

If you already did flashing the unit with different firmware stock or braiins or even HulkOS but the issue is still there then it should be a hardware issue. Have you tried to replace the hashboard or the chip sensor? If not, then try it first because without testing it you will never know if it's a hardware issue or software issue.
newbie
Activity: 11
Merit: 5
Does anyone know what kind of process the firmware will read the temperature sensor chip model? I don’t know why the ANT 17 series mining machines have the problem of reading the temperature sensor failure. I am a little confused whether it is a software problem or a hardware problem.
newbie
Activity: 11
Merit: 5
Gave you a merit. it will help you be active on this forum.

here are your logs:

https://www.talkimg.com/images/2023/08/03/Gdbiq.png

https://www.talkimg.com/images/2023/08/03/GdDwj.png

Thank you very much my friend, I will try talkimg.com
newbie
Activity: 11
Merit: 5
~snip~

I never heard of HulkOS before but the image shows the wrong sensor type and then calibrate temp sensor failed.

Is this only happen in chain 1 or does all hashboard have the same error?

What steps that you already tried? Did you recently replace the miner with a new hashboard without a code editor or Pickit?

If you already tried changing PSU as suggested above and if this is not a hardware issue then can you try dumping the PIC firmware from the other 2 known working hashboard and then flash it to this hashboard(chain 1)?

Don't forget to dump the PIC firmware first from (chain 1) just in case you can revert it back to its original state.

This firmware is developed by us privately. It is currently in the public testing stage. This is the built-in maintenance mode of the firmware. It only supports testing one computing power board at a time, similar to the official maintenance fixture, so my log is only testing one computing power board, this board is a normal operating computing power board

When I was analyzing this problem, I kept trying to start the test program, and I often reproduced the problem that the temp_sensor could not be read. I analyzed the log and found that it would initially read the component model of the temp_sensor, and the hardware used was NCT218. , if it is read correctly, then he will work normally, if the identified component model is inaccurate or not read, then he will replace the identified model with the default TMP451, then he will not be able to calibrate Temp_sensor Get the correct parameters, resulting in a temperature reading error and an error

This test log is the result of testing under the same machine
If it is read correctly, it will prompt:
temp_sensor_type=26 is NCT218

If it is not read correctly, then he will prompt:
temp_sensor_type=0 then he will set this temp_sensor as TMP451 by default

NCT218=type26
TMP451=type0

Also, our firmware does not depend on the PIC firmware in the EEPROM chip attached to the PIC chip. We directly issue the operating frequency of the asic chip and the power supply voltage of the PSU from the hulkos firmware, which completely bypasses the problem of the PIC, and the frequency and voltage are dynamically adjusted, so even if the PIC chip is not flashed with firmware, or our three-chip computing power The PIC information of the board is not uniform, and our hashboard can still work normally, so it has nothing to do with the PIC, which simplifies the work difficulty of on-site operation and maintenance personnel. To a certain extent, you can understand that hulkos automatically sends the PIC program to the hashboard to run
legendary
Activity: 3472
Merit: 3217
Playbet.io - Crypto Casino and Sportsbook
~snip~

I never heard of HulkOS before but the image shows the wrong sensor type and then calibrate temp sensor failed.

Is this only happen in chain 1 or does all hashboard have the same error?

What steps that you already tried? Did you recently replace the miner with a new hashboard without a code editor or Pickit?

If you already tried changing PSU as suggested above and if this is not a hardware issue then can you try dumping the PIC firmware from the other 2 known working hashboard and then flash it to this hashboard(chain 1)?

Don't forget to dump the PIC firmware first from (chain 1) just in case you can revert it back to its original state.
legendary
Activity: 4326
Merit: 8950
'The right to privacy matters'
Gave you a merit. it will help you be active on this forum.

here are your logs:



newbie
Activity: 11
Merit: 5
If you don't mind would you mind to share one of your miner's logs so that we can analyze what the cause of the temp sensor to fail?

Sometimes the temp sensors are not the cause of why it fails it is sometimes due to the ASIC chip I suggest check the test points and find the broken chip. You can follow the guide from this link below

- https://www.zeusbtc.com/manuals/Antminer-S17-plus-Hash-Board-Repair-Guide.asp

If all test points are fine then the sensors chip is the cause of this failure you can replace it with a new one available on zeusbtc.

My friend, thank you for your suggestion, I know this company, but I don’t need their components, I have my own channel to get the goods, their prices, you know, I have a lot of scrap machines that I can dismantle Chip, I have carefully studied the maintenance instructions of all series of 17 a long time ago. I even have the original circuit diagram. I have a headache for this damn temp-sensor problem. I agree with your point of view through analysis, it’s his mother There is a problem with the computing power chip, and there is a problem with the welding process of Century Cloud Core. They even sprinkled rosin solution on the 17 series computing power board and re-soldered it. Maybe the board you saw with UV glue is that batch.

In addition, I am using HulkOS, and his logs are not the same as Bitmain’s official firmware. I don’t know how to upload attachments to this forum. It seems that it is not good to post large sections of logs.

Try this to see if it can be opened. This is the log of the same machine. It is normal to read the temperature sensor as NCT218 at the beginning. If it cannot be read later, it will default to TMP451 and start reporting an error.

http://m.qpic.cn/psc?/V53cp4R00xIcKH0EHAvf1OQprC2kRuqc/6tCTPh7N*X6CBkvkDvKlZW7IcKWUta.hnJDG0METJ90Ar0e6SMIWli43jVioiJGuwTqZ4ca3US4144ps1sKKAzUbfzWRp.UfSI4nkFNwh7w!/b&bo=OARSBAAAAAABJ2o!&rf=viewer_4&t=5

http://m.qpic.cn/psc?/V53cp4R00xIcKH0EHAvf1OQprC2kRuqc/6tCTPh7N*X6CBkvkDvKlZTUWsnYdM4Opujj.8Vzl0orUyQ4gUKypHmsJHlflbPfZ0.AxuI4iaZom9zyEezL6eGY2xnfXoy*ztOCOijPtreo!/b&bo=UwQ4BAAAAAABJ2s!&rf=viewer_4&t=5

newbie
Activity: 11
Merit: 5
I have a whole dedicate thread regarding the temperature sensor on the 17 series, you fill find a lot of useful info in there.

But to give you a bit of a summary:

- The part number of the temp sensor is TMP451AIDQFR.
- It's most often a chip/solder issue on one of the chips rather than a temp sensor.
- if all temp sensors on all boards are acting weird then it's most likely a bad PSU.

I had a long 'terrible' experience with those gears, and as weird as it sounds, 90% of the issues are caused by bad contact between the chip and PCB, the medium temperature solder used on those gears is terrible, the robots/human beings who soldered the chips did a terrible job, you are guaranteed to fix the majority of the problems (regardless whether it shows a bad temp sensor or missing chip) by simply reflowing all the chips and making sure there are no solder balls built up on any of the chips and shorting the chip.

If you confirm that it's indeed an issue of bad temp senrsor/s, refer to my topic to know their exact locations, also here is a short video showing a replacement of those temp sensors

I have carefully read all the replies under that post, and I have also carefully studied bitmain’s official maintenance instructions and the circuit diagram of S17. The 17 series hosts do not only have a model of TMP451. Among the 17 series models I have encountered so far, there are at least NCT218 and TMP451 two models, I found through the detailed log of my third-party firmware that he will first read the register of the temperature sensor, then obtain the device type of the sensor, that is, the model of the sensor, and then read the temperature correctly, such as The sensor of NCT218 is on the PCBA, and when the firmware cannot read NCT218, it will default to the device model TMP451, and then when it continues, the firmware will report that the reading of the temperature sensor failed

After analyzing the circuit diagram, I found that the temperature sensor has 4 wires connected to the corresponding power chip, which are SCL, SDA, TEMP_P, and TEMP_N. The BJT of the PNP transistor in the chip is read through the latter two signal wires. The temperature of the junction, and then the SDA transmits this data to the rx and TX on the ASIC chip through the I2C bus, and then transmits it to the PIC control chip, and then the PIC chip transmits it to the control board, and the temperature sensor is uploaded to the Whether the signal of the upper end machine is accurate or not becomes the key to whether the temperature sensor can correctly identify
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
I have a whole dedicate thread regarding the temperature sensor on the 17 series, you fill find a lot of useful info in there.

But to give you a bit of a summary:

- The part number of the temp sensor is TMP451AIDQFR.
- It's most often a chip/solder issue on one of the chips rather than a temp sensor.
- if all temp sensors on all boards are acting weird then it's most likely a bad PSU.

I had a long 'terrible' experience with those gears, and as weird as it sounds, 90% of the issues are caused by bad contact between the chip and PCB, the medium temperature solder used on those gears is terrible, the robots/human beings who soldered the chips did a terrible job, you are guaranteed to fix the majority of the problems (regardless whether it shows a bad temp sensor or missing chip) by simply reflowing all the chips and making sure there are no solder balls built up on any of the chips and shorting the chip.

If you confirm that it's indeed an issue of bad temp senrsor/s, refer to my topic to know their exact locations, also here is a short video showing a replacement of those temp sensors
legendary
Activity: 3472
Merit: 3217
Playbet.io - Crypto Casino and Sportsbook
If you don't mind would you mind to share one of your miner's logs so that we can analyze what the cause of the temp sensor to fail?

Sometimes the temp sensors are not the cause of why it fails it is sometimes due to the ASIC chip I suggest check the test points and find the broken chip. You can follow the guide from this link below

- https://www.zeusbtc.com/manuals/Antminer-S17-plus-Hash-Board-Repair-Guide.asp

If all test points are fine then the sensors chip is the cause of this failure you can replace it with a new one available on zeusbtc.
newbie
Activity: 11
Merit: 5
Hi, everyone, I am a miner from China, I have a large number of T17+ and S17+, they often have the problem of temperature sensor reading failure, I want to get your help

I often encounter the problem that a certain computing power board reports that the temperature sensor fails, causing the board to fall off and fail to operate

I found that the 17 series mining machine has two sensors, one is TMP451, the other is NCT218, and the operating voltage range of these two chips is not the same, the former is 1.7~3.6V, the latter is 1.4~2.75V, And when I use the maintenance mode of the third-party firmware, it will show the NCT218 chip when it is normal at the beginning of the test, and it will show the TMP451 chip when there is a problem and the temperature sensor is lost. I don’t know if there is any connection between them.
Jump to: