Topic: S17+ Temp sensors problem (Read 181 times)

hero member
Activity: 544
Merit: 589
June 20, 2021, 01:31:41 PM
#16
# 0: pattern test
# 1: only find asic
Great that you chimed in. What does the pattern test actually do aside from counting chips?

The "only find asic" just does the count, no hashing. The "pattern test" runs all the chips at the frequency & voltage specified in the Config.ini file, and checks the results. If a chip is able to communicate but performs poorly or not at all at the frequency & voltage being run, this test finds it.
legendary
Activity: 3528
Merit: 2414
Evil beware: We have waffles!
June 19, 2021, 06:31:18 PM
#15
Generally, the term 'cold solder (joint)' refers to a bad connection caused by insufficient heat being applied during soldering. It results in a grainy-looking solder joint with metal crystals in it. Such a joint is mechanically weak and prone to cracking, causing at best a higher-than-normal resistance through the joint and often intermittent connections.
legendary
Activity: 2170
Merit: 6279
be constructive or S.T.F.U
June 19, 2021, 05:44:03 PM
#14
-First symptom was "0 ASIC found". I repaired a few cold solder joints. (I measured the CLK pins with an oscilloscope.)
-After the repair, the test fixture found 65 ASIC chips, but then showed the problem with the thermal sensors.
-Then I found only 1.3 V on the pins (CO-RI-RST) of one ASIC chip: another cold solder joint, a problem with the 1V8 power supply line.
-Then I found bad resistance along the 1V8 power supply lines. I measured between the first and the last ASIC chip in the line (5 ASICs in a line). The resistance was 2-3 ohms but should be 0 ohms. -> Another cold solder joint.

Would you explain more about the "cold solder" joints? Did you find solder balls shorting some legs/pins? It would be great if you could share some more info (photos would be really nice).

What firmware is installed on this machine? Can you test with custom firmware like Vnish or BO+? I prefer the former; you can get it on Asic.to or the AwesomeMiner website (both charge a fee of 2-3%). These custom firmwares "ignore" some things that the stock firmware does not, so there is a good chance that the miner will work with custom firmware.

# 0: pattern test
# 1: only find asic

Great that you chimed in. What does the pattern test actually do aside from counting chips?
hero member
Activity: 544
Merit: 589
June 19, 2021, 08:20:48 AM
#13
You can run the board on the test fixture with some heatsinks off if you either shut it down before the pattern test begins or set the option in the Config.ini file on the SD card to only run the "find asic" test. The board only uses a lot of power when doing the pattern test, so the chips won't overheat without a heatsink if it is only doing the ASIC count.

Code:
# 0: pattern test
# 1: only find asic
Only_find_ASIC=1

All of the temp sensor issues I have seen so far have been due to low 1.8V at the temp sensor. Typically you can find the 1.8V issues by tracing the resistance through the chips, like you described. Unfortunately, bad solder joints can break contact or make contact based on temperature, so many times you need to power the board up in order to find the issue. I check these by running the board and measuring the voltage drop across each chip. Anything above a couple mV drop is a potential issue. If you have access to a good benchtop multimeter, it can help a lot.

This is a lot easier with the Asic.repair test fixture. The standard Bitmain-style fixture tries to find the chips 3 times and then stops, so you get only 15 seconds or so to measure before you have to wait for another run.
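
For anyone logging those per-chip readings, here is a minimal Python sketch of the voltage-drop check described above. The 2 mV threshold is only my reading of "a couple mV", and the function name and the sample readings are made up for illustration; they are not measurements from a real board.

```python
from typing import Dict, List, Tuple


def flag_voltage_drops(
    drops_mv: Dict[int, float], threshold_mv: float = 2.0
) -> List[Tuple[int, float]]:
    """Return (chip_index, drop_mV) pairs above the threshold, largest first.

    drops_mv maps a chip index to the voltage drop measured across that chip
    on the 1.8 V line while the board is powered. threshold_mv is an assumed
    cutoff, not a Bitmain specification.
    """
    suspects = [
        (chip, mv) for chip, mv in drops_mv.items() if abs(mv) > threshold_mv
    ]
    # Largest drop first, since that is the most likely bad joint.
    return sorted(suspects, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical readings in millivolts, keyed by chip index.
    readings = {48: 0.4, 49: 0.6, 50: 7.9, 51: 0.5, 54: 3.2}
    for chip, mv in flag_voltage_drops(readings):
        print(f"chip {chip}: {mv:.1f} mV drop, inspect the solder joints")
```

Sorting by the size of the drop just puts the worst joint at the top of the list; a chip near the threshold may still be fine once the board has warmed up.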
jr. member
Activity: 31
Merit: 6
June 18, 2021, 03:34:40 PM
#12
Thank you for the info. But I think this issue is hard to repair. How do I find a bad ASIC chip without dismounting all the heat sinks?

There are test points (CLK-CO-RI-BO-RST) which you can use to test the chips without having to remove the heatsinks. This is the repair manual for the S17+; the same website is also full of other tutorials. One thing to keep in mind: if you are using the fixture tool, you must NOT use an external PSU to test the hash boards.



-First symptom was "0 ASIC found". I repaired a few cold solder joints. (I measured the CLK pins with an oscilloscope.)
-After the repair, the test fixture found 65 ASIC chips, but then showed the problem with the thermal sensors.
-Then I found only 1.3 V on the pins (CO-RI-RST) of one ASIC chip: another cold solder joint, a problem with the 1V8 power supply line.
-Then I found bad resistance along the 1V8 power supply lines. I measured between the first and the last ASIC chip in the line (5 ASICs in a line). The resistance was 2-3 ohms but should be 0 ohms. -> Another cold solder joint.
-I replaced all the thermal sensors and the ASIC chips connected to the sensors.

But the problem still persists.

Thanks.
legendary
Activity: 2170
Merit: 6279
be constructive or S.T.F.U
June 15, 2021, 04:21:54 PM
#11
Thank you for the info. But I think this issue is hard to repair. How do I find a bad ASIC chip without dismounting all the heat sinks?

There are test points (CLK-CO-RI-BO-RST) which you can use to test the chips without having to remove the heatsinks. This is the repair manual for the S17+; the same website is also full of other tutorials. One thing to keep in mind: if you are using the fixture tool, you must NOT use an external PSU to test the hash boards.
jr. member
Activity: 31
Merit: 6
June 15, 2021, 02:06:27 PM
#10
Hello,

I am trying to repair my S17+ hash board. I have a problem with the temperature sensors. I changed all 4 sensors from NCT218 to TMP451 and replaced the ASIC chips connected to the temp sensors, but I get the same error. Can you please help me with this? Here is the log from the Antminer fixture tester.

I did conduct long research on this problem a year ago; more details can be found here. Long story short, the temp sensor error is very rarely the result of a bad temp sensor, which makes perfect sense: what are the odds of 4 temp sensors failing at the same time? Slim to none, if you ask me.

Your problem is a bad chip (or a few of them). We do not "yet" know why some chip failures cause the miner to report a dead temp sensor, but we are certain that is the case. If the fixture tool did not help with identifying the exact chip, you will need to manually measure the voltage/resistance across all chips.

If the temp-sensor error shows on all three boards (you get a total of 12 temp sensor errors), then that is just a bad PSU.



Thank you for the info. But I think this issue is hard to repair. How do I find a bad ASIC chip without dismounting all the heat sinks?
Maybe some TF or RF pins are shorted inside an ASIC chip and send wrong information to the control board. (This is only a theory.)
Could you please share some experiences from your repairs with me?

Thank you.
legendary
Activity: 2170
Merit: 6279
be constructive or S.T.F.U
June 14, 2021, 06:12:32 PM
#9
Hello,

I am trying to repair my S17+ hash board. I have a problem with the temperature sensors. I changed all 4 sensors from NCT218 to TMP451 and replaced the ASIC chips connected to the temp sensors, but I get the same error. Can you please help me with this? Here is the log from the Antminer fixture tester.

I did conduct long research on this problem a year ago; more details can be found here. Long story short, the temp sensor error is very rarely the result of a bad temp sensor, which makes perfect sense: what are the odds of 4 temp sensors failing at the same time? Slim to none, if you ask me.

Your problem is a bad chip (or a few of them). We do not "yet" know why some chip failures cause the miner to report a dead temp sensor, but we are certain that is the case. If the fixture tool did not help with identifying the exact chip, you will need to manually measure the voltage/resistance across all chips.

If the temp-sensor error shows on all three boards (you get a total of 12 temp sensor errors), then that is just a bad PSU.
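
The PSU rule of thumb above can be written down as a tiny check. This is only a sketch: the function name is hypothetical, and the defaults (3 boards, 4 sensors each) simply restate the numbers from this post.

```python
from typing import Sequence


def likely_bad_psu(
    failed_per_board: Sequence[int],
    sensors_per_board: int = 4,
    board_count: int = 3,
) -> bool:
    """Apply the rule of thumb: every sensor failing on every board.

    failed_per_board holds the number of temp-sensor errors reported for
    each hash board. Returns True only when all boards are present and
    each one reports all of its sensors as failed (12 errors in total
    with the default values).
    """
    if len(failed_per_board) != board_count:
        return False
    return all(count >= sensors_per_board for count in failed_per_board)


if __name__ == "__main__":
    # One board with two failed sensors: look for a bad chip or joint instead.
    print(likely_bad_psu([2, 0, 0]))
    # All 12 sensors failing at once: suspect the PSU first.
    print(likely_bad_psu([4, 4, 4]))
```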

jr. member
Activity: 31
Merit: 6
June 14, 2021, 02:38:15 PM
#8
There are two guys that can be trusted here in the usa to repair s17 gear


I can link their profiles in a bit.



lightfoot
https://bitcointalksearch.org/user/lightfoot-148567



wndsnb
https://bitcointalksearch.org/user/wndsnb-366233


these guys may be helpful

If you message them show your thread to them.

Thank you.
jr. member
Activity: 31
Merit: 6
June 14, 2021, 02:19:22 PM
#7
There are two guys that can be trusted here in the usa to repair s17 gear


I can link their profiles in a bit.



Yes please.
I have repaired a few S17+, T17, Avalon miners, APW9+ PSUs, ...
But I have trouble understanding this error.

Thanks.
M.
legendary
Activity: 4032
Merit: 7391
'The right to privacy matters'
June 14, 2021, 02:18:25 PM
#6
There are two guys that can be trusted here in the usa to repair s17 gear


I can link their profiles in a bit.



lightfoot
https://bitcointalksearch.org/user/lightfoot-148567



wndsnb
https://bitcointalksearch.org/user/wndsnb-366233


these guys may be helpful

If you message them show your thread to them.
full member
Activity: 416
Merit: 125
June 14, 2021, 02:04:21 PM
#5
There are two guys that can be trusted here in the usa to repair s17 gear


I can link their profiles in a bit.

legendary
Activity: 3528
Merit: 2414
Evil beware: We have waffles!
June 14, 2021, 01:47:04 PM
#4
Please edit that from being a Quote to using the Code tag instead - it is the # icon right next to the Quote icon... The Code tag will change that from being a text wall to a much more readable scroll box.
Done, sir.
Thanks for the recommendation.
And for doing that simple thing, you get your 1st Merit :)
Now you just need someone to give you some help (I run Canaan's miners, not Bitmain ones)
Cheers!
jr. member
Activity: 31
Merit: 6
June 14, 2021, 01:17:33 PM
#3
Please edit that from being a Quote to using the Code tag instead - it is the # icon right next to the Quote icon... The Code tag will change that from being a text wall to a much more readable scroll box.

Done, sir.
Thanks for the recommendation.
legendary
Activity: 3528
Merit: 2414
Evil beware: We have waffles!
June 14, 2021, 12:52:56 PM
#2
Please edit that from being a Quote to using the Code tag instead - it is the # icon right next to the Quote icon... The Code tag will change that from being a text wall to a much more readable scroll box.
jr. member
Activity: 31
Merit: 6
June 14, 2021, 12:06:41 PM
#1
Hello,

I am trying to repair my S17+ hash board. I have a problem with the temperature sensors. I changed all 4 sensors from NCT218 to TMP451 and replaced the ASIC chips connected to the temp sensors, but I get the same error. Can you please help me with this? Here is the log from the Antminer fixture tester.

Code:
1970-01-01 00:43:36 thread.c:852:cancel_pic_heart_beat_thread: cancel thread
1970-01-01 00:43:37 single_board_test.c:1558:kill_hashboard: ****power off hashboard****
1970-01-01 00:43:42 main.c:90:main: Press 'test' key to continue
1970-01-01 00:44:07 single_board_test.c:2327:get_eeprom_info: get EEPROM info success!
1970-01-01 00:44:07 single_board_test.c:2575:single_board_test: g_test_level 7, pattern_test_time 1
1970-01-01 00:44:07 single_board_test.c:2366:do_single_board_test: Begin test
1970-01-01 00:44:07 fan.c:276:front_fan_power_on: Note: front fan is power on!
1970-01-01 00:44:07 fan.c:288:rear_fan_power_on: Note: rear fan is power on!
1970-01-01 00:44:07 driver-btm-api.c:1137:miner_device_init: Detect 256MB control board of XILINX
1970-01-01 00:44:07 driver-btm-api.c:1085:init_fan_parameter: fan_eft : 0  fan_pwm : 0
1970-01-01 00:44:13 driver-btm-api.c:1069:init_miner_version: miner ID : 806cd4025c208854
1970-01-01 00:44:13 driver-btm-api.c:1075:init_miner_version: FPGA Version = 0xB031
1970-01-01 00:44:14 board.c:36:jump_and_app_check_restore_pic: chain[0] PIC jump to app
1970-01-01 00:44:16 board.c:40:jump_and_app_check_restore_pic: Check chain[0] PIC fw version=0x88
1970-01-01 00:44:16 thread.c:802:create_pic_heart_beat_thread: create thread
1970-01-01 00:44:20 power_api.c:228:set_higher_voltage_raw: higher_voltage_raw = 2100
1970-01-01 00:44:20 power_api.c:278:set_to_higher_voltage: Set to voltage raw 2100, one step.
1970-01-01 00:44:22 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:44:23 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 21.011479
1970-01-01 00:44:23 power_api.c:53:_get_avg_voltage: average_voltage = 21.011479
1970-01-01 00:44:23 power_api.c:71:check_voltage: target_vol = 21.00, actural_vol = 21.01, check voltage passed.
1970-01-01 00:44:23 uart.c:71:set_baud: set fpga_baud to 115200
1970-01-01 00:44:23 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 115200, chip_divider = 26
1970-01-01 00:44:34 driver-btm-api.c:1009:check_asic_number_with_power_on: Chain[0]: find 65 asic, times 0
1970-01-01 00:44:37 driver-hash-chip.c:266:set_uart_relay: set uart relay to 0x330003
1970-01-01 00:44:37 driver-btm-api.c:361:set_order_clock: chain[0]: set order clock, stragegy 3
1970-01-01 00:44:37 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:44:37 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:44:37 driver-hash-chip.c:517:set_clock_delay_control: singe chain mode
1970-01-01 00:44:38 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:38 temperature.c:320:calibrate_temp_sensor_one_chain: chain 0 temp sensor TMP451
1970-01-01 00:44:39 temperature.c:488:temp_statistics_show:   pcb temp 33~44  chip temp 43~53
1970-01-01 00:44:39 uart.c:71:set_baud: set fpga_baud to 12000000
1970-01-01 00:44:39 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 12000000, chip_divider = 3
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:40 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 1
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:40 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:41 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
1970-01-01 00:44:41 temperature.c:488:temp_statistics_show:   pcb temp 40~44  chip temp 46~53
1970-01-01 00:44:41 single_board_test.c:1602:check_temperature: temper sensor bad
1970-01-01 00:44:42 single_board_test.c:2105:upload_log_generate: generate result.json done
upload command::/mnt/card/uploadlog.sh BHB07602 FXDZYH7AIABJE0EH0 PT2 0001 Other-Error /tmp/log
sh: /mnt/card/uploadlog.sh: not found
run upload command failed!
1970-01-01 00:44:50 thread.c:852:cancel_pic_heart_beat_thread: cancel thread
1970-01-01 00:44:51 single_board_test.c:1558:kill_hashboard: ****power off hashboard****
1970-01-01 00:45:03 main.c:90:main: Press 'test' key to continue
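
A rough way to condense a log like the one above is to count the register timeouts per chip and collect the sensors that failed to read. The sketch below is Python, with regular expressions matched only against the line formats seen in this log; the function name is made up, and other fixture firmware versions may word these lines differently. In the log above, the failures sit on chips 54 and 50 (sensors 2 and 3).

```python
import re
from collections import Counter
from typing import Set, Tuple

# Patterns copied from the wording of the log lines in this post.
TIMEOUT_RE = re.compile(
    r"read asic reg timeout: expect chain = (\d+), chip = (\d+), reg = (\d+)"
)
SENSOR_RE = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+), reg = (\d+)"
)

# A few lines taken verbatim from the log, used as a small sample.
SAMPLE_LOG = """\
1970-01-01 00:44:38 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:44:39 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
"""


def summarize_fixture_log(log_text: str) -> Tuple[Counter, Set[Tuple[int, int, int]]]:
    """Return (timeouts per (chain, chip), set of failed (chain, sensor, chip))."""
    timeouts: Counter = Counter()
    failed_sensors: Set[Tuple[int, int, int]] = set()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            chain, chip = int(match.group(1)), int(match.group(2))
            timeouts[(chain, chip)] += 1
            continue
        match = SENSOR_RE.search(line)
        if match:
            chain, sensor, chip = (int(match.group(i)) for i in (1, 2, 3))
            failed_sensors.add((chain, sensor, chip))
    return timeouts, failed_sensors


if __name__ == "__main__":
    timeouts, failed = summarize_fixture_log(SAMPLE_LOG)
    for (chain, chip), count in timeouts.most_common():
        print(f"chain {chain} chip {chip}: {count} register timeouts")
    for chain, sensor, chip in sorted(failed):
        print(f"chain {chain} sensor {sensor} (read via chip {chip}) failed")
```

The chips named in the timeouts are the ones the sensors are read through, so they are a starting point for probing, not proof that those chips themselves are dead.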



Thank you.
Slovak Republic