
Topic: T17/S17 malfunction: cases, solutions, remedies, RMA history - page 9.

newbie
Activity: 21
Merit: 7
Sorry about the code formatting, I'm still a newbie. That seems a plausible explanation, as the power cut only involved one of the two power cords. I could try a PSU from a T17 (APW9); would that be sufficient to test the faulty T17+ (APW9+)? I upgraded to the latest original Bitmain firmware, and have now flashed the Awesome SD 2.03.
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
Please use the code function to post kernel logs; unformatted logs are really annoying and hurt the eyes. Use something like this:

Code:
[2020/11/25 14:09:43] WARN: chain[2] - 44 of 65 chips detected
[2020/11/25 14:09:46] WARN: chain[1] - 44 of 65 chips detected
[2020/11/25 14:09:49] WARN: chain[0] - 44 of 65 chips detected

What the kernel log suggests is that all 3 hash boards detect only 44 ASICs, which means the signal is interrupted at chip 45. It is unlikely that all 3 hash boards have the exact same problem, so my guess is that your PSU has a problem. Can you test the miner with another working PSU?
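The same-chip reasoning can be checked mechanically. Below is a minimal Python sketch, assuming the `chain[N] - X of Y chips detected` line format quoted above; `last_detection_per_chain` and `common_break_point` are hypothetical helper names, not part of any firmware. It only flags that every chain stops at the same position; concluding that the shared cause is the PSU is still the judgment made in this post.

```python
import re

# Matches lines such as:
# "[2020/11/25 14:09:43] WARN: chain[2] - 44 of 65 chips detected"
DETECT_RE = re.compile(r"chain\[(\d+)\] - (\d+) of (\d+) chips detected")


def last_detection_per_chain(log_text):
    """Return {chain: (found, expected)}, keeping the last detection line seen for each chain."""
    result = {}
    for match in DETECT_RE.finditer(log_text):
        chain, found, expected = (int(g) for g in match.groups())
        result[chain] = (found, expected)
    return result


def common_break_point(detections):
    """If two or more chains are all short by the same count, return the 1-based
    position of the first missing chip (e.g. 44 found -> 45); otherwise None."""
    if len(detections) < 2:
        return None
    if any(found >= expected for found, expected in detections.values()):
        return None
    found_counts = {found for found, _expected in detections.values()}
    if len(found_counts) != 1:
        return None
    return found_counts.pop() + 1


log = """[2020/11/25 14:10:06] WARN: chain[2] - 44 of 65 chips detected
[2020/11/25 14:10:09] WARN: chain[1] - 44 of 65 chips detected
[2020/11/25 14:10:12] WARN: chain[0] - 44 of 65 chips detected"""

detections = last_detection_per_chain(log)
print(detections)                      # {2: (44, 65), 1: (44, 65), 0: (44, 65)}
print(common_break_point(detections))  # 45
```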
sr. member
Activity: 604
Merit: 416
This is a new one; I've never seen a unit drop all hash boards at the exact same chip (44).

Have you tried custom firmware or the newest firmware from Bitmain? Could you try changing the PSU? Could you try disconnecting one of the boards and running only 1 or 2 hash boards?

Please use [code][/code] tags for logs.
newbie
Activity: 21
Merit: 7
An Antminer T17+ that had been running perfectly for some months suddenly failed to restart after some power cuts. I flashed it with Awesome firmware and am getting this in the kernel log. Are all the boards dead, or just a hash board? Thanks, Mike.

Code:
[2020/11/25 14:09:25] INFO: Power ON
[2020/11/25 14:09:27] INFO: Starting FPGA queue
[2020/11/25 14:09:27] INFO: Initializing hash boards
[2020/11/25 14:09:27] INFO: chain[2] - Initializing
[2020/11/25 14:09:27] INFO: chain[1] - Initializing
[2020/11/25 14:09:27] INFO: chain[0] - Initializing
[2020/11/25 14:09:43] WARN: chain[2] - 44 of 65 chips detected
[2020/11/25 14:09:46] WARN: chain[1] - 44 of 65 chips detected
[2020/11/25 14:09:49] WARN: chain[0] - 44 of 65 chips detected
[2020/11/25 14:09:54] WARN: chain[2] - 44 of 65 chips detected
[2020/11/25 14:09:57] WARN: chain[1] - 44 of 65 chips detected
[2020/11/25 14:10:00] WARN: chain[0] - 44 of 65 chips detected
[2020/11/25 14:10:06] WARN: chain[2] - 44 of 65 chips detected
[2020/11/25 14:10:06] ERROR: driver-btm-chain.c:488 chain[2] - Failed to detect ASIC chips
[2020/11/25 14:10:06] INFO: chain[2] - Shutting down the chain
[2020/11/25 14:10:06] ERROR: driver-btm-base.c:356 chain[2] - Initialization failed
[2020/11/25 14:10:09] WARN: chain[1] - 44 of 65 chips detected
[2020/11/25 14:10:09] ERROR: driver-btm-chain.c:488 chain[1] - Failed to detect ASIC chips
[2020/11/25 14:10:09] INFO: chain[1] - Shutting down the chain
[2020/11/25 14:10:09] ERROR: driver-btm-base.c:356 chain[1] - Initialization failed
[2020/11/25 14:10:12] WARN: chain[0] - 44 of 65 chips detected
[2020/11/25 14:10:12] ERROR: driver-btm-chain.c:488 chain[0] - Failed to detect ASIC chips
[2020/11/25 14:10:12] INFO: chain[0] - Shutting down the chain
[2020/11/25 14:10:12] ERROR: driver-btm-base.c:356 chain[0] - Initialization failed
[2020/11/25 14:10:12] ERROR: driver-btm-base.c:2154 Failed to initialize hash boards
[2020/11/25 14:10:12] INFO: Shutting down the miner
[2020/11/25 14:10:12] INFO: Stopping FPGA queue
[2020/11/25 14:10:12] INFO: chain[0] - Shutting down the chain
[2020/11/25 14:10:12] INFO: chain[1] - Shutting down the chain
[2020/11/25 14:10:12] INFO: chain[2] - Shutting down the chain
[2020/11/25 14:10:12] INFO: Power OFF
newbie
Activity: 25
Merit: 11
I will share all the info soon. I have a YouTube channel and will post a link.
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
I wouldn't say the fixture is lying; it is doing its job, which can't be perfect all the time. I suppose the solder balls you found around the chip came from "overdosing" by Bitmain staff/robots: under heat, that extra solder has to go somewhere, and when it ends up outside where it should be, it will short the hash board. It probably affects the component that reads the temp sensor, which could explain why both the fixture tool and the miner kernel log report the temp sensor when the sensors are actually good. Since temp sensors are essential, the miner won't start, thinking that all 4 sensors are bad. This is some terrible-quality gear.

I am glad you managed to fix some of the hash boards. It would have been a lot better if you could take some photos; please also share how you managed to remove that solder without damaging anything else.
newbie
Activity: 25
Merit: 11
I spent all night working on the boards, and here is something of a fix. The fixture is not accurate at all. I put these same boards into my miner, put asic.to or Bitmain firmware on it, and ran it as is. It worked!!! The test fixture shows that the temp sensors are bad, but that's not true; they are good. I would double-check that each chip's voltages are in order, such as CLK and RO. Check the rest as well, but CLK and RO are a good way to find a bad chip. Anyway, I found some chips that had solder balls next to them. Remove those balls. Once all the chips have good voltages, run asic.to firmware and it should work. The test fixture is lying, lol. Don't believe it. I fixed 6 boards last night once I stopped believing the test fixture, and they are still working as of this morning. I will update you when I fix more boards.
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
Nothing is permanent: that Arctic adhesive won't hold against a hammer or a heat gun aimed directly at the heatsink, and the same applies to any solder you might use.

The only lines that matter in your kernel log are these 4:

Code:
1970-01-01 00:01:52 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0

The kernel log can be a bit confusing. It isn't saying that those 4 chips are bad; it's only trying to tell you that the temp sensors next to those 4 chips are bad. Each board has temp sensors located near the chips mentioned (10, 14, 50 and 54), something like this:



But even this isn't accurate, because it's unlikely that 4 temp sensors would die at once; the real cause must be one of two things:

1- If all temp sensors across the 3 hash boards (12 temp sensors in total) show "failed", then the problem is the PSU.
2- If only one hash board's temp sensors show "failed", then one or more heatsinks/chips aren't in 100% contact and need replacing; more often than not, the first chip (chip 0) is the bad one.

Note that the PSU theory still stands even if only 1 hash board is having a hard time reading its temp sensors. It's hard to explain, but take it as is.
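The two-case rule above can be written down as a small triage helper. This is a minimal Python sketch assuming the `read temp sensor failed: chain = …, sensor = …, chip = …` format from the logs in this thread and 4 sensors per board; the function names are hypothetical, and the result is a rule of thumb, not a diagnosis.

```python
import re

# Matches lines such as:
# "temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0"
SENSOR_FAIL_RE = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+)"
)


def failed_sensors(log_text):
    """Return {chain: set of sensor indices that failed to read}."""
    failed = {}
    for chain, sensor, _chip in SENSOR_FAIL_RE.findall(log_text):
        failed.setdefault(int(chain), set()).add(int(sensor))
    return failed


def triage(failed, boards_installed=3, sensors_per_board=4):
    """Rule of thumb from this thread, not a guarantee.

    - every sensor on every installed board failed -> suspect the PSU
    - only one board failed -> suspect heatsink/chip contact on it (often chip 0)
    Anything else is reported as inconclusive (an assumption of this sketch).
    """
    if not failed:
        return "no temp sensor failures"
    if boards_installed < 2:
        return "inconclusive: the rule needs more than one board installed"
    every_sensor = len(failed) == boards_installed and all(
        len(sensors) == sensors_per_board for sensors in failed.values()
    )
    if every_sensor:
        return "suspect PSU"
    if len(failed) == 1:
        chain = next(iter(failed))
        return f"suspect heatsink/chip contact on chain {chain} (PSU still possible)"
    return "inconclusive: try a known-good PSU, then test boards one at a time"
```

With `boards_installed=1`, as on a single-board test fixture, the helper deliberately returns "inconclusive", because the rule compares boards against each other.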
newbie
Activity: 25
Merit: 11
I wouldn't want to use any permanent adhesive; I'm looking to apply solder. Do you know a solder I can use for this that can be taken off and put back on?

Also, I have a whole bunch of boards that show the exact same errors when I run the test fixture. All the ASICs are found, but I still get a temp sensor error.

Code:
1970-01-01 00:00:52 main.c:45:main: Ready for test
1970-01-01 00:00:59 single_board_test.c:2336:get_eeprom_info: get EEPROM info success!
1970-01-01 00:00:59 single_board_test.c:2585:single_board_test: g_test_level 7, pattern_test_time 1
1970-01-01 00:00:59 single_board_test.c:2375:do_single_board_test: Begin test
1970-01-01 00:00:59 fan.c:276:front_fan_power_on: Note: front fan is power on!
1970-01-01 00:00:59 fan.c:288:rear_fan_power_on: Note: rear fan is power on!
1970-01-01 00:00:59 driver-btm-api.c:1165:miner_device_init: Detect 256MB control board of XILINX
1970-01-01 00:00:59 driver-btm-api.c:1106:init_fan_parameter: fan_eft : 0  fan_pwm : 0
1970-01-01 00:01:05 driver-btm-api.c:1090:init_miner_version: miner ID : 805445801c20881c
1970-01-01 00:01:05 driver-btm-api.c:1096:init_miner_version: FPGA Version = 0xB031
1970-01-01 00:01:06 board.c:36:jump_and_app_check_restore_pic: chain[0] PIC jump to app
1970-01-01 00:01:08 board.c:40:jump_and_app_check_restore_pic: Check chain[0] PIC fw version=0x88
1970-01-01 00:01:08 thread.c:807:create_pic_heart_beat_thread: create thread
1970-01-01 00:01:12 power_api.c:228:set_higher_voltage_raw: higher_voltage_raw = 2100
1970-01-01 00:01:12 power_api.c:278:set_to_higher_voltage: Set to voltage raw 2100, one step.
1970-01-01 00:01:14 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:15 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.926828
1970-01-01 00:01:15 power_api.c:53:_get_avg_voltage: average_voltage = 20.926828
1970-01-01 00:01:15 power_api.c:71:check_voltage: target_vol = 21.00, actural_vol = 20.93, check voltage passed.
1970-01-01 00:01:15 uart.c:71:set_baud: set fpga_baud to 115200
1970-01-01 00:01:15 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 115200, chip_divider = 26
1970-01-01 00:01:26 driver-btm-api.c:1030:check_asic_number_with_power_on: Chain[0]: find 65 asic, times 0
1970-01-01 00:01:29 driver-hash-chip.c:266:set_uart_relay: set uart relay to 0x330003
1970-01-01 00:01:29 driver-btm-api.c:363:set_order_clock: chain[0]: set order clock, stragegy 3
1970-01-01 00:01:29 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:01:29 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:01:29 driver-hash-chip.c:517:set_clock_delay_control: singe chain mode
1970-01-01 00:01:30 temperature.c:320:calibrate_temp_sensor_one_chain: chain 0 temp sensor NCT218
1970-01-01 00:01:31 temperature.c:488:temp_statistics_show:   pcb temp 17~20  chip temp 18~20
1970-01-01 00:01:31 uart.c:71:set_baud: set fpga_baud to 12000000
1970-01-01 00:01:31 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 12000000, chip_divider = 3
1970-01-01 00:01:31 temperature.c:488:temp_statistics_show:   pcb temp 18~19  chip temp 19~21
1970-01-01 00:01:31 power_api.c:222:set_working_voltage_raw: working_voltage_raw = 1950
1970-01-01 00:01:31 frequency.c:808:inc_freq_with_fixed_vco: chain = 255, freq = 625, is_higher_voltage = true
1970-01-01 00:01:42 power_api.c:348:set_to_voltage_by_steps: Set to voltage raw 2070, step by step.
1970-01-01 00:01:44 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:45 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.217949
1970-01-01 00:01:45 power_api.c:53:_get_avg_voltage: average_voltage = 20.217949
1970-01-01 00:01:45 power_api.c:71:check_voltage: target_vol = 20.70, actural_vol = 20.22, check voltage passed.
1970-01-01 00:01:45 driver-btm-api.c:666:set_timeout: freq = 625, percent = 10, hcn = 4915, timeout = 7
1970-01-01 00:01:45 power_api.c:306:set_to_working_voltage_by_steps: Set to voltage raw 1950, step by step.
1970-01-01 00:01:50 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:52 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 19.097436
1970-01-01 00:01:52 power_api.c:53:_get_avg_voltage: average_voltage = 19.097436
1970-01-01 00:01:52 power_api.c:71:check_voltage: target_vol = 19.50, actural_vol = 19.10, check voltage passed.
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 1
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 1
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
1970-01-01 00:01:56 single_board_test.c:1659:wait_warm_up: temper sensor bad

I have this exact same error on over 10 boards. The error names chips 50, 54, 10 and 14 as bad chips, but I get the exact same error on many boards, and they can't all have the exact same bad chips with the exact same error. What could this be? My boards were in 100% perfect condition and suddenly stopped working one day: they showed temp sensor errors and then stopped working completely. The test fixture shows that the temp sensors are bad, but that can't be right either. Something is wrong and I don't know how to find the problem. Please advise. Thank you.
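Tallying the fixture log above makes the pattern easier to see. This minimal Python sketch (`timeout_summary` is a hypothetical helper) counts the `read asic reg timeout` lines per chip and pairs each chip with the sensor index from the matching `read temp sensor failed` line. If the chips that time out are exactly the sensor locations (10, 14, 50, 54) on every board, that matches the reading given in this thread that the log is pointing at the temp sensors near those chips, not at four independently bad ASICs.

```python
import re
from collections import Counter

TIMEOUT_RE = re.compile(r"read asic reg timeout: expect chain = (\d+), chip = (\d+), reg = (\d+)")
SENSOR_FAIL_RE = re.compile(r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+)")


def timeout_summary(log_text):
    """Return sorted rows of (chain, chip, sensor_index_or_None, timeout_count)."""
    timeouts = Counter(
        (int(chain), int(chip)) for chain, chip, _reg in TIMEOUT_RE.findall(log_text)
    )
    # Map each (chain, chip) to the temp sensor index the log reports for it.
    sensor_of = {
        (int(chain), int(chip)): int(sensor)
        for chain, sensor, chip in SENSOR_FAIL_RE.findall(log_text)
    }
    return [
        (chain, chip, sensor_of.get((chain, chip)), count)
        for (chain, chip), count in sorted(timeouts.items())
    ]


# Usage: print one line per chip that timed out.
# with open("fixture.log") as f:   # hypothetical file name
#     for chain, chip, sensor, count in timeout_summary(f.read()):
#         print(f"chain {chain} chip {chip}: {count} timeouts, temp sensor {sensor}")
```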
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
I am not really an expert in this field, but why would you use flux? It's not like you are soldering the chip onto the hash board. Maybe you could use flux to clean the heatsink before gluing it onto the chip, but I don't think you need anything besides the thermal adhesive. By the way, here is a slightly different way of doing it: https://www.youtube.com/watch?v=378FPjkHQJc
newbie
Activity: 25
Merit: 11
So you put the thermal solder on the chip? Do I need to put anything on the heatsink, such as flux or something else?
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
You can use something like https://www.amazon.com/Arctic-Silver-Premium-Adhesive-ASTA-7G/dp/B0087X7262, or use the black glue from the same website you got the tool from. Adding more solder to the existing solder isn't a good idea; you should clean the chip's surface and start fresh, making sure the solder paste is spread evenly across the whole chip.

Watch this video https://youtu.be/5WH7g61d90w, it's helpful.
newbie
Activity: 25
Merit: 11
Another question: on some of my other boards, when I take off a heatsink there seems to be very little solder on it. It looks like not enough solder was applied at manufacturing. I want to add more solder to the heatsink and put it back on the chip. What do I use for this, and how do I do it? I have low-temperature solder paste, but is there a procedure for adding more solder to the heatsink? Maybe some flux or another chemical? I tried adding flux and then some low-temperature solder paste, but the paste just melts off and joins the solder that is already there; it does not stick to the heatsink edges where the solder is missing. How can I make it stick? Thank you.
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
Quote:
Here is the log from the test fixture.
http://servervideos.hopto.org/error.jpg

I am not familiar with the fixture tool, but it looks like it is telling you that chip no. 50 is bad, no?

Anyway, you should contact zeusbtc and ask for the voltage reference ranges. Each domain (group of chips) has its own normal voltage values; you just need a voltmeter and the reference table.
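Once you have the reference table, the comparison is mechanical. A minimal Python sketch, assuming you have measured each domain with a voltmeter; `Range`, `out_of_range`, the domain names and all numbers are hypothetical placeholders, and the real per-domain ranges are whatever zeusbtc or the repair manual gives you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float   # volts
    high: float  # volts


def out_of_range(measured, reference):
    """Compare measured domain voltages against a reference table.

    Returns [(domain, measured_volts, Range or None)] for every domain that is
    outside its range or missing from the reference table.
    """
    problems = []
    for domain, volts in measured.items():
        ref = reference.get(domain)
        if ref is None or not ref.low <= volts <= ref.high:
            problems.append((domain, volts, ref))
    return problems


# HYPOTHETICAL numbers for illustration only; they are NOT real T17/S17 values.
reference = {"domain_1": Range(1.0, 2.0), "domain_2": Range(1.0, 2.0)}
measured = {"domain_1": 1.5, "domain_2": 0.4}
for domain, volts, ref in out_of_range(measured, reference):
    print(f"{domain}: {volts} V outside {ref}")
```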
newbie
Activity: 25
Merit: 11
Here is the log from the test fixture.
http://servervideos.hopto.org/error.jpg
hero member
Activity: 544
Merit: 589
Can you post the log from the test fixture run showing the bad temp sensors?

If the issue is only a poorly connected heatsink, then it would only fail when the chip overheats. That is going to be hard to find.

You might try feeling the temperature of the individual heat sinks right after it fails to see if a heatsink is warmer or cooler than the rest.
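If you record readings instead of judging by touch (for example with an IR thermometer, which is an assumption, not something suggested in this post), spotting the odd heatsink out is a simple comparison against the median. In this minimal Python sketch, `heatsink_outliers`, the 5 °C threshold and the example readings are all made up for illustration.

```python
from statistics import median


def heatsink_outliers(readings_c, threshold_c=5.0):
    """Return {chip: difference_from_median_C} for heatsinks whose temperature
    differs from the median of all readings by more than threshold_c.

    Both directions are reported, since the idea is to find a heatsink that is
    either warmer or cooler than the rest.
    """
    mid = median(readings_c.values())
    return {
        chip: round(temp - mid, 1)
        for chip, temp in readings_c.items()
        if abs(temp - mid) > threshold_c
    }


# Made-up readings in degrees C, taken right after a failure.
readings = {0: 60.0, 1: 61.0, 2: 59.5, 3: 45.0, 4: 60.5}
print(heatsink_outliers(readings))  # {3: -15.0}
```

The median is used rather than the mean so that one badly seated heatsink does not drag the baseline toward itself.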
newbie
Activity: 25
Merit: 11
I have that test fixture. It tells me that all temp sensors are bad. Also, I'm testing one board at a time, using a brand-new PSU. How do I test the voltage and resistance of a chip? And where do I start: chip 1 or somewhere else? I have no idea which chip to start at, since none of the heatsinks came loose. Is there a quick way to test each chip? I'm 100% sure the problem is that a heatsink is not fully seated. I just need to find it; I have all the tools to reseat it.
legendary
Activity: 2436
Merit: 6643
be constructive or S.T.F.U
Quote:
I have all the tools to fix the problem: solder, heat gun, tin, watt meter, oscilloscope, PSU, extra chips and so on. I just don't know which chip is at fault. What are the ways to find that bad connection? Thank you.

You will need a fixture tool like this one; what the tool does is tell you where the signal was interrupted. Then, to double-check, you could measure the voltage and/or resistance of that chip. Keep in mind that if all 3 hash boards throw that temp-sensor error, the problem is most likely a bad PSU.
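Without a fixture that reports the break point, the "find where the signal stops" step can at least take fewer measurements by bisecting the chain instead of probing chip by chip. A minimal Python sketch, assuming a single break in a signal that passes from chip to chip; `first_bad_chip` and the probe function are hypothetical, and what counts as a good reading (for example the CLK or RO voltages mentioned in this thread) is up to you.

```python
def first_bad_chip(num_chips, signal_ok):
    """Binary search for the first chip index (0-based) where signal_ok(i) is False.

    Assumes a single break in a signal that passes from chip to chip:
    signal_ok(i) is True for every chip before the break and False from it onward.
    Returns num_chips if every probed chip looks fine.
    """
    lo, hi = 0, num_chips
    while lo < hi:
        mid = (lo + hi) // 2
        if signal_ok(mid):
            lo = mid + 1  # break is further down the chain
        else:
            hi = mid      # break is at mid or earlier
    return lo


probes = []


def simulated_probe(i):
    """Stand-in for a real measurement: signal present up to chip 43, lost from chip 44."""
    probes.append(i)
    return i < 44


print(first_bad_chip(65, simulated_probe), len(probes))  # 44 6
```

On a 65-chip chain this needs at most 7 probes instead of up to 65. Index 44 in the simulation is the 45th chip, the same position as in the 44-of-65 logs earlier in this thread.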
newbie
Activity: 25
Merit: 11
I have this error on all my bad boards, and I have 20+ bad boards (S17+ 73 TH/s):
read temp sensor failed
After some time, the board shows 0 ASICs found.
After doing a lot of research, the problem seems to be bad contact between the heatsink and the chip.
So my question is: how do I find that bad contact? I tried tapping lightly on the heatsinks to see if any would come off, and I tried lightly banging the board, but all the heatsinks are still on. I have all the tools to fix the problem: solder, heat gun, tin, watt meter, oscilloscope, PSU, extra chips and so on. I just don't know which chip is at fault. What are the ways to find that bad connection? Thank you.
hero member
Activity: 544
Merit: 589
This is a good place to start: https://www.zeusbtc.com/NewsDetails.asp?ID=182

There is a download link on the page for the full repair manual, but it is in Chinese. You can load it into Google Translate to get a somewhat understandable translation.

If you want to try to fix them yourself, then you'll need a test fixture, a multimeter, an adjustable heat gun, a soldering iron, etc. An oscilloscope is also helpful. I wouldn't recommend it unless you already have a background in electronics and some experience doing surface-mount rework.