Problems with T17s from Alibaba | Bitcointalksearch.org

Bipede85

newbie

Activity: 27

Merit: 2

Quote from: mikeywith on February 25, 2021, 07:20:23 PM

65 m³/min is roughly 2,295 CFM, which should be enough for these 4 miners, but you can't exhaust more air than you take in, so even installing more fans might not solve the problem. A 400 mm tube is pretty small, especially if the spacing between these miners isn't enough. So until you figure out a way to get rid of that heat, I would suggest you underclock all 4 miners. If they were S9s I would not even worry about 90°C, but these 17 series are pretty sensitive to heat and they could die faster than you think if you keep running them this hot.

What would be normal temperatures for the 17 series? And is there any specific firmware you would recommend for the 17s and for the S9s?

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

65 m³/min is roughly 2,295 CFM, which should be enough for these 4 miners, but you can't exhaust more air than you take in, so even installing more fans might not solve the problem. A 400 mm tube is pretty small, especially if the spacing between these miners isn't enough. So until you figure out a way to get rid of that heat, I would suggest you underclock all 4 miners. If they were S9s I would not even worry about 90°C, but these 17 series are pretty sensitive to heat and they could die faster than you think if you keep running them this hot.
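
For anyone checking the airflow arithmetic, here is a minimal Python sketch of the m³/min to CFM conversion. The function name is made up for illustration; the only fact it relies on is that one cubic metre is about 35.3147 cubic feet.

```python
# Convert fan airflow from cubic metres per minute to cubic feet per minute.
M3_TO_FT3 = 35.3147  # cubic feet in one cubic metre


def m3_per_min_to_cfm(flow_m3_min: float) -> float:
    """Return the airflow in CFM for a flow given in m3/min."""
    return flow_m3_min * M3_TO_FT3


one_fan = m3_per_min_to_cfm(65)       # a single 65 m3/min exhaust
two_fans = m3_per_min_to_cfm(2 * 65)  # both exhausts at their rated flow

print(f"one fan:  {one_fan:.0f} CFM")
print(f"two fans: {two_fans:.0f} CFM")
```

These are only the rated figures of the fans. The real flow is still limited by how much air can get into the room and through the 400 mm tube.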

Bipede85

newbie

Activity: 27

Merit: 2

I'm living and working in a developing country, and managing the room temperature is not easy because it's pretty hot here: around 30ºC, sometimes less, and right now it's 33ºC. The room has AC, but with such a high volume of air moving I find it hard to get the temperature any lower.

In retrospect I would have done things differently concerning the movement of air. I put in 2 x 65 m3/min exhaust fans pulling hot air from aluminium ducts that are connected to several machines and feed into 400 mm tubes.

s17+ s17+ T17 T17
| | | | | | | |
----------------------------------
/
400 mm tube / exhaust pulling 65 m3/min
/
-----------------------------------

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Quote from: Bipede85 on February 23, 2021, 12:12:53 PM

2021-02-23 17:05:52 thread.c:968:asic_status_monitor_thread: ERROR: chain 2 get hashrate_reg_counter 0, require 65, failed times 1
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 0
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 1
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 1, chip = 10, reg = 0
2021-02-23 17:05:54 thread.c:996:asic_status_monitor_thread: chain 2 can't get enough hashrate reg val for 0 times.
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 1, chip = 10, reg = 1
2021-02-23 17:05:54 thread.c:968:asic_status_monitor_thread: ERROR: chain 2 get hashrate_reg_counter 0, require 65, failed times 1
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 2, chip = 54, reg = 0
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 2, chip = 54, reg = 1
2021-02-23 17:05:55 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 3, chip = 50, reg = 0
2021-02-23 17:05:55 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 3, chip = 50, reg = 1

I have a similar issue, which I posted about just a few minutes ago. Mind you, the board that gives me this error was a dead board that I froze in the freezer; it worked fine for a while and then started to throw these temp sensor errors. In other words, chances are this is a dead hashboard.

We know that a loose heatsink can cause temp sensors not to read, but that usually comes along with some missing ASIC chips. Overall, temp sensor errors like these do not mean the sensor itself is bad; they mean the hash board is dying or already dead.
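
To see at a glance which chains and sensors are failing in a log like the one quoted above, something along these lines could help. It is a rough sketch: the regular expression is written against the log lines posted in this thread, the function name is hypothetical, and other firmware versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern taken from the log lines posted in this thread; other firmware
# versions may format the message differently (assumption).
SENSOR_FAIL = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+)"
)


def failed_sensors_by_chain(log_text):
    """Map each chain number to the sorted list of sensor indexes that failed."""
    failed = defaultdict(set)
    for match in SENSOR_FAIL.finditer(log_text):
        chain, sensor = int(match.group(1)), int(match.group(2))
        failed[chain].add(sensor)
    return {chain: sorted(sensors) for chain, sensors in failed.items()}


sample = """\
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 0
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 1
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 1, chip = 10, reg = 0
"""

print(failed_sensors_by_chain(sample))
```

If every sensor on a chain shows up in the result, that lines up with the firmware powering the chain off, as in the "can get NONE temp info" line of the full log.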

You could get lucky with custom firmware. I want to try that on the miner I was testing, but for some stupid reason when I removed Braiins OS it flashed the latest Bitmain firmware instead of the MP test, so now I need to go get that miner and SD-card my way into it. I will without a doubt test Vnish when I have the time for that.

Quote from: Bipede85 on February 23, 2021, 12:12:53 PM

Should I just install new firmware and change the max temp for the chips? Because it can't really be that high when the other boards are fine?

The other boards are not fine: the sensors nearest the exhaust fan are reporting 86°C, and I wouldn't be surprised if the middle board is at or above 90°C, or hot enough to trigger the firmware to shut it down. So this could be more than just a temp sensor glitch; that chain could actually be running too hot. I would fix that part first. Once you are certain the board isn't getting hot and it's just a bad sensor, maybe you could increase the max temp a little, but be careful: if the board does indeed run way too hot it could melt and set the whole place on fire.

Bipede85

newbie

Activity: 27

Merit: 2

Without wanting to create a new thread: I've received the S17+ miners. One was working fine for a few minutes but then started hashing with only 2 chains. This is the log:

https://pastebin.com/embed_js/XuJW8fZn

While I was writing this, the S17 dropped another chain:

Code:

2021-02-23 17:05:52 thread.c:968:asic_status_monitor_thread: ERROR: chain 2 get hashrate_reg_counter 0, require 65, failed times 1
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 0
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 0, chip = 14, reg = 1
2021-02-23 17:05:53 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 1, chip = 10, reg = 0
2021-02-23 17:05:54 thread.c:996:asic_status_monitor_thread: chain 2 can't get enough hashrate reg val for 0 times.
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 1, chip = 10, reg = 1
2021-02-23 17:05:54 thread.c:968:asic_status_monitor_thread: ERROR: chain 2 get hashrate_reg_counter 0, require 65, failed times 1
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 2, chip = 54, reg = 0
2021-02-23 17:05:54 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 2, chip = 54, reg = 1
2021-02-23 17:05:55 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 3, chip = 50, reg = 0
2021-02-23 17:05:55 temperature.c:838:get_temp_info: read temp sensor failed: chain = 2, sensor = 3, chip = 50, reg = 1
2021-02-23 17:05:55 temperature.c:865:get_temp_info: ERROR: chain 2 can get NONE temp info or temp value abnormal, power it off 
2021-02-23 17:05:56 thread.c:996:asic_status_monitor_thread: chain 2 can't get enough hashrate reg val for 1 times.
2021-02-23 17:05:56 thread.c:968:asic_status_monitor_thread: ERROR: chain 2 get hashrate_reg_counter 0, require 65, failed times 1
2021-02-23 17:05:57 frequency.c:205:get_ideal_hash_rate_GH: ideal_hash_rate = 23849
2021-02-23 17:05:57 frequency.c:223:get_sale_hash_rate_GH: sale_hash_rate = 22000

About the T17s: the one that was working fine started giving some temp errors. I've disabled the 2nd chain, which was giving high temperature errors. This is the log:

https://pastebin.com/embed_js/USiuURJS

T17: the other chains report normal temperatures; it's just that 1 chain giving me a chip temp of 104. https://imgur.com/a/LkGBJNh

Should I just install new firmware and change the max temp for the chips? Because it can't really be that high when the other boards are fine?

wndsnb

hero member

Activity: 544

Merit: 589

OK, the chip in the bottom of the photo is physically damaged, the heatsink was hit with enough force to break the top asic package off. Pieces of it are probably still attached to the heatsink. Not going to be able to glue that back on and get it working. The chip may be functional electrically, but I don't think you'd be able to get a good enough connection to the heatsink to allow it to operate without overheating. That chip will have to be replaced to get the board working.

The other two are not physically damaged, but the copper plating that interfaces between the asic package and the heatsink has delaminated. Normally, that copper will accept the solder they use to attach the heatsink. Without it, the solder won't flow and will ball up creating a bad connection that won't transfer the heat well enough. Gluing those two heatsinks back on with a thermally conductive adhesive could work though.

Bipede85

newbie

Activity: 27

Merit: 2

I uploaded the pic to imgur, should be ok now.

wndsnb

hero member

Activity: 544

Merit: 589

In my experience, it is not likely just re-attaching the heatsinks would fix it anyway. Possible, but not likely. I've been through about 10 hashboards with either shifted or detached heatsinks, and every one of them had issues with other chips that did not have visible heatsink issues.

Your photo links got removed, so I can't see what you mean by "damage", but if it is burn damage then most likely the board is not repairable. When an ASIC gets hot enough to char the PCB, it's hot enough to delaminate the copper traces and pads on the board. So you might be able to remove the burnt chip, but you would not be able to replace it with a new one.

The test fixtures don't actually locate bad chips; they just run the board in a test mode that allows a technician to probe test points on the board with a voltmeter or oscilloscope. Some of the time the test fixture will tell you how many chips were found, but that is the same output as in the miner log when it tells you "find 24 asic" when there should be 30. Most of the time it will just say 0 asics found.

Bipede85

newbie

Activity: 27

Merit: 2

Are the bad chips only the ones with loose heatsinks, or could chips still have their heatsinks but be damaged, for example fried? Is there any way to tell besides looking for the ones without heatsinks? And is a chip without a heatsink automatically a bad chip, or can it still work?

I once saw a guy on YouTube with some kind of device that located the bad chips. If I buy a few loose chips and solder them in to replace the bad ones, is that viable too?

For example, these ones look damaged beyond just the heatsink:

https://imgur.com/a/o6CxYTN

I'm afraid the 2 s17s I'll receive in a week will be the same crap.

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

They are terribly glued, but I doubt shipping with proper protection will harm that; it's heat that does. Then again, shipping them to the EU or USA for repair and then shipping them back will probably cost you more than those hash boards are actually worth. Only you know the shipping cost, so do run the numbers.

Plan B would be to give them to a mobile/PC repair shop. In many cases the damage is physically obvious: a heatsink, or a few of them, has fallen off. You need to get something like Arctic Silver adhesive and ask them to glue the heatsinks back on for you; all they need is a bit of skill, steady hands, and a heat gun.

If you can't find the bad heatsink, apply some force with your finger; the ones that are too loose will fall off easily. If the problem goes beyond that, then it's beyond a simple repair and you will need some tools and some training.

Plan C: sell the dead hash boards. Some people, like wndsnb, who are experts in fixing these hash boards might be interested in acquiring them. I don't expect him or anyone else, for that matter, to offer you a lot, since they are taking a risk as some boards can't be fixed, but something is always better than nothing.

Quote from: Bipede85 on February 15, 2021, 04:06:01 PM

Is there any point in trying to dispute the sale on Alibaba and trying to get at least part of the money back? Or is there no point?

Chances are slim, but you lose nothing, a few emails back and forth and you may get yourself a refund of some kind.

Bipede85

newbie

Activity: 27

Merit: 2

I'm in Angola, Africa lol, so I'm guessing my repair options are very limited. Even if I do send the boards for repairs, they can get damaged again in shipping; I have no idea how fragile they are. Is there any point in trying to dispute the sale on Alibaba and trying to get at least part of the money back? Or is there no point?

I tried the chains from machine 3 in the other machines, and all of those chains have bad chips. I'm thinking of just taking the PSU fans from machine 3 and using them to replace the PSU fans in machine 1 to see if they work; maybe that will bring back the 2nd chain on machine 1.

I have 2 S17s coming next week, and I'm afraid they will have issues too. Unfortunately, I only read about the 17 series problems after buying them.

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Quote from: Bipede85 on February 15, 2021, 07:09:03 AM

Seems like a bad PSU on machine 3? What are my options? A new PSU?

If you did replace the 2 fans on the PSU and saw no difference, i.e. you have confirmed it's a dead PSU, then your only option would be getting a replacement. Unfortunately, Bitmain is out of stock, so you will need to search for one. Where are you located?

wndsnb

hero member

Activity: 544

Merit: 589

Unfortunately, the way these hashboards are built makes it impossible to run unless all chips are working, so there is no firmware that will run a board that doesn't have all chips working.

The only way to get those boards working at any capacity is to have them repaired, see the "Where to fix your ASIC miners" topic.

For machine 3, swap the PSU with one of the working miners to see if the PSU is the only problem. No sense spending $$$ on a PSU if all the hashboards are dead.

Bipede85

newbie

Activity: 27

Merit: 2

After opening up the machines and changing some boards here and there I managed to get machine 2 to work with 3 boards.

Machine 1 (with 2 PSU fans not working) had 2 chains working, but now only 1 chain is working:

https://pastebin.com/embed_js/nyJMMaZx

Machine 3 nothing new, not enough voltage

https://pastebin.com/embed_js/Ti2ux15m

Seems like a bad PSU on machine 3? What are my options? A new PSU?

What about the bad chains: is there firmware I can use to make a board run on whatever chips are available? For example, on a board with only 18 good chips, could it use just those 18?

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

2 dead fans usually mean a dead PSU and not just bad fans that need replacement. You can double-check by plugging the 3rd, working fan into one of the bad fans' slots: if it works, then it's a matter of fan replacement; if it does not, then it's a bad PSU, which is more likely than not.

Bipede85

newbie

Activity: 27

Merit: 2

240 V.

Yes, about NiceHash: I had to add that address to the T17s. That's how I got 2 to run, one with all 3 chains and the other with only 2.

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

I think he also needs to use the asicboost stratum URL.

Code:

stratum+tcp://sha256asicboost.LOCATION.nicehash.com:3368

All the new gear comes with AsicBoost enabled by default, and for some weird reason some of these miners won't work on the normal SHA256 URL.
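
As a small illustration of how the pool URL is put together, the Python sketch below fills in the LOCATION part of the address above and optionally appends the #xnsub suffix that is also suggested in this thread. The function name and the "eu" location code are assumptions for the example only; use whatever location NiceHash gives you.

```python
def nicehash_asicboost_url(location: str, xnsub: bool = True) -> str:
    """Build the AsicBoost stratum URL quoted in this thread.

    `location` replaces the LOCATION placeholder; `xnsub` appends the
    "#xnsub" suffix recommended elsewhere in the thread.
    """
    url = f"stratum+tcp://sha256asicboost.{location}.nicehash.com:3368"
    return (url + "#xnsub") if xnsub else url


# "eu" is only an example value, not something taken from the thread.
print(nicehash_asicboost_url("eu"))
print(nicehash_asicboost_url("eu", xnsub=False))
```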

OP, what is the voltage at your farm/house?

wndsnb

hero member

Activity: 544

Merit: 589

Code:

2021-02-12 15:32:36 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[0]: find 30 asic, times 0
2021-02-12 15:32:46 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[1]: find 30 asic, times 0
2021-02-12 15:32:56 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[2]: find 11 asic, times 0
2021-02-12 15:33:06 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[2]: find 11 asic, times 1
2021-02-12 15:33:16 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[2]: find 11 asic, times 2
2021-02-12 15:33:16 driver-btm-api.c:1069:check_asic_number: Chain 2 only find 11 asic, will power off hash board 2

Yeah, chain 2 has issues: either a bad connection on the board or a bad ASIC chip.
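
The same check can be done automatically on a saved log. The Python sketch below is only an illustration: the pattern is copied from the log lines above, the function name is hypothetical, and the expected count of 30 is an assumption based on what the healthy chains in this log report.

```python
import re

# Healthy chains in the log above report 30 ASICs (assumption for this model).
EXPECTED_ASICS = 30

FIND_ASIC = re.compile(r"Chain\[(\d+)\]: find (\d+) asic")


def short_chains(log_text, expected=EXPECTED_ASICS):
    """Return {chain: asic_count} for chains that found fewer chips than expected."""
    last_seen = {}
    for match in FIND_ASIC.finditer(log_text):
        # Later retries overwrite earlier ones, so the last attempt wins.
        last_seen[int(match.group(1))] = int(match.group(2))
    return {chain: n for chain, n in last_seen.items() if n < expected}


sample = """\
2021-02-12 15:32:36 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[0]: find 30 asic, times 0
2021-02-12 15:32:46 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[1]: find 30 asic, times 0
2021-02-12 15:32:56 driver-btm-api.c:1042:check_asic_number_with_power_on: Chain[2]: find 11 asic, times 0
"""

print(short_chains(sample))
```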

Also, for nicehash you should add #xnsub to the end of the pool URL.

And yeah, 2 fans not running is eventually going to cause a problem. Might just be dead fans, so you could try just replacing them.
https://www.zeusbtc.com/ASIC-Miner-Repair/Parts-Tools-Details.asp?ID=235

Bipede85

newbie

Activity: 27

Merit: 2

About the first machine: I've changed the mining pool, since it seems T17s don't like NiceHash.

https://pastebin.com/embed_js/Txv8XwYa

But the hash rate is much lower than expected, and only 2 boards show up. Pic:

https://drive.google.com/file/d/1pfR46oVVK-gUKTrbMfEeGwy0lxzHzewh/view?usp=sharing

Also, this 1st machine has 2 of the PSU fans not working. Could this be related, or is it just a problem waiting to happen?

wndsnb

hero member

Activity: 544

Merit: 589

If that 1st log does just stop there it could indicate a PSU problem as well. Does the miner just lock up at that point?

I'd try pulling the ribbon cable to chain 0 in #2 first. Then I'd try taking the PSU from #1 and try it in #3. If it starts to die in the same place as #1 did, then you've got another bad PSU. If the PSUs from #1 and #3 look bad, you could try the PSU from #2 in #3 and #1 to see if either of them has 3 working boards.

philipma1957

legendary

Activity: 4382

Merit: 9330

'The right to privacy matters'

1st image shows

Code:

2021-02-12 08:42:25 driver-btm-api.c:1939:bitmain_board_init: Fan check passed.
2021-02-12 08:42:26 board.c:36:jump_and_app_check_restore_pic: chain[0] PIC jump to app
2021-02-12 08:42:30 board.c:40:jump_and_app_check_restore_pic: Check chain[0] PIC fw version=0xb9
2021-02-12 08:42:31 board.c:36:jump_and_app_check_restore_pic: chain[1] PIC jump to app
2021-02-12 08:42:35 board.c:40:jump_and_app_check_restore_pic: Check chain[1] PIC fw version=0xb9
2021-02-12 08:42:37 board.c:36:jump_and_app_check_restore_pic: chain[2] PIC jump to app
2021-02-12 08:42:40 board.c:40:jump_and_app_check_restore_pic: Check chain[2] PIC fw version=0xb9
2021-02-12 08:42:40 thread.c:880:create_pic_heart_beat_thread: create thread
2021-02-12 08:42:40 power_api.c:55:power_init: power init ...
2021-02-12 08:42:40 driver-btm-api.c:1949:bitmain_board_init: Enter 30s sleep to make sure power release finish.
2021-02-12 08:42:40 power_api.c:46:power_off: init gpio907

At this point, did it die or did it do more?

Is the second image from the second unit?

It shows

Code:

2021-02-12 09:06:24 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 1
2021-02-12 09:06:27 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 0
2021-02-12 09:06:27 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 1
2021-02-12 09:06:29 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 1
2021-02-12 09:06:32 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 0

That is likely a bad board.

Is the third image from the third unit?

Code:

2021-02-12 09:23:43 power_api.c:86:get_average_voltage: chain[0], voltage is: 5.020898
2021-02-12 09:23:44 power_api.c:86:get_average_voltage: chain[1], voltage is: 5.069883
2021-02-12 09:23:46 power_api.c:86:get_average_voltage: chain[2], voltage is: 5.094375
2021-02-12 09:23:46 power_api.c:97:get_average_voltage: aveage voltage is: 5.061719
2021-02-12 09:23:46 power_api.c:110:check_voltage: target_vol = 17.00, actural_vol = 5.06, more than 1.0v diff.
2021-02-12 09:23:47 power_api.c:124:check_voltage_multi: retry time: 29
2021-02-12 09:23:49 power_api.c:86:get_average_voltage: chain[0], voltage is: 5.118867
2021-02-12 09:23:52 power_api.c:86:get_average_voltage: chain[1], voltage is: 5.045391
2021-02-12 09:23:55 power_api.c:86:get_average_voltage: chain[2], voltage is: 4.965791
2021-02-12 09:23:55 power_api.c:97:get_average_voltage: aveage voltage is: 5.043350
2021-02-12 09:23:55 power_api.c:110:check_voltage: target_vol = 17.00, actural_vol = 5.04, more than 1.0v diff.
2021-02-12 09:23:56 power_api.c:182:set_iic_power_by_voltage: now set voltage to : 17.000000
2021-02-12 09:23:56 driver-btm-api.c:205:set_miner_status: ERROR_POWER_LOST
2021-02-12 09:23:56 driver-btm-api.c:146:stop_mining: stop mining: power set failed!
2021-02-12 09:23:56 thread.c:930:cancel_read_nonce_reg_thread: cancel thread
2021-02-12 09:23:56 driver-btm-api.c:131:killall_hashboard: ****power off hashboard****

This is certainly a bad PSU.
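
To pull the same conclusion out of a saved log, a sketch like the following could be used. It is written against the check_voltage lines above (including the log's own spelling, actural_vol); the function name and the 1.0 V tolerance default are assumptions that mirror the "more than 1.0v diff" message.

```python
import re

# Matches the check_voltage lines as they appear in the log above.
CHECK_VOLTAGE = re.compile(r"target_vol = ([\d.]+), actural_vol = ([\d.]+)")


def voltage_gaps(log_text, tolerance=1.0):
    """Return (target, actual) pairs where the PSU missed the target by more than tolerance volts."""
    gaps = []
    for match in CHECK_VOLTAGE.finditer(log_text):
        target, actual = float(match.group(1)), float(match.group(2))
        if abs(target - actual) > tolerance:
            gaps.append((target, actual))
    return gaps


sample = """\
2021-02-12 09:23:46 power_api.c:110:check_voltage: target_vol = 17.00, actural_vol = 5.06, more than 1.0v diff.
2021-02-12 09:23:55 power_api.c:110:check_voltage: target_vol = 17.00, actural_vol = 5.04, more than 1.0v diff.
"""

for target, actual in voltage_gaps(sample):
    print(f"target {target:.2f} V, measured {actual:.2f} V")
```

A PSU that stays around 5 V when the firmware asks for 17 V is what ends in ERROR_POWER_LOST in the log.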

The good news is that I think pulling a PSU from unit 1 or unit 2 and putting it on unit 3 will get unit 3 to work.

On the second unit, pulling the dead board could leave you with 2 good boards in that unit.

On unit 1, I am not sure what is wrong with it.

Bipede85

newbie

Activity: 27

Merit: 2

Hi, I've recently bought 3 T17s from Alibaba, and all 3 are not working. 100% success rate, right?

I want to try to fix them locally before shipping them to another continent. Here are the errors they present:

1st: https://pastebin.com/embed_js/5NWKUXFS

2nd: https://pastebin.com/embed_js/ZRG4Nttc

3rd: https://pastebin.com/embed_js/p4vja5Eb

On the 1st machine, I've noticed that 2 of the 3 PSU fans aren't working.

Any help would be appreciated.

Topic: Problems with T17s from Alibaba (Read 363 times)