
Topic: T17/S17 malfunction: cases, solutions, remedies, RMA history - page 11. (Read 6919 times)

sr. member
Activity: 604
Merit: 416
I have never done any soldering before, and my only experience with a heat gun is bending hard-line tubing. I wouldn't dare to try that yet. I might buy some broken S9s to practice on, but for now I would rather RMA this unit.

I could provide better, zoomed-in pictures of the chip and heatsink.

The solder definitely snapped off with the heatsink. The only thing I could see on the bottom of the heatsink was the name (or part of it) of the chip, which might or might not indicate that it took part of the chip with it.

That chip group did have a problem with overheating now that you mention it, but never above 90C. It was usually in the mid 80s, and in the low 70s during winter.

It is one of the last chips on the board, if I remember right.

During my initial testing, before I had even opened the unit, that hashboard would at least show up in the voltage check. Now, with the heatsink loose, it doesn't even show up in the kernel log. That is to be expected, and it might mean the board can still be saved, but as I said, I am not up to that challenge.
hero member
Activity: 544
Merit: 589
Moving on, after the futile attempt at "fixing" it, I gently pushed the heatsink in case with my middle finger and this was the result:

https://i.imgur.com/8Nayhat.jpg

May that hashboard rest in peace.

That may not actually be as bad as it might look. If the solder snapped without taking part of the chip with it, then re-attaching the heatsink with a heat gun might fix it (probably very slim chance though).

I think all of the shiny solder on the chip in your photo was not in contact with the heatsink; the part that snapped when you pushed on it would be the small dull section in the top left of the chip. Since only a small portion of the heatsink was actually making contact, the chip would have been overheating. That probably caused issues, and it lines up with other suggestions that poorly attached heatsinks are a major problem with these miners.

So maybe, if the chip didn't already overheat to the point of failure, just re-attaching the heatsink might do it.
sr. member
Activity: 604
Merit: 416
After a careful and thorough examination of the hashboards, I found that, in addition to the oxidised copper plate (thank you wndsnb for clarifying that), one heatsink was misaligned compared to the others. See pictures below:



Note that they were touching at the topmost part, but with a gentle push of a finger I made space between them to test whether the hashboard would power on. It was a futile attempt. The bottom part of the loose heatsink is around 0.2mm away from the nearby one, but as the top part was touching, there was probably a short there.

Moving on, after the futile attempt at "fixing" it, I gently pushed the heatsink in case with my middle finger and this was the result:



May that hashboard rest in peace.

Now, I was thinking about the 2 other hashboards. They were working fine during the hottest days of summer (35C+ ambient), but as soon as the ambient temperature dropped below 5C (a few weeks ago) they started reading 0 out of 65 chips or 12 out of 65 chips (it varied from time to time). I figured it might have something to do with thermal expansion and contraction in hot and cold conditions respectively. So I took my Z11 (which is underclocked for better efficiency and therefore doesn't produce that much heat anyway) and placed it in front of my S17+ intake (picture below). I was astonished by the result: after the first boot, both hashboards found all 65 out of 65 chips and started hashing away instantly.



To help the unit keep hashing for as long as possible, I updated to the newest firmware, which seems to fix the exact problem I had. I was unaware of this before I tested the method described above.

Quote
1. Enhance the working stability of the miner under special conditions (e.g. a certain range of low or high ambient temperature environment).

I hope this helps someone in future.
hero member
Activity: 544
Merit: 589
The 2nd board is still not coming up after moving heatsinks back into position. Looks like an issue with the PIC microcontroller on the board not enabling power, so I ordered a PIC in-circuit debugger (PICkit 4) to investigate that further. Hoping that Bitmain did not set the read protection on the PIC flash so I can read out the firmware to be able to program a replacement chip if necessary, but I'm thinking chances of that are close to 0. If I can't read it out, it looks like Zeusbtc sells a download of the PIC firmware for all of Bitmain's miners.

Well, the problem with this board was not the pic microcontroller. Turns out I was just not looking for its enable signal at the right time. The test jig I have only runs the board for about 1.5 seconds when it doesn't detect any ASICs, and I was measuring the pic outputs after the board had already powered down.

I was able to connect to the PIC with the PICkit 4, read the PIC firmware out of a known good board, and use it to verify the firmware in the bad board. So it does not look like there is a need to buy a download... a little sketchy that Zeusbtc is trying to sell it when you can read it yourself with the tools needed to program the chip. I haven't actually programmed a chip with the firmware I read out, so maybe there is some issue with doing that that I'm not seeing. Or maybe some boards do have the read protection bit set and I got lucky.
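
For anyone who wants to repeat that check, here is a minimal sketch of comparing two firmware dumps on a PC. It assumes both boards were read out to files with the same tool and the same export settings; the file names are hypothetical. A byte-for-byte match is enough to say the dumps are identical, but a mismatch could also come from formatting differences in the export rather than from the firmware itself.

```python
import hashlib
from pathlib import Path


def firmware_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a firmware dump file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def dumps_match(good: Path, suspect: Path) -> bool:
    """True when both dump files contain exactly the same bytes."""
    return firmware_digest(good) == firmware_digest(suspect)


if __name__ == "__main__":
    # Hypothetical file names: use whatever your programmer software exported.
    good = Path("good_board_pic.hex")
    suspect = Path("bad_board_pic.hex")
    if good.exists() and suspect.exists():
        print("match" if dumps_match(good, suspect) else "MISMATCH")
    else:
        print("dump files not found")
```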
sr. member
Activity: 604
Merit: 416
You might be right, I could be very wrong. If you are right, then that would mean that there is a different problem and my initial thought was totally wrong. The PSU might still have caused the initial loss of chips, and I am no closer to narrowing down what is bad. I will just RMA it and let Bitmain decide what they did wrong.
hero member
Activity: 544
Merit: 589
Looks to me like just oxidation or staining, probably happened during the reflow during assembly or in a cleaning process after.

I think that copper plate is connected to the positive rail of the PSU input, it is there just to increase the current capacity of the connection between the + input and the FETs used to switch the main power to the asics on the board. The FETs are the 4 8-pin chips just below the copper piece (Q1, Q2, Q5, and Q6 in the schematic mikeywith posted). Those are pretty beefy FETs, but they would have gone up in smoke long before a bare copper plate would have heated up to the point of discoloration.
sr. member
Activity: 604
Merit: 416
I'll try to get better picture tomorrow, but it does look like a scorch mark to me.

During the next week, I might bring it to my local electrician to check it. He might be able to tell me more. But until then, I am opening an RMA ticket to see whether the Netherlands repair shop has opened and when I can send it in for repair.

EDIT:

I've made a "good" picture of copper plate in case and I am almost certain that it is not oxidation, but I will have to check with professional. I tried scratching it off with my nail but it doesn't budge.

Ignore the discoloration on the copper plates around the one in the middle of the picture. Those are due to light bouncing in weird ways as well as Google Photos compression. Also, the brownish substance around the copper is only dirt. There are no signs of scorch marks on the PCB.



Second not so good image (but it does show uncompressed color of copper plate) is here: https://imgur.com/a/cyzvv2k
legendary
Activity: 3612
Merit: 2506
Evil beware: We have waffles!
I don't think those are chips, I think it is literally just a piece of copper. Possibly used to increase the current capacity of the signal. Can't really tell from your photo if it is just oxidation or scorch marks.

That is my guess as well - they are simply thick copper busbars soldered onto the power planes in the PCB in that area and are used to increase the current capacity of the power planes. The fact that they are located around where power comes in reinforces that idea. Only way to tell would be to check the resistance between some of them - if they are all soldered to the same power planes it should be very close to zero.

While that suspect bar looks funky, to me it does not suggest overheating - more like simple oxidation and/or bad plating. Either way it should not affect their function and cause problems.

Tip when checking low resistances: First press the meter probes together to establish the resistance of the meter leads. Most of the time it will be around 0.1 ohms. If the meter has a zeroing function, use it otherwise just subtract the meter lead resistance from what you measure across the bars to get the true value.
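
As a small illustration of that subtraction, here is a sketch in Python. The 0.1 ohm lead value is the typical figure mentioned above; the bar reading is a made-up example, not a measurement from this board.

```python
def true_resistance(measured_ohms: float, lead_ohms: float) -> float:
    """Subtract the meter-lead resistance from a raw reading."""
    corrected = measured_ohms - lead_ohms
    # A slightly negative result is just measurement noise, so clamp to zero.
    return max(corrected, 0.0)


lead = 0.1     # reading with the probes pressed together
reading = 0.1  # hypothetical reading across two copper bars
print(f"{true_resistance(reading, lead):.2f} ohm")
```

A corrected value at or very near zero is what you would expect if both bars are soldered to the same power plane.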
legendary
Activity: 2170
Merit: 6279
be constructive or S.T.F.U
I think Wndsnb is correct, this doesn't seem anything like a chip or even a diode, it's just a 10*3mm piece of copper connected to "copper 27" as shown in the image:



Here is the repair manual from Zeusbtc for S17+ https://drive.google.com/uc?id=1TfNMnpEFxEGMHbTr3lmy2-9Ry7XzlgP2&export=download , it has all the pinouts and details about how that hashboard actually works, perhaps someone who understands electronics well enough can help you further (I think NotfuzzyWarm is your guy).

I am also interested to know what happened here, that piece does indeed look burnt/shorted, maybe too much current passed in there?
hero member
Activity: 544
Merit: 589
My assumption is that it bricked the PSU which caused other hashboards to not work as well. I would like input from others about this assumption.

Discoloration on one of the chips (in the middle of the picture) means that it probably burnt. How or why I am not sure. What that one does I am also not sure. Its name is "copper_26". I'd appreciate it if someone more tech-savvy with Antminers could tell us what that chip is used for and why it might have burned.

I don't think those are chips, I think it is literally just a piece of copper. Possibly used to increase the current capacity of the signal. Can't really tell from your photo if it is just oxidation or scorch marks.
sr. member
Activity: 604
Merit: 416
Changing the PSU did not help. The one board that had sometimes shown 0 chips started hashing instantly, while the other two were still not hashing, just as on the old PSU. After a single restart, though, that board disappeared as well. All boards now read 0 out of 65 chips.

The PSU that I put in is new (ordered from Bitmain) and was tested on a working unit, which means that it is not the problem. The control board cannot be the problem either, as I have tested it on a working unit as well. I have yet to test the data cables, but I doubt they are the problem.

I will quote this message in THIS THREAD as that is where it might help people. I will provide pictures of a single hashboard which I think started the chain reaction of all boards going down.

Here are three photos of hashboard that first started making problems: https://imgur.com/a/ZhxhMsO

Here is why I think this board is the reason my old PSU and other two hashboards are dead as well:



My assumption is that it bricked the PSU which caused other hashboards to not work as well. I would like input from others about this assumption.

Discoloration on one of the chips (in the middle of the picture) means that it probably burnt. How or why I am not sure. What that one does I am also not sure. Its name is "copper_26". I'd appreciate it if someone more tech-savvy with Antminers could tell us what that chip is used for and why it might have burned.

I am used to seeing heatsinks shorting hashboards or falling off, but I've never seen this chip get burned.
hero member
Activity: 544
Merit: 589
S9s are a tough one at current difficulty and price... most likely better off tossing the broken ones and buying used replacements. I think you can get them for under $50 at this point. But if you want to start repairing other miners practicing on S9s is a good way to start.

At a minimum, you'll need a multimeter, adjustable heat gun, soldering iron, solder wick, solder wire, solder paste, tinning stencil. A test jig is very useful.

Best place to start is probably just doing an internet search for miner repair videos.

Also, let us know what the repair places tell you. Interested to know what they charge.
newbie
Activity: 3
Merit: 0
Got ya, thanks! I’ve sent them an email too.

Could anyone help me make a minimum requirement list of the equipment needed for changing chips on boards? I would like to start trying it out since I have a ton of dead S9 boards. I would prefer equipment from Europe. Thank you!
hero member
Activity: 544
Merit: 589
Would it be smart to glue it with thermal adhesive CPU paste in the meantime until I can change it? Would that help in any way?

Heatsinks on these miners are not attached using adhesive. Instead, the heatsinks are soldered directly to the top of the chips with low-temperature melting point solder (138 deg C). Using thermal adhesive to attach the heat sink to the chip may work, but it might be extremely difficult or impossible to remove it once it's on.

Zeusbtc has listings for independent repair centers on their website you could try contacting (bottom of this page https://www.zeusbtc.com/Repair.asp).
newbie
Activity: 3
Merit: 0
Hello, I just got a T17+ and it arrived with a damaged chip/sink and it won’t read the other chips. Any suggestions what I should do and if there is anyone that could help me change it?

Would it be smart to glue it with thermal adhesive CPU paste in the meantime until I can change it? Would that help in any way?

Thank you.
legendary
Activity: 2170
Merit: 6279
be constructive or S.T.F.U
Your post is so good, I spent all my left merit on it.  Grin

Started work on the 1st S17 from my host. 2 hashboards showing 0 chips. Turns out both of them have heatsinks that are misaligned enough to short to the adjacent row.

After moving this heatsink back into position, this board comes up and identifies all chips, but the test fixture is showing a couple of bad chips so it looks like I'll need to replace a few.

Make sure you measure the voltage/resistance of the potentially bad chips before replacing them. I have a reference for the voltage/resistance ranges, but only for the S9, so you will have to ask ZeusBtc for the reference. You do need one, so if they don't have it, take the values from another working board. Keep in mind that different regions of chips have different normal ranges, so it is hard to tell which is which unless you can compare against something else.
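
A rough sketch of that comparison, assuming you have written down one reading per chip position on both the suspect board and a known-good board. The 5% tolerance and all the numbers below are placeholders chosen for illustration, not Bitmain or ZeusBtc reference values. Comparing position by position takes care of regions that have different normal ranges.

```python
def out_of_range(measured: dict, reference: dict, tolerance: float = 0.05) -> list:
    """Return chip positions whose reading differs from the same position on a
    known-good board by more than `tolerance` (a fraction of the reference)."""
    suspects = []
    for chip, ref in reference.items():
        value = measured.get(chip)
        # A missing reading is treated as suspect as well.
        if value is None or abs(value - ref) > tolerance * abs(ref):
            suspects.append(chip)
    return suspects


# Hypothetical readings in volts, keyed by chip position.
reference = {1: 0.40, 2: 0.40, 3: 0.80}
measured = {1: 0.41, 2: 0.10, 3: 0.79}
print(out_of_range(measured, reference))  # prints [2]
```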

Quote
Also, just another thing I noticed with both these hashboards was that there were insect wings (looked like moth wings) blocking a few of the input side of the heatsinks. Could be what caused some chips to overheat enough for the heatsinks to slip.

Not sure about the wings, but the majority of the boards I have had from this 17 series gear were as clean as brand new, while some boards that were pretty dirty were hashing without an issue. These miners are badly built and that's all there is to it. Even if a heatsink were blocked by dirt, the board should be able to handle the max safety temperature, and the miner should shut down before temps get higher than what that paste/solder can handle. Sadly, that isn't the case.

Please keep us updated.
hero member
Activity: 544
Merit: 589
Started work on the 1st S17 from my host. 2 hashboards showing 0 chips. Turns out both of them have heatsinks that are misaligned enough to short to the adjacent row.



After moving this heatsink back into position, this board comes up and identifies all chips, but the test fixture is showing a couple of bad chips so it looks like I'll need to replace a few.

The 2nd board is still not coming up after moving heatsinks back into position. Looks like an issue with the PIC microcontroller on the board not enabling power, so I ordered a PIC in-circuit debugger (PICkit 4) to investigate that further. Hoping that Bitmain did not set the read protection on the PIC flash so I can read out the firmware to be able to program a replacement chip if necessary, but I'm thinking chances of that are close to 0. If I can't read it out, it looks like Zeusbtc sells a download of the PIC firmware for all of Bitmain's miners.

Also, just another thing I noticed with both these hashboards was that there were insect wings (looked like moth wings) blocking a few of the input side of the heatsinks. Could be what caused some chips to overheat enough for the heatsinks to slip. The solder holding the heatsinks on is low-temperature solder, so it will melt at under 150deg C.
hero member
Activity: 544
Merit: 589
Depends a lot on how much you can get hashboards for. Just took a quick look on ebay and the only S17 hashboards I see are $390 each and come from China. Better off trying to find a used working S17. Kind of surprised there aren't more dead or partially working T17/S17s on ebay with all the reports of QC issues. Maybe they are all waiting their turn at Bitmain repair centers... I sent my S17 pro to the California center in May and I still don't have it back, although I got a notification that a replacement is on its way a few weeks ago.
sr. member
Activity: 604
Merit: 416
Has anyone experienced what a Bitmain repair costs?
If I add the shipping, is it better to simply order 3 new hashboards?

I did, and I've posted it in multiple threads by now. My S17 Pro was RMA-ed twice, once within the warranty period and once outside of it. Hashboards were dying both times. It cost me $60 for the repair and around $300 for shipping from Serbia to the Netherlands.

Sometimes it's not just the hashboards, so it's hard to fix. But if you have a good offer for hashboards, I'd try buying them.
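
To put the numbers from this thread side by side, here is a quick back-of-the-envelope sketch. It uses the $60 repair fee and roughly $300 shipping above, and the $390 per-board eBay listing mentioned elsewhere in the thread. Shipping and import costs for the boards are not included because nobody quoted them, so treat the board total as a lower bound.

```python
REPAIR_FEE_USD = 60    # Bitmain repair fee quoted above
SHIPPING_USD = 300     # approximate shipping, Serbia to the Netherlands
HASHBOARD_USD = 390    # per-board eBay listing mentioned in this thread
BOARDS_NEEDED = 3

rma_total = REPAIR_FEE_USD + SHIPPING_USD
boards_total = HASHBOARD_USD * BOARDS_NEEDED

print(f"RMA: ${rma_total}, replacement boards: ${boards_total}")
# prints "RMA: $360, replacement boards: $1170"
```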
member
Activity: 61
Merit: 29
Has anyone experienced what a Bitmain repair costs?
If I add the shipping, is it better to simply order 3 new hashboards?