Did Toomim Brothers already give a timeline when first, or the complete, results will be available?
We essentially completed our investigation of the damaged units a week ago, but now we're busy trying to repair them (and the PSUs that ASICSPACE included with them) and to install another 100 S5s, so we haven't had much time to comment here. We kept CryptoCoin2015 updated about the progress of our investigation as we performed it. Here is some information:
Electrical problems
About 40% of the hashboards are damaged and defective, and do not hash at all. My initial count was 89 damaged hashboards out of 200, but I think that was a slight overestimate because I made that count on a weekend and misinterpreted a pile that one of my employees had made. I'll have to make another count soon, but the total is probably closer to 70 damaged hashboards. There may be another half-dozen hashboards that still hash with some dead ASICs. We did not observe any otherwise dead hashboards working if we reduced the clockspeed.
About 30% of the damaged hashboards had visibly burned or popped SMT G337 2V capacitors. This means about 12% of all of the hashboards. Many had several capacitors damaged on a single board. We didn't count all of these; this is just an estimate based on a small random sample.
The resistance across the 12V and Gnd pins for a disconnected hashboard should be around 56 to 58 ohms. This translates to about 4 ohms per stage. (I think it's actually around 4.5 ohms per stage with a parallel resistance of ~330 ohms from 12V straight to Gnd, but that doesn't matter much for this.) All of the working hashboards were in this 56 to 58 ohm range. All of the broken hashboards had lower resistances than this. About 30% had resistances close to 53 ohms, suggesting that they had one bad stage. The rest had resistances below 50 ohms, and about half of them were 30 or below. This suggests several bad stages is typical for the failed hashboards.
We measured the voltage across each stage for a few different hashboards. Here is one such:
15-14: 0.990
14-13: 0.953
13-12: 0.055
12-11: 1.023
11-10: 0.999
10-09: 0.780
09-08: 0.360
08-07: 0.994
07-06: 0.948
06-05: 1.054
05-04: 0.477
04-03: 1.017
03-02: 0.995
02-01: 1.061
01-gn: 0.478
All of the working stages have nearly equal voltages. On this hashboard, those voltages are around 0.980 V +/- 0.070 V. On a good hashboard, they are around 0.800 V, +/- 0.070 V. The voltage abnormalities indicate failed stages. I don't have the resistance measurements recorded, but they were similar, with powered-on voltage per stage corresponding closely with powered-off measured resistance. Several stages were measured at 0.8 ohms, which is the same resistance I measure when I short the leads of the multimeter together. (Multimeters typically are not good at measuring very small resistances, due to the additional resistance of the test probes and leads.)
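To make the "abnormal stage" reading of the table concrete, here is a small sketch that flags stages whose voltage is far from the board's median stage voltage. The 0.1 V tolerance and the function name are assumptions of mine for illustration; they are not a threshold we used on the bench.

```python
from statistics import median

# Stage voltages (V) from the hashboard measured above.
stage_volts = {
    "15-14": 0.990, "14-13": 0.953, "13-12": 0.055, "12-11": 1.023,
    "11-10": 0.999, "10-09": 0.780, "09-08": 0.360, "08-07": 0.994,
    "07-06": 0.948, "06-05": 1.054, "05-04": 0.477, "04-03": 1.017,
    "03-02": 0.995, "02-01": 1.061, "01-gn": 0.478,
}


def abnormal_stages(volts, tolerance=0.1):
    """Return stage names whose voltage differs from the median
    stage voltage by more than `tolerance` volts (assumed cutoff)."""
    reference = median(volts.values())
    return [name for name, v in volts.items() if abs(v - reference) > tolerance]


print(abnormal_stages(stage_volts))
print(f"sum of stage voltages: {sum(stage_volts.values()):.3f} V")
```

With this cutoff, five stages are flagged (13-12, 10-09, 09-08, 05-04 and 01-gn). The stage voltages add up to roughly 12.2 V, and because the same supply voltage is shared among fewer healthy stages, the working stages sit near 1.0 V instead of the normal 0.8 V.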
All of the visibly damaged capacitors were on stages with abnormal stage voltage and stage resistance. Not all stages with abnormal voltage or resistance had visibly damaged capacitors. Removing and replacing damaged capacitors did not fix the abnormal voltage or resistance. (I did not test to see if they hashed after replacing the caps; I just assumed they didn't.) I also tested three severely damaged capacitors after removing them, and despite the crazy physical damage, they still showed the correct electrical characteristics (320 to 370 µF capacitance, DC resistance > 10 Mohm). This suggests that the functionally important damage was to a component other than the capacitors. The most likely candidate is the ASICs themselves.
The G337 2V capacitors are probably aluminum polymer capacitors. Those are typically rated for 105°C operating temperatures. This is lower than the 125°C silicon operating temperature for most mining ASICs and most other discrete power components. I think it's possible that a cooling failure could have caused the capacitors to explode before or at the same time as damage to the ASICs occurred, even though the capacitors were not themselves generating any heat.
We overnighted four bad hashboards to Bitmain Warranty in Denver, CO. I haven't heard back from them about those specifically, but my guess is that they discarded them as unrepairable.
Plastic shield deformations
Most of the S5s we received from ASICSPACE have some deformation of the plastic air guide shield panels. The severity varies widely. There seems to be a correlation between the deformation of the plastic shields and whether the hashboard is broken. There may also be a correlation between deformation and presence of damaged capacitors. I haven't made a close study of that, though. Mostly, all I know is that the shields we took off when investigating broken hashboards tend to be worse than the shields that we left on.
It appears that the worst point on the plastic shields is on the "tail" end (exhaust side) of the miners. The deformation is typically greatest (i.e. shortest radius of curvature) in between the screws. The top edge (where the PCIE connectors and control board are) is also heavily deformed on many machines. In all cases, the deformation causes the shields to bend away from the case of the Ant. In a few cases, the plastic has a wavy appearance in between the screw holes up to approximately 1 cm in from the edge, suggesting that the plastic there had been stretched or elongated and had caused the surface to take a non-linear path between the two fixed screw points. None of the S5s that have been in the Toom.im facility since January show any hints of deformation like this. The only machines like this that we've seen were in ASICSPACE during early April.
Most of the shields have no scuff marks or indications of physical trauma. The deformations are smooth, with no creases. I do not think these deformations were caused by contact with a solid object. Deformation due to air pressure would explain these deformations in every case I've looked at, as long as the plastic were soft enough. I'll describe how I think that happened after I mention some of what I've seen about the environment in which they were operating.
First-person observations
When I visited to pick up the S5s, ASICSPACE's cold aisle was negatively pressurized relative to the outside air. The hot aisle was positively pressurized relative both to outside and to the cold aisle. As a result, the cold aisle was also by no means "cold". Near the cold air supply ducts, the air felt like it was about 15°C. In most of the rest of the cold aisle, it felt like it was 35°C. When I first arrived, I walked past a gap in the cold aisle containment where an Antminer S4 had previously been. The velocity and volume of airflow through that hole was comparable to the airflow coming out of their cold air supply duct outlet, except a lot hotter. I think it felt like about 500 to 1000 CFM through that hole. As we removed S5s, and more holes appeared in their cold air containment, the velocity through each hole decreased substantially.
ASICSPACE had noticed this pressure difference, and as a way of mitigating it had set up 9 air ducts (approx. 0.5 m in diameter each) going from their garage door to their cold aisles to serve as supplementary air intake. Note that in proper cold aisle containment design, the cold aisle should be positively pressurized, so ducts like this would normally let air out (and thus normally would not be used). I also spoke with Robert about this, and he was (fortunately) aware of this problem. The ducts looked like they had been hastily added, likely within the previous week.
While I was there, their network was experiencing severe problems, causing a large proportion of their machines to stop hashing (and thus not produce heat). It was also not a very hot day, and it was evening when I arrived, making it about 11°C cooler than the daytime high the day before. I can only imagine what their facility was like during the weeks prior.
I did not see any significant exhaust fans installed at ASICSPACE to remove hot air. There were some small fans mounted above the hot aisle, but they were unducted, small, and not very numerous, so I guess they were not significant. There may or may not have been some exhaust fans on the roof of ASICSPACE. From what I've heard through other channels, they chose not to install exhaust fans, and were instead relying on the stack effect to move heat up through their tall (30m?) building and exhaust through natural convection at the top. Unfortunately, the stack effect relies on the interior of the building being hotter than the outside temperature, so when the outside temperatures increase to 30°C, their interior temps would have to rise too. Since they're pre-cooling with evaporative coolers, this means that the air inside their building would have to recirculate several times until it was enough hotter than the outside air for the stack effect to blow it out. However, if they have 300,000 to 800,000 cfm of exhaust fans that I didn't see, this paragraph is irrelevant. (Edit 5/6/2015: We visited ASICSPACE again today, and Robert showed me their exhaust system. They do not have exhaust fans, and are relying on the stack effect plus (in principle) positive pressure from their intake fans.)
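To give a feel for how weak the stack effect is as a driving force, here is a minimal sketch of the textbook buoyancy estimate: the pressure difference is the difference in air density between outside and inside, times g, times the building height, using the ideal gas law for density. The 30 m height is my guess from above, and the 40°C interior temperature is a purely hypothetical value chosen for illustration, not a measurement.

```python
G = 9.80665            # gravitational acceleration, m/s^2
M_AIR = 0.0289644      # molar mass of dry air, kg/mol
R_GAS = 8.314462618    # universal gas constant, J/(mol*K)
P_ATM = 101325.0       # assumed sea-level atmospheric pressure, Pa


def stack_pressure_pa(height_m, t_out_k, t_in_k):
    """Buoyancy-driven pressure difference (Pa) over a column of
    height `height_m`, from ideal-gas densities at the outside and
    inside temperatures (in kelvin). Positive means upward draft."""
    return P_ATM * height_m * (G * M_AIR / R_GAS) * (1.0 / t_out_k - 1.0 / t_in_k)


# Hypothetical case: 30 m tall building, 30 C outside, 40 C inside.
dp = stack_pressure_pa(30.0, 273.15 + 30.0, 273.15 + 40.0)
print(f"stack pressure: {dp:.1f} Pa")
```

Under these assumptions the available draft is on the order of 10 Pa, and it drops to zero when the interior is no hotter than the outside air. That is the core of the concern in the paragraph above.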
The containment sealing system (which looked pretty tight, to be honest -- kudos to ASICSPACE for that) sealed off the exhaust end of each miner, with a gap between the case and the edge of the sealing panel on the order of 1 or 2 cm.
The S5s were installed next to pairs of S4s along both sides of the cold aisle on each of the shelves. This indicates that the S4s and S5s were competing with each other for airflow. The S4 is a sealed tunnel with 4 fans arranged in a push-pull configuration -- i.e., 2 in parallel by 2 in series. Placing fans in series multiplies the amount of static pressure they can produce, and also allows them to maintain their airflow quite well when working against a significant positive pressure gradient, but does very little for the airflow when working in a neutral-pressure regime. This means that the S4 fans were able to pressurize the hot aisle quite effectively. The S5, on the other hand, is an open semi-tunnel configuration with a single fan in push configuration. The semi-tunnel has large gaps near the exhaust end through which air can escape out the top, as well as small gaps on the bottom.
The power supplies which ASICSPACE had obtained for these S5s were the DPS-800GB using the Gigampz breakout boards. These boards are miswired to connect pin 30 (voltage adjust pin) to the 12V rail, which increases their output voltage to around 12.80 V (no load). Power consumption for the S5s' BM1384 (as with most ASICs) is proportional to frequency times voltage squared, so the roughly 6.7% higher voltage should result in 13.8% more power consumed and heat generated.
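The arithmetic behind that power figure can be sketched as follows. This assumes the simple scaling of power with frequency times voltage squared stated above, a nominal 12.00 V supply, and an unchanged clock frequency; the function name is just for illustration.

```python
def relative_power(v_actual, v_nominal=12.0, freq_ratio=1.0):
    """Power relative to nominal, assuming P is proportional to
    frequency times voltage squared."""
    return freq_ratio * (v_actual / v_nominal) ** 2


ratio = relative_power(12.80)
print(f"voltage increase: {(12.80 / 12.0 - 1) * 100:.1f}%")
print(f"power increase:   {(ratio - 1) * 100:.1f}%")
```

Note that 12.80 V is the no-load figure, so the actual increase under load would likely be somewhat smaller.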
What I think happened
As the single S5 fans were unable to compete with the doubled S4 fans in terms of pressure output, the airflow from the S5 fans would instead curve out through the gaps in the S5 pseudo-tunnel. Meanwhile, the positive pressure from the hot aisle may have been strong enough to cause airflow to go in reverse from the hot aisle into the "exhaust" port of the S5, and then out the top of the miner. This retrograde flow was likely strongest in the side areas, in between the hashboards and the plastic shields. These two effects caused the interior and side spaces of the S5s to get very hot and positively pressurized. The heat caused the plastic sides to become soft and "plastic" (in the non-elastic sense of the word). The positive pressure then stretched and deformed the heat shields away from the mounting screws. The deformation caused a small gap to appear at the tail end of the shields in between the two screws. This gap allowed for a very short path -- approximately 3 cm -- for the air to go from the hot aisle to the cold aisle, by entering the tail of the miner through the side slots, making a roughly 150° turn to curve around the steel frame (and partially bouncing off the plastic shield), and then exiting backwards/sideways through the tail/side gaps in between the screws. This air was the closest to the hot aisle, and thus the hottest and highest pressure, and consequently caused the greatest deformation, enlarging these gaps considerably over time. Additional airflow that came in through the side slots would have passed out the top of the miner. Though this airflow was greater, the gaps on the top were much larger, causing lower pressure differences, which made the deformation in that area less severe (i.e. a larger radius of curvature).
The intersection of the anterograde and retrograde airflows near the center of the miner caused airflow there to be relatively stagnant. This, combined with the higher temperature of the retrograde flow, would have caused very high temperatures about 3 to 10 cm away from the miner's exhaust port.
The Antminer S5 has a bug (linked elsewhere on this page) in which the miner's fans will stop immediately when cgminer dies (or when the network is disconnected), but the hashboards will continue to generate heat for several minutes. Given that ASICSPACE was having frequent and persistent networking problems during this time, I expect that to be a contributing factor in this case. However, I think that is at most a contributing factor, since many people have reported that bug in the absence of heat damage, and I haven't seen any other reports attributing actual heat damage to that bug.
Based on what I saw when I visited and what I know had been changed recently at ASICSPACE, I estimate cold aisle temps were likely 20°C hotter during early/mid April at ASICSPACE, and likely reached about 55°C. I have also heard people claim to have seen 57°C intake temps on their miners at ASICSPACE at one point (I think that was March), although I can't mention the source for that right now so you should treat it as a rumor. I have also seen screenshot evidence from a KNCMiner device indicating temperatures in the same ballpark. With mostly S4s and a few SP35s, plus with the fans working against positive pressure and consequently having reduced forward airflow, their delta-T might have been around 10°C, or possibly a little higher. I thus estimate their hot aisle temps at around 65°C. The air passing from the hot aisle through the S5 heatsinks would have gotten even hotter before hitting the side panels; perhaps 15°C hotter. Many plastics have glass transition temperatures around 80 to 120°C. ABS, for example, has a glass transition temperature of about 105°C. I thus think it is plausible that the plastic side panels were heated close enough to their glass transition temperature that air pressure differences caused them to permanently and plastically deform. Having a 55°C cold aisle with strong positive pressure in the hot aisle would also explain why the ASICs and capacitors in the S5s would have burned out in such large numbers, especially if the 80°C protection was bypassed either due to a Bitmain bug or due to an attempt to get the S5s to hash despite the poor working conditions.
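The chain of estimates above can be laid out step by step. Every input here is one of the rough guesses from this post (felt temperatures, not measurements), and the variable names are only for illustration. I also don't know that the shields are ABS; it is used only as a reference plastic.

```python
# All inputs are rough estimates from this post, in degrees C.
felt_cold_aisle = 35        # what most of the cold aisle felt like on my visit
earlier_excess = 20         # estimated extra heat during early/mid April
miner_delta_t = 10          # estimated rise across the S4s / SP35s
retrograde_rise = 15        # estimated extra rise through the S5 heatsinks
abs_glass_transition = 105  # ABS, used only as a reference plastic

cold_aisle = felt_cold_aisle + earlier_excess   # estimated April cold aisle
hot_aisle = cold_aisle + miner_delta_t          # estimated hot aisle
at_shields = hot_aisle + retrograde_rise        # air reaching the side panels

print(f"cold aisle: {cold_aisle} C")
print(f"hot aisle:  {hot_aisle} C")
print(f"at shields: {at_shields} C")
print(f"margin below ABS glass transition: {abs_glass_transition - at_shields} C")
```

This puts the air at the shields around 80°C, at the low end of the 80 to 120°C range quoted above and about 25°C below the ABS figure. That is why I only call heat softening plausible and not certain.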
Note: As the head of Toomim Bros, I am a competitor of ASICSPACE, and clearly have a conflict of interest in this case. Apply salt liberally before hashing.