
Topic: Hacking miners, general info on troubleshooting. (Read 1970 times)

legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
December 09, 2019, 07:46:52 PM
#10
My, it has been a while.....

Things here have been busy. I've been fixing some miners from time to time, mostly Antminers. They are a pain to work on sometimes; most issues are either:

  • extremely simple (faulty fans, FETs shorting out)
  • moderately simple (burned power connectors)
  • annoying (blown chips, heatsinks falling off and burning chips)
  • odd (control chips failing on boards)

For a while I was using an S9 board and logging into it to check status, but I finally broke down and bought one of these:



An Antminer tester. It has a USB cable that can hook up to a PC over a PuTTY serial session (which is nice), but to be honest 99% of the time the display will tell you what's up.



An Antminer S9 board under test; the light means the SPI bus is communicating. It's also nice that you can do a quick test outside of the assembly and not have to wait forever for the board/system to initialize.



Board connected and ready to test.



Board under test

And of course one can cross-check it with the Power Meter to make sure it's pulling current.



Overall a pretty handy tool to have. I've got an S17 coming in for repair later this week; I think I'll start a separate thread with some pictures and documentation, as there doesn't seem to be a lot of data out there on them....
So.... Antminers...

As the warranties expire on these things and they still break, I figured it was time to do some analysis. To that end I'll open up a thread this weekend on hacking them, along the lines of my other threads (information is free, skills are paid for). But some basic thoughts:

1) There is a reason the S9's fail.
2) S7's are still banging along and worthy of repair

The basic design of an Antminer is surprisingly simple: instead of powering each chip or die individually with a DC-DC, they put the dies in series strings and run them right off the 12v supply with a single boost-buck to control voltage. This is clever because the converter doesn't have to step the voltage down from 12 to .8 volts (requiring a high frequency shift on the +12 rail with a very sharp cut-off); instead it's more like 12v to 11v or so (which, being only about a 1v drop, is a far easier job for the converter than going all the way down to .8v or so). It does, however, mean that if a single die goes bad the whole string dies, or more likely a zipper effect happens that takes out all 15 or so chips in the string, but that's another issue.
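
To put rough numbers on the series-string idea, here is a minimal sketch. It assumes an ideal (lossless) buck stage, where duty cycle is simply output voltage over input voltage, and it assumes the string voltage divides evenly across the dies. The function names are made up for illustration; the 12v, 11v, .8v and 15-die figures are the ones quoted above.

```python
def ideal_buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Duty cycle of an ideal (lossless) buck converter: D = V_out / V_in."""
    if not 0 < v_out <= v_in:
        raise ValueError("a buck converter needs 0 < v_out <= v_in")
    return v_out / v_in


def per_die_voltage(string_voltage: float, dies_in_series: int) -> float:
    """Voltage across each die, ASSUMING the string divides it evenly."""
    if dies_in_series <= 0:
        raise ValueError("need at least one die in the string")
    return string_voltage / dies_in_series


if __name__ == "__main__":
    # One converter per chip: 12v straight down to .8v means a very narrow duty cycle.
    print(f"per-chip duty cycle:  {ideal_buck_duty_cycle(12.0, 0.8):.3f}")
    # One converter per string: 12v down to about 11v is a gentle trim.
    print(f"per-string duty cycle: {ideal_buck_duty_cycle(12.0, 11.0):.3f}")
    # Roughly what each of ~15 dies sees if the string shares 11v evenly.
    print(f"per-die voltage:       {per_die_voltage(11.0, 15):.2f} V")
```

The narrow duty cycle in the per-chip case is the "very sharp cut-off" problem; the string approach avoids it at the cost of one bad die taking the rest of the string down with it.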

Trimming the power voltage can be done either by a resistive stepper (S7) or via a PIC changing frequencies (S9). Neither is that complicated.

Clocking is provided by the usual 25 MHz clock crystal. So when an S7 or S9 fails, the key places to look are either the power system or the clocking system. Now to find out which one of these fails in the cold. Time to find the fridge!
Reserved for some more before and after pictures.
Reserved for other miners (Hashfast, BW, BFL, etc) as needed.

I do know that the BW series of miners can burn all of the PCIe plugs if the power supply goes bad. Replacing them is something I will be doing next weekend. If this happens to you let me know, but always use a good power supply.

Good example of a BW miner with six burned plugs that came in for service:



All fixed and on the way out!

Antminers are interesting little devices, especially the R4's. I know people have had issues with boards not starting; here is one fix that works. Heat the incoming air to the miner with a hair dryer to warm the boards up, then apply power. Do this a few times in a row. The problem is that the chips develop micro-fractures in the solder that close when warmed up. Once it's running, never shut it down, of course.

A permanent fix could be done by reflowing the boards. I'd be willing to give it a try, but don't do it in your toaster....

KNC Titan and Neptune miners can be pretty reliable units, but as shipped from the factory they have a few flaws you need to be aware of:

First, watch the power draw: running all four dies at 300 MHz for Titans or 600 MHz for Neptunes pushes the Molex PCIe connectors to their limits. If the connector gets overloaded it will warm up, then heat up, then pins will start to delaminate from the board or go high resistance from the heat. When this happens the remaining pins take more heat, until either the cube shuts down or the grounds lift, in which case all hell breaks loose: the ribbon cable becomes the ground, and the cube, controller, and several other cubes are destroyed. Burned plugs are fixable (use lots of pre-heat), but dead-shorted cubes are not.

A second problem is a cube that shuts down the power supply. This is normally caused by blown power FETs on an internal DC-DC power supply. Trying to force more current into the cube can cause components to burn, which, if the cube is full of dust bunnies, will cause a nice fire that gets blown out the front by the very handy fan. This can also be fixed, but cleaning a burned cube is a mess.

Another issue is a dead controller, where the green light on the side of the controller won't come up. Sometimes this is due to a cube shorting. Try powering everything off, disconnecting all cubes, and bringing up just the controller. If the light comes on, then the controller is good and one cube is bad. Turn everything off, plug in one cube, power up, and keep repeating until the bad cube is found. If it's the controller, they can be fixed.
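
The one-cube-at-a-time procedure above can be written down as a small loop. This is only a sketch: `controller_comes_up` is a hypothetical stand-in for "power off, connect this set of cubes, power on, and look at the green light", and it assumes cubes are added cumulatively, so the last one added when the light fails is the suspect.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_cube(
    cubes: Sequence[str],
    controller_comes_up: Callable[[List[str]], bool],
) -> Optional[str]:
    """Return the first cube that stops the controller light, or None.

    controller_comes_up(connected) is a hypothetical manual check: power
    everything off, connect exactly the listed cubes, power up, and report
    whether the green light comes on.
    """
    connected: List[str] = []
    # Step 1: controller alone. If this fails, the controller is the problem.
    if not controller_comes_up(list(connected)):
        raise RuntimeError("controller fails with no cubes attached")
    # Step 2: add one cube per power cycle until the light stays off.
    for cube in cubes:
        connected.append(cube)
        if not controller_comes_up(list(connected)):
            return cube
    return None


if __name__ == "__main__":
    # Simulated bench: pretend cube3 is the shorted one.
    light_ok = lambda connected: "cube3" not in connected
    print(find_bad_cube(["cube1", "cube2", "cube3", "cube4"], light_ok))
```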

Another issue is a dead controller where the lights on the Pi don't come up. This is caused by a Raspberry Pi shorting. You have to either replace it with another older 1.2B RPi (newer 1.2 B Pis don't work with the Titan code) and fix the bridgeboard, or use a Neptune BeagleBone Black with the Lightfoot code (for a Titan) or 1.06 code (for a Neptune).

Finally, follow general best practices: keep the die temperatures below 45c for best results. Never plug or unplug any connector with power on (this can damage the drivers, fixable but a pain). Keep DC-DC temperatures below 80c or so; going much above 90 invites a FET cut-through and short. And every once in a while put a finger on the PCIe plug; if it's warm then something is wrong. Warm plugs over time become hot plugs which melt and make a mess....
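
If you log temperatures, the limits above are easy to turn into a quick sanity check. A minimal sketch follows; the function and constant names are made up for illustration, the 45c / 80c / 90c numbers are the ones quoted above, and how you actually read the temperatures off your miner is left out entirely.

```python
from typing import List

DIE_TARGET_C = 45.0   # keep die temperatures below this for best results
DCDC_TARGET_C = 80.0  # keep DC-DC temperatures below this "or so"
DCDC_DANGER_C = 90.0  # much above this invites a FET short


def temperature_warnings(die_c: float, dcdc_c: float) -> List[str]:
    """Return human-readable warnings for one die / DC-DC temperature pair."""
    warnings: List[str] = []
    if die_c >= DIE_TARGET_C:
        warnings.append(f"die at {die_c:.0f}c, target is below {DIE_TARGET_C:.0f}c")
    if dcdc_c > DCDC_DANGER_C:
        warnings.append(f"DC-DC at {dcdc_c:.0f}c, FET short risk above {DCDC_DANGER_C:.0f}c")
    elif dcdc_c >= DCDC_TARGET_C:
        warnings.append(f"DC-DC at {dcdc_c:.0f}c, target is below {DCDC_TARGET_C:.0f}c")
    return warnings


if __name__ == "__main__":
    for line in temperature_warnings(die_c=52.0, dcdc_c=93.0):
        print(line)
```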

Avalon has two types of miners still in use these days, the A6 series and the 721/741 series. Here are some thoughts:

  • If the miner is running slow or not hashing at full speed, check the voltage at the power supplies. The 12 volt rail voltage is displayed in the interface; it should be around 12 volts. If it's 11.5, the miner will slow itself down to keep from burning the plugs. If it's 10.5 the miner will be a lot slower. Replace the supply or use two supplies (one per side).
  • With the miner disconnected, a flashing green light on the miner means idle, a flashing blue light means mining, and a flashing red light means a problem. No light is equally weird and could indicate a problem with the on-board MCU.
  • On the A6 miners at least, the 4 pin plug to the serial board needs to be in the left socket as viewed from the rear.
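
As a rough guide to reading the 12 volt rail figure from the interface, here is a minimal sketch. The 11.5 and 10.5 points come from the notes above; treating everything above 11.5 as fine is my own simplifying assumption, and the function name is made up for illustration.

```python
def classify_12v_rail(volts: float) -> str:
    """Rough health label for the 12 volt rail reading shown in the interface."""
    if volts <= 10.5:
        # Miner will be a lot slower: replace the supply or use one per side.
        return "very slow"
    if volts <= 11.5:
        # Miner slows itself down to keep from burning the plugs.
        return "throttling"
    # ASSUMPTION: anything above 11.5 is treated as healthy here.
    return "ok"


if __name__ == "__main__":
    for reading in (12.1, 11.5, 10.4):
        print(reading, classify_12v_rail(reading))
```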

Another item to check is fan direction: the fan should always blow air *out* of the miner (the air exits through the fan). This is because pulling a fluid with a pump is easier than trying to *push* the fluid. In this case the fluid is air, and trying to push it just creates pockets of turbulence that reduce cooling ability. So always make sure the fan is set to pull air through the miner and out the fan front.
General information:

This is information that applies to pretty much any type of miner.

  • Keep the die temperatures on the hashing engines below 45c for best results. As dies get hotter they draw a lot more current as they hash faster; it's not a 1-1 relationship. Likewise, as they get hotter the chances a die will short increase exponentially. So run it a bit slower, and life will be longer. Run super fast and be sure to budget in repairs.
  • Never plug or unplug any connector with power on. This includes control cables as well as power cables. The reason is mining companies design products for minimal manufacturing costs (cheap), and sometimes things like drivers and static sensitive buffers are left out of the design. The voltage and signal spikes caused by unplugging or plugging can blow drivers, or FPGA controllers being used to control miners.

    Likewise unplugging or plugging a PCIe power connector can cause a spark at the points, putting a small burn into the plug. Over time that can cause increased resistance, which leads to the plug warming up, then heating up, then melting with the usual fireworks.
  • Try to keep DC-DC temperatures below 80c or so. Going much above 90c on a DC-DC will cause the current loss in the supply to go way up, which means heat, which means greater chances of a FET shorting, which results in the usual fireworks.
  • If you're using a really big power supply to power a bunch of miners, make sure to wire in a good fuse on each miner power supply line. The reason for this is as follows: A 4,000 watt power supply can put out 333.33+ amps of current into a direct short before the supply will crowbar (shut down). If you're feeding miners using 16 gauge wire (30 amps max) and the miner shorts internally you will have the force of a pair of clothes dryers focused in an area about an inch or so square. This *will* cause wires to melt, insulation to catch fire, and miners to burst into flames.
  • Make sure the power supply you buy is big enough and good enough quality to handle the miners you run on it. Don't spend thousands on a miner, then try to cheap out with a $30 power supply. A cheap power supply will cause all sorts of weird problems, from random crashing to melted connectors (due to the wiring being too small), to burned connectors, to a miner running at lower speeds than expected. Check the voltage of the supply at the miner while it is running; if the voltage is below 12v (11.5 is a bit low, 10 would be seriously bad), upgrade the supply. Everyone has their favorites; I like the Corsairs without the modular plugs.
  • Check to see if your power supply has multiple rails. If so don't plug two different rails into a single side/board/blade of a miner. Doing so can cause odd problems. Normally all PCIe plugs on a single card are connected together inside the board; you want to ensure that each board has its own rail powering it.
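
The arithmetic behind the fuse advice in the list above is worth spelling out. This is a minimal sketch using the figures quoted there (4,000 watts on a 12 volt rail, wire good for 30 amps max); the function names are made up for illustration, and it ignores whatever current limiting a particular supply actually has.

```python
def fault_current_amps(supply_watts: float, rail_volts: float = 12.0) -> float:
    """Current a supply can push into a short at full rated power: I = P / V."""
    if rail_volts <= 0:
        raise ValueError("rail voltage must be positive")
    return supply_watts / rail_volts


def overload_factor(supply_watts: float, wire_rating_amps: float,
                    rail_volts: float = 12.0) -> float:
    """How many times over its rating the wire is pushed during a dead short."""
    if wire_rating_amps <= 0:
        raise ValueError("wire rating must be positive")
    return fault_current_amps(supply_watts, rail_volts) / wire_rating_amps


if __name__ == "__main__":
    amps = fault_current_amps(4000.0)       # 4,000 W on a 12 V rail
    factor = overload_factor(4000.0, 30.0)  # wire rated for 30 A max
    print(f"fault current: {amps:.2f} A, about {factor:.1f}x the wire rating")
```

A fuse sized near the miner's normal draw opens long before the wire sees anything like that, which is the whole point of putting one on each miner line.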

That's it for now, will add more later.
So far I have information completed on:

General problems and things I can fix.
Avalon series miners
Antminer R4 systems
KNC Neptunes and Titans
BW series miners.

If you want something else, let me know.
Ok, after answering the same question a bunch of times about Avalon things, NotFuzzyWarm pointed out something:

There really isn't a good thread for general information about miners.

Common information like why do they run, why do they shoot flames out the front from time to time, and of course why did the power plugs melt on them. Or how about tips on making them run well, what to do with heat (send it out a window....), and common simple troubleshooting techniques that can get people mining again?

Since I fix these things and figure stuff out the old-fashioned way (probes and a general understanding), I thought I would use this thread to post some of my findings and let others contribute to general knowledge. It will take me a few days to get stuff written down, so be patient and check back from time to time. If it gets good enough maybe it can be locked to the top page.

Anyway, on with the show. I'll update this thread regularly with new information as I learn it, hopefully this will help people.