Hi Folks,
I got lucky fixing a corrupted control board and was able to bring it back to life using the serial console and U-Boot. I wanted to share my story to open a discussion on debricking the S9/T9. Also, if anyone wants to sell me a bricked S9, I'm interested lol
The symptoms: on power-up, the green lights came on and the Ethernet port showed connection (yellow) and data (green), with the data light blinking occasionally. The device really looked alive, but it didn't mine. It never ran its earsplitting single board test and never even requested an IP from my router, so I could not SSH in or access the web server. Neither the 5-second reset nor the IP Reporter reset helped.
I decided to check out the serial console. I disconnected the ribbon cables, pulled the control board out of the miner, and used a $2 USB-to-serial converter board to connect to the RX, TX, and GND headers on the board (adapter TX to board RX and vice versa). [NOTE: Use 3.3 V mode.] I powered the board up from the 6-pin PCIe plug of a spare PC power supply. I got text!
blah blah lots of miscellaneous messages blah blah hex characters
Copying Linux from NAND flash to RAM...
NAND read: device 0 offset 0x1100000, size 0x800000
8388608 bytes read: OK
NAND read: device 0 offset 0x1020000, size 0x20000
131072 bytes read: OK
## Booting kernel from Legacy Image at 02000000 ...
Image Name: Linux-3.14.0-xilinx-gb190cb0-dir
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 3779264 Bytes = 3.6 MiB
Load Address: 00008000
Entry Point: 00008000
Then came a complaint about an invalid kernel, and it dropped into the U-Boot console. Boo! So the thing wasn't even starting to boot, but since I had terminal access over the serial port, I was hopeful. It seems that in a borked upgrade (at least in this case) the kernel image is what goes bad. We will learn more over time, I'm sure.
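For anyone who wants to check a kernel image before copying it over: the log above shows a legacy U-Boot image ("Booting kernel from Legacy Image"), and U-Boot normally refuses such an image when its header magic or one of its two CRC32 checksums doesn't match. Below is a minimal Python sketch, assuming the standard 64-byte legacy header layout (big-endian fields, magic 0x27051956, header CRC computed with its own CRC field zeroed, data CRC over the payload that follows the header). check_uimage is a hypothetical helper, not a Bitmain or U-Boot tool. On the board itself, U-Boot's "iminfo 0x2000000" should do a similar check, if your build includes that command.

```python
import struct
import sys
import zlib

IH_MAGIC = 0x27051956    # magic number of a legacy U-Boot image
HEADER_FMT = ">7I4B32s"  # big-endian: 7 x uint32, 4 x uint8, 32-byte name
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 64 bytes


def check_uimage(blob: bytes) -> dict:
    """Parse a legacy uImage header and verify its magic and both CRC32s."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("file is smaller than a uImage header")
    (magic, hcrc, _timestamp, size, load, entry, dcrc,
     _os, _arch, _type, _comp, name) = struct.unpack_from(HEADER_FMT, blob)
    # The header CRC is computed over the 64-byte header with the
    # header-CRC field itself (bytes 4..7) set to zero.
    header = bytearray(blob[:HEADER_SIZE])
    header[4:8] = bytes(4)
    # The payload follows the header; anything past `size` bytes is padding.
    data = blob[HEADER_SIZE:HEADER_SIZE + size]
    return {
        "name": name.rstrip(b"\x00").decode("ascii", "replace"),
        "size": size,
        "load": hex(load),
        "entry": hex(entry),
        "magic_ok": magic == IH_MAGIC,
        "header_crc_ok": zlib.crc32(header) == hcrc,
        "data_complete": len(data) == size,
        "data_crc_ok": len(data) == size and zlib.crc32(data) == dcrc,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 check_uimage.py xilinx/uImage
    with open(sys.argv[1], "rb") as f:
        for key, value in check_uimage(f.read()).items():
            print(f"{key}: {value}")
```

Any False in the output means U-Boot would most likely reject that file as well, so it isn't worth copying to the card.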
Here's what I did to address it
1) Studied the shit out of all the different boot types that can be seen using the "printenv" command in U-Boot ("?" lists the available commands). Bitmain must have 8-10 boot scripts stored in the environment variables, each invoked with the run command: "run qspiboot", "run nandboot", etc. Lots of interesting reading, but they all do similar things: load stuff into RAM and execute it. Hey, I'm capable of doing that! Let me jump in the mix!
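If you capture the "printenv" output from the serial terminal into a text file, a small script can flatten a boot script so you can see what a given "run" would actually execute. This is a minimal Python sketch and only a static approximation: it assumes one name=value pair per line, substitutes ${name} references, follows nested "run" calls, and splits commands on ";". Real U-Boot expands variables at run time and also handles quoting, if/then and &&, all of which this ignores. The function names, script name, and the sample environment in the note below are made up for illustration; they are not Bitmain's actual scripts.

```python
import re
import sys

VAR_RE = re.compile(r"\$\{(\w+)\}")  # matches ${name} references


def parse_printenv(text: str) -> dict:
    """Turn captured printenv output (name=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Skip blank lines, prompts, and summaries such as "Environment size: ...".
        if sep and name and " " not in name:
            env[name] = value
    return env


def expand(value: str, env: dict, depth: int = 0) -> str:
    """Substitute ${name} references, recursing into nested variables."""
    def repl(match):
        name = match.group(1)
        if name not in env or depth >= 10:
            return match.group(0)  # leave unknown (or too deeply nested) names as-is
        return expand(env[name], env, depth + 1)
    return VAR_RE.sub(repl, value)


def flatten_run(name: str, env: dict, depth: int = 0) -> list:
    """List the commands "run <name>" would execute, following nested runs."""
    commands = []
    for cmd in env.get(name, "").split(";"):
        cmd = cmd.strip()
        if not cmd:
            continue
        words = cmd.split()
        if words[0] == "run" and depth < 10:
            for script in words[1:]:
                commands.extend(flatten_run(script, env, depth + 1))
        else:
            commands.append(expand(cmd, env))
    return commands


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python3 flatten_env.py printenv.txt nandboot
    with open(sys.argv[1]) as f:
        environment = parse_printenv(f.read())
    for command in flatten_run(sys.argv[2], environment):
        print(command)
```

For example, with a made-up environment containing kernel_image=uImage, loadaddr=0x2000000 and sdboot=fatload mmc 0 ${loadaddr} ${kernel_image}; bootm ${loadaddr}, flatten_run("sdboot", env) returns ["fatload mmc 0 0x2000000 uImage", "bootm 0x2000000"].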
2) Formatted an SD card as FAT32 and unpacked the "xilinx" folder from a valid firmware image onto it. This gives me BOOT.bin, devicetree.dtb, rootfs.jffs2, uImage, and upgrade-marker.bin: everything needed for a zero-state boot except uramdisk.image.gz.
SO IF ANYONE HAS EXTRACTED THIS, PLEASE LET ME KNOW. I'd like to make an unbricking toolkit.
NOTE: for an S9 without an SD card slot, you're going to have to TFTP your files. I don't think this is a big issue though: just download a TFTP server and drop the files in its root dir. It should work the same.
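To show how little a TFTP server has to do, here is a minimal, stdlib-only Python sketch of the read side of the protocol (RFC 1350): it answers one read request at a time, sends the file in 512-byte DATA blocks from a fresh port, waits for each ACK, and ends with a short block. It is an illustration under stated assumptions, not a replacement for a maintained server such as tftpd-hpa: it ignores the transfer mode and any TFTP options the client sends (a client that asked for a larger block size should fall back to 512 bytes), and it has no write support. The script and function names are hypothetical. On the U-Boot side, assuming the build has networking enabled, you would set ipaddr and serverip with setenv and then run something like "tftpboot 0x2000000 uImage".

```python
import os
import socket
import struct
import sys

BLOCK_SIZE = 512  # RFC 1350 data block size
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5


def serve_one(listener: socket.socket, root: str) -> None:
    """Answer one TFTP read request received on `listener`, sending raw bytes."""
    packet, client = listener.recvfrom(2048)
    if struct.unpack("!H", packet[:2])[0] != OP_RRQ:
        return
    # RRQ = opcode, filename, 0, mode, 0 (mode and any options are ignored here).
    filename = packet[2:].split(b"\x00")[0].decode("ascii", "replace")
    path = os.path.join(root, os.path.basename(filename))  # stay inside root
    # RFC 1350: the transfer itself runs from a new port (the server's TID).
    xfer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    xfer.settimeout(3)
    try:
        if not os.path.isfile(path):
            xfer.sendto(struct.pack("!HH", OP_ERROR, 1) + b"File not found\x00", client)
            return
        with open(path, "rb") as f:
            block = 1
            while True:
                chunk = f.read(BLOCK_SIZE)
                data = struct.pack("!HH", OP_DATA, block & 0xFFFF) + chunk
                expected_ack = struct.pack("!HH", OP_ACK, block & 0xFFFF)
                for _attempt in range(5):
                    xfer.sendto(data, client)
                    try:
                        ack, sender = xfer.recvfrom(512)
                    except socket.timeout:
                        continue  # no ACK yet: resend this block
                    if sender == client and ack[:4] == expected_ack:
                        break
                    # wrong or duplicate ACK: loop around and resend
                else:
                    return  # give up after five tries
                if len(chunk) < BLOCK_SIZE:
                    return  # a short (possibly empty) block ends the transfer
                block += 1
    finally:
        xfer.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 tiny_tftp.py /path/to/xilinx  (port 69 usually needs admin/root)
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("0.0.0.0", 69))
    while True:
        serve_one(server, sys.argv[1])
```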
3) Stumbled my way through the SD card commands (mmc rescan, mmc list, mmc part) and found out that the Windows format tool had made the FAT32 partition number 5, so that's where my "xilinx" files were. Why 5? No clue, but you need to know where they are so you can tell U-Boot where to load from.
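As for "Why 5?": on an MBR-partitioned card, numbers 1-4 are the four primary slots, and logical partitions inside an extended partition are numbered from 5. Linux numbers them this way, and the mmc part output suggests U-Boot does too. So one plausible explanation is that the Windows tool created an extended partition holding a single logical FAT32 drive. The minimal Python sketch below lists partitions with that numbering, so you can check a card (or an image of it) from a PC before going back to U-Boot. It assumes an MBR (not GPT) layout with 512-byte sectors; the script and function names are hypothetical.

```python
import struct
import sys

SECTOR = 512
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # MBR types that contain logical partitions


def read_table(disk, lba: int) -> list:
    """Return the four (type, start_lba, sectors) entries of the MBR/EBR at `lba`."""
    disk.seek(lba * SECTOR)
    sector = disk.read(SECTOR)
    if len(sector) < SECTOR or sector[510:512] != b"\x55\xaa":
        raise ValueError(f"no partition table signature at LBA {lba}")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        start, count = struct.unpack("<II", raw[8:16])  # little-endian LBA, length
        entries.append((raw[4], start, count))
    return entries


def list_partitions(disk) -> list:
    """Number partitions MBR-style: slots 1-4 are primary, logical ones start at 5."""
    parts = []
    extended_start = None
    for number, (ptype, start, count) in enumerate(read_table(disk, 0), start=1):
        if ptype == 0:
            continue  # empty slot
        parts.append((number, ptype, start, count))
        if ptype in EXTENDED_TYPES:
            extended_start = start
    number, ebr, seen = 5, extended_start, set()
    while ebr is not None and ebr not in seen:  # walk the chain of EBRs
        seen.add(ebr)
        logical, link = read_table(disk, ebr)[:2]
        if logical[0] != 0:
            # A logical partition's start is relative to its own EBR,
            parts.append((number, logical[0], ebr + logical[1], logical[2]))
            number += 1
        # while the link to the next EBR is relative to the extended partition.
        ebr = extended_start + link[1] if link[0] != 0 else None
    return parts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as disk:
        for number, ptype, start, count in list_partitions(disk):
            print(f"partition {number}: type 0x{ptype:02x}, start LBA {start}, {count} sectors")
```

On Linux, something like "sudo python3 list_parts.py /dev/sdX" (with the card's real device name) should print one line per partition; a FAT32 logical drive would show up as partition 5 with type 0x0b or 0x0c, plus the extended container in one of the primary slots.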
4) Hacked an mmc load of the kernel from SD partition 5 into the memory address U-Boot was expecting (0x2000000, matching the "Legacy Image at 02000000" line in the log). Note that I am using 0:5 to denote mmc device 0, partition 5. ${kernel_image} is an environment variable that holds the filename uImage, so you can simply type uImage in its place in the command below and it will work fine.
fatload mmc 0:5 0x2000000 ${kernel_image}
5) Booted the little bastard using bootp. Here I got lucky that the ramdisk and other parts of linux were not corrupt, just the kernel image. Worse corruptions may need more surgery, hence the need for a valid ramdisk image. But in my case, injecting a new kernel was all it needed.
6) Once it was up, connected via the web interface and flashed a new firmware image as fast as I could. Woo! The miner is back! After that, it was cured and the board went back into the miner.
All in all, very simple. But having a $2 USB-to-serial adapter on hand saved me the cost of a new control board, as well as the hassle and delay of shipping, waiting, and dealing with Bitmain lol. Well worth its weight in gold this evening, for sure. You can find them on Amazon or eBay; mine was called "FT232 Mini USB to Serial". It's about two inches long, with mini USB on one side and 6 pins on the other for RX, TX, GND and other signals I never use.
Anyway - if anyone else has a bricked control board you want me to hack on, just PM me. I'd like to see some other examples of how boards brick, because developing a generic "unbricking tool" would be really nice to have.