Topic: Hacking KNC Titan / Jupiter / Neptune miners back to life. Why not? - page 17. (Read 76765 times)

legendary
Activity: 2450
Merit: 1002
Cool beans =) was just curious.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Is this the stuff from what's-his-face that was selling the firmware for 10 BTC? Or did you dev your own?
Dear God, no. This is me pounding my own head against the KNC wall. Besides, I would rather blow up BeagleBones on controllers while trying to figure out some of these failures than the now-limited supply of Raspberry Pis.

Note: Everything I do here is my own work or is based on publicly accessible documents. I have no encumbrances or NDAs in this venture and I would prefer to keep it that way.

C
legendary
Activity: 2450
Merit: 1002
Is this the stuff from what's-his-face that was selling the firmware for 10 BTC? Or did you dev your own?
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Update: I have the code working 100% for a Neptune on a BeagleBone (yeah, big deal, but it's a bitch to compile and get working); however, Titan is still down. The problem is somewhere in the titan.c files in BFGMiner. In the meantime, know that the newer version of the FPGA code on the RPis can work with Neptune boards, so it's an enhancement to the Neptune code and not a replacement.

BBBs compile so much faster than RPis. I hate RPis.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Someone here made a replacement, but the Gen 2 versions have a few issues. You can fix the Gen 1 versions by replacing the Pi and powering it from USB power if you really want. Or send it to me to fix.

C
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Yeah, I know. And I got two of them working on all dies (OK, one die out of 8 is a bit wonky, but that's fucking amazing!). The trick is that if it has only one die hashing, it's possible the other dies have cold-soldered themselves to the point where they don't work. It's possible to reflow, but be warned: IT IS VERY EASY TO FUCK IT UP AND WIND UP WITH A DEAD UNIT. Also, the line drivers were damaged on one board.

Speaking of which, I finally figured out what line drivers/buffers these KNC people used. It's this:

https://www.digikey.com/product-detail/en/fairchild-semiconductor/NC7WZ17P6X/NC7WZ17P6XCT-ND/

They are super small and super critical. If you plug in your 10-pin header so that it goes over pins 1, 3, 5, 7, 9 with 2, 4, 6, 8, 10, then you will probably blow up a few of these; your cube will appear but will never hash. It will also blow out the FPGA on your controller, so a twofer failure.

So I have now expanded what I can fix:

If your Titan has one die, I can try to get it to 3-4 dies.
If your Titan comes up with no hashing but can be seen in KNCminer, I can probably fix it.
If your Titan has a burned power connector, I can fix it.
If your Titan totally burned the board under the power connector then, sigh, I can fix it.
If your controller is dead, I can fix it.
If your bridgeboard is dead, I can fix it, but you have to replace your RPi too.
If your RPi is dead, just get a new one and power it via the USB port. Lazy, but it works.

And as for the crystals, it's probably one of these:

https://www.digikey.com/product-detail/en/txc-corporation/7B-24.576MBBK-T/887-1898-1-ND/

But I need to measure it with the digital calipers. Note the cross-grounds, though: that's part of the problem with the QB BB.

C
copper member
Activity: 2898
Merit: 1465
Clueless!
ACK! Those are my cubes. ACK!

(It is like public surgery on the walking dead TV show....)

ACK! The horror!

legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Meantime back to work. Some things I have been working on:

The qberty boards. Something is fucked on them, and I'm looking into it.

Got a Titan that was hashing on one die; the problem is that another die is mumbling garbage that shuts down the rest. Working to isolate.

Got another Titan with the normal 4-6 short. However, I did find something: pins 4 and 6 both supply power to SCI-type channels. Normally they are isolated, but on many cubes they share a common rail at the junction between U9 and U10. Disconnecting this clears the fault on pin 4 but not on pin 6. However, this does clear up a lot:

Pin 4 powers the SCL lines for the DC-DCs, the EEPROM, the LM75, and one of the level shifters. So it's not that.

Pin 6 powers U17, U18, and U19 (the clock circuitry) and the 8 lines under each die, which can't be isolated, and that sucks.

Pin 7 is an SDI clock channel for the DC-DCs and house stuff.
Pin 2 is the SDA channel for the DC-DCs and house stuff.

Pin 8 is another interesting one: it goes to each of the corners of the chip, with isolation. I think that's either the SDI clock or the signal for the chips themselves. This is how you can bypass a mumbler.

So if pin 6 is what is blowing up, then it is either the clocks or the chip die (in which case we're fucked). Hm.
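
For anyone who wants the pin notes above in a form they can grep or extend, here is a minimal Python sketch. The table only transcribes what is written in this post; the names HEADER_PINS, describe, and suspects_for_short are made up for illustration, and pins that are not mentioned above are deliberately left out rather than guessed.

```python
# Header pin notes transcribed from the post above.
# Nothing here is measured independently; pins not mentioned are omitted.
HEADER_PINS = {
    2: "SDA channel for the DC-DCs and house stuff",
    4: "power for the SCL lines: DC-DCs, EEPROM, LM75, one level shifter",
    6: "power for U17/U18/U19 (clock circuitry) and the 8 lines under each die",
    7: "SDI clock channel for the DC-DCs and house stuff",
    8: "runs to each corner of the chip, with isolation (SDI clock or signal, unconfirmed)",
}


def describe(pin):
    """Return the note for a header pin, or a placeholder if it was not traced."""
    return HEADER_PINS.get(pin, "not traced in this thread")


def suspects_for_short(pins):
    """Map each pin involved in a short to what it feeds, in the order given."""
    return {pin: describe(pin) for pin in pins}


if __name__ == "__main__":
    # The "normal 4-6 short" described above.
    for pin, note in suspects_for_short([4, 6]).items():
        print(f"pin {pin}: {note}")
```

Running it for the 4-6 short just lists what hangs off each pin, which is the same narrowing-down done by hand above: pin 4's loads were ruled out, leaving the clocks or the dies on pin 6.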
hero member
Activity: 808
Merit: 502
I like to run two cubes per PSU; it is more energy efficient. But the short issue is a concern. My units are on a fireproof rack. Even if they smoke, there is nothing else to burn. I also monitor the room temperature and have a video monitor as well.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
I use and like the Corsairs. 750 W is the biggest I will use; with anything bigger on a set of systems, IMO a short can cause a fireball before a shutdown. For normal units I'll go with a Corsair 500 for each system.

The only Corsair failure I have had was when the shed door blew open in a hurricane and the PSU was heavily rained on. The chilli attached to it was also dead, but cleaning it up got it running again.
hero member
Activity: 808
Merit: 502
Oh man, am I glad to hear you say Corsairs are working well for you. That is what I switched to. I think they will last longer if you don't run them at full capacity. I try to run a 1000 watt unit at no more than 600 watts, which will hopefully allow it to last longer. I wonder if there is some sort of heavy-duty industrial-grade power supply that would work better. I suspect the components that cause failure most often are capacitors that go bad.
sr. member
Activity: 342
Merit: 250
yeah, most of my Seasonics (1000 W Platinum) failed within 2 weeks of each other, it was pretty hectic. And you're right, they usually fail after sitting for a while, when they get powered back up. I think all mine failed that way, sometimes even after an extended internet failure: they'll stop hashing, cool down, the internet comes back up, they start hashing again, and then they'll fail.

So far the 1000 W Corsair 80+ Gold units have been great for half the price, and if you look at the specs, 80+ Gold means about 91% efficient at 240 V, so not far behind the Platinums.
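
To put numbers on the two rules of thumb on this page (running a 1000 W unit at no more than 600 W, and roughly 91% efficiency at 240 V), here is a rough Python sketch. The function names are made up for illustration, and it assumes the quoted 91% holds at the load in question, which real supplies only approximate since efficiency varies with load.

```python
def load_fraction(dc_load_w, rated_w):
    """Fraction of the PSU's rated output that is actually in use."""
    return dc_load_w / rated_w


def wall_draw_w(dc_load_w, efficiency):
    """Approximate AC power pulled from the wall for a given DC load."""
    return dc_load_w / efficiency


if __name__ == "__main__":
    # 600 W of DC load on a 1000 W unit, using the ~91% figure quoted above.
    print(f"load: {load_fraction(600, 1000):.0%}")        # load: 60%
    print(f"wall draw: {wall_draw_w(600, 0.91):.0f} W")   # wall draw: 659 W
```

So under those assumptions a 600 W load costs about 659 W at the wall, with the difference ending up as heat inside the PSU.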
hero member
Activity: 808
Merit: 502
I can tell you EVGA is no good either; I had 4 die on the same day. I came back from vacation after they had been off for 5 days and they all smoked when I fired them up. It seems most of these ATX supplies have trouble going more than a year. They are really not designed to work full on 24/7. What is the best ATX supply? Any recommendations?
sr. member
Activity: 342
Merit: 250
good job, I hate those Thermaltake PSUs (Seasonics too)
hero member
Activity: 808
Merit: 502
I reapplied paste 3 times, twice with Gelid and then with another brand. It is not the paste. It is still not hashing right. The heat sink is down tight. It runs for a minute or so, then the power to the dies cuts off and drops to zero. If I run it at a really low frequency, the cube seems to hash okay. I don't see any indication of overheating. I am not sure what is wrong at this point.

Epilog:

Good news: I believe I have found the problem. I switched out the power supply and the cube started hashing at full power. The Thermaltake power supply could no longer supply enough power to the cube. Good thing I have some power supplies being shipped. This is good news, as the cube is still good!
This will most likely fix the other cube as well as they were on the same power supply at one time. Hurray! Thanks for the input.
sr. member
Activity: 342
Merit: 250
sounds like your heat sink isn't seated properly. redo the repaste, this time double check the alignment pins stay in the holes as you tighten it down
hero member
Activity: 808
Merit: 502
Still having an interesting problem with 2 of my cubes. They run for a couple of minutes and then shut down. Temperatures are low according to the KNC advanced page, but I recall Gen Tarkin saying these temps are not really the die temps, so I am thinking maybe the dies are overheating. Both of these cubes started acting up after I reapplied some Gelid heat sink compound from a new jar. I noticed that the consistency of the new paste seemed much more fluid than normal; other than that it looked the same. Since I applied this compound, these cubes will not run for more than a couple of minutes. I tried lowering the frequency to 50 MHz and now the cube runs flawlessly. Can anyone recommend a good heat sink compound for the Titans, as I now don't trust this brand? I will continue my experiments to confirm it is the heat sink compound.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Neptunes have never done this. I have a bunch of boards and have tried to replicate this failure, no luck so far...
legendary
Activity: 1526
Merit: 1000
the grandpa of cryptos
i have to check if my neptune is still working :)