
Topic: Hacking KNC Titan / Jupiter / Neptune miners back to life. Why not? - page 42. (Read 76860 times)

legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Been a productive week. Turns out the 600gh Neptune I fixed continues to hash at 600gh, the controller boards continue to work fine, and all the solder is off the wrecked Neptune board.

I did promise a picture of the board showing the shorts. In this picture look for the little bridges in the middle of the board, typically 3 balls wide, right where the burned part is. That is the proof that the boards are overheating under the chips, melting the solder, and shorting the power lines to the SPI bus, which is why the board goes into the drink.



Now to find a reballing stencil that will fit this thing. It's too big for the normal 90*90mm reflow table, so I will have to either cobble together a custom stencil or do it in quarters. That will be a *lot* of fun... Fortunately there is a lot of redundancy in the balls on this chip, so missing one or two won't sink the whole project. For the record it's .6mm balls, 1.0mm spacing.

Next up, Titan work!
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Happy MacArthur Day! Had some free time so I spent it melting off one of the chips on a board that was a complete failure.

By "complete failure" I mean the board failed every attempt to talk to a controller. Shorts out the controller board basically, different values on pin 1 of the 10 pin adapter. Dead short in one of the DC-DC converters, black burn spot on the back.

Pulling the power supplies and caps did not clear the short so I pulled the chip. Note that these chips are *stupid big*. As in 380 degree bottom heat for 30 minutes. As in 470c top heat all over the chip before it finally came loose. As in lift off chip and have it fall back on the board because my picker doesn't have enough suction.

Oh well.

However I lifted the chip to preserve the top right side (the shorted one) and I can see the problem: The chip underside got so hot at some point it reflowed the solder. Resulting in the chip literally shorting itself. In this case I'm guessing one of those shorts was to the SPI control line which would render the board fucked.

Interesting. So much heat it literally melted the solder under it. No wonder why the board was charred...

Fixing this is going to be a serious bitch on wheels. Hm.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Well it may be harder to overvolt Titans. Scrypt has all sorts of memory crap as well as CPUs, so just piling on more power might not buy the extra performance it does with a simple SHA die.

In other news I am noticing that the FPGAs do get kind of warm when the board is sitting on a carpet. I could see those caps failing if they were not vented properly.
legendary
Activity: 1596
Merit: 1000
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like. ...


Judging by the pictures you've taken of Neptunes, you'd be appalled at how similar everything is, with essentially the same exact placements and design (except for the low binned chip).

Yep, pretty much the same. Power supplies are 40A on Titans instead of 50A like Neptunes. Stupid KFC... Roll Eyes

Loving the progress, lightfoot. Keep it up. Smiley
sr. member
Activity: 405
Merit: 250
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like. ...


Judging by the pictures you've taken of Neptunes, you'd be appalled at how similar everything is, with essentially the same exact placements and design (except for the low binned chip).
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like.

Also put on the FPGA on the first board and now I have two running Neptune controllers. The secret to reflowing 100% pin BGA's is heat under the board, zephlux, and a really good watch loupe to verify the balls are on the pads.

Next up in the meantime is a weird Neptune board: This one has a short on one of the low voltage sides, but does not respond at all to a controller. It looks like it's hanging up the FPGA side of the SPI interface, might be shorted. So I'm going to pull the two bad power supplies and see if that clears it.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
And a few pictures. First, one with the Neptune power supplies removed.



And the water cooled Neptune, actually works great and allows the power supplies to breathe better...


Working on some other things for a bit, but still a lot of fun to be poking around in all of this.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Note: Board 1 fully operational. Power supply #2 runs a bit hotter than the others, that's probably why it blew the caps apart. New caps on, speed is 620gh at 270 watts with three dies running at full power (450) and the slower die at 350. Peak dc temp 78c, chip temp 45c.
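To put those figures in efficiency terms, here is a minimal Python sketch. The function name is made up for illustration; the only inputs are the 620gh and 270 watt numbers quoted above.

```python
def watts_per_gh(hashrate_gh: float, power_w: float) -> float:
    """Return power cost in watts per GH/s (the same number as joules per gigahash)."""
    if hashrate_gh <= 0:
        raise ValueError("hashrate must be positive")
    return power_w / hashrate_gh

# Figures reported for board 1: 620 GH/s at 270 W
board1 = watts_per_gh(620, 270)
print(f"{board1:.3f} W per GH/s")
```

Lower is better, so the same function gives a quick way to compare a repaired board before and after changing die speeds.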

So on to the supplies. Pictures in a little bit.

Ok, pulling the supplies sucks. I might get some cheat sauce if I do more of them. That said:

It's best to heat the board at full temps. Need to get the board over 150c to have a shot.

Flux the tops of the power supply pillars

Apply heat from the *top* of the power board.

Go 400c with the heat, moderate flow

Lift the supply lines first, then the back

Be prepared to lift off straight. Otherwise things on the back of the board will go flying.

That said, with both supplies off I still see a short. Which means either it's a trimmer cap on the board or the CPU is shorted. But there are 30 caps, you can't check them all...

However you *can* use an old trick of powering the short and looking for things that are warm. Warm things are probably the source of the short. It's usually only a few degrees C, but you can pick it up with an IR temp tool. Which I have.

However you can't apply a voltage significantly above the max voltage the component can take, otherwise if the part blows open everything else will fail big time. So I need a 1 volt supply at a thousand or so amps peak.

Fortunately I know exactly where to find such a voltage. Next week will be using explosive power to troubleshoot technology.
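The warm-part trick can be sketched as a small ranking routine. This is only an illustration: the component names, the readings, and the 2 degree threshold are hypothetical, not measurements from this board.

```python
from statistics import median

def find_warm_parts(temps_c: dict[str, float], min_rise_c: float = 2.0) -> list[tuple[str, float]]:
    """Return (name, rise) for parts at least min_rise_c above the median reading,
    hottest first. The median stands in for the normal board temperature."""
    baseline = median(temps_c.values())
    warm = [(name, t - baseline) for name, t in temps_c.items()
            if t - baseline >= min_rise_c]
    return sorted(warm, key=lambda item: item[1], reverse=True)

# Hypothetical IR readings in degrees C taken while the shorted rail is powered
readings = {"C1": 24.1, "C2": 24.3, "C3": 27.9, "C4": 24.0, "C5": 24.2}
for name, rise in find_warm_parts(readings):
    print(f"{name}: {rise:.1f} C above baseline")
```

Using the median as the baseline keeps one hot part from dragging the reference temperature up, which matters when the rise is only a few degrees.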
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Ok, boards. They need more heat, need to be careful I don't accidentally re-flow the hashing chips, so I will put tinfoil over the underside of the chip. Kind of like how you keep your turkey from melting.

So anyway, board #1 in the screwed up world. Plugging it in with a Corsair heat sink on top (water cooling is so cool!) gave me a unit that came up but would barely hash. Checking the voltages showed a few things:

1) The layout of the power supplies is not easily apparent. It's actually:

2          0
3          1
4  5  6   7  
(Maybe, first shot at mapping them)
According to the code, power supply 3 was reading no current, no temp and power supply 2 was reading almost no voltage, 1 amp, and high temps (70c). Unusual. Sure enough the 1v rail normally will have a 30 ohm resistance cold, these two had a zero. Since one was trying to come up, I think the failure is in the other one shorting to ground and locking out the first one.
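The resistance check can be written down as a quick classifier. A sketch only: the 30 ohm figure comes from the post, while the 1 ohm cutoff, the "suspect" band, and the sample readings are assumptions chosen for illustration.

```python
NORMAL_COLD_OHMS = 30.0   # typical cold resistance of a healthy 1v rail, per the post
SHORT_CUTOFF_OHMS = 1.0   # assumed: anything under this is treated as a dead short

def classify_rail(ohms: float) -> str:
    """Label a cold rail-to-ground resistance reading."""
    if ohms < SHORT_CUTOFF_OHMS:
        return "shorted"
    if ohms < 0.5 * NORMAL_COLD_OHMS:   # assumed band for "lower than it should be"
        return "suspect"
    return "ok"

# Hypothetical meter readings keyed by power supply number
readings = {0: 29.5, 1: 31.0, 2: 0.0, 3: 0.0}
for supply, ohms in readings.items():
    print(supply, classify_rail(ohms))
```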

Great.

Now to figure out where it's shorting. These little power supplies are kind of cute: They are self-contained, can do all sorts of cool stuff, and have a pair of high side FETs on top and three low side FETs on the bottom. Low side carries a lot of current but has very low R(ds). High side is the opposite and they usually get hot as hell. So design makes sense.
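One way to see why the low side is built for current: in an ideal buck converter the high side FET is on for a fraction Vout/Vin of each cycle and the low side carries the current the rest of the time. A rough sketch, assuming a 12 volt input (an assumption, the post does not state the input voltage) and the 1 volt rail mentioned earlier:

```python
def buck_duty(v_in: float, v_out: float) -> float:
    """Ideal buck converter duty cycle: fraction of each cycle the high side FET is on."""
    if not 0 < v_out <= v_in:
        raise ValueError("need 0 < v_out <= v_in")
    return v_out / v_in

d = buck_duty(12.0, 1.0)   # assumed 12 V in, 1 V out
print(f"high side on {d:.1%} of the time, low side on {1 - d:.1%}")
```

This ignores losses and dead time, so treat the percentages as ballpark figures.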

On the low side rail short there are two places where it can happen:
1) Caps short
2) FETs on bottom side short.

Now to do some melting and testing...

legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
On to the next problem, dead boards.

Taking a look at a Neptune hashing board I can see they have a big chip in the center which appears to be 4 separate dies in one package. Makes sense, as the design of this board is an implosion type, with 8 DC-DC power supplies around the chip and every two supplies power one die/side of the board. That way you don't have to schlep all the power from one point across the board. Wish everyone did it this way, oh well.

Anyway, board #1 has a nice brown discoloration under 1/4 of the die and sure enough that's where the problem is: The 1 volt line to the chips is shorted there, 20 or so ohms on the other three. My guess is a blown something, now to find what....

C
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Ok, so let's see. First, here's a map of the KNC board with the locations of the main caps.



Next a picture of the running board with a reflowed TPS chip (check it out) and some of the caps removed.



Next we have what powers most of my work around here.



And finally the equal to the above pic in the re-work world. Seriously, a good preheater is the difference between using a blowtorch or a crème brûlée torch to warm your coffee. By bringing the board up to 200c or so you can quickly remove and reflow components with just a touch of hot air heat.



legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Fantastic progress, lightfoot. I just bought a cheap SMD rework station partly because of this thread. Hoping to fix a Titan cube and a few Neptune cubes with this info.
Thanks! Go for it, this is how we all learn and get better and stuff. I should do another talk at Defcon or something about this, need a good talk title (how to figure shit out when it's on fire? Hm...)

Quote
When you say you pulled the caps, I assume it goes without saying that you replaced them with new ones. Did you use caps with the same values as stock? I'm wondering what I need to order before attempting any of this. Sorry if this is a dumb question, I've never done this type of work before. If you ever feel the urge to post pictures showing what you replaced, don't hold yourself back.  Wink Tongue
Well, sort of. On the hashing boards I will replace the filtering caps because they serve the purpose of both stabilizing the power input which is being whacked around by the DC-DC's, and because they can help in making the supply more efficient (power factor stuff, really interesting reads out there on that).

For the controller board, the caps are important, but a bit less so. You put them on the inputs for a similar reason, but the FPGA is only pulling .5a at 3.2000 volts on the input (that's only about 1.6 watts, and only .001a on the 1.2 volt lines). So the exact values are a bit less critical and if you leave them off for testing purposes the world will not come to an end. So you can play fast and loose on these caps in the short term without destroying too much stuff. I left a few on (the most important one is the one next to the TPS65217 chip because that's where the DC-DC conversion and the chokes are) and for the rest I'll put them back on "later".
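The power figures here are just volts times amps. A minimal sketch using the currents and voltages quoted in this post (the function name is made up for illustration):

```python
def rail_power_w(volts: float, amps: float) -> float:
    """DC power drawn on a rail: P = V * I."""
    return volts * amps

fpga_input = rail_power_w(3.2, 0.5)     # FPGA input rail
core_rail = rail_power_w(1.2, 0.001)    # 1.2 volt lines
print(f"input: {fpga_input:.2f} W, 1.2 V lines: {core_rail * 1000:.1f} mW")
```

Numbers that small are why the controller board tolerates missing filter caps during testing far better than the hashing boards do.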

Finding out the values when the manufacturer doesn't give out schematics (boo!) is a bit complicated, but a $49 or so good Radio Schlock meter with the capacitance testing function is pretty good for getting close.

Now, if we were talking about caps in a RC circuit (for timing, checking waveform ripple across an inductor, or as part of a current sensing detector for an overloaded FET) then the values are more critical. But on the low power stuff on a Neptune board (aside from the fact that there is probably something like that around the TPS chip's regulator points) this is once again not too much of a problem.

Sure, I'll post pics of this reflow repair, and the times it took with heat to flow the components. Will grab the cam....
legendary
Activity: 1596
Merit: 1000
Fantastic progress, lightfoot. I just bought a cheap SMD rework station partly because of this thread. Hoping to fix a Titan cube and a few Neptune cubes with this info.

Now for the n00b questions:

When you say you pulled the caps, I assume it goes without saying that you replaced them with new ones. Did you use caps with the same values as stock? I'm wondering what I need to order before attempting any of this. Sorry if this is a dumb question, I've never done this type of work before. If you ever feel the urge to post pictures showing what you replaced, don't hold yourself back.  Wink Tongue
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Given that KNC seems to be a bit... repetitive... in how they build things (not a bad thing actually), it might be possible to repair a Titan with downed engines. I'm watching this thing and I can see that some power supplies may be in the penumbra of the airflow stream. Couple that with people's desire to run these things like hell-beasts and you could blow out some of the FETs.

Maybe I should re-title this KNC Neptune and Titan and Jupiter miners....

copper member
Activity: 2898
Merit: 1465
Clueless!
Pulled the bad caps on the other board, and we're up and hashing. So I now know how to fix a blown Neptune board, and probably by extension Titan board and later Jupiter boards.

Mining companies really should hire someone to troubleshoot and fix this stuff. Now on to the next part, fixing the hashing boards...

C


on the knc swedish miner thread...what is being discussed is if there is ANY difference on the 6 port boards on either the titan or the neptune (and supposedly by extension at least the jupiter nov boards and just maybe the oct jupiter/saturn/mercury boards as well).

A guy made a 10 lot run of the bridge adapter daughter boards from the raspberry pi to the 'supposed'  identical 6 port board (neptune etc)

I was told by KNC that the raspberry pi B+ 512mb (I think it was) can be replaced w/o issue if the PI is broke (or at least that's what knc tech said on forums back in the day
when broken pi's showed up...damn wish someone would have archived forums ..I went to the wayback machine and it ends at the titles, damn it, so close)

Also this Titan Bridge should work for folk with blown Titan Bridges

the last 'hurrah' is to take a bought PI and this clone board and fire it up on a Neptune board (and go down to the other boards as bravery strikes) and see if WITH the
clone bridge we ACTUALLY have a workaround for Titan Controllers...

Alas no one is brave enough to toss out a cube...I'm hoping someone someplace has a Titan cube that is so IFFY or so LAME in hashing like 1 die working at low settings and/or
a cube that works in some manner but flakes out after 2 hrs or something...I mean real ugly no use for the Titan cube ...except for perhaps it could run for say 15min in some manner
badly or not to prove the concept that the above arrangement would work

Likely the FPGA firmware on the 6 port card is different between the Titan and Neptune 6 port cards ..but knc are 'evil' don't ya know ..would be like them to just ahem 'recycle' old Jupiter 6 port boards from their farm as they pulled them to put in Neptunes and recycled them into use on the Titans (if so another well played evil genius fix by knc)

anyway thought it was of note here with your fixes and delving into the Neptune board, if there are any differences or whatever you can point out

but where such meanderings are on the swedish asic thread below

https://bitcointalksearch.org/topic/swedish-asic-miner-company-kncminercom-170332

Myself at a 'guess' I'd put the odds at about 70/30 that the only diff between the Titan Controllers and the Neptune Controllers is the PI and the bridge and the SD firmware

anyway like the work you are doing on this thread ..interesting though I don't have a Neptune. (Do have a Jupiter ..would be nice to convert this up to titan controller someday, dare to dream)



legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Putzing around with board #1 here, this one blew two of the trimmer caps off the board and shorted the third. Right now I need to figure out all of these settings, but it's purring along at 400gh @200 watts, so that's not too bad. All eight supplies are under 60c, which is about as high as I would take them. Really will have to find better ways to cool those DC-DC's, I'm guessing that will take down part of a hashing board.

Will let it purr for a bit, then slow it down for the night.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Pulled the bad caps on the other board, and we're up and hashing. So I now know how to fix a blown Neptune board, and probably by extension Titan board and later Jupiter boards.

Mining companies really should hire someone to troubleshoot and fix this stuff. Now on to the next part, fixing the hashing boards...

C
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Ok, so anyway I see that my other board has shorted caps somewhere as well, now that I have a board that at least powers stuff I can see where I am:

On the board without the FPGA but with the fixed caps:
Voltages:
3.3 volts is on the left side of the chip along that row of caps.
1.2 volts is on the right side of the chip, between the first four caps (center rail, edge ground)
2.5 volts is on the right side of the chip, outside the last two caps (edge rail, center ground).

These are correct to 3 significant figures, so I think they are highly regulated. My other board has... different values.
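Comparing a suspect board against these rails can be sketched as a simple tolerance check. The three nominal voltages come from the list above; the 5% tolerance and the sample readings are assumptions for illustration only.

```python
NOMINAL_RAILS_V = {"3.3V": 3.3, "1.2V": 1.2, "2.5V": 2.5}   # rails listed in the post

def out_of_spec(measured: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """Return the rails that are missing or deviate from nominal by more than
    the given fraction (0.05 means 5%, an assumed limit)."""
    bad = []
    for rail, nominal in NOMINAL_RAILS_V.items():
        reading = measured.get(rail)
        if reading is None or abs(reading - nominal) > tolerance * nominal:
            bad.append(rail)
    return bad

good_board = {"3.3V": 3.30, "1.2V": 1.20, "2.5V": 2.50}
other_board = {"3.3V": 3.31, "1.2V": 0.02, "2.5V": 2.49}    # hypothetical readings
print(out_of_spec(good_board))
print(out_of_spec(other_board))
```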

I'm going to work on this some more tomorrow, maybe try to put on the FPGA chip after cleaning the pads. Which is always fun.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Powered up the board. Ran the IO powerup

root@Neptune:/etc/init.d# io-pwr init
TPS65217 OK. Modification A, revision 1.1
root@Neptune:/etc/init.d#

In technical parlance: That's it.
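If you want to check for that line from a script, a small parser is enough. This is a sketch built only from the single output line shown above; what io-pwr prints on failure is not shown here, so anything that does not match is simply reported as None.

```python
import re

# Pattern shaped after the one status line shown in the post
STATUS_RE = re.compile(
    r"^(?P<chip>\S+) OK\. Modification (?P<mod>\S+), revision (?P<rev>\S+)$"
)

def parse_io_pwr(line: str):
    """Parse a status line like the one above; return None if it does not match."""
    m = STATUS_RE.match(line.strip())
    return m.groupdict() if m else None

print(parse_io_pwr("TPS65217 OK. Modification A, revision 1.1"))
```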

Now to see if I can *gently* figure out which cap is blown on the other board here.

Progress!
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
This is *very* interesting.

So with all chips off I decided to check the pads to ground. Because I noticed one of the caps seemed "shorted" on pin 1 to the tps65217 chip. Still a short. Odd.

Put the new TPS chip on and checked again. Pin 1 shorted to ground. According to the docs, pin 1 is vout2. Odd. So I pulled the filter cap next to pin 1 and ground. Pad still shorted at the chip.

Odd.

Checked the rest of the board. Only other caps on that line are the two under the FPGA pads (FPGA is off). Removed both. Open circuit to ground. Checked both. One was shorted, put the other back.

Odd.

What I am beginning to think is this: The trimmer caps on the Neptune board can short. When one of them does it drives the associated voltage line to zero and the FPGA is fucked. Checked my other board (the control, did nothing to it) and sure enough, one of the voltage lines is shorted to ground through one of those caps on the left side.
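The reason one bad cap takes the whole line down, and why the pad still read shorted until the right cap came off, is that everything on the rail sits in parallel between the rail and ground. A minimal sketch with made-up values (the 1 megohm and 0.05 ohm figures are hypothetical):

```python
def parallel_ohms(resistances: list[float]) -> float:
    """Equivalent resistance of parts wired in parallel between a rail and ground.
    Any 0 ohm part is a dead short, so the result is 0."""
    if any(r == 0 for r in resistances):
        return 0.0
    return 1.0 / sum(1.0 / r for r in resistances)

healthy = [1e6, 1e6, 1e6]     # hypothetical: three good caps, about 1 Mohm each
one_bad = [1e6, 1e6, 0.05]    # hypothetical: one cap failed to 0.05 ohm
print(parallel_ohms(healthy))
print(parallel_ohms(one_bad))
```

The meter only ever sees the combined value, so the short shows up at every pad on the rail until the failed part is physically lifted.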

Interesting. Anyone else with a dead-ish Neptune board bored this weekend and want to play with a VOM?
