
Topic: Hacking KNC Titan / Jupiter / Neptune miners back to life. Why not? - page 7. (Read 76765 times)

legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
In the meantime, all caught up on repairs for the moment, taking a breather before the next batch arrives.

Got my first real return; this was one I fixed about a year ago; at the time it had burned out drivers so I fixed those and had all four dies running. Came back "not working" along with a couple of new cube repairs so I took a good look at it.

The entire power plug was burned to a crisp. The plastic had been broken off and it was six burned power pins. Drivers were ok oddly enough but it had a pin 8 short (8 ohms) and burn marks on the back. Checked the signal lines to the chip and sure enough; one of the lines was dead so I isolated it, put a new plug on it and cleaned out the insect parts under the power supplies with alcohol and Q-tips.

3 dies running; cause of death was running it way too hard. Please keep the hashing speed on these things at around 1,000 MHz per board. That would be 250 MHz per die (275 maybe) with 4 dies, 300 with 3 dies. Anything more really damages the connectors and can short out a die or the whole cube.

Hope this is a useful tidbit.
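A rough sketch of that rule of thumb as a check, for anyone scripting their settings. The per-die numbers are the ones from the post; the 100 MHz headroom is my reading of "around" and "275 maybe", so treat it as an assumption, and the function names are made up for illustration.

```python
# Per-die clock guidance from the post, keyed by number of working dies.
# These are rule-of-thumb figures from the repair bench, not a vendor spec.
RECOMMENDED_DIE_MHZ = {4: 250, 3: 300}

BOARD_TARGET_MHZ = 1000   # "around 1,000 MHz per board", summed over dies
HEADROOM_MHZ = 100        # assumption: covers the "(275 maybe)" case


def board_total_mhz(die_clocks_mhz):
    """Sum the per-die clocks to get the board-level figure."""
    return sum(die_clocks_mhz)


def is_within_guidance(die_clocks_mhz):
    """True if the summed die clocks stay at or below target plus headroom."""
    return board_total_mhz(die_clocks_mhz) <= BOARD_TARGET_MHZ + HEADROOM_MHZ


if __name__ == "__main__":
    for clocks in ([250] * 4, [300] * 3, [350] * 4):
        print(clocks, board_total_mhz(clocks), is_within_guidance(clocks))
```

Only working dies go in the list, so a 3-die board is passed as three values.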
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)


Pin 10 is ground, as are the three top pins of the PCIe plug.

And oddly enough that could make sense; when you shorted it the current flowed through all the other cubes, but since that one did *not* have a ground it suffered a different kind of damage. Sucks, I know but check again.

C

Still 0 ohm :(
Damn, sorry to hear that. Those miners cannot be fixed, but the last one might be fix-able. If you want to send the last one in let me know.
legendary
Activity: 1281
Merit: 1000
☑ ♟ ☐ ♚


Pin 10 is ground, as are the three top pins of the PCIe plug.

And oddly enough that could make sense; when you shorted it the current flowed through all the other cubes, but since that one did *not* have a ground it suffered a different kind of damage. Sucks, I know but check again.

C

Still 0 ohm :(
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Yes, the other controller, which broke earlier, blinks similarly.
I guess all the cubes are done. When I tried them with a working controller, the controller didn't boot correctly. I got light to only one led (out of 3) and blinking 'bright light'.
That's normally not good. The way to be sure is to test the cube with an ohm-meter/VOM for resistance. Check the resistance between pin 4 and ground and pin 6 and ground (on the 10 pin connector where pin 2 is the top left one and pin 10 is the top right one). If 0 ohms on pin 4 then things are complex. If 0 ohms on pin 6 the board is shot.

I'll buy an ohm-meter and check, thanks! n00b question: what pin is ground?

e. I checked the cubes, and if ground is pin 8, then there are 0 ohms between it and pin 4 / pin 6. So 0 ohms for 3 cubes; the only one without 0 ohms was the one that caused this mess (the one with bare atx-6 pins).

Pin 10 is ground, as are the three top pins of the PCIe plug.

And oddly enough that could make sense; when you shorted it the current flowed through all the other cubes, but since that one did *not* have a ground it suffered a different kind of damage. Sucks, I know but check again.

C
legendary
Activity: 1281
Merit: 1000
☑ ♟ ☐ ♚
Yes, the other controller, which broke earlier, blinks similarly.
I guess all the cubes are done. When I tried them with a working controller, the controller didn't boot correctly. I got light to only one led (out of 3) and blinking 'bright light'.
That's normally not good. The way to be sure is to test the cube with an ohm-meter/VOM for resistance. Check the resistance between pin 4 and ground and pin 6 and ground (on the 10 pin connector where pin 2 is the top left one and pin 10 is the top right one). If 0 ohms on pin 4 then things are complex. If 0 ohms on pin 6 the board is shot.

I'll buy an ohm-meter and check, thanks! n00b question: what pin is ground?

e. I checked the cubes, and if ground is pin 8, then there are 0 ohms between it and pin 4 / pin 6. So 0 ohms for 3 cubes; the only one without 0 ohms was the one that caused this mess (the one with bare atx-6 pins).
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Yes, the other controller, which broke earlier, blinks similarly.
I guess all the cubes are done. When I tried them with a working controller, the controller didn't boot correctly. I got light to only one led (out of 3) and blinking 'bright light'.
That's normally not good. The way to be sure is to test the cube with an ohm-meter/VOM for resistance. Check the resistance between pin 4 and ground and pin 6 and ground (on the 10 pin connector where pin 2 is the top left one and pin 10 is the top right one). If 0 ohms on pin 4 then things are complex. If 0 ohms on pin 6 the board is shot.
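If you are logging readings for several cubes, the check above boils down to a small decision rule. This is only a sketch: ground is taken as pin 10 (as stated elsewhere in the thread), the 1 ohm cutoff for "reads as 0 ohms" is an assumption about a typical handheld meter, and the function name is hypothetical.

```python
# Assumed cutoff: anything at or below this is treated as a dead short.
SHORT_THRESHOLD_OHMS = 1.0


def diagnose_cube(pin4_to_gnd_ohms, pin6_to_gnd_ohms):
    """Classify a cube from two resistance readings taken to ground (pin 10).

    Pin 6 is checked first because a short there is the worse outcome.
    """
    if pin6_to_gnd_ohms <= SHORT_THRESHOLD_OHMS:
        return "pin 6 shorted: board is shot"
    if pin4_to_gnd_ohms <= SHORT_THRESHOLD_OHMS:
        return "pin 4 shorted: complex repair"
    return "no short on pins 4 or 6"


if __name__ == "__main__":
    # Made-up readings for three cubes, in ohms.
    for name, p4, p6 in [("cube A", 0.0, 0.0), ("cube B", 0.2, 950.0), ("cube C", 800.0, 950.0)]:
        print(name, "->", diagnose_cube(p4, p6))
```

Take the readings with the cube unplugged from both the controller and the power supply.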
legendary
Activity: 1281
Merit: 1000
☑ ♟ ☐ ♚
I have a broken KNC Titan. One of the cubes had melted power connector, and because of that the connector was removed. So, I connected power cable to the wrong pins, and I think the entire machine was fried (cubes and controller). I'm currently planning to sell it.
Do you think it can be fixed, and how much would the repairing cost? I assume shipping is to US?

e. I have also second fried controller (I guess FPGA, because of the blinking of the bright light).

Hm. I assume the first controller was blown out as well (no proper lights when you plug it in without the cubes?) At the very least you blew up the cube and controller, I wonder if it spread to the other drivers. I'd have to take a look at the controller, the cube you reversed, and one of the other cubes to make a determination.

This is also a problem with people who pull the plastic off the power connector and run the pins bare. Works, but if you ever put the plug on so that it's only on the top three then you blow everything up. Have it fixed properly.

Yes, the other controller, which broke earlier, blinks similarly.
I guess all the cubes are done. When I tried them with a working controller, the controller didn't boot correctly. I got light to only one led (out of 3) and blinking 'bright light'.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
I have a broken KNC Titan. One of the cubes had melted power connector, and because of that the connector was removed. So, I connected power cable to the wrong pins, and I think the entire machine was fried (cubes and controller). I'm currently planning to sell it.
Do you think it can be fixed, and how much would the repairing cost? I assume shipping is to US?

e. I have also second fried controller (I guess FPGA, because of the blinking of the bright light).

Hm. I assume the first controller was blown out as well (no proper lights when you plug it in without the cubes?) At the very least you blew up the cube and controller, I wonder if it spread to the other drivers. I'd have to take a look at the controller, the cube you reversed, and one of the other cubes to make a determination.

This is also a problem with people who pull the plastic off the power connector and run the pins bare. Works, but if you ever put the plug on so that it's only on the top three then you blow everything up. Have it fixed properly.
legendary
Activity: 1281
Merit: 1000
☑ ♟ ☐ ♚
I have a broken KNC Titan. One of the cubes had melted power connector, and because of that the connector was removed. So, I connected power cable to the wrong pins, and I think the entire machine was fried (cubes and controller). I'm currently planning to sell it.
Do you think it can be fixed, and how much would the repairing cost? I assume shipping is to US?

e. I have also second fried controller (I guess FPGA, because of the blinking of the bright light).
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Interesting repairs came in today, 4 cube boards, all shorting the power supply. 3 of them turned out to have blown FETs, I'm getting better at finding those with the air heat and the pre-heater. Hook up your test leads and watch the resistance change as you move hot air over components. When the resistance changes quickly, you're near the source of the problem.

Last one had a blown die that was shorting out the SPI bus. Still have to find those with trial and error, but got it and board is now up on 2 of 4 dies. Not too bad...
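For anyone who wants to write the hot-air readings down instead of eyeballing the meter, here is a minimal sketch of the idea: the part whose heating moves the resistance fastest is the prime suspect. The component labels and numbers are invented, and the function name is hypothetical.

```python
def locate_fault(readings):
    """Return the component whose heating changed the resistance fastest.

    readings: list of (component, ohms_before, ohms_after, seconds_heated).
    The rate is the absolute change in ohms per second of hot air.
    """
    def rate(reading):
        _, before, after, seconds = reading
        return abs(after - before) / seconds

    return max(readings, key=rate)[0]


if __name__ == "__main__":
    # Invented example: test leads across the shorted rail, 5 s of heat per part.
    log = [
        ("FET Q1", 8.0, 8.1, 5),
        ("FET Q2", 8.0, 12.0, 5),
        ("supply 3", 8.0, 8.5, 5),
    ]
    print("suspect:", locate_fault(log))
```

Keep the heating time per part roughly equal so the rates are comparable.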
legendary
Activity: 2450
Merit: 1002
This is interesting:

So I've been thinking about the power supplies and some of Tarkin's thoughts. He discovered that the supplies have different programming and I was wondering why. Why are the supplies programmed differently and what can you do with that?

Out came the scope to take a look at the phase variance on each supply. We know that two supplies should beat 180 degrees out of phase with each other, that's the master/slave relationship on each side. Provides smooth power for the chip with fewer capacitors required.

What I see now is that they had a master clock signal coming from one of the supplies, and the 4 pairs adjusted their phase to be 180 degrees from their pair as well as 45 degrees offset from each other pair.

In other words, the first set fires at 0 and 180. The second set fires at 45 and 225. The third set fires at 90 and 270, and the fourth fires at 135 and 315. That way the 12 volt rail sees an 8 phase power supply pulling from it, which once again makes for a smooth ride.

It also means you can't swap supplies without knowing the exact position, and why new supplies don't work right. Fascinating.

Meantime I fixed some more boards, most complex was one that had two supplies blown and 3 of the 4 drivers. Interesting.

C

Yep, I get that same feeling when looking at the register configurations on each DCDC. It seems they have each pair of DCDCs for a die set to be a current sharing group, and auto phase control is turned on for the pair. Furthermore, each pair has its own unique group identifier and unique interleave setting ... which may be the phase variance between each group you mention. If not, then there are some other group registers that seem to be configured similarly.
I imagine if someone wanted to swap the 40A ones for the 50A DCDC variant, all they would have to do is ensure the configuration of each is copied over.
When I spoke w/ Ericsson, they have a usb-pmbus adapter which you could use to hook up to the 10pin connector on each cube, this can be used to run their free gui software suite for configuring the DCDC's. Fascinating stuff to say the least.
Also, I'm happy to report my Titan is now going on 203+ hrs of uptime w/o any dies going to sleep ... think this is a record.
Been like this since I changed to 200 kHz switching freq, could just be coincidence tho w/ some cable rearrangement on the power side of things haha! Also, DCDCs are still about 2-4C cooler each.
One thing that's also different which I didn't think of before is ... since I updated to 200 kHz and RESAVED the USER_STORE_ALL values for each DCDC ... that means my custom voltages are now set in the USER_STORE_ALL ... so from the very first power on of the DCDCs they are supplying the specified voltage to the dies (not the default voltage, which I think is .85v) and not waiting till the rpi boots up fully to run waas to set voltage values from the advanced.conf file.
I have no idea if that would really affect anything.
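For reference, a sketch of what the 200 kHz value looks like on the wire. In the standard PMBus command set the switching frequency register is FREQUENCY_SWITCH (0x33), expressed in kHz, and the save-to-user-store command is named STORE_USER_ALL (0x15), which is presumably what USER_STORE_ALL refers to above. The code assumes these DCDCs use the standard LINEAR11 data format for that register; that is not confirmed in this thread, and nothing here talks to real hardware.

```python
# Standard PMBus command codes (names per the PMBus command set).
STORE_USER_ALL = 0x15
FREQUENCY_SWITCH = 0x33


def encode_linear11(value):
    """Encode a number as a 16-bit LINEAR11 word.

    Layout: top 5 bits are a signed exponent N, low 11 bits a signed
    mantissa Y, and the represented value is Y * 2**N.
    """
    exponent = 0
    mantissa = value
    # Scale large values down until the mantissa fits in 11 signed bits.
    while round(mantissa) > 1023 or round(mantissa) < -1024:
        mantissa /= 2
        exponent += 1
    # Scale fractional values up to keep precision, while there is room.
    while (mantissa != round(mantissa) and exponent > -16
           and abs(round(mantissa * 2)) <= 1023):
        mantissa *= 2
        exponent -= 1
    return ((exponent & 0x1F) << 11) | (round(mantissa) & 0x7FF)


def decode_linear11(word):
    """Decode a 16-bit LINEAR11 word back to a float."""
    exponent = (word >> 11) & 0x1F
    mantissa = word & 0x7FF
    if exponent > 15:       # two's complement, 5 bits
        exponent -= 32
    if mantissa > 1023:     # two's complement, 11 bits
        mantissa -= 2048
    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    word = encode_linear11(200)  # 200 kHz
    print(f"FREQUENCY_SWITCH (0x{FREQUENCY_SWITCH:02X}) <- 0x{word:04X}")
    print("decoded:", decode_linear11(word), "kHz")
```

Check the module's own datasheet for the data format before writing anything to a live supply.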
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
This is interesting:

So I've been thinking about the power supplies and some of Tarkin's thoughts. He discovered that the supplies have different programming and I was wondering why. Why are the supplies programmed differently and what can you do with that?

Out came the scope to take a look at the phase variance on each supply. We know that two supplies should beat 180 degrees out of phase with each other, that's the master/slave relationship on each side. Provides smooth power for the chip with fewer capacitors required.

What I see now is that they had a master clock signal coming from one of the supplies, and the 4 pairs adjusted their phase to be 180 degrees from their pair as well as 45 degrees offset from each other pair.

In other words, the first set fires at 0 and 180. The second set fires at 45 and 225. The third set fires at 90 and 270, and the fourth fires at 135 and 315. That way the 12 volt rail sees an 8 phase power supply pulling from it, which once again makes for a smooth ride.

It also means you can't swap supplies without knowing the exact position, and why new supplies don't work right. Fascinating.

Meantime I fixed some more boards, most complex was one that had two supplies blown and 3 of the 4 drivers. Interesting.

C
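The phase layout described above is easy to reproduce as arithmetic: each pair is offset 45 degrees from the previous pair, and the second supply in a pair sits 180 degrees from the first. A small sketch (function name is just for illustration):

```python
def supply_phases(pairs=4, pair_step_deg=45):
    """Return (master, slave) firing angles in degrees for each supply pair.

    The slave in each pair fires 180 degrees after its master, and each
    pair is shifted by pair_step_deg from the one before it.
    """
    result = []
    for i in range(pairs):
        master = (i * pair_step_deg) % 360
        slave = (master + 180) % 360
        result.append((master, slave))
    return result


if __name__ == "__main__":
    phases = supply_phases()
    for n, (master, slave) in enumerate(phases, start=1):
        print(f"pair {n}: {master} and {slave} degrees")
    # Seen from the 12 V rail, all eight firing angles are evenly spaced.
    print(sorted(angle for pair in phases for angle in pair))
```

Sorting all eight angles gives one every 45 degrees, which is the 8 phase load the 12 volt rail sees.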
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Shoveling through the work, normal stuff like blown plugs and such. Now I'm working on an odd one: This is a clean Titan board from China. Shuts down the controller, but pins 4, 6, 8 are ok. So I check pins 1, 3, 5, 7, 9 and find that pins 1, 3, 5 and 9 are way off (blown drivers, both of them) and 7 is like 100 ohms (SCL for the LM75, EEPROM and power supplies). Really weird, what the heck happened to this thing.

So I pulled the drivers, replaced, pin 7 is still low (it should be 6k). Pin 2 is ok (SCK) so I am wondering what is up. Use my heat trick and see that power supply 3 is changing the resistance, so I pull it. Now up to 2k, but the board still doesn't respond when plugged in. With no power and the 10 pin hooked up I don't see the EEPROM or LM75 and if I put 12v on it the controller stalls (good indication of a power supply problem).

Weird. Never seen two supplies go out together. I'll lift a few partially and see if I can isolate it.

Never dull in repair land.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Thank you. Back from the con, going over to the PO this morning, got a lot of packages there to work on. Should have the backlog cleared in a few days, will post a status update.

C
legendary
Activity: 1078
Merit: 1050
Well I'm here at Defcon taking a bit of a break. You can find me pretty easily, I have a mining board instead of a badge around my neck. Feel free to say hi, already have met a few people here.

I'm going to go down to the Hacking village on Friday and see if they would like me to do a talk. I've brought the bag of tricks with me, so I could do a presentation on how to spot and fix some common issues on bitcoin mining gear and how to figure out what's what. If so I'll probably do it Saturday, will check.

Anyone else here and want to hit a bitcoin machine for the hell of it?

C

I'm not there, wish i was, have fun mate. Sounds like you are.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Well I'm here at Defcon taking a bit of a break. You can find me pretty easily, I have a mining board instead of a badge around my neck. Feel free to say hi, already have met a few people here.

I'm going to go down to the Hacking village on Friday and see if they would like me to do a talk. I've brought the bag of tricks with me, so I could do a presentation on how to spot and fix some common issues on bitcoin mining gear and how to figure out what's what. If so I'll probably do it Saturday, will check.

Anyone else here and want to hit a bitcoin machine for the hell of it?

C
sr. member
Activity: 703
Merit: 272
Well, there's always something new. Got two cubes in today for repair. One had a burned plug (fixed, I need to order another box of plugs) with all four dies now working. The second one was weird: The user said it would start mining then shut down the whole controller within a few minutes. So I hooked it up, put a good heat sink on it, and took a look.

Interesting one. Very, very interesting. Yes, it does come up on four dies (one supply is bad on die 1) and yes it does crash the controller and in one case shut down my power supply. Never seen this one.

Did some work on it. Die 1 is bad. It works, but it's still bad, what happened is that it overheated and damaged both its power supply and the rest of the board to some extent. In this case, when it comes up it goes into thermal runaway as it heats up, ultimately shorting the supplies and the board. You can see the burning on the back of the board, it got very hot.

Turning it off clears that but the other three dies also run very hot. After experimenting for a bit and doing a reflow I found a solution: Run the remaining three dies at a much lower voltage. Running dies 2,3,4 at -.0806 at 275mhz gives you a cube that runs at 57mh while running at 42c in a box with the original style fan. That's a pretty good running cube and about what you should expect from a cube normally. So that's fixed as well.

Never know what you will run into in this business....

Just a note..a maxumark bracket (get rid of u shaped knc kludge heatsink) and putting on the individual heatsink option on each dc/dc chip

might further reduce his heat....the Noctua fan mod at 3000 rpm (20% more noise) vs standard 1500 rpm knc also may help

anyway if you wanted to extend the life of a cube the 10 buck maxumark bracket and heatsinks direct on dc/dc's and maybe the fan is how

I'd go..just a thought to pass on to the guy.

With the above mods my temps on my dc/dc's went down 10-15c on ALL modified dies, this is with the maxumark bracket and the Noctua 3000 rpm fan mods.

So my miners in the basement in the summer NOW run at the heat they used to run w/o any mod the previous Winter! 10c-15c is a big deal in real heat output

no more shutting down titans on 90F plus days now....it cut me that much slack in heat etc

brad


that would be me.

I just emailed maxumark for the brackets and heatsinks.  Hopefully he'll accept btc, i'm tapped for the ltc til after this weekend.

I'm in Texas now also.  My cubes were running in Maryland, but i moved to Texas over a year ago.  They've been running nonstop since i've had them (batch 1)
legendary
Activity: 915
Merit: 1005
Freshly repaired 3 cubes by the one and only lightfoot.  Hashes at 170 MH/s feel free to ping me an offer.  Would like to let them go instead of putting them in storage.  If not I might have to find some extra space to run them
copper member
Activity: 2898
Merit: 1465
Clueless!
Well, there's always something new. Got two cubes in today for repair. One had a burned plug (fixed, I need to order another box of plugs) with all four dies now working. The second one was weird: The user said it would start mining then shut down the whole controller within a few minutes. So I hooked it up, put a good heat sink on it, and took a look.

Interesting one. Very, very interesting. Yes, it does come up on four dies (one supply is bad on die 1) and yes it does crash the controller and in one case shut down my power supply. Never seen this one.

Did some work on it. Die 1 is bad. It works, but it's still bad, what happened is that it overheated and damaged both its power supply and the rest of the board to some extent. In this case, when it comes up it goes into thermal runaway as it heats up, ultimately shorting the supplies and the board. You can see the burning on the back of the board, it got very hot.

Turning it off clears that but the other three dies also run very hot. After experimenting for a bit and doing a reflow I found a solution: Run the remaining three dies at a much lower voltage. Running dies 2,3,4 at -.0806 at 275mhz gives you a cube that runs at 57mh while running at 42c in a box with the original style fan. That's a pretty good running cube and about what you should expect from a cube normally. So that's fixed as well.

Never know what you will run into in this business....

Just a note..a maxumark bracket (get rid of u shaped knc kludge heatsink) and putting on the individual heatsink option on each dc/dc chip

might further reduce his heat....the Noctua fan mod at 3000 rpm (20% more noise) vs standard 1500 rpm knc also may help

anyway if you wanted to extend the life of a cube the 10 buck maxumark bracket and heatsinks direct on dc/dc's and maybe the fan is how

I'd go..just a thought to pass on to the guy.

With the above mods my temps on my dc/dc's went down 10-15c on ALL modified dies, this is with the maxumark bracket and the Noctua 3000 rpm fan mods.

So my miners in the basement in the summer NOW run at the heat they used to run w/o any mod the previous Winter! 10c-15c is a big deal in real heat output

no more shutting down titans on 90F plus days now....it cut me that much slack in heat etc

brad
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Well, there's always something new. Got two cubes in today for repair. One had a burned plug (fixed, I need to order another box of plugs) with all four dies now working. The second one was weird: The user said it would start mining then shut down the whole controller within a few minutes. So I hooked it up, put a good heat sink on it, and took a look.

Interesting one. Very, very interesting. Yes, it does come up on four dies (one supply is bad on die 1) and yes it does crash the controller and in one case shut down my power supply. Never seen this one.

Did some work on it. Die 1 is bad. It works, but it's still bad, what happened is that it overheated and damaged both its power supply and the rest of the board to some extent. In this case, when it comes up it goes into thermal runaway as it heats up, ultimately shorting the supplies and the board. You can see the burning on the back of the board, it got very hot.

Turning it off clears that but the other three dies also run very hot. After experimenting for a bit and doing a reflow I found a solution: Run the remaining three dies at a much lower voltage. Running dies 2,3,4 at -.0806 at 275mhz gives you a cube that runs at 57mh while running at 42c in a box with the original style fan. That's a pretty good running cube and about what you should expect from a cube normally. So that's fixed as well.

Never know what you will run into in this business....