
Topic: Hacking KNC Titan / Jupiter / Neptune miners back to life. Why not? - page 30. (Read 76860 times)

hero member
Activity: 808
Merit: 502
Amazing job Lightfoot. Nice fix... Keep up the good work.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Damaged die most likely. The power supplies are actually pretty reliable and the connections on the .6v side to the chip seem to be pretty good, so that's not it. These dies are flaky; I think if they go odd or down it's due to manufacturing failures. Hm.
 
In the meantime I spent the evening letting this unit hash while I looked at my sacrifice board. It has had a short on pin 6 since the day I got it, and nothing seems to clear the problem (the guy had three boards that all did this after a major power supply failure, one board was merely burned). This board was my sacrificial lamb, I used it to trace the shorts on pin 4 (to the chip but could be bypassed with jumpers), pin 8 (same) and pin 6 (which I could not clear).

So tonight I decided to clear the short. Pre-heated the board to full, fluxed everything, pulled every component around the chip: every chip, capacitor, jumper, plug, supply, you name it. Nothing cleared the short. Then I pulled the chip, somewhat messy on the lift. Still shorted, but some solder smears. Cleaned up the smears and when I was on the last corner the meter stopped beeping.

Ran the solder ball back, BEEP! Sure enough I had cleared the final fault. Checked the chip at those pads and sure enough, short in the chip itself.

Fuck.

So the problem is in the chip. What is happening is that when the chip fails or gets voltage spikes or surges it takes out the housekeeping circuits on a die. This is a common line shared by all four dies on the chip and is not possible to isolate via the board. Without it, the board will not hash.

You're basically fucked.

Need to think for a while about this: Without new hashing chips this really can't be fixed. I mean it's theoretically possible that one could apply a very high current 3.3 volt spike to try and blow the line open on the affected chip without taking out the other three chips, but that's really iffy. I might try this on board 2, but it is not what we call an optimal solution.

Hm. Bedtime.
legendary
Activity: 1428
Merit: 1000
https://www.bitworks.io
Very impressive! I dig what you did running the wire external to the board.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
And how do you fix the above so it runs at a nice 60mh for 24 hours at a nice and smooth <50c power supply temps and <40c chip temps?

Why build a bypass of course!



This will go down as one of the cooler things I have done in a day. As discussed the via lines for the +12v supply to the board were destroyed. +12 is on the inside layers so you can't just connect. However what I figured might work is to build a bypass bus that starts on one side of the board, goes *through* the molex connector, then winds its way around the board dropping power at several points to both minimize the load being dumped into a particular plane and to reduce potential noise coming from supplies near the drop site destabilizing ones further away.

First step was to pick the wire. After considering 10 gauge (way too big) and 12 gauge (stiff) I went with 14 gauge UL rated teflon coated copper stranded wire. Copper stranded because I could interleave the strands into the molex plug pins for maximum contact points, UL rated because typical automotive wire insulation is only rated to 80c whereas UL rated wire is 100c rated. We don't want the wire to smoke since it will be near the power supply inductors, which get hot. And 14 gauge because it is a step up from 16g wire, which is considered "really good" in a power supply. 14 gauge has roughly 1.6 times the copper cross-section of 16g, so if I go in two directions I have about three times the copper of a single 16g run carrying the load.
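For anyone wanting to sanity check the gauge comparison, here is a rough sketch using the standard AWG diameter formula. It assumes solid round conductors (stranded wire of the same gauge is close but not identical), and it only compares copper area; real current capacity also depends on insulation rating and airflow. The function name is just for illustration.

```python
import math

def awg_area_mm2(gauge: int) -> float:
    """Copper cross-section in mm^2 for a solid wire of the given AWG size."""
    # Standard AWG definition: diameter in mm = 0.127 * 92 ** ((36 - n) / 39)
    diameter_mm = 0.127 * 92 ** ((36 - gauge) / 39)
    return math.pi / 4 * diameter_mm ** 2

# The gauges considered for the bypass bus, plus the 16g reference.
for gauge in (10, 12, 14, 16):
    print(f"{gauge} AWG: {awg_area_mm2(gauge):.2f} mm^2")

ratio_14_to_16 = awg_area_mm2(14) / awg_area_mm2(16)
print(f"14 AWG has {ratio_14_to_16:.2f}x the copper of 16 AWG")
```

Area doubles roughly every three gauge numbers, so a two-step move from 16 to 14 lands a bit under 1.6x rather than a full 2x.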

Second step was figuring out a connection point. I thought about scraping the board by the power supplies (bad, hard to solder, and prone to lifting), the supply standoffs (close to electronic components under the power supplies, limited connection) then finally realized I could just make the wire flat, solder-tin it, cut it to shape, then tin and solder to the capacitor banks next to a power supply. That would give me 8 connection points into the board, with physical reinforcement (between supply and capacitors) and by using rosin core solder I could melt it onto the capacitors without having to worry about melting the connection between capacitor and board (which is ROHS solder that needs +100c more heat to melt).

Third step was forming the wire. This took a while, with a lot of bends and thoughts. Ultimately I went for three connection points, at supply 2, 5, and 7. This leaves a slight imbalance on supply 4, but it's the best I can do right now. Other supplies will pick up power through the existing via and +12 and ground planes. The power distribution would be odd, but it should hold as long as you don't pull max power on every supply.




Here is the wire being formed. The top one was the first attempt, I broke a strand while forming it, and since every strand is critical I gave it up and tried again. The middle one has the molex connector soldered on, once again I went with strands on both sides of the pins, yet high enough to keep it away from the board. The bottom (red) one is automotive wire. Thicker insulation for physical protection, but a lower rating and not UL listed.

Soldered the wire onto the caps making sure there was no tension/compression against any of the components (difficult) and soldered the Molex to the board. Tried it out for resistance, saw the usual 1k or so on the 12v lines. Then put the heat sink on and mounted it in the box:



Bent the wire a little bit with two pliers (remember *NO STRESS ON ANYTHING*) to ensure it doesn't touch the side (even insulated you never want things to rub) and tried my smallest supply.

Board came up. 50mhz got me 20mh total. 100mhz got me 40, and 200mhz got me the magic *60*mh. So far the supplies are stable at .78v or so, with temps at 50-53c.

Titans were built to run at 60mh, running them over this starts increasing power draw and moves the supplies closer to their limits. On a normal board this is annoying, on this kind of board you will probably start to see instabilities on the 12v planes as power goes places it was not designed to go. At this point though the connector is cool, the voltage drop to the supplies from the 12v line is not bad (11.8v at connector input, 11.78v at furthest end) showing a .02v drop at 180 watts measured. Or a .3w drop in the wires, not too bad.
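The wire-loss figure follows from the measured numbers: current is load divided by input voltage, and the loss in the bypass wire is that current times the voltage drop. A quick sketch of the arithmetic, using only the 180 W, 11.8 v and 11.78 v readings quoted above (the function name is made up for illustration):

```python
def wire_loss(load_watts: float, v_in: float, v_far: float):
    """Return (current in A, drop in V, loss in W, wire resistance in ohms)."""
    current = load_watts / v_in      # I = P / V at the connector input
    drop = v_in - v_far              # voltage lost along the bypass wire
    loss = current * drop            # P_loss = I * V_drop
    resistance = drop / current      # implied end-to-end wire resistance
    return current, drop, loss, resistance

current, drop, loss, resistance = wire_loss(180.0, 11.80, 11.78)
print(f"current    ~ {current:.1f} A")
print(f"drop       ~ {drop:.2f} V")
print(f"wire loss  ~ {loss:.2f} W")
print(f"resistance ~ {resistance * 1000:.1f} milliohm")
```

That works out to about 15 A through the bus and roughly 0.3 W lost in the wire, which matches the figure in the post.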

Very difficult to fix, took more time and effort than one might think. Need to let it run for some more time, but so far so good. Moral: Don't burn boards. :-)

hero member
Activity: 808
Merit: 502
It might be a good idea for people to just step down the frequency or voltage just a dab in order to prevent the over current and extreme heating that leads to molex meltdown. That really looks like a mess. It's a good thing you guys are able to repair these for people.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
I will share my trick to the molex.  I actually break the plastic all away - both cutting and prying.  Once that is all done, I preheat to 300 on bottom and use 425 hot air to each individual pin.  Remove them one by one with controlled MASS heat.  Be VERY careful with your aim with the hot air that hot....

There is a way to do it with less heat; alloy the solder. Specifically put flux on the pins and melt normal solder onto it using a normal iron at normal temps. The lead solder will mix with the ROHS stuff lowering the melting temperature by a *huge* amount (basically to the level of pb/tin solder). Then the pins can come right out without any fuss or worry about overheating the board.

However if the pins look like this.



It's going to be a bit more... complicated. (Note how the pins are pointing in different directions, the actual anchors into the board have been burned clear through by the heat).

Bad.

Quote
I also came across something really weird.  I have a couple of controllers that won't go in to advanced.  However, if you flash the board to rc6 it will work.  The second you update to latest software, no more advanced.  2 separate controllers. 

Hm. What kind of lights come on and do they hash with Neptunes?

Night!
full member
Activity: 133
Merit: 100
I will share my trick to the molex.  I actually break the plastic all away - both cutting and prying.  Once that is all done, I preheat to 300 on bottom and use 425 hot air to each individual pin.  Remove them one by one with controlled MASS heat.  Be VERY careful with your aim with the hot air that hot....

Now if I can figure out how to resurrect the bad titans -and the bad neptunes for that matter.  

I also came across something really weird.  I have a couple of controllers that won't go in to the advanced tab from the web software.  However, if you flash the board to rc6 it will work.  The second you update to latest software, no more advanced.  2 separate controllers.  

I wish that the details in Screen -r were as nice in Neptune as it is in Titan.  It would really help to see the dies broken down like the Titan software does.  If anyone knows how to see individual dies and cubes in command form, that would be very helpful.

Thanks!  Happy Hackin!

Boomin


 
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Pounding away, busy weekend bending wire. More later. Moral: please, please do not blow out your connectors. Clean them, drop power usage below 200w/cube, use really good supplies, whatever. Bitch to fix.
full member
Activity: 133
Merit: 100
My theory is the engineers thought it would be a GREAT IDEA to have all of the dies on the bus so they could constantly analyze data.  Then they figured out it's like trying to hear a conversation in the middle of a goal at a great hockey game.  God forbid one of the dies die (that almost sounded funny), then the entire thing is a wreck. 

just my theory. 

Boomin
copper member
Activity: 2898
Merit: 1465
Clueless!
Maybe. I have one incinerated Y adapter, showing that even that can't make up for a bad power supply or overloading or a connector that has been plugged and unplugged with the power on (thus causing sparks which cause little bits of resistance which warm up which.....)

Still, it is also true that there seem to be these islands of stability on Titans, and if that island is at 300-325mhz then you're kind of stuck with it.

Back to work here. Have two more boards to keep me busy over the next few days.



If ya ask me, based on my single Titan hosted up in Alaska .... I believe heat is the number 1 killer of these things.... that combined w/ the fact how close these things are pushed to the edge by users.

But, my Titan has run at less than 75-80C on all DCDC's since the day it arrived in 2014 ... clocks 325mhz, volts set to -0.0366v or lower.
It did survive one PSU dying - modular connector sockets on the PSU actually burned up. Titan connectors were fine.
Other than that, it has hashed along happily the whole time, just like it did since day 1.

I could be totally wrong tho =P

But yeah, it's a hunch based off of seeing the settings and temps some users are pushing these things through. I mean YOWCH!...


yeah i essentially agree...BUT again....was told back in the day my temps/speed with overclock etc was just dandy (silly me) and that it has run more or less as a champ at 15 months i lose a bit of hash on a few dies (not counting the 2 dead dies on 2 dead cubes that never really worked) well hell.....so yeah I likely will pay the piper ..due to the only lately (last couple months) seeings how close to the edge I am

but to be fair...to the folk over the last year running these things ..esp us 1st batch titans with the bad dies which I and others were told 'specifically' to run full out at 325 0.0366 etc...well ..clueless as I was I ran mine full out all summer of 2015 with die rates in the low 90's some at 95c 96c

the guy I got the used Titan from Australia (way in the outback I kid you not) I think was away on work and eventually could NOT run them remote fire hazard...also had burned out 2 sets of psu's I think..thus how I got them in the panic sale of march 2015 on the expectation of all that NEW scrypt equip that never came to pass

I should post a pic of those cubes...run hard ..really hard...the cubes are faded with heat spots ...looks like the hood of a hot rod in Arizona after racing all summer...kinda cool....

and these 4 cubes are MY MOST STABLE OF THEM ALL ...FULL OUT at 325 and 0.0366 since March w/o missing a beat

go figure

being electronic clueless is there some kind 'burn in' in that they ran too hot ...adapted and then now 'like' it that way?

anyway ..take advice here of everyone on how to tweak this stuff...the recent cube I got is set to 0.06xx volts at 300mh
and stable...so I left it....no idea how he got voltage that low at 300mh I've never had much luck

anyway my setup is probably a 'freak' setup and due to decent high end psu's as stated in other posts etc

but again .....I was 'assured' when I did over clock these to 325mh by I think it was Kurt at KNC? All was fine. They
could run easy and safe between 90c and 95c no problem and up to 100c...I kid thee not..what I was and sure others
were told....there is a reason that knc forums was 'yanked' imho ......sheesh...I saved a lot of stuff I should see if I
can dig out that email exchange (hope it was email ...coulda been knc forum but i don't think so) I'll look

I'm sure others on here were told the same back in the day as first batch titan owners.

I should have known better ..but in my case NO real issues of note...thus never had to delve into tweaking much (till lately a bit)

anyway got on a side note got 3 cubes 1 cube only 1 die works...1 cube the 300mh at low 0.06xx low voltage..go figure stable and another one in the mail supposedly fully working...so figure about 190mh to 200mh for 260 ltc (I hoard so on paper I sold LTC at it was almost exactly 750 usd)

my way around it 2 cubes and an extra die as overkill ...when stuff starts to 'fade' as i should 'mentally prepare for' a way to compensate for the likely 'dribble down in hash' coming, as of march 8th 16 months on both Titans, the new and used one...both are Nov 2014 units

so hell 'overcompensating' with equipment....better than fretting about it ...like most of my life issues if it bugs me enough

toss money at it..seems to work :)


But listen to folk here...very likely ...very soon I will be scrambling like everyone else with different voltage and hash settings
due to the 'heat debt' wear and tear I have put these units thru.....but as yet ..still good (knock wood)

later


Perhaps, in which case the only Titans that can be fixed are the ones that smoke power connectors or blow out power supplies, both resulting in shorting on power-up. This is possible, although sad.

However other things just seem odd. For example: I have another dead titan board in for a review. It hashes on one die. Checking in bfgminer I see that only two dies ever respond, the other two do *nothing* no matter what. It has the jumper missing.

I wonder what happens if I put the jumper in. Will all four dies respond like insane children? Why would KNC do this?

Then I look at the supplies. Two are missing, so I check resistances on the chip side. Normally I see around 7.5 ohms if chip is good, 0 if shorted (to ground). On this one I see four power supplies reading 7.5,7.5,7.5,7.5. The other four read 7.5,600 and 7.5,140 ohms.

THAT IS IMPOSSIBLE! The supply outputs are ganged together in parallel. There is no way on EARTH that the outputs can read higher resistance. What the heck is going on?

I'll pull a supply and see. But I'll also pop in a temporary jumper and see what happens.

Just saying KNC being 'evil' and all ...could they have PULLED these resistors when they sent out those 1/2 hashing titan cubes for replacements for 1 dead cube trick..thus making sure ..as they 'claimed' at the time these were UNFIXABLE?

I don't do electronics...but could it be...something like this to make sure we could not hash with them 'after the fact'?

just a thought ..but again I can barely follow your electronics sometimes :)





legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Perhaps, in which case the only Titans that can be fixed are the ones that smoke power connectors or blow out power supplies, both resulting in shorting on power-up. This is possible, although sad.

However other things just seem odd. For example: I have another dead titan board in for a review. It hashes on one die. Checking in bfgminer I see that only two dies ever respond, the other two do *nothing* no matter what. It has the jumper missing.

I wonder what happens if I put the jumper in. Will all four dies respond like insane children? Why would KNC do this?

Then I look at the supplies. Two are missing, so I check resistances on the chip side. Normally I see around 7.5 ohms if chip is good, 0 if shorted (to ground). On this one I see four power supplies reading 7.5,7.5,7.5,7.5. The other four read 7.5,600 and 7.5,140 ohms.

THAT IS IMPOSSIBLE! The supply outputs are ganged together in parallel. There is no way on EARTH that the outputs can read higher resistance. What the heck is going on?

I'll pull a supply and see. But I'll also pop in a temporary jumper and see what happens.
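To put numbers on why those readings look impossible: if two outputs really sit on the same node, the meter sees their parallel combination, which can never be higher than the lowest branch. A minimal sketch, assuming the 7.5 ohm "good chip" figure from the post and treating the 600 ohm reading as a second branch (the function is illustrative, not anything from the board documentation):

```python
def parallel(*resistances_ohm: float) -> float:
    """Equivalent resistance of branches wired in parallel."""
    if any(r == 0 for r in resistances_ohm):
        return 0.0  # a dead short on any branch shorts the whole node
    return 1.0 / sum(1.0 / r for r in resistances_ohm)

good = 7.5    # typical reading with a healthy chip (from the post)
odd = 600.0   # one of the strange readings

combined = parallel(good, odd)
print(f"7.5 ohm in parallel with 600 ohm reads {combined:.2f} ohm")
print(f"7.5 ohm in parallel with a short reads {parallel(good, 0.0):.1f} ohm")
```

So a true parallel connection would pull a 600 ohm branch down to about 7.4 ohms at the probe. Actually reading 600 or 140 suggests that output is no longer tied to the others, for example an open trace or a missing part, though that is a guess until a supply is pulled.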
legendary
Activity: 2450
Merit: 1002
Maybe. I have one incinerated Y adapter, showing that even that can't make up for a bad power supply or overloading or a connector that has been plugged and unplugged with the power on (thus causing sparks which cause little bits of resistance which warm up which.....)

Still, it is also true that there seem to be these islands of stability on Titans, and if that island is at 300-325mhz then you're kind of stuck with it.

Back to work here. Have two more boards to keep me busy over the next few days.



If ya ask me, based on my single Titan hosted up in Alaska .... I believe heat is the number 1 killer of these things.... that combined w/ the fact how close these things are pushed to the edge by users.

But, my Titan has run at less than 75-80C on all DCDC's since the day it arrived in 2014 ... clocks 325mhz, volts set to -0.0366v or lower.
It did survive one PSU dying - modular connector sockets on the PSU actually burned up. Titan connectors were fine.
Other than that, it has hashed along happily the whole time, just like it did since day 1.

I could be totally wrong tho =P

But yeah, it's a hunch based off of seeing the settings and temps some users are pushing these things through. I mean YOWCH!...
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Maybe. I have one incinerated Y adapter, showing that even that can't make up for a bad power supply or overloading or a connector that has been plugged and unplugged with the power on (thus causing sparks which cause little bits of resistance which warm up which.....)

Still, it is also true that there seem to be these islands of stability on Titans, and if that island is at 300-325mhz then you're kind of stuck with it.

Back to work here. Have two more boards to keep me busy over the next few days.

copper member
Activity: 2898
Merit: 1465
Clueless!
@ Searing,  Yes but you have been running high end PSUs with your Titans the whole time. I think that makes all the difference.

  


thanks good to know.....but again likely we just plain need to replace all Y adapters after 1 year of use.....24/7 for
that much time is likely pushing it too far....so to be safe 1 year probably should be tops maybe

again the stuff at 12 months (mostly knc) seems ok ..but the 3rd party stuff I have yet is still a bit warm to the touch
also about 12 months or less ...again the 15 months stuff (4 3rd party adapters) are GONE the hottest of the lot to
the touch



so......likely figure a year max then if still mining swap them All out with NEW would probably be the wise thing to do

and as you said if like a Corsair ax 1200i platinum helps high end psu's .....well I guess it could not all be luck :)

but 15 months was just probably too damn long to have them 'not' degrade as a y adapter of any kind knc or 3rd party

anyway always remember I know zip ...just tossing this stuff out there :)
full member
Activity: 159
Merit: 108
@ Searing,  Yes but you have been running high end PSUs with your Titans the whole time. I think that makes all the difference.

   
copper member
Activity: 2898
Merit: 1465
Clueless!
Meantime I got these pictures in. This is a Titan with both jumpers in place:



I know this had all cores running at full blast because you can't set a power plug on fire like this...



With anything less than a full complement of hashing power. :-)

Side note: Do you really need to run at full power? 275mhz isn't that bad....


well I got dies that 'won't hash' on anything less than 325 and 0.0366 now ..then again I have been running them 24/7 for 15 months that way (1st titan)
because till you guys came along..that is what KNC told me....ie you have 2 dead dies ...er too bad ...with our new firmware mod set all our stuff on adv page
to 325 and reboot...voila you are getting 306.1mh yea....unfortunately now that we have gotten you to 300mh and above we WON'T RMA your dead dies for
2 reasons 1) no rma's on cubes with only 1 dead die 2) no rma's on titans that get over 300mh

evil geniuses indeed

anyway mine have run so long and so hot on the 'dark side' not sure wtf to do...again did try to set some down....the cubes did not like that one lick...put them
back at 325 and 0.0366 regular settings no issues

so I don't do electronics..I assume all the above is just due to me having dumb luck?


edit: although me reading the above posts I should probably be more on the ball with connectors....all but 4 of mine now have been up at least 12 months or so (all knc)
the 3rd party orig titan 4 y adapters..i replaced with NEW KNC Y adapters...due to getting too warm imho ..thus the replacement with the 4 knc new connectors I had laying about
unused from when KNC finally got around to sending me some 4 months later..so will watch and stay ahead of the curve for now...and plan for replacing such...but again
cool so far and no browning on the cubes or the y adapters....looked.......just another thing to watch for as these 'evil' machines' try to brick themselves ...I mean we are talking 15 months here on what I expected to last until (with difficulty if not bricking themselves) best case at my most optimistic end of life was July 2015!

Every day's a 'gift' I guess :) Thanks for the info.



full member
Activity: 133
Merit: 100
I think that the connector issue is directly related to heat.  Once you start heating up resistance goes sky high and so do the plugs temp! 
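A rough way to see the runaway: heat generated in a contact is I squared times R, so at a fixed current the heating scales directly with contact resistance. The sketch below uses made-up numbers (8 A per pin and contact resistances from 1 to 50 milliohms are assumptions for illustration, not measurements from a Titan connector).

```python
def contact_heat_watts(current_amps: float, contact_resistance_ohm: float) -> float:
    """Power dissipated in a single connector contact: P = I^2 * R."""
    return current_amps ** 2 * contact_resistance_ohm

ASSUMED_PIN_CURRENT = 8.0  # amps per pin, hypothetical

# A clean contact is on the order of a milliohm; a worn or oxidized one
# can be many times that. These values are illustrative only.
for milliohm in (1, 5, 20, 50):
    watts = contact_heat_watts(ASSUMED_PIN_CURRENT, milliohm / 1000)
    print(f"{milliohm:>2} milliohm contact -> {watts:.2f} W in the pin")
```

A few watts concentrated in one small pin is plenty to brown the plastic, and since heat and oxidation push the resistance up further, the problem feeds itself.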

you should (c) your .02 cents line.  Literally made me LOL

Thanks,

Boomin
full member
Activity: 159
Merit: 108


Side note: Do you really need to run at full power? 275mhz isn't that bad....

What do you mean "full power" .... -0.0366v stock setting?
If so, 275mhz at that voltage would cause no harm, extremely inefficient but harmless.
It's running -0.0366v or HIGHER at 325mhz on all dies in a single cube which is dangerously close to meltdown of those connectors.

In my experience (17 months running multiple Titans) I have found the burnt connector issue to be a result of a reduction of connectivity more than what speed you are running the cube at. Some of these connections have been in service for months, or now over a year, running 24/7. As the connection loses connectivity it increases in temperature even though the chip is running at its normal operating temperature at 300 Mhz. As the temperature of the connection increases it loses more connectivity, taking more and more watts from the PSU to deliver the required power through the bad connection. I have recorded temperatures at over 200 degrees from a "smoking" connection while the cube was hashing fine with normal temperatures.

My recommendation is to use only high end PSUs, preferably 80+ Platinum rated, and run them at no more than 80% loading. I have several Rosewill 1000W 80+ Platinum PSUs (not really best in class, not fully modular which disappointed me at first, but after the connection issues I have now enjoyed 50% fewer connections per power wire with them being hard wired at the PSU) running 2 cubes each, so at 65% to 70% of PSU capacity. These 1000 watt 80+ Platinum units running at 650 to 700 watts run so cool that the fans are not even spinning, while my 80+ Gold and Silver PSUs have run their fans into the ground at 70 to 85% loads, melting connections, and literally catching on fire.

You think I am kidding, right? Exaggerating, being dramatic? No, really, I mean catching on fire. I had a Gold rated 1200 Watt PSU attached to 3 cubes running at 300 Mhz, so a load of 900 to 1000 watts or about a 75 to 85% load. See photo below.

http://postimg.org/gallery/1xkfxyyei/

Obviously I don't put as much load on the PSU as I did in the beginning. Go Platinum and run at less than 80% load. If you must go Gold run at 70% or less, and if you run Silver run at 60% or less. If you are going to run Bronze or less I guess you will need to learn for yourself and won't listen to me.  I wish someone would have said to me "You just spent $$$$ on equipment that is going to run 24/7, don't skimp on the PSUs! Just get the best Platinum rated PSUs that you can, and run them as close down to 50% to 60% load as you can." If I had used Platinum rated PSUs at a 70% or less load to start with, over the last 17 months just in energy savings with equipment at a 10% greater efficiency I would have saved over $2,000.  
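Those rules of thumb are easy to turn into a quick check. This is only a sketch of the guideline as stated in this post (Platinum 80%, Gold 70%, Silver 60%); the limits are one user's experience, not a manufacturer spec, and the function names are invented for illustration.

```python
# Maximum recommended load per 80+ tier, taken from the post above.
MAX_LOAD = {"platinum": 0.80, "gold": 0.70, "silver": 0.60}

def load_fraction(draw_watts: float, rated_watts: float) -> float:
    """Fraction of the PSU's rated output being drawn."""
    return draw_watts / rated_watts

def within_guideline(tier: str, draw_watts: float, rated_watts: float) -> bool:
    """True if the load is at or below the poster's suggested limit for that tier."""
    return load_fraction(draw_watts, rated_watts) <= MAX_LOAD[tier]

# 1000 W Platinum running two cubes at about 700 W.
print(f"{load_fraction(700, 1000):.0%}", within_guideline("platinum", 700, 1000))

# 1200 W Gold running three cubes at 900 to 1000 W (the one that caught fire).
print(f"{load_fraction(900, 1200):.0%}", within_guideline("gold", 900, 1200))
print(f"{load_fraction(1000, 1200):.0%}", within_guideline("gold", 1000, 1200))
```

The Platinum setup sits at 70% and passes, while the Gold unit lands between 75% and 83%, over the suggested 70% limit at both ends of the range.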

Just my .005769 LTC    
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
.0366v, 325mhz is a tad bit... much.

Temps that cause the power supplies to go over 85c

Stuff like that.

There's a pretty big power difference between 250-275 and 325. A bit less power seems to result in fewer problems.

C

Quote
What do you mean "full power" .... -0.0366v stock setting?
If so, 275mhz at that voltage would cause no harm, extremely inefficient but harmless.
It's running -0.0366v or HIGHER at 325mhz on all dies in a single cube which is dangerously close to meltdown of those connectors.