
Topic: My BFL SC Single 60 went out in a puff of smoke - what next? - page 2. (Read 5785 times)

legendary
Activity: 3388
Merit: 4775
diamond-handed zealot
What was the ambient/room temp?

Do you know what was the temp reported by CGminer also?

I think the ambient temperature was around 24 degrees C, temperature of the miner was 60 to 65 through the day. Cases were off, fans blowing in and down.
Cheers
Just to clarify, was the case open or completely off? Were the side fans on, how were they blowing?

C

yeah, I think we found the problem
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
What was the ambient/room temp?

Do you know what was the temp reported by CGminer also?

I think the ambient temperature was around 24 degrees C, temperature of the miner was 60 to 65 through the day. Cases were off, fans blowing in and down.
Cheers
Just to clarify, was the case open or completely off? Were the side fans on, how were they blowing?

C
full member
Activity: 237
Merit: 100
What was the ambient/room temp?

Do you know what was the temp reported by CGminer also?

I think the ambient temperature was around 24 degrees C, temperature of the miner was 60 to 65 through the day. Cases were off, fans blowing in and down.
Cheers
sr. member
Activity: 462
Merit: 250
What was the ambient/room temp?

Do you know what was the temp reported by CGminer also?
hero member
Activity: 924
Merit: 1000
The unit is on its way to cranky4u, hoping he can fix it. I decided not to return to BFL due to the time/cost for shipping and customs delays. My fingers are crossed. :)

Wish you luck for a fast fix and back to happy mining.
full member
Activity: 237
Merit: 100
just wondering about the cooling for your 60g. was it in an aircon room ?
It was not air conditioned
full member
Activity: 237
Merit: 100
The unit is on its way to cranky4u, hoping he can fix it. I decided not to return to BFL due to the time/cost for shipping and customs delays. My fingers are crossed. :)
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
I think there's probably a reason BFL now sell these as 50GH/s mining machines instead of 60GH/s ones.

https://products.butterflylabs.com/homepage/50-gh-s-bitcoin-miner.html

There is a lifetime warranty, I'd send it back.

There is far too much thermal paste or pads on those ASICs; it looks like cake!


Mmmhm. I am not going to exceed 24-27gh on my jally single based on this. Can't wait to get another chip so I can see if the temps on the FETs go up linearly or geometrically with load.

The problem with the warranty though is that they should be able to replace your unit. And if it takes 2 weeks to do that, it's two weeks of not hashing at current difficulty, the result may be a 20% hit to your profits or more based on diff changes.

That could be fixed by BFL offering some sort of conditional cloud hashing during the time your unit is being fixed. Man that would be a public relations *WIN*! Say when you get an RMA you get put into a special hashing queue where your hashes are immediately escrowed while you send the unit in.

If you didn't mod it, and they fix it, they send it back and when you take delivery they stop the mining in their cloud and give the money to you. If you did mod it, they keep the hash money.

That is a no-lose solution, someone suggest it to BFL. In fact that would be the biggest PR win in this kind of community and it would not cost BFL much of anything to do.....
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Just a side note, from what I am seeing the two 30gh sides of the 60gh unit may be independent. In other words, failing that burned side should not affect the right side's 1 volt supplies.

Or, remove those FETs, fire it up (with fuses in the power supply lines) and you should be able to restart hashing at 25-30gh on the remaining side. Half a loaf is better than none and all that. I'd say 75% probability of this working.

Good luck!
C
legendary
Activity: 3388
Merit: 4775
diamond-handed zealot
I milled the output plate of my 60 yesterday so it is just a thin frame around the outside to hold the case together. It cut down the whooshing air noise, and where before the input fan would run all the time, now it throttles in and out, so it is definitely cooling more efficiently.

The thermal pads in the pics I have seen do look very thick, I may rework those in the future, also thinking about some thicker 92mm Noctua fans for on the heatsinks themselves might really cut the noise...but I hate the downtime to implement, lol.
legendary
Activity: 2128
Merit: 1002
just wondering about the cooling for your 60g. was it in an aircon room ?
hero member
Activity: 490
Merit: 500
I think there's probably a reason BFL now sell these as 50GH/s mining machines instead of 60GH/s ones.

https://products.butterflylabs.com/homepage/50-gh-s-bitcoin-miner.html

There is a lifetime warranty, I'd send it back.

There is far too much thermal paste or pads on those ASICs; it looks like cake!

full member
Activity: 237
Merit: 100
Thanks everyone - I have a solder/hot air rework station but I agree it may be better to have someone more experienced look at it before I break something else.
I have got in contact with Cranky4u who has generously offered to have a look at it for me next week. Fingers crossed.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
hope you're right lightfoot, I'm just going on my spotty success replacing the VRM FETS on videocards and motherboards; sometimes it works, sometimes it doesn't, and sometimes the FETs smoke instantly when you apply power.
Yup. Sometimes you're lucky and it just nuked the FET. However since the FET gates on low voltage stuff like this are rarely driven by optoisolators and isolated DC-DC's, it usually takes out the drive circuits as well. IGBTs in high voltage systems are typically driven that way, which firewalls the damage to the IGBT, the resistors between gate and collector, and the um... circuit that snoops collector to emitter voltage and sends the HOLY FUCK EC voltage is going through the roof! WE HAVE A SHORT CROWBAR GATE NOW! thing.

The latter is usually what saves that 50kw fire that results when one side of your IGBT shorts. The crowbar sees the voltage from C-E screaming up (which means high current draws), forces the gate down to ground and lights an optoisolator to light up the "you're fucked" light on the dash.

Then if you're a super-special person, you say "No I'm not", bypass the crowbar circuit, fire up, and send me a box that looks like someone used a flamethrower in it. Been there, dealt with that. :-)

Quote
I think OP should definitely get ahold of Cranky4u if he doesn't feel comfortable soldering it himself
Indeed. Looking at the specs for these FETs, I think they have a solder pad under them that is where the heat's supposed to go. Hot air is probably the best way (400 °C is what seems to work with my Aoyue unit) to float these off. Not hard, just need the right tools. But they're small, so a pair of soldering irons, one on each end, might work as well.

Never dull.

legendary
Activity: 3388
Merit: 4775
diamond-handed zealot
hope you're right lightfoot, I'm just going on my spotty success replacing the VRM FETS on videocards and motherboards; sometimes it works, sometimes it doesn't, and sometimes the FETs smoke instantly when you apply power.

I think OP should definitely get ahold of Cranky4u if he doesn't feel comfortable soldering it himself
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
So.... What to do? Note all of this is from a complete random stranger on the Internet dreaming about stuff, so disclaimer, disclaimer, don't do any of this in real life, you're responsible for your own universe, this is just me.

If *I* had a board like this....

Bit of knowledge about the BFL design: If the 60 single boards are like the little_single, then the power supplies for the FETs are connected together and connected to the 8 chips that run on the board. What I don't know is if the left side 8 chips are powered by the left two power converters (the FETs that went foom) independently of the right side 8 chips. They may all be on one bus, or two.

Update: Based on reviews of the SC30, it looks like the two sides run separate chip power converters. That is good and bad from a design stage, but it can explain how one can lose half their hashing power.

Anyway, what's the damage? Well, it could be limited to the FETs. I would say scrape off the wreckage, hope the pads behind them are still good (test for shorts and opens to the other FET banks) and, if there are no shorts, try firing it with a regulated 12 volt supply (without any mining software, just see if the chips come up with the remaining 3 supplies). If they do, then either lobotomize 4 chips on that side to be safe by turning them off in software (making a 45gh unit tops) or, if this thing has all the power supplied in parallel, lobotomize six chips to keep the remaining three supplies from overloading (assuming they also have some damage). And make sure your power supply's 12 volt source has a *FUSE*: wire in a 12 amp fuse *TOPS* (12 watt*12 = 144 watt, 144 watt / 12 volts = 12 amps), and since you blew things, start with a much smaller one. If you blow the fuse without hashing, you have a board short, start digging.
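For anyone who wants the fuse arithmetic spelled out, here is a minimal Python sketch. It only restates the numbers above (12 watt*12 = 144 watt on a 12 volt supply); the function name and the lighter "no hashing" load figure are made up for illustration, not measured on a real board.

```python
def fuse_ceiling_amps(load_watts: float, supply_volts: float) -> float:
    """Largest sensible fuse: the current the load draws at full power (I = P / V)."""
    if supply_volts <= 0:
        raise ValueError("supply_volts must be positive")
    return load_watts / supply_volts


# Figures from the post: 12 watt * 12 = 144 watt, fed from the 12 volt rail.
full_load_watts = 12 * 12
max_fuse = fuse_ceiling_amps(full_load_watts, 12.0)
print(f"fuse ceiling: {max_fuse:.0f} A")

# Hypothetical first power-up with no hashing: assume a much lighter load.
# 24 W is an assumed placeholder, not a measured idle draw.
assumed_idle_watts = 24.0
first_try = fuse_ceiling_amps(assumed_idle_watts, 12.0)
print(f"first power-up fuse: about {first_try:.0f} A")
```

The point of starting small is that a fuse sized for full load will happily pass enough current to cook a damaged board before it opens.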

That's the fast way. Sure it's 35-45gh instead of 60, but half a loaf is better than none, all that. Peter if you have SMD soldering skills and you were me, you might want to try taking off all six FETs, check for shorts on the 1 volt to ground and 12 volt to ground lines, put in the power supply fuse on your lines,  and see if anything works (you need to remove both sides because they might be back-feeding the oscillator)

What's the worst that could happen?

If you want to fix this whole power supply it's going to be harder. My experience is that this will blow the FETs and possibly the gate driver chips. I don't know if they have 1 gate driver powering all twelve chips on a side, or two sets of gate drivers. If the driver is blown and they have two drivers per side, the other side should still work and you can power 12 chips. If the sides are isolated and they have one set per side you can run 8. If the sides are not isolated and they use one honking driver to power everything then you're screwed till you replace the driver (a 2708 chip on the bigger controllers, look for something like that in the chip manifest).

What's the worst that could happen?

If the FET pads are warped, you might not be able to get it going without bypassing the 1 volt lines or re-routing power from the 12 volt source supply. That's way beyond the scope of my ability to help from the other side of the earth.

So anyway, a fast potential solution that I would use to get me something would be:

Remove the wreckage with a good pair of soldering irons. Don't go with insane amounts of heat; the caramelized board will conduct heat faster and melt more than normal.

Check for shorts

Try powering it up without hashing with a fuse in the supply.

Reprogram it to only run 8 chips instead of 16.

Try hashing.

Add a few more chips, do not exceed 11 of them if you want it to last.

Good luck. Time is of the essence since difficulty keeps going up and time waits for no man in Bitcoin world. Having 30gh running now is better than 60 in a month's time. Good luck and keep us posted.
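A rough way to sanity-check the "30gh now versus 60gh in a month" trade-off is to add up hashrate divided by difficulty over the days each option is actually running. This is only a sketch under loud assumptions: difficulty is modeled as growing by a fixed percentage every day, the 2% per day figure and the 60 day horizon are invented for illustration, and the result is in arbitrary units, so only the comparison between scenarios means anything.

```python
def relative_earnings(ghs: float, start_day: int, horizon_days: int,
                      growth_per_day: float) -> float:
    """Sum hashrate / difficulty for every day the unit is hashing.

    Difficulty on day d is modeled as (1 + growth_per_day) ** d, which is an
    assumption, not how the network actually retargets.
    """
    total = 0.0
    for day in range(start_day, horizon_days):
        difficulty = (1.0 + growth_per_day) ** day
        total += ghs / difficulty
    return total


# Assumed inputs, purely illustrative.
growth = 0.02   # 2% difficulty growth per day (made up)
horizon = 60    # days we care about (made up)

half_now = relative_earnings(30, start_day=0, horizon_days=horizon, growth_per_day=growth)
full_later = relative_earnings(60, start_day=30, horizon_days=horizon, growth_per_day=growth)
print(f"30gh from day 0:  {half_now:.1f}")
print(f"60gh from day 30: {full_later:.1f}")
```

With zero growth the two options tie over 60 days; any positive growth tips it toward hashing now. Stretch the horizon far enough and the bigger unit can catch up, so the answer depends on how fast you think difficulty is climbing.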

And thank you for sharing this: It's given me a chance to think through a problem, which to me is always fun. If anyone else finds this useful and it makes them money do me a favor: Donate some of that to your local soup kitchen in the name of "Bob Dobbs". That way my rantings here make the world a bit better or something like that.

C
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
yup, VRM

question now is, did they fail because the ASIC failed and presented them with a bad load?  or did they take anything else out when they went?
Sharing my experience with a shorted chip, I can say it was probably not that.

Specifically as I add chips to my jally, the first chip I added I accidentally shorted some pins so +1 was connected to gnd. Happens when you move the chip while placing it hot. Anyway, the symptom (documented in the forum) was that the unit would "fast flash" on startup and I thought I was fucked. Removing the chip allowed the unit to start normally.

What seems to happen is that if +1 gets hard shorted, the board detects it (probably in hardware for speed) and the oscillator that gates the FETs shuts down. This immediately will collapse the voltage on the +1 rail to zero, and the board does its "fast flash" dance. No damage. I think BTW this is why some people who take the heat sink off their jally to reprogram it get the fast flash; they torque down the damn heat sink too much and crush the chips into the board. Short develops, jally don't work no more. Solution would be to pull the chips and replace them, board is probably ok. But that's jally 101, we're in single 60 land.

Anyway, in this case it looks like running temps on the FETs have been high for some time. FETs don't share loads equally as temps go up; that's why on the better/bigger/more expensive electric car controllers you use a big 300-600 amp IGBT instead of a bank of twenty 30 amp FETs in parallel. But IGBTs cost large money and need more expensive drivers around their 2708's, but I digress. Moral is when a Curtis golf cart controller is used to drive a small car the FETs heat unevenly and the hot one is the first to short. When you short 600 amps at 150 volts, hilarity ensues as the other FETs blow up around it. :-)

I would guess that Q4 and Q(burned to a crisp) were heating up together, with Qcrisp finally failing shorted. That would probably also fail Q12 and provide a dead short from +12 to ground and blow the fuse in the power supply. End of story, board down.

What caused the total warping though on Qtoast is hooking up a bigger 12 volt supply. Since it's shorted, all current from the supply would shoot through Q4/Q12, and that would basically be a resistive load. It would heat up until either the silicon melted (which it did, letting out a lot of smoke) or the power supply fuse blew. The board couldn't stop it because the FETs were shorted, so the gates could not be opened under programmatic control. Foom.

I've seen this happen on a 300 amp IGBT; the main pack fuse opened, but in the brief meantime there was about 90,000 watts of heat being generated in that IGBT. This is why they have big big big heat sinks. And why you have DC rated fuses, if someone was stupid enough to put an AC rated fuse in or bypass the fuse you would have a 90kw air heater in your aluminum box. Hilarity would... ensue...
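The 90kw number is just P = V * I. A quick Python sketch of that arithmetic, with the caveats up front: the post gives the current (300 amps) and the power (90,000 watts), so the roughly 300 volt figure is back-calculated rather than stated, and the fuse clearing time below is a made-up value to show why even a "brief meantime" matters.

```python
def fault_power_watts(volts: float, amps: float) -> float:
    """Heat dissipated in a shorted device: P = V * I."""
    return volts * amps


def fault_energy_joules(volts: float, amps: float, clearing_time_s: float) -> float:
    """Energy dumped into the device before the fuse opens: E = P * t."""
    return fault_power_watts(volts, amps) * clearing_time_s


# Back-calculated from the post: 90,000 watts at 300 amps implies about 300 volts.
implied_volts = 90_000 / 300
print(f"implied voltage: {implied_volts:.0f} V")

# The golf cart controller example earlier in this post: 600 amps at 150 volts.
print(f"golf cart short: {fault_power_watts(150, 600):.0f} W")

# Hypothetical clearing time, for illustration only (not from the post).
assumed_clearing_time_s = 0.01
energy = fault_energy_joules(implied_volts, 300, assumed_clearing_time_s)
print(f"energy before the fuse opens: {energy:.0f} J")
```

Both examples in the post land on the same 90kw, which is presumably why the heat sinks are so big and why the fuse has to be DC rated.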

What to do? I think there's a way forward, I'll post that next. Peter took the time to post this intel, I'll post a possible fix. Note I take no responsibility for *ANYTHING*, this is all pure speculation that I would never expect anyone to try in any way, shape, or form.

C
legendary
Activity: 1554
Merit: 1002
Build date 9-11.... no conspiracy theorists jumping on that one yet? :D

3rd image bottom right "fun pass" fun for who? BFL because of all the money?

anyhow it looks f***d XD and a bit beyond my ability XD. that said, it doesn't look like it's damaged anything else other than the 2 chips, so it doesn't look like the voltage got any further, meaning they did their jobs and failed (probably failed a bit too well and made a nice mess)
legendary
Activity: 1022
Merit: 1000
BitMinter
Build date 9-11.... no conspiracy theorists jumping on that one yet? :D

Saw that too. Suicide single of death :P
DrG
legendary
Activity: 2086
Merit: 1035
Build date 9-11.... no conspiracy theorists jumping on that one yet? :D