
Topic: Hacking BFL Monarchs and servicing them while times are weird. - page 8. (Read 21276 times)

legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
I'll try to take one over the weekend. The unit is back together and hashing.

Very interesting tidbit: It runs a bit hotter/faster than mine (730gh vs. 700) and I noticed it cut out while mining. When it did I heard the "USB device disconnect and reconnect" sound which was very unusual.

Researched: I'm running both miners on Corsair CX500 supplies, which are being pushed to the edge. Apparently the extra draw on this unit is just pushing it over on the CX, which causes a power sag after a few hours.

It's on a CX750M now, seems to be happier. I also noticed the fans are running a bit faster, so it was probably pulling the 12 volt rail below 11 volts, which means "on the edge dude". So if you have a Monarch on a CX500 class power supply and it's dropping off after a few hours try a bigger power supply.

Back to work. And back to DigiKey for an order of the low side FETs for this other board. Might also throw in a FET driver as well. Still not sure why it is back-feeding power through the low side FETs on channel 3 only, really really weird.

C
donator
Activity: 4760
Merit: 4323
So if you have a Monarch that isn't running, check the right side water block. It might be shorting a power line.

Pics?
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Second one just came in from England. Very interesting, didn't work on arrival, would power up but did not hash. Running Chiliflash on it to do a ZCX showed the board as running but with 0 chips.

Got it in, powered up. Same situation. Checked the voltages at the FET chokes: .6 volts, solid as a rock. Chips were cold on the back. So something was wrong.

Removed the back heat sink and noted the insulator had slipped a bit. Put it back on with the same old heat sink compound, fired the board up. All engines showed up, 750gh potential.

Looks like what happened is the heat sink shifted slightly or something and was shorting out the 1.2 volt power supply to the Monarch chips. Like the Singles, the Monarchs appear to have at least two power supplies, one for the hashing engines (the big .6 volt ones) and one for the hotel circuits since you can't do signal switching on .6v.

Anyway, fired it up and it started hashing but the back of the board under the Monarch chip started to get hot. Bad thermal coupling. Pulled the water block again, cleaned the chip and block with 95% isopropyl and then the 1/2 cleaning stuff for heat sinks from Radio Shack and put on a thin layer of AS5. Put block on, GENTLY screwed down the screws in a criss-cross pattern to even out the torque, and fired it up. 30c on each side, got the thermal interconnection.

Seems to run, ran for 15 minutes at 700gh. Will once again screw with it some more tomorrow.

So if you have a Monarch that isn't running, check the right side water block. It might be shorting a power line.

Good luck and mine like hell.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Update: Two Monarchs have come in for servicing under the feed a homeless person program. And it looks like people will be fed this T-day. Good.

The first one was a 500gh Monarch that had the coolant leak out of it and was shorting out power supplies. Blown FETs as discussed. I've tried to put on new little metal ones, but I am running into three problems:

1) These boards are *heavy*. As in so much copper that they literally dissipate heat from my air tools faster than I can put it down. Even with a full 380F of pre-heat with a 15 minute preheat time *and* pulling off the back heat sink *and* using Kester liquid flux *AND* running my heat wand at 425C I can barely get the metal FETs off.

2) I haven't figured out quite how to put them back on; the problem is the gate and source pins are *under* the FET and heating through those FETs with this much heat isn't the best of ideas. I might need a bigger preheater.

3) Did I mention these boards are heavy?

So I did what I usually do: I tried something else. BFL left on the old T-MAX pins for the bigger more traditional FETs, so I went to the Digi-Key cupboard and gave some a try.

I used two types of high frequency FETs on the Jalapenos: 052NE3LS types and CSD17506 types. The 17507s that were typically used had high gate capacitance, and running them in parallel was kind of a bad idea. 17506's and 052's have much lower capacitance, with the 052's trading some gate values for more power handling.

Bad idea here: Those Intersil drivers are running the FETs at way higher frequencies for power balancing. Gate float, blew the 052's. Boom.

Next up: 17506's. Put them on and even though they have lower on current max values they spend most of their time in transition switching and can get in and out of the death zone much faster. They don't even get *WARM* for Christ's sake.

So running 6 of them on the right side chip is giving me 275gh at 28c. Very cool, very smooth, running well. The left side is still shorted from the water damage, need to work on that a bit more to see what's up.

Now you may be thinking "How is he cooling the chip with a broken water block". Well...

BWAHAHAHAHAHA!!!!!!!!!



Yep. That's an old Single/50 heat sink on a Monarch with a Single fan on a stand-off. I actually used the little plate on the back as a gauge, went into the shed, and used the drill press to drill four holes around the edge, then tapped them with the 3mm tap. Then mounted it to the chip with AS5 heat sink compound and screwed it in the back with screws using the little springs to maintain tension without cracking the chip. The other side doesn't need a sink because the power FETs are not working (and three are removed).

Yep. It works. The back of the board behind the chip is reading a bit warm at 50c, but the heat sink is also reading 50c which means it's transferring heat optimally. And oddly enough it works, I haven't run it for more than 30 minutes but it is quite thermally stable.

I'm going to give Cool-IT another few days to respond, then try fixing the leak by potting the water block inside the housing. That should do it since the pressures are low, but it's water, so who knows.

More later. Moral: They can be fixed. Now you know how.
hero member
Activity: 532
Merit: 500
Ok. Well I got the new FETs in and decided to take the old ones off this board. Symptom was a .3 ohm resistance on the 12 volt line instead of the normal 300 or so ohms.

Started pulling FETs, then found they were not coming off. Fuck these things are small. Too small for my normal picker, I'm going to need a special nozzle. Then I realized my pre-heater was broken.

*grumble* Took pre-heater apart, wire had broken at the phenolic junction between normal wires and the nichrome type wire in the heater. Fixed it, back in business.

Here's a little thought if you think you can burn these FETs off with just hot air: Forget it. In order to remove them without damaging the board, you have to take the board to 380F pre-heat, then sit on them at 450c air for *30* seconds each. Say what you want about the board, but man does it pull HEAT away from these FETs.

Started pulling, did the left 6 and no change to the resistance. 7-10 same thing, started feeling really grumpy because I have to PUT THESE BACK! Then I did #11. Instantly resistance went to 300 ohms. And the underside of the FET was bad. Looks like I found the bad one.

Now to let things cool, then put on new fets tomorrow. I'm going to need flux for this one, these are going to be hell to solder back on. But do-able, and I have now proven that FET shorts are what shut down power supplies.

Onward. We're getting there. By the way if anyone else wants to follow along give it a go!

C
If you want to invest a little in equipment, I'd recommend a thermal imager. They're incredible tools for debugging these kinds of problems since they can detect <1C temperature differences. Solder on a couple wires and use a current limited power supply, and you can see quite quickly which fet is the one that's shorted.
Along these lines, if you have a constant current power supply and apply ~2-3v to the PCIe pins you can see the shorted FET glow under IR, saving the time and trouble of removing good FETs.
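
A rough Ohm's-law sketch of why the current limit matters for this trick. It assumes the short really sits near the 0.3 ohm reading reported in this thread and ignores trace and lead resistance; the current limits below are made-up illustration values, not recommendations.

```python
# Rough Ohm's-law sketch: heat dissipated in a shorted FET when it is
# fed from a current-limited bench supply.
# R_SHORT_OHMS is the 0.3 ohm reading reported in this thread; the
# current limits in the demo loop are hypothetical illustration values.

R_SHORT_OHMS = 0.3


def short_power_watts(current_amps: float, resistance_ohms: float = R_SHORT_OHMS) -> float:
    """Power dissipated in the short: P = I^2 * R."""
    return current_amps ** 2 * resistance_ohms


def unlimited_current_amps(voltage: float, resistance_ohms: float = R_SHORT_OHMS) -> float:
    """Current the short would pull with no limit set: I = V / R."""
    return voltage / resistance_ohms


if __name__ == "__main__":
    for limit in (0.5, 1.0, 2.0, 3.0):
        print(f"{limit:.1f} A limit -> {short_power_watts(limit):.2f} W in the short")
    print(f"2.5 V with no limit -> {unlimited_current_amps(2.5):.1f} A")
```

Even a watt or so concentrated in one tiny package should stand out on an imager that resolves under 1C, while a supply with no limit set would try to push several amps through the short.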
legendary
Activity: 1274
Merit: 1004
Might be a good idea, although I just burned my finger on a FET. On the positive side I have .598 volts on the chip now, however at least one of the six FETs is not placed right (sparked due to improper solder joint on the inside pins) and another one was drawing more current than it should.

Still, power is back on one side, we have control. I'll put this to bed now and look at it tomorrow, but these FETs are way more difficult to reflow solder than TO series parts. I might have to flux both the board and the fet pins on the inside to get enough heat transfer, and even that might not be enough.

I might need a full blast IR preheater that can take the board to molten Pb free temps. Time to check Ebay, anyone else got a good recommendation for a rework heater/reflow unit?

These are seriously high heat transfer components. Oh well, what better way to learn?
I use one of these, and it works quite well especially given the price.
http://www.circuitspecialists.com/bk7050.html
It even has a snazzy little adjustable holder for the air wand, so you can warm it up, set up the wand and then just hold the tweezers when it's time to pull it off.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Might be a good idea, although I just burned my finger on a FET. On the positive side I have .598 volts on the chip now, however at least one of the six FETs is not placed right (sparked due to improper solder joint on the inside pins) and another one was drawing more current than it should.

Still, power is back on one side, we have control. I'll put this to bed now and look at it tomorrow, but these FETs are way more difficult to reflow solder than TO series parts. I might have to flux both the board and the fet pins on the inside to get enough heat transfer, and even that might not be enough.

I might need a full blast IR preheater that can take the board to molten Pb free temps. Time to check Ebay, anyone else got a good recommendation for a rework heater/reflow unit?

These are seriously high heat transfer components. Oh well, what better way to learn?
legendary
Activity: 1274
Merit: 1004
Ok. Well I got the new FETs in and decided to take the old ones off this board. Symptom was a .3 ohm resistance on the 12 volt line instead of the normal 300 or so ohms.

Started pulling FETs, then found they were not coming off. Fuck these things are small. Too small for my normal picker, I'm going to need a special nozzle. Then I realized my pre-heater was broken.

*grumble* Took pre-heater apart, wire had broken at the phenolic junction between normal wires and the nichrome type wire in the heater. Fixed it, back in business.

Here's a little thought if you think you can burn these FETs off with just hot air: Forget it. In order to remove them without damaging the board, you have to take the board to 380F pre-heat, then sit on them at 450c air for *30* seconds each. Say what you want about the board, but man does it pull HEAT away from these FETs.

Started pulling, did the left 6 and no change to the resistance. 7-10 same thing, started feeling really grumpy because I have to PUT THESE BACK! Then I did #11. Instantly resistance went to 300 ohms. And the underside of the FET was bad. Looks like I found the bad one.

Now to let things cool, then put on new fets tomorrow. I'm going to need flux for this one, these are going to be hell to solder back on. But do-able, and I have now proven that FET shorts are what shut down power supplies.

Onward. We're getting there. By the way if anyone else wants to follow along give it a go!

C
If you want to invest a little in equipment, I'd recommend a thermal imager. They're incredible tools for debugging these kinds of problems since they can detect <1C temperature differences. Solder on a couple wires and use a current limited power supply, and you can see quite quickly which fet is the one that's shorted.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Ok. Well I got the new FETs in and decided to take the old ones off this board. Symptom was a .3 ohm resistance on the 12 volt line instead of the normal 300 or so ohms.

Started pulling FETs, then found they were not coming off. Fuck these things are small. Too small for my normal picker, I'm going to need a special nozzle. Then I realized my pre-heater was broken.

*grumble* Took pre-heater apart, wire had broken at the phenolic junction between normal wires and the nichrome type wire in the heater. Fixed it, back in business.

Here's a little thought if you think you can burn these FETs off with just hot air: Forget it. In order to remove them without damaging the board, you have to take the board to 380F pre-heat, then sit on them at 450c air for *30* seconds each. Say what you want about the board, but man does it pull HEAT away from these FETs.

Started pulling, did the left 6 and no change to the resistance. 7-10 same thing, started feeling really grumpy because I have to PUT THESE BACK! Then I did #11. Instantly resistance went to 300 ohms. And the underside of the FET was bad. Looks like I found the bad one.

Now to let things cool, then put on new fets tomorrow. I'm going to need flux for this one, these are going to be hell to solder back on. But do-able, and I have now proven that FET shorts are what shut down power supplies.

Onward. We're getting there. By the way if anyone else wants to follow along give it a go!

C
member
Activity: 89
Merit: 11
That's very interesting. So you didn't try to program it, but you hooked up the unit. Did you try reading the code? Did you get anything at all?

The reason it's el-interesting-o is that there's a little user page for data and the official page for code and all that stuff on the Atmels. You might have cleared the data field when you tried to read the code. Now, normally that would spell fuck-ola, but what if BFL used the same code load and put the "run at lower speeds" commands in the data field. By clearing it you got a default monarch (which I think has some sort of speed control like the Single/60's had, there is no way my Monarch comes up at 699gh every time)

Well, I got no firmware to program it with. Except for BFL FW from last year... I'm not entirely sure when the "reset/overclock" happened, but I think it was when I powered on the Dragon without doing anything. Or maybe when I read the chip info. Definitely the change happened before I tried to read the firmware.

And the firmware has the "security bit" set, so there's no way to read it with the Atmel Studio. Trying to read it just gives the security bit message.

Just power cycled the "550 GH" to try to get it to lower speed again but to no avail. At 700 GH indicated by cgminer the unit eats up 470W and my current psu for it is 500W, so I'm very close to danger zone again. I'd much rather have 650 GH @ 420W. But I did point more powerful fans towards the fets yesterday. The faster unit indicates 58-60 C and the slower one 50-52 C which at least sounds fine. There could still be components crying in pain but I just wouldn't know about them.
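
The numbers in that post can be turned into a quick headroom check. This is only arithmetic on the quoted figures (470W at 700 GH, a 500W supply, and the preferred 650 GH at 420W); it treats the PSU label rating as the limit, which is an assumption, since the usable 12 volt rail capacity may be lower.

```python
# PSU headroom and efficiency from the figures quoted in the post above.
# Assumption: the PSU's label wattage is the limit; the real 12 V rail
# rating may be lower, so actual headroom could be smaller.


def psu_load_fraction(draw_watts: float, psu_rating_watts: float) -> float:
    """Fraction of the PSU rating consumed by the miner."""
    return draw_watts / psu_rating_watts


def watts_per_gh(draw_watts: float, hashrate_gh: float) -> float:
    """Power cost of each GH/s of hashrate."""
    return draw_watts / hashrate_gh


if __name__ == "__main__":
    psu_rating = 500.0
    # (hashrate in GH/s, draw in W) pairs taken from the post
    for hashrate, draw in ((700.0, 470.0), (650.0, 420.0)):
        load = psu_load_fraction(draw, psu_rating)
        print(f"{hashrate:.0f} GH @ {draw:.0f} W: "
              f"{load:.0%} of PSU rating, {watts_per_gh(draw, hashrate):.3f} W/GH")
```

At 470W the unit sits at about 94% of a 500W rating, which fits the "on the edge" behaviour described for the CX500 elsewhere in the thread.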
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Oh and put a nice strong fan in front of those FETs. They can drop heat both ways, take the heat off the top as well as the back and watch the back temps. Is it hashing faster?

Thinking about it, you may have found the way to speed up Monarchs. :-)

C
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
And today I gathered all my courage and tried it on my 550 GH Monarch. I was surprised to find the same Atmel controller on the Monarch that the Jallies had. And when I powered up the Dragon the Monarch somehow reset itself and overclocked itself from 650 to 710 GH. It was a bit of a terrifying experience when cgminer started spewing errors when I browsed through the device programming pages on the Atmel Studio. But yeah, the Molly didn't brick itself as a revenge for trying to read the secured firmware.

So what's next? We're waiting for somebody to post the firmware source code to wikileaks?
That's very interesting. So you didn't try to program it, but you hooked up the unit. Did you try reading the code? Did you get anything at all?

The reason it's el-interesting-o is that there's a little user page for data and the official page for code and all that stuff on the Atmels. You might have cleared the data field when you tried to read the code. Now, normally that would spell fuck-ola, but what if BFL used the same code load and put the "run at lower speeds" commands in the data field. By clearing it you got a default monarch (which I think has some sort of speed control like the Single/60's had, there is no way my Monarch comes up at 699gh every time)

Hm. There is a way to change the clock speed. Which means they needed to store state. And the chipset on there does support dynamic setting of voltage points, that's in the Intersil documentation. How can I write commands out to the USB port using something as dumb as HYPERTERM or PUTTY? Is it possible to just connect to the USB serial port and say "ZCX", "ZAA", "ZAB", etc and just fuzz the damn thing until we trip over something?
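
A minimal sketch of what that kind of sweep could look like, under loud assumptions: that commands are three ASCII capitals starting with "Z" (only "ZCX" is actually named in this thread as doing something), and that the device answers within one short read. Baud rate and framing are unknown, so the code takes any port-like object rather than opening a real one; `FakePort` is a made-up stand-in for a dry run.

```python
# Sketch of a three-letter command sweep over a serial-style port.
# Assumptions (not confirmed anywhere in this thread): commands are
# three ASCII capitals starting with "Z", and the device replies
# within one short read. Baud rate and framing are unknown.
import itertools
import string


def z_commands():
    """Yield ZAA, ZAB, ..., ZZZ as bytes."""
    for a, b in itertools.product(string.ascii_uppercase, repeat=2):
        yield f"Z{a}{b}".encode("ascii")


def sweep(port, commands, read_size=64):
    """Send each command and collect any non-empty reply.

    `port` is any object with write(bytes) and read(int) -> bytes,
    for example a pyserial Serial object opened with a timeout
    (hypothetical setup, the right settings are not known here).
    """
    replies = {}
    for cmd in commands:
        port.write(cmd)
        data = port.read(read_size)
        if data:
            replies[cmd] = data
    return replies


class FakePort:
    """Stand-in device for a dry run: only answers ZCX."""

    def __init__(self):
        self._last = b""

    def write(self, data):
        self._last = data
        return len(data)

    def read(self, size):
        return b"OK"[:size] if self._last == b"ZCX" else b""


if __name__ == "__main__":
    print(sweep(FakePort(), z_commands()))
```

Against real hardware a blind sweep could hit a command that changes settings, so it is probably wise to log every command sent and keep the power draw in view while it runs.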

legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Well, here's a Monarch pic for everyone.



This is a view inside the pump from a leaking Monarch that came in to me from someone. The water was leaking, and the Monarch is dead. Dead part is simple: I can see that the FETs on the left side are shorted, the gate to drain resistance is 0 (ie: short) while the gate to drain on the other one is 50 ohms (good gate).

So I took off the water block, opened it up and found it flooded. Cleaned it out, put water in the unit by taking off the right side hose, and that's what I saw.

Those bubbles are indicative of a water leak. Looks to be right at the junction in the pump where the water pipe goes into the pump. I am not certain of course, but it looks like a defect in the pump.

Interesting. Anyone else got a monarch that's dripping? Is it coming from inside the pump housing? Which side?

C
member
Activity: 89
Merit: 11
Chiliflash. Basically it's a serial app that sends a ZCX and reads back status. I've never been able to talk to these things with a serial program like PUTTY, never know the baud rate or stop bits. Oh well, someday.

I haven't tried to download the code yet, if the ROMs are protected that won't work. I don't think the Atmel will brick if you just try to read, I'll give it a shot tomorrow after running for a day with no problems.

Haven't tried chiliflash, but yesterday I tried my AVR Dragon and Atmel Studio for the first time ever. I bought a Jalapeno again just for testing it and flashed in a 1.29 firmware to try it out.

And today I gathered all my courage and tried it on my 550 GH Monarch. I was surprised to find the same Atmel controller on the Monarch that the Jallies had. And when I powered up the Dragon the Monarch somehow reset itself and overclocked itself from 650 to 710 GH. It was a bit of a terrifying experience when cgminer started spewing errors when I browsed through the device programming pages on the Atmel Studio. But yeah, the Molly didn't brick itself as a revenge for trying to read the secured firmware.

So what's next? We're waiting for somebody to post the firmware source code to wikileaks?
hero member
Activity: 568
Merit: 500
Nice, I hope the bfgm shut-offs are over here, seeing how it goes so far I'm confident. 400, 425 and 475GH rated, doing well. Go 4.2.0 man



Must send the cap one back....must se....must sssss...
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Yep. I will say that even after rebooting with their specific OS it works great. Which is mildly odd, but odd is odd. I really should just install 64 bit Windows 8 on my mining laptop, it's one of those "one of those days" things.

Meantime 700gh, solid as a rock now.

C
hero member
Activity: 568
Merit: 500
Lightfoot you really want a 64bit OS and BFL's custom build 4.2.0 BFGMiner, all the errors from the screenshots I posted are gone and speed is 10% up. I'm running win7pro 64bit on an amd3200+ 1GB ram laptop. Too bad my hp mini's hardware can't handle 64bit.

hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.
Well, with BFL under this cloud of stupidity, I realized that there's nowhere to send RMAs. Given that I own a Monarch and have taken it apart a fair bit, I'll make this offer to the community:

If your Monarch shuts down, blows up, or ceases to hash post here and let me know. I'll send you my address, and I'll fix the thing for you. Right now what I can see failing is the FETs, which I can fix as I have pulled a lot of them on the 65nm gear.

I have the tools, I have the cred, and I want to help the people who have these units. Price is free or whatever you want to pay me for this. I'd rather help than scalp.

Post wherever, this is an open offer. Let's pull together and mine like hell.

Lightfoot


+1++ cred
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
Well, with BFL under this cloud of stupidity, I realized that there's nowhere to send RMAs. Given that I own a Monarch and have taken it apart a fair bit, I'll make this offer to the community:

If your Monarch shuts down, blows up, or ceases to hash post here and let me know. I'll send you my address, and I'll fix the thing for you. Right now what I can see failing is the FETs, which I can fix as I have pulled a lot of them on the 65nm gear.

I have the tools, I have the cred, and I want to help the people who have these units. Price is free or whatever you want to pay me for this. I'd rather help than scalp.

Post wherever, this is an open offer. Let's pull together and mine like hell.

Lightfoot
legendary
Activity: 2744
Merit: 1193
I don't believe in denial.
[...] The problem is the FETs. Specifically high-side. I noticed even with that big heat sink and fan on the back of the board plus the heat sinks on top the one side was running at 100c. That's hot. Putting a fan on the front side dropped it into the 60's, but it points out that if we go faster we need to dump heat. And fast. [...] A standard heat sink isn't going to cut it. Maybe a heat pipe system designed for memory chips or something. Does anyone make custom heat pipes or something like it? [...]
Maybe something like this?

https://bitcointalk.org/index.php?topic=405986.320

Look at the pictures where there's a heatsink on the FET's (both front [Gleb Gamow September 15, 2014, 05:37:37 PM] AND back [LittleD September 15, 2014, 05:04:42 PM] side examples there) with a fan mounted over it...