
Topic: Hacking BFL Monarchs and servicing them while times are weird. - page 7. (Read 21259 times)

newbie
Activity: 8
Merit: 0
Are you still repairing Monarchs?
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Yes, I have it although I'll have to re-zip. No biggie. Technically it was built off shareware license, thus I don't have a problem putting it somewhere. PM me with a box ID, I can also send over the FTDI config file from one of the units I have here, get FT-Prog.

full member
Activity: 431
Merit: 105
Hi guys.


Does anyone have a compiled Win7 x64 bfgminer for the BFL Monarch PCI?

USB works fine, but even after installing the drivers I can't get any bfgminer build to recognize the thing or even spit out errors.

thanks again guys.
hero member
Activity: 650
Merit: 500
Pick and place? I need more coffee.
Does anyone have access to the firmware or AVR project files for these?  Would love to make some changes to the firmware.
sr. member
Activity: 345
Merit: 250
Will do, thanks.
hero member
Activity: 650
Merit: 500
Pick and place? I need more coffee.
I have a question...

Thanks to my country's magic power issues, one of my boards got damaged. At a quick glance it seems that the IC for the fan control took the most damage. I have ordered more to replace the damaged one and the secondary as a precaution. However, even with this in mind it has issues while I'm trying to mine with it.

So the question is:
Could the blown chip be the source of the errors while test mining, or is there another problem that someone could be aware of if this has happened to them as well? If not, I guess the best would be to replace the IC and then test again. Could end up being a long process...

Lastly, I will have spares of these ICs as you can only order batches of 10. If anyone needs them feel free to ask.

Just PM lightfoot.  I did and he sent me a TON of info about replacing MOSFETs.  One of my cards blew a few due to a storm-induced blackout.
sr. member
Activity: 345
Merit: 250
I have a question...

Thanks to my country's magic power issues, one of my boards got damaged. At a quick glance it seems that the IC for the fan control took the most damage. I have ordered more to replace the damaged one and the secondary as a precaution. However, even with this in mind it has issues while I'm trying to mine with it.

So the question is:
Could the blown chip be the source of the errors while test mining, or is there another problem that someone could be aware of if this has happened to them as well? If not, I guess the best would be to replace the IC and then test again. Could end up being a long process...

Lastly, I will have spares of these ICs as you can only order batches of 10. If anyone needs them feel free to ask.
newbie
Activity: 37
Merit: 0
My 2 Monarchs have been doing awesome, pumping out 1.4 TH/s for over 3 months without any problems. If you're getting any "failed to find work for queue results" errors, it's because you're using the wrong app. Get the one directly from BFL, a custom build for the Monarch, which will maximize your hash rate and get rid of that error. You can get it here: http://www.butterflylabs.com/drivers/bfgminer/
member
Activity: 89
Merit: 11
Wow... so the monarchs that did make it to light.... are on the edge of exploding.

Not all of them. My two Monarchs have been running for almost three months with no leaks, no overheating and zero problems. The only "problem" is slight dust accumulation between the fans and the radiators, but I can solve that problem in no time.

So I feel good and bad at the same time after happily and profitably mining evil coins (Paycoin) with evil hardware.
hero member
Activity: 924
Merit: 1000
Thank you. I did what I could, since the FTC receiver wasn't doing....

WTF dude?

What sort of kool-aid are you on?

The FTC action didn't stop RMA processes or refunds; read the court documents.

BFL did nothing to rectify issues even when they had the opportunity during the FTC receivership.

Let us see how many RMAs get done now... I suspect few, considering they failed to RMA many in the past, which is well documented.

Refunds and RMAs are something that could be happening right now, could have been happening in the past, and should be continuing right now.

I wonder why they aren't?

---

Here is a thought: stop propping up BFL and their crap.


legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Thank you. I did what I could, since the FTC receiver wasn't doing.... well much of anything. It was the best I could do under the circumstances.

That said, I just heard that BFL is sending out RMA responses and such, so as a result I am out of business. Hurray!

Now I get to think about what I want to do next. Not sure, need to think about that. Maybe the question is how can I help the home bitcoin mining community?
hero member
Activity: 728
Merit: 500
This is a great service you are doing for everyone. Kudos.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Well, the 300gh unit is back up and running, but with a slightly higher error count than normal. 15%, normally I see 2-5%. It's also a tiny bit slower, 280 or so instead of 300. Most of the errors are on the front chip but it's purring along and mining for the owner.

The two 700s are a different pair: one of them is running with one of its two chips normal, the other at 100% errors. Checking the FETs I see the voltage is a lot lower on that side; either the whole chip is toast or all of the FETs on that side are damaged. Or a *low* side FET failed, but that would create a hell of a hot spot. I may hook it up to the scope this weekend to see if the gates are all smooth or if one channel is odd (pointing to an intermittent FET).

The second one is dead. Removing the FETs I found that the PCB under the FET itself was burned in two places on both sides. Nothing spectacular, just a <1mm pad had vaporized under the FET. The result is I can't put a new one on. I did a quick try swapping in 17506 FETs, but they can't hold 350gh a side without blowing up. They can hold 250, so I may rebuild this one as a 500gh unit.

Or I will have them RMAed with BFL. The water blocks on all of them share a common thread: leaks or serious weirdness in the front block. I haven't seen this on other units and the guy has other Monarchs that are fine. This might just be a case where a bad batch of cooling blocks leaked over time, which then caused the chips to overheat with spectacular power draws and interesting results.

The 300 held because when it heated up the damage was minor. On the 700s one knocked out a side, and the other ran till the FETs failed hard. Not unbelievable.

Interesting. Well, I'll try replacing the FETs on the left side this weekend to bring the one unit back to normal, and will then try lower speed FETs on the other one and drop the voltage to the point where it will run at 500 or so. That should do it.

In any event, since I am experimenting at this point on my own time, flag, and dollar, I will stand behind my work personally: if I have to donate an SP20 (same 1.7th) to make things right I'll do that.
donator
Activity: 4760
Merit: 4323
Leading Crypto Sports Betting & Casino Platform
*Nod*. What kind of symptoms do you see with the overheat? Can you post a picture of the one that's doing it?

Actually you still have that temp sensor thing. Check on the back of the unit in that square of space behind each chip and tell me what the temp is. That's the temp of the back of the chip coming through the board and should be pretty accurate. Also check the temp on the top of the radiator manifold, point it at the inlet hose and the return hose, that tells us the radiator efficiency.

I'll do the same once I wake up and get some coffee into me.

Everything is in line with my other Monarchs temperature-reading-wise using the gun.  I just had to point an extra house fan at it and move it to a cooler spot and it's not throttling anymore.  It ran as cool as the others before I noticed the "bubble" on the radiator (looks like swelled paint over a leak).  I imagine it lost a little bit of water and that's causing it to run about 6 degrees hotter according to cgminer.  The two Monarchs showing these symptoms weren't sitting level, for airflow reasons.  I suspect that Monarchs need to be run while sitting level, or else it will cause the radiators to leak over time.  I've got them sitting level now, so I am curious to see whether the leaking will stop or the problem will get worse (they don't appear to be leaking)...  My advice: keep these things sitting level.

Update: 1 of the leaks got worse.  Had to be taken offline.

Update: The other one is also now offline.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Perhaps. I've run into two leaking units so far but when I checked one of them out under a loupe I saw that the pump housing itself inside of the little lid had a tiny hair crack in it (you could see the bubbles coming out). There's a pic a few pages back. I tried contacting the Cool Air people, but didn't get much of a response.

Good question. Does anyone know if the CPU hot-rodders run into leaking water systems? I *do* remember that Apple did it on one of their systems, then stopped doing it due to leaks. G5s if I recall. Hm.

Cooling is fun: One problem is you need to have a temperature differential as well as some sort of fluid flow in order to cool anything. 70F air flowing across a heat sink on a 200F chip will cool far more than 100F air going over the same sink. It's not a 1-1 linear thing, it's a logarithmic thing.
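To put rough numbers on the temperature-differential point, here is a minimal sketch. It assumes the simplest textbook model, Newton's law of cooling (heat flow proportional to the sink-to-air temperature difference), which a real heat sink only approximates. The `h_times_area` coefficient is a made-up placeholder, so only the ratio between the two cases means anything.

```python
def fahrenheit_delta_to_kelvin(delta_f: float) -> float:
    """Convert a temperature *difference* in Fahrenheit degrees to kelvin."""
    return delta_f * 5.0 / 9.0


def convective_heat_flow(sink_f: float, air_f: float, h_times_area: float = 1.0) -> float:
    """First-order estimate of heat flow in watts: hA * (T_sink - T_air).

    h_times_area (W/K) is a hypothetical placeholder, not a measured value.
    """
    return h_times_area * fahrenheit_delta_to_kelvin(sink_f - air_f)


# The two cases from the post: a 200F chip with 70F air vs 100F air.
cool_room = convective_heat_flow(200.0, 70.0)
warm_room = convective_heat_flow(200.0, 100.0)
ratio = cool_room / warm_room

print(f"70F air moves about {ratio:.2f}x the heat of 100F air in this model")
```

Under this simplified model the 70F room has a 130F differential to work with against 100F for the warm room, so the same sink sheds roughly 30% more heat. Whatever the exact curve looks like on real hardware, the direction is the same.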

I've talked to people who ran them in >100F rooms and were wondering why they were temperature-faulting. Heat's a bitch, no doubt, and these miners are generating amazing levels of it in very small spaces. If the room is warm then less heat will transfer, which results in hotter fluid. Heat the fluid too much and it will expand. If it expands too much, stuff will get interesting (though I would expect the pump-to-plate rubber gasket to leak before the radiator blows up). Might be worth a test with a blowtorch on one of my old Seidon water coolers from the Chili days....

Another issue is air/water flow rates. Fans and pumps are weird, they too move geometrically more air the faster they rotate. Which means if you lower the voltage and fan speed by a bit, you get a lot less airflow (swept area times pitch and stuff like that). Bring the fan speed up too high and you will have cavitation issues as the fan blades literally "stall" in the air.

And then it's always better to pull a fluid (air is a fluid) through a radiator instead of pushing it through. When you push you create little vortices all over the place that impede the flow of heat from the radiator/heat sink into the air.

I remember when I was running water cooling on my 8 chip jally I was puzzled that when I ran the Corsair water pump at full speed the unit got *hotter* by a good bit. Checking the temps I saw that water was flowing too quickly over the heat sink in the pump, and wasn't picking up the heat. Slowing it to medium speed worked best, slow speed didn't work as well (but was still better than fast).

One thing I did see: When I ran two Monarchs (mine and another person's) on a single 750 watt supply (pulling 800 or so watts. Oh well), they were both running much hotter than they were when I ran each one on its own 500 watt supply. Checking, I noticed the front fan airflow was a lot less, and the radiators were hotter. I checked the voltage at the supply and saw that it was 11.2 volts instead of 12.

*That* is interesting. Technically you can run a Monarch at <11 volts but the fans and pumps will now run a lot slower. Lower RPMs on the fans means a lot less airflow. It's possible that's part of the problem.
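The numbers from that test can be worked through in a few lines. This is just a sketch of the arithmetic using the figures quoted above (750 W supply, roughly 800 W draw, 11.2 V measured on a 12 V rail). The constant-power assumption in `input_current` is mine: it treats the miner's regulators as drawing the same wattage regardless of rail voltage, which is only approximately true.

```python
def supply_load_fraction(draw_w: float, rated_w: float) -> float:
    """Fraction of the supply's rated output being drawn (1.0 means 100%)."""
    return draw_w / rated_w


def rail_sag_fraction(measured_v: float, nominal_v: float = 12.0) -> float:
    """How far the rail has dropped below nominal, as a fraction of nominal."""
    return (nominal_v - measured_v) / nominal_v


def input_current(draw_w: float, rail_v: float) -> float:
    """Rail current in amps, assuming a constant-power load (an assumption)."""
    return draw_w / rail_v


load = supply_load_fraction(800.0, 750.0)
sag = rail_sag_fraction(11.2)
amps_nominal = input_current(800.0, 12.0)
amps_sagged = input_current(800.0, 11.2)

print(f"load: {load:.0%} of rating, rail sag: {sag:.1%}")
print(f"current: {amps_nominal:.1f} A at 12 V vs {amps_sagged:.1f} A at 11.2 V")
```

So the supply was running about 7% over its rating with the rail about 7% low, and if the load really is close to constant power, the sag pushes the current up further, which is not what you want from a supply that is already overloaded.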

I haven't been thinking much about this, as my main thought was on keeping the FETs cool. I'll fiddle with this a bit over the weekend. Hm....
hero member
Activity: 798
Merit: 531
Crypto is King.
Wow... so the monarchs that did make it to light.... are on the edge of exploding.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
*Nod*. What kind of symptoms do you see with the overheat? Can you post a picture of the one that's doing it?

Actually you still have that temp sensor thing. Check on the back of the unit in that square of space behind each chip and tell me what the temp is. That's the temp of the back of the chip coming through the board and should be pretty accurate. Also check the temp on the top of the radiator manifold, point it at the inlet hose and the return hose, that tells us the radiator efficiency.

I'll do the same once I wake up and get some coffee into me.
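For anyone taking those hose readings, here is a small sketch of how the two numbers could be compared. The function name, the sample temperatures, and the idea of dividing by the inlet-to-room difference are all my own assumptions for illustration, not an official measure of radiator efficiency.

```python
def radiator_readings(inlet: float, outlet: float, ambient: float) -> tuple[float, float]:
    """Return (temperature drop across the radiator, drop as a share of the
    inlet-to-ambient difference). All temperatures must use the same unit.

    The ratio is a rough effectiveness-style figure (an assumption), useful
    only for comparing one unit against another under similar conditions.
    """
    available = inlet - ambient
    if available <= 0:
        raise ValueError("inlet must be warmer than ambient for a meaningful ratio")
    drop = inlet - outlet
    return drop, drop / available


# Hypothetical readings: 45 on the inlet hose, 40 on the return, 25 room air.
drop, share = radiator_readings(inlet=45.0, outlet=40.0, ambient=25.0)
print(f"drop across radiator: {drop:.1f} degrees, {share:.0%} of what the room allows")
```

A unit whose drop is much smaller than its siblings' at the same room temperature would be the one to look at first for low coolant, a weak pump, or clogged fins.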
donator
Activity: 4760
Merit: 4323
Leading Crypto Sports Betting & Casino Platform
I've got 2 Monarchs with the radiators bulging on the side of their tops. One of them has started having overheating issues as a result... I'm letting them run for now, but thinking a custom radiator solution might be the easiest way to handle it, unless BFL actually starts processing RMAs again.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Quick update: Fixed a few more Monarchs, that's good. When a FET shorts, it seems to immediately disrupt the 6 phase power supply output and shuts down the board so it doesn't burn anything badly. Then since it is shorted, a power supply will simply crowbar on startup. Makes sense.

I've got a few more coming in this weekend for repairs, so I should be busy early next week. If they have the FETs shorted it should be an easy fix to swap them out and get them back on the road again.

I just found out that it looks like BFL is no longer under receivership, which means that they will be able to get back to doing RMAs and shipping again. Good.

I'll have to think about what this means in the long run, but in the meantime the Monarchs that are in the shipping pipeline to be repaired will be fixed under the current "donate the money to a food bank" program. Maybe I'll extend it till next week while they get on their feet or something, as I'm sure they are busy. So if you have a problematic Monarch that needs repair, let me know and I'll take a look at it.
legendary
Activity: 3164
Merit: 2258
I fix broken miners. And make holes in teeth :-)
Well, after getting some parts and working with the oscilloscope I think I can see what the problem is with the left side of the now-air-cooled Monarch. It was burning FETs, but what was weird is that the *bottom* side FETs were being burned. As in smoke. But it still could put out power to the hashing engines. Weird.

So I took off the choke to isolate it from the other supplies thinking they were backfeeding or something. Still boom. Pulled the top side FET and the bottom ones, they were not shorted. Put them back in, smoke.

Then I realized what was going on: I pulled the FETs again, cranked out the scope and looked at the signals on the gates of the FETs on the other channels. With the choke out the left side was in "hiccup" mode, where it sends a brief pulse to the FETs and looks for the current across the chokes. With one choke out you wouldn't see the current, so the system would stay in the "pulse the FETs each second" mode. The low side and high side FETs for channels 1,2,4,5,6 were all reasonable, hiccup, hiccup.

The low side of channel 3 was also ok. However the high side was locked on. Solid 5 volts.

BINGO. That was it. The high side FET was always conducting power to the rail, so when the low side FETs closed they closed into a short, then opened from a short. The top side FET was always closed, so switching losses were none. The low side was switching, and thus burning up.

Reason? The FET driver for channel 3 was where the leak hit. It was soaked, shorted, and now was holding the high side on. Yep, that will do it.
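The scope check described above boils down to "which gate never goes low?". Here is a minimal sketch of that logic for anyone logging gate voltages instead of eyeballing them. The thresholds and the sample traces are hypothetical, chosen only to mirror the story (channels 1, 2, 4, 5, 6 pulsing in hiccup mode, channel 3 high side sitting at a solid 5 volts).

```python
def classify_gate(samples_v: list[float], high_v: float = 4.0, low_v: float = 1.0) -> str:
    """Classify a gate trace as 'stuck high', 'stuck low', or 'switching'.

    high_v and low_v are illustrative thresholds, not values from a datasheet.
    """
    if not samples_v:
        raise ValueError("need at least one sample")
    if all(v >= high_v for v in samples_v):
        return "stuck high"
    if all(v <= low_v for v in samples_v):
        return "stuck low"
    return "switching"


def find_suspect_channels(traces: dict[int, list[float]]) -> list[int]:
    """Return the channel numbers whose gate is not switching."""
    return sorted(ch for ch, samples in traces.items()
                  if classify_gate(samples) != "switching")


# Hypothetical high-side gate captures, one list of volts per channel.
hiccup_pulse = [0.0, 0.0, 5.0, 0.0, 0.0]
high_side = {ch: list(hiccup_pulse) for ch in (1, 2, 4, 5, 6)}
high_side[3] = [5.0, 5.0, 5.0, 5.0, 5.0]  # locked on

suspects = find_suspect_channels(high_side)
print("suspect high-side channels:", suspects)
```

One caveat: in hiccup mode the pulses are brief and about a second apart, so a capture window that is too short can miss the pulse and make a healthy channel look stuck low. A gate that stays high across the whole capture is the reading that points at the driver.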

The problem is those drivers are QFN packages. Which means that they have to be perfectly flat against the board, and a hair's difference in orientation is enough for them to fail. They totally suck, and I hate placing even the bigger ones, which I can at least align by sight. These are tiny things, only 8 pads total.

But at least I know what's going on. And I can make a guess that if someone has a Monarch that had a leaking water block and is having overheats in the FET areas, they might not need to swap the FETs, but the associated FET channel driver.

I'll try swapping it over the weekend. Ugh, I hate QFN. And with this much copper it's going to be FUN to preheat that region enough that I can get surface tension to mount those chips. I hate QFN, I really, really do.

But if I do manage to place it, the side should come back up. We'll see.
