Topic: Question to multi-BFL Single miners: temperature and throttling issues (Read 7076 times)

donator
Activity: 686
Merit: 519
It's for the children!
The heat pipe is engineered to move heat from the first chip to the second chip and then dissipate that heat through the radiator.  A very inefficient design for cooling.

Since the whole heatsink is globbed onto the chips with 10x too much thermal paste, and the in-case design sucks hot air away instead of blowing cold air on, cooling is highly inefficient.

We saw a huge temperature drop and no throttling (except on newest 896 firmware) after some minor fixes.  I think to get the most out of the units (without spending $$$) and using the existing heat-sink this method has the largest net effect:

1: Take the unit out of the case.
2: Use Arctic Silver thermal cleanser or rubbing alcohol to remove all of the existing thermal compound.  Replace it with an extremely thin film of Arctic Silver 5 or better.
3: Flip the fans so they blow cold air onto the heat-sink instead of sucking hot air away.
4: Ensure you have two 80 mm fans, one on top and one on bottom, blowing air onto the heat-sinks.
5: Lay the units on their sides and plan the layout so as to prevent cross heating.
(Use a 19" data rack with a Pyle PFN41 19in fan at every shelf to keep fresh cool air supplied to the units.)

sr. member
Activity: 344
Merit: 250
Mine are stock, also, and I haven't tried taking them apart.

I'm pretty happy with them, except for the noise issue.  But that's not a deal breaker.

When they get replaced by their ASIC counterparts, hopefully the noise issue will be solved.
sr. member
Activity: 274
Merit: 250
Mine are rev 3 too, stock I'd say.
I can hear them over the sound of a stock 6990, 5870 MATRIX and 5870 REV2 mining... sick!
I just discovered that all is fine except one tiny thing: the 6990 exhausts in both directions, and "THE" Single got hit by hot air...
A small invention, and now I'm waiting for results...
sr. member
Activity: 344
Merit: 250
When I first got my BFL Singles I had them running in an air conditioned room (though probably not 72°F).  With the 832 firmware, one of them would throttle.

It seems like the better of the two must have a more powerful fan, as I can feel stronger airflow coming from it, and its temperature is a bit lower than the other one.

Anyway, I ended up settling on the 816 firmware and both of them run without throttling in an unairconditioned room (around 85°F) and consume about 150 watts together.

Code:
BFL 0:  58.9C         | 811.0/797.6Mh/s | A:2946 R:59 HW:0 U:11.20/m
BFL 1:  60.0C         | 811.0/798.7Mh/s | A:2911 R:68 HW:0 U:11.06/m

They are rev. 3 units, but they've also got external fans mounted on the bottom.  They're noisy, too.  I can hear them mining from the other side of the house.
sr. member
Activity: 274
Merit: 250
Hi, I have just joined the "club" :)

I have got 3 of the singles. One of them slows down a lot.
Code:
 BFL 1:  34.4C         | 802.3/758.0Mh/s | A:8610 R:190 HW:  0 U: 10.28/m
Look at the average hashrate. Overnight it rose a lot. Before, it was something like 650.
I tested this with a Kill A Watt, and I must say one thing: the unit is not working at all while the front LED is blinking.
I have 3 Singles + a ZTEX quad connected to the Kill A Watt; it shows 300 W, and 230 W while throttling.
Look at the temp! WTF?
And all 3 of them together:
Code:
 [P]ool management [S]ettings [D]isplay options [Q]uit
 BFL 0:  43.7C         | 796.1/792.0Mh/s | A:9276 R:167 HW:  0 U: 11.03/m
 BFL 1:  36.8C         | 789.3/757.8Mh/s | A:8652 R:191 HW:  0 U: 10.28/m
 BFL 2:  53.5C         | 801.2/802.9Mh/s | A:9375 R:185 HW:  0 U: 11.14/m
In the time I spent writing this, the average speed slowed down....
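A quick way to spot the lagging unit in a status screen like the one above is to compare the two hashrate figures on each line. The sketch below is only a heuristic and rests on assumptions: that the first figure is the recent rate and the second the running average, that the line layout matches what was pasted here, and that a 3% gap is a sensible cut-off. The names `parse_status` and `lagging` are made up for this example and are not part of cgminer.

```python
import re
from typing import NamedTuple, Optional


class DeviceStatus(NamedTuple):
    name: str
    temp_c: float
    recent_mhs: float   # assumption: first figure is the recent rate
    avg_mhs: float      # assumption: second figure is the running average


# Matches lines shaped like the ones pasted in this thread.
STATUS_RE = re.compile(r"^\s*(BFL \d+):\s+([\d.]+)C\s+\|\s+([\d.]+)/([\d.]+)Mh/s")


def parse_status(line: str) -> Optional[DeviceStatus]:
    """Parse one status line; return None for anything that is not a device line."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    return DeviceStatus(m.group(1), float(m.group(2)), float(m.group(3)), float(m.group(4)))


def lagging(status: DeviceStatus, max_gap: float = 0.03) -> bool:
    """Heuristic: the average trails the recent rate by more than max_gap (a fraction)."""
    return (status.recent_mhs - status.avg_mhs) / status.recent_mhs > max_gap


screen = """\
 [P]ool management [S]ettings [D]isplay options [Q]uit
 BFL 0:  43.7C         | 796.1/792.0Mh/s | A:9276 R:167 HW:  0 U: 11.03/m
 BFL 1:  36.8C         | 789.3/757.8Mh/s | A:8652 R:191 HW:  0 U: 10.28/m
 BFL 2:  53.5C         | 801.2/802.9Mh/s | A:9375 R:185 HW:  0 U: 11.14/m
"""

devices = [s for s in map(parse_status, screen.splitlines()) if s is not None]
suspects = [d.name for d in devices if lagging(d)]
print(suspects)
```

On the screen pasted above this singles out BFL 1, which is also the unit reporting the lowest temperature.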
hero member
Activity: 714
Merit: 504
^SEM img of Si wafer edge, scanned 2012-3-12.
Some pictures and info that may be relevant to people here: https://bitcointalksearch.org/topic/m.916922
donator
Activity: 919
Merit: 1000
I just tried getting rid of the "lumpy" thermal paste and smoothing out what was there, instant +10 mhps on a unit test of 100, and now it appears to be running better.  I'll update in the morning after it's worked all night.

If this is the case, I might as well take them all apart to clean and re-apply a better paste.

I did this for all my BFLs, and in fact there was a second unit with dried-up thermal grease. After applying fresh grease and re-assembling, I found it running more stably.

Meanwhile I found the perfect setup for my batch: three are doing fine with the 864MHz firmware, the other two run stably at 800MHz. It is obviously very important to find the right FW for each device individually so that it never throttles, as throttling greatly impairs the overall hashing rate. A throttling 832 settles at 710MH/s over a week, while the slightly lower-clocked 800 (non-throttling) one ends up at 788MH/s. The 864 one averages out at 851MH/s (all results from running cgminer for a week).
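To put those week-long averages side by side, each can be expressed as a fraction of the firmware's nominal rate. This is a minimal sketch, assuming the nominal rate in MH/s equals the firmware clock in MHz; the `results` table and the `efficiency` helper are illustrative names only.

```python
# Week-long cgminer averages quoted above:
# firmware clock (MHz) -> (average MH/s, whether that unit was throttling)
results = {
    832: (710.0, True),
    800: (788.0, False),
    864: (851.0, False),
}


def efficiency(clock_mhz: float, avg_mhs: float) -> float:
    """Fraction of the nominal rate actually achieved (assumes nominal MH/s == clock MHz)."""
    return avg_mhs / clock_mhz


for clock_mhz, (avg_mhs, throttling) in sorted(results.items()):
    state = "throttling" if throttling else "stable"
    lost = clock_mhz - avg_mhs
    print(f"{clock_mhz} MHz ({state}): {avg_mhs:.0f} MH/s, "
          f"{efficiency(clock_mhz, avg_mhs):.1%} of nominal, {lost:.0f} MH/s lost")
```

The two stable firmwares both give up only 12 to 13 MH/s, while the throttling 832 loses well over 100 MH/s, which is why the lower clock wins.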

The 12MH/s you lose from the nominal hash rate at any clock speed seems to be the idle time between delivering shares and getting new work fed. From the numbers, this should take ~80ms out of each 5.37s cycle it takes to complete a work unit at 800MHz. Pretty sure there will be some improved FW some day that provides pre-fetching of work and caching of shares to prevent the FPGAs from idling. BFL Engineer?
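The idle-time estimate can be checked with a few lines of arithmetic. This is a sketch under two assumptions: that one work unit covers the full 32-bit nonce range (2^32 hashes), and that the nominal rate in MH/s equals the clock in MHz. The function name is illustrative.

```python
NONCE_RANGE = 2 ** 32  # assumption: one work unit scans the full 32-bit nonce space


def idle_time_per_cycle(nominal_mhs: float, lost_mhs: float) -> tuple[float, float]:
    """Return (cycle_seconds, idle_seconds) for one work unit.

    The idle share of the cycle is taken to be the fraction of the
    nominal hash rate that goes missing in the long-run average.
    """
    cycle_s = NONCE_RANGE / (nominal_mhs * 1e6)
    idle_s = cycle_s * lost_mhs / nominal_mhs
    return cycle_s, idle_s


cycle_s, idle_s = idle_time_per_cycle(nominal_mhs=800.0, lost_mhs=12.0)
print(f"cycle: {cycle_s:.2f} s, idle: {idle_s * 1000:.1f} ms")
```

With 800 MH/s nominal and 12 MH/s lost, this gives a cycle of about 5.37 s and roughly 80 ms of idle time per cycle, in line with the figures above.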
legendary
Activity: 1400
Merit: 1005
I have all 10 of my singles (soon to be 4) lined up in a row, with a 6 inch fan blowing over the top of them.  Not much space between them, yet they still all stay below 60°C.

They do NOT like to be stacked!  I tried that, all the ones on top got hot, up to 63°C!
full member
Activity: 206
Merit: 100
Mostly Harmless...
I just tried getting rid of the "lumpy" thermal paste and smoothing out what was there, instant +10 mhps on a unit test of 100, and now it appears to be running better.  I'll update in the morning after it's worked all night.

If this is the case, I might as well take them all apart to clean and re-apply a better paste.
full member
Activity: 184
Merit: 100
Feel the coffee, be the coffee.
Solved! ....
I squeezed the blob manually and distributed it all over the FPGA with my finger to build a thin film, attached the heat-pipe back - et voilà, device is working at full speed with 815MH/s :)

BFL, does this void our warranty when we have to do such repairs? Are customers authorized to open the device in cases like this?

I hope not. I had a pin on the fan plug that was not clipped properly, so I opened it up, clicked the pin in and voilà!
hero member
Activity: 546
Merit: 500
I re-greased one yesterday with only minor improvement.
donator
Activity: 919
Merit: 1000
Didn't you (or someone else) initially mention that the heat pipe is glued on?
[...]
Yes, I guess that was me and that's your proof of what a noob I am when it comes to HW ;)

When I tried the first time to remove the heat-pipe, I pulled it vertically upwards until I got scared of ripping the chips off the board. With the strong adhesion, my wrong guess was that it is glued. Reading the forums and getting wiser (well, kind of), this time I tried it with torque. You still need to be very careful, but as soon as it starts to move you can gently lift it off.

My previously throttling Single now works rock-solid. I did the ultimate test and let it run in the temp-chamber at 28°C (the value I am expecting to have during summer in my basement) and it did not throttle within 4h :) With that 'fix' you could maybe power up your GPU miner again and keep the BFLs working.
sr. member
Activity: 344
Merit: 250
When I mentioned replacing the thermal grease on my single that was throttling, they said I could try that and didn't mention the warranty at all.  There are no stickers in place that seal the enclosure. It's a good thing too for obvious reasons.  Quite a few of the singles have throttling issues it seems that require small repairs like the ones mentioned above.  Probably would be a good sticky for hardware or troubleshooting.  A guide on what to do with a throttling single.

You'd think they'd be catching issues like this during the burn-in testing.  I guess not.
sr. member
Activity: 448
Merit: 250
Didn't you (or someone else) initially mention that the heat pipe is glued on?
When removing any kind of heat sink, especially a glued-on one, from a BGA package, I always worry about some (non-optimally soldered) balls to come loose from the PCB or the chip itself...
I.e. maybe the adhesion between heat sink and chip package is stronger than the adhesion between solder ball and PCB, or stronger than the adhesion between solder ball and chip.
I would guess that in 99 out of 100 situations my concern is unwarranted, but then again that doesn't help me much if I run into the one case out of 100 where it is not.
hero member
Activity: 546
Merit: 500
When I mentioned replacing the thermal grease on my single that was throttling, they said I could try that and didn't mention the warranty at all.  There are no stickers in place that seal the enclosure. It's a good thing too for obvious reasons.  Quite a few of the singles have throttling issues it seems that require small repairs like the ones mentioned above.  Probably would be a good sticky for hardware or troubleshooting.  A guide on what to do with a throttling single.
hero member
Activity: 481
Merit: 500
Solved! ....
I squeezed the blob manually and distributed it all over the FPGA with my finger to build a thin film, attached the heat-pipe back - et voilà, device is working at full speed with 815MH/s :)

BFL, does this void our warranty when we have to do such repairs? Are customers authorized to open the device in cases like this?
donator
Activity: 919
Merit: 1000
Solved!

Tl;dr: dried-up thermal grease was causing thermal resistance between heat-pipe and FPGAs.

First off, thanks to all for your hints and suggestions. For me they didn't work, but they might be helpful for others with related problems.

Then, I need to apologize for calling BFL customer support crap. It is true that none of the emails I sent after ordering my Singles was answered, but those were the times when probably every second miner was overrunning them with orders or questions. In contrast to that, the request for assistance I sent yesterday was answered within hours with very helpful instructions. Sorry again.

Back to the problem. After none of the proposed approaches to improve cooling worked, I made sure that it is not an ambient temperature issue by running the device in a temperature chamber at 0°C. Same effect with throttling every several minutes and a total hashrate of less than 700MH/s - a strong indication that there was a problem with thermal conductivity between heat-pipe and FPGAs. Assumption turned out right as soon as I removed the heat-pipe: it was evident that the surfaces had only partial contact, since at one FPGA there was a larger blob of dried-up thermal grease that was holding the heat-pipe back from settling down. As a result, one FPGA had almost no contact to the heat-pipe, the other only partially.

I speculate that the heat-pipe was initially placed with one push-pin not fully locked and the grease started to dry up at the loose side. Later during QA the push-pin was fixed, but the grease was already too viscous to be squeezed out evenly. Or something completely different...

I squeezed the blob manually and distributed it all over the FPGA with my finger to build a thin film, attached the heat-pipe back - et voilà, device is working at full speed with 815MH/s :)

Was too busy and didn't take any pictures, but it is quite obvious when you remove the heat-pipe.
full member
Activity: 227
Merit: 100
During throttle, the device does respond to temperature read, status read, etc.
A new job, however, cannot be issued, as the unit will respond with 'BUSY'.


Good Luck,
What I mean is that: if I see a temp (ZLX) of 49C is that the same number tested by the throttling firmware that decides to throttle the BFL?
So whatever the first throttling temperature is inside a BFL (when it first drops below 832MH/s), I should see (ZLX) close to that "throttling temperature" just before it decides to throttle?

This is true. Should your unit for instance throttle at 67C, then that is usually the throttle threshold of your unit. Of course, the unit must be
in the same condition as it was the first time (enclosure closed/open, number of fans active, etc). Should one factor change, the throttle
temperature will change with it.


Good Luck,
BF Labs Inc.
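Following that answer, a unit's throttle threshold can be estimated from a log of temperature and hashrate readings: take the last temperature seen before the rate first drops clearly below nominal. The sketch below does not talk to the hardware; how the readings are collected is left open. The sample values are hypothetical, made up purely for illustration, and the 5% tolerance and the function name are assumptions as well.

```python
from typing import Optional, Sequence, Tuple


def estimate_throttle_temp(
    samples: Sequence[Tuple[float, float]],
    nominal_mhs: float,
    tolerance: float = 0.05,
) -> Optional[float]:
    """Return the temperature logged just before the rate first fell
    below nominal_mhs * (1 - tolerance), or None if it never did.

    samples is a time-ordered sequence of (temp_c, mhs) readings.
    """
    floor = nominal_mhs * (1.0 - tolerance)
    prev_temp: Optional[float] = None
    for temp_c, mhs in samples:
        if mhs < floor:
            # First throttled reading: report the last healthy temperature,
            # or this one if the log starts out already throttled.
            return prev_temp if prev_temp is not None else temp_c
        prev_temp = temp_c
    return None


# HYPOTHETICAL readings, not measurements from this thread.
log = [(61.0, 820.0), (64.5, 819.0), (67.0, 821.0), (66.0, 640.0), (63.0, 655.0)]
print(estimate_throttle_temp(log, nominal_mhs=832.0))
```

As noted above, the estimate only holds while conditions stay the same (enclosure open or closed, number of fans active), so it should be re-measured after any change.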
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
During throttle, the device does respond to temperature read, status read, etc.
A new job, however, cannot be issued, as the unit will respond with 'BUSY'.


Good Luck,
What I mean is that: if I see a temp (ZLX) of 49C is that the same number tested by the throttling firmware that decides to throttle the BFL?
So whatever the first throttling temperature is inside a BFL (when it first drops below 832MH/s), I should see (ZLX) close to that "throttling temperature" just before it decides to throttle?
hero member
Activity: 546
Merit: 500
I had a throttling single.  I opened it up and the heat sink wasn't attached properly to the board.  One of the 2 spring pins wasn't pushed down.  After securing it, I've had no more issues.  Open it up and check it out or stick something through the vent and see if you can move the heat sink.  If it moves then it needs to be secured.