
Topic: The Chili – 30+GH/s BFL based Bitcoin Miner Assembly - page 15. (Read 138078 times)

member
Activity: 67
Merit: 10
Is low temp + high hash rate + high hardware error rate just bad luck during the self test, or a sign of something else (uneven cooling, perhaps)?

I have a card running at 58 degrees and 36 GH/s, but with an 11% or 12% error rate.

Voltage is 1.08 V using the 1.1 V limited firmware.

Thanks
~Hands

legendary
Activity: 966
Merit: 1000
@MrTeal: Thanks for the sentiment.  What might help IMO is a much slower voltage ramp, like waiting 30 seconds in between each 0.1V step, to give the chips time to heat themselves up from the load.
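
A minimal sketch of the slower ramp suggested above, in Python for illustration only. The step size and 30-second dwell come from the post; `set_voltage` is a hypothetical callback standing in for whatever the firmware actually does, which isn't shown in this thread.

```python
import time
from typing import Callable, List


def ramp_steps(start_v: float, target_v: float, step_v: float = 0.1) -> List[float]:
    """Return the voltages to visit after start_v, ending exactly at target_v."""
    if step_v <= 0:
        raise ValueError("step_v must be positive")
    if target_v < start_v:
        raise ValueError("target_v must not be below start_v")
    steps = []
    n = 1
    while True:
        # Multiply instead of accumulating to limit floating-point drift.
        v = round(start_v + n * step_v, 3)
        if v >= target_v:
            break
        steps.append(v)
        n += 1
    steps.append(round(target_v, 3))
    return steps


def slow_ramp(
    set_voltage: Callable[[float], None],  # hypothetical hardware hook
    start_v: float,
    target_v: float,
    step_v: float = 0.1,
    dwell_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Apply each step, then wait dwell_s so the chips can warm up under load."""
    applied = []
    for v in ramp_steps(start_v, target_v, step_v):
        set_voltage(v)
        applied.append(v)
        sleep(dwell_s)
    return applied
```

Passing `sleep` as a parameter lets the schedule be checked without actually waiting 30 seconds per step.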
member
Activity: 67
Merit: 10
So this weekend I actually saw a card "overheat", wait till it hit 30C, and then come back to life...

Now to figure out why some cards, when they idle, just won't get cooler than 32C~39C.

So they CAN self heal..

I plan to rebuild the cooling on all my cards over the week because the error rates are higher than I would like so hopefully that will help.

legendary
Activity: 1274
Merit: 1004
It took some doing and the use of a heat gun, but I got them going now:

Code:
cgminer version 3.9.0 - Started: [2000-01-01 00:03:48]
--------------------------------------------------------------------------------
 (5s):310.2G (avg):301.0Gh/s | A:378754  R:1042  HW:178293  WU:3950.1/m
 ST: 2  SS: 0  NB: 14  LW: 406294  GF: 0  RF: 1
 Connected to stratum-lb-usa48.btcguild.com diff 128 with stratum as user **********
 Block: 1fc1044c...  Diff:1.79G  Started: [01:31:08]  Best share: 370K
--------------------------------------------------------------------------------
 [P]ool management [S]ettings [D]isplay options [Q]uit
 BAL  1:  max 29C 0.85V | 22.08G/22.02Gh/s | A:29696 R:  0 HW:    24 WU:308.2/m
 BAL  2:  max 62C 1.16V | 29.05G/28.96Gh/s | A:35840 R:384 HW:155820 WU:382.2/m
 BAL  3:  max 49C 1.04V | 33.70G/33.40Gh/s | A:47488 R:  4 HW:  2786 WU:441.0/m
 BAL  6:  max 52C 1.04V | 32.18G/32.16Gh/s | A:43136 R:  2 HW:  2119 WU:429.7/m
 BAL  8:  max 61C 1.04V | 35.53G/35.34Gh/s | A:41600 R:258 HW:  1516 WU:473.0/m
 BAL  9:  max 49C 1.16V | 23.43G/23.30Gh/s | A:25984 R:  0 HW:  2037 WU:307.1/m
 BAL 10:  max 60C 1.00V | 33.44G/33.27Gh/s | A:38400 R:  0 HW:  2746 WU:434.1/m
 BAL 11:  max 49C 1.02V | 34.44G/34.09Gh/s | A:43264 R:256 HW:  4649 WU:427.2/m
 BAL 14:  max 54C 1.03V | 32.67G/32.42Gh/s | A:39168 R:128 HW:  2717 WU:422.8/m
 BAL 18:  max 53C 1.06V | 33.48G/33.43Gh/s | A:33280 R:  0 HW:  3901 WU:418.5/m

For some reason the first one shown never raises its voltage.

There was another one that had some pnp errors where a couple of the 0402 caps (C80 and C69) came standing on their ends, only affixed to one solder pad.  I left them on-end and installed some wire bridges to connect them to their other pads.

There's another one that won't always start up.  Sometimes when I connect the power the one LED just flashes forever, and it never starts flashing the second LED.  I have to keep cycling it until it starts.

So they're kinda ornery, but for now at least they're all running.

These are the ones from Technobit.  I'm expecting another 12 from Lucko soon.  I'll see how they do.
I apologize for this. Obviously this wasn't how we wanted this to turn out when we agreed to license out the design. I'm still working with Lucko, and while it's not likely that we'll be able to solve the underlying issue with the board, there are a couple of options we're looking at to at least get them stable for you. It's been slow going since we've had to work at a distance, with me analyzing the results and proposing new tests and Lucko running them and reporting back, but I think we're getting closer for those of you with these affected boards.
legendary
Activity: 966
Merit: 1000
It took some doing and the use of a heat gun, but I got them going now:

Code:
cgminer version 3.9.0 - Started: [2000-01-01 00:03:48]
--------------------------------------------------------------------------------
 (5s):310.2G (avg):301.0Gh/s | A:378754  R:1042  HW:178293  WU:3950.1/m
 ST: 2  SS: 0  NB: 14  LW: 406294  GF: 0  RF: 1
 Connected to stratum-lb-usa48.btcguild.com diff 128 with stratum as user **********
 Block: 1fc1044c...  Diff:1.79G  Started: [01:31:08]  Best share: 370K
--------------------------------------------------------------------------------
 [P]ool management [S]ettings [D]isplay options [Q]uit
 BAL  1:  max 29C 0.85V | 22.08G/22.02Gh/s | A:29696 R:  0 HW:    24 WU:308.2/m
 BAL  2:  max 62C 1.16V | 29.05G/28.96Gh/s | A:35840 R:384 HW:155820 WU:382.2/m
 BAL  3:  max 49C 1.04V | 33.70G/33.40Gh/s | A:47488 R:  4 HW:  2786 WU:441.0/m
 BAL  6:  max 52C 1.04V | 32.18G/32.16Gh/s | A:43136 R:  2 HW:  2119 WU:429.7/m
 BAL  8:  max 61C 1.04V | 35.53G/35.34Gh/s | A:41600 R:258 HW:  1516 WU:473.0/m
 BAL  9:  max 49C 1.16V | 23.43G/23.30Gh/s | A:25984 R:  0 HW:  2037 WU:307.1/m
 BAL 10:  max 60C 1.00V | 33.44G/33.27Gh/s | A:38400 R:  0 HW:  2746 WU:434.1/m
 BAL 11:  max 49C 1.02V | 34.44G/34.09Gh/s | A:43264 R:256 HW:  4649 WU:427.2/m
 BAL 14:  max 54C 1.03V | 32.67G/32.42Gh/s | A:39168 R:128 HW:  2717 WU:422.8/m
 BAL 18:  max 53C 1.06V | 33.48G/33.43Gh/s | A:33280 R:  0 HW:  3901 WU:418.5/m

For some reason the first one shown never raises its voltage.

There was another one that had some pnp errors where a couple of the 0402 caps (C80 and C69) came standing on their ends, only affixed to one solder pad.  I left them on-end and installed some wire bridges to connect them to their other pads.

There's another one that won't always start up.  Sometimes when I connect the power the one LED just flashes forever, and it never starts flashing the second LED.  I have to keep cycling it until it starts.

So they're kinda ornery, but for now at least they're all running.

These are the ones from Technobit.  I'm expecting another 12 from Lucko soon.  I'll see how they do.
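
A rough sketch for spotting the odd boards in a paste like the one above. The regular expression is written against the line layout shown in this post only (an assumption; other cgminer versions or drivers may format the line differently), and the "10x the median HW count" rule is an arbitrary illustrative threshold, not anything cgminer defines.

```python
import re
from statistics import median
from typing import List, NamedTuple


class Device(NamedTuple):
    ident: int
    max_temp_c: int
    volts: float
    avg_ghs: float
    hw_errors: int


# Matches the per-device status lines as they appear in the paste above.
DEVICE_RE = re.compile(
    r"BAL\s+(\d+):\s+max\s+(\d+)C\s+([\d.]+)V\s+\|\s+"
    r"[\d.]+G/([\d.]+)Gh/s\s+\|\s+A:\s*\d+\s+R:\s*\d+\s+HW:\s*(\d+)"
)

SAMPLE = """\
 BAL  1:  max 29C 0.85V | 22.08G/22.02Gh/s | A:29696 R:  0 HW:    24 WU:308.2/m
 BAL  2:  max 62C 1.16V | 29.05G/28.96Gh/s | A:35840 R:384 HW:155820 WU:382.2/m
 BAL  3:  max 49C 1.04V | 33.70G/33.40Gh/s | A:47488 R:  4 HW:  2786 WU:441.0/m
"""


def parse_devices(text: str) -> List[Device]:
    """Extract id, max temperature, voltage, average rate and HW count."""
    devices = []
    for m in DEVICE_RE.finditer(text):
        ident, temp, volts, avg, hw = m.groups()
        devices.append(Device(int(ident), int(temp), float(volts), float(avg), int(hw)))
    return devices


def hw_outliers(devices: List[Device], factor: float = 10.0) -> List[Device]:
    """Return devices whose HW count exceeds factor times the median count."""
    if not devices:
        return []
    typical = median(d.hw_errors for d in devices)
    return [d for d in devices if d.hw_errors > factor * typical]


if __name__ == "__main__":
    for dev in hw_outliers(parse_devices(SAMPLE)):
        print(f"BAL {dev.ident}: HW {dev.hw_errors} at {dev.volts}V, {dev.max_temp_c}C")
```

Raw HW counts are only comparable when all devices have been running for the same length of time, as they appear to have been here.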
donator
Activity: 686
Merit: 519
It's for the children!
I hope you're not talking about my post but just in case. 

I have a "box o heatsinks" and you can see in the pictures I plucked and stuck without any real attempt at alignment....

TIM: http://www.frozencpu.com/products/16875/thr-161/Fujipoly_Extreme_System_Builder_Thermal_Pad_-_Full_Sheet_-_300_x_200_x_05_-_Thermal_Conductivity_110_WmK.html?tl=g8c487s1730

Copper: http://www.amazon.com/Cosmos-Copper-Cooling-Heatsinks-cooler/dp/B00637X42A these are self stick.  I order a pack a week since they get used up quick.

Small aluminum sinks are from this package which is also the primary cooler used: http://www.amazon.com/ARCTIC-Accelero-Mono-Plus-Cooler/dp/B006DAD8HS

The main chips use AC5 thermal paste. I start with an extremely thin (proper) layer, lay the heatsink on the chips, check the contact points on the copper heatsink, then repeat with a thicker layer where needed until I get full contact with the paste on all chips.

The two on the back are from the grab box and were ripped off graphics cards/ motherboards/etc before they went into the trash.  That Fuji-Poly pad will stick heatsinks to just about anything.

To date, every ASIC I have ever received has had an extremely horrible thermal interface and inadequate cooling for the primary and secondary chips.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
This is impressive. What models of heat sinks did you use and what kind of fastening tools?
donator
Activity: 686
Merit: 519
It's for the children!
One of my chilis got angry and went to 26Ghash from 33.

So instead of smashing it with a hammer... 





Code:
 BAL  13:  max 69C 1.09V | 34.70G/35.12Gh/s

sr. member
Activity: 280
Merit: 250
Helperizer
For the second one, try flashing it again with 1V1, it shouldn't go (much) above 1.1V though 1.11V wouldn't be unexpected. Does it go up to 1.15? You don't need to keep the bottom heatsinks warm during operation, the issues with the boards needing to be warmed up prior to startup with a hair dryer is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.
Thanks, I was wondering about the hairdryer thing.  The problem board does get above 1.15v in cgminer sometimes - that seems to be the majority of the times it's crapping out (bfgminer doesn't show it getting that high when it craps out).  It also sometimes now does the 200+ GHs with practically all errors too, if that info helps.
Hmm, tried flashing it again (several times actually) - no joy.  In a cold room it still craps out and often reboots itself (it seemed to behave a bit better during the day as it was warmer while I was at work but not totally fixed).  Unfortunately when it comes back up the miner doesn't automatically find it (bfgminer - I think cgminer does the same thing), so I have to manually restart the miner.  Not optimal, for obvious reasons.  Occasionally, I have to power cycle it to get it to come back, which I can do remotely since I have a USB-controlled powerstrip (thanks to tiebing's suggestions at an old blog, works great!  http://tiebing.blogspot.com/2011/01/use-linux-to-control-outlet.html )

I've also invoked with --cmd-sick to try to get it to restart the miner, but that doesn't seem to do the trick since the miner is happy just letting it send error messages and misreading the temperature ("Received unexpected queue result response:" and "Error: Get temp returned empty string/timed out" and "Failed to send queue").  It seems to do this at 1.102 V and higher, whereas the other one runs relatively rock solid at 1.107 V.  Sometimes the bad one will creep up to 1.15 V or so before crapping out, sometimes a bit below 1.1 V.  When it throws these errors, hashing just slowly drops to 0GH/s, with no increased errors, no increased rejects, and no accepts (of course), until I SBY restart or kill and restart, after which only the good chili comes up and is recognized unless I wait some number of minutes and the bad one can reboot.

Anything else I can try?

Here are some of the errors:
Code:
[2014-01-16 22:57:37] BFL 1: Error: Get temp returned empty string/timed out
 [2014-01-16 22:57:37] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:38] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:39] BFL 1: Error: Get temp returned empty string/timed out
 [2014-01-16 22:57:39] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:40] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:41] BFL 1: Error: Get temp returned empty string/timed out
 [2014-01-16 22:57:41] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:42] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:42] BFL 1: Received unexpected queue result response:
 [2014-01-16 22:57:42] BFL 1: Failed to send queue
 [2014-01-16 22:57:42] BFL 1: Failed to send queue
 [2014-01-16 22:57:42] BFL 1: Failed to send queue
 [2014-01-16 22:57:43] BFL 1: Error: Get temp returned empty string/timed out
 [2014-01-16 22:57:43] BFL 1: Received unexpected queue result response:
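
Since the miner itself doesn't flag the board as sick here, one workaround would be an external check on the log. The sketch below is only that: it assumes the log line format shown in the paste above, and the window and threshold values are made-up defaults. What to do once a device is reported (restart the miner, power cycle the outlet) is left to the caller.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Iterable, List

# Line layout taken from the paste above: "[date time] DEVICE N: message".
LINE_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(\w+ \d+):\s+(.*)")

# Message fragments copied from the errors quoted in this post.
ERROR_MARKERS = (
    "Get temp returned empty string/timed out",
    "Received unexpected queue result response",
    "Failed to send queue",
)


def sick_devices(lines: Iterable[str], window_s: int = 10, threshold: int = 5) -> List[str]:
    """Return devices that logged at least `threshold` errors within `window_s` seconds."""
    window = timedelta(seconds=window_s)
    recent = defaultdict(deque)  # device name -> timestamps of recent errors
    sick = set()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        stamp, device, message = m.groups()
        if not any(marker in message for marker in ERROR_MARKERS):
            continue
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        stamps = recent[device]
        stamps.append(when)
        # Drop errors that have fallen out of the sliding window.
        while when - stamps[0] > window:
            stamps.popleft()
        if len(stamps) >= threshold:
            sick.add(device)
    return sorted(sick)
```

Fed the fifteen lines above, which span about six seconds, this would report `BFL 1` with the default settings.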
legendary
Activity: 1274
Merit: 1004
The less annoying and higher performance method (the backplate is in the better orientation) is to use some 6-32x2" screws to hold it in place though.

Hm, I don't seem to have any #6-32s on hand.  I guess it'll be another day until I'm up and hashing.

Thanks.
8-32s should also (barely) fit, and 4-40s would also work though you might need washers.
legendary
Activity: 966
Merit: 1000
The less annoying and higher performance method (the backplate is in the better orientation) is to use some 6-32x2" screws to hold it in place though.

Hm, I don't seem to have any #6-32s on hand.  I guess it'll be another day until I'm up and hashing.

Thanks.
legendary
Activity: 1274
Merit: 1004
I just received some Chilis yesterday from our TechnoBit friends across the pond, and today received the CoolerMaster Hyper 212 Evo coolers for them.

Do you guys have any tips on attaching the coolers?  The included hardware doesn't seem to quite fit right.  I started by attaching the back-plate to the back side of the board, using the included tall stand-offs and nuts, but when I then try to secure the cooler to the top side using the included scissors-style "X" clip, there is still a gap left between the cooler and the chip surfaces after the screws have been threaded all the way into the standoffs.

The short stand-offs don't look like they are made to pass through the boards.  If I try to use them, there is not enough thread on the back side to attach the nuts with the back plate still in place.

How did you guys do it?

The less annoying and higher performance method (the backplate is in the better orientation) is to use some 6-32x2" screws to hold it in place though.


legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
I just received some Chilis yesterday from our TechnoBit friends across the pond, and today received the CoolerMaster Hyper 212 Evo coolers for them.

Do you guys have any tips on attaching the coolers?  The included hardware doesn't seem to quite fit right.  I started by attaching the back-plate to the back side of the board, using the included tall stand-offs and nuts, but when I then try to secure the cooler to the top side using the included scissors-style "X" clip, there is still a gap left between the cooler and the chip surfaces after the screws have been threaded all the way into the standoffs.

The short stand-offs don't look like they are made to pass through the boards.  If I try to use them, there is not enough thread on the back side to attach the nuts with the back plate still in place.

How did you guys do it?
Bolts from Home Despot.
legendary
Activity: 966
Merit: 1000
I just received some Chilis yesterday from our TechnoBit friends across the pond, and today received the CoolerMaster Hyper 212 Evo coolers for them.

Do you guys have any tips on attaching the coolers?  The included hardware doesn't seem to quite fit right.  I started by attaching the back-plate to the back side of the board, using the included tall stand-offs and nuts, but when I then try to secure the cooler to the top side using the included scissors-style "X" clip, there is still a gap left between the cooler and the chip surfaces after the screws have been threaded all the way into the standoffs.

The short stand-offs don't look like they are made to pass through the boards.  If I try to use them, there is not enough thread on the back side to attach the nuts with the back plate still in place.

How did you guys do it?
member
Activity: 67
Merit: 10
I am running CGMiner 3.8.5   I also came home to one of my miners in that weird state (all lights off, zero hashing)... Rebooted the farm (with my 20 minute wait to cool down) and all was happy...

* I tested giving a miner 5 minutes of power and fans running to cool itself down enough to self test and it didn't...
* I also tested if the pi's could be turned on at the same time as the miners and would they complete their self test and start mining (even with work issued to them early) and that DID work so I can put the miners and the pi's on the same wifi controlled power switches if I want so that's good :-).


Like I mentioned before, if they are too hot to self test within 30 seconds, I have never seen them self test after that.

Thanks
On the bolded part, did you power cycle the unit when you did that? Also, what's the density of the installation, and were all the other units running? I can unplug mine from running full out and plug them back in immediately, and while it can take a minute to cool down they always come back up.

Yes I've power cycled the device... I've never had one auto-reconnect on their own.

Right now I have 3 that I power cycled and that are just blinking (number 7, so waiting for temp), but they have been off for half an hour. Each card is cooled with an H80 and is 6~8" from the next one. The temp upstairs (where the cards are) is maybe 80 degrees.
member
Activity: 80
Merit: 10
For the second one, try flashing it again with 1V1, it shouldn't go (much) above 1.1V though 1.11V wouldn't be unexpected. Does it go up to 1.15? You don't need to keep the bottom heatsinks warm during operation, the issues with the boards needing to be warmed up prior to startup with a hair dryer is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.
Thanks, I was wondering about the hairdryer thing.  The problem board does get above 1.15v in cgminer sometimes - that seems to be the majority of the times it's crapping out (bfgminer doesn't show it getting that high when it craps out).  It also sometimes now does the 200+ GHs with practically all errors too, if that info helps.

The "hairdryer mod" is only good for boards that "stick" just under 1 V; it just gives the warm-up a bit of a push.
The secondary cooling is mainly to move the air under the board.  Heat rises, so the spill-off from the main cooling fan is enough to ventilate the top.

A thought has just struck me: will the Lucko boards with a problem work better upside down?!

Yes, it is strange, but I use an upside-down config for my boards ...
hero member
Activity: 868
Merit: 1000
For the second one, try flashing it again with 1V1, it shouldn't go (much) above 1.1V though 1.11V wouldn't be unexpected. Does it go up to 1.15? You don't need to keep the bottom heatsinks warm during operation, the issues with the boards needing to be warmed up prior to startup with a hair dryer is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.
Thanks, I was wondering about the hairdryer thing.  The problem board does get above 1.15v in cgminer sometimes - that seems to be the majority of the times it's crapping out (bfgminer doesn't show it getting that high when it craps out).  It also sometimes now does the 200+ GHs with practically all errors too, if that info helps.

The "hairdryer mod" is only good for boards that "stick" just under 1 V; it just gives the warm-up a bit of a push.
The secondary cooling is mainly to move the air under the board.  Heat rises, so the spill-off from the main cooling fan is enough to ventilate the top.

A thought has just struck me: will the Lucko boards with a problem work better upside down?!
sr. member
Activity: 280
Merit: 250
Helperizer
For the second one, try flashing it again with 1V1, it shouldn't go (much) above 1.1V though 1.11V wouldn't be unexpected. Does it go up to 1.15? You don't need to keep the bottom heatsinks warm during operation, the issues with the boards needing to be warmed up prior to startup with a hair dryer is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.
Thanks, I was wondering about the hairdryer thing.  The problem board does get above 1.15v in cgminer sometimes - that seems to be the majority of the times it's crapping out (bfgminer doesn't show it getting that high when it craps out).  It also sometimes now does the 200+ GHs with practically all errors too, if that info helps.
sr. member
Activity: 280
Merit: 250
Helperizer
MrTeal and Lucko are evaluating some more firmware versions, maybe they will have something soon.
Do you use any secondary cooling?
Thanks, yep, good fan to blow across the heatsinks on the underside.  For the watercooled one, it's actually angled a bit and gets to the top part of the tall heatsinks.  I tried cooling the top for the problem board and I wasn't sure if it helped or hurt so I took it off.
legendary
Activity: 1274
Merit: 1004
Still having intermittent problems now, but I think these are "normal" ones.  Couple of questions:

- What is the recommended miner and version that seems to perform best with these Chilis (and even what flags enabled/disabled if it's known)?  I'm looking for the most stable, not necessarily the fastest; I'd rather get rid of these intermittent zombie/sick attacks.

- I'm running the 14e version with the 1.1v limitation on both chilis, but one (212 Evo cooled) still gets above 1.1v and predictably craps out sometimes.  The other is fine usually (H60 refurb water cooled).  Any ideas what I can do?  There are heatsinks on the bottom under the mosfets and above on the 4 chips above the heatsinks on the bottom.  Should I be letting some part of it keep warm somehow (cold room in winter), or is there a recommended firmware for this situation?  BTW, the Chili that is acting up has all 16 cores enabled on all 8 chips so I was really hoping the 1.1v limitation would do the trick - but alas, no.

Thanks for any help!
- Tye
I don't really have a preferred version. I believe I'm using 3.8.5, but I just haven't bothered to upgrade. I probably should try again to see if 3.10.0 handles reconnecting USB devices better when I have hub issues.

For the second one, try flashing it again with 1V1, it shouldn't go (much) above 1.1V though 1.11V wouldn't be unexpected. Does it go up to 1.15? You don't need to keep the bottom heatsinks warm during operation, the issues with the boards needing to be warmed up prior to startup with a hair dryer is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.