Topic: The Chili – 30+GH/s BFL based Bitcoin Miner Assembly - page 12. (Read 138078 times)

hero member
Activity: 868
Merit: 1000

I've noticed similar behavior with my mini rig: it does not like the ambient temperature falling below 15C. It just won't start.

Here are some pics of my chilies.  I used 6x32 1 1/2" bolts from Home Depot, Arctic Cooling MX-4 paste, some old PCB for backplate material.
I just hope I will not have to restart it every day :-)


Restarts are part of the gamble. Only time will tell.

I have a theory that the Chili was designed and prototype-tested in an "American air-conditioned summer workshop".
Now that they are out in the wild (in the rest of the world), some "exceptions to the standard" are being found.
The next firmware release will probably bring things back on track.

If we are still hashing them in the coming summer then they will have had a good testing cycle.
The unfortunate reality is that they will be way out of date by then (in 6 months they may be the "graphics cards" of the ASICs).
Such is the speed of development of mining hardware.
legendary
Activity: 2450
Merit: 1002
Is the firmware open source yet? I would like to mess w/ voltages ...
hero member
Activity: 868
Merit: 1000
Hairdryer worked for me (for 4 cards; I need to get fan connectors for the other two), but I had to do it slightly differently.

1. Power cycle the boards (two boards per PSU).
2. Wait until they finish initialization (LEDs stop flashing).
3. Unplug one board from the USB hub.
4. Start hot-drying the power module on the board that is still connected to the USB hub.
5. After 15 seconds of hot-drying, start bfgminer to scan the port from step 4 and start hashing.
6. Keep hot-drying until the die temp gets to 65C.
7. Turn off the hairdryer.
8. When temps drop to 45C, turn the hairdryer on again to bring them back up to 65C.
9. After another minute or so, they should stay above 60C.
10. Turn off the hairdryer.
11. The board should be stable around 60C and hash at 30+GH/s.
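The on/off rule in these steps is a simple hysteresis loop: heat until 65C, stop, and only heat again if the die falls to 45C. A minimal sketch, assuming you read the die temperatures yourself (e.g. from the bfgminer display) and feed them in as plain numbers in C; the function name and sample readings are hypothetical, and only the 65C/45C thresholds come from the steps.

```python
# Thresholds taken from the steps above; everything else is illustrative.
HEAT_TARGET_C = 65.0   # step 6: keep heating until the die reaches 65C
REHEAT_BELOW_C = 45.0  # step 8: heat again once it has dropped to 45C


def hairdryer_schedule(temps):
    """Return True (hairdryer on) or False (off) after each reading."""
    heating = True  # steps 4-6: the hairdryer starts out on
    schedule = []
    for t in temps:
        if heating and t >= HEAT_TARGET_C:
            heating = False  # step 7: turn the hairdryer off
        elif not heating and t <= REHEAT_BELOW_C:
            heating = True   # step 8: bring it back up to 65C
        schedule.append(heating)
    return schedule


if __name__ == "__main__":
    print(hairdryer_schedule([30, 50, 66, 58, 44, 55, 65, 62]))
```

The gap between the two thresholds is what stops the hairdryer from being toggled on every small wobble in the reading.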

One thing I noticed is that each board behaves differently with regard to temperature rise.
I did not cool the boards at any time.

For me, it seems that getting the boards to 60C and stabilizing them there was the key.

Running one bfgminer instance per board is not the best solution, but at least they are hashing.
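If you do run one bfgminer instance per board, a few lines of script can build the command lines to paste into separate terminal windows. This is a sketch under assumptions: the pool URL, worker name and port names are placeholders, and it relies only on bfgminer's usual `-o`/`-u`/`-p` pool options plus `-S` to name a serial port to probe. Whether bfgminer also auto-detects the other boards alongside the named port depends on your version and options, so check its README before relying on this to keep instances apart.

```python
import shlex

# Hypothetical placeholders: substitute your own pool, worker and ports.
POOL_URL = "stratum+tcp://pool.example.com:3333"
WORKER = "worker1"
PASSWORD = "x"


def bfgminer_commands(ports):
    """Build one bfgminer argument list per serial port (one board each)."""
    return [
        ["bfgminer", "-o", POOL_URL, "-u", WORKER, "-p", PASSWORD,
         "-S", port]  # -S names the serial port this instance should probe
        for port in ports
    ]


if __name__ == "__main__":
    # Print one shell-ready line per board, to run in separate terminals.
    for cmd in bfgminer_commands(["/dev/ttyUSB0", "/dev/ttyUSB1"]):
        print(shlex.join(cmd))
```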

The trick is to get them to about 60C and stable.
If you get some gentle cooling on the back of the board, the temp will rise again to around 68-70C and you will get another 3GH/s.
member
Activity: 80
Merit: 10
It's a Corsair CX500. I just came home on my lunch break and was able to get it to turn on. I'm sure it will stay on for a couple of hours, but it can go down at any time. Then I will be unable to get the PSU to turn on again for a long time. Thx again.

Check the voltages on the PSU while it's running. It's a group regulated design, so if you're pulling enough current on the 12V line without a load on the 5V line it's possible it's shutting off due to overvoltage on the 5V line.

Just out of curiosity: I have been running the single Chili off a single 6-pin PCIe connector coming from the PSU. Would that be a problem? I can try to find a dual 4-pin Molex to 6-pin PCIe adapter if needed. Does anyone think this might be an issue? It's still up and running since my last post, but I am sure by the time I wake up tomorrow it will be down. It has not stayed up more than 12-16 hours yet. Ideally I would like to run both Chilis off one 500W supply if I can. It doesn't matter if it's on the PCIe connectors themselves or if I need to get adapters. Please let me know. Thx!
The direct PCIe will be better than a dual 4 pin to PCIe, so use the one coming directly out of your power supply.

That's what I've been doing. Oh well, the new PSU comes in today. I will test more when I get in from work.
hero member
Activity: 868
Merit: 1000

As Ronin4bits said:-

Get the hairdryer ready, then start it hashing. As soon as it's hashing, start blasting the top of the FETs with it. Give it a minute (temp should rise to 68-70C) -- the hash rate should stabilize for about 30 sec, then it'll start to fall -- (turn off the hairdryer; temp will drop to 60C) switch over to cool air and start cooling the back of the PCB -- when you get good cooling on the back, the hash rate will start climbing along with the reported ASIC temp -- when airflow keeps this around 68-70 degrees, it should run. (You may need a secondary fan to take over cooling the back of the board.)


Thanks Mudbankkeith, I'll give it a try.

If the above sequence works on your board, then you need the "slow rise" firmware.

The "standard" installed firmware is trying to start too fast.

The 1.1V-limited firmware will also still try to start at the "standard" speed (so it is really only for boards that hash too fast, at 39-40GH/s, and then crash).

MrTeal has said earlier in this thread that a new firmware will be released soon, incorporating the slow rise and a manual voltage setup (watch this thread).
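To make the "starting too fast" versus "slow rise" idea concrete, here is a purely illustrative sketch of a stepped frequency ramp. It is not the Chili firmware and says nothing about how that firmware actually ramps: the function, the step sizes and the 150/300 MHz endpoints are hypothetical, loosely echoing the ~155 MHz self-test frequency and the ~300 MHz question that appear further down this page.

```python
def ramp(start_mhz, target_mhz, step_mhz):
    """Frequencies visited when stepping from start_mhz up to target_mhz.

    Hypothetical model: a "fast" start is one big step straight to the
    target; a "slow rise" visits intermediate frequencies so the board
    has time to warm up and settle at each one.
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    freqs = []
    f = start_mhz
    while f < target_mhz:
        freqs.append(f)
        f += step_mhz
    freqs.append(target_mhz)  # always finish exactly on the target
    return freqs


if __name__ == "__main__":
    print(ramp(150, 300, 150))  # fast start: two points
    print(ramp(150, 300, 25))   # slow rise: seven points
```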
newbie
Activity: 59
Merit: 0
The only symptom that sounds different for me: they never show SICK in bfg -- they just disappear after a few minutes.

I only have 2 cards, and they both do exactly this. They start normally and ChiliFlash shows everything within normal parameters. Start mining and they ramp up to about 22-26GH/s for anywhere from 30-60 sec. If I don't do the warming, they start throwing more errors and the hash rate falls until I get a flood of temp errors, and then they disappear from the bfg console.

If I kill bfg and go back to ChiliFlash, the device still opens, but it shows only 2 engines still enabled. Power cycling the Chilis is required to bring them back.

Not to confuse the thread, as so many have the 1.1V fw working for them, but for some reason neither of my boards worked better with it -- they both actually seemed worse, as I barely had time to pick up the hairdryer before they'd die... I decided to reflash the "standard" Chili14e fw before giving up on them and, to my surprise, they were stable enough to use the warming technique. Once I got this worked out, based on all the previous cooling/mounting/shielding research of others on this thread, they now run nice and steady at about 32-34GH/s -- up for well over 48 hrs now.

I have 2 old Arctic USB fans rigged on a hub pointing down onto the pcbs for long term cooling -- hairdryer on cool to cool it down during the startup dance...
hero member
Activity: 868
Merit: 1000
+1 the hairdryer, af_newbie.

Get the hairdryer ready, then start it hashing. As soon as it's hashing, start blasting the top of the FETs with it. Give it a minute -- the hash rate should stabilize for about 30 sec, then it'll start to fall -- switch over to cool air and start cooling the back of the PCB -- when you get good cooling on the back, the hash rate will start climbing along with the reported ASIC temp -- when airflow keeps this around 68-70 degrees, it should run.

I've been averaging 32-33GH/s for over 48 hours now after getting this one going like this.

One twist - I'm running the "regular" Chili14e fw - the 1.1V-limited version would crash this board after about 30 sec of hashing, no matter what...

Good luck.

Ronin,

I spoke too soon, all of the cards I have from Lucko go into SICK state after a while.  Some go there after 30 seconds, some after 30 minutes.
The ones that go after 30 seconds in one run will run longer after a restart, then go down after 20 seconds after yet another restart. All I can say is that I don't know when they decide to stop hashing/responding.

The whole thing is unpredictable.  No consistency.

Now, regarding that hairdryer fix:

So you do this for each card, run a separate instance of bfgminer (one for each port) and installed a power switch for each 12V line?

I'm not sure if you have the same symptoms.  Can you confirm?
My cards start fine and they all start hashing; sometimes they all run for a few minutes at 30+GH/s each, then they fall into SICK state one by one. Usually bfgminer tries to recover; sometimes it can, and sometimes my akbash watchdog kills it when bfgminer gets stuck. It seems that the cards stop responding to bfgminer commands.

I'll try the voltage limited version.

Thanks,
af_newbie
As Ronin4bits said:-

Get the hairdryer ready, then start it hashing. As soon as it's hashing, start blasting the top of the FETs with it. Give it a minute (temp should rise to 68-70C) -- the hash rate should stabilize for about 30 sec, then it'll start to fall -- (turn off the hairdryer; temp will drop to 60C) switch over to cool air and start cooling the back of the PCB -- when you get good cooling on the back, the hash rate will start climbing along with the reported ASIC temp -- when airflow keeps this around 68-70 degrees, it should run. (You may need a secondary fan to take over cooling the back of the board.)
hero member
Activity: 868
Merit: 1000
For the second board, it does sound like it might have the same issues that many boards from that group buy are experiencing. We're working on firmware as a workaround for some of those issues, and hope to have it out soon.

Out of 6 cards I got from Lucko, I have 2 that fail with this temp reading error ("Error: Get temp returned empty string/time out").  

The other 4 are hashing at 30+GH/s. The one with the bad temp reading had a defective heatsink: one pipe was slightly lower than the other.
I lapped it to smooth the sink contact. It worked OK, not perfect, but I'm getting 30GH/s on that defective sink.

Your other board sounds like it needs the "hairdryer" mod to get it started.
legendary
Activity: 1274
Merit: 1004
MrTeal,

Flash utility shows that my chips are running at ~half the speed:

DEVICE: Chili SC
MANUFACTURER: MrTeal and ChipGeek
FIRMWARE: 1.2.14e
CHIP PARALLELIZATION: NO
QUEUE DEPTH:40
PROCESSOR 0: 16 engines @ 134 MHz -- MAP: FFFF
PROCESSOR 1: 16 engines @ 175 MHz -- MAP: FFFF
PROCESSOR 2: 16 engines @ 176 MHz -- MAP: FFFF
PROCESSOR 3: 16 engines @ 193 MHz -- MAP: FFFF
PROCESSOR 5: 15 engines @ 155 MHz -- MAP: EFFF
PROCESSOR 6: 16 engines @ 177 MHz -- MAP: FFFF
PROCESSOR 7: 16 engines @ 76 MHz -- MAP: FFFF
THEORETICAL MAX: 17.21 GH/s
ENGINES: 111
FREQUENCY: 155 MHz
CRITICAL TEMPERATURE: 0
TOTAL THERMAL CYCLES: 0
XLINK MODE: MASTER
XLINK PRESENT: NO
OK


What could be the reason? This is on Lucko's version of your board (I got them today; this is the first one I tried).
Should the frequency be around 300MHz?

That one is a little strange, but nothing too crazy. The board starts at 0.85V and a lower frequency setting during initial turn-on and self-test to provide a factor of safety in case there is a problem with heatsink contact. If there is one chip that has poor or no contact, we want to watch that before it causes damage.
You won't see those numbers increase until you actually start to accept work from the mining software.
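The numbers in that dump hang together: the listed processors add up to the reported 111 engines, each MAP has one set bit per enabled engine (EFFF has one bit clear, matching processor 5's 15 engines), and summing engines × MHz gives about 17.22 GH/s with an engine-weighted average of about 155 MHz, close to the THEORETICAL MAX and FREQUENCY lines. A minimal sketch of that check, assuming one hash per engine per clock cycle; that rule is inferred from the numbers, not taken from the flash utility, and the small gap to the printed 17.21 may come from rounding in the displayed MHz. Processor 4 does not appear in the listing, and the sketch only sums what is listed.

```python
import re

# Matches lines like "PROCESSOR 5: 15 engines @ 155 MHz -- MAP: EFFF".
PROC_RE = re.compile(
    r"PROCESSOR\s+(\d+):\s+(\d+)\s+engines\s+@\s+(\d+)\s+MHz"
    r"\s+--\s+MAP:\s+([0-9A-Fa-f]+)"
)


def summarize(dump):
    """Return (total engines, engine-weighted avg MHz, estimated GH/s)."""
    chips = []
    for line in dump.splitlines():
        m = PROC_RE.search(line)
        if not m:
            continue
        proc, engines, mhz, engine_map = m.groups()
        engines, mhz = int(engines), int(mhz)
        # Assumption: each set bit in MAP marks one enabled engine.
        if bin(int(engine_map, 16)).count("1") != engines:
            raise ValueError(f"processor {proc}: MAP does not match engines")
        chips.append((engines, mhz))
    total = sum(e for e, _ in chips)
    if total == 0:
        return 0, 0.0, 0.0
    # Assumption: one hash per engine per clock, so engines * MHz = MH/s.
    mhs = sum(e * f for e, f in chips)
    return total, mhs / total, mhs / 1000.0


DUMP = """\
PROCESSOR 0: 16 engines @ 134 MHz -- MAP: FFFF
PROCESSOR 1: 16 engines @ 175 MHz -- MAP: FFFF
PROCESSOR 2: 16 engines @ 176 MHz -- MAP: FFFF
PROCESSOR 3: 16 engines @ 193 MHz -- MAP: FFFF
PROCESSOR 5: 15 engines @ 155 MHz -- MAP: EFFF
PROCESSOR 6: 16 engines @ 177 MHz -- MAP: FFFF
PROCESSOR 7: 16 engines @ 76 MHz -- MAP: FFFF
"""

if __name__ == "__main__":
    print(summarize(DUMP))
```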

That board hashes at ~16GH/s. Voltages in bfgminer are 3.29/0.863/12.203, but it is hashing, and the reported temp is 70C.

I've tried the second board, same cooler type (evo212). It starts out, but then I get "Error: Get temp returned empty string/time out" in bfgminer 3.10.0 and the second LED from the power connector is flashing. After a while bfgminer eventually restarts it, and the process repeats. It runs for a few seconds, submits a few "accepted" shares and goes into the error condition... Not sure what I can do.
For the first one, I would double check that there doesn't appear to be any board flex that would cause one chip to lift off the heatsink. Alternately, open the unit in PuTTY or another terminal program as soon as it starts and, while it's doing its self-test (the top two LEDs flashing for ~20s), send it "ZlX". That will report the temperatures of all the dies. Note if one appears to be a lot higher than the others.

For the second board, it does sound like it might have the same issues that many boards from that group buy are experiencing. We're working on firmware as a workaround for some of those issues, and hope to have it out soon.
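The "ZlX" check only needs a terminal program, but it can also be scripted. A sketch assuming pyserial, whose `serial.Serial` opened with a read `timeout` has `write()` and a `readline()` that returns empty bytes when the timeout expires; the port name, the 115200 baud rate and sending "ZlX" with no line ending are assumptions. The thread does not describe the reply format, so turning the reply lines into temperatures is left to you. The hypothetical `hot_dies` helper then flags any die well above the median, with an arbitrary 10C margin.

```python
import statistics


def send_command(port, command=b"ZlX"):
    """Send a command and collect reply lines until a read times out.

    `port` is anything with write() and readline(), e.g. a pyserial
    serial.Serial opened with a timeout; readline() returning b"" is
    treated as "the board has finished replying".
    """
    port.write(command)
    lines = []
    while True:
        raw = port.readline()
        if not raw:
            break
        lines.append(raw.decode("ascii", errors="replace").strip())
    return lines


def hot_dies(temps_c, margin_c=10.0):
    """Indices of dies more than margin_c degrees above the median die."""
    median = statistics.median(temps_c)
    return [i for i, t in enumerate(temps_c) if t - median > margin_c]


# Usage with pyserial (assumed installed; port and baud rate are guesses):
#   import serial
#   with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as port:
#       print(send_command(port))
```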
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
It's a Corsair CX500. I just came home on my lunch break and was able to get it to turn on. I'm sure it will stay on for a couple of hours, but it can go down at any time. Then I will be unable to get the PSU to turn on again for a long time. Thx again.
Weird. I'm running a Single and a Chili on a Corsair CX500 with no problems. Make sure the side vent isn't covered, but that unit should easily handle the load.

C
member
Activity: 80
Merit: 10
I run 3 chilis on a 1000 watt PSU.

If your PSUs won't turn on with the Chilis disconnected, then the PSU has a problem of some kind. PSUs are pretty simple. Are you jumpering them correctly to get them to turn on? Using a paperclip?

Yes, paper-clipping it. As I said, everything runs fine for a while and then the PSU just turns off. I have not actually sat beside it to see it happen, as it will run fine for 6-10 hours at a time with no problems. Then it will be off, and I will be unable to get the power supply to turn back on for quite a few hours.

Edit: As a side note, what is the best way to power off the power supply using the paperclip method? Just flip the switch, or should I remove the paperclip and then power down? Thx!
Just use the switch on the back.
I have never heard of anything like you're talking about.
What are the model numbers of the PSUs you're using? I have an older Corsair unit that won't run mining hardware unless I put a load on the 5V rail.

It's a Corsair CX500. I just came home on my lunch break and was able to get it to turn on. I'm sure it will stay on for a couple of hours, but it can go down at any time. Then I will be unable to get the PSU to turn on again for a long time. Thx again.
legendary
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
I used normal RadioShack thermal grease on my chips. It runs at 38/35GH/s with a sedion water block, which isn't bad. I really should take it off and try Arctic Silver-type stuff, but I'm lame.

As for replacing the FTDI, it might be a cold joint. Replacing the FTDI chip on one of my jallies was a pain in the *rear*. I used leaded solder balls on the reflow, put 1-2 0.45mm balls on each pad and used my Aoyue at 400C with a very, very low flow rate to keep from blowing nearby parts off. I also had a pre-heater under the board set to 350F for about 10 minutes before starting.

I've found it best to think of the air coming out of the Aoyue as forming a "bubble" of heat off the end rather than a stream. Low air flows let the hot air transfer into the joint instead of going around it.

C
sr. member
Activity: 262
Merit: 250
Has anyone tried using just thermal compound instead of thermal pads?
Something like AC MX-4?

Or is the thermal pad absolutely necessary because the chips might have slightly different heights?

I don't use the pads. I have 8 miners all using thermal compound on the ASIC chips. I use thermal adhesive on the heatsinks under the FETs.
sr. member
Activity: 280
Merit: 250
Helperizer
It seems to reset now (no more comms errors, just bfgminer crashing/stopping, though bfgminer is solid with other units (Chili and other USB miners)). But it does so randomly, anywhere from a few minutes to several hours (up to 12 hrs as the longest so far). No rise or fall in GH/s; I'm getting a solid 34-35 GH/s depending on the pool, and no noticeable change right before crashing.

Are there some test points I can measure while it's hashing that could help, now that it no longer has comms errors but instead causes bfgminer crashes?

BTW, I can "make" it crash bfgminer by bumping the table it's on or moving it.  Must be a loose connection somewhere?  (maybe USB port/cable?  Will try a different cable tonight).
Yes, that's really interesting. I'd agree it might be a loose connection or a soldering issue if it crashes when you bump the table. Possibly a cold solder joint on the FTDI chip? The grounds especially are difficult to solder effectively, as they wick heat away very quickly, especially if you're using a fine tip. I would go over the leads again, and if possible heat the whole board up before you do that.

I was using a hot-air station at 400. The current behavior is after I went around each side again for 30 seconds and also reinstalled a nearby small brown SMD cap that got blown off by the airstream (not sure if it happened then or the first time around when I originally replaced the FTDI). My hands are way too imprecise (and my eyes too) for hand-soldering the teeny-tiny legs! I could add a bit of flux and try again, but I'll check the USB area first.

BTW, I definitely appreciate all the responses as I work on this board - you've got to be the most responsive and helpful developer/vendor I've run across here - kudos and thanks!