
Topic: BFL ASIC Firmware & Hardware, Understanding & Optimization - page 4. (Read 15647 times)

legendary
Activity: 1400
Merit: 1000
I owe my soul to the Bitcoin code...
Highest is 68degC in a 76degF room. Not too bad for making it much easier to sit next to.

Huh? How are you cooling below ambient with just a fan?

Sorry that was misleading. Two things to add. 1) running caseless and 2) unit sitting next to a house AC vent. (not always on but helps)

EDIT: Did you ever have one of them days where nothing comes out right?  Of course Gomeler
sr. member
Activity: 360
Merit: 250
Please note that the two temperatures reported by the Single-SC don't tell the whole story. These values come from two sensors, which sit somewhere between an 8-ASIC cluster and the +1V regulation for that cluster. They are not in the ASICs, inside the 8-ASIC cluster circuit, or inside the +1V regulation circuit.

Even if you see only a small temperature increase after changing the fans, or even a decrease after changing the whole setup (e.g. by opening the box), it doesn't mean that the ASICs or other important parts aren't getting much hotter than they should. Also watch the hardware error rate carefully!

I'll soon be ready to report my results from looking for fan alternatives....
legendary
Activity: 3878
Merit: 1193
Highest is 68deg in a 76deg room. Not too bad for making it much easier to sit next to.

Huh? How are you cooling below ambient with just a fan?
legendary
Activity: 1400
Merit: 1000
I owe my soul to the Bitcoin code...
Highest is 68degC in a 76degF room. Not too bad for making it much easier to sit next to.

EDIT: Just a big DOH!
sr. member
Activity: 249
Merit: 250
I swapped out the original fans for the corsair high static pressure fans and temps seem to be doing just fine with much less noise.

These guys: http://www.newegg.com/Product/Product.aspx?Item=N82E16835181027   Of course YMMV.

What type of temperatures are you seeing now?
legendary
Activity: 1400
Merit: 1000
I owe my soul to the Bitcoin code...
I swapped out the original fans for the corsair high static pressure fans and temps seem to be doing just fine with much less noise.

These guys: http://www.newegg.com/Product/Product.aspx?Item=N82E16835181027   Of course YMMV.
newbie
Activity: 17
Merit: 0

Airflow / Cooling:
A much better solution is improved fan plates. I've made new ones, which are much more efficient and let the fans run quieter. A friend of mine has a CAD drawing from me, and I expect the first new fan plates, made on a CNC machine, tomorrow. With these plates you still have a stable box, are protected from touching the running fans, and get optimal airflow into and out of the box/tube. This will help keep the whole unit quieter and cooler, which will be especially necessary for the next tuning steps.

Because of the noise, users have started changing fans. The default ones are really noisy, but also powerful. I can only warn everyone against replacing the fans with other ones if you don't really know what you are doing! The default fans have a high static pressure and airflow rate. I will look for acceptable fan replacements that are not so noisy over the next few days and will let you know.

Please post some pics once you get the new CNC produced plates installed!
I'm curious what the cost of the new plates was? I'm sure others would be interested in getting some if they dramatically reduce the noise, my little single sounds like a hair dryer going non-stop. Maybe something for a kickstarter to produce a batch ...

Looking forward to updates on any fan changes as well, anything to reduce the noise of these units.
sr. member
Activity: 467
Merit: 250
Thanks for the great post.. some of the same conclusions I've reached myself.

-- 60GH singles are already using 'binned' A-grade chips, so there's very little room to turn them up. almost not worth trying.

-- default cooling is decent, but doing it yourself is much better. I had a failed fan out of the box, so I had to replace fans day-0. I took some Scythe 120x38MM fans I had laying around from projects, and replaced the stock ones.  Dropped my average temps by 10-15C under load. (To BFL's credit, they had a replacement fan to me within 3 days... kudos to BFL_Jody)

Code:
 BAS 0:  max 57C 3.27V | 61.05G/61.08Gh/s | A:48188 R:280 HW: 73 WU: 855.5/m
 BAS 1:  max 61C 3.28V | 60.47G/60.50Gh/s | A:50282 R:218 HW:116 WU: 845.0/m

-- worth mentioning and pesky -- the 6-pin PS connectors CANNOT mate with most 8-pin VGA power headers. Wish they'd left a little room between connectors to allow either 6 or 8-pin to be used.

-- subzero then powerup : haven't tried this trick, but Jally owners have done the "double-bag the jally, put it in the fridge for 30 minutes, take it out, power it up, and get much better hashrates" trick


I'm curious if anyone has taken the 92MM fans off the top, and just gone with the 120MM.. Fluid dynamics/vortex maybe work better?
full member
Activity: 156
Merit: 100
Good informative post OP, thanks.

I swear back in June Josh swore that the Singles/Little Singles would NOT be using the low profile 92mm fans like the Jalapenos. I'd already purchased some nice Panaflo 92mm fans that I guess I won't be able to use. That extra 10mm would've allowed for more efficient and quieter cooling.
sr. member
Activity: 360
Merit: 250
Update in the section "Airflow / Cooling".
sr. member
Activity: 360
Merit: 250
Update in the section "Increase of the over-all hashing rate" regarding the 25 & 50 GH singles. 
full member
Activity: 238
Merit: 100
Love the Bitcoin.
thanks for the informative post.
sr. member
Activity: 360
Merit: 250
This thread is meant as a place to discuss BFL ASIC-based miners, especially their firmware and hardware, to get a better understanding of how everything works and to find ways to improve them.

There are already some other threads, mainly focused on Jala-HW. This thread should focus on the 25/30/50/60/Minirig hardware.
Nevertheless, some of the information is also valid for other types of BFL ASIC hardware.

Latest official version, firmware version 1.2.6 => https://bitcointalksearch.org/topic/bfl-bitforce-sc-firmware-source-code-235312
My singles (60GH) all reported version 1.2.6 on arrival. They all had PCB Rev. D. They were all running at an over-all hashing rate of ~60GH/s.
I've always connected two of them to a good 750W 80+ Gold ATX PSU, drawing ~550W when mining at ~25°C ambient temperature.
For JTAG programming I use a JTAG ICE mkII Programmer.

Airflow / Cooling:
The units are awfully loud (fans). There are several reasons for this. One is the fan plates, which look really nice but do a poor job of letting the air flow easily and quietly. I have read about a lot of users who removed these plates. Doing so makes the whole box unstable. I've also read about users who have completely unboxed the hardware. Even if the fans could run better and more quietly in an unboxed scenario, I can NOT recommend it without additional cooling, e.g. extra fans. The push/pull configuration of the two 120mm fans forces the air to flow through the box, including across the back side of the PCB, which also has small heatsinks mounted, and over some other hotspots on the PCB. You can unbox it if you know what you are doing; otherwise you risk that some hotspots are not cooled enough and the unit becomes defective or its lifetime decreases.

I don't like BFL's whole concept for keeping the unit cool. I would have done it completely differently. However, short of changing nearly everything, I would recommend at most removing the fan plates, even though the whole box then becomes mechanically unstable. A much better solution is improved fan plates. I've made new ones, which are much more efficient and let the fans run quieter. A friend of mine has a CAD drawing from me, and I expect the first new fan plates, made on a CNC machine, tomorrow. With these plates you still have a stable box, are protected from touching the running fans, and get optimal airflow into and out of the box/tube. This will help keep the whole unit quieter and cooler, which will be especially necessary for the next tuning steps.

The Single-SC (50/60GH) has four fans:
2 * 120mmx25mm (push/pull). I've measured a power consumption of 2.4 W @12V for this fan.
2 * 92mmx14mm, one on top of each heatsink (the 25/30GH version uses only one). I still have to measure this type of fan in the next few days.
The PCB has a separate connector for every fan. Unfortunately, the fans cannot be controlled separately! The fan speed can be controlled in 16 steps via 4 digital outputs. These four digital outputs control the output voltage of a voltage regulator (LM317). The LM317 is not an efficient component and therefore also wastes energy. So all fans together plus the LM317 already draw up to 10 W.
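
To illustrate the 16-step control: four on/off outputs give 2^4 = 16 combinations. Below is a minimal Python sketch, assuming the step number is simply binary-encoded onto the four lines. The bit order and the name fan_step_to_outputs are assumptions for illustration only and are not taken from the firmware source. The resulting LM317 output voltage per step is not modelled, since the post gives no values for it.

```python
def fan_step_to_outputs(step):
    """Map a fan speed step (0..15) to the states of four digital outputs.

    Assumption: plain binary encoding, least significant bit first.
    The real firmware may order or invert the lines differently.
    """
    if not 0 <= step <= 15:
        raise ValueError("fan step must be in the range 0..15")
    return tuple(bool((step >> bit) & 1) for bit in range(4))


# Every step produces a distinct combination of the four outputs.
all_steps = [fan_step_to_outputs(s) for s in range(16)]
print(len(set(all_steps)))
print(fan_step_to_outputs(5))
```

Because all fans share this one regulated voltage, a single step value sets the speed of every fan at once, which matches the observation that they cannot be controlled separately.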

Because of the noise, users have started changing fans. The default ones are really noisy, but also powerful. I can only warn everyone against replacing the fans with other ones if you don't really know what you are doing! The default fans have a high static pressure and airflow rate. I will look for acceptable fan replacements that are not so noisy over the next few days and will let you know.

Delivered firmware:
All my singles report firmware version 1.2.6, which is currently the latest official release.
I used the connector JTAG1.
The security bit was always set, which means you can't read out the delivered FLASH (as a backup)!
One thing I've figured out is that even though my singles all report version 1.2.6, they do NOT have the same firmware!
It looks like BFL is doing some kind of unit-specific firmware flashing (tuning).

Therefore, if you compile and flash version 1.2.6 onto your single, it COULD end up with less over-all hashing power (hashrate minus HW errors) or show other strange effects.
On the other hand, if you know what you are doing, there is a little potential to do this tuning better than BFL. I have done it, but it cost me time.

It must be quite hard for BFL to get 60GH singles out the door if they have to manually tune the existing firmware during production.
The 1.2.6 firmware already includes some diagnostic and tuning functions, but they don't work perfectly yet.
Maybe this is also the reason why these products are shipping much more slowly than the others.

Hashrate vs. over-all hashrate:
First, some basics about the hashrate the unit reports.
There is a "THEORETICAL MAX" hashrate in MH/s, which can be read out with e.g. GetInfo in cgminer.
This hashrate is calculated from how many engines are enabled in your unit and the clock frequency they are running at.
It's mainly theoretical because it doesn't take into account the hardware error rate (HW errors) that can occur. So if you have a HW error rate of, say, 2% and a theoretical max hashrate of, say, 61000 MH/s, your over-all hashrate is only about 59780 MH/s (dividing by a factor of 1024 gives ~58.38 GH/s).
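
The arithmetic in this example can be written out as a short Python sketch. The function names are made up for illustration. The divisor of 1024 follows the figure used in this post; a decimal conversion would use 1000 instead, so it is left as a parameter.

```python
def effective_hashrate_mhs(theoretical_max_mhs, hw_error_rate):
    """Over-all hashrate in MH/s: theoretical max reduced by the HW error rate."""
    return theoretical_max_mhs * (1.0 - hw_error_rate)


def mhs_to_ghs(mhs, divisor=1024):
    """Convert MH/s to GH/s. 1024 is the factor used in the post (assumption kept as-is)."""
    return mhs / divisor


effective = effective_hashrate_mhs(61000, 0.02)
print(round(effective))                    # MH/s
print(round(mhs_to_ghs(effective), 2))     # GH/s with the post's factor of 1024
```

With the numbers from the post this reproduces about 59780 MH/s and ~58.38 GH/s.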

Firmware diagnostic/tuning procedure:
After power-up, the firmware in every unit runs a heavy diagnostic and tuning procedure. I'll try to give a quick overview of what it does.
Every ASIC has 16 hashing engines on chip. Something seems to be wrong with engine 0, because this engine is always disabled in the firmware (that is another topic). So effectively you can have at most 15 hashing engines with the latest firmware. Sometimes part of an ASIC has a defect, which can cause an engine not to work, or not to work correctly. The firmware first tries to access every engine. If an engine cannot be reached, it is disabled.
Next, the firmware tries to find the best clock frequency for every ASIC chip. All engines on one chip can only run with one common clock, but the clock can differ from chip to chip. Above a specific clock frequency, every engine starts to produce more and more HW errors. This frequency point and the HW error rate can vary from engine to engine on the same chip, and especially between chips. They also depend on a complex environment (temperature, voltage, etc.).
So if, say, one of your 15 engines already produces high error rates at a relatively low clock frequency, it can be a good decision to disable this engine completely just so the other 14 engines can run at a higher frequency, which can give a higher over-all hashing rate. So it's all about a compromise between the theoretical maximum hashrate and the error rate, and this is influenced by how many engines you disable and what frequency you clock each chip at. The firmware tries to do this. It uses test vectors and analyzes the nonce results. All of this is done in the first seconds after power-up.
I think these algorithms are not yet working perfectly. Two examples from my singles:
1. On one chip the firmware decided to enable all engines (15), but found the perfect clock frequency to be 0 MHz. So the whole chip produced nothing. The problem was just one engine, which should have been disabled.
2. On another unit there was one engine which produced extremely high error rates but was not caught by the algorithm.
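
The trade-off between enabled engines and chip clock can be sketched in a few lines of Python. This is NOT the firmware's algorithm, only a simplified model under stated assumptions: each engine is given a hypothetical "maximum stable clock" (the highest frequency at which its error rate is still acceptable), the chip clock is limited by the weakest enabled engine, and throughput is taken as simply proportional to engines times clock. The function name and all numbers are invented for illustration.

```python
def best_engine_set(max_stable_mhz):
    """Pick which engines to enable on one chip.

    max_stable_mhz: dict mapping engine index -> assumed highest clock (MHz)
    at which that engine still runs with an acceptable error rate.
    Returns (enabled engine indices, chip clock in MHz, relative throughput).
    """
    # Strongest engines first; the common chip clock is set by the weakest one kept.
    ranked = sorted(max_stable_mhz.items(), key=lambda item: item[1], reverse=True)
    best = ([], 0, 0)
    for count in range(1, len(ranked) + 1):
        clock = ranked[count - 1][1]
        throughput = count * clock  # relative figure: engines x MHz
        if throughput > best[2]:
            best = (sorted(engine for engine, _ in ranked[:count]), clock, throughput)
    return best


# Engines 1..15 (engine 0 is always disabled); engine 7 is a weak one.
engines = {index: 280 for index in range(1, 16)}
engines[7] = 200
enabled, clock, throughput = best_engine_set(engines)
print(len(enabled), clock, throughput)
```

In this model, keeping all 15 engines limits the chip to 200 MHz (15 x 200 = 3000), while dropping engine 7 lets the other 14 run at 280 MHz (14 x 280 = 3920). An engine with a stable clock of 0 MHz, as in the first example above, would drag an all-engines configuration down to nothing, and the sketch disables it instead.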

Increase of the over-all hashing rate:
First, I think there is no easy way to improve the over-all hashing rate by something like ~50-100%, as was possible with the first Jalas.
My 60GH singles are already nearly at the limit, and the same should be true of the 30GH singles. However, I was able to increase the over-all hashing rate of my 60GH units from ~60GH/s to about 63 GH/s with modified firmware, and I think there is some additional room left.
The 25/50GH singles have the same number of chips mounted as the 30/60GH versions. Assuming the quality of the chips used is not that much poorer, there should be more room to tune them into the 30/60GH range. I could even imagine that they use a modified version of the firmware which artificially limits the over-all hashrate.

I will update this post when I have new things to report or get feedback or information from others.