
Topic: Process-invariant hardware metric: hash-meters per second (η-factor) (Read 24981 times)

full member
Activity: 168
Merit: 100
HashFast Community Liaison
Hello eldentyrell,

Please update your nifty list with HashFast's confirmed info.

We're using 4 9x9mm chips (18x18mm total) to produce 500GH+.   Cool

What are the die size, package type, core voltage, transistor count, operating frequency, and projected TDP of the chip? 

The package is a BGA, containing a multi-chip module. There are 4 dies, each 9mm x 9mm, spaced out by 5mm. The core voltage and frequency vary according to the cooling available to the chip. The chip contains a temperature sensor on die, and increases or decreases the operating voltage and frequency to maintain a target operating temperature at the die. This allows the maximum possible performance to be achieved, given the cooling that is available. In a colder environment the chip will operate at a slightly higher voltage and frequency, and return a higher hash rate than in a warm environment. Simulation runs show that the best silicon will have a TDP of 250W when operating at the name plate (nominal) 400GH/s. Worst case silicon will consume a few % more power to reach this nominal 400GH/s. Note - simulation results can be out by +/- 20%, although they typically come in high (expect lower numbers in real silicon).
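For readers who want to see the temperature-tracking idea in action, here is a rough toy sketch in Python. It is not HashFast's actual control loop. The target temperature, step size, frequency limits, voltage/frequency relationship and the thermal model (all of `volts_for`, `die_temp_c`, `governor_step`, `settle` and their constants) are made-up assumptions chosen only so the behaviour is visible.

```python
# Toy model of a temperature-tracking governor. Every constant here is a
# hypothetical placeholder, not a HashFast specification.

def volts_for(freq_mhz):
    """Assumed linear voltage/frequency relationship (hypothetical)."""
    return 0.80 + (freq_mhz - 500.0) * 0.0005

def die_temp_c(ambient_c, freq_mhz, r_th=0.2, c_eff=0.7):
    """Die temperature = ambient + thermal resistance * power, with power ~ V^2 * f."""
    power_w = c_eff * volts_for(freq_mhz) ** 2 * freq_mhz
    return ambient_c + r_th * power_w

def governor_step(freq_mhz, temp_c, target_c=85.0, step_mhz=5.0,
                  f_min=300.0, f_max=700.0):
    """Step the clock down when too hot, up when there is headroom."""
    if temp_c > target_c:
        freq_mhz -= step_mhz
    elif temp_c < target_c:
        freq_mhz += step_mhz
    return min(f_max, max(f_min, freq_mhz))

def settle(ambient_c, steps=200):
    """Run the governor until it hovers around the target temperature."""
    freq = 500.0
    for _ in range(steps):
        freq = governor_step(freq, die_temp_c(ambient_c, freq))
    return freq

cold = settle(15.0)   # colder room
warm = settle(35.0)   # warmer room
print(cold, warm)
```

With these made-up numbers the colder room settles at a noticeably higher clock than the warm one, which is the qualitative behaviour described above.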


I'd like to write today about a small piece of why we are confident our product is better than KnC's.

So today's topic: Our silicon design is superior.

Both are 28nm designs, but HashFast's is far more powerful and energy-efficient.

Let's look at KnC's 28nm ASIC, and some basic details as we can pull from their documentation. https://www.kncminer.com/news/news-25

First let's calculate the hash rate per square millimeter of silicon. This is a measure of the efficiency of the design.

Honestly, we don't need much to estimate this. The lid size for their chip is enough to make some good estimates.
 
KnC's diagram shows their chip has a 41.2mm lid, and implies that the silicon under that lid may be between 30mm x 30mm, and 36mm x 36mm. (The additional space is needed for decoupling capacitors and such.)
    Let's use those two numbers as bounds for the size of the silicon under the lid. If the die(s) take up just 30x30mm of the space under the lid, then:
     30x30mm = 900mm^2
     100 GHash / 900 mm^2 = 0.11 GHash/mm^2

    Or if the die takes up a bit more of the space under the lid,
      36x36mm = 1296mm^2
      100 GHash / 1296mm^2 = 0.077 GHash/mm^2

HashFast's Golden Nonce chip: I don't have to estimate the size because I work at HashFast. Smiley
   One 18x18mm die is able to do 400 GHash (nominal - more overclocked**)
   Hashing per square mm:
      18x18mm = 324mm^2
      400 GHash / 324mm^2 = 1.23 GHash/mm^2
   
Let's compare those numbers, for the high and low values for KnC's chip:

      1.23 / 0.11 = 11.2
      1.23 / 0.077 = 16

So HashFast's chip is between 11 and 16 times more efficient, in hashing per square mm, than KnC's chip.
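The arithmetic above can be re-run with a few lines of Python. This is just a sketch using the figures quoted in the post (100 GHash on an estimated 30x30mm or 36x36mm of silicon for KnC, 400 GHash on 18x18mm for HashFast); the function and variable names are mine. Carrying full precision, the lower bound comes out nearer 11.1 than 11.2, because the post divides already-rounded values.

```python
def ghash_per_mm2(ghash, side_mm):
    """Hash rate per unit of silicon area for a square die (or die cluster)."""
    return ghash / (side_mm * side_mm)

# Figures quoted in the post; the KnC die sizes are the post's own estimates.
knc_small_die = ghash_per_mm2(100, 30)   # about 0.111 GHash/mm^2
knc_large_die = ghash_per_mm2(100, 36)   # about 0.077 GHash/mm^2
hashfast = ghash_per_mm2(400, 18)        # about 1.23 GHash/mm^2

ratio_low = hashfast / knc_small_die     # about 11.1
ratio_high = hashfast / knc_large_die    # 16.0

print(round(ratio_low, 1), round(ratio_high, 1))
```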

This has an impact on how fast we can deliver units to customers. One wafer of HashFast's chips has the same capacity as 11 to 16 wafers of KnC's. For each silicon wafer delivered by the foundry, KnC will be able to satisfy 11 to 16 times fewer customers than HashFast will be able to. You'll get your units faster once production starts from us.

In addition, the HashFast chip operates much more efficiently. You get four times the hash rate for the same amount of power (250W). (Based on 250W for 100 GHash from KnC, and 250W for 400 GHash from HashFast.)
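The same kind of quick check works for the power claim. A sketch, using only the 250W / 100 GHash and 250W / 400 GHash figures given above (the names are mine, and the 250W figures are the post's simulation and estimate numbers, not measurements):

```python
def ghash_per_watt(ghash, watts):
    """Hash rate delivered per watt of power drawn."""
    return ghash / watts

knc_eff = ghash_per_watt(100, 250)        # 0.4 GHash/W, i.e. 2.5 W per GHash
hashfast_eff = ghash_per_watt(400, 250)   # 1.6 GHash/W, i.e. 0.625 W per GHash

print(hashfast_eff / knc_eff)
```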

Calculations such as this are a small part of why we are confident that we are delivering a quality product to our customers.

We figure it's time to start sharing.

Amy Woodward

VP Engineering
HashFast

** P.S. Simon made me put in the line about overclocking. But as per the 'warranty' thread, no one would ever do that to our beautiful chips, right? Wink



Quote
https://hashfast.com/hashfast-announces-fastest-bitcoin-mining-chip-in-the-world/


HashFast Announces Fastest Bitcoin Mining Chip in the World!

    Posted on December 13, 2013
    by Janielle Denier   
    in Baby Jet, Blog, Development, Golden Nonce ASIC, News, Rig Assembly   

HALF A TERRAHASH/s (500GH/s) on a single chip.

HashFast ASIC Golden Nonce- Half a Terrahash Complete bitcoin mining system as found in the BabyJet

This result was achieved during the bringup process of HashFast’s GN chip and module. The engineering team are progressively testing the system, and have not yet reached the full speeds the system was designed for. We expect to see better results over the next few days.
This milestone represents a breakthrough in Bitcoin mining technology and is several times faster than existing Bitcoin mining chips. We are starting volume production of mining systems now.
Stay tuned for further speed results.
donator
Activity: 1218
Merit: 1079
Gerald Davis
It has only been announced, but here are some details on the HashFast chip:
https://bitcointalksearch.org/topic/m.2975145
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Well, the problem is that you can have hand-routed designs and other optimizations that only apply at certain feature sizes. Most analogue design work won't carry directly over in a die shrink.

Very true.

But if process migration is a high priority from day one, it is possible to make easily-scalable layout.  As in, migrating the layout takes ~10% of the time required to create the first version.  But you have to design the original with this capability in mind, and most people don't do that.
full member
Activity: 238
Merit: 100
(like using RISC internally and translating the instructions to it)

Hrm, doesn't that mean that RISC won out in the end, even though the companies that first introduced it didn't?  So from the standpoint of a product manager deciding whether to fund a RISC project or a CISC project, the right thing to do was to listen to the debates.

Well, I guess RISC won out in a way when you look at ARM dominating the cellphone market.  However, what I meant was that it was the instruction set that programmers/compilers used, as opposed to the chip designs themselves, that didn't matter.

All that matters is the real-world performance.

To the end-user, and to the blockchain, of course!

But if you're the company that has money to spend and are trying to figure out which design to spend money on, you want something like the η-factor.  Just like how the project managers at Intel listened carefully to the RISC-CISC debate and made the right decision in the end even though, at the time they made that decision, RISC chips were overpriced and performing poorly.

Well, the problem is that you can have hand-routed designs and other optimizations that only apply at certain feature sizes. Most analogue design work won't carry directly over in a die shrink.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
(like using RISC internally and translating the instructions to it)

Hrm, doesn't that mean that RISC won out in the end, even though the companies that first introduced it didn't?  So from the standpoint of a product manager deciding whether to fund a RISC project or a CISC project, the right thing to do was to listen to the debates.


All that matters is the real-world performance.

To the end-user, and to the blockchain, of course!

But if you're the company that has money to spend and are trying to figure out which design to spend money on, you want something like the η-factor.  Just like how the project managers at Intel listened carefully to the RISC-CISC debate and made the right decision in the end even though, at the time they made that decision, RISC chips were overpriced and performing poorly.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I don't know— miners seem like a pretty ideal target for exotic process stuff.

.. if you don't care about cost.

The semiconductor business is insanely capital-intensive; any time you stray even a tiny bit from whatever manufacturing process everybody else is doing, your fixed costs skyrocket and you'd better have the volume to make up for that.  Just look at gallium arsenide, the second-most-popular semiconductor after silicon.  Their wafer prices are insanely high and their production fabs are still at only 180nm (last time I checked, at least); unless you need insane radiation tolerance or microwave RF it simply isn't worth the cost.  Same goes for non-optical wavelength lithography -- it won't be cost-effective until it's the only option left and everybody has to "jump off the cliff" at the same time.  In a certain sense that's what ITRS is all about -- it's a cliff-jumping synchronization mechanism. Smiley
legendary
Activity: 1600
Merit: 1014
Sorry for not reading all the pages so far...

  • Nobody knows the figures for ASICMiner?
  • Doesn't a Bitfury chip make 2.7GH/s?
full member
Activity: 238
Merit: 100
I'm starting to think that a process-invariant metric of power efficiency isn't possible -- at least not one that can be determined by testing (i.e. without the circuit schematics and layout parasitics, neither of which any vendor is ever going to release).

yeah, that's what I was trying to say Wink

This might work for FPGA designs, but not true ASICs.

A crappy layout done with a better process, or even better packaging that allows for more heat transfer, might do better than a great design done with a crappy process or one that has some flaw that makes it run hot.  A company might spend its R&D money improving the yield, figuring out the best thermodynamics, etc.

Think about it this way: lots of people used to rag about how x86 was an inferior instruction set compared to RISC designs, but x86 always ended up having better performance in the end because Intel and AMD competed with each other, and had a lot more money to throw at working around the 'problems' with x86 (like using RISC internally and translating the instructions to it).

The purest way of doing things, the designs that are closest to perfection, don't always win out in the end. All that matters is the real-world performance.
staff
Activity: 4172
Merit: 8419
Yes, you have a good point there; I suppose it ought to be "ITRS process node invariant" or "bulk-CMOS process node invariant".  If people start selling mining chips based on exotic non-bulk-CMOS technology I will definitely add a disclaimer, but I'm not sure that's going to happen.
I don't know— miners seem like a pretty ideal target for exotic process stuff. The circuit is simple and regular enough that you don't have a ton of other distractions or risk points and it's a superb commodity: build a more power efficient miner and business should flock to your door, unlike a lot of other things which have huge IPR and market force effects that can keep a technically good product from being a success. The simplicity of it also means that your improvement won't be diluted by a bunch of idle circuitry that doesn't get the benefit.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I agree with you on power, but disagree with you on what "process invariant" means. The way I understand you, you use "process invariant" as "process feature-size invariant", but you disregard the number and the composition of layers.

Yes, you have a good point there; I suppose it ought to be "ITRS process node invariant" or "bulk-CMOS process node invariant".  If people start selling mining chips based on exotic non-bulk-CMOS technology I will definitely add a disclaimer, but I'm not sure that's going to happen.


Bitfury used some cheap

Hey, at 55nm nothing's "cheap" Smiley


digital-only process

Nowadays the "analog" process at all the major foundries is the same as the digital process, you just get more processing steps (MiMcaps, inductor metal, deep n-well, native fet, etc).


and laid out the bypass capacitors beside the flip-flops.

Well, the bypass caps are filler… they're tiny and they don't need any signals routed to/from them, so any unused space in your layout can be (and usually is) devoted to bypass cells.  If 20% of the area in his layout was bypass caps I really doubt he could fit 20% more hashers if he left them out.


Hypothetically he could have used some more expensive mixed-signal process which provides for much thicker metal layers and high-k dielectric between the metal layers (not for the gates).

An even more extreme example: measurements taken from an SOI chip would unfairly have a higher η-factor than the same design on a bulk CMOS process with the same feature size.  In that case "more money has been thrown" at the product in the form of more-expensive SOI wafers.


Then he could've laid the bypass capacitors on top of the flip-flops.

Hrm, I've never heard of people using the MiMcaps for decoupling.  Interesting idea…  Although you usually have to use one or two of your thick-metal layers in order to make a MiMcap, so the metal used there would be taken away from the best layers of the power distribution grid.  I'm not sure if that's a net win.
legendary
Activity: 2128
Merit: 1065
Er, decoupling capacitance (what you describe) doesn't affect power consumption -- at least not anywhere near as much as parasitic capacitance does.  Decoupling capacitance smooths out spikes in the supply, but it doesn't increase power consumption (except for leakage).
I agree with you on power, but disagree with you on what "process invariant" means. The way I understand you, you use "process invariant" as "process feature-size invariant", but you disregard the number and the composition of layers.

Bitfury used some cheap digital-only process and laid out the bypass capacitors beside the flip-flops. Hypothetically he could have used some more expensive mixed-signal process which provides for much thicker metal layers and high-k dielectric between the metal layers (not for the gates). Then he could've laid the bypass capacitors on top of the flip-flops.

All in all, I think a good definition of "process invariant" metrics is still an open research question. E.g., how are we going to account for future Intel technologies that have on-chip buck voltage regulators including planarized magnetics?
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
But the biggest problem by far is that parasitic capacitance scales in really funny ways across process nodes and even between fabs.
Also, the parasitic capacitance may not be entirely parasitic. Check out bitfury's post where he describes how he used 1/4 of the chip area to place bypass capacitors close to the sources of the current spikes:

Er, decoupling capacitance (what you describe) doesn't affect power consumption -- at least not anywhere near as much as parasitic capacitance does.  Decoupling capacitance smooths out spikes in the supply, but it doesn't increase power consumption (except for leakage).

The industry's been doing this for a while now.. the Alpha 21264 had a crazy 320nF of on-chip decoupling capacitance.  I think that was also the one that had two solid sheets (not grids!) of metal for power+ground (EDIT: no, that was the 21164).


unfortunately 26%-28% of DIE AREA is just capacitors Sad not transistors... not logic... that's big sacrifice and it won't be stable especially in low voltage without that... capacitors placed near flip-flops;

Ah yes, one of the many downsides of synchronous chips.  They're so fragile when it comes to power supplies...
legendary
Activity: 2128
Merit: 1065
But the biggest problem by far is that parasitic capacitance scales in really funny ways across process nodes and even between fabs.
Also, the parasitic capacitance may not be entirely parasitic. Check out bitfury's post where he describes how he used 1/4 of the chip area to place bypass capacitors close to the sources of the current spikes:
unfortunately 26%-28% of DIE AREA is just capacitors Sad not transistors... not logic... that's big sacrifice and it won't be stable especially in low voltage without that... capacitors placed near flip-flops;
After I read the above I started thinking about using the Miller effect with additional large transistors and an additional higher supply voltage to multiply the filtering capacitance. I don't think that's feasible without the additional steps to produce thicker gate oxide than the one used in the normal logic transistors.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
This η factor calculation doesn't take power consumption into consideration. It is simply a measure of the efficiency of the silicon design itself. (still very valuable IMHO)

I'm starting to think that a process-invariant metric of power efficiency isn't possible -- at least not one that can be determined by testing (i.e. without the circuit schematics and layout parasitics, neither of which any vendor is ever going to release).

Bitcoin mining chips' power consumption is mostly dynamic power; there's no reason for a mining chip to have any idle circuitry.  Dynamic power is determined by voltage, activity factor, and capacitance.  Although voltage can be observed, figuring out the mix of activity factor vs. capacitance isn't really possible.  At the very least you'd have to know the circuit style (bang-bang-CMOS, Domino, or MCML, for example).
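To make the "can't separate activity factor from capacitance" point concrete, here is a small Python sketch built on the textbook dynamic-power relation P = a * C * V^2 * f. The two example designs and all their numbers are invented for illustration; nothing here describes a real chip.

```python
def dynamic_power_w(activity, cap_farads, volts, freq_hz):
    """Textbook CMOS dynamic power: activity factor * switched capacitance * V^2 * f."""
    return activity * cap_farads * volts ** 2 * freq_hz

def observable_switched_cap(power_w, volts, freq_hz):
    """All an outside tester can recover: the product activity * C, not either factor."""
    return power_w / (volts ** 2 * freq_hz)

# Two hypothetical designs: one switches often with little capacitance,
# the other switches half as often with twice the capacitance.
design_a = dynamic_power_w(activity=0.50, cap_farads=100e-9, volts=1.0, freq_hz=300e6)
design_b = dynamic_power_w(activity=0.25, cap_farads=200e-9, volts=1.0, freq_hz=300e6)

print(design_a, design_b)   # both come out at about 15 W
print(observable_switched_cap(design_a, 1.0, 300e6))
```

From the outside (voltage, clock and wall power) the two designs are indistinguishable, even though they would respond differently to a process change that alters capacitance.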

But the biggest problem by far is that parasitic capacitance scales in really funny ways across process nodes and even between fabs.  The ratios between gate capacitance, sidewall capacitance, and gate-to-source/drain capacitance all change in unpredictable ways across generations.  Pretty much the only thing that scales predictably is parasitic capacitance due to metal routing, but in a well-routed mining chip this is a very small component of the overall parasitic capacitance (except maybe the sigma-function rotation wires).  The vast bulk of your power ought to be going towards charging and discharging diffusion+gate capacitance.

So I don't really think we'll ever be able to independently estimate how well a design's power efficiency will scale across generations of fabrication processes.
full member
Activity: 154
Merit: 100
Maybe it's just me, but when you tell me Bitfury has a 2800 score and KNC a score of 90, that really seems odd. Especially considering KNC's gigahash/watt is better than Bitfury's or BFL's. It really makes me question the relevance of this metric to me. Are you saying KNC, or someone, if they had access to KNC's design could replace it with a design that's 30 times more efficient? Are we saying KNC's design is basically one giant fuckup? Doesn't seem to make sense or accord with known facts.

I'm gonna assume that we simply just don't have enough technical details to make a determination and that's why KNC still hasn't been added to the OP list.

I disagree. Bitfury's power consumption is actually better than KNC's current published specs.
This η factor calculation doesn't take power consumption into consideration. It is simply a measure of the efficiency of the silicon design itself. (still very valuable IMHO)

John
HashFast
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
If Avalon are willing to stake their reputation on a public claim that their product provides 450mh/s

Here's a quote from Bitsyncom referencing the 450Mhz figure:

the number you are all aiming for is 450

I wouldn't call "the number you are all aiming for" staking their reputation... or even a claim.  Vague comment is vague.

Let me put it another way: is it their stated policy to replace any customer chips which won't go above 300mhz as part of the warranty?  If I bought 1,000 chips from them and tried to return the ones that wouldn't go above 300mhz would they take them back promptly for a full refund?  Has someone tried this?  These are the sorts of things to look for.  There will always be significant chip-to-chip variation; since you can't test their unsorted wafers yourself you have to rely on their public statements and returns policy (i.e. staking of reputation) to figure out what counts as "typical".
full member
Activity: 238
Merit: 100
I mean, Avalon chips ship at 300Mhz, but are known to run at 450 and theoretically even more if it were only a transistor transition time.

I don't list theoretical results.

If Avalon are willing to stake their reputation on a public claim that their product provides 450mh/s with heroic cooling, and a third party verifies that using a few randomly chosen chips, I'll list them at 450mh/s.

I said 450Mhz, not 450MHash/s, but that would be about 439MHash, maybe.

Here's a quote from Bitsyncom referencing the 450Mhz figure:

was wondering how long it'd take people to notice ( and more importantly share the constant that we've released on github.)

the number you are all aiming for is 450 Tongue of course, that's not really possible on just air cooling.

The 300Mhz the unit shipped with was based on the PSU and cooling, not the chip.
legendary
Activity: 3878
Merit: 1193
If Avalon are willing to stake their reputation on a public claim that their product provides 450mh/s with heroic cooling, and a third party verifies that using a few randomly chosen chips, I'll list them at 450mh/s.

The firmware they ship has a setting for 300 mh/s, so it's safe to include that speed in the chart.

Most miners are using a custom firmware that autoclocks. 350 mh/s is typically about where the autotuning settles for Avalon-supplied boards. I would consider that overclocking, and not appropriate to include in the chart.

To get the chips to go to 450, custom boards need to be used.

And I have some numbers to go with those from yesterday:
Slightly different air cooling setup, therefore different temperatures with air cooling. (fan placement)
TL;DR : 450Mhz [9Ghash/s] - STABLE
But at the cost of 94Watts of power.

Air:
431 - 54, 48, 1.30V, 87W, stable
450 - 56, 48, 1.30V, 90W, HW Errors
450 - 57, 52, 1.34V, 94W, slightly increased error rate compared to what i normally call "stable" but close enough

Water:
450 - 54, 32, 1.34V, 94W, slightly fewer hw errors than with air
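As a rough sanity check on those readings, one can ask whether the power figures follow the usual "power goes with V^2 times f" rule of thumb for dynamic power. The Python sketch below assumes the first number in each row is the clock in MHz and that the V and W figures are core voltage and total power; that reading of the columns is my assumption, and the rule ignores leakage and regulator losses.

```python
# (clock MHz, volts, watts) taken from the first air-cooled row above.
# The interpretation of the columns is an assumption.
BASELINE = (431.0, 1.30, 87.0)

def predict_watts(freq_mhz, volts, base=BASELINE):
    """Scale the baseline power by f and V^2 (dynamic power only, no leakage)."""
    f0, v0, p0 = base
    return p0 * (freq_mhz / f0) * (volts / v0) ** 2

print(round(predict_watts(450.0, 1.30), 1))   # reported: 90W
print(round(predict_watts(450.0, 1.34), 1))   # reported: 94W
```

The prediction lands within a watt of the reported 90W at 1.30V and a couple of watts above the reported 94W at 1.34V, so the readings are at least roughly in line with simple V^2 * f scaling.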
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Why not graph it along a set range and then use the area of that graph as the metric.

I think there's important information in the curve that isn't captured in any sort of scalar summary of it.

For example, somebody who's renting space in Douglas County cares more about the eta-factor at very power-inefficient points on the curve, while people mining at home as a hobby (are there any left?) in, say, California, care more about the eta-factor at the very power-efficient point on the curve.

Others (like me) care about the steepness and width of the curve since it's a form of insurance against future difficulty increases, which are impossible to estimate to the sort of accuracy needed for major investments.  Burn more power today to get the equipment paid off as quickly as possible, burn less power later on to keep running for as long as possible.  I'll probably be undervolting my Spartan-6 mine (which I'm amazed is still profitable) soon in order to squeeze out an extra month or two before it becomes unable to pay for its own electricity.
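The "burn more power today, burn less later" strategy can be sketched as picking the most profitable point on a device's power/hash-rate curve as mining revenue falls. In the Python below the curve, the revenue-per-GH/s figures and the electricity price are all invented placeholders, not measurements of any real hardware.

```python
# Hypothetical (watts, GH/s) operating points for one device,
# ordered from heavily undervolted to overclocked.
CURVE = [(40.0, 0.60), (60.0, 0.80), (90.0, 1.00), (130.0, 1.15)]

def daily_profit(watts, ghash, usd_per_ghash_day, usd_per_kwh):
    """Mining revenue minus electricity cost for one day at one operating point."""
    revenue = ghash * usd_per_ghash_day
    electricity = watts * 24.0 / 1000.0 * usd_per_kwh
    return revenue - electricity

def best_point(usd_per_ghash_day, usd_per_kwh, curve=CURVE):
    """Return the most profitable (watts, GH/s) point, or None if every point loses money."""
    scored = [(daily_profit(w, g, usd_per_ghash_day, usd_per_kwh), w, g)
              for w, g in curve]
    profit, watts, ghash = max(scored)
    return (watts, ghash) if profit > 0 else None

print(best_point(2.0, 0.10))   # high revenue: run flat out
print(best_point(0.2, 0.10))   # after difficulty rises: undervolt
print(best_point(0.1, 0.10))   # can no longer pay for its own electricity
```

A wide, steep curve gives more of these fallback points, which is why a single scalar summary of the curve throws away the information that matters for this kind of planning.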
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I don't really think this is actually process invariant if the limiting factor is thermal, rather than purely a signal propagation delay.

Thermals are a power issue.  I've been pretty clear and up-front about the fact that this metric does not account for power consumption in any way.


I mean, Avalon chips ship at 300Mhz, but are known to run at 450 and theoretically even more if it were only a transistor transition time.

I don't list theoretical results.

If Avalon are willing to stake their reputation on a public claim that their product provides 450mh/s with heroic cooling, and a third party verifies that using a few randomly chosen chips, I'll list them at 450mh/s.