
Topic: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2) - page 32. (Read 146909 times)

hero member
Activity: 518
Merit: 500
I think all of you are so concerned with BFL technology because you are located in China and wanna counterfeit and replicate their design to sell for cheaper Tongue

If they got such a great deal on 65nm FPGA why not sell it straight away at normal market price ?
hero member
Activity: 686
Merit: 564
I don't think there are any FPGAs that run at 600 MHz.  More likely they are using a "larger" chip.  Spartan 6-150 is used because it takes ~150K LUTs to fit a completely unrolled double Bitcoin hash logic.  Thus 1 hash per clock running at 200 MHz = 200 MH/s.

If their FPGAs each have enough LUTs to fit 2 complete unrolled hashers, then the board would do 4 hashes per clock.  4 hashes per clock at 200 MHz = 800 MH/s.
Almost, except that Spartan-6 LUTs aren't equivalent to the LUTs used in other FPGAs - about half of them are useless for Bitcoin mining because they don't have any adders. Expect somewhere in the ballpark of half that many LUTs for a completely unrolled miner on a more suitable FPGA.
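As a rough sketch of the arithmetic in the exchange above (not anyone's actual design): if each fully unrolled hasher completes one double hash per clock, the hash rate is simply the number of hashers times the clock. The function name and the "two chips with two hashers each" layout are assumptions for illustration only.

```python
def hash_rate_mhs(hashers: int, clock_mhz: float) -> float:
    """Hash rate in MH/s, assuming one double hash per hasher per clock."""
    return hashers * clock_mhz


# One unrolled hasher on a single Spartan 6-150 at 200 MHz.
single = hash_rate_mhs(hashers=1, clock_mhz=200)

# Hypothetical board: 2 chips x 2 hashers each = 4 hashes per clock.
board = hash_rate_mhs(hashers=4, clock_mhz=200)

print(single, board)
```

Getting close to 1 GH/s this way would need either more hashers per board or a clock above 200 MHz.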

Sure, but first they need to ship. If they have collected 1000 orders by the day they ship, and they have only released pictures with sanded-off ICs, they have achieved their delay. If they'd put the pictures online for everybody to see what kind of units they are using, their competition could go to work right away.
That's not the main obstacle to competition though - the main obstacle seems to be how on earth to get the FPGAs at a low enough price.
rph
full member
Activity: 176
Merit: 100
Yes, it's a compelling idea. FPGA prices rise super-linearly, i.e. an FPGA with twice the gates typically costs more than twice the dollar amount.

That is true on the very high-end (Virtex7), but not true for low-cost/high-yield parts like spartan6. 6s150 has the lowest cost per LUT in the family.

-rph
sr. member
Activity: 448
Merit: 250
Could it somehow be that they're using the pcb as an interconnect and using the 2 fpgas as one? So they only need to implement 1/2 the sha circuitry on each?

Unlikely because a single hash is trivially easy.  Splitting work between two "nodes" only makes sense if one node can't handle it in a timely manner.  It is never more efficient; there is always intra-node overhead.  So building dependent parallel solutions is something you do when you have no choice.  A supercomputer for example can never be built with a single petaflop chip.  Thus you have no choice but to accept the overhead and build it with 1000 teraflop chips.  If a single petaflop chip existed you would just use that because it would be more efficient.
Well, from what I understand, the issue is that the intra-chip routing takes a lot of the FPGA circuitry away from what could be used to hash. So if you could somehow use the PCB to route, you would kind of have a ghetto sASIC with 2 logic units (the FPGAs) and the interconnect (PCB).  Also, if the cross-chip communication (through the PCB) were pipelined there should be no overhead from the cross-chip communication, correct?

yes and no.

Yes, it's a compelling idea. FPGA prices rise super-linearly, i.e. an FPGA with twice the gates typically costs more than twice the dollar amount.
No, it's not a practical idea because SHA-256 means 256 bits wide, so you need to route 256 signals from FPGA 1 to FPGA 2. While this is possible, think 10-layer PCB. Think $$$.
member
Activity: 79
Merit: 10
Could it somehow be that they're using the pcb as an interconnect and using the 2 fpgas as one? So they only need to implement 1/2 the sha circuitry on each?

Unlikely because a single hash is trivially easy.  Splitting work between two "nodes" only makes sense if one node can't handle it in a timely manner.  It is never more efficient; there is always intra-node overhead.  So building dependent parallel solutions is something you do when you have no choice.  A supercomputer for example can never be built with a single petaflop chip.  Thus you have no choice but to accept the overhead and build it with 1000 teraflop chips.  If a single petaflop chip existed you would just use that because it would be more efficient.
Well, from what I understand, the issue is that the intra-chip routing takes a lot of the FPGA circuitry away from what could be used to hash. So if you could somehow use the PCB to route, you would kind of have a ghetto sASIC with 2 logic units (the FPGAs) and the interconnect (PCB).  Also, if the cross-chip communication (through the PCB) were pipelined there should be no overhead from the cross-chip communication, correct?
donator
Activity: 1218
Merit: 1079
Gerald Davis
It is never more efficient there is always intra-node overhead. 

Unless of course you are talking about BFL's rig box, of course. Tongue

Yeah.  Wonder when they will revise those stats down?
sr. member
Activity: 448
Merit: 250
It is never more efficient there is always intra-node overhead.  

Unless of course you are talking about BFL's rig box. Tongue

e: too stoned, removed extra 'of course'
donator
Activity: 1218
Merit: 1079
Gerald Davis
Could it somehow be that they're using the pcb as an interconnect and using the 2 fpgas as one? So they only need to implement 1/2 the sha circuitry on each?

Unlikely because a single hash is trivially easy.  Splitting work between two "nodes" only makes sense if one node can't handle it in a timely manner.  It is never more efficient; there is always intra-node overhead.  So building dependent parallel solutions is something you do when you have no choice.  A supercomputer for example can never be built with a single petaflop chip.  Thus you have no choice but to accept the overhead and build it with 1000 teraflop chips.  If a single petaflop chip existed you would just use that because it would be more efficient.
member
Activity: 79
Merit: 10
Could it somehow be that they're using the pcb as an interconnect and using the 2 fpgas as one? So they only need to implement 1/2 the sha circuitry on each?
donator
Activity: 1218
Merit: 1079
Gerald Davis
Thank you for taking the time to lay that out like that. It makes perfect sense.
On that HardCopy cost I still wanna believe Altera was offering some kind of sASIC production that allowed for much smaller batches at a reduced cost.  Which, if so kinda leaves me a bit lost on just how that would work for older gens. If I'm following the manu correctly. Once they move on to a new gen with whoever their main manu is, where would they have builds for older gens done at?

I don't know but they may have some fab capacity at older gen or they may just be fabless (AMD has no fabs anymore) and just pay a foundry to do the fabrication.

Nvidia (and now AMD) do that.  They are intellectual property companies with no manufacturing assets.   Sometimes that can bite you in the ass.  Just ask AMD and the 7970 missing Christmas delivery because their partner couldn't get 28nm working in time.  Smiley
hero member
Activity: 504
Merit: 500
Someone who knows more than I about the process would have to expand on it a bit.

One of the largest advantages of structured ASICs is reduced power consumption.  The increased NRE cost eats into any per unit cost savings, especially for "small" (10K unit) runs.  Even on larger runs (25K, 50K, 100K units) the per unit cost is lower but it isn't anything like 70% lower than an equivalent FPGA.

*a very easy to follow explanation of the differences between fpga, sASIC and ASIC was here*
Thank you for taking the time to lay that out like that. It makes perfect sense.
On that HardCopy cost I still wanna believe Altera was offering some kind of sASIC production that allowed for much smaller batches at a reduced cost.  Which, if so, kinda leaves me a bit lost on just how that would work for older gens. If I'm following the manu correctly. Once they move on to a new gen with whoever their main manu is, where would they have builds for older gens done at?

Bah, I need to go back to school.  Lips sealed  I managed to squeeze in 2 years before I was married. Since though, I've been lucky if I could get in 4-8 cred hours a year on the next 2 year piece of paper.. >.<
donator
Activity: 1218
Merit: 1079
Gerald Davis
Someone who knows more than I about the process would have to expand on it a bit.

One of the largest advantages of structured ASICs is reduced power consumption.  The increased NRE cost eats into any per unit cost savings, especially for "small" (10K unit) runs.  Even on larger runs (25K, 50K, 100K units) the per unit cost is lower but it isn't anything like 70% lower than an equivalent FPGA.


sASIC - 10K unit run
------------------------
* significantly reduced power consumption
* not a significant reduction in overall cost

Last Gen FPGA
------------------------
* will consume significantly more power than an equivalent 40nm FPGA.
* could be sold at clearance to eliminate inventory (65nm is now 2 generations old)

BFL "mystery" chip
-------------------------
* significantly INCREASED power consumption
* significantly reduced cost (relative to current gen FPGA)

Huh

Quote
But, the hardcopy process has several options that can reduce costs. I briefly read up on one of the offerings that did not include some 'screen layers' (term?) or some such, and it was quite a bit cheaper than a full custom hardcopy.

All hardcopy (and all sASICS) are "custom".

FPGA
- fixed logic units
- fixed interconnects

sASIC
- fixed logic units
- CUSTOM interconnects

ASIC
- CUSTOM logic units
- CUSTOM interconnects

FPGAs "waste" a lot of transistors.  They form an interconnect mesh between nearby LUTs (Logic Units).  This mesh is what the ISE software uses to build the routing and gives FPGAs their flexibility.  However that flexibility has a cost.  Every transistor you didn't use (and you may only use 10% of potential interconnects) costs you money and power.

An sASIC uses the same fixed grid of LUTs.  This allows a design that works on an FPGA to also work on an equivalent sASIC.  However there is no programmable routing.  All the LUTs are "islands".  The fab makes a mask of YOUR individual custom routing and creates a routing layer and then combines that with the standardized mass produced logic layer.   More complex routing requires more layers and that means more cost.
hero member
Activity: 504
Merit: 500
Based on BFLs own statement "The BitForce processor card is a proprietary implementation of both FPGA and ASIC technology", I'm almost certain what they use is what Altera calls "HardCopy" and what Xilinx calls "EasyPath", namely a FPGA design converted into an ASIC. Such a conversion costs "only" about 300 grand or so and pays for itself once you sell, say, 5,000 ASICs (which, in BFL's case, translates to a mere 2,500 boxes, and, assuming an average of 2.5 boxes purchased per customer, into a mere 1,000 customers). (Disclaimer: I have pre-ordered four singles at this point, so maybe LESS than 1,000 individual customers suffice to make this profitable.)

Altera/Xilinx tend to give their HardCopy/EasyPath customers optimistic projections on the power consumption and maximum clock rate, which a HardCopy/EasyPath customer (BFL in this case) tends to believe (after all, it's Altera/Xilinx saying this) and pass on to their retail customers.

Which is exactly what happened! It's a fairly common mistake to make and not a big deal. Some people went all ape-shit over this here, but underestimating the power draw and overestimating the maximum clock rate is really a fairly common mistake.

Thus, based on the pictures that seem to show an Altera device, it's quite safe to assume that what we have here is an Altera HardCopy implementation of an Altera FPGA.

I doubt it.  I think the blend of ASIC & FPGA is just marketing double speak.  It has a USB controller which is an ASIC thus it does use "ASIC technology".

*a whole bunch of stuff guaranteed to make Panda's brain hurt was here*
I can't disagree on their terminology being mostly for advertising. I am very inclined to agree with Inspector 2211 though. Unless BFL came across a very, very good deal on their chips, it seems very likely they ordered a last gen 'HardCopy'.

Someone who knows more than I about the process would have to expand on it a bit. But, the hardcopy process has several options that can reduce costs. I briefly read up on one of the offerings that did not include some 'screen layers' (term?) or some such, and it was quite a bit cheaper than a full custom hardcopy.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Based on BFLs own statement "The BitForce processor card is a proprietary implementation of both FPGA and ASIC technology", I'm almost certain what they use is what Altera calls "HardCopy" and what Xilinx calls "EasyPath", namely a FPGA design converted into an ASIC. Such a conversion costs "only" about 300 grand or so and pays for itself once you sell, say, 5,000 ASICs (which, in BFL's case, translates to a mere 2,500 boxes, and, assuming an average of 2.5 boxes purchased per customer, into a mere 1,000 customers). (Disclaimer: I have pre-ordered four singles at this point, so maybe LESS than 1,000 individual customers suffice to make this profitable.)

Altera/Xilinx tend to give their HardCopy/EasyPath customers optimistic projections on the power consumption and maximum clock rate, which a HardCopy/EasyPath customer (BFL in this case) tends to believe (after all, it's Altera/Xilinx saying this) and pass on to their retail customers.

Which is exactly what happened! It's a fairly common mistake to make and not a big deal. Some people went all ape-shit over this here, but underestimating the power draw and overestimating the maximum clock rate is really a fairly common mistake.

Thus, based on the pictures that seem to show an Altera device, it's quite safe to assume that what we have here is an Altera HardCopy implementation of an Altera FPGA.

I doubt it.  I think the blend of ASIC & FPGA is just marketing double speak.  It has a USB controller which is an ASIC thus it does use "ASIC technology".

Power draw on an sASIC is about 1/3rd LESS than a comparable FPGA.  We have seen FPGA solutions getting 22 MH/W.  One would expect 30 to 40 MH/W from an sASIC.  The product as last tested was at ~10 MH/W, roughly half the performance (in MH/W) of a 40/45nm FPGA.

While power draw varies it doesn't vary that much.  The other thing that doesn't fit (as discussed in the original thread) is the lead time for an sASIC is 90 to 120 days.  

So timeline works something like this:

1 ) Build board using FPGA.
2 ) Test it (not simulations, an ACTUAL functional board), tweak it, test it, tweak it, test it, tweak it.
3 ) Run endurance tests, possibly get an outside party under NDA to perform some testing.
4 ) Once investors are satisfied the product is ready, have THAT FINAL DESIGN taped out for sASIC.
5 ) Wait up to 120 days for your test run (usually a fractional wafer).
6 ) Have assembly house build a "few" boards based on the test run.
7 ) Verify it is performing as specced.
8 ) OK the main million dollar+ production run and wait another 30 to 90 days.
9 ) Build production units based on production run sASIC.

That timeline doesn't fit the events at BFL.  Had it been an sASIC they would have had a 100% functional (except prohibitively expensive) prototype 6 months before the sASIC run was ever finished.  Likely a half dozen prototypes.

One final thing is BFL indicates the product can be used for other applications w/ a different "firmware" (bitstream?).  That isn't possible with an sASIC.  An sASIC Bitcoin miner wouldn't be useful for anything else.  Once masked out its function can never be changed.

I agree it is likely an Altera but IMHO the Occam's razor answer is that it is a 65nm FPGA.  Power draw is about double for 65nm vs 40nm and that is what we see.  40nm Spartan-6 gets 22 MH/W.  BFL mystery chip gets 10 MH/W.  Altera is rolling out 28nm tech and likely has a lot of old product to dump.  If BFL has industry connections they could scoop a "deal" you will never see advertised anywhere.
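To put numbers on the efficiency argument above, here is a minimal sketch that scales the 22 MH/W FPGA figure by the two power factors mentioned in the post (about 1/3 less power for an sASIC, about double for a 65nm FPGA) and compares each against the ~10 MH/W measurement. The function and variable names are made up for illustration, and the power factors are the post's rough estimates, not measured data.

```python
FPGA_40NM_MH_PER_W = 22.0      # figure quoted for current FPGA miners
BFL_MEASURED_MH_PER_W = 10.0   # figure quoted for the BFL unit as last tested


def scaled_efficiency(base_mh_per_w: float, power_factor: float) -> float:
    """Efficiency at the same hash rate when power is multiplied by power_factor."""
    return base_mh_per_w / power_factor


# sASIC hypothesis: about one third less power than the FPGA.
sasic_estimate = scaled_efficiency(FPGA_40NM_MH_PER_W, 2.0 / 3.0)

# 65nm FPGA hypothesis: about double the power of the 40nm part.
fpga_65nm_estimate = scaled_efficiency(FPGA_40NM_MH_PER_W, 2.0)

for name, estimate in [("sASIC", sasic_estimate), ("65nm FPGA", fpga_65nm_estimate)]:
    gap = abs(estimate - BFL_MEASURED_MH_PER_W)
    print(f"{name}: ~{estimate:.0f} MH/W, off by {gap:.0f} MH/W from measured")
```

Under these assumptions the 65nm FPGA guess lands within about 1 MH/W of the measurement, while the sASIC guess is off by more than 20 MH/W.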
sr. member
Activity: 266
Merit: 250
The king and the pawn go in the same box @ endgame
interesting. I think i should look more into that hard copy stuff.
sr. member
Activity: 448
Merit: 250
I know that, but I am thinking out on a limb. Perhaps they want to become the next Altera Tongue
Which is why I would like to know whether a purpose-built FPGA could be faster for encryption operations than a "general-purpose" FPGA.

I don't think you understand.

FPGAs are horribly, horribly inefficient compared to ASICs.  The reason for FPGAs is that ASICs have such a huge upfront cost that, despite FPGAs being utterly lackluster, they provide "good enough" performance (per $ and per Watt) compared to an ASIC.  So if you want the best performance nothing beats an ASIC, but say you only want 10,000 or 1,000 chips.  That multi-million dollar cost is now prohibitive.

FPGA give you flexibility of making the chip do anything you want but that flexibility comes at a steep price in terms of cost (in $ and Watts). 

A single purpose FPGA is an oxymoron.  It would be like making a hybrid vehicle which is gas inefficient.  Smiley

Nothing else comes even close to the performance:  
An 45nm ASIC SHA-256 processor would be in the ballpark of
$0.20 per MH and 50 MH/w.  (probably better if there was demand for 100K units per year).
Even keeping die size reasonable you could get 4 or 5 GH/s per chip.

Someday when AMD/Intel move on to smaller processes you could roughly double (slightly less) those specs by taking advantage of excess 32nm fab capacity.
Of course the multi-millions of dollar in capital, huge risk, and limited market means we likely won't see an SHA-256 ASIC any time soon.  
 

Based on BFLs own statement "The BitForce processor card is a proprietary implementation of both FPGA and ASIC technology", I'm almost certain what they use is what Altera calls "HardCopy" and what Xilinx calls "EasyPath", namely a FPGA design converted into an ASIC. Such a conversion costs "only" about 300 grand or so and pays for itself once you sell, say, 5,000 ASICs (which, in BFL's case, translates to a mere 2,500 boxes, and, assuming an average of 2.5 boxes purchased per customer, into a mere 1,000 customers). (Disclaimer: I have pre-ordered four singles at this point, so maybe LESS than 1,000 individual customers suffice to make this profitable.)

Altera/Xilinx tend to give their HardCopy/EasyPath customers optimistic projections on the power consumption and maximum clock rate, which a HardCopy/EasyPath customer (BFL in this case) tends to believe (after all, it's Altera/Xilinx saying this) and pass on to their retail customers.

Which is exactly what happened! It's a fairly common mistake to make and not a big deal. Some people went all ape-shit over this here, but underestimating the power draw and overestimating the maximum clock rate is really a fairly common mistake.

Thus, based on the pictures that seem to show an Altera device, it's quite safe to assume that what we have here is an Altera HardCopy implementation of an Altera FPGA.
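The break-even chain in the post above (300 grand, 5,000 ASICs, 2,500 boxes, 1,000 customers) can be checked with a few lines. This is only a sketch of that arithmetic: the two-chips-per-box figure is implied by the post's own numbers, and the "required saving per chip" is an inference from them, not something BFL or Altera has stated.

```python
NRE_COST_USD = 300_000        # quoted conversion cost ("about 300 grand or so")
BREAK_EVEN_ASICS = 5_000      # quoted break-even volume
CHIPS_PER_BOX = 2             # implied by 5,000 ASICs -> 2,500 boxes
BOXES_PER_CUSTOMER = 2.5      # the post's assumed average order size

boxes = BREAK_EVEN_ASICS / CHIPS_PER_BOX
customers = boxes / BOXES_PER_CUSTOMER

# Inference: to break even at that volume, each ASIC must recover this much
# of the one-time conversion cost compared with buying the FPGA instead.
required_saving_per_chip = NRE_COST_USD / BREAK_EVEN_ASICS

print(f"{boxes:.0f} boxes, {customers:.0f} customers, "
      f"${required_saving_per_chip:.0f} recovered per chip")
```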
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
I know that, but I am thinking out on a limb. Perhaps they want to become the next Altera Tongue
Which is why I would like to know whether a purpose-built FPGA could be faster for encryption operations than a "general-purpose" FPGA.

I don't think you understand.

FPGA are horribly horribly inefficient compared to ASICs.  The reason for FPGA is because ASICs have such a huge upfront cost that despite FPGA being utterly lackluster they provide "good enough" performance (per $ and per Watt) compared to an ASIC.

A single purpose FPGA is an oxymoron.  If you want to make a chip that has a specific purpose you make an ASIC.

Nothing else comes even close to the performance: 
An 45nm ASIC SHA-256 processor would be in the ballpark of
$0.10 per MH and 50 MH/w. 
Even keeping die size reasonable you could get 4 or 5 GH per chip.

Someday when AMD/Intel move on to smaller processes you could roughly double those specs by taking advantage of excess 32nm fab capacity.

Of course the multi-millions of dollar in capital, huge risk, and limited market means we likely won't see an SHA-256 ASIC any time soon. 
 
Yep, I've got that. I don't mean single purpose, I mean something that could do (for instance) SHA1, SHA256, SHA512, DES, 3DES, AES, and so forth ad infinitum, but NOT other things that FPGAs are commonly known for (such as video processing). It would remain configurable so that you could target the algorithm of your choice, but might be heavily optimized towards operations that are commonly used in encryption, instead of video.

Again, this seems far-fetched, but I wanted to be sure you were understanding my thought process here.
donator
Activity: 1218
Merit: 1079
Gerald Davis
I know that, but I am thinking out on a limb. Perhaps they want to become the next Altera Tongue
Which is why I would like to know whether a purpose-built FPGA could be faster for encryption operations than a "general-purpose" FPGA.

I don't think you understand.

FPGAs are horribly, horribly inefficient compared to ASICs.  The reason for FPGAs is that ASICs have such a huge upfront cost that, despite FPGAs being utterly lackluster, they provide "good enough" performance (per $ and per Watt) compared to an ASIC.  So if you want the best performance nothing beats an ASIC, but say you only want 10,000 or 1,000 chips.  That multi-million dollar cost is now prohibitive.

FPGA give you flexibility of making the chip do anything you want but that flexibility comes at a steep price in terms of cost (in $ and Watts). 

A single purpose FPGA is an oxymoron.  It would be like making a hybrid vehicle which is gas inefficient.  Smiley

Nothing else comes even close to the performance:  
An 45nm ASIC SHA-256 processor would be in the ballpark of
$0.20 per MH and 50 MH/w.  (probably better if there was demand for 100K units per year).
Even keeping die size reasonable you could get 4 or 5 GH/s per chip.

Someday when AMD/Intel move on to smaller processes you could roughly double (slightly less) those specs by taking advantage of excess 32nm fab capacity.
Of course the multi-millions of dollar in capital, huge risk, and limited market means we likely won't see an SHA-256 ASIC any time soon.  
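For a sense of scale, the ballpark ASIC figures above ($0.20 per MH, 50 MH/W, 4 to 5 GH/s per chip) translate into a per-chip cost and power draw as sketched below. The function name is hypothetical and the inputs are the post's estimates, so treat the outputs as back-of-the-envelope only.

```python
def asic_chip_estimate(ghs_per_chip: float,
                       usd_per_mh: float = 0.20,
                       mh_per_watt: float = 50.0) -> dict:
    """Rough per-chip cost and power from the quoted ballpark figures."""
    mhs = ghs_per_chip * 1000.0  # convert GH/s to MH/s
    return {
        "cost_usd": mhs * usd_per_mh,
        "power_w": mhs / mh_per_watt,
    }


for ghs in (4.0, 5.0):
    est = asic_chip_estimate(ghs)
    print(f"{ghs:.0f} GH/s chip: ~${est['cost_usd']:.0f}, ~{est['power_w']:.0f} W")
```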
 
hero member
Activity: 504
Merit: 500
As to exactly what chip?  I don't know but I suspect it is something w/ 300K LUTs giving them 2 hashes per clock per chip and thus 4 hashes per clock for the board.  Much less "interesting" but seems the most probable.

yea, definitely not as exciting. ;p  I will be wholly impressed to ponder how they would acquire something that large so cheap. Even being last gen.
donator
Activity: 1218
Merit: 1079
Gerald Davis
I'm under no illusion that they could do it with 40k luts. What happened with the $1200 speculation I popped in there?

Anyhows, what chips do you believe they are using then?

Well $1200 only buys you 100K LUTs.  Fitting an unrolled Bitcoin hasher in 100K LUTs seems difficult but I guess not impossible.  If they did then that is great news.  I mean if you could hash w/ only 100K LUTs imagine what you could do with an entry level 28nm chip.  An Artix-7 w/ 300K LUTs = 3 hashes per clock running at 200 MHz = 600 MH/s from a sub $200 part.  Smiley  Oh and likely 40 MH/W. Smiley Smiley

I just don't think they made a 33% improvement in hashing efficiency.  I mean it would be like someone releasing a GPU miner that suddenly boosts a 5970 from 750 MH/s to 1000 MH/s.  Possible but improbable.

As to exactly what chip?  I don't know but I suspect it is something w/ 300K LUTs giving them 2 hashes per clock per chip and thus 4 hashes per clock for the board.  Much less "interesting" but seems the most probable.
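The two scenarios in this post can be compared with a small LUT-budget sketch: how many unrolled hashers fit in a chip, and what that means for the board at 200 MHz. The helper names are made up, and the 100K and 150K LUT-per-hasher figures are the thread's estimates rather than measured values.

```python
def hashers_that_fit(total_luts: int, luts_per_hasher: int) -> int:
    """Whole unrolled hashers that fit in a chip's LUT budget."""
    return total_luts // luts_per_hasher


def board_mhs(chips: int, total_luts: int, luts_per_hasher: int, clock_mhz: int) -> int:
    """Board hash rate in MH/s, assuming one hash per hasher per clock."""
    return chips * hashers_that_fit(total_luts, luts_per_hasher) * clock_mhz


# Optimistic case: a 100K LUT hasher on a single 300K LUT chip.
artix_case = board_mhs(chips=1, total_luts=300_000, luts_per_hasher=100_000, clock_mhz=200)

# Suspected case: two 300K LUT chips with the usual ~150K LUT hasher.
suspected_case = board_mhs(chips=2, total_luts=300_000, luts_per_hasher=150_000, clock_mhz=200)

# Going from 150K to 100K LUTs is a one-third cut, the same fraction as
# the 750 -> 1000 MH/s GPU jump used as the comparison.
lut_reduction = 1 - 100_000 / 150_000
gpu_gain = 1000 / 750 - 1

print(artix_case, suspected_case, round(lut_reduction, 2), round(gpu_gain, 2))
```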