Topic: An estimate of fpga performance - page 5.

newbie
Activity: 7
Merit: 0
December 27, 2010, 02:48:52 PM
#24
mike_la_jolla checking in here to clarify some FPGA questions.

- DNDPB_S327:  http://www.dinigroup.com/new/DNDPB_S327.html
List price is $19,680 for quantity 1.

- This is probably a much better choice:  DNBFC_S12_PCIe: http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
List price for quantity 1 is $8,950.  We sell thousands of these to do (spooky) things.  We can fit 12 in a single chassis.

- 300 MHz is probably not achievable for Spartan-6 or Cyclone 3.  With some effort by an expert, assume you can get to 200 MHz or so.  Don't bother with the 'C' to FPGA methodologies.  You'll need someone who is well versed in VHDL/Verilog.  Also, you generally can't get to 100% utilization without breaking the tools.

- Any FPGA solution will require a host.  The DNDPB_S327 connects via Ethernet, so it has low data throughput.  The DNBFC_S12_PCIe is GEN1/GEN2 PCIe, so the bandwidth is much higher.

- Those of you who think you can do a custom ASIC are nuts.  An ASIC would cost millions (USD) in expense and effort.  The genomic search market isn't even large enough to support a custom ASIC.

- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.
lfm
full member
Activity: 196
Merit: 104
December 24, 2010, 09:40:13 PM
#23

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Satoshi defined it in the original implementation, yes. sha256(sha256(block header))
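For anyone following along in software first, here is a minimal Python sketch of that definition using only the standard hashlib module. The helper names double_sha256 and block_hash_hex are illustrative, not taken from the client; the final byte reversal reflects the usual convention of displaying block hashes as big-endian hex.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's hash: SHA-256 applied twice, sha256(sha256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash_hex(header: bytes) -> str:
    """Double-hash an 80-byte block header and return the hex form people
    usually quote (the digest byte-reversed, i.e. read as a big-endian number)."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    return double_sha256(header)[::-1].hex()
```

Note that the 80-byte header spans two 64-byte SHA-256 input blocks in the first hash, while the second hash only has to process the 32-byte result.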

Quote
Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?


Well, it is offset 12 into the second part of the first hash, ya: offset 76 out of 80 in the block header.

Yes it will always be 32 bits.
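To make the two offsets concrete, a small Python sketch (helper names are hypothetical): the nonce is a little-endian 32-bit field at byte 76, which is byte 12 of the second 64-byte SHA-256 input block.

```python
import struct

HEADER_LEN = 80
# version(4) + previous block hash(32) + merkle root(32) + time(4) + bits(4) = 76
NONCE_OFFSET = 76
SHA256_BLOCK_LEN = 64  # SHA-256 consumes its input in 64-byte blocks


def set_nonce(header: bytes, nonce: int) -> bytes:
    """Return a copy of an 80-byte header with its 32-bit nonce replaced."""
    if len(header) != HEADER_LEN:
        raise ValueError("block header must be exactly 80 bytes")
    if not 0 <= nonce <= 0xFFFFFFFF:
        raise ValueError("nonce must fit in 32 bits")
    buf = bytearray(header)
    struct.pack_into("<I", buf, NONCE_OFFSET, nonce)  # little-endian uint32
    return bytes(buf)


def nonce_position() -> tuple[int, int]:
    """(which 64-byte SHA-256 input block, byte offset within it) for the nonce."""
    return divmod(NONCE_OFFSET, SHA256_BLOCK_LEN)
```

nonce_position() returns (1, 12), matching the "offset 12" above. Since bytes 0 to 63 do not depend on the nonce, a scanner only has to redo the second block of the first hash, plus the whole second hash, for each new nonce.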
full member
Activity: 354
Merit: 103
December 23, 2010, 03:33:45 PM
#22
60 MHz

Hmm, yes, I discovered that myself today... really...

The numbers were thought of as the maximum possible, with all the luck you can have.

Unfortunately, 11 MHash/sec is not too impressive either...

But then one has to remember that this is not the final implementation; it isn't even runnable as it is.

Please correct me if I'm wrong, but I thought that the maximum clock inside the Spartan was about 300 MHz?

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?

It's amazing that we have so many knowledgeable people on this board.

sr. member
Activity: 406
Merit: 257
December 23, 2010, 05:13:27 AM
#21
300MHz? on a Spartan3?
Oh, and bitcoin hash is TWO rounds of sha256.
I just synthesized it, 60MHz max for one core on a -5 speed grade S3E-500.
So NOT
300MHz / 80 clocks/hash * 3 cores = 11MHps
instead (assuming we can lose overhead and just have to do a mid-add and a compare)
60MHz / 130 clocks/hash * 3 cores = 1.4MHps

at $20/chip that's 0.07MH/$, or about 25x worse than a HD5970...

And as for "GPU needs mainboard": an FPGA needs a PCB, VRMs, config memory, some kind of host connection, ...

So yeah, pull a few crazy numbers out of your ass and FPGAs look decent.
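The arithmetic in this post, written out as a Python sketch. It assumes, as the post does, that each core finishes one complete hash every clocks_per_hash cycles and that cores are fully independent; the function names are made up for illustration, and MH/$ counts chip cost only.

```python
def fpga_mhps(clock_mhz: float, clocks_per_hash: float, cores: int) -> float:
    """MH/s for `cores` independent cores, each finishing one complete hash
    every `clocks_per_hash` cycles at `clock_mhz`."""
    return clock_mhz / clocks_per_hash * cores


def mh_per_dollar(mhps: float, price_usd: float) -> float:
    """Hash rate bought per dollar of chip cost (ignores board, power, host)."""
    return mhps / price_usd


optimistic = fpga_mhps(300, 80, 3)     # the earlier 300 MHz / 80 clocks guess
synthesized = fpga_mhps(60, 130, 3)    # 60 MHz after synthesis, 130 clocks/hash
print(f"{optimistic:.2f} vs {synthesized:.2f} MH/s; "
      f"{mh_per_dollar(synthesized, 20):.3f} MH/$ at $20/chip")
```

This prints 11.25 vs 1.38 MH/s and 0.069 MH/$, i.e. the 11 MHps, 1.4 MHps and 0.07 MH/$ figures above before rounding.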
legendary
Activity: 1288
Merit: 1080
December 23, 2010, 01:19:14 AM
#20

If some people created a bitcoin-dedicated ASIC, I'd be amazed.  It would be a strong indicator of how involved some people are in the bitcoin project.
member
Activity: 83
Merit: 10
December 23, 2010, 12:57:55 AM
#19
Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461

Thanks, that was a good read.
hero member
Activity: 489
Merit: 505
December 22, 2010, 03:28:00 PM
#18
Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461
newbie
Activity: 32
Merit: 0
December 22, 2010, 02:27:09 PM
#17
ArtForz has developed SHA-256 ASICs and had them manufactured (100 pieces) for about $500/engine. He said these ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3. They are not really standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose-made PCI card would be as big a boon as buying an expensive GPU.

ArtForz expects them to arrive in February:
https://stuff.caurea.org/irssi/freenode/%23bitcoin-dev/2010/12/%23bitcoin-dev-2010-12-20.log : 18:36

Maybe the first step toward developing an ASIC is this VHDL code. I don't believe that ArtForz will give us his code. If we pool our money, maybe we could have enough to get a real ASIC manufactured.
legendary
Activity: 1708
Merit: 1010
December 22, 2010, 12:45:16 PM
#16
ArtForz has developed SHA-256 ASICs and had them manufactured (100 pieces) for about $500/engine. He said these ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3. They are not really standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose-made PCI card would be as big a boon as buying an expensive GPU.
newbie
Activity: 32
Merit: 0
December 21, 2010, 01:49:11 PM
#15
ArtForz has developed SHA-256 ASICs and had them manufactured (100 pieces) for about $500/engine. He said these ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3. They are not really standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).
full member
Activity: 354
Merit: 103
December 21, 2010, 09:33:44 AM
#14
Yeah, that seems about right: each FPGA on that Altera board contains 12 times as many 4-input LUTs as the xc3s500 Spartan in the GOP module. 12 x 27 = 324 times the 11 MHash/s in my calcs => entering 3564000 khash/sec in the calculator gives you 4 hours per block. Counting 2000 blocks per year, you get 100000 BTC, or $25k a year, assuming a moderate difficulty increase.

So I guess the graphics cards beat the crap out of the FPGAs. But what about power consumption? Also, graphics cards need a motherboard, host CPU, etc.

Wonder how far you could optimize the gate count?

Putting a few hundred of these DIP form-factor boards together would also give you a priceless '80s feeling :-)

http://shop.trenz-electronic.de/catalog/product_info.php?products_id=81
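A Python sketch of the calculator step in the post above. It uses the common approximation of difficulty * 2^32 hashes per block and the 50 BTC reward of the time; the function names are illustrative, and the difficulty is left as a parameter because the post does not state which value the calculator used.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BLOCK_REWARD_BTC = 50  # block reward at the time of this thread


def expected_seconds_per_block(difficulty: float, hashes_per_second: float) -> float:
    """Expected time to find a block, using the usual approximation of
    difficulty * 2**32 hashes per block."""
    return difficulty * 2**32 / hashes_per_second


def btc_per_year(difficulty: float, hashes_per_second: float) -> float:
    """Expected BTC per year if the difficulty stayed constant."""
    blocks = SECONDS_PER_YEAR / expected_seconds_per_block(difficulty, hashes_per_second)
    return blocks * BLOCK_REWARD_BTC


# 27 FPGAs, each with ~12x the LUTs of an xc3s500 doing 11 MH/s
board_hps = 27 * 12 * 11e6  # 3.564e9 H/s = 3564000 khash/s
```

Working backwards, 4 hours per block at 3.564 GH/s corresponds to a difficulty of roughly 12,000 (4 * 3600 * 3.564e9 / 2^32 ≈ 11,950), and 8760 / 4 ≈ 2190 blocks per year, in line with the 2000 used above.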
newbie
Activity: 19
Merit: 0
December 20, 2010, 12:00:24 PM
#13
http://www.dinigroup.com/product/data/DNDPB_S327/images/board_front6.jpg

Drool...

By my own estimates, this thing could generate a block every few hours at the current difficulty. I doubt it would cost less than $25k-$50k though...

(source: http://www.dinigroup.com/new/products.html)
legendary
Activity: 1596
Merit: 1100
December 20, 2010, 01:07:17 AM
#12
I also checked in to the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) is only checking for a certain number of zeroes and then returning.

Correct.  The scanner performs a fast-path check, and then a more exhaustive check if the fast-path check exits the scanner loop.


Quote
Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

See CheckWork().  It is a less-than compare, on an unsigned 256-bit little endian integer.
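A Python sketch of the two-stage check described above. The 32-zero-bit pre-filter is an illustrative choice and not necessarily the exact test ScanHash_CryptoPP performs; the full check is written as hash <= target. Function names are hypothetical.

```python
def hash_to_int(digest: bytes) -> int:
    """Read a 32-byte SHA-256d digest as an unsigned 256-bit little-endian integer."""
    if len(digest) != 32:
        raise ValueError("digest must be 32 bytes")
    return int.from_bytes(digest, "little")


def fast_path(digest: bytes) -> bool:
    """Cheap pre-filter: the most significant 32 bits (the last 4 bytes of the
    little-endian digest) are zero. Every hash that meets a target at or above
    difficulty 1 passes this, so failing it means the nonce can be skipped."""
    return digest[28:] == b"\x00\x00\x00\x00"


def meets_target(digest: bytes, target: int) -> bool:
    """Full check: compare the hash, as an integer, against the target."""
    return hash_to_int(digest) <= target
```

In hardware the pre-filter is just a 32-bit zero detect, so only rare candidates ever need the full 256-bit comparison or a trip back to the host.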
jib
member
Activity: 92
Merit: 10
December 20, 2010, 12:41:40 AM
#11
Difficulty = (2^224)/target. They're just two representations of the same thing. To check if you've found a block, you check if the hash is less than the target.
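A Python sketch of that relationship. The 2^224 is a close approximation: the difficulty-1 target is 0xFFFF * 2^208. The compact "bits" decoding below is simplified (sign-bit handling is omitted), and the helper names are hypothetical.

```python
# Difficulty-1 target: 0xFFFF * 2**208, just below 2**224.
DIFF1_TARGET = 0xFFFF << 208


def target_from_bits(bits: int) -> int:
    """Expand the compact 32-bit 'bits' header field: top byte is a base-256
    exponent, low 3 bytes the mantissa. Sign-bit handling is omitted here."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def difficulty_from_target(target: int) -> float:
    """Difficulty is how many times harder the target is than difficulty 1."""
    return DIFF1_TARGET / target
```

For example, bits = 0x1D00FFFF expands to exactly DIFF1_TARGET, i.e. difficulty 1; halving the target doubles the difficulty.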
full member
Activity: 354
Merit: 103
December 20, 2010, 12:23:58 AM
#10
hello again!

Just to clarify, I only ran the program in simulation, but I also compiled the module with the Xilinx synthesis tool (ISE) just to see how much space it would take in the chip. (I don't even own a Spartan FPGA :-)

As for a full implementation, well, I'm just trying to understand the criteria for a found block; I'm not an expert in cryptography. I think this will also need to be done in hardware, so that the FPGA only reports back when it has found something.

I just checked in at the calculator (http://www.alloscomp.com/bitcoin/calculator.php).
What is the correlation between the "difficulty factor" and the "hash target"? Why do we use two concepts?

I also checked in to the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) is only checking for a certain number of zeroes and then returning.

Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

The code would also need to contain some UART comms or similar. I thought of broadcasting the request to all devices and then daisy-chaining the results back, so that the "winning" device could break the chain and report back to the host computer.
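A hypothetical Python model of that scheme, not HDL. It assumes (the post does not say) that the broadcast work is split so each device scans its own slice of the 32-bit nonce space; nonce_slices, scan and daisy_chain are made-up names.

```python
from typing import Callable, Optional

NONCE_SPACE = 2**32


def nonce_slices(devices: int) -> list[range]:
    """Give each device a contiguous, non-overlapping slice of the nonce space."""
    step = -(-NONCE_SPACE // devices)  # ceiling division
    return [range(i * step, min((i + 1) * step, NONCE_SPACE)) for i in range(devices)]


def scan(nonces: range, is_winner: Callable[[int], bool]) -> Optional[int]:
    """What one device does: stay silent unless it finds a winning nonce."""
    for nonce in nonces:
        if is_winner(nonce):
            return nonce
    return None


def daisy_chain(results: list[Optional[int]]) -> Optional[int]:
    """Results pass down the chain toward the host; the first device that
    holds a winner breaks the chain, so its nonce is what the host receives."""
    for found in results:
        if found is not None:
            return found
    return None
```

Giving each device a disjoint slice means no two devices waste time on the same nonce, and the host only hears from the chain when some device has a candidate.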

member
Activity: 83
Merit: 10
December 19, 2010, 03:21:03 PM
#9
By my rough calculations, the highest-end Virtex-5 could hit 40-50 MH/s. At $3000+ for a PCIe dev board, it is far more cost-effective to buy ATI video cards.

Edit:

Of course, if one were to connect a bunch of these things in parallel, one could make a big dent, i.e.:
http://www.sciengines.com/copacobana/
newbie
Activity: 32
Merit: 0
December 19, 2010, 02:06:52 PM
#8
One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But GPUs are ASICs, so they begin with a significant advantage over FPGAs.

Mhash/watt. FPGAs should have better power efficiency than GPUs. An ASIC (application-specific integrated circuit) for mining only (like Deep Crack for DES) would be the best option, but it is very expensive to develop (maybe 300,000 USD?).
legendary
Activity: 1596
Merit: 1100
December 19, 2010, 01:42:25 PM
#7
One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But GPUs are ASICs, so they begin with a significant advantage over FPGAs.
newbie
Activity: 32
Merit: 0
December 19, 2010, 01:28:26 PM
#6
How does this compare to a GPU?   


One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.
newbie
Activity: 32
Merit: 0
December 19, 2010, 01:16:21 PM
#5
Nice :)
A full implementation would be great!