
Topic: New scalable pipelined FPGA core for SHA-256 - any interest? - page 2. (Read 14934 times)

sr. member
Activity: 247
Merit: 250
Cosmic Cubist
Hi, I recently graduated from the newbie board, and thought I'd repost this.  I know there have been a number of FPGA mining threads already, but I thought I'd share my contribution...

I've been developing a new optimized SHA-256 core in VHDL.  The design philosophy of this version revolves around these points:

1) Reorganize and aggressively pipeline the round processor so as to achieve a clock frequency (and hardware efficiency) close to the maximum possible on a given FPGA.  (The critical-path delay of this particular design should be no more than one 32-bit add delay, plus register setup time.)

2) To scale to FPGAs of any size and use them fully, don't unroll the round loop; instead, build a small, iterative, single-round processor, many copies of which can operate in parallel.  Each of these cores can hash as many block candidates simultaneously as it has pipeline stages (4 in this design).  A properly designed work-dispatch unit (still to be written) can ensure that all cores stay fully utilized hashing block candidates.
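
To make the interleaving in point 2 concrete, here is a rough cycle-count model in Python (not the VHDL core itself).  It assumes each pass through the 4 pipeline stages completes one SHA-256 round for one candidate, and that a finished candidate's slot is refilled immediately; the function name and the slot bookkeeping are made up for illustration.

```python
def cycles_to_finish(num_jobs, stages=4, rounds=64):
    """Clock cycle at which the last of num_jobs hashes retires from one core.

    Assumed model (not taken from the real design): the pipeline has `stages`
    slots, each holding one block candidate, and a candidate completes one
    round every time its slot comes back around to the pipeline entrance.
    """
    if num_jobs <= 0:
        return 0
    slots = [None] * stages   # rounds still to do for the job in each slot
    pending = num_jobs        # jobs not yet loaded into the pipeline
    done = 0
    cycle = 0
    while True:
        slot = cycle % stages             # slot at the pipeline entrance this cycle
        if slots[slot] is not None:
            slots[slot] -= 1              # one more round finished for this job
            if slots[slot] == 0:
                done += 1
                slots[slot] = None
                if done == num_jobs:
                    return cycle
        if slots[slot] is None and pending > 0:
            slots[slot] = rounds          # dispatch the next candidate into the free slot
            pending -= 1
        cycle += 1


if __name__ == "__main__":
    for n in (1, 4, 400):
        total = cycles_to_finish(n)
        print(n, "jobs:", total, "cycles,", total / n, "cycles per hash")
```

Under these assumptions a lone candidate takes 256 cycles, but with all four slots kept busy the average approaches 64 cycles per SHA-256, which is why the work-dispatch unit matters.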

To illustrate this approach's performance, here are some stats for the current design, based on compiling it for a Stratix III FPGA (EP3SL150F1152C2N, as found in the Altera/Terasic DE3 board).

Area for 1 core, including test rig:  2,113 cells (plus a little memory)
Maximum frequency:                    385 - 421 MHz (depending on temperature)
Clock cycles per SHA-256 (1 chunk):   64 (on average, if pipeline is kept full)
Clock cycles per double-SHA-256:      128 (ditto)
Bitcoin Mhash/s per core:             3.0 - 3.3 (temp-dependent)
Cores per FPGA:                       At least 50
Bitcoin Mhash/s per FPGA:             150 - 165 (temp-dependent)
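
For anyone who wants to plug in numbers for a different FPGA, the hash-rate figures above follow from the clock frequency, the 128 cycles per double-SHA-256, and the core count.  A small Python sketch of that arithmetic (the function names are just for illustration, and 50 cores is the lower bound quoted above, not a measured maximum):

```python
def mhash_per_core(f_mhz, cycles_per_double_sha=128):
    """Mhash/s for one core: one double-SHA-256 every 128 cycles on average."""
    return f_mhz / cycles_per_double_sha


def mhash_per_fpga(f_mhz, cores=50, cycles_per_double_sha=128):
    """Mhash/s for a whole FPGA, assuming every core stays fully utilized."""
    return cores * mhash_per_core(f_mhz, cycles_per_double_sha)


if __name__ == "__main__":
    for f_mhz in (385, 421):  # the two ends of the reported frequency range
        print(f_mhz, "MHz:",
              round(mhash_per_core(f_mhz), 2), "Mhash/s per core,",
              round(mhash_per_fpga(f_mhz), 1), "Mhash/s per FPGA")
```

At 385 MHz this gives about 3.0 Mhash/s per core and 150 per FPGA; at 421 MHz, about 3.3 and 164.5.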

This particular FPGA is rather expensive; I haven't yet researched which FPGA platform would be most cost-effective for this design.  But, if anyone else is interested in exploring this line of work, and helping to integrate this new core into a more complete mining solution, I would be happy to release the code.