I mentioned in the main FPGA development thread (by fpgaminer) that I have coded an unrolled SHA256 using additional pipelining, and I was asked about resource usage. I am now compiling with the subscription edition of Quartus II v11.0 under an evaluation license. Resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core, and the clock rate achieved is 240MHz. This device should fit 4 SHA256 pairs, for a hash rate approaching 1GH/s. The EP4SE820 should be capable of 2GH/s. I have found a system that uses 20 of these on a single card. Yes, this would be expensive to buy (>$200K, I assume), but it would be much smaller and use less power and produce less heat than a cluster of computers running 6990s, maybe as low as 1-2kW total?
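The arithmetic behind the "approaching 1GH/s" figure can be sketched as below. This is only a back-of-the-envelope check, assuming a fully unrolled SHA256 pair produces one double hash per clock cycle and that LE usage scales linearly with the number of cores; the function names are made up for illustration.

```python
def hash_rate_mhs(pairs: int, clock_mhz: float) -> float:
    """Estimated hash rate in MH/s.

    Assumption: each fully unrolled SHA256 pair completes one
    double hash per clock cycle once the pipeline is full.
    """
    return pairs * clock_mhz


def logic_elements(pairs: int, le_per_core: int) -> int:
    """Estimated LE usage, assuming linear scaling (two cores per pair)."""
    return pairs * 2 * le_per_core


# Figures quoted in the post for the Stratix IV build.
stratix_rate = hash_rate_mhs(pairs=4, clock_mhz=240)
stratix_le = logic_elements(pairs=4, le_per_core=44_000)

print(f"{stratix_rate:.0f} MH/s using about {stratix_le} LE")
```

Real place-and-route results will differ, since the achievable clock rate usually drops as the device fills up.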
As for the cheaper Cyclone IV (EP4CE115F29I7), resource usage is 62K LE (38K combinational functions and 44K registers; it seems the Cyclone cannot pack them together effectively) and the clock rate is 134MHz. A single SHA256 pair with the additional pipelining would struggle to fit in this device. However, I have another version that uses just the precalculation of H + K + W to improve clock rate, and it will be smaller. I have found a card with 27 Cyclone IIIs, and I wonder how much that would cost...
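For anyone unfamiliar with the H + K + W trick, here is a minimal software sketch of one reading of it, not the actual HDL. It assumes the idea is that the h input of round t+1 is just the g value of round t, so the sum h + K + W for the next round can be formed one stage early and kept off the critical path. The W values are arbitrary test words, and the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e, f, g):
    return ((e & f) ^ (~e & g)) & MASK


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def finish_round(state, t1):
    # Shared tail of a SHA-256 round once T1 is known.
    a, b, c, d, e, f, g, _h = state
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def run_standard(state, ks, ws):
    # Textbook rounds: T1 is a five-term sum inside each round.
    for k, w in zip(ks, ws):
        _a, _b, _c, _d, e, f, g, h = state
        t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
        state = finish_round(state, t1)
    return state


def run_precalc(state, ks, ws):
    # pre holds h + K + W for the round about to execute.
    pre = (state[7] + ks[0] + ws[0]) & MASK
    for t in range(len(ks)):
        # This round's g becomes the next round's h, so the next
        # sum can be formed now, in parallel with this round.
        next_pre = 0
        if t + 1 < len(ks):
            next_pre = (state[6] + ks[t + 1] + ws[t + 1]) & MASK
        _a, _b, _c, _d, e, f, g, _h = state
        t1 = (pre + big_sigma1(e) + ch(e, f, g)) & MASK
        state = finish_round(state, t1)
        pre = next_pre
    return state


# SHA-256 initial state and first four round constants.
IV = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)
KS = [0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5]
WS = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]  # arbitrary test words

same = run_standard(IV, KS, WS) == run_precalc(IV, KS, WS)
print("precalc matches standard rounds:", same)
```

In hardware the benefit would come from registering the precomputed sum between stages, which shortens the adder chain in each round; the software version only demonstrates that the reordering leaves the result unchanged.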
Interested to hear what other people have achieved in terms of clock rate and resource usage.
$200k for 1 GH/s? That's a deal right there..
For that kind of money you can buy about 200GH/s through noisy, power-consuming, heat-producing rigs. FPGAs have a long way to go unless you're rich and have an irrational desire to go green regardless of the expense.