No, the issue is that implementing the unrolling logic costs a lot of die space and limits clocks.
That is not true. Unrolled versions are more complicated because routing can be tough, especially when the design is near the capacity of the chip, but an unrolled version is going to be faster than a rolled version because it avoids the loop overhead.
For all I know it might make sense to have one FPGA do only or mostly unrolling, and one, running at higher clock speeds, only hashing.
What do you mean by "do only or mostly unrolling"?
An FPGA doesn't do unrolling. Unrolling is simply a method of converting looping logic into flat logic.
For example, this is a loop (it would be considered rolled logic):
for (i = 0; i < 4; i++)
{
    print i
}
However that logic can be expressed identically using this flat logic (unrolled logic):
print 0
print 1
print 2
print 3
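The same equivalence in runnable form: a minimal Python sketch (the function names are just for illustration) in which the rolled version keeps a counter and tests it on every pass, while the unrolled version simply repeats the body four times. Both produce identical results.

```python
def rolled(n=4):
    out = []
    for i in range(n):  # counter, compare and branch on every pass
        out.append(i)
    return out


def unrolled():
    out = []
    # Loop body copied once per iteration: no counter, no compare, no branch.
    out.append(0)
    out.append(1)
    out.append(2)
    out.append(3)
    return out


print(rolled())
print(unrolled())
```

The unrolled form is fixed at four iterations; changing the count means rewriting it, which is why unrolling suits an algorithm whose iteration count never changes.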
In assembly code (or GPU opcodes, or an FPGA bitstream) the latter version is more efficient. The looped version can easily be changed to, say, 5 or 30 iterations, but SHA-256 never changes, so it is a perfect candidate for unrolling. So no FPGA "does unrolling". You unroll the algorithm to make it more efficient and load a design onto the FPGA that contains no loops. The SHA-256 compression function is a loop of 64 rounds. If you implemented it rolled on an FPGA, it would take 64 clock cycles to complete 1 hash. Instead you can unroll the loop (which requires more LUTs) and pipeline it, so that a finished hash comes out every clock cycle.
Rolled (looped): 64 clocks = 1 hash.
Unrolled (pipelined): 1 clock = 1 hash.
Now the rolled version IS smaller, so you could put multiple rolled versions on 1 chip. However, loops have lots of overhead (round counter, control logic, multiplexers), so in the same die space you can't fit 64 rolled versions. Maybe you can only fit 60 (likely you will get much, much less, maybe 20, but let's be generous and say 60).
60 rolled versions x 1 hash per 64 clocks = 0.9375 hashes per clock.
1 unrolled version x 1 hash per 1 clock = 1 hash per clock.
You will never get more than 64 rolled versions on the same FPGA. Why? Loops always have overhead.
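The throughput comparison above can be written as a tiny calculation. This Python sketch assumes, like the argument above, that every core runs at the same clock and that the unrolled core is fully pipelined; it ignores the clock-speed penalty the quoted poster mentions. The 60-core figure is the generous guess from the post, not a measured number.

```python
SHA256_ROUNDS = 64  # rounds in one SHA-256 compression


def hashes_per_clock(cores, clocks_per_hash):
    """Combined throughput of identical cores, each finishing one hash every clocks_per_hash cycles."""
    return cores / clocks_per_hash


rolled = hashes_per_clock(cores=60, clocks_per_hash=SHA256_ROUNDS)  # 60 / 64
unrolled = hashes_per_clock(cores=1, clocks_per_hash=1)             # pipelined

print(f"rolled:   {rolled:.4f} hashes/clock")
print(f"unrolled: {unrolled:.4f} hashes/clock")
# Rolled cores only match one unrolled core once there are as many cores as rounds.
print("rolled cores needed to break even:", SHA256_ROUNDS)
```

Because each rolled core needs 64 clocks per hash, it takes 64 of them just to equal one pipelined unrolled core, and loop overhead means they never all fit in the same area.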
What I've read is that using 50% of the die space for the unrolling is fairly typical, so using 50% of the chips could make a lot of sense. Have you never wondered why even the "single" board has 2 FPGAs (assuming that's what they are)?
This sentence is illogical, and I think that is because unrolling doesn't mean what you think it does.
There is no "unrolling" involved in creating or checking a hash.
There is no "unrolling" in SHA-256.
Unrolling is simply the process of converting a looping logic to a flat logic.
http://en.wikipedia.org/wiki/Loop_unwinding
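For SHA-256 specifically, the loop in question is the 64-round loop inside the compression function. Below is a minimal Python sketch of that function in its rolled form. It is not an FPGA design and is not tuned for speed, and it is checked against Python's built-in `hashlib`. The round constants are derived from the primes rather than typed out, and the helper names are illustrative. Assuming one round per clock, as in the argument above, each pass of the round loop corresponds to one clock of a rolled hardware core. An unrolled core would instead instantiate the round body 64 times in a row.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
ROUNDS = 64  # SHA-256 runs 64 rounds per 512-bit block


def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Largest integer x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(ROUNDS)
# First 32 fractional bits of the square roots / cube roots of the primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    # Message schedule: expand the 16 input words to one word per round.
    w = list(struct.unpack(">16I", block))
    for t in range(16, ROUNDS):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

    a, b, c, d, e, f, g, h = state
    # The rolled loop: one copy of the round logic reused 64 times.
    for t in range(ROUNDS):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ ((e ^ MASK) & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK

    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_hex(message):
    bit_len = len(message) * 8
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state).hex()


print(sha256_hex(b"abc") == hashlib.sha256(b"abc").hexdigest())
```

Nothing in the hash itself "unrolls". The 64-round loop above is simply the rolled way of writing it, and the unrolled form computes exactly the same result.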