Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and more importantly appears to be using fairly large amounts of logic resources. Is there any way to avoid this?
Edit: Ah, having actually read the comments I now understand. You're doing feedback seperately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.