Since it's a pipelined design, wouldn't removing the subtractor just reduce the latency of the pipeline instead of increasing the throughput?
The overall speed of a pipelined logic design is limited by the speed of the slowest stage. In the design I mentioned the last pipeline stage did what every other stage did plus it did the zero comparator, subtractor and a latch.
Even if this subtractor would prevent the re-loading of the pipeline than you could pipeline the pipeline and the subtractor.
Since the pipeline will not (i presume) produce a nounce to be latched on every clock you have more than enough time to store the previous nounce on chip and subtract the number before sending it out to the controller.
At least i would make my 'store' circuit parallel to the actual pipeline so it can operate asynchonously.
I don't think you've ever tried to use Xilinx ISE or something similar.
true enough..
The problem isn't: come up with a different, potentially faster design. The problem is: come up with a working design, the one that the available tools will be capable of synthesizing, and placing/routing sensibly. The overall structure of SHA-2 (which makes every output bit depend on every input bit in each round) is apparently hitting some worst case behavior in the Xilinx toolchain. It takes close to a full day to run a single full implementation. And in many cases the the toolchain either fails to converge to a working implementation or converges to something shamefully inefficient.
lol., sorry i even mentioned it.. And no way to work around this?
On this board 2^256 is frequently thrown around as a number so high that nobody will be able check all of them. Compare this with the work demanded from the Xilinx placing tool: 23038 SLICEs in XC6SLX150 can be permuted in 23028! ways (I'm making a gross simplification of the "place" step) which is about 10^90499. Obviously all digital synthesis tools have to take some heuristic shortcuts through that vast space of available solutions.
Well, thats why you program that thing, right?. The information you give the synthesis tools reduces this space by very much.
Anyway, it would be too off-topic to discuss it here. I take it it is not an easy task.
So the human art required from the designer is to figuratively take the poor toolchain by the hand an lead it/them to some safe place.
The blackbox art of FPGA programming...