Correct, something like that. I was thinking on-die memory segments could be used, but anything that separates the hasher clock from the software communicator should be a good thing. I hadn't seen that code, as I was working on the Altera branches. They must be doing something right to achieve 200 MH/s per chip on a Spartan LX150, which in this thread (and on the hardware comparison page) topped out at 100 MH/s on other boards (unless I missed some updates somewhere). The ztex design seems to be clocking 1 core at 200+ MHz, versus the other designs without hasher/controller separation clocking 1 core at 100 MHz. It would be amazing to double the clock rate of my Altera chips from 220 to 440 w/ 3 cores!
Separating the clocks will not help you (I've tried it on an XC6SLX150). The frequency is limited by the carry chains, not by the clock network delays.
As far as I know, the ztex design generally allows 190 MHz (probably calculated by the Xilinx tools at 85 °C), but voltage/temperature derating allows the frequency to be increased.
I've tried to compile ztex's source, and XST reported a maximum clock frequency of 230 MHz. I made some modifications, and XST now reports 316.312 MHz, so I hope it will reach 190 MHz after PAR.
The problem is that XST ALWAYS reports shit-hot timings at the synthesis stage, but once the design is mapped into the actual device the timings go to pot because of the way the interconnects work. (Some of the XST tools just look at the 'pure logic' chains for timing.)
Also, as regards splitting the clocks: it is a bad idea and there is no need for it (in this design), because once you have more than one clock you have to deal with crossing clock domains, and then you have to deal with shitty situations of clock lag and jitter over multiple clock sources.
As regards heat: the hotter it runs, the shorter it lives. Xilinx parts start to shut down at just over 85 °C die temperature (it's designed into the die).
I've taken the XUPV5 to over 350 MH/s, but it has required a very special power supply design and a special PCB (which smells like cooked hairy crab when it's running full pelt), plus six 17 CFM fans. (Those are special 'maglev' designs, not like the shitty stuff with the oil and the shitty washer holding the spindle in the housing.)
BUT it's a bitch. Yesterday it worked fine, but today there was heavy rain and it started getting bad shares. Dropping it back 20 MH/s fixed it, until the rain became heavier...
It is purely a research project, as both Tom & BFL have screwed me on my ASIC deliveries.
Finally, as regards the 'main delay' being in communication: actually that is unlikely. Rather, the delay is in some FPGA designs that don't allow the current work to be interrupted when the block changes.
I suspect this because of the FIFOs and the increased USART timings (230400) I designed in to deal with other 'idle time'.
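
To put a rough number on the USART side of that 'idle time', here is a minimal sketch. It assumes 230400 is a baud rate with 8N1 framing (10 bits on the wire per byte) and a 44-byte work item (32-byte midstate plus 12 bytes of header tail); neither the framing nor the packet size is stated above, and the 115200 figure is only there for comparison.

```python
def uart_transfer_seconds(payload_bytes, baud, bits_per_byte=10):
    """Time to push a payload over a UART link.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    """
    return payload_bytes * bits_per_byte / baud


# Assumed work item size: 32-byte midstate + 12 bytes of block header tail.
WORK_BYTES = 44

for baud in (115200, 230400):
    ms = uart_transfer_seconds(WORK_BYTES, baud) * 1000.0
    print(f"{baud} baud: {ms:.2f} ms per work item")
```

Under those assumptions a work item takes a couple of milliseconds either way, which fits the point that the link itself is unlikely to be the main delay.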
Plus there are a number of shortcuts (no, not the well-documented SHA256 ones); even saving a few tens of ms per round all adds up.
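
A back-of-envelope sketch of why a few tens of ms per round matter. It assumes a 'round' means one full sweep of the 32-bit nonce range at one hash per nonce, and that the hasher sits idle for a fixed time between sweeps; both are illustrative assumptions, not a description of any particular design.

```python
NONCE_SPACE = 2 ** 32  # nonces in one sweep of the 32-bit nonce field


def sweep_seconds(hashrate_hs):
    """Seconds needed to try every nonce once at the given hash rate (H/s)."""
    return NONCE_SPACE / hashrate_hs


def idle_fraction(idle_s, hashrate_hs):
    """Share of wall-clock time lost when each sweep is followed by idle_s of idle time."""
    busy = sweep_seconds(hashrate_hs)
    return idle_s / (busy + idle_s)


# 350 MH/s is the figure quoted for the XUPV5; the idle times are hypothetical.
for idle_ms in (10, 30, 50):
    lost = idle_fraction(idle_ms / 1000.0, 350e6)
    print(f"{idle_ms} ms idle per round: {lost * 100:.3f}% of hashing time lost")
```

The faster the chip, the shorter each sweep, so the same fixed idle time eats a larger share of the hash rate.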