I've seen NIOS mentioned above, which prompted me to add the following comments:
1) Please don't fall into the trap of synthesizing a CPU onto the same physical chip it will be monitoring. This will be a nightmare when debugging. Ideally the monitoring CPU should be on a separate chip, powered from a separate regulator and physically placed away from the chips it is monitoring.
We definitely will not be using a soft processor. In fact, the first revision will most likely not have any CPU at all, just a USB port to connect it to the host.
2) The inter-chip communication on the same board should use as many physical pins as practical. Your design will be thermally limited, and physically connecting the pins to traces will help carry additional heat away from the chip. You could even create dummy buses and point-to-point connections that are connected physically but not electrically.
This seems like a good suggestion. We will be following best practices as recommended by Altera for their HardCopy devices, but with 1152 pins on the chip, there is a lot of extra potential cooling capacity to be had by ensuring they are all connected to some copper.
3) If you plan on really pushing the hasher chips hard, it may be beneficial to have an intermediate coding/decoding chip to provide high fan-out/fan-in between the hashing chips and the monitoring/communication CPU. This approach will give you two benefits:
3a) you will know that any hashing failure really occurred in hashing and not in the serdes/comms link.
3b) you will be able to separately test and reset all hashing chips on the same board.
Interesting. We hope to be able to put at least 25 fully unrolled miners on the ASIC. Given the code that has already been published, the hardest part of this project seems to be coming up with a robust way to array all of these miners together so that they effectively appear as a single miner. Several ideas (including some existing code) have already been mentioned in this thread and I am looking at all options, so I appreciate your suggestion here. Initially, we expect to be populating PC boards with only a single ASIC. I will try to come up with an architecture that will scale gracefully to more than one, however.
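One simple way to make a group of miners appear as a single miner is to give each one its own slice of the 32-bit nonce space for the same block header. The following is only a minimal sketch in Python under that assumption (static, contiguous partitioning); the function name `nonce_ranges` and the constant `NONCE_SPACE` are hypothetical and are not taken from any of the code published in this thread.

```python
# Sketch: split the 32-bit nonce space into contiguous, non-overlapping
# ranges, one per miner core. All names here are illustrative assumptions.

NONCE_SPACE = 2**32  # a Bitcoin block header carries a 32-bit nonce


def nonce_ranges(num_miners, space=NONCE_SPACE):
    """Return a list of (start, end) half-open ranges covering [0, space)."""
    if num_miners <= 0:
        raise ValueError("num_miners must be positive")
    base, extra = divmod(space, num_miners)
    ranges = []
    start = 0
    for i in range(num_miners):
        # The first `extra` miners take one additional nonce so that the
        # whole space is covered even when it does not divide evenly.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for index, (lo, hi) in enumerate(nonce_ranges(25)):
        print(f"miner {index:2d}: 0x{lo:08x} .. 0x{hi - 1:08x}")
```

With a scheme like this the host only needs to send one work unit and collect any winning nonce, regardless of how many miners sit behind the interface. The same split could in principle be applied one level up if a board later carries more than one ASIC.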
Please don't try to save pennies on the copper for the traces. Make the traces as wide and as thick as practical. Counteract the parasitic capacitance of abnormally wide traces by using slow but parallel inter-chip communication.
Current thinking is to produce a schematic of the design and then give it to a professional PC board design shop, along with a copy of Altera's best practices for PC board layouts for their HardCopy devices. If we're going to spend $200K producing a custom ASIC, it seems like a bad idea to try to cut corners on the PC board. Fortunately, there will not be any particularly high-speed I/O on or off the chip. Probably the highest switching rate will be the input clock, which will be around 50 MHz.