Hi,
IMO, an ASIC implementation is the way to go. We already have decent RTL (those who contributed to this know who they are, and I thank you guys for it). With minor modifications to the current RTL, we could easily daisy-chain many "cores" (the easiest implementation given the current state of the project is a token ring over UART; we only need to assign a specific address to each core).
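For illustration, the addressing idea in the post above could be modeled roughly as follows. This is only a sketch: the `Core` class, the `(address, payload)` frame layout, and `send_around_ring` are hypothetical names made up here, not part of the existing RTL, and actual token passing and UART timing are not modeled. It only shows each core either consuming a frame carrying its own address or forwarding it to the next core in the chain.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of one hashing core on the daisy chain."""
    address: int
    inbox: list = field(default_factory=list)

    def on_frame(self, frame):
        """Consume a frame addressed to this core; forward anything else."""
        dest, payload = frame
        if dest == self.address:
            self.inbox.append(payload)
            return None   # consumed, nothing goes out on TX
        return frame      # not ours: pass it on to the next core's RX


def send_around_ring(cores, frame):
    """Push one frame through the chain.

    Returns the number of hops until a core claimed the frame, or None
    if no core did (the frame would arrive back at the host).
    """
    for hops, core in enumerate(cores, start=1):
        frame = core.on_frame(frame)
        if frame is None:
            return hops
    return None


# 20 cores per board, as suggested in the post; addresses 0..19 are assumed.
cores = [Core(address=i) for i in range(20)]
hops = send_around_ring(cores, (3, b"work item"))
print("claimed after", hops, "hops; inbox:", cores[3].inbox)
```

One consequence visible even in this toy model: latency grows with the position of the core in the chain, since every frame passes through all cores in front of its destination.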
I fully agree that ASIC is the long-term way to go, but this UART token ring thing seems like rubbish to me. There are well-suited protocols for this, for example I²C.
There are two possibilities:
- Build a PCIe mining accelerator card, with some PCIe to I²C (or whatever) bridge, possibly on a CPLD.
- Slap an ARM SoC and an ethernet adapter on the board as well and make it run autonomously.
Let's say each manufactured chip would yield 100 MHash/s. We daisy-chain 20 per board (a board with 20 chips on it is not a big deal). That's 2 GHash/s right there. PCB design and manufacturing would be pretty straightforward. I volunteer for that.
Good to know, as I have never dealt with this area before. Could you provide an estimate for the non-ASIC cost? (PCB design, prototyping, manufacturing and assembly, voltage regulators, clock generation, ...)
The big question: how do we finance an ASIC project? And even more importantly: how do we get it done?
1)
Outsource FPGA2ASIC flow to
http://www.icnexus.com.tw/product.php?id=25 (first company I found...there's gotta be many others). Get chips ASAP and limit the risks. With this forum, I'm sure we could get a small EE team together and do all the Synopsys, BIST, test scan, pads design, routing, etc. crap ourselves, but there are specialists out there that will do it for us...and chances of success will be much higher with that approach. Being a 100% digital chip (+ regulator and PLL obviously), the project couldn't be easier for these guys (or whatever company would get the contract)...not to mention they are already in the business of FPGA2ASIC conversion.
I've heard rumors that Altera would be doing their HardCopy process for as low as $150K for 1000 chips, which seems very low to me. No idea whether that's true though. We might want to request a quote.
Let's say each manufactured chip would yield 100 MHash/s.
I am pretty sure they can do much more. If a single mid-range FPGA can house an entire pipeline and get 50 MH/s, any ASIC must be able to outperform that by at least a factor of ten.
I'd expect the chips to run at 200-300 MHz, and one of my co-workers said that he tried synthesizing my VHDL design for the HardCopy process, and that 20 of those would fit on a single chip. That's 4-6 GH/s per chip.
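The throughput figures quoted in this exchange can be checked with a few lines of arithmetic. This sketch assumes that one fully unrolled pipeline completes one hash per clock cycle, which is what the 4-6 GH/s figure implies; the function names are made up for illustration.

```python
def chip_hashrate(clock_hz, pipelines=1):
    """Hashes per second, assuming one hash per clock per unrolled pipeline."""
    return clock_hz * pipelines


def board_hashrate(per_chip_hashrate, chips):
    """Aggregate rate of several identical chips on one board."""
    return per_chip_hashrate * chips


# Original proposal: 20 chips per board at 100 MHash/s each.
print("board:", board_hashrate(100e6, 20) / 1e9, "GH/s")

# HardCopy estimate: 20 pipelines per chip at 200-300 MHz.
low = chip_hashrate(200e6, pipelines=20)
high = chip_hashrate(300e6, pipelines=20)
print("chip:", low / 1e9, "to", high / 1e9, "GH/s")
```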
I am also not sure that hundreds of people would commit the necessary amount. Buying a video card is a much lower risk, as it can be sold anytime and has other uses.
I fully agree on this point, this will probably be the biggest problem, and it sadly wouldn't be an issue for certain governments...
btw, does anyone know why the "Will fund ASIC board for mining community. Need Hardware devs." topic has been closed?
Link to that:
http://forum.bitcoin.org/index.php?topic=14910.0
Just coded a fully unrolled SHA256 in VHDL using two different approaches to maximize clock rate: a simple approach that involves precalculating H + K + W, and a more advanced approach that further pipelines each stage. Initial compiles targeted Cyclone IV using the Quartus web edition (which sucks), with the simple version achieving 110 MHz and the advanced version 133 MHz. Will be interested to see the maximum clock rate that can be achieved on Stratix IV.
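To make the "precalculating H + K + W" idea concrete, here is a software model in Python. It is a sketch of one plausible reading of the technique, not the poster's VHDL. In SHA-256 the working variable h of round t+1 is simply g of round t, so the sum h + K + W for the next round can be prepared one round early, which takes two additions out of the critical path of each stage. The model covers a single 64-byte block only (Bitcoin's double hashing of an 80-byte header is not shown), and all function names are made up for illustration. The round constants are derived from the cube and square roots of the first primes, as in the standard, rather than typed in.

```python
import hashlib
import math

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes


def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots / square roots of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def schedule(block):
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


def compress(block, precompute):
    """Hash one padded 64-byte block, with or without the H+K+W trick."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = H0
    hkw = (h + K[0] + w[0]) & MASK  # prepared before round 0
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        if precompute:
            t1 = (hkw + s1 + ch) & MASK
            if t < 63:
                # Current g becomes h in the next round, so its sum with
                # the next K and W can be formed one round ahead.
                hkw = (g + K[t + 1] + w[t + 1]) & MASK
        else:
            t1 = (h + s1 + ch + K[t] + w[t]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    out = [(x + y) & MASK for x, y in zip(H0, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in out)


def pad_one_block(msg):
    """SHA-256 padding for messages short enough to fit a single block."""
    assert len(msg) < 56
    return msg + b"\x80" + b"\x00" * (55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")


block = pad_one_block(b"abc")
print(compress(block, precompute=True) == hashlib.sha256(b"abc").digest())
```

Both code paths produce the same digest; the difference only matters in hardware, where the precomputed sum shortens the longest adder chain per stage.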
How big was the 133MHz design? (How many KLEs?)
Could you share this design?
I now have a bitfile for the Atlys board (Spartan-6 LX45) with depth:=2 at 50 MHz.
The only problem is that miner.py refuses to communicate over the serial port.
It detects the core, but when it starts "Measuring FPGA performance..." it produces a timeout: "Timed out waiting for FPGA to accept work"
@TheSeven: any idea how to debug or solve the problem? is the miner.py code working for all depths and frequencies?
You'll need to adjust the pin locations for clk_in, rx and tx in the UCF file, and adjust the clock divider for the serial port for the 50MHz frequency.
Replace "10000010001" with "0110110010" and "11000011001" with "01010001011" in uart.vhd.
And I should probably publish the new version of my miner, it now supports multiple pools, long polling, etc.
TheSeven, can you give a few lines on how to calculate the dividers? Any formula?
the first one is (clock frequency / 115200), the second one is ((clock frequency / 115200) * 1.5)
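As a worked example of that formula, here is a small Python helper that computes both dividers and prints them as bit strings. Two things are assumptions on my part: truncating integer division (it reproduces the 50 MHz values posted above) and an 11-bit constant width (taken from the length of the original strings). The function names are made up.

```python
BAUD = 115200


def uart_dividers(clock_hz, baud=BAUD):
    """Return (clocks per bit, 1.5 x clocks per bit), both truncated."""
    per_bit = clock_hz // baud
    one_and_a_half = per_bit * 3 // 2
    return per_bit, one_and_a_half


def as_bits(value, width=11):
    """Format a divider as a fixed-width binary string for the VHDL source."""
    return format(value, "0{}b".format(width))


per_bit, one_and_a_half = uart_dividers(50_000_000)
print(per_bit, as_bits(per_bit))
print(one_and_a_half, as_bits(one_and_a_half))

# The strings being replaced decode the same way:
print(int("10000010001", 2), int("11000011001", 2))
```

For 50 MHz this gives 434 and 651. Note that the first replacement string as posted, "0110110010", has only ten digits; it encodes the same value (434), but if the constant in uart.vhd is 11 bits wide it would presumably need a leading zero ("00110110010").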