Sorry to revive a dead post, but I've actually been working on this in Verilog and a few other things for several months at this point.
So we know that Equihash, Ethereum (Ethash), and NeoScrypt are what we call memory-hard.
In the specific case of Ethereum, we currently have a 2+ GB DAG file, or, as I like to call it, a library.
That library takes up 2 GB today, and it keeps growing (by about 8 MB every 30,000-block epoch), so it will eventually hit 3 GB, and so on.
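To put numbers on that growth, here is a sketch of the dataset-size rule from the Ethash spec as I understand it: 1 GiB at epoch 0, plus 2^23 bytes (8 MiB) per epoch, then trimmed down until the number of 128-byte pages is prime. The trial-division `is_prime` helper is my own simple stand-in, not the reference implementation's.

```python
EPOCH_LENGTH = 30000          # blocks per epoch
DATASET_BYTES_INIT = 2**30    # 1 GiB at epoch 0
DATASET_BYTES_GROWTH = 2**23  # +8 MiB per epoch
MIX_BYTES = 128               # size of one "page"


def is_prime(n: int) -> bool:
    """Simple trial division; fast enough for page counts in the tens of millions."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def dag_size(block_number: int) -> int:
    """DAG size in bytes for the epoch containing block_number."""
    epoch = block_number // EPOCH_LENGTH
    sz = DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch - MIX_BYTES
    # Step down two pages at a time until the page count is prime.
    while not is_prime(sz // MIX_BYTES):
        sz -= 2 * MIX_BYTES
    return sz


print(dag_size(0) / 2**30, "GiB at epoch 0")
print(dag_size(130 * EPOCH_LENGTH) / 2**30, "GiB at epoch 130")
```

At 8 MiB per epoch it takes 128 epochs to add a full GiB. Assuming blocks of roughly 15 seconds (block times vary), that is 128 × 30,000 × 15 s, or close to two years.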
So the library has to be accessible, and the algorithm works by fetching 128-byte "pages" out of that 2 GB library. It doesn't fetch them sequentially but in a pseudo-random order, 64 times in total, mixing each page in as it goes. The mixed result is then hashed and compared to the current target (the difficulty boundary) for the nonce being tried.
64 × 128 bytes = 8,192 bytes, or about 8.2 KB for the rest of us.
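The fetch-and-mix loop above can be sketched as a toy model. It only mimics the access pattern: 64 dependent, pseudo-random 128-byte reads with FNV mixing, followed by a target check. It is NOT real Ethash. Python's standard `hashlib.sha3_*` stands in for Ethash's Keccak, and `make_toy_dag` is a hypothetical tiny table rather than the real 2+ GB dataset.

```python
import hashlib
import struct

# Toy model only: standard SHA3 replaces Ethash's Keccak, and the DAG is tiny.
MIX_BYTES = 128                      # size of one "page"
ACCESSES = 64                        # pages fetched per hash
MIX_WORDS = MIX_BYTES // 4           # 32 x 32-bit words
FNV_PRIME = 0x01000193


def fnv(a, b):
    """Ethash-style FNV mix of two 32-bit words."""
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF


def make_toy_dag(num_pages):
    """Hypothetical stand-in for the DAG: num_pages pages of 32 words each."""
    pages = []
    for p in range(num_pages):
        blob = hashlib.sha3_512(p.to_bytes(8, "little")).digest() * 2  # 128 bytes
        pages.append(list(struct.unpack("<32I", blob)))
    return pages


def toy_hashimoto(header, nonce, dag):
    """Return (final 32-byte hash, bytes read from the DAG) for one nonce."""
    seed = hashlib.sha3_512(header + nonce.to_bytes(8, "little")).digest()
    seed_words = struct.unpack("<16I", seed)
    mix = list(seed_words) * 2       # widen the 64-byte seed to 128 bytes
    bytes_read = 0
    for i in range(ACCESSES):
        # The next page index depends on the current mix, so each read
        # depends on the previous one and the order looks random.
        page = fnv(i ^ seed_words[0], mix[i % MIX_WORDS]) % len(dag)
        bytes_read += MIX_BYTES
        mix = [fnv(m, d) for m, d in zip(mix, dag[page])]
    # Compress 32 words down to 8, then hash together with the seed.
    cmix = [fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
            for i in range(0, MIX_WORDS, 4)]
    final = hashlib.sha3_256(seed + struct.pack("<8I", *cmix)).digest()
    return final, bytes_read


def meets_target(final, target):
    """A share counts when the hash, read as a big-endian integer, is <= the target."""
    return int.from_bytes(final, "big") <= target


dag = make_toy_dag(4096)
h, n = toy_hashimoto(b"example header", 0, dag)
print(n)  # 8192 bytes read for one hash
```

The page index for each read depends on data from the previous read, so the memory traffic can't be laid out ahead of time or streamed sequentially. That is what makes the algorithm bandwidth-bound rather than compute-bound.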
So now the real question is how many times per second we can pull 8.2 KB, right? Each 8.2 KB pull is one "hash," so we are really discussing memory bandwidth.
Each 1 GB GDDR5 chip has approximately 24 GB/s of bandwidth, which works out to a theoretical maximum Ethereum hashrate of approximately 3 MH/s per chip (24 GB/s ÷ 8,192 bytes per hash ≈ 2.9 MH/s).
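The bandwidth ceiling is just bytes per second divided by bytes per hash. Here is a minimal sketch, assuming decimal units (24 GB/s = 24 × 10^9 bytes/s) and ignoring latency, bank conflicts, and refresh overhead:

```python
BYTES_PER_HASH = 64 * 128  # 64 accesses x 128-byte pages = 8,192 bytes


def bandwidth_bound_hashrate(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on hashes/s when memory bandwidth is the only limit."""
    return bandwidth_bytes_per_s / BYTES_PER_HASH


per_chip = bandwidth_bound_hashrate(24e9)  # assumed 24 GB/s per 1 GB chip
print(f"{per_chip / 1e6:.2f} MH/s per chip")  # prints 2.93 MH/s per chip
```

Real hardware will land below this number, so treat it as a ceiling, not a prediction.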
So I had this awesome idea of pairing a single memory controller and processor with 16 GB of GDDR5 RAM (which is actually fairly cheap at $6-7 per 1 GB chip), only to find out I have a LOT to learn about electrical engineering and programming.
Theoretically, you could build an FPGA card with 16 GB of memory on one RAM controller, with a single processor, and get about 50 MH/s from one card. However, the development cost for that is fairly high, and Ethereum has had the proof-of-stake transition (Casper) hanging over it for 12+ months now. If anyone had known a year ago that Casper would not go into effect in October, they would have built such a card. Hindsight is 20/20. People now expect Casper to roll out around November/December, and it's July 1, so no one will invest $20K+ to develop an FPGA/ASIC card for a coin that is moving to proof of stake.
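As a rough check on the "about 50 MH/s" figure, take the 24 GB/s per 1 GB chip number from above and assume all 16 chips are kept busy in parallel, with the bandwidth ceiling as the only limit:

```latex
16 \times \frac{24 \times 10^{9}\ \text{B/s}}{8192\ \text{B/hash}}
  = 16 \times 2.93\ \text{MH/s}
  \approx 46.9\ \text{MH/s per card}
```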
I personally consider that silly, since an FPGA card with 16 GB of RAM, drawing around 250 W and designed for memory-hard crypto algorithms, is well worth the development cost.
I had the brilliant idea of combining 10 such 16 GB cards, each with nothing but a power supply, a small RAM chip for the BIOS, 16 GB of RAM, and a memory controller, all feeding into one FPGA "gateway" device that would send out the work. The biggest problem I ran into there was a data-transfer latency/timing issue. Basically, in theory I would be submitting stale shares, because by the time I sent the signal, processed it, sent it back, received it, and passed it on, those 10+ nanoseconds would drag the effective hashrate below 1.5 MH/s per GB. So for 10 cards with 16 GB each, at a likely cost of $750 per card ($7,500 total), I could end up with a whopping 240 MH/s at 2,500 W, i.e. worse than a rig of RX 480s.
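Spelling out the gateway numbers: the effective rate uses the 1.5 MH/s per GB figure above. The bandwidth bound assumes the 24 GB/s per chip figure, which gives 24 × 10^9 / 8192 ≈ 2.93 MH/s per chip. The power and cost come from 10 cards at 250 W and $750 each.

```latex
\begin{aligned}
\text{effective}       &= 10 \times 16 \times 1.5\ \text{MH/s} = 240\ \text{MH/s} \\
\text{bandwidth bound} &= 10 \times 16 \times 2.93\ \text{MH/s} \approx 469\ \text{MH/s} \\
\text{efficiency}      &= \frac{240\ \text{MH/s}}{2500\ \text{W}} = 0.096\ \text{MH/s per W} \\
\text{capital cost}    &= \frac{\$7500}{240\ \text{MH/s}} \approx \$31.25\ \text{per MH/s}
\end{aligned}
```

In other words, the latency problem throws away roughly half of what the memory could deliver.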
Theoretically, I could make a card with 2 processors (similar to an R9 290 DUO), put 2 memory controllers on it, each with 16 GB of RAM, and end up with nearly 100 MH/s (96, really). However, it is beyond my own ability to finish the engineering and programming of such a card, and I do not have the LARGE amount of funds it would take to produce the first card. All subsequent cards would be fairly inexpensive ($300-$400 each to produce).
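The "expensive first card, cheap after that" point is just a one-time cost amortized over a production run. Here is a minimal sketch, assuming the $20K+ development figure mentioned earlier is the whole one-time cost and $350 (the midpoint of $300-$400) is the per-card cost. Both inputs are assumptions, not quotes.

```python
def cost_per_card(one_time_cost: float, unit_cost: float, cards: int) -> float:
    """Average cost per card once a one-time development cost is spread over a run."""
    if cards <= 0:
        raise ValueError("cards must be positive")
    return one_time_cost / cards + unit_cost


for n in (1, 10, 100, 1000):
    # assumed: $20,000 development, $350 per card
    print(n, round(cost_per_card(20_000, 350, n), 2))
```

With these assumptions, a single card effectively costs $20,350. At 100 cards the average drops to $550, and at 1,000 cards to $370. This is why the project only makes sense if the coin stays on proof of work long enough to sell a real production run.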
The other piece of the puzzle, of course, is making sure the hardware is OpenCL/CUDA device compatible, so that it interfaces with the current miners and I don't also have to write my own miner. All in all, if it were truly a profitable venture, you would see someone with money to blow doing just that.
So good luck, and if you ever want to borrow some of my research notes or in-development files, drop me a line.