This is a public service announcement for anyone who feels inclined to start sending BTC or LTC to nearly anyone who drops the words "Litecoin" and "FPGA" in the same post, even when it's apparent to everyone with in-depth knowledge of the subject that the OP likely doesn't know what he's talking about.
scrypt differs mostly because it uses an entirely new list so frequently.
I think the big problem is that you can't unroll salsa mixing because of its recursive form. Thus you can't parallelize calculations as you can with sha256. The only thing you can do is to have multiple instances of your 'cores' run in parallel. But I don't think a Stratix has enough on-die RAM (52 Mbit max) to overwhelm a pool as you said.
The OP's claim is way worse than that. If you look at what he posted over in the 'scrypt is "memory intensive" therefore no ASICs, but how?' thread, he elaborates a bit more on how he thinks scrypt works, and on what he actually means when he talks about setting up and tearing down an entirely new list:
Scrypt is resistant because it is memory hard.
The amount of memory required is controlled specifically by a pseudo-randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.
The OP failed the basic scrypt knowledge test, I'm afraid. I saw a flawed explanation posted somewhere that looked like the OP's description, but can't remember where I saw it. This is far enough "out there" that I bet the OP had to have read it in the same place.
You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128 kB buffer plus a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash. And the memory access pattern will be exactly the same every time you calculate the hash. In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read from (or write to) DDR3 would fetch (or store) the data for each core precisely when that core was going to use (or generate) it. This was possible because the memory access pattern and the amount of memory needed are exactly the same every time.
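For anyone who wants to see where the fixed 128 kB comes from, here is a rough Python sketch of the two-loop shape of scrypt's ROMix step with Litecoin's N=1024, r=1. To keep it short, the mixing function is a placeholder built on SHAKE-256, not the real Salsa20/8 BlockMix, so the output is not a valid scrypt hash; the function names are mine, purely for illustration. The only point is that the buffer size and the number of writes and reads are set by N and r, not by anything computed from the input.

```python
import hashlib

N = 1024               # scrypt cost parameter used by Litecoin
R = 1                  # scrypt block size parameter used by Litecoin
BLOCK_BYTES = 128 * R  # one ROMix block


def scratchpad_bytes(n: int = N, r: int = R) -> int:
    """Size of the ROMix buffer: n blocks of 128*r bytes each."""
    return 128 * r * n


def stand_in_mix(block: bytes) -> bytes:
    # Placeholder, NOT Salsa20/8 BlockMix: any fixed-size mixing
    # function is enough to show the shape of the two loops.
    return hashlib.shake_256(block).digest(len(block))


def romix_shape(seed: bytes, n: int = N):
    """Walk the two ROMix loops and count accesses to the buffer."""
    x = hashlib.shake_256(seed).digest(BLOCK_BYTES)
    v = [b""] * n          # n slots, the same for every input
    writes = reads = 0
    for i in range(n):     # loop 1: n sequential writes
        v[i] = x
        writes += 1
        x = stand_in_mix(x)
    for _ in range(n):     # loop 2: n reads from already-filled slots
        # Slot index taken from the start of the last 64-byte sub-block,
        # mirroring scrypt's Integerify step.
        j = int.from_bytes(x[-64:-60], "little") % n
        reads += 1
        x = stand_in_mix(bytes(a ^ b for a, b in zip(x, v[j])))
    return x, writes, reads


if __name__ == "__main__":
    print(scratchpad_bytes())             # 131072 bytes = 128 KiB
    print(romix_shape(b"header A")[1:])   # (1024, 1024)
    print(romix_shape(b"header B")[1:])   # (1024, 1024)
```

Different inputs give different outputs, but there is no per-hash list to build or tear down: the buffer is allocated once at 131072 bytes, and every hash does 1024 writes followed by 1024 reads.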
The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.
I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?
The OP doesn't have a shortcut at all. Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a terrible design for an FPGA implementation.
To conclude, I don't understand why you need funding for your idea, because you can test everything with simulation. Altera provides a free web edition of their dev tools which doesn't let you target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die RAM. Then you'll have tangible results to get funds for a dev board, which is really expensive.
+1
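As a quick sanity check on the on-die RAM figures quoted in this thread (12 Mb for the Cyclone V, 52 Mbit max for Stratix), here is a back-of-the-envelope sketch. It assumes 1 Mbit = 1,000,000 bits, that every scrypt core keeps its full 128 kB scratchpad on-die with no time-memory trade-off, and that nothing else on the chip needs block RAM; the function name is just for illustration.

```python
# Bits needed for one full Litecoin scrypt scratchpad (N=1024, r=1).
SCRATCHPAD_BITS = 128 * 1 * 1024 * 8  # 1,048,576 bits


def cores_that_fit(on_die_mbit: float, bits_per_core: int = SCRATCHPAD_BITS) -> int:
    """Whole scratchpads that fit in the given on-die RAM.

    Assumes 1 Mbit = 1,000,000 bits and that all on-die RAM is
    available for scratchpads, which is optimistic.
    """
    return int(on_die_mbit * 1_000_000) // bits_per_core


if __name__ == "__main__":
    # RAM figures are the ones quoted in the thread, not datasheet values.
    for name, mbit in (("Cyclone V", 12), ("Stratix", 52)):
        print(f"{name}: {cores_that_fit(mbit)} full-scratchpad cores")
```

Under those assumptions that works out to 11 cores on the Cyclone V figure and 49 on the Stratix figure, which is plenty to validate a design in simulation before asking anyone for money.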
In fact, if we look at his post in the other thread, he claims he already implemented it, and destroyed the FPGA on his dev board while it "sounded like a jet landing the whole time":
For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.
The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time). It quickly became a paperweight.
A $10,000 paperweight.
Does not compute, for anyone with technical knowledge of the subject. In the highly unlikely case that this did actually occur, the OP would already have the dev tools and would have no need to replace the whole dev board. Since the dev board costs much more than the FPGA IC (as he states earlier in this thread), it would be more cost-effective to desolder the FPGA IC, clean up the BGA pads, and reflow a new FPGA onto the board.
Also ASICs can be built from some FPGAs and those ASICs can be still faster.
Altera's Hardcopy program is really just a mask-programmed FPGA that Altera has pre-qualified for your particular netlist to run at a slightly higher speed. I wouldn't call it a true ASIC: it doesn't achieve anywhere near the speed-up you'd normally get going from an FPGA to a real ASIC implementation built from your original Verilog source, and it only achieves a cost reduction of a few percent over Altera's equivalent FPGAs. The only reason it costs less than the equivalent FPGA is that Altera doesn't have to qualify and test the FPGA for every possible design someone could load on it; they only have to test and qualify it for your specific netlist that was mask-programmed on the die. Best not to point at Hardcopy as a valid route to an ASIC implementation for LTC.
Hopefully this gives people a better idea of the odds that the OP is trying to scam people. And I see people in this thread have already been sending LTC and BTC to him! Wow..
OP: Take some time to learn how scrypt works. Read Percival's original Tarsnap scrypt whitepaper. Check out the source code for a few scrypt implementations. That way you can have the correct details on the next BS / scam attempt. Suggesting that scrypt's memory requirements are dynamic and determined by an expensively computed list calculated on each iteration was your biggest mistake here.