Oh boy.. Here we go again (and again, and again, ad nauseam, on a daily basis). Can't we go one single day without someone armchair-engineering a hypothetical FPGA or ASIC-based scrypt miner from a back-of-napkin calculation and immediately posting performance numbers and pricing before they've even tried to write an actual Verilog implementation of scrypt+salsa?
I did some back-of-the-envelope calculations myself, and while I think that scrypt can be done in an FPGA at a 2:1 cost advantage over GPUs, I'm not sure what the power will be, so I'd have to do a prototype on a development board. Not sure I want to spend all that time without knowing that a lot of people would buy it.
I've never done FPGAs before.
The fastest way to spot someone who doesn't know what they're talking about is when they claim they're going to implement an FPGA scrypt miner with a better cost/performance ratio than GPUs, using any commercially available FPGA currently on the market (or in the process of coming onto the market). At least you did admit that you've never done any FPGA work before, though.
Propagation times and clock fanout skew on FPGAs are far worse than ASIC clock fanout and signal routing, if that's your background. If you're coming from ASICs, you will get nowhere near the clock fanout and signal propagation performance you may be expecting. Pretend that you're working on an ASIC design and throwing in tens to hundreds of extra buffers or muxes (representing the switching fabric of the FPGA) on every one of your signals between logic elements, and you'll get the general idea of what you'll be dealing with performance-wise when getting into FPGA development.
A back-of-envelope calculation is not a valid basis for a performance claim or estimated pricing. Your result is overoptimistic by almost an order of magnitude for any commercially available FPGA, if you're expecting a 2:1 cost/performance edge over, say, a Radeon 6xxx or 7xxx GPU.
There are only so many (known) ways to calculate scrypt+salsa(1024,1,1) between the two ends of the TMTO spectrum. There isn't going to be something totally revolutionary unless someone devises a cryptographic attack against scrypt that significantly shortcuts the effort needed to calculate an scrypt hash. And at that point, the same attack would be equally applicable to speeding up GPU scrypt implementations.
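To make the TMTO point concrete, here is a minimal Python sketch of the scrypt-style ROMix loop with N=1024. It is a toy under stated assumptions, not real scrypt: `mix` uses SHA-256 as a stand-in for the Salsa20/8 block mix, and the names `romix`, `mix`, `integerify` and the `stride` parameter are made up for illustration. With `stride=1` the whole lookup table is stored (one end of the spectrum); a larger `stride` keeps only every stride-th entry and recomputes the rest on demand, trading memory for extra mixing work. The hash output is identical either way.

```python
import hashlib

def mix(block: bytes) -> bytes:
    # Stand-in for scrypt's Salsa20/8 block mix (assumption: SHA-256 keeps the sketch short).
    return hashlib.sha256(block).digest()

def integerify(block: bytes, n: int) -> int:
    # Derive a table index from the current block, as ROMix does.
    return int.from_bytes(block[:8], "little") % n

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def romix(seed: bytes, n: int = 1024, stride: int = 1):
    """Return (result, stored_entries, mix_calls).

    stride=1 stores all n entries; stride=k stores every k-th entry
    and recomputes the missing ones when they are looked up.
    """
    calls = 0
    table = {}
    x = hashlib.sha256(seed).digest()  # normalise the seed to one 32-byte block

    # Phase 1: walk the chain, keeping only every stride-th entry.
    for i in range(n):
        if i % stride == 0:
            table[i] = x
        x = mix(x)
        calls += 1

    # Phase 2: data-dependent lookups; rebuild skipped entries from the nearest stored one.
    for _ in range(n):
        j = integerify(x, n)
        base = j - (j % stride)
        v = table[base]
        for _ in range(j - base):
            v = mix(v)
            calls += 1
        x = mix(xor(x, v))
        calls += 1

    return x, len(table), calls

if __name__ == "__main__":
    for s in (1, 2, 8):
        digest, stored, calls = romix(b"example header", 1024, s)
        print(f"stride={s}: stored={stored} mix_calls={calls} hash={digest.hex()[:16]}")
```

Every design, whether GPU, FPGA or ASIC, has to pick a point on this curve: less on-chip memory per hashing core means more recomputation per hash, and the same choice is open to a GPU kernel.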