LTC uses the parameters (2^10, 1, 1) which results in a token 128KB max scratchpad size. That isn't a typo it is kilobytes.
You are just forgetting to
multiply this scratchpad size by the number of "cores", "threads" or some other entities (the way how you call them depends on the underlying technology) in the miner device. All these "cores" are simultaneously doing hashes calculations, each with its own scratchpad. The reason why FPGAs and ASICs work so great for SHA-256 is that the number of gates needed for a single SHA-256 "core" is really small, so one can fit an enormous amount of such cores on a single chip. But each scrypt "core" needs a scratchpad for storing intermediate data, and if the scratchpad is implemented as a SRAM memory, then the number of gates per scrypt "core" just skyrockets. You can fit significantly less scrypt "cores" on a single chip than SHA-256 "cores". There are some tricks for the scratchpad size reduction (LOOKUP_GAP is the right keyword, you can search for it in the forum), which reduce the size of the scratchpad, but this reduction is not free and results in more computations. That's why you can see some people mentioning
Space–time tradeoff. The optimal lookup-gap setup depends on the balance between the memory size/performance and the computational power for doing arithmetic operations. It is also possible to use the external memory instead of on-chip SRAM, but the external memory must naturally have wide buses and a lot of bandwidth (memory latency is not critical for scrypt though). The scrypt GPU miners are relying on the GDDR5 speed, with a popular scratchpad size configuration being 64KB (lookup-gap=2), which indicates that the memory speed is the bottleneck and the excessive computational power is already traded off in order to reduce the burden on the memory.
I also suggest checking
https://github.com/ckolivas/cgminer/blob/master/SCRYPT-README to find a lot of information, which is intended to be user-comprehensible:
"--lookup-gap
This tunes a compromise between ram usage and performance. Performance peaks at a gap of 2, but increasing the gap can save you some GPU ram, but almost always at the cost of significant loss of hashrate. Setting lookup gap overrides the default of 2, but cgminer will use the --shaders value to choose a thread-concurrency if you haven't chosen one.
SUMMARY: Don't touch this"The default Scrypt parameters (2^14, 8, 1) result in a 16MB max scratchpad size roughly 128x as "memory hard".
The LTC scrypt parameters are sufficient for making sure that GPUs are required to have a lot of high bandwidth memory for decent hashing speed. It's all that matters. Increasing the size of the scratchpad is not going to bring any improvements (if by improvements you mean making CPU mining more competitive). Actually some scrypt based cryptocurrencies
tried to make it more "memory hard" and
failed to really fend off the GPUs. Also the 128x claim is just silly because you are forgetting that bigger scratchpads also inevitably mean more arithmetic operations involved in a single hash calculation. As I mentioned earlier, it is the balance between the memory speed and the arithmetic calculations speed that is important. And the LTC scrypt somehow managed to get it right, even if this actually happened unintentionally.
Regarding the FGPA device on the picture at the start of this topic. Looks like it is going to have external memory bandwidth roughly similar to what is available for the triple channel DDR3 systems. This is still less than the memory bandwidth of a mid-range GDDR5 equipped GPU. I doubt that this FPGA device is capable of demonstrating any mind blowing hashing speed. Still if it manages to scale well with the lookup-gap increase, have low power consumption and/or low device cost, then it might be possibly competitive.
BTW, the appearance of competitive FPGA devices might make people more motivated to try better optimizing scrypt for AMD GPUs (squeeze every last bit of performance and/or reduce power consumption). Bring it on, this stuff may become fun again