The technical limit is artificial: it could be 1.1 MB or 0.8 MB or 1.111 MB, it doesn't matter. The point is that the smaller the block size the better, because Moore's law is pseudoscience, and all growth functions tend to be logarithmic, not exponential.
Moore's law wasn't intended to last forever. However, it deals primarily with miniaturization (how many transistors fit on a chip), not with direct performance improvement.
SIMD instructions are a fine example of how the same transistors can become much faster at executing calculations. What used to take 4 or 8 clock cycles can now be done in one. And if you widen the registers so that more bits fit, you could make it 16.
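To make the SIMD point concrete, here is a rough sketch in C that adds two integer arrays, once one element at a time and once four elements per instruction with SSE2. The function names `add_scalar` and `add_simd` are made up for illustration, and the SSE2 path only compiles where the compiler defines `__SSE2__`; elsewhere it falls back to the plain loop. It shows four additions per instruction, not a measured speedup, since actual cycle counts depend on the CPU and on memory bandwidth.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar baseline: one 32-bit addition per loop iteration. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version (hypothetical name): where SSE2 is available, each
 * _mm_add_epi32 adds four 32-bit integers held in one 128-bit register. */
void add_simd(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Leftover elements (or everything, if SSE2 is not available). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions should give identical results for any length `n`. With AVX2 the same idea works on 256-bit registers, so eight 32-bit additions per instruction instead of four.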
Unfortunately, we are ~20 years after MMX and 15 years after SSE & SSE2, and still most compilers are very shitty at taking advantage of the extended instruction sets, and too few people will write assembly to get faster code execution, let alone for very modern instruction sets.
There is a lot of performance in our chips right now that isn't even being used. I believe there'll come a time, perhaps as early as the early 2020s, when compilers will be AI-driven, meaning they will be able to think and act like humans (or better) when optimizing code, wherever code can be optimized and parallelized. They will be like... sentient interpreters. You just give them the code and they'll give you the best machine code for your hardware. And as they evolve, they won't even need code, just specifications of what to do, how they will be controlled, etc.
When this happens, the performance gains will dwarf any technological gains, because the technological gains have been neutralized for years by Wirth's law and bloated software, and that trend will reverse at full speed. At that moment we'll have made a leap in performance that may be equivalent to ~10 years of processor advances, and of other hardware advances as well. Compression is a key factor in increasing memory, storage and transmission capacity, but it typically goes unused because of the processing cost. If compression and decompression can be both high-ratio and fast, it becomes worth it.
Having said that about the future, we should remember that a lot of processing power is left on the table right now, whether in CPU instruction sets or even GPUs with their thousands of cores.