I think the level might happen sooner than you imagine. The 28nm chips announced by Hashfast and Cointerra are 400-500GH/s. The next steps are 22nm (like Ivy Bridge) and 14nm (like Intel's Broadwell, which Intel just announced this month as an upgrade to its CPU lines in 2014). We've been playing catch-up for all of 2013 as ASICs were introduced, but at some point (and I'm just guessing, as I have no industry knowledge, so if I'm totally off base, someone please correct me) I expect that these startups will start bumping into the plants that produce CPUs and smartphone SoCs, at which point shit is going to get real and the reasons for delay will sound more like "well, we had an order placed, but Samsung kicked us out of the queue".
It's been a meteoric ride so far, but there is no way we will see a 2000x per-chip increase to 1PH/s in the next year if we're already at 28nm.
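For anyone who wants to check where the 2000x figure comes from, here is a rough sketch of the arithmetic. It only uses the 400-500GH/s per-chip numbers quoted above and the 1PH/s target; the function name is just something made up for illustration.

```python
# Back-of-the-envelope check on the 2000x figure, using only the numbers
# quoted in the post (400-500 GH/s per chip, 1 PH/s target).
GH = 10**9    # hashes per second in one GH/s
PH = 10**15   # hashes per second in one PH/s

def chips_needed(target_hs, per_chip_hs):
    """Whole chips needed to reach target_hs (ceiling division)."""
    return -(-target_hs // per_chip_hs)

if __name__ == "__main__":
    for per_chip in (400 * GH, 500 * GH):
        factor = PH / per_chip
        print(f"{per_chip // GH} GH/s chip: {factor:.0f}x short of 1 PH/s, "
              f"or {chips_needed(PH, per_chip)} chips in parallel")
```

The same factor reads two ways: as the per-chip speedup a single chip would need, or as the number of today's chips you would have to run side by side.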
It doesn't have to be per chip. The prices will come down. At first, there are two major factors driving the consumer price of a new silicon device. The first, and the smaller one, is NRE (non-recurring engineering cost). The second is demand, and that can be (and currently is) the main driving factor. If you are a device vendor, and you can sell it at 1000x cost, you'd be a fool not to. Once the competition starts either outperforming or outpricing you, you need to reevaluate your pricing. So, if you're smart, you lower your prices in increments on the chip that has already been masked out, engineered and produced. The per-wafer cost after NRE isn't that much. They can squeeze a lot of life out of that by simply offering cheaper prices and/or more chips in the devices. Yeah, power becomes an issue after a while, but they buy time that way. This also gives them more time to optimise the next gen design, which will probably be more power efficient, less redundant, and have a higher hashrate. Given the humongous physical size of KnC's chip, for example, I would guess that they went for at least 10x redundancy in their design just to make sure they made their deadlines. It shouldn't be too hard for them, now that they can concentrate more on engineering precision and less on setting records, to optimize that design rather severely.
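To put the NRE point in rough numbers, here is a minimal sketch of how the one-time cost gets spread over volume. Every figure in it (the NRE total, the wafer cost, the good chips per wafer) is a made-up placeholder, not anything a vendor has published, and the function name is hypothetical too.

```python
# Sketch of NRE amortization. All dollar figures and yields below are
# invented placeholders for illustration, not real vendor numbers.

def cost_per_chip(nre, wafer_cost, good_chips_per_wafer, chips_sold):
    """One-time NRE spread over units sold, plus the recurring wafer cost."""
    amortized_nre = nre / chips_sold
    recurring = wafer_cost / good_chips_per_wafer
    return amortized_nre + recurring

if __name__ == "__main__":
    NRE = 5_000_000        # hypothetical one-time design and mask cost
    WAFER_COST = 5_000     # hypothetical cost per processed wafer
    GOOD_CHIPS = 100       # hypothetical working chips per wafer

    for volume in (10_000, 100_000, 1_000_000):
        unit = cost_per_chip(NRE, WAFER_COST, GOOD_CHIPS, volume)
        print(f"{volume:>9} chips sold: about ${unit:,.2f} per chip")
```

With placeholder inputs like these, the NRE share shrinks as volume grows while the per-wafer share stays flat, which is the room a vendor has to cut prices on a chip that is already in production.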
Also, as you mentioned, there are smaller processes. Some are online, some are just starting. In the "race for the bottom" we haven't yet hit the bottom of photolithography, though it's getting close. Plus, most if not all of the necessary conditions to develop nanomachines now exist, so it's quite likely that by the time photolithography hits its theoretical limits, there will be a better, cheaper alternative. The 'state of the art' is always a moving target in electronics.
If you had told me in 1981 that just 3 decades later I would be typing on an obsolete computer that is more powerful than the supercomputers of the early 80s, I would have laughed in your face and called you an idiot. I would have been wrong.
Sure, the current ASIC designs coming to market are at a very small feature size, but they are still first gen devices. They are not as optimal as they can be. Right now, getting to market is more important to the manufacturers than getting a truly optimal design. I think that you will see a lot of efficiency gains within the 28nm market long before it goes to a smaller process. The actual timeline is another matter. But I am standing by a year to two years IF bitcoin starts going seriously mainstream before the big boys care to touch it. I think companies like Cointerra and KnC think so too, and they are positioning themselves to be partners rather than roadblocks when that day comes.
I don't think BFL's people are smart enough for that, and I suspect they will fall into the "also-ran" category in the fairly short term.