Let's get back to reality. Since there is a 4x difficulty adjustment rule, does that mean an attack of this kind won't be able to reverse more than 4 blocks?
No, the idea is that they keep trying to mine blocks, each time specifying the block timestamps needed to get the maximum possible difficulty ramp: mining 2016 blocks between each 4x difficulty increase. The first blocks are easy, obviously, but eventually they become very unlikely to ever get a block. The key point is that the last couple of their lucky steps represent more apparent work than all of the history before them, if you assume exponential growth. What makes this unrealistic is that while the probability of success tends to one (assuming the attacker keeps a fixed ratio to the network's hashrate and the network grows exponentially), it is negligible over sane timeframes.
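To make the shape of that concrete, here is a rough Monte Carlo sketch (mine, not part of the discussion above; every parameter except the 2016-block retarget interval and the 4x cap is an illustrative assumption). It charges the attacker an exponentially distributed amount of work per block, lets its hashrate track an exponentially growing network, and asks how often the full ramp completes within a fixed horizon:

```python
# Sketch of the ramped-difficulty attack; all rates, fractions, and the
# horizon are illustrative assumptions, not protocol facts.
import math
import random

BLOCKS_PER_PERIOD = 2016       # Bitcoin retarget interval
RAMP = 4.0                     # maximum difficulty increase per retarget
BLOCKS_PER_YEAR = 144 * 365    # network pace at the starting difficulty
ATTACKER_FRACTION = 0.1        # attacker/network hashrate ratio (assumed)
NETWORK_GROWTH = 1.0           # network hashrate e-folds per year (assumed)
HORIZON_YEARS = 9.0            # the "sane timeframe" being tested (assumed)
PERIODS = 8                    # ramp steps the attacker tries to complete
TRIALS = 2000

ATTACKER_RATE = ATTACKER_FRACTION * BLOCKS_PER_YEAR  # initial blocks/year

def trial() -> bool:
    """One run: does the attacker finish every ramp period in time?"""
    work = 0.0                 # units: expected starting-difficulty blocks
    difficulty = 1.0
    for _ in range(PERIODS):
        # Each block's work is exponential around its target, so a full
        # 2016-block period is gamma-distributed; this is the randomness
        # the attacker hopes to get lucky on.
        work += random.gammavariate(BLOCKS_PER_PERIOD, difficulty)
        difficulty *= RAMP
    # The attacker's hashrate tracks the growing network, so work done by
    # time t is ATTACKER_RATE * (e^(g*t) - 1) / g; invert for the time.
    g = NETWORK_GROWTH
    t = math.log(g * work / ATTACKER_RATE + 1.0) / g
    return t <= HORIZON_YEARS

wins = sum(trial() for _ in range(TRIALS))
print(f"P(complete {PERIODS} ramp periods within {HORIZON_YEARS} years) "
      f"~= {wins / TRIALS:.3f}")
```

Under these toy numbers the success probability comes out small but nonzero; stretch the horizon or the growth assumption and it heads toward one, which is exactly the "tends to one eventually, negligible over sane timeframes" tension.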
When talking about compact SPV proofs for total chain work, the same kind of variance problem arises: while the _expected_ work to produce a compact SPV proof is the same as the expected work for the long sequence of normal-difficulty blocks it replaces, the reduced sample count means the probability of 'getting lucky', i.e. mining a compact proof with less work than regular blocks of the same total difficulty would have taken, is much higher.
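A small simulation makes the sample-count point vivid. One block at difficulty N*d has the same expected work as N blocks at difficulty d, but a much higher chance of being found with, say, half that work; N and the 'lucky' threshold below are arbitrary illustrative choices:

```python
# Variance of one high-difficulty block vs. many regular ones.
import random

N = 144            # regular-difficulty blocks replaced by one skip (assumed)
THRESHOLD = 0.5    # "lucky" means finishing with half the expected work
TRIALS = 100_000

# Work to find one block is exponential around its target, so N blocks at
# difficulty 1 is gamma-distributed, while one block at difficulty N is a
# single exponential with the same mean N.
lucky_chain = sum(random.gammavariate(N, 1.0) < THRESHOLD * N
                  for _ in range(TRIALS))
lucky_skip = sum(random.expovariate(1.0 / N) < THRESHOLD * N
                 for _ in range(TRIALS))

print(f"P(N regular blocks done in half the expected work) ~= "
      f"{lucky_chain / TRIALS:.5f}")
print(f"P(one N-difficulty block done in half the expected work) ~= "
      f"{lucky_skip / TRIALS:.5f}")
```

The chain side essentially never gets that lucky (the gamma distribution concentrates as N grows), while the single skip block does so roughly 39% of the time (1 - e^-0.5).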
It occurred to me that by requiring additional edges in the proof you can ask about the work at alternative confidence intervals (bounded by the base difficulty), and not just the expected work, e.g. "I am 50% confident that this chain had at least work X". In effect you can trade off proof size for decreased variance. But for most of the interesting applications of compact SPV proofs, variance 'attacks' were really only interesting right at the tip of the chain (e.g. making a lucky bogus block that skips 6 blocks back to where the transaction in question was committed), and so it would be easy enough to just demand that the last couple of links in the proof be regular-difficulty blocks.
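As a sketch of what that confidence-interval reading could look like (my simplification, not a worked-out construction): treat the work behind k committed blocks at difficulty d as gamma-distributed, and report the (1 - c) quantile as the work you are c-confident the chain performed:

```python
# Work-at-confidence readout under a simple gamma model (an assumption).
import random

def work_at_confidence(k: int, d: float, c: float,
                       trials: int = 200_000) -> float:
    """Monte Carlo estimate of X such that P(actual work >= X) = c,
    modeling the work behind k blocks at difficulty d as Gamma(k, d)."""
    samples = sorted(random.gammavariate(k, d) for _ in range(trials))
    return samples[int((1.0 - c) * trials)]

for k in (1, 6, 50):
    x50 = work_at_confidence(k, 1.0, 0.50)
    x90 = work_at_confidence(k, 1.0, 0.90)
    print(f"k={k:3d}: 50% bound ~= {x50:6.2f}, 90% bound ~= {x90:6.2f}, "
          f"expected work = {k}")
```

More edges (larger k) pull the high-confidence bound up toward the expected work k*d, which is the proof-size-for-variance trade described above.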
I suspect there may actually be an improvement possible to this variance game by having miners commit to lower-difficulty solutions and reveal them according to some cut-and-choose protocol, allowing two chains to be tested against each other not just in terms of expected work but in terms of, e.g., x-percentile work. It would probably take a lot of work to figure out the details here, and maybe under the exponential-growth-forever assumption it still doesn't work out, unless you toss scalability out the window by exponentially increasing the data disclosed along with the exponential increases in hashrate.
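For flavor, here is a very rough guess at what such a scheme might look like; none of this is a worked-out protocol, and the share target, challenge count, and flat hash commitment are all placeholders (a real design would commit with a Merkle tree and would have to handle withholding and grinding):

```python
# Toy cut-and-choose over low-difficulty "shares" (hypothetical design).
import hashlib
import random

SHARE_TARGET = 1 << 248        # easy target: ~1 share per 256 hashes (assumed)
CHALLENGES = 16                # positions the verifier asks to open (assumed)

def mine(header: bytes, attempts: int) -> list[int]:
    """Collect every nonce whose hash clears the easy share target."""
    shares = []
    for nonce in range(attempts):
        h = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < SHARE_TARGET:
            shares.append(nonce)
    return shares

# Miner: mine, then commit to the full share list (a flat hash stands in
# for the Merkle tree a real scheme would use).
header = b"example-header"
shares = mine(header, attempts=200_000)
commitment = hashlib.sha256(repr(shares).encode()).hexdigest()

# Verifier: challenge random positions, check each opened share really
# clears the target, then estimate total work from the share count.
for i in random.sample(range(len(shares)), min(CHALLENGES, len(shares))):
    h = hashlib.sha256(header + shares[i].to_bytes(8, "big")).digest()
    assert int.from_bytes(h, "big") < SHARE_TARGET
hashes_per_share = (1 << 256) // SHARE_TARGET   # 256 under this target
print(f"{len(shares)} shares committed as {commitment[:16]}..., "
      f"implying ~{len(shares) * hashes_per_share} hashes of work")
```

Because the share count is a much larger sample than the block count, its percentiles sit far closer to the true work performed, which is what would let two chains be compared at an x-percentile rather than only in expectation.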