My response from there:
My first impression is that your points are based on a misapprehension of how consensus on blocksize is intended to be reached in BU. I think you have seen N (the oversized-block acceptance depth) as the mechanism for reaching consensus,* when it's really more of a failsafe for miners who don't keep abreast of the situation on the network as well as they should if they want maximum profits.
Miners who are paying attention should have their cap set substantially above the normal maximum blocksize as seen on the network; I see N as a bit of a "gimme" to small block advocates who want to make some sacrifices to discourage large blocks, as N allows them to be a little bit riskier about it (they can safely set their max acceptance size M a bit lower) while still tracking consensus in most situations. However, I think you may have misinterpreted the point of this: N wasn't really necessary for BU to work. The original idea was to have blocksize be unlimited, then user-selectable blocksize was added as a concession to small block advocates who worry about big block attacks, and N was largely a way to make life a little easier for those small block advocates (BU engineers please correct me if I'm wrong).
However, this upgrade system is not convergent at all, there is no reason why this battle between the 2MB chain and 1MB chain should be resolved quickly. This could occur over many blocks (greater than 4).
Such a long-running duel assumes something near a 50/50 split of hash power between the chains, or else they would resolve more quickly, correct? It seems the situation with Core is the same if Core were to try to upgrade to 2MB blocks when miner support was near a 50/50 split on the issue. I know, they won't do this, instead waiting for "overwhelming consensus" like 95% or something, thus avoiding the terrible 50/50 outcome.
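To put rough numbers on why a long duel needs something close to an even split, here is a minimal sketch using the gambler's-ruin catch-up formula from the Bitcoin whitepaper, (q/p)^z. The function name and the sample hash-power shares are illustrative assumptions, and the model ignores miners switching chains mid-race, so treat it as a back-of-the-envelope aid rather than a model of BU itself.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that a chain backed by hash-power fraction q ever
    catches up from z blocks behind the other chain.

    Simplified gambler's-ruin model: each new block lands on the
    trailing chain with probability q and on the leading chain with
    probability p = 1 - q, and nobody changes sides.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q
    if q >= p:
        # At 50% or more the trailing chain always catches up eventually,
        # which is the drawn-out duel scenario.
        return 1.0
    return (q / p) ** z


if __name__ == "__main__":
    # Hypothetical hash-power shares for the trailing chain.
    for share in (0.50, 0.45, 0.25):
        prob = catch_up_probability(share, 4)
        print(f"share {share:.2f}: P(catch up from 4 blocks behind) = {prob:.4f}")
```

In this simplified model the catch-up probability falls off quickly as the split moves away from 50/50, which is why a duel lasting many blocks implies a nearly even split.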
However, here I think you are seeing the illusion that the tail is wagging the dog. You apparently view Core's policy of waiting for 95% consensus as the means by which a 50/50 split is avoided, as if miners are all a bunch of robots. Since miners aren't robots but in fact people who are dead-set on maximizing their profits, they certainly would not mine or build on an over-limit block (blocksize > L) unless they believed that would be a profitable choice. That would only be a profitable choice if it were unlikely to result in their block being orphaned.
If a miner is rational, which is the governing assumption of Bitcoin in the first place, this profit/loss calculation will of course take into account the very same 95% that Core is looking for, except the miner is free from that market intervention** and can choose his own threshold percentage while also balancing it with whatever other factors he deems relevant to the profit/loss calculation in his own individual situation - with his own unique power costs and connectivity and the transaction fees in the mempool at the time - to dynamically determine the point at which he himself has an expected positive return on mining a bigger block.
The first miner to mine a bigger block can be assumed to have performed such a calculation as a profit-maximizing agent, and then others may follow suit by raising their limits. Why would miners be monitoring the network that closely? Perhaps they aren't now, but they will have to in the future if they want to stay profitable, because their competitors will. They will do so when blockspace becomes limited enough and fees become high enough that their expected return on mining a block with some extra juicy fees overshadows the orphan probability based on various factors, including the XX% miner signaling.
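The kind of calculation I have in mind can be sketched as a simple expected-value comparison. This is a minimal sketch under stated assumptions: every name and number below is hypothetical, orphan probabilities are taken as given inputs (a real miner would estimate them from signaling, connectivity and so on), and costs such as power are left out because they are the same whichever block is mined.

```python
def expected_payout(payout: float, p_orphan: float) -> float:
    """Expected payout of a found block that is orphaned with probability p_orphan."""
    return (1.0 - p_orphan) * payout


def should_mine_bigger(block_reward: float, base_fees: float, extra_fees: float,
                       p_orphan_normal: float, p_orphan_big: float) -> bool:
    """True if the bigger block has the higher expected payout."""
    normal = expected_payout(block_reward + base_fees, p_orphan_normal)
    bigger = expected_payout(block_reward + base_fees + extra_fees, p_orphan_big)
    return bigger > normal


def break_even_extra_fees(block_reward: float, base_fees: float,
                          p_orphan_normal: float, p_orphan_big: float) -> float:
    """Extra fees at which the bigger block exactly matches the normal one."""
    normal_payout = block_reward + base_fees
    ratio = (1.0 - p_orphan_normal) / (1.0 - p_orphan_big)
    return normal_payout * (ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical inputs: 12.5 BTC subsidy, 1 BTC of ordinary fees,
    # 1% orphan risk for a normal block, 3% for an over-limit block.
    threshold = break_even_extra_fees(12.5, 1.0, 0.01, 0.03)
    print(f"extra fees needed to justify the bigger block: {threshold:.4f} BTC")
    print(should_mine_bigger(12.5, 1.0, 0.5, 0.01, 0.03))
```

The point of the sketch is that the threshold differs per miner: change the orphan estimates or the fees on offer and the answer changes, with no fixed percentage needed.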
Centrally planning the switchover at 95% is economically suboptimal and unnecessary. The Core devs imagine that without their paternalistic setting of that 95% threshold, miners would be unable to make the calculation for themselves - even though they are privy to the very same information, and in real time at that! And again, Core devs cannot know the profit/loss calculations that apply to each miner's individual situation. This is akin to Soviet-style price fixing and falls afoul of the Economic Calculation problem as explained by Ludwig von Mises, or the problems with the use of knowledge in society as explained by F.A. Hayek: no central planner can know all the individual valuations and tradeoffs each person in the economy will want to make.
The one-size-fits-all approach destroys the whole idea of the division of labor, the mainspring of human progress for the past 6000 years. Core's one-size-fits-all decrees are NOT where consensus comes from; consensus comes from miners not being idiots.*** The miners are the dog and the devs are the tail. The devs are just like governments that require seatbelts in cars right as the auto industry is putting in seatbelts on its own, and then imagine that they - the wise overseers - are the source of auto safety. The tail imagines it wags the dog.
Everyone can see the decree and the effect, but few can see that the same thing - or probably something far less clunky - would have happened without the decree. If a miner has a positive expected return on an oversize block, he will mine it even if it does end up orphaned. Other miners can see this and adjust accordingly, especially once they start getting outcompeted by accepting fewer fees than they could be. Miners stay in consensus because they are econo-rational, not because of developer nannyism.
I would therefore expect blocksize to creep up very conservatively, and barring miner agreement on a flag-day upgrade - which is always a possibility even with BU, except they can do it without messing around with the Core devs - this will only happen when there are enough fees to warrant the orphan risk, as well as a high percentage of miners signalling support for a bigger blocksize. It would probably move up in some kind of Schelling-point increments like 2MB.
* unless you think BU is good and your point is simply that BU shouldn't have the "excessive block depth." That may indeed be so, and I would have to think about your scenarios more.
** which is, after all, nothing more than the inconvenience of adjusting that 95% threshold in the Core code oneself or finding someone to do it for you.
*** Some might say that miners really are idiots, but this violates the basic assumption of Bitcoin, that miners are economically rational. Sure, some miners now might not have to be very smart about their operations, but in the future this will have to change. Right now being dumb in some ways as a miner doesn't make you economically irrational, but in the future where forking and emergent coordination must occur, it will. Miners that refuse to monitor the network and make prudent profit/loss calculations will be outcompeted by those who do.