Self-consistency and global convergence are mutually incompatible objectives. Large-scale failure of convergence is the worst possible failure mode for Bitcoin, worse than any amount of transaction reversal.
If participants prefer their own past state, then two participants with different pasts can fail to agree on the current state even when presented with identical information. Small amounts of self-consistency can be tolerated, but only at the direct expense of convergence (which is why Bitcoin is often not converged at all after just a single block, though the chance of convergence failure becomes infinitesimal after enough blocks relative to the communication delays).
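To make that concrete, here is a minimal toy sketch in Python, not Bitcoin's actual code. The assumptions are all mine: chains are tuples of block names, "most work" is approximated by chain length, and `pick_chain` and `max_reorg` are hypothetical names, with `max_reorg` standing in for any rule that refuses to discard more than that many of a node's own blocks.

```python
def common_prefix_len(a, b):
    """Number of leading blocks two chains share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_chain(current, known_chains, max_reorg=None):
    """Return the chain a node ends up on after seeing known_chains.

    max_reorg=None  -> plain "most work wins" (length stands in for work).
    max_reorg=k     -> hypothetical self-consistency rule: never discard
                       more than k blocks of the node's current chain.
    """
    best = current
    for chain in known_chains:
        if len(chain) <= len(best):
            continue  # not more work than what we already have
        # blocks of the node's own history that switching would throw away
        depth = len(current) - common_prefix_len(current, chain)
        if max_reorg is not None and depth > max_reorg:
            continue  # the node prefers its own past
        best = chain
    return best


GENESIS = ("G",)
fork_x = GENESIS + ("x1", "x2", "x3")
fork_y = GENESIS + ("y1", "y2", "y3", "y4")
everything = [fork_x, fork_y]  # identical information shown to both nodes


def final_states(max_reorg):
    node_a = pick_chain(fork_x, everything, max_reorg)       # past: fork x
    node_b = pick_chain(fork_y[:-1], everything, max_reorg)  # past: fork y
    return node_a, node_b
```

Under these assumptions `final_states(None)` puts both nodes on `fork_y`, while `final_states(2)` leaves the first node on `fork_x` and the second on `fork_y`: same information, different pasts, no agreement.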
The problem of non-convergence is made worse for a distributed system in a universe with a finite speed of light, since an attacker can pick which nodes have which past by selecting the 'locations' from which he announces messages, and so can choose the shape of the non-convergence to be most damaging.
To specifically address your particular suggestion: A byzantine attacker obtains a large amount of hashpower for long enough to mine a handful of blocks (say twelve total for your figures) faster than the network, and he maps out the locations of large mining power concentrations with some accuracy. Then he mines two forks off the current head, starting with blocks A and A'. He feeds these forks to the network in parallel, giving A to half the estimated hash power and A' to the other half. Congrats: Bitcoin is over, from one currency you have two. (Well, at least until people manually intervene, turning off your 'tweak' or forcing nodes onto the state they're rejecting, at which point the system reverses all the transactions on one side or the other, transactions belonging to and trusted by roughly half the participants.)
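A toy model of that attack, as a sketch under loudly stated assumptions rather than a simulation of the real network: two groups of miners with a fixed, deterministic block rate per round (`power`), forks tracked only by their height above the common head, and a hypothetical `finality_depth` rule under which a node refuses any reorg that would discard that many or more of its own blocks. None of these names come from Bitcoin itself.

```python
def simulate(finality_depth, fork_len=6, rounds=5, power=(11, 9)):
    """Toy split attack.

    finality_depth=None -> nodes always move to the fork with more work.
    finality_depth=k    -> hypothetical rule: refuse to discard k or more
                           of the node's own blocks.
    Returns (fork followed by each group, height of each fork).
    """
    # The attacker pre-mines fork_len blocks on each fork and hands
    # fork A to group 0 and fork A' to group 1.
    height = {"A": fork_len, "A'": fork_len}
    follows = ["A", "A'"]

    for _ in range(rounds):
        # Each group extends whichever fork it currently follows.
        for group, fork in enumerate(follows):
            height[fork] += power[group]

        # Then every group is shown both forks (identical information).
        for group, fork in enumerate(list(follows)):
            other = "A'" if fork == "A" else "A"
            more_work = height[other] > height[fork]
            reorg_depth = height[fork]  # own blocks above the common head
            allowed = finality_depth is None or reorg_depth < finality_depth
            if more_work and allowed:
                follows[group] = other

    return follows, height
```

In this model `simulate(None)` ends with both groups on fork A and 15 blocks orphaned on A', while `simulate(6)` ends with the groups still split and 61 and 51 blocks on the two forks, so whichever side is eventually forced over loses far more than the six blocks the attacker mined on it.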
The idea of 'solving' high hashpower attackers by imposing some kind of self-consistency constraint appears to be one of the perennial (bad-)proposals around here. They all damage global consistency, which is the more important objective; they all fail to prevent reversals with certainty (e.g. huge amounts would be reversed in my above attack example when consistency is finally recovered, including transactions with many, many more confirms than the 6 the attacker produced); and to the extent that they might do anything at all, they all present a new practical attack surface where none existed before. It's possible to propose forms which are so weak (e.g. only bar >=210000 block reorgs) that they do nothing at all and thus also don't create a practical problem, but if you propose one that would stop an actual attack, then that same attack effort could instead be put into creating an optimized convergence failure, using careful propagation to produce a maximum entropy partitioning.