BUT ...
Now does that matter which fork you mine on? No. Since all that matters is who finds the next block to decide the fork.
Bitcoin Core's rule is ludicrously simple, and it works.
If you are building on a block at height X, and another valid block appears at height greater than X, start building on the new block. Anything else, ignore it.
It doesn't matter which (valid) fork that new block is on, just switch to it.
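As a rough sketch of the rule exactly as stated above: the `Block` type and `choose_tip` function are made-up names for illustration, not Bitcoin Core code. Note the simplification: the real client compares cumulative chain work rather than raw height, although the two usually agree between competing tips at the same difficulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical, stripped-down block: just an id, a height and a validity flag."""
    block_id: str
    height: int
    valid: bool = True


def choose_tip(current: Block, incoming: Block) -> Block:
    """Return the block to build on after hearing about `incoming`.

    Switch only if the new block is valid and strictly higher than the
    block we are already building on. Anything else is ignored.
    """
    if incoming.valid and incoming.height > current.height:
        return incoming
    return current


# Building on height 100: a competing block at the same height is ignored,
# a valid block at height 101 wins no matter which fork it sits on.
tip = Block("ours", 100)
tip = choose_tip(tip, Block("rival-same-height", 100))
tip = choose_tip(tip, Block("rival-next", 101))
print(tip.block_id)
```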
So yes, you could choose to help some other pool confirm its 1s, 10s, or 100s late block if you want to, but you won't gain anything on the Bitcoin network by doing that ... without a cartel setup where they agree to do the same for you.
You affirm this because you are not considering the compound effect of the significant network "friction" the example assumes, as well as the new incentive the orphan race creates for M and m.
Let's go back to the example in my previous post: M as the large miner, m as the small miner, N as the neutral miner, and significant propagation time. What you are stating is that in the case of a race between M and m, it doesn't matter to N which block it mines off of, it only matters that both blocks extend the last known top. After all, you are only "wasting" hash rate if you mine on top of a block below the max height.
Or are you? Clearly there is no incentive to ignore a valid solution if the only thing you are going after is the coinbase reward + average fees. But what if there was an incentive? Say someone emits a ZC transaction A paying 1000 BTC to address A with a small fee and, as soon as that transaction is mined, emits a transaction B that spends the same TxOuts to address B, this time paying a 50 BTC fee. Wouldn't it be in the interest of every miner (except the one that mined the block containing A) to orphan that last block?
I'm not saying this is what's happening in the orphan race per se, or that it is good practice to accept a 1000 BTC payment after a single confirmation, but the point stands that there can be incentives to actively orphan a block. And in the case of an orphan race, it is a coinbase reward + fees hanging in the balance.
The proposition goes as follows: in the case of a race between M and m, M is expected to recruit over 51% of the network hash rate behind its block simply because M > m. Now we have 2 miner groups, where G mines off of M's block and g mines off of m's block. There are 2 outcomes:
1) G finds a solution first for height H+1, validating M's block at height H. At this point m has lost the race and its block reward, but this result is irrelevant to the other miners in g.
2) g finds a solution first for height H+1. However, M knows it can propagate its solutions faster than m, so there is a window in which M can actively try to orphan g's block and still propagate its solution for H+1 to 51% of the network faster than the g miner can. M has an incentive to do so, which is to save its reward of H.
What does that mean for N?
a) If N is part of G and finds a solution to block H+1 first, there is virtually no chance this solution will get orphaned. After all, N is guaranteed M will back this solution, so there is at least that much extra hash rate guaranteed, which reduces the propagation time of N's solution. If m can't compete against M's propagation time, there is no reason to believe it can compete against M + N.
b) If N is part of g and finds a solution to block H+1, it knows M will attempt to orphan this block for a short period of time, and may succeed. M will try to orphan N's block simply because a window exists in which it is more profitable for M than to start work right away on top of N's block.
So N has virtually no chance of getting orphaned if it is part of G, and a nonzero chance of getting actively orphaned if it's part of g. What motive does N have to be part of g? What motive does N have to not be part of G? If one solution has a quantifiable risk and the other doesn't, all of this for the same reward, why pick the risky one? Keep in mind that for N, there is no cost to switching from one solution to the other. N can start mining on top of m and switch to M later at no loss.
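To put N's choice in numbers, here is a minimal sketch. The function name and the 5% orphaning probability are invented for the example, and the 25 BTC figure is just the current block subsidy with fees left out. It only restates the argument above: same reward on both sides, extra risk on one side.

```python
def expected_reward(reward_btc: float, orphan_probability: float) -> float:
    """Expected payout of a found block, given the chance that it gets orphaned."""
    if not 0.0 <= orphan_probability <= 1.0:
        raise ValueError("orphan_probability must be between 0 and 1")
    return reward_btc * (1.0 - orphan_probability)


BLOCK_REWARD_BTC = 25.0   # subsidy only; fees ignored for simplicity
P_ORPHANED_IN_G = 0.0     # assumption: M backs N's block, so risk is taken as ~0
P_ORPHANED_IN_g = 0.05    # hypothetical: M succeeds in orphaning N 5% of the time

ev_big_side = expected_reward(BLOCK_REWARD_BTC, P_ORPHANED_IN_G)
ev_small_side = expected_reward(BLOCK_REWARD_BTC, P_ORPHANED_IN_g)

# Switching sides costs N nothing, so any positive gap favours joining G.
print(f"mining on M's block: {ev_big_side:.2f} BTC expected")
print(f"mining on m's block: {ev_small_side:.2f} BTC expected")
```

Whatever the real probability is, as long as it is above zero on m's side and close to zero on M's side, the comparison comes out the same way.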
Point being, this situation can take place without the need of a 51% cartel.
You could argue that M is a cartel. But it doesn't need to be 51% to present that active orphaning threat to smaller miners.
You could argue that over 51% of the network using the same relay network is a cartel, but then again their motivation to join the relay network is not to actively orphan smaller pools but to reduce their own orphan risk. There is no agreement that other members of the relay network will support your block in a race, but there is a strong indication that they will get your solution faster than they will get one from miners outside of the relay network.
You could argue that a mining pool is in fact a cartel in its own right, but this is only evidence that any barrier to entry will promote the formation of cartels (after all a solo miner is always guaranteed to reduce his propagation time by joining a pool, and gets other benefits on top of it).
"But I've never seen blocks emitted so close one another"
Far be it from me to doubt your figures, but the reality is that the 1MB block cap prevents propagation time from exploding upwards. The point isn't so much that concurrent block solutions are propagated within mere seconds of each other, but rather the average propagation time based on position in the network topology and hash rate. If 20MB blocks were to bump your propagation time to 30 sec, you would clearly be a victim of this mechanic.
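For a rough feel of what a given propagation delay costs, here is a back-of-the-envelope sketch. It assumes blocks are found as a Poisson process with a 600 sec mean interval, and that a block counts as "at risk" for the whole time it is still propagating. Both are simplifications, and `orphan_risk` and `competing_hash_share` are names made up for this example.

```python
import math

MEAN_BLOCK_INTERVAL_SEC = 600.0  # 10-minute target, assumed constant here


def orphan_risk(propagation_delay_sec: float, competing_hash_share: float = 1.0) -> float:
    """Chance that someone else finds a competing block before ours has propagated.

    competing_hash_share is the fraction of network hash rate that has not yet
    seen our block during the delay (1.0 is the pessimistic upper bound).
    """
    if propagation_delay_sec < 0:
        raise ValueError("propagation_delay_sec must be non-negative")
    if not 0.0 <= competing_hash_share <= 1.0:
        raise ValueError("competing_hash_share must be between 0 and 1")
    rate_per_sec = competing_hash_share / MEAN_BLOCK_INTERVAL_SEC
    return 1.0 - math.exp(-rate_per_sec * propagation_delay_sec)


for delay in (1, 10, 30):
    print(f"{delay:>2} sec delay: {orphan_risk(delay):.2%} chance of a competing block")
```

Under these assumptions a 30 sec delay puts a bit under 5% of your blocks into a race, versus well under 1% at a second or two.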
"But it would never reach 30 sec"
30 sec is a bloated value for the sake of this example, but is it really all that unrealistic? I think that currently, ~60% of the network hash rate is in China. They naturally propagate fast to each other, and slowly to the rest of the world. How big does the average block have to be, in the absence of fast relay networks, for Chinese miners to receive it in 30 sec on average? Do we really want to try and find out?
"But no miners submit themselves to this logic atm"
That's mostly because the low block size hard cap prevents network propagation from getting out of hand. If the hard cap goes away with the current network topology, eventually some miners will start exploiting this aggressive orphaning scheme and the rest of the network will have to go along. And there already is a group in a prime setup for this purpose: Chinese miners.
"But cartels!"
While I am trying to demonstrate that this particular scheme does not require a 51% miner/cartel, the scheme on its own is only enabled by long propagation times. This characteristic of the mining network creates imbalance and modifies the rules of the game in a way that promotes the formation of cartels, the same way the heavily subsidized cost of electricity in China is, in my opinion, the main factor behind the current state of the mining market (where Chinese miners dwarf everybody else).
This is not how Bitcoin propagation is supposed to work. If this is part of the XT proposal, it would destroy Bitcoin, by centralizing more than can be mitigated with a fee market anyway. Such a change would render nodes no longer peers, as larger pools would be superior. This would break the 51% rule.
This ties back to the original topic, and the question that birthed this entire argument:
The business intelligence value of knowing who made what transactions alone is easily enough to pay for operating the network. Especially when one has access to 'big data' processing facilities and a big network footprint already. Much higher value than knowing something about people's search terms or scanning their e-mail and cloud storage contents. I would expect to see users (to use a more kind descriptor) get 'cash back' when things are ramped up.
I have little doubt that Mike is keenly aware of this principle, and I suppose that he didn't really want to use it as a sales pitch so he cobbled together some fairly questionable 'mining assurance contract' scheme or whatever it was.
That's outright disturbing if this presentation stands true. Otherwise it's kind of incendiary. I don't want to judge before I see the actual proposal. Surely someone knows about it.