I think that because the problem is mainly about orphan risk, it needs to be addressed specifically via orphan risk.
consider
https://en.bitcoin.it/wiki/Proof_of_blockchain_fair_sharing as a basis for a solution.
The premise is that the longer a transaction has been in existence, the more important it becomes to the acceptance of the next block.
Although the nodes may not all be looking at the exact same transaction list, if the numbers seen are similar, they should be computing a very similar 'credibility score' for the next block. And if the 'credibility score' is too low, they ignore that block, mining on the previous block instead.
This creates an orphan risk for *failing* to include transactions (specifically the oldest outstanding transactions) that balances the orphan risk for *including* transactions (and using up block space to do it).
BDD = Number of bitcoin days destroyed
BD1 = Number of bitcoin days in fee-paid, nonconflicting tx 20 minutes or more old and not yet included in a block
BD2 = Number of bitcoin days in fee-paid, nonconflicting tx 40 minutes old and not yet included in a block
BD3 = Number of bitcoin days in fee-paid, nonconflicting tx 60 minutes old and not yet included in a block
BD4 = Number of bitcoin days in fee-paid, nonconflicting tx 80 minutes old and not yet included in a block
etc.
So let
Credibility = BDD/ (BDD+ BD1 + 2^(BD2 + 2^(BD3 + 2^(BD4 + ... ))))
And if we find a chain is too 'incredible' (less than 0.000001 or so) we just ignore it and mine on a more credible chain (even if that more credible chain is just the current 'incredible' chain minus the last block). Because miners are looking at different tx pools, or may have learned about the same tx at different times, they may calculate different 'credibility' thus that some accept a new block and some don't. But this shouldn't matter much because if 40% of the miners reject a block then that block gets a 40% orphan risk, which has a tangible cost to the miner producing it and gives the miners a strong motivation to avoid creating the kind of blocks for which that might be an issue.
This appropriately prioritizes fee-paid nonconflicting transactions that have been waiting for the longest time, and has the benefit of allowing the blockchain to resist a more-than-51% attack at least in terms of making sure that no one can keep a valid but unfavored tx *out* of the blockchain forever.
Does it cause a problem that initially rejected blocks can get brought into the chain via a reorg, when they become part of a chain that is, thanks to a later block that includes the tx it missed, more 'credible?'