Another coding optimization would be if you receive a second block solution, to immediately discard the first and push out the newer one if it has a significantly smaller hash, of course still keeping track of both as valid work from miners.
Chain proof-of-work is calculated based on the hash target, so if you get another block at the same height there is no benefit to keeping the one with "the smaller hash".
Maybe if you receive a second block solution, keeping the block that removes the most transactions from the memory pool would be the right "good for the entire ecosystem" policy.
Using "the smaller hash" has a benefit: It is unambiguous!
Orphans are the result of a network split (different portions of the network consider a different block as the best current block). If a client knows two blocks and has to decide, using the smaller hash will always lead to the same decision. This works towards the goal of reducing orphans (reducing the occurrance of differing decisions on distinct clients).
Your proposal of using the "most beneficious block" does not necessarily fulfill this goal. Since transactions themselves suffer similar network split problems, two distinct clients can make differing decisions about the same set of blocks. Thus leading to the same result: orphans. In fact, your solution may possibly produce more orphans than before.
In the past I have thought about the propagation problem as well, but refrained from posting a formal proposal so far. However, I came to a similar result. The single best solution is to use the "smaller hash" (as in "more work" or better: "more luck") which is universally testable for everyone who knows the same particular set of competing blocks.
Also, in my thoughts, it makes sense to consider a maximum expected propagation time. A block will typically take no more than N seconds to propagate to most clients. I made calculations, extrapolated from number of hops, current network size, but dont remember the exact outcome. Whatever the best value of N is does not matter here. What matters is that when a client first receives a block of new height, it should open a "propagation curtesy window" of N seconds. When a competing block is received within that window, then it can be reasonably considered potential victim of network skew and a resolution can be made by prefering the "lower hash" block.
If a competing block is received outside of the N seconds window, it can be disregarded(like now). It can't be victim of just network skew (unless N was poorly chosen).
The advantage is obvious. Mostly everyone will work on the same block, even in situations that currently produce an orphan. AND, the proposal is completely voluntary and 100% compatible. Nobody would be able to tell whether or not you have implemented it (unless he is monitoring your network traffic and sees in which order you have really received the blocks). This is the best proof for compatibility. There is no disadvantage to the current system. The more miners implement it, the lower the orphan rate goes.
And last not least, this approach doesn't open a security vulnerability either. It is more difficult to create a block with "lower hash" than to create a "just enough" block with better height. Therefore there is no benefit of trying to create "lower hash" blocks for already existing blocks. Since this method relies on nothing else but the block hash, is is "unhackable". Your proposal however does not have this property. It also depends on the transaction memory pool, which can easily be poisoned.
I have thought about it since about 3 months, and I am absolutely convinced that it is the universally best solution.