Would it really?
Imagine that only 10% of the network accepts blocks over 10 MB and 100% accepts blocks less than 1 MB. What if that 10% got lucky and generated two 11 MB blocks in a row? Well, the other 90% would just ignore them because they are too large. So, those blocks get orphaned because the rest of the network found three small blocks. If you just accepted the 11 MB blocks as a confirmation and sent goods because of it, you could be screwed if there was a double-spend.
I understand, I just think that's not such a serious issue to motivate a change in the 10 min interval too. Instead, if relaying orphans becomes a normal practice, nodes would be able to see whether there's another branch in which their transactions don't exist. If your transaction is currently in all branches being mined, you're certain that you got your confirmation.
So, to counter the problem you raise, I think that relaying orphans is good enough. Why wouldn't it be?
This doesn't make sense. Given any set of network parameters, there is always a single global optimum strategy for miners to maximize their revenue by prioritizing transactions. Tolerances for block sizes are not something that miners will have a wide variety of opinions on - the goal is always to make money through fees (and the subsidy, but that doesn't change based on which tx are included). Besides, why on earth would we want to waste hashing power by causing more orphans?
There would be some variety, surely. In the blocks they produce themselves, miners will search to optimize the ratio (time to propagate / revenue in fees), while in blocks they receive from other miners, they would rather it be the smaller possible. These parameters are not the same for different miners, particularly the "time to propagate" one, as it strongly depends on how many connections you can keep established and on your bandwidth/network lag.
Plus, if there is an "global optimal max size", it's quite pretentious to claim you can come up with the "optimal formula" to calculate it. Even if you could, individual peers would never have all necessary data to feed to this formula, as it would have to take into consideration the hardware resources of all miners and the network as a whole. That's impracticable. Such maximum size must be established via a decentralized/spontaneous order. It's pretty much like economical central planning versus free markets actually.