So if his big blocks bring him 20% more income per block, this is neutral.
Well, I'm not sure if we are on the same page.
Of course, you can include more transactions and collect more fees by building bigger blocks, but that doesn't solve the fundamental problem that A's fee revenues must be multiplied by his blockchain production (fraction of the chain built by A), rather than by his rate of successful blocks (ratio of non-orphaned blocks).
Let me come back to my example of the three miners A, B and C, all with a hashrate of 1/3 and an orphan rate of 0.01.
Now, assume that A and B stick to a block size of 1 MB, while C tries to find the block size that maximizes his profits.
C can do so by gradually increasing the block size as long as the higher orphan rate (resulting in a lower production share) is outweighed by the higher fees. As the orphan rate follows an exponential distribution and the marginal fee income tends to decrease, there will be an equilibrium where marginal revenue = marginal cost. Let's assume that C's profits are maximized at an orphan rate of 0.2, so that his blockchain production share will be 0.288, while that of A and B will be 0.356 each.
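The share figures in this example can be checked with a short sketch. It assumes that a miner's share of the chain is proportional to his hashrate times his fraction of non-orphaned blocks, normalized over all miners. The function name `production_shares` is made up for illustration.

```python
def production_shares(hashrates, orphan_rates):
    """Share of the chain built by each miner.

    Assumption: a miner's share is proportional to
    hashrate * (1 - orphan rate), normalized so the shares sum to 1.
    """
    surviving = [h * (1.0 - o) for h, o in zip(hashrates, orphan_rates)]
    total = sum(surviving)
    return [s / total for s in surviving]


# A and B keep an orphan rate of 0.01; C accepts 0.2 in exchange for bigger blocks.
shares = production_shares([1 / 3, 1 / 3, 1 / 3], [0.01, 0.01, 0.2])
print([round(s, 3) for s in shares])  # [0.356, 0.356, 0.288]
```

Under this assumption, A and B each end up with 0.356 of the chain and C with 0.288.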
I think that's the point where you need to stop, and it is why I think that all of this doesn't make much sense.
I start from the idea that miners have incentives to be on a good backbone network, directly between them, and do not wait for the P2P network to bring a block to them. In other words, the 10 or 20 big mining pools are on a rather fully meshed, high speed backbone.
I already explained why: it diminishes their orphan rate, so they have a mutual incentive to improve their network links.
If you accept that as a given, then it is impossible to start considering orphan rates that become important due to network and block size problems. There is of course always a given orphan rate, but that orphan rate must be small.
If your (relative) orphan rate is, say, 1%, it means that your income is multiplied by 0.99. If your block size doubles, and your relative orphan rate doubles because of that, you multiply it by 0.98.
By the time that "doubling your blocks" starts to be OFFSET by the diminishing of your income because of orphaning, you see that the orphaning rate must be HUGE. Not 2 or 4%, but 50% or so.
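As a rough check of the "50% or so" figure, here is a minimal sketch of that arithmetic. It assumes that income per block is simply fees times (1 - orphan rate), and that a block twice as big collects twice the fees; both are simplifications, and the function names are hypothetical.

```python
def income(fees_per_block, orphan_rate):
    """Expected income per block found.

    Assumption: fees are collected only if the block is not orphaned.
    """
    return fees_per_block * (1.0 - orphan_rate)


def break_even_orphan_rate(fee_factor, base_orphan_rate):
    """Orphan rate at which a block earning fee_factor times the baseline
    fees yields exactly the baseline income."""
    return 1.0 - (1.0 - base_orphan_rate) / fee_factor


# Baseline: fees of 1 unit, 1% orphan rate.
# Doubled block: fees of 2 units (assumed), 2% orphan rate.
print(round(income(1.0, 0.01), 3))  # 0.99
print(round(income(2.0, 0.02), 3))  # 1.96

# Orphan rate at which doubling the block stops paying off.
print(round(break_even_orphan_rate(2.0, 0.01), 3))  # 0.505
```

With these assumptions, the doubled block only stops being worth it when about half of the miner's blocks are orphaned.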
Well, that is impossible. Because if you orphan 50% of your blocks on the chain, it means that you even orphan more than 50% of your successful blocks, which means that you don't even reach the other miners over the backbone with your blocks.
If that is true, nobody else can download the block chain. It is being produced at a rate that almost saturates a backbone. So no one with a lesser link can ever download the block chain and keep up to date.
So by the time that this problem of orphaning blocks because of their size starts influencing the income of miners, the block chain is growing so fast that NOBODY CAN DOWNLOAD IT.
This story is different if miners are random nodes in a P2P network. But they aren't. They have every interest in investing in strong network links to other miners, exactly because of this orphaning problem.
So in other words, all this theoretical BS over how the orphaning rate offsets the desire for bigger blocks and imposes a natural equilibrium is meaningless, because if such an equilibrium ever did exist in theory, it would occur for blocks so big that nobody can download the block chain apart from the miners themselves.
Yes, you can say that an "optimum" is reached when the network stops downloading the chain, and nothing works any more. True, in a certain way.
EDIT: I hadn't understood something in your post, but now I see what you are getting at.
If we consider *really small* fees, then for an extra included transaction, the extra delay on the network will mean an extra probability of the block being orphaned, putting the whole income in jeopardy. This even happens with small blocks.
Yes, this will simply result in cutting off the very lowest fees of the fee distribution, which will remain forever in the mempool.
I don't think that this has much to do with "optimal size"; it only means that one doesn't include the cheapest transactions below a given fee threshold, because their extra transmission time penalises the whole income while not contributing enough to it.
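The fee threshold can be sketched as follows. This is only an illustration under stated assumptions: each extra transaction adds a fixed, small extra orphan probability, and expected income is (income in the block) times (1 - orphan rate). All numbers and function names below are hypothetical.

```python
def worth_including(fee, block_income, orphan_rate, extra_orphan_prob):
    """True if adding the transaction raises the expected income.

    Assumption: including the transaction raises the orphan probability
    of the whole block by extra_orphan_prob.
    """
    before = block_income * (1.0 - orphan_rate)
    after = (block_income + fee) * (1.0 - orphan_rate - extra_orphan_prob)
    return after > before


def fee_threshold(block_income, orphan_rate, extra_orphan_prob):
    """Smallest fee at which inclusion breaks even under the same model."""
    return block_income * extra_orphan_prob / (1.0 - orphan_rate - extra_orphan_prob)


# Hypothetical numbers: 25 units already in the block, 1% orphan rate,
# one extra transaction adds 0.0001% orphan probability.
print(worth_including(1e-4, 25.0, 0.01, 1e-6))  # True
print(worth_including(1e-6, 25.0, 0.01, 1e-6))  # False
print(fee_threshold(25.0, 0.01, 1e-6))
```

In this model the threshold scales with the income already in the block, so the cut-off exists at any block size, small or big.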
That said, a market doesn't need to come to "equilibrium". Erratic, chaotic market dynamics can be fun too.