Pools have an incentive to keep their blocks small so that they propagate fast, reducing their chance of being orphaned. There is certainly no incentive to include free transactions, and low-fee transactions are probably also a negative prospect. E.g. adding an extra 500KB to a block for 0.1BTC might not be worth it: it adds 0.4% to the 25BTC base reward, but if that means 3 seconds of extra propagation time then it adds roughly a 0.5% extra chance of being orphaned, so it would not be worth doing.
I've just made these numbers up based on my own assumptions of transaction size and propagation times, but hopefully you see the point. Maybe it's the case that some pools have recently done the math and tweaked their thresholds to optimise their returns.
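To make the back-of-envelope comparison above concrete, here is a minimal Python sketch. It assumes block discovery is a Poisson process with a 600 second average interval, so the extra orphan risk is the chance that a competing block shows up during the extra delay. The function names are made up for illustration, and the inputs are just the guessed figures from the post.

```python
import math

BLOCK_INTERVAL_S = 600.0  # average time between blocks (10 minutes)

def fee_gain_fraction(extra_fees_btc, base_reward_btc):
    """Extra revenue from the added fees, as a fraction of the base reward."""
    return extra_fees_btc / base_reward_btc

def extra_orphan_risk(extra_delay_s, block_interval_s=BLOCK_INTERVAL_S):
    """Chance that a competing block is found during the extra delay.

    Assumption: block arrivals are Poisson, so the probability of at least
    one arrival in extra_delay_s seconds is 1 - exp(-delay / interval).
    """
    return 1.0 - math.exp(-extra_delay_s / block_interval_s)

# Figures from the post: 0.1 BTC of fees, 25 BTC reward, 3 s of extra delay.
gain = fee_gain_fraction(0.1, 25.0)
risk = extra_orphan_risk(3.0)
worth_including = gain > risk

print(f"fee gain: {gain:.2%}, extra orphan risk: {risk:.2%}, worth it: {worth_including}")
```

With these inputs the extra risk comes out slightly larger than the fee gain, which is the point being made. Different guesses for size, fee, or delay could easily flip the result.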
Well, if a pool was looking to maximize short-term revenue, the block with the highest possible net revenue is one with no transactions (other than the coinbase). A miner would simply mine only empty blocks and ignore all transactions, even paying ones. The best estimate (with the current protocol) is that the orphan cost is about 3.3 mBTC per kB. Paying txs pay ~0.1 mBTC, so the inclusion of any tx is a net loss of revenue.
That being said, hopefully miners and pool ops realize that having a lot of coins doesn't do you much good if you cripple the growth of Bitcoin. 2% fewer coins which are each worth 10x as much because you helped grow the adoption of Bitcoin is a pretty good deal.
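As a rough sketch of that arithmetic in Python: the 3.3 mBTC per kB orphan cost is the estimate quoted above, and the fee is treated as 0.1 mBTC per kB, which is my assumption about the units. The helper name is hypothetical.

```python
ORPHAN_COST_MBTC_PER_KB = 3.3  # estimate quoted in the post
FEE_MBTC_PER_KB = 0.1          # assumption: the ~0.1 mBTC fee is per kB

def net_revenue_mbtc(size_kb,
                     fee_mbtc_per_kb=FEE_MBTC_PER_KB,
                     orphan_cost_mbtc_per_kb=ORPHAN_COST_MBTC_PER_KB):
    """Expected net revenue (mBTC) from adding size_kb of paying txs.

    Fee income minus the expected orphan cost, both taken as linear in size.
    """
    return size_kb * (fee_mbtc_per_kb - orphan_cost_mbtc_per_kb)

net_per_kb = net_revenue_mbtc(1.0)        # one full kB of paying txs
net_quarter_kb = net_revenue_mbtc(0.25)   # a single small tx
breakeven_fee_per_kb = ORPHAN_COST_MBTC_PER_KB  # fee at which net revenue is zero

print(f"net for 1 kB: {net_per_kb:.2f} mBTC, net for 0.25 kB: {net_quarter_kb:.2f} mBTC")
```

Under these assumptions every kB included costs more in expected orphan losses than it earns, and fees would have to be about 33x higher just to break even.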
One thing Gavin pointed out is that if all miners have higher orphan rates, they don't lose any revenue. What matters is RELATIVE orphan rates. If you have an orphan rate of 2% and the network on average has an orphan rate of 1%, then you are losing about 1% of revenue. However, if you have an orphan rate of 5% and the network on average also has an orphan rate of 5%, you aren't losing anything. 5% of your blocks are orphaned, but so are everyone else's, and as a result the difficulty is 5% lower. The big miners should sit down and discuss mutually raising block sizes. Even if you can't get everyone on board, the orphan costs can be reduced if enough miners agree to raise block sizes together. Getting 50%, 60%, or 70% of hashpower in agreement seems possible, as honestly that just means getting the top 6 or 7 pools and major solo miners to find common ground.
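The relative orphan rate argument can be sketched in a few lines of Python. This uses a simplified model of my own: difficulty adjusts so that only surviving blocks count, so a miner's revenue relative to its hashpower share scales as (1 - own orphan rate) / (1 - network orphan rate). The function name is hypothetical.

```python
def relative_revenue(own_orphan_rate, network_orphan_rate):
    """Revenue relative to a miner's fair share of hashpower.

    Simplified model: the miner keeps (1 - own_orphan_rate) of its blocks,
    and difficulty settles so that the network as a whole keeps
    (1 - network_orphan_rate) of the blocks it finds.
    """
    return (1.0 - own_orphan_rate) / (1.0 - network_orphan_rate)

# 2% own orphan rate vs a 1% network average: about 1% of revenue is lost.
loss_worse_than_network = 1.0 - relative_revenue(0.02, 0.01)

# 5% own orphan rate vs a 5% network average: nothing is lost.
loss_same_as_network = 1.0 - relative_revenue(0.05, 0.05)

print(f"loss at 2% vs 1%: {loss_worse_than_network:.2%}")
print(f"loss at 5% vs 5%: {loss_same_as_network:.2%}")
```

The same function shows why partial agreement helps: if most of the hashpower moves to larger blocks together, the network average rises along with your own rate and the ratio stays close to 1.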
Longer term, a more efficient method of block propagation is possible, which should reduce that orphan cost. Today when a block is broadcast, all the txs inside the block are also broadcast as part of the block message. Most nodes already have these txs, so it is just wasted bandwidth. One could instead include just the tx hashes in the block message, and that would cut the size of a block message by up to 80%. For larger savings a reduced-length hash could be used instead. Collisions here are not a security risk and would still be incredibly rare, and that would allow reducing the block message size (and thus propagation delay) by 95% or more.
Still, I don't want to be all doom & gloom. Even if nothing is done, over time Moore's law will mean higher bandwidth at lower cost, which will reduce the propagation delay. Also, the block subsidy being cut in half again in ~3 years will reduce the "distortion" that the large subsidy has on fee pricing.
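A quick Python sketch of the size savings, under stated assumptions: every receiving node already has every tx, the block message is just an 80-byte header plus either the full txs or one hash per tx, and the average tx is 160 bytes. That average is a hypothetical figure picked for illustration, not a measurement, and the function names are made up.

```python
HEADER_BYTES = 80  # size of a Bitcoin block header

def full_block_bytes(tx_count, avg_tx_bytes):
    """Block message size when every tx is sent in full (simplified)."""
    return HEADER_BYTES + tx_count * avg_tx_bytes

def hashed_block_bytes(tx_count, hash_bytes):
    """Block message size when each tx is replaced by a hash of hash_bytes."""
    return HEADER_BYTES + tx_count * hash_bytes

def savings(tx_count, avg_tx_bytes, hash_bytes):
    """Fraction of the block message saved by sending hashes instead of txs."""
    full = full_block_bytes(tx_count, avg_tx_bytes)
    return 1.0 - hashed_block_bytes(tx_count, hash_bytes) / full

# Hypothetical block: 2000 txs averaging 160 bytes each.
savings_full_hash = savings(2000, 160, 32)   # full 32-byte tx hashes
savings_short_hash = savings(2000, 160, 8)   # truncated 8-byte hashes

print(f"32-byte hashes save {savings_full_hash:.1%}")
print(f"8-byte hashes save {savings_short_hash:.1%}")
```

The savings depend almost entirely on the ratio of hash length to average tx size, so larger txs or shorter hashes push the numbers higher. Any tx a node is missing would still have to be fetched separately, which this sketch ignores.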