4% of total transactions
You don't know "the" total number of transactions. You only know "a" total, because there is no "the" mempool: there are many mempools, and each node has its own. I can generate a 1 TB mempool using any deterministic algorithm, and then say "we need 40 GB blocks now, because of the 4% rule".
numbers are only for example
It doesn't matter: whatever the percentage is, it can always be bypassed by locally generating a lot of transactions.
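To make this concrete, here is a minimal Python sketch. The function name and the 300 MB "honest" mempool are made up for illustration, and the 4% rule is just the proposal quoted above, not any real consensus rule. It only shows that the resulting limit depends entirely on whichever mempool a node happens to look at.

```python
# Hypothetical rule from this discussion: block size limit = 4% of "the" mempool.
# There is no global mempool, so each node can only plug in its own local view.

GB = 10**9
TB = 10**12

def block_limit_bytes(local_mempool_bytes: int, percent: int = 4) -> int:
    """Block size limit a node would derive from its own mempool."""
    return local_mempool_bytes * percent // 100

honest_node = block_limit_bytes(300 * 10**6)  # assumed 300 MB local mempool
spammer_node = block_limit_bytes(1 * TB)      # locally generated 1 TB mempool

print(honest_node)         # 12000000 bytes, i.e. 12 MB
print(spammer_node // GB)  # 40, i.e. "we need 40 GB blocks now"
```

Two nodes applying the same rule get limits that differ by more than three orders of magnitude, and nobody else can verify which mempool was "real".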
Overall, I don't see anything wrong with if we increase current block size.
What about Initial Block Download? You can always decrease the block size, but you cannot forget the history. It is always needed to create new nodes, unless you start making backward-incompatible changes. Note that if everyone switches to pruned mode, then nobody can create a new node, because there is nobody left to download the history from. You cannot sync trustlessly from a pruned node.
It's not expensive to run node today even if we double or triple block size and the demand isn't similar to the demand of 2010's.
The total blockchain size only keeps growing. Improvements in technology didn't make verification much faster: creating a new full node often took me around a week. If you have around 500 GB of history to verify, it will still take a lot of time, even if some soft fork from now on forced blocks to contain nothing but the coinbase transaction.
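As a rough back-of-the-envelope check, here is a small Python sketch. The 500 GB and one-week figures are the ones from my own experience above; the helper names are made up, and the constant verification rate is a simplifying assumption (real sync speed varies a lot between old and recent blocks).

```python
# Rough sync-time arithmetic. Figures are illustrative, not measurements.

GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60

def effective_rate(history_bytes: int, days: float) -> float:
    """Average bytes per second verified during the initial sync."""
    return history_bytes / (days * SECONDS_PER_DAY)

def sync_days(history_bytes: int, bytes_per_second: float) -> float:
    """Days needed to verify the given history at a constant rate."""
    return history_bytes / bytes_per_second / SECONDS_PER_DAY

rate = effective_rate(500 * GB, days=7)
print(round(rate / 10**6, 2))  # 0.83 MB/s averaged over the whole sync

# Same machine, three times as much history: three times as long.
print(round(sync_days(1500 * GB, rate), 1))  # 21.0 days
```

Under that assumption, sync time scales linearly with the stored history, and that history never shrinks, no matter how small future blocks are.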
That's why they move on altcoins.
Ideally, we should have sidechains for such things. Then it would be possible to peg in, create a lot of history, peg out, finalize it on-chain, and drop everything that happened on the second layer, because it is not needed for Initial Block Download. Sidechain users also don't need sidechain history for things that happened before their coins were pegged in (in exactly the same way as LN users only need to store things since their channel was opened, and they don't need all the in-between transactions from all other LN channels).
That doesn't make sense, unless such miner/pool have goal to bloat Bitcoin blockchain.
We should assume that there will always be some bad actors, producing blocks as large as possible. We have had times with a congested mempool, and including everything "as is" will not solve anything. It will only make Initial Block Download longer, and people will still produce bigger and bigger blocks.
With such long verification time, it would hinder propagation where other miner could beat you since their block is far faster to be verified and propagated even though they mined it few seconds to minutes after you did.
Even if some big block does not end up in the main chain, many nodes will still waste a lot of time trying to verify it. So, one way or another, there will still be some "guessing time", unless you introduce some protection, like "after spending one minute on verification, stop, and try another block". But any such protection puts an artificial limit on the block size anyway, just measured in seconds instead of bytes. And then different machines will accept different blocks, depending on their computational speed, so you will never know how many machines accepted a given block.
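A minimal Python sketch of why a time-based cutoff is not a consensus rule. The one-minute limit is the example from above; the node names and verification speeds are invented for illustration.

```python
# Hypothetical "stop verifying after N seconds" rule.
# Node names and speeds are invented for this example.

MB = 10**6
TIME_LIMIT_SECONDS = 60

def accepts(block_bytes: int, verify_bytes_per_second: int) -> bool:
    """True if this node finishes verification within the time limit."""
    return block_bytes / verify_bytes_per_second <= TIME_LIMIT_SECONDS

nodes = {"raspberry_pi": 1 * MB, "laptop": 10 * MB, "server": 100 * MB}

block = 1000 * MB  # one big 1 GB block
for name, speed in nodes.items():
    print(name, accepts(block, speed))
# raspberry_pi False (would need 1000 s)
# laptop False (would need 100 s)
# server True (needs 10 s)
```

Each machine ends up with its own effective size limit (60 MB, 600 MB and 6 GB here), so the same block splits the network into nodes that accepted it and nodes that gave up.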
expending so much effort on producing an illegal block is stupid and exceedingly rare
Sometimes it is more common than you may think. For example, some mining pools have problems with counting sigops. And that is just another attack vector: if you know that such pools exist, you can create transactions with a lot of sigops and broadcast them. Each of those transactions is valid on its own, and can even be included in other blocks, but a pool that miscounts may produce a block that violates the sigops limit.
And then the question is: how can you check whether the sigops rule is violated without checking the whole block? If you start mining on top of a not-yet-validated block, there is a risk that one of those tricky rules will eventually mark it as invalid, and then your coinbase-only block is wasted.
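Here is a simplified Python model of that situation. The limit of 80000 matches MAX_BLOCK_SIGOPS_COST in Bitcoin Core; the transactions, the greedy block builder, and the pool's "undercount by half" bug are all invented for illustration and do not describe any specific pool.

```python
# Simplified model of the block-level sigops limit.
# 80_000 matches MAX_BLOCK_SIGOPS_COST in Bitcoin Core; everything else
# (transactions, the buggy counter) is hypothetical.

MAX_BLOCK_SIGOPS_COST = 80_000

def correct_cost(tx: dict) -> int:
    return tx["sigops_cost"]

def buggy_cost(tx: dict) -> int:
    # Hypothetical pool bug: sigops are undercounted by half.
    return tx["sigops_cost"] // 2

def build_block(txs: list, cost_fn) -> list:
    """Greedily add transactions while the counted cost fits the limit."""
    block, total = [], 0
    for tx in txs:
        if total + cost_fn(tx) <= MAX_BLOCK_SIGOPS_COST:
            block.append(tx)
            total += cost_fn(tx)
    return block

def block_is_valid(block: list) -> bool:
    """The rule applies to the sum over every transaction in the block."""
    return sum(correct_cost(tx) for tx in block) <= MAX_BLOCK_SIGOPS_COST

# Ten sigops-heavy transactions, each one fine on its own.
txs = [{"txid": i, "sigops_cost": 15_000} for i in range(10)]

good = build_block(txs, correct_cost)
bad = build_block(txs, buggy_cost)
print(len(good), block_is_valid(good))  # 5 True
print(len(bad), block_is_valid(bad))    # 10 False
```

Note that `block_is_valid` has to look at every transaction. Nothing in the header tells you the total, which is why mining on top of a block you have not fully validated is a gamble.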