The only solution to this problem is to make broadcasting of a transaction "non free". Namely, if you want me to include it you have to pay me. The net (no pun intended) result is that each client would need to pay other clients to whom they even send their transaction, not just the individual who gets it in a block. In this way the laws of economics take over and no one gets a free ride on the transaction broadcast system.
I don't know a way to implement that. The transaction fee to the block creator uses a special trick to include the transaction fee without any additional size. If there was a transaction for each transaction fee, then what about the transactions fees for the transaction fee's transaction?
Should people who effectively clog the blockchain pay a financial penalty.
Maybe they should, but it needs a lot of brainstorming to implement that correctly. I can imagine many things that could go wrong. Currently, we have satoshis per virtual kilobyte system, and everyone can see how it works. Before, there was a coinage-based system: the older your coins were, the higher priority your transaction had. Also I saw some people mentioning UTXO-based fees: the more UTXO you create, the higher your fees are. And if you consume many UTXOs, you make it easier for pruned nodes, so maybe that should be even rewarded with some discount (or some buffer for free transactions to do that kind of UTXO cleanup, but of course it should be designed carefully to avoid misuse).
That way or another, it is possible to change that without any fork. The hardest thing is presenting a better system, and convincing mining pool operators, that your way of handling it is better than the status quo. And of course, you can start with your own full node, then you will see if your local mempool is really handled better, because the first step is to check your system in practice.
If smaller transactions ( smaller in size, and not in value), were charged a low efficiency fee, and larger ones had to pay quadruple ( say ) fees, then there would be a disincentive to submit so much rubbish on the blockchain.
No matter what model you will take, it is hard to design a system, where you won't punish transaction batching (and UTXO-based fee model mentioned above also has the same problem). Because if you have a transaction that has one input, and creates hundreds of outputs, it does not mean it is automatically bad. It could be some batched withdrawal from some exchange, and if they for example collect all withdrawals, and produce one transaction per 24 hours, then you don't want to punish them (because replacing all of that with hundreds of small transactions would be worse).
How could this be implemented? Would it require two mempools, or just approval by the miners?
One mempool is all you need. After all, finally you have a single block, and all transactions are placed in a given order. Either you include or exclude something, it is that simple. Of course you can locally make it more complicated, and have 20 layers to filter things, but finally, it will be a single binary decision: you include or exclude some transaction.