Well, there may be several optimization targets. One is what you described: minimizing the number of inputs to reduce the transaction size. Another may be to use the smallest possible inputs, which increases transaction size but also reduces the number of inputs we have to track (a long-term optimization).
I'm not sure I understood this "long term optimization". By smallest I believe you mean in BTC amount, right? The goal would be to spend small-value addresses sooner in order to have fewer addresses with money in them, is that it?
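To make the two targets concrete, here is a rough sketch of both as a greedy selection over a wallet's unspent outputs. The `Utxo` type, the `select` function and the sample amounts are made up for illustration; this is not how any actual client picks its inputs, and fees and change are ignored.

```python
from typing import List, NamedTuple


class Utxo(NamedTuple):
    """Hypothetical unspent output: a label and a value in satoshis."""
    label: str
    amount: int


def select(utxos: List[Utxo], target: int, largest_first: bool) -> List[Utxo]:
    """Take coins in sorted order until their sum covers the target.

    largest_first=True  -> fewest inputs (smaller transaction).
    largest_first=False -> smallest coins first (more inputs now,
                           fewer unspent outputs left to track later).
    """
    ordered = sorted(utxos, key=lambda u: u.amount, reverse=largest_first)
    chosen: List[Utxo] = []
    total = 0
    for utxo in ordered:
        if total >= target:
            break
        chosen.append(utxo)
        total += utxo.amount
    if total < target:
        raise ValueError("insufficient funds")
    return chosen


wallet = [Utxo("a", 50), Utxo("b", 5), Utxo("c", 10), Utxo("d", 20)]
few_inputs = select(wallet, 30, largest_first=True)
small_inputs = select(wallet, 30, largest_first=False)
```

With this sample wallet and a target of 30, the first strategy spends a single output and the second spends three, which is the size-versus-tracking trade-off in miniature.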
Besides shrinking people's wallets, what optimization could this bring? These addresses would remain in the block chain anyway...
I was thinking about this recently - this doesn't seem like such a bad idea. It's not so much to shrink people's wallets, it's to shrink the block chain - or more specifically, to shrink the total number of bytes one must store in order to only have a record of unspent transactions. It might also be a good idea to favor spending coins from transactions that are large (in bytes), so that those transactions can be pruned sooner.
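A toy model of that storage argument, assuming (as a simplification) that a transaction has to be kept whole until every one of its outputs is spent, after which it can be dropped. The `Tx` class and the byte counts are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tx:
    """Hypothetical stored transaction."""
    size_bytes: int
    outputs_spent: List[bool]  # one flag per output


def is_prunable(tx: Tx) -> bool:
    # Assumption: prunable only once all outputs have been spent.
    return all(tx.outputs_spent)


def bytes_to_keep(txs: List[Tx]) -> int:
    """Bytes needed to keep a record of unspent transactions only."""
    return sum(tx.size_bytes for tx in txs if not is_prunable(tx))


chain = [
    Tx(250, [True, True]),    # fully spent, can be dropped
    Tx(1000, [True, False]),  # large, one output still unspent
    Tx(400, [False]),         # small, unspent
]
before = bytes_to_keep(chain)

# Spending the last output of the large transaction frees the most space.
chain[1].outputs_spent[1] = True
after = bytes_to_keep(chain)
```

In this model, spending the remaining output of the 1000-byte transaction frees far more space than spending the 400-byte one would, which is the reason to favor coins from large transactions.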
Aggregating small inputs has an anonymity consequence: it publicly links the owner of all those small inputs, providing far more useful information to someone trying to follow coins on the chain.
And finally we could try to use the oldest inputs, which would allow the network to stub off old blocks, since the only useful information we can get from them is their hash, which has already been confirmed.
Can old transactions really be forgotten like this? I've seen people saying this around here before, but it sounds a bit dangerous to me. (deleting data is always a dangerous decision)
I mean, isn't it a problem to have a missing link in the chain? Suppose every miner does it: there would be no proof left that certain bitcoins are really linked to their generation block...
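The oldest-first idea could be sketched the same way, assuming each coin remembers the height of the block that confirmed it. `Coin`, `select_oldest_first` and the sample heights are hypothetical.

```python
from typing import List, NamedTuple


class Coin(NamedTuple):
    """Hypothetical unspent output."""
    amount: int        # value in satoshis
    block_height: int  # height of the block that confirmed it


def select_oldest_first(coins: List[Coin], target: int) -> List[Coin]:
    """Spend coins from the lowest block heights first."""
    chosen: List[Coin] = []
    total = 0
    for coin in sorted(coins, key=lambda c: c.block_height):
        if total >= target:
            break
        chosen.append(coin)
        total += coin.amount
    if total < target:
        raise ValueError("insufficient funds")
    return chosen


coins = [Coin(30, 900), Coin(10, 100), Coin(25, 500)]
picked = select_oldest_first(coins, 30)
```

Here the coins from heights 100 and 500 get spent even though the single coin at height 900 would have covered the amount, so the old blocks lose their unspent outputs sooner at the cost of one extra input.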
I am not sure that old transactions could or should ever be truly forgotten by everybody, but forgotten by most would be fine - it would be a per-user choice. When transaction pruning is properly implemented, the gap between the size of ALL transactions and the size of only the unspent ones will keep growing without bound. Users could choose between a minimal block chain, or a multi-terabyte version on the premise of "donating your disk space and bandwidth resources to the network" for the purpose of preserving all history. A user could select whether they wanted full or pruned blocks and this preference would be sent via the P2P protocol.
I'm not sure yet which one is best.
Well, I think - I might be wrong - that bandwidth is a scarcer resource than disk space. So, to me, it sounds more reasonable to focus on reducing the bandwidth necessary to be a miner. So maybe minimizing transaction size is more interesting...
By the way, fees will probably be charged based on transaction size anyway.
By properly pruning blocks and taking actions to reduce the size of past blocks, both can be reduced. If blocks aren't pruned, they must continue to be passed around in full. I would propose that it's a reasonable endeavor to focus on reducing the bandwidth of being a casual user, less so a miner.
Even so, being a miner requires no more bandwidth than being a non-miner. Either you are running a client that listens for all current transaction activity and maintains a block chain, or you do not. Mining should be equally possible with a pruned versus non-pruned block chain.