I think that we need some kind of control on the size of the mempool eventually if it grows too big. I'm not sure that PoW is the way to go, though - I imagine that the outcome would be that spamming is still trivial with, say, a Blockerupter, while ordinary users need to compute very long to get a suitable PoW.
We could start identifying TXs by the nodes where the TX initiated. That way, a honest node would only have to calculate the PoW once during its existence. The calculation should be difficult. When other nodes detect TX flood from a single node they would revoke the PoW and not relay those TXs any more until a new PoW is attached.
It is easy to revoke a PoW from a malicious node and it is very difficult to calculate a new PoW. The malicious user would have to constantly calculate new PoWs while the honest nodes with their existing PoW can keep sending TXs normally. At times of high number of unconfirmed TXs the TX PoW difficulty should be risen in correlation to the number of those TXs.
When a honest node receives a TX they would check the internal database for a PoW that was somehow attached to the TX. If such a PoW exists and is not flooding the network, the node relays the TX. If the PoW exists but is flooding the network with spam TXs, the PoW will be revoked locally and the TXs with that PoW are no longer relayed. If the PoW does not exist locally the node checks the current number of unconfirmed TXs, calculates the difficulty and checks whether the PoW meets the criteria. If the criteria is met, the PoW is added to the local database. Otherwise, TXs with that PoW are not relayed.