It is not just about data compression. Performance is on the table as well, because having nodes go through the same signature generation/verification process tens or hundreds of times is a significant waste of time, given that ECDSA is a CPU-intensive task.
To make it backward compatible, you have to keep it that way for all old nodes, as far as I know. A new version with better performance should give enough incentive to upgrade.
On the other hand, your compression idea isn't backward compatible either, because transactions broadcast in their compressed form will never get decompressed by legacy nodes, and there is no practical way to make them do this. Please note that the txid is generated from the compressed form, and you just can't have the same txn with two txids in the network.
It is backward compatible, because old nodes will receive transactions in uncompressed form and new nodes will receive them in compressed form. For backward compatibility, the transaction hash has to be calculated from the uncompressed form. So: old nodes will process such transactions as they do today, and they will keep sending and receiving them the same way as today. New nodes will process them faster, because they will be able to compress and decompress such transactions. They have to do it only once, and then remember that Alice's transaction uncompressedTxidA is equal to Alice's transaction compressedTxidA. New nodes could send, receive and process transactions in compressed form. Old nodes have to process all of them in uncompressed form, unless you move the coins to some future-segwit-version outputs, in which case they will blindly accept them without processing.
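To illustrate the "hash from the uncompressed form" point, here is a rough sketch of what a new node could do locally. It assumes the usual double SHA-256 txid over the serialized transaction; `CompressedStore` is a hypothetical name, and `zlib` is only a stand-in for whatever compression scheme would actually be used.

```python
import hashlib
import zlib


def txid(raw_tx: bytes) -> str:
    """Double SHA-256 of the UNCOMPRESSED serialization, shown byte-reversed."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()


class CompressedStore:
    """Hypothetical local store: keyed by txid, holds only compressed bytes."""

    def __init__(self) -> None:
        self._by_txid = {}

    def add(self, raw_tx: bytes) -> str:
        # The txid is computed before compressing, so it never changes.
        tid = txid(raw_tx)
        # zlib is a stand-in here, not the scheme proposed in this thread.
        self._by_txid[tid] = zlib.compress(raw_tx)
        return tid

    def raw(self, tid: str) -> bytes:
        # Decompress on demand, e.g. to relay to an old node.
        return zlib.decompress(self._by_txid[tid])

    def stored_size(self, tid: str) -> int:
        return len(self._by_txid[tid])


# Made-up, highly repetitive payload; not a valid transaction.
store = CompressedStore()
fake_tx = bytes.fromhex("01000000") + b"\xab" * 1000
tid = store.add(fake_tx)
```

Whatever is stored on disk, `txid(store.raw(tid))` gives back the same `tid`, so old nodes never see a second txid for the same transaction.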
Edit: some description to explain all cases:
1) old node to old node will send everything in uncompressed form (to make it backward-compatible)
2) old node to new node will send everything in uncompressed form, but that new node will compress it once, and then will be able to store and process such a transaction in compressed form
3) new node to old node will send everything in uncompressed form, but decompression will be simple and could be done on-the-fly, because the compressed form will contain only simple things like "repeat N bytes M times"
4) new node to new node will send everything in compressed form (and because it will have better performance, people will upgrade quite soon)
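The four cases above can be sketched as follows. This is only an illustration under my own assumptions: the token layout (`"raw"` / `"repeat"`), the fixed `chunk_size`, and the function names are hypothetical, not a proposed wire format.

```python
from typing import List, Tuple


def compress(data: bytes, chunk_size: int) -> List[Tuple]:
    """Collapse back-to-back identical chunks into ("repeat", chunk, count)."""
    tokens: List[Tuple] = []
    i = 0
    while i < len(data):
        chunk = data[i:i + chunk_size]
        count = 1
        # Count how many times this exact full-size chunk repeats in a row.
        while (len(chunk) == chunk_size and
               data[i + count * chunk_size:i + (count + 1) * chunk_size] == chunk):
            count += 1
        if count > 1:
            tokens.append(("repeat", chunk, count))
        else:
            tokens.append(("raw", chunk))
        i += count * len(chunk)
    return tokens


def decompress(tokens: List[Tuple]) -> bytes:
    """'Repeat N bytes M times' is cheap enough to expand on the fly."""
    out = bytearray()
    for token in tokens:
        if token[0] == "repeat":
            _, chunk, count = token
            out += chunk * count
        else:
            out += token[1]
    return bytes(out)


def wire_form(sender_is_new: bool, receiver_is_new: bool) -> str:
    """Cases 1-3 go uncompressed; only new-to-new (case 4) is compressed."""
    if sender_is_new and receiver_is_new:
        return "compressed"
    return "uncompressed"


payload = b"AB" * 3 + b"C"
tokens = compress(payload, chunk_size=2)
```

A new node talking to an old node would call `decompress(tokens)` just before sending, so the old node sees exactly the bytes it expects.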
Edit:
I remember this idea being discussed and denounced by Greg Maxwell, because he believes such options may encourage address re-use, which is considered bad practice in bitcoin.
In my idea, transaction hashes are still calculated from the uncompressed form. That means the fees for spending such a transaction should be the same. So, if you have some 100 kB transaction that can be compressed to 1 kB, you still have to pay for taking up 100 kB of blockchain space.
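As a quick worked example of the fee point: the function name and the 1 sat/byte rate below are just assumptions for illustration.

```python
def required_fee(uncompressed_size: int, compressed_size: int,
                 sat_per_byte: int) -> int:
    """Fee depends only on the uncompressed size.

    compressed_size is accepted but deliberately ignored: compression
    saves bandwidth and storage, not fees.
    """
    return uncompressed_size * sat_per_byte


# 100 kB transaction that compresses to 1 kB, at an assumed 1 sat/byte.
fee = required_fee(uncompressed_size=100_000, compressed_size=1_000,
                   sat_per_byte=1)
```

Here `fee` is 100,000 satoshis, the same as if the transaction had never been compressed.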
With Schnorr and Taproot coming, with their built-in signature aggregation capabilities, it is just useless.
It is useless for new addresses, but still, compressing the history is an incentive, so such things should be done to make running full archival nodes cheaper if possible.
And my compression idea requires no forks: you can apply any kind of compression you want right now, without asking. The only reason to standardize it is that if many nodes compress data in the same way, less bandwidth will be used, because then it will be possible to send and receive things in compressed form.