Maybe I have not verified it well enough, but I have the impression that the original client, whenever it sends "getblocks", always asks as deep as possible. Why do this? It just creates unnecessary traffic, and it could surely be optimized without much effort.
Not sure what you mean by "as deep as possible". We always send getdata starting at whatever block we already know. The reason for starting from early blocks and moving forward is that validation is done in stages, and at each point as much as possible is already validated (mostly as a means to prevent DoS attacks). Since most checks can only be done once you have the entire chain of blocks from genesis up to the one being verified, you need them more or less in order.
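For context, a getblocks request doesn't name a single block; it carries a block locator: a few recent hashes, then exponentially sparser ones back to genesis, so the peer can find the last block we have in common even across reorgs. A minimal sketch of locator construction, assuming `chain` is a list of block hashes ordered from genesis to tip:

```python
def block_locator(chain):
    """Build a getblocks/getheaders block locator: recent hashes densely,
    then exponentially sparser steps back, always ending at genesis.
    `chain` is a list of block hashes, genesis first."""
    locator = []
    step = 1
    index = len(chain) - 1  # start at our current tip
    while index > 0:
        locator.append(chain[index])
        if len(locator) >= 10:
            step *= 2  # after the first ~10 entries, double the step
        index -= step
    locator.append(chain[0])  # genesis anchors the locator
    return locator
```

The exponential spacing keeps the message small (a few dozen hashes even for hundreds of thousands of blocks), so "asking deep" is not the expensive part of the exchange.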
IMO, you should also optimize the way you do "getdata".
Don't just send getdata for all the blocks that you don't know to all the peers at the same time - it's crazy.
Instead, try to, for example, ask each node for a different block - one at a time, until you collect them all...
That's not true; we only ask for each block once (and retry after a timeout), but all requests go to a single peer (not to all peers, and not balanced across nodes). That's a known weakness, but changing it isn't trivial, because of how validation is done.
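For illustration, the bookkeeping such balancing would need is roughly this: one outstanding request per missing block, spread across idle peers, re-queued after a timeout. A sketch only, with hypothetical peer objects exposing a `send_getdata` method:

```python
import time

REQUEST_TIMEOUT = 60  # seconds before a request is retried

class BlockFetcher:
    """Sketch of balanced getdata bookkeeping: each missing block is
    requested from exactly one peer at a time, and re-requested only
    after a timeout."""

    def __init__(self, peers, wanted):
        self.peers = list(peers)    # hypothetical peer objects
        self.wanted = list(wanted)  # block hashes still missing
        self.in_flight = {}         # hash -> (peer, request time)

    def update(self):
        now = time.time()
        # Re-queue anything that timed out so it can be retried
        # (ideally by another peer).
        for h, (peer, t) in list(self.in_flight.items()):
            if now - t > REQUEST_TIMEOUT:
                del self.in_flight[h]
                self.wanted.append(h)
        # Hand each idle peer one block: one request per block, spread out.
        busy = {peer for peer, _ in self.in_flight.values()}
        for peer in self.peers:
            if peer not in busy and self.wanted:
                h = self.wanted.pop(0)
                peer.send_getdata(h)
                self.in_flight[h] = (peer, now)

    def on_block(self, h):
        self.in_flight.pop(h, None)  # request satisfied
```

The hard part is not this loop; it's that validation needs the blocks in order, so out-of-order arrivals have to be buffered, which is exactly what the headers-first approach below makes manageable.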
There is one strategy however that's pretty much accepted as the way to go, but of course someone still has to implement it, test it, ... and it's a pretty large change. The basic idea is that downloading happens in stages as well: first only headers are fetched (using getheaders), in a process similar to how getblocks works now, only much faster of course. However, instead of immediately fetching blocks, you wait until a long chain of headers is available and verified. Then you can start fetching individual blocks from individual peers, assemble them, and validate them as they are connected to the chain. The advantage is that you already know which chain to fetch blocks from, and don't need to infer it from what others tell you.
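A rough sketch of that staged flow. Headers are only 80 bytes and commit to their parent, so the shape of the whole chain can be checked with no block data at all; `fetch_block` and `connect` below are hypothetical callbacks, and real header validation also checks proof of work, difficulty, and timestamps, which are omitted here:

```python
import hashlib

def dhash(b):
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def header_connects(prev_hash, header):
    # An 80-byte header stores its parent's hash in bytes 4..36.
    return header[4:36] == prev_hash

def sync(headers, fetch_block, connect):
    """Stage 1: validate the cheap, self-contained header chain.
    Stage 2: fetch full blocks (from many peers, in any order).
    Stage 3: connect and fully validate them in chain order.
    headers[0] is assumed to be a block we already have."""
    prev = dhash(headers[0])
    for h in headers[1:]:
        if not header_connects(prev, h):
            raise ValueError("header chain broken")
        prev = dhash(h)
    # We now know exactly which block hashes we want, so the bodies
    # could be downloaded in parallel; here they are fetched and
    # connected sequentially for simplicity.
    for h in headers[1:]:
        connect(fetch_block(dhash(h)))
```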
Last, but not least.
Forgive the sarcasm, but I really don't know what all the people at the Bitcoin Foundation have been doing for the past few years, besides suing each other and increasing transaction fees.
The Bitcoin Foundation has only been around for a year or so, and it doesn't control development. It pays Gavin's salary, but the other developers are volunteers who work on Bitcoin in their free time.
Nothing!
The blocks are getting bigger, but there have been no improvements to the protocol, whatsoever.
You cannot, for example, ask a peer for part of a block - you just have to download the whole 1 MB of it from a single IP.
BIP37 actually introduced a way to fetch parts of blocks, and it can be used to fetch a block with just the transactions you haven't heard about, so it avoids resending those that have already been transferred as separate transactions (though I don't know of any software that uses this mechanism for block fetching yet; once BIP37 is available on more nodes, I expect it will be). Any other system that requires negotiating which transactions to send adds latency to block propagation time, so it's not necessarily the improvement you want.
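Concretely, a BIP37 client loads a bloom filter on the peer with "filterload" and then requests MSG_FILTERED_BLOCK in getdata; the peer answers with a "merkleblock" (the header plus a partial merkle tree) followed by only the matching transactions, and the client checks each one against the header's merkle root. A minimal sketch of that inclusion check, assuming the partial tree has already been decoded into a per-transaction path (the real message encodes the tree as a flag-bit depth-first traversal instead):

```python
import hashlib

def dhash(b):
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_merkle_path(tx_hash, path, merkle_root):
    """Check that tx_hash connects to merkle_root. `path` is a list of
    (sibling_hash, sibling_is_right) pairs, leaf level first; all hashes
    are 32-byte internal-order digests."""
    h = tx_hash
    for sibling, sibling_is_right in path:
        h = dhash(h + sibling) if sibling_is_right else dhash(sibling + h)
    return h == merkle_root
```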
The way I see it, the solution would be very simple: every mining pool can easily use CloudFlare (or any other cloud service) to serve blocks via HTTP.
So if my node says "getdata ...", it does not necessarily mean that I want this megabyte of data from the poor node and its thin DSL connection. I would be more than happy to just get a URL where I can download the block from - it would surely be faster, and would not drain the peer's uplink bandwidth as much.
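Note that such a scheme can be trust-free, because a block hash commits to the whole block: hash the downloaded header and compare it against the hash learned over the P2P network (a full check would also recompute the merkle root from the transactions). A minimal sketch, with a hypothetical mirror URL and the expected hash given in the usual byte-reversed display hex:

```python
import hashlib
import urllib.request

def fetch_block(url, expected_hash_hex):
    """Download a raw block over HTTP and verify it against a hash we
    already trust. The mirror can therefore only fail, not lie."""
    raw = urllib.request.urlopen(url).read()
    header = raw[:80]  # version, prev hash, merkle root, time, bits, nonce
    h = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    if h[::-1].hex() != expected_hash_hex:  # display order is byte-reversed
        raise ValueError("block from %s does not match expected hash" % url)
    # A complete check would also rebuild the merkle root from the
    # transaction data in raw[80:] and compare it to header[36:68].
    return raw
```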
There are many ideas about how to improve historic block download. I've been arguing for a separation between archive storage and fresh block relaying, so nodes could be fully verifying, active nodes on the network without being required to serve every ancient block to anyone who asks. Regarding moving to other protocols, there is the bootstrap.dat torrent, and there has recently been talk about other mechanisms on the bitcoin-development mailing list.