I may not understand what's under discussion here, but if we're talking about new users and lightweight clients, what's wrong with downloading only the headers (which should take less than 5 minutes), and then downloading only the parts of the blockchain whose timestamps are later than the creation date of each key in the wallet?
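Roughly what I have in mind, as a sketch only. It assumes the standard 80-byte header layout (timestamp at byte offset 68) and that the wallet records a creation time for each key. The names heights_to_fetch and SAFETY_MARGIN are made up for illustration, and the two-hour margin is my own guess at a safe allowance, since block timestamps are not strictly increasing.

```python
import struct

HEADER_SIZE = 80        # bytes in a serialized block header
TIMESTAMP_OFFSET = 68   # 4 (version) + 32 (prev hash) + 32 (merkle root)
SAFETY_MARGIN = 2 * 60 * 60  # assumed allowance for timestamp drift, in seconds


def header_timestamp(raw_header):
    """Return the Unix timestamp stored in a raw 80-byte header."""
    if len(raw_header) != HEADER_SIZE:
        raise ValueError("expected an 80-byte header")
    return struct.unpack_from("<I", raw_header, TIMESTAMP_OFFSET)[0]


def heights_to_fetch(raw_headers, key_birthdays, margin=SAFETY_MARGIN):
    """Heights of the full blocks a wallet would need to download.

    raw_headers   -- headers of the best chain, index == block height
    key_birthdays -- Unix creation time of each key in the wallet
    Everything from the first block at or after (earliest key - margin)
    onward is returned; earlier blocks cannot pay keys that did not exist.
    """
    if not key_birthdays:
        return []
    cutoff = min(key_birthdays) - margin
    for height, raw in enumerate(raw_headers):
        if header_timestamp(raw) >= cutoff:
            return list(range(height, len(raw_headers)))
    return []
```

A wallet whose keys were all generated a minute ago would get back an empty list, or just the last few blocks inside the margin.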
A new user who doesn't have any keys will be creating them right now, and doesn't actually need any blocks that were created before his own computer generated those keys a minute ago. He only needs the full set of block headers to figure out the longest chain and determine "truth" for when he does need to get blocks. He stores the parts that are relevant to himself, and will always have a full list of his own available txouts, without any need to trust anyone else.
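Picking the chain from headers alone could look something like the sketch below. Note that "longest" in practice means the chain with the most cumulative proof-of-work, which is what this computes from the compact "bits" field. It is deliberately incomplete: it does not check that each header's hash actually meets its target, nor the difficulty retargeting rules, both of which a real client must do. best_tip is a hypothetical helper name, and headers are assumed to arrive parents-first.

```python
import hashlib
import struct

ZERO_HASH = b"\x00" * 32  # "previous block" field of the genesis header


def header_hash(raw_header):
    """Double SHA-256 of the 80-byte header (internal byte order)."""
    return hashlib.sha256(hashlib.sha256(raw_header).digest()).digest()


def bits_to_target(bits):
    """Expand the compact 'bits' encoding into the full integer target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def block_work(bits):
    """Expected number of hashes needed to find a block at this target."""
    return (1 << 256) // (bits_to_target(bits) + 1)


def best_tip(raw_headers):
    """Return (tip_hash, total_work) of the most-work chain.

    Headers whose parent has not been seen are skipped as orphans.
    """
    total_work = {}
    best = None
    for raw in raw_headers:
        _version, prev, _merkle, _time, bits, _nonce = struct.unpack(
            "<I32s32sIII", raw)
        if prev != ZERO_HASH and prev not in total_work:
            continue  # orphan: parent unknown
        h = header_hash(raw)
        total_work[h] = total_work.get(prev, 0) + block_work(bits)
        if best is None or total_work[h] > total_work[best]:
            best = h
    if best is None:
        return None, 0
    return best, total_work[best]
```

With this rule, a short branch mined at higher difficulty can beat a branch with more blocks, which is why counting blocks alone is not enough.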
Sure, you can't easily verify other users' txs unless you see the tx in the blockchain with X confirmations. This may make some people uncomfortable, but I believe the future will eventually require people to trust the longest chain (and all the txs in it), since it will eventually be infeasible for people to store the entire blockchain themselves.
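For what it's worth, the "X confirmations" test is just arithmetic on block heights. A tiny sketch, where the default of 6 is only a placeholder for X and both function names are hypothetical:

```python
def confirmations(tx_block_height, tip_height):
    """Number of blocks on the best chain at or above the tx's block.

    tx_block_height is None while the tx is not yet in any block.
    """
    if tx_block_height is None or tx_block_height > tip_height:
        return 0
    return tip_height - tx_block_height + 1


def is_confirmed(tx_block_height, tip_height, required=6):
    """True once the tx has at least `required` confirmations (assumed X)."""
    return confirmations(tx_block_height, tip_height) >= required
```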
Btw, you mentioned Python: check out my codebase, PyBtcEngine. Right now, the full suite uses the full blockchain, but I do plan to make a lightweight version of it. There's no networking yet, but it does handle just about everything else (the last thing I need is knapsack optimization to create a set of txOuts to send to my ECDSA signature code). You might find the Python code alone to be useful without any of the C++/SWIG; you just won't have access to the entire blockchain without the C++ (I found it way too slow to juggle the full chain in Python).
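To give an idea of the txOut selection step, here is a simple stand-in heuristic, not a real knapsack solver and not what PyBtcEngine does: use the smallest single output that covers the amount, otherwise accumulate outputs largest-first. Fees and change are ignored, values are plain integers (e.g. satoshis), and select_txouts is a hypothetical name.

```python
def select_txouts(values, target):
    """Pick unspent outputs whose values sum to at least `target`.

    values -- list of integer amounts, one per unspent txOut
    Returns a list of indices into `values`, or None if funds are short.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if sum(values) < target:
        return None

    # Prefer one output that covers the target with the least excess.
    covering = [i for i, v in enumerate(values) if v >= target]
    if covering:
        return [min(covering, key=lambda i: values[i])]

    # Otherwise combine outputs, largest first, until the target is met.
    chosen, total = [], 0
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        chosen.append(i)
        total += values[i]
        if total >= target:
            return chosen
    return None
```

A proper knapsack-style search would try to minimize the leftover change; this just gets a valid set quickly.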