You can't assume that slowing down block propagation is a bad thing for a miner. First of all, block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they propagate any further, because there are more fee-paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to just 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.
Secondly, if block propagation does become an issue, miners will switch to mining empty or near-empty blocks on top of the most recent blockheader they know about, until they finally get around to receiving the block in full, as a way to preserve their income. Unfortunately even with UTXO commitments they can still do that - just re-use the still-valid UTXO commitment from the previous block.(1) Obviously this has serious issues with regard to consensus... which is why I say over and over again that any UTXO commitment scheme also needs to have UTXO possession proofs as a part of it to make sure miners can't do that.
1) Modulo the fact that the UTXO set must change anyway because a previously unspendable coinbase output becomes spendable; if we don't do explicit UTXO possession proofs we should think very carefully about whether or not a miner (or group of miners) can extract the info required to calculate the delta that the maturing coinbase makes to the UTXO set more efficiently than by just verifying the block honestly. I'm not sure yet one way or the other whether that is in fact possible.
Rather than just quote numbers I'll explain where they come from:
There are two indices: the validating index, containing unspent transactions keyed by txid, and the wallet index, containing unspent outputs keyed by scriptPubKey.
Let T be the number of transactions with at least one unspent output, N the total number of unspent outputs, X the number of transactions in a new block, I the number of inputs, and O the number of outputs.
The number of hash operations required to perform an insert, delete, or update to the indices is 1, log(T) or log(N), and len(key) for the best, average, and worst case respectively. Since the keys of both indices involve hashed values (txid, and typically also the content of scriptPubKey), creating artificially long pathways through the index structure is a variant of the hash pre-image problem - you are essentially constructing transactions whose hash and output scriptPubKeys have specific values so as to occupy and branch specific nodes in the index structure, for the entire length of a given key. This is computationally expensive to do, and actually growing the pathway requires getting the transactions into the UTXO set, which requires burning some amount of BTC to get past anti-dust protections. So it is not economically viable to construct anything more than a marginally worse pathway; the average case really is the only case worth considering.
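To give a feel for the cost, here's a toy sketch (not the index code itself; it assumes a txid-style double-SHA256 key and uses random bytes as a stand-in for tweaking a transaction's serialization):

    import hashlib, os

    def grind_key_prefix(prefix: bytes, max_tries: int = 10_000_000):
        # Steering an entry into a chosen slot of a hash-keyed index means
        # controlling the leading bytes of its key - a partial pre-image
        # problem: controlling k bytes costs ~256**k attempts on average.
        for tries in range(1, max_tries + 1):
            candidate = os.urandom(64)  # stand-in for a tweaked tx serialization
            key = hashlib.sha256(hashlib.sha256(candidate).digest()).digest()
            if key.startswith(prefix):
                return candidate, key, tries
        raise RuntimeError("prefix not found within max_tries")

    # One byte is cheap (~256 tries on average); every additional byte multiplies
    # the work by 256, and each such transaction still has to be paid into the
    # UTXO set past the dust rules before it lengthens any path.
    _, _, tries = grind_key_prefix(b'\x00')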
re: content of the scriptPubKey, change your algorithm to index by H(scriptPubKey) please - a major part of the value of UTXO commitments is that they allow for compact fraud proofs. If your tree is structured by the raw scriptPubKey the average-case and worst-case sizes of a UTXO proof differ by 454x; that's bad engineering that will bite us when someone decides to attack it. In any case the majority of core developers are of the mindset that in the future we should push everything to P2SH addresses anyway; making the UTXO set indexable by crap like "tags" - the rationale for using the scriptPubKey directly - just invites people to come up with more ways of stuffing data into the blockchain/UTXO set.
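A minimal sketch of the difference (Python, hashlib only; the scripts below are just illustrative byte strings, not real outputs):

    import hashlib

    def key_raw(script_pubkey: bytes) -> bytes:
        # Keying by the raw scriptPubKey: key length - and with it the size of
        # a proof down the tree - varies from ~25 bytes for P2PKH up to the
        # 10,000-byte maximum script size.
        return script_pubkey

    def key_hashed(script_pubkey: bytes) -> bytes:
        # Keying by H(scriptPubKey): always 32 bytes, so average-case and
        # worst-case proof sizes along the key path are identical.
        return hashlib.sha256(script_pubkey).digest()

    p2pkh = bytes.fromhex('76a914') + b'\x00' * 20 + bytes.fromhex('88ac')  # 25 bytes
    monster = b'\x00' * 10_000                                              # pathological script
    assert len(key_hashed(p2pkh)) == len(key_hashed(monster)) == 32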
validating: (X+I) * log(T)
wallet: (I+O) * log(N)
Currently, these parameters have the following values:
T = 2,000,000
N = 7,000,000
X = 7,000 (max)
I = 10,000 (max)
O = 15,000 (max)
Those are approximate maximum, worst-case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maxima simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max of 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.
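For reference, the back-of-envelope arithmetic behind those figures (a sketch only: log() taken as log2, trees assumed balanced):

    from math import log2

    T = 2_000_000                     # transactions with at least one unspent output
    N = 7_000_000                     # total unspent outputs
    X, I, O = 7_000, 10_000, 15_000   # per-block maximums at 1MB

    validating = (X + I) * log2(T)    # ~356,000 hash operations
    wallet     = (I + O) * log2(N)    # ~569,000 hash operations
    total      = validating + wallet  # ~925,000 - the "about 1,000,000" above

    # 1,000,000 leveldb updates in ~4-5 seconds implies roughly 200,000-250,000
    # updates per second on a fast disk.
    print(f"{total:,.0f} hash ops, ~{total / 4.5:,.0f} updates/sec implied")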
Worst case would be if a miner creates a block filled with transactions spending previously created 10,000-byte scriptPubKeys and creating no new ones. A minimal CTxIn is 37 bytes, so you'd get roughly 1,000,000/37 = 27,027 txouts spent. Each txout requires on the order of 10,000 hash operations to update, so you've actually got an upper limit of ~270,270,000 updates, or 22 minutes by your metric... There's a good chance something else would break first, somewhere, which just goes to show that indexing by scriptPubKey directly is a bad idea. Sure, it'd be tricky and expensive to create such a monster block, but it is possible, and it leads to results literally orders of magnitude different from what is normally expected.
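The same arithmetic, spelled out (using the ~200,000 updates/sec implied above):

    block_size      = 1_000_000       # bytes
    min_txin        = 37              # minimal CTxIn size in bytes
    spent           = block_size // min_txin    # ~27,027 txouts spent
    ops_per_spend   = 10_000          # ~len(key) hash ops with a 10,000-byte scriptPubKey key
    total_updates   = spent * ops_per_spend     # ~270,270,000 updates

    updates_per_sec = 205_000         # from "1,000,000 updates in ~4-5 seconds"
    print(total_updates / updates_per_sec / 60) # ~22 minutes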
re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Whether or not they actually matter here, why bother analyzing it? Doubling the SHA256 primitive is a well-accepted belt-and-suspenders countermeasure, and is already used elsewhere in Bitcoin.
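For what it's worth it's a one-liner, the same construction Bitcoin already uses for txids and block hashes:

    import hashlib

    def sha256d(data: bytes) -> bytes:
        # SHA256^2: hashing the digest again closes off length-extension
        # attacks on the outer hash, whether or not they'd matter here.
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()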
Conceivably, when receiving and validating a block it's also possible to start with a pre-generated index based on the last block plus the mempool contents, and simply undo the transactions that were not included. That assumes, of course, that the mempool is not getting backlogged, in which case it would make more sense to start from the last block alone. If you collect partial proofs-of-work, then you could pre-compute updates from transactions which appear in many of the partial blocks. There's a large design space here to explore, in terms of optimization.
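A rough sketch of that pre-compute-then-undo idea, under obvious assumptions (transactions are plain dicts with a 'txid' field, apply_tx/undo_tx stand in for whatever actually updates the validating and wallet indices, and intra-mempool dependency ordering is ignored):

    def precompute_index(last_index, mempool, apply_tx):
        # Speculatively apply every mempool transaction to a copy of the index
        # so that most of the work is already done when the block arrives.
        speculative = last_index.copy()
        for tx in mempool:
            apply_tx(speculative, tx)
        return speculative

    def finalize_on_block(speculative, mempool, block_txs, apply_tx, undo_tx):
        # Undo the mempool transactions the block did not include, then apply
        # any transactions in the block we hadn't seen before.
        included = {tx['txid'] for tx in block_txs}
        for tx in mempool:
            if tx['txid'] not in included:
                undo_tx(speculative, tx)
        seen = {tx['txid'] for tx in mempool}
        for tx in block_txs:
            if tx['txid'] not in seen:
                apply_tx(speculative, tx)
        return speculative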
Lots of optimizations, sure, but given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure, if any of those optimizations have significantly different average-case and worst-case performance it could lead to a fork; that such optimizations exist at all isn't a good thing.