This requires the tree to be completely regenerated for each block, since the hash of every transaction would change.
Read the rest of the message - that's just a toy example to illustrate the general concept. My worked example does not require the whole tree to be recalculated nor does it require the block number for lookup.
The attack on the system I suggested would require having lots of transactions that have the same initial bits in their hash.
If you had 1 million of them and added them to the tree, it would become quite deep. Each time you add one, every node would have to recalculate the entire path from the root to that leaf and redo all the hashing. Since each insertion rehashes a path whose length grows with the number of collisions already in the tree, the total effort would be proportional to N squared, where N is the number of colliding transactions.
I think your misunderstanding stems from the idea that the data structure and the authenticating hash are the same thing. They aren't; they're quite separate. The authenticating hash is an addition to the data structure, and can be calculated independently.
Take your example of adding a million transactions, which is unrealistic anyway as a 1MiB block can't have more than a few thousand. In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation.
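To make the two-phase idea concrete, here's a minimal sketch of that batched update, using a toy binary trie keyed by txid bits and plain SHA256. Nothing here is real Bitcoin code; the node layout, the leaf/inner prefix bytes, and all names are my own invention:

import hashlib

def H(b: bytes) -> bytes:
    # single SHA256 keeps the sketch short; Bitcoin proper double-hashes
    return hashlib.sha256(b).digest()

def bit(txid: bytes, i: int) -> int:
    return (txid[i // 8] >> (7 - i % 8)) & 1

class Node:
    def __init__(self):
        self.child = [None, None]  # subtrees indexed by the next key bit
        self.txid = None           # set on leaf nodes only
        self.hash = None           # None marks the node dirty

def insert(root: Node, txid: bytes) -> None:
    # Phase 1: structural insert only. Every node on the path is marked
    # dirty (hash = None); no hashing happens yet.
    node, depth = root, 0
    while True:
        node.hash = None
        if node.txid is not None:              # occupied leaf: push it down
            old, node.txid = node.txid, None
            down = Node()
            down.txid = old
            node.child[bit(old, depth)] = down
        if node.txid is None and node.child == [None, None]:
            node.txid = txid                   # found an empty slot
            return
        b = bit(txid, depth)
        if node.child[b] is None:
            node.child[b] = Node()
        node, depth = node.child[b], depth + 1

def rehash(node: Node) -> bytes:
    # Phase 2: recompute digests children-first. Clean subtrees return
    # their cached hash, so each dirty node is hashed once per batch.
    if node.hash is not None:
        return node.hash
    if node.txid is not None:
        node.hash = H(b'\x00' + node.txid)     # leaf
    else:
        node.hash = H(b'\x01' + b''.join(rehash(c) for c in node.child if c))
    return node.hash

# Usage: O(depth) pointer work per insert, then one hashing pass:
#   root = Node()
#   for tx in batch: insert(root, tx)
#   new_root_digest = rehash(root)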
A real Bitcoin node would basically have a limit on how much CPU time it was willing to spend on adding transactions to the UTXO set, and would collect incoming transactions into batches small enough to stay under that limit. Given we're only talking tens of updates per second, chances are this optimization wouldn't even be required anyway.
Anyway, thinking about this a bit more, on reflection you are right, and I think this unbalanced tree stuff is a total non-issue in any radix tree or similar structure where depth is a function of common-prefix length. For the 1.6 million transactions in the UTXO set you would expect a tree about 20 levels deep (log2 of 1.6 million). On the other hand, finding a transaction with n bits in common with another transaction's hash requires 2^n work, so right there you aren't going to see depths much beyond 64 even with highly unrealistic amounts of effort thrown at the problem. (Bitcoin has produced a 66-bit zero-prefix hash IIRC.)
The issue with long depths is purely proof size, and 64*n isn't much worse than 20*n, so as you suggest, why make things complicated?
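For concreteness, the numbers above fall straight out of the arithmetic; this is my own back-of-the-envelope, not anything from the protocol:

import math

print(math.log2(1.6e6))   # ~20.6: expected depth for 1.6M random txids

for n in (20, 64):
    print(n, 2 ** n)      # hashes needed to grind an n-bit shared prefix:
                          # 2^20 is about a million (trivial); 2^64 is not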
Too bad, it was a clever idea.
If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.
You will probably have to do something like that anyway. The plan was to merge mine, so some Bitcoin blocks won't have matching tree root hashes. So: prove it wasn't spent as of, say, 10 blocks back, and then check the last 10 blocks manually.
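Roughly the shape I have in mind, with every name and helper here hypothetical:

def prove_unspent_now(txid, chain, lookback=10):
    # `chain` and its methods are stand-ins, just to show the two-part
    # shape: a committed proof plus a manual check of the recent tail.
    anchor = chain.height - lookback           # last trusted UTXO digest
    return {
        'utxo_proof': chain.utxo_inclusion_proof(anchor, txid),  # unspent as of anchor
        'tail': [chain.block(h) for h in range(anchor + 1, chain.height + 1)],
    }
# A verifier checks the inclusion proof against the merge-mined digest,
# then scans the tail blocks to confirm txid wasn't spent since.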
Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient.
Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set. This would turn the whole shebang into a proper chain and thus allow clients to figure out both which is the longest chain, and how much hashing power is being devoted to it.
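In other words, each commitment carries a back-pointer; a minimal sketch, with field names invented for illustration:

from dataclasses import dataclass

@dataclass
class UTXOCommitment:
    utxo_root: bytes        # 32-byte digest of the UTXO set at this block
    prev_commitment: bytes  # hash of the previous valid commitment

# The back-pointer chains the commitments together, so a client can walk
# back through them, pick the longest such chain, and total up the proof
# of work endorsing it.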
Equally, if we think in terms of merge-mining a chain, adding support for additional UTXO indexes, such as known scriptPubKeys and so forth, is just a matter of adding additional chains to be merge-mined, and UTXO-only and UTXO+scriptPubKey can co-exist just fine in this scenario. A disadvantage is we'll need to add some stuff to the P2P network, but I have another idea...
So right now, merge-mined chains have the problem that the proof of work goes through the coinbase transaction. The issue here is that to prove a path to the proof of work you need the whole coinbase transaction, and it can be quite large, for instance in the case of P2Pool or Eligius. So I'm proposing a new standard where one transaction of the following form is used to include an additional merge-mined digest. That transaction will always contain exactly one txin and one txout of value zero, using the following:
scriptSig: <32-byte digest>
scriptPubKey: (empty)
Any transaction matching this form will be assumed to represent the miner's hashing power, so miners should not allow such transactions into their blocks blindly. They are currently non-standard, so this will not happen by default, and the scriptSig has no legitimate use. The scriptPubKey is spendable by the scriptSig, so for the txin miners would usually use the txout created by a previous block following this standard; if none are available, the miner can insert an additional zero-fee transaction creating a suitable txout (of zero value!) in the block.
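As a sketch of the transaction's shape, using plain dictionaries rather than a real serialization (push opcodes, outpoint encoding and so on are elided):

def make_digest_tx(digest: bytes, prev_digest_outpoint: tuple) -> dict:
    # Build the special merge-mining transaction described above:
    # exactly one txin and one zero-value txout.
    assert len(digest) == 32
    return {
        'vin': [{
            'outpoint': prev_digest_outpoint,  # txout created by an earlier
                                               # block following this standard
            'scriptSig': digest,               # just the 32-byte digest pushed
        }],
        'vout': [{
            'value': 0,                        # always zero value
            'scriptPubKey': b'',               # empty: spendable by any
                                               # scriptSig leaving true on the stack
        }],
    }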
The digest would of course represent the tip of a merkle tree. Every merge-mined digest in that tree will have a zero byte appended, and then the digests will be hashed together. What would go in the alt-chain is then the merkle path: every digest required to get from the alt-chain's own hash to the transaction, and which side each one was on. Note how this is different from the merge-mining standard currently used by Namecoin, and fixes the issues it has with conflicting slot numbers.
Appending the zero byte is critical, because it means that to verify an alt-chain block hash was legitimately merge-mined you simply check that every other digest in the path to the transaction has exactly 32 bytes, and that the transaction itself follows the above form. Note this also means the miner has the flexibility to use something other than a merkle tree to combine the alt-chain block hashes if they want; alt-chains should put reasonable limits on PoW size.
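Verification might then look something like this sketch, assuming the zero byte is appended to the leaf digest before hashing and double-SHA256 throughout; the exact hashing is for the standard to pin down:

import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merge_mined_root(alt_block_hash: bytes, path) -> bytes:
    # `path` is the merkle path: (sibling_digest, sibling_on_left) pairs,
    # ordered from the leaf upward.
    node = H(alt_block_hash + b'\x00')   # the critical zero byte
    for sibling, on_left in path:
        if len(sibling) != 32:           # the 32-byte check described above
            raise ValueError('path element is not a bare 32-byte digest')
        node = H(sibling + node) if on_left else H(node + sibling)
    return node

# The proof is accepted if this equals the 32-byte digest in the scriptSig
# and the containing transaction follows the form above.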
Alt-chains should also still support the same idea in the coinbase, for miners that don't plan on making large coinbase transactions. (the majority)
Now, the useful part: we can easily add more than one of these transactions, and distinguish them by different sequence numbers in the txin. You would do that for the UTXO set, with one value of nSequence defined for just the UTXO set itself, another for a scriptPubKey index, and whatever else gets added as the system is improved. The previous block where this miner considered the merge-mined UTXO digest to be valid can be specified with nLockTime. The advantage of this idea is that because the final UTXO digests are completely deterministic, we don't need to build a new P2P network to pass around the digests in the merkle path to the coinbase.
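Concretely, the distinguishing scheme could be as simple as the following; these particular values are made up for illustration:

# Hypothetical nSequence assignments; the actual values would be fixed
# by the standard:
SEQ_UTXO_SET     = 0   # digest covers the bare UTXO set
SEQ_SCRIPTPUBKEY = 1   # digest covers the scriptPubKey index
# ...and so on as further indexes are defined.

# nLockTime in the same transaction then names the previous block whose
# merge-mined UTXO digest this miner considered valid.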
This also shows the advantage of using a separate transaction: it keeps the UTXO proof size to a minimum, which matters for applications like fidelity-bonded banks where UTXO proofs would become part of fraud proofs. Equally, being able to prove the hashing power devoted to the UTXO-set merge-mined chain is useful, and the only way to do that is to provide a bunch of recent UTXO block headers and associated PoWs. Again, keeping the size down is desirable, as the SPV node examining those headers may be on a low-bandwidth connection.