Network Protocol
This can be implemented without a soft fork of any kind. It is purely a protocol change, so it should be less controversial.
- Automatic reindex of undo info (run once)
- Add new service bit EXTENDED_TX
- Update transaction serialization
- Add upgrade method to add info to transactions received from legacy clients
- Check/Upgrade raw transactions received over rpc
- Update the add to memory pool code to only allow new type transactions
- Update the code that reads blocks from disk to use undo info to add extra information
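The post doesn't say what the "extra information" is. The sketch below assumes it is the output spent by each input, and shows an upgrade step for legacy transactions under that assumption. `spent_lookup` stands in for either the UTXO set (memory pool path) or a block's undo data (disk read path). All type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical types: an OutPoint is (txid, output index).
OutPoint = Tuple[bytes, int]


@dataclass(frozen=True)
class TxOut:
    value: int  # amount in satoshis
    script_pubkey: bytes


@dataclass
class LegacyTx:
    inputs: List[OutPoint]
    outputs: List[TxOut]


@dataclass
class ExtendedTx:
    inputs: List[OutPoint]
    outputs: List[TxOut]
    # Assumed "extra information": the output spent by each input, in input order.
    spent_outputs: List[TxOut]


def upgrade_legacy_tx(tx: LegacyTx,
                      spent_lookup: Dict[OutPoint, TxOut]) -> Optional[ExtendedTx]:
    """Attach spent-output info to a legacy transaction.

    Returns None if any input is missing from spent_lookup.
    """
    spent = []
    for prevout in tx.inputs:
        txout = spent_lookup.get(prevout)
        if txout is None:
            return None  # cannot upgrade; caller would reject or defer the tx
        spent.append(txout)
    return ExtendedTx(list(tx.inputs), list(tx.outputs), spent)
```

With this split, the memory pool rule ("only allow new type transactions") becomes: run `upgrade_legacy_tx` on anything arriving from a legacy peer and drop it if the result is `None`.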
I think the service bit is a waste of a bit. Eventually most nodes would upgrade, so a UTXO-lite node could just disconnect from legacy clients when making outbound connections.
Soft Fork
The canonical digest could be defined as Hash(canonical_salt | new OutPoint serialization).
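A minimal sketch of the canonical digest follows. The post only says `Hash` and "new OutPoint serialization", so three things here are assumptions:

- single SHA-256 is the hash;
- the legacy 36-byte layout (32-byte txid plus 4-byte little-endian index) is the serialization;
- `|` means concatenation.

Truncation to 8 bytes matches the per-entry size used in the RAM estimate further down.

```python
import hashlib
import struct

DIGEST_BYTES = 8  # assumption: truncated digest, as in the 8-bytes-per-entry estimate


def serialize_outpoint(txid: bytes, index: int) -> bytes:
    # Assumption: legacy layout, 32-byte txid then 4-byte little-endian index.
    if len(txid) != 32:
        raise ValueError("txid must be 32 bytes")
    return txid + struct.pack("<I", index)


def canonical_digest(canonical_salt: bytes, txid: bytes, index: int) -> bytes:
    # Hash(canonical_salt | OutPoint serialization), "|" read as concatenation.
    data = canonical_salt + serialize_outpoint(txid, index)
    return hashlib.sha256(data).digest()[:DIGEST_BYTES]
```

While the rule is inactive the salt is all zeros, for example `canonical_digest(bytes(32), txid, 0)`.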
New nodes would use block version 4, with the normal thresholds for rule activation.
If 750 of the previous 1000 blocks are version 4 or higher, the rule is active for all blocks. Otherwise, the canonical salt is reset to zero.
If the rule is active and the canonical salt is set to zero, then the canonical salt is set to the hash of the block being connected to the chain and a collision test is performed. If that causes a collision (after UTXO set updates for the current block), the canonical salt is reset to zero.
Once a non-colliding canonical salt is found, it is locked and subsequent blocks which cause a collision are rejected.
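The salt rules above can be sketched as a small state machine. `rule_active` is the 750/1000 result. `has_collision(salt)` is a hypothetical callback assumed to test the UTXO set after the current block's updates.

One interpretive assumption matters here. Dropping below the threshold is read as resetting even a locked salt, which is what makes the activate/deactivate DoS concern relevant.

```python
from typing import Callable

ZERO_SALT = bytes(32)


class CanonicalSalt:
    """Tracks the canonical salt across connected blocks (sketch)."""

    def __init__(self) -> None:
        self.salt = ZERO_SALT

    @property
    def locked(self) -> bool:
        return self.salt != ZERO_SALT

    def connect_block(self, rule_active: bool, block_hash: bytes,
                      has_collision: Callable[[bytes], bool]) -> bool:
        """Return False if the block must be rejected."""
        if not rule_active:
            # Below the 750/1000 threshold: the salt is reset to zero.
            self.salt = ZERO_SALT
            return True
        if not self.locked:
            # Try the hash of the block being connected as the salt.
            if not has_collision(block_hash):
                self.salt = block_hash  # non-colliding: lock it
            return True  # a colliding candidate leaves the salt at zero
        # Salt is locked: a block that causes a collision is rejected.
        return not has_collision(self.salt)
```

A colliding candidate does not reject the block. The node simply tries the next block's hash.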
If 950 of the previous 1000 blocks are version 4 or higher, version 3 and lower blocks are rejected.
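Both super-majority thresholds (750 to activate the rule, 950 to reject old blocks) can be sketched as counts over the previous 1000 block versions. "750 of" is read as "at least 750". The function names are illustrative.

```python
from typing import Sequence

WINDOW = 1000
ACTIVATION_THRESHOLD = 750  # rule is active for all blocks
REJECTION_THRESHOLD = 950   # version 3 and lower blocks are rejected
NEW_VERSION = 4


def upgraded_count(prev_versions: Sequence[int]) -> int:
    """Count version >= 4 blocks among the previous WINDOW blocks (oldest first)."""
    return sum(1 for v in prev_versions[-WINDOW:] if v >= NEW_VERSION)


def rule_active(prev_versions: Sequence[int]) -> bool:
    return upgraded_count(prev_versions) >= ACTIVATION_THRESHOLD


def version_accepted(version: int, prev_versions: Sequence[int]) -> bool:
    if version >= NEW_VERSION:
        return True
    return upgraded_count(prev_versions) < REJECTION_THRESHOLD
```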
I need to measure how long it would take to iterate through the UTXO set checking for collisions. If this uses significant CPU/disk resources, then a DoS attack could be launched by repeatedly activating and deactivating the rule.
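The collision test is a single pass over the UTXO set's outpoints that keeps a set of seen digests and stops at the first repeat. This sketch makes the same assumptions as the digest above: SHA-256 over salt, txid and little-endian index, truncated to `digest_bytes`. A real node would stream outpoints from its database rather than take a Python iterable.

```python
import hashlib
import struct
from typing import Iterable, Tuple


def has_collision(salt: bytes, outpoints: Iterable[Tuple[bytes, int]],
                  digest_bytes: int = 8) -> bool:
    """Return True if two outpoints share a truncated salted digest."""
    seen = set()  # dominates RAM use: one entry per unspent output
    for txid, index in outpoints:
        data = salt + txid + struct.pack("<I", index)
        digest = hashlib.sha256(data).digest()[:digest_bytes]
        if digest in seen:
            return True
        seen.add(digest)
    return False
```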
If the check takes more than about a minute, then either a more efficient way to test a new canonical salt would be required, or hysteresis would need to be added to the super-majority function.
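Hysteresis could look like the sketch below. Activation still needs 750 version 4 blocks in the window, but deactivation only happens once the count falls below a lower threshold. The value 650 is purely hypothetical. With a single threshold, a count hovering around 750 can flip the rule every block and force a full collision test on each reactivation. With the gap, the count has to swing by 100 blocks first.

```python
ACTIVATE_AT = 750        # from the post: 750 of the previous 1000 blocks
DEACTIVATE_BELOW = 650   # hypothetical lower threshold


def next_rule_state(was_active: bool, upgraded_in_window: int) -> bool:
    """Super-majority with hysteresis: separate on and off thresholds."""
    if was_active:
        return upgraded_in_window >= DEACTIVATE_BELOW
    return upgraded_in_window >= ACTIVATE_AT
```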
With 20 million entries at 8 bytes per entry, building the set needs 160MB of RAM. A hash set would need closer to 200-250MB. This is probably OK for a one-off burst.
It is also 20 million hashes. At around 1 million hashes per second, that is 20 seconds just for the hashing.
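The back-of-envelope numbers above can be reproduced directly from the post's own figures, using decimal megabytes. The code also shows that the 200-250MB hash-set estimate corresponds to about 10-12.5 bytes per entry.

```python
UTXO_ENTRIES = 20_000_000
BYTES_PER_ENTRY = 8            # truncated digest size
HASHES_PER_SECOND = 1_000_000  # the post's rough CPU figure

raw_set_mb = UTXO_ENTRIES * BYTES_PER_ENTRY / 1_000_000  # decimal MB
hash_seconds = UTXO_ENTRIES / HASHES_PER_SECOND

# Bytes per entry implied by the 200-250MB hash-set estimate.
overhead_bytes = (200_000_000 / UTXO_ENTRIES, 250_000_000 / UTXO_ENTRIES)

print(f"raw set: {raw_set_mb:.0f} MB, hashing: {hash_seconds:.0f} s")
# prints: raw set: 160 MB, hashing: 20 s
```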
[Edit]
I wonder if the soft fork is really worth it. The collision avoidance is likely not worth it, since local salting is still required to protect against finding a collision with the CanonicalDigests.
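The local salting that is still required can be sketched as a per-node secret salt. Because the canonical salt is public, an attacker can search for transactions whose outpoints collide under it. A salt drawn locally from `os.urandom` is unknown to them, so they cannot target one node's digest table. `LocalDigester` is a hypothetical helper, and the hash layout repeats the SHA-256 assumption used for the canonical digest.

```python
import hashlib
import os
import struct
from typing import Optional


class LocalDigester:
    """Hypothetical helper: per-node digests keyed by a secret random salt."""

    def __init__(self, digest_bytes: int = 8, salt: Optional[bytes] = None) -> None:
        # A fresh random salt per node unless one is supplied (e.g. reloaded from disk).
        self.salt = salt if salt is not None else os.urandom(32)
        self.digest_bytes = digest_bytes

    def digest(self, txid: bytes, index: int) -> bytes:
        data = self.salt + txid + struct.pack("<I", index)
        return hashlib.sha256(data).digest()[: self.digest_bytes]
```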
[Edit2]
UTXO commitments should probably still use the full 32 bytes, since they can't be protected by a local salt.