A couple corrections:
In the description for Section C, drop the word "addresses", as there is no address to represent a complex output script (other than P2SH). You also need to add an element to represent "m" (the number of required signatures); "n" can be determined implicitly from the number of PubKeys. This is pretty easy to achieve, as native multisig (not to be confused with P2SH) is limited to m <= 2 and n <= 3, resulting in only five possible combinations (1 of 1, 1 of 2, 1 of 3, 2 of 2, 2 of 3).
Pay2PubKey and native multisig outputs use PubKeys, not PubKeyHashes, and they are 33 or 65 bytes, not 20 bytes. Only ScriptHashes and PubKeyHashes are 20 bytes. You could handle this either by treating compressed (33 bytes each) and uncompressed (65 bytes each) PubKeys as separate types ("tag" in your format) or by just parsing the first byte (0x02 or 0x03 = compressed = 32 bytes follow, 0x04 = uncompressed = 64 bytes follow). The good news is that 99.9%+ of spendable outputs fit one of 7 standard templates. By also having an unknown type and a standard for storing arbitrary scripts (type 0 = unknown), you can store all possible claimants.
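A minimal Python sketch of the "parse the first byte" option, assuming the PubKey starts at a known offset in a byte buffer. The function names `pubkey_length` and `read_pubkey` are made up for illustration and are not part of any existing format:

```python
from typing import Tuple


def pubkey_length(first_byte: int) -> int:
    """Return the total serialized PubKey length implied by its prefix byte."""
    if first_byte in (0x02, 0x03):
        return 33  # prefix + 32 bytes (compressed)
    if first_byte == 0x04:
        return 65  # prefix + 64 bytes (uncompressed)
    raise ValueError(f"not a standard PubKey prefix: {first_byte:#04x}")


def read_pubkey(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one PubKey from buf at offset; return (pubkey, next_offset)."""
    if offset >= len(buf):
        raise ValueError("offset past end of buffer")
    end = offset + pubkey_length(buf[offset])
    if end > len(buf):
        raise ValueError("truncated PubKey")
    return buf[offset:end], end
```

With this approach a single tag can cover both key sizes, at the cost of one branch per key when reading.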
Something like this:
Type  Description               UTXO Template                                              Record_Length  Storage_Format
-----------------------------------------------------------------------------------------------------------------------------------------
0     Unknown                   -none-                                                     variable       0x00
1     Pay2PubkeyHash            OP_DUP OP_HASH160 <PubKeyHash> OP_EQUALVERIFY OP_CHECKSIG  29 bytes       0x01
2     Pay2PubKey                <PubKey> OP_CHECKSIG                                       42|74 bytes    0x02
3     Pay2ScriptHash (P2SH)     OP_HASH160 <ScriptHash> OP_EQUAL                           29 bytes       0x03
4     Native Multisig (1 of 1)  OP_1 <PubKey> OP_1 OP_CHECKMULTISIG                                       0x08
5     Native Multisig (1 of 2)  OP_1 <PubKey> <PubKey> OP_2 OP_CHECKMULTISIG                              0x04
6     Native Multisig (2 of 2)  OP_2 <PubKey> <PubKey> OP_2 OP_CHECKMULTISIG                              0x05
7     Native Multisig (1 of 3)  OP_1 <PubKey> <PubKey> <PubKey> OP_3 OP_CHECKMULTISIG                     0x06
8     Native Multisig (2 of 3)  OP_2 <PubKey> <PubKey> <PubKey> OP_3 OP_CHECKMULTISIG                     0x07
A couple of notes:
1) Yes, 1 of 1 "multisig" is a valid output. It can probably be combined with Pay2PubKey entries, as the input format of both is identical.
2) It may be better to break the types down to separate uncompressed and compressed. With multisig they can be mixed so there are multiple combinations (i.e. for x of 3: UUU, UUC, UCC, CCC).
3) Scripts are limited to 10,000 bytes so for type 0 that puts an upper bound on the length of unknown scripts. In practice scripts are much smaller on average.
4) Unknown (type 0) could be broken out to identify more discrete templates.
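To make the type numbers concrete, here is a rough Python sketch of a classifier that maps a raw output script to the types in the table above. It assumes PubKeys are pushed with a direct one-byte length (33 or 65) and does not validate the keys themselves; `classify` and `_read_pubkeys` are illustrative names, not an existing API:

```python
from typing import List, Optional

OP_1 = 0x51
OP_DUP, OP_HASH160, OP_EQUAL, OP_EQUALVERIFY = 0x76, 0xA9, 0x87, 0x88
OP_CHECKSIG, OP_CHECKMULTISIG = 0xAC, 0xAE

# (m, n) -> type number, following the table above
MULTISIG_TYPES = {(1, 1): 4, (1, 2): 5, (2, 2): 6, (1, 3): 7, (2, 3): 8}


def _read_pubkeys(body: bytes) -> Optional[List[bytes]]:
    """Split a run of direct pushes of 33- or 65-byte PubKeys; None if malformed."""
    keys, i = [], 0
    while i < len(body):
        size = body[i]
        key = body[i + 1 : i + 1 + size]
        if size not in (33, 65) or len(key) != size:
            return None
        keys.append(key)
        i += 1 + size
    return keys


def classify(script: bytes) -> int:
    """Return the table's type number for an output script (0 = unknown)."""
    n = len(script)
    if (n == 25 and script[:3] == bytes([OP_DUP, OP_HASH160, 20])
            and script[23:] == bytes([OP_EQUALVERIFY, OP_CHECKSIG])):
        return 1  # Pay2PubkeyHash
    if n == 23 and script[:2] == bytes([OP_HASH160, 20]) and script[22] == OP_EQUAL:
        return 3  # Pay2ScriptHash
    if n in (35, 67) and script[0] == n - 2 and script[-1] == OP_CHECKSIG:
        return 2  # Pay2PubKey (compressed or uncompressed)
    if n >= 3 and script[-1] == OP_CHECKMULTISIG:
        m, count = script[0] - OP_1 + 1, script[-2] - OP_1 + 1
        keys = _read_pubkeys(script[1:-2])
        if keys is not None and len(keys) == count:
            return MULTISIG_TYPES.get((m, count), 0)
    return 0
```

Anything that falls through to 0 would be stored as a raw script, so the classifier can afford to be strict.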
All unspendable outputs can be dropped. Outputs matching the following templates are unspendable (either by design or by error) and can be dropped from snapshot.bin. There is no more reason to keep them than to keep spent outputs, but it doesn't make much difference, as there are fewer than 2K such outputs currently in the blockchain.
OP_DUP OP_HASH160 OP_0 OP_EQUALVERIFY OP_CHECKSIG
OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH
OP_RETURN
OP_RETURN
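A short Python sketch of such a filter. It matches the two fixed templates byte for byte and treats any script that begins with OP_RETURN as unspendable, which is an assumption about how the OP_RETURN entries are meant to be read. The names `is_unspendable` and `drop_unspendable` are hypothetical:

```python
from typing import Iterable, List, Tuple

# Exact byte encodings of the two fixed templates listed above
UNSPENDABLE_EXACT = {
    # OP_DUP OP_HASH160 OP_0 OP_EQUALVERIFY OP_CHECKSIG
    bytes([0x76, 0xA9, 0x00, 0x88, 0xAC]),
    # OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH
    bytes([0x73, 0x63, 0x72, 0x69, 0x70, 0x74]),
}
OP_RETURN = 0x6A


def is_unspendable(script: bytes) -> bool:
    """True if the script matches a known unspendable template."""
    return script in UNSPENDABLE_EXACT or script[:1] == bytes([OP_RETURN])


def drop_unspendable(outputs: Iterable[Tuple[bytes, int]]) -> List[Tuple[bytes, int]]:
    """Keep only (script, value) pairs that could still be claimed."""
    return [(script, value) for script, value in outputs if not is_unspendable(script)]
```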
An alternative idea is to recognize that complex scripts (including native multisig) have different properties and thus treat them differently. Pay2PubKeyHash, Pay2PubKey, and Pay2ScriptHash make up the majority of the outputs (>99.97% by total created outputs; let's see how that holds up by value), and they have a single identifying characteristic (a hash or a key). To normalize them and reduce the size, a hash could be taken of the PubKeys for formats 2 & 4. This would make all 4 formats (1-4) representable by a single hash and value.
All other outputs could just be stored as raw scripts. While it may not be the most efficient, remember they make up 0.03% of outputs and that will probably shrink due to P2SH, so how you handle the top 4 templates is going to determine the bulk of the snapshot ledger size.
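A rough Python sketch of that split. It assumes each output has already been reduced to a `(type_id, payload, value)` tuple, where the payload is the 20-byte hash for types 1 and 3, the single PubKey for types 2 and 4, and the raw script otherwise. Using HASH160 (RIPEMD160 of SHA256) for the PubKeys is my assumption, chosen so the result has the same 20-byte size as the other identifiers; the hash is passed in as a parameter so another one can be swapped in. `split_snapshot` is an illustrative name:

```python
import hashlib
from typing import Callable, Iterable, List, Tuple

Entry = Tuple[bytes, int]


def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(data)); needs a hashlib build that still provides ripemd160."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()


def split_snapshot(
    outputs: Iterable[Tuple[int, bytes, int]],
    hash_pubkey: Callable[[bytes], bytes] = hash160,
) -> Tuple[List[Entry], List[Entry]]:
    """Return (single-identifier entries in sorted order, raw-script entries)."""
    simple: List[Entry] = []
    complex_: List[Entry] = []
    for type_id, payload, value in outputs:
        if type_id in (1, 3):        # already a 20-byte hash
            simple.append((payload, value))
        elif type_id in (2, 4):      # one PubKey: normalize to a hash
            simple.append((hash_pubkey(payload), value))
        else:                        # everything else: keep the raw script
            complex_.append((payload, value))
    simple.sort()
    return simple, complex_
```

Keeping the identifiers sorted allows a binary search when looking up a claimant.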
0x04E // single-identifier claimants (>99.97% of outputs: Pay2PubKeyHash, Hash(Pay2PubKey), Pay2ScriptHash)
//all identifiers in sorted order
0xB4 // complex script outputs (to include native multisig - upper limit is ~30,000 entries)
As a side note, it would have been useful if Bitcoin (or any derived coin) had moved all scripting to the tx inputs and kept all outputs as just script hashes (i.e. <ScriptHash> vs OP_HASH160 <ScriptHash> OP_EQUAL). It would make transactions smaller by removing excess opcodes and, more importantly, by shifting the "weight" of the tx to the input it would reduce the size of the UTXO set (which records the set of unspent outputs). As a side benefit, it would make things like snapshot.bin easier and lighter. Case in point: the current UTXO set spends ~35 bytes per output. If that could be reduced to an output hash (20 bytes) and a value stored as a varint (4 bytes on average), it would reduce the size of the UTXO set by roughly a third. Bitcoin could move towards a system like that by soft fork (make non-P2SH outputs non-standard after a certain point), and the UTXO set would be normalized over time as non-P2SH outputs are spent and replaced with P2SH ones. Just something for future altcoin developers to consider.
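The "roughly a third" estimate can be checked with a few lines of Python, using only the per-output figures quoted above (the ~35 bytes and the 4-byte average varint are the post's estimates, not measurements):

```python
current_bytes = 35   # approximate per-output cost quoted above
hash_bytes = 20      # output hash
value_bytes = 4      # average varint size assumed above

proposed_bytes = hash_bytes + value_bytes
saving = (current_bytes - proposed_bytes) / current_bytes

print(f"{proposed_bytes} bytes per output, {saving:.0%} smaller")
# prints "24 bytes per output, 31% smaller"
```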