Thank you, and sorry for the delay!
Firstbits Definition (Draft)
Firstbits Base Representation
All Bitcoin addresses that appear in the block chain using a well defined set of recognized transaction output formats have a unique firstbits base representation.
I don't think this is true or needed. Some addresses have no firstbits base representation, because an address differing only by case appeared earlier.
The firstbits base representation of a bitcoin address is the shortest non-empty substring starting with the first character of the Bitcoin address and converted to lower-case, which does not collide with the firstbits base representation of another address appearing in a transaction output earlier in the block chain.
This is good but could be clearer. How about:
Thus, a firstbits base representation is always in lower-case.
The order of appearance for addresses in transaction outputs in the same block follows the order of transaction outputs in the raw block data.
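To make the definition concrete, here is a minimal sketch of assigning base representations to addresses given in order of appearance. It assumes "collide" means that a lower-cased prefix is also a prefix of some earlier address, which is consistent with the note above that an address differing only by case from an earlier one gets no base representation. The function name `assign_firstbits` is hypothetical and not taken from any existing implementation.

```python
from typing import Dict, Iterable, Optional


def assign_firstbits(addresses: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each address to its firstbits base representation, or None.

    `addresses` must be in order of appearance in the block chain.
    Assumption: a prefix collides if it is a lower-cased prefix of any
    address that appeared earlier.
    """
    seen_prefixes = set()  # every lower-cased prefix of every earlier address
    result: Dict[str, Optional[str]] = {}
    for addr in addresses:
        if addr in result:
            continue  # only the first appearance matters
        lower = addr.lower()
        base = None
        for n in range(1, len(lower) + 1):
            if lower[:n] not in seen_prefixes:
                base = lower[:n]
                break
        result[addr] = base  # stays None if even the full address collides
        for n in range(1, len(lower) + 1):
            seen_prefixes.add(lower[:n])
    return result
```

With only two addresses in the input, the second one gets a two-character base representation; on the real chain it needs more characters because many other addresses appeared in between.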
Firstbits Addresses
A Bitcoin address can have several valid firstbits addresses. A firstbits address can be derived from a firstbits base representation by appending additional characters from the Bitcoin address up to and including the last character of the Bitcoin address. A firstbits address can have any combination of upper and lower case.
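As a sketch of that rule, a candidate string is a valid firstbits address when, ignoring case, it starts with the base representation and is itself a prefix of the full Bitcoin address. The helper name `is_firstbits_address` is hypothetical.

```python
def is_firstbits_address(candidate: str, address: str, base: str) -> bool:
    """Check `candidate` against a Bitcoin address and its base representation.

    `base` is assumed to be the (lower-case, non-empty) firstbits base
    representation already determined for `address`.
    """
    c = candidate.lower()
    # Must extend the base representation and stay within the address.
    return c.startswith(base) and address.lower().startswith(c)
```

For instance, with base representation `1sgtsp`, the string `1SgTsp` is accepted while `1sgt` is rejected because it is shorter than the base representation.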
Example 1
Bitcoin address: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
Firstbits base representation: 1
Firstbits address examples: 1, 1a, 1A, 1a1z, 1A1z, 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
Example 2
Bitcoin address: 1SgTspiKe5HHkjdSeD72q9WsiJhRiaxf9
Firstbits base representation: 1sgtsp
Firstbits address examples: 1sgtsp, 1SgTsp, 1SgTspiKe5HHkjdSeD72q9WsiJhRiaxf9
Recognized Transaction Output Formats
An address is said to appear in a block when it appears in one of the block's transactions' pubkey, pubkey hash, or script hash outputs.
Pubkey Output
A pubkey output is a transaction output with a script of the form:
<pubKey> OP_CHECKSIG
An address is said to appear in a pubkey output when its version byte is 0 and its key hash equals the 160-bit hash of the output script's push operand (pubKey).
Pubkey Hash Output
A pubkey hash output is a transaction output with a script of the form:
OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
An address is said to appear in a pubkey hash output when its version byte is 0 and its 160-bit key hash equals the output script's push operand (pubKeyHash).
Script Hash Output
A script hash output is a transaction output with a script of the form:
OP_HASH160 <scriptHash> OP_EQUAL
An address is said to appear in a script hash output when its version byte is 5 and its 160-bit key hash equals the output script's push operand (scriptHash).
In particular, output scripts not matching the above three cases cannot affect whether an address appears in a block, nor can input scripts. For example, a script containing OP_DROP or OP_NOP cannot match the cases, but the push operand in a pubkey output may have any length.
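The three cases can be sketched as a classifier over the raw output script bytes. This is an illustration under assumptions, not Abe's or blockchain.info's code: the names `read_push` and `classify_output` are hypothetical, the OP_PUSHDATA1/2/4 encodings are accepted as pushes, and the hash operands are required to be exactly 20 bytes since only then can they equal a 160-bit hash.

```python
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E
OP_DUP, OP_EQUAL, OP_EQUALVERIFY = 0x76, 0x87, 0x88
OP_HASH160, OP_CHECKSIG = 0xA9, 0xAC


def read_push(script: bytes, pos: int):
    """Return (operand, next_pos) if a data push starts at pos, else None."""
    if pos >= len(script):
        return None
    op = script[pos]
    pos += 1
    if 0x01 <= op <= 0x4B:  # opcode is the operand length
        size = op
    elif op in (OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4):
        width = {OP_PUSHDATA1: 1, OP_PUSHDATA2: 2, OP_PUSHDATA4: 4}[op]
        if pos + width > len(script):
            return None
        size = int.from_bytes(script[pos:pos + width], "little")
        pos += width
    else:
        return None
    if pos + size > len(script):
        return None
    return script[pos:pos + size], pos + size


def classify_output(script: bytes):
    """Return (kind, version_byte, operand) for a recognized script, else None.

    For "pubkey" the operand is the raw key; its 160-bit hash still has
    to be computed before comparing with an address.
    """
    push = read_push(script, 0)
    if push is not None and script[push[1]:] == bytes([OP_CHECKSIG]):
        return ("pubkey", 0, push[0])  # operand may have any length
    if script[:2] == bytes([OP_DUP, OP_HASH160]):
        push = read_push(script, 2)
        tail = bytes([OP_EQUALVERIFY, OP_CHECKSIG])
        if push is not None and len(push[0]) == 20 and script[push[1]:] == tail:
            return ("pubkey_hash", 0, push[0])
    if script[:1] == bytes([OP_HASH160]):
        push = read_push(script, 1)
        if push is not None and len(push[0]) == 20 and script[push[1]:] == bytes([OP_EQUAL]):
            return ("script_hash", 5, push[0])
    return None
```

Anything that returns `None` here, such as a script with a trailing OP_NOP, is ignored when deciding whether an address appears in a block.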
It is actually more complex to regard all transactions in a block in a holistic way rather than just handling them one at a time.
However, here is the definition we got so far for transactions in the same block (which is actually used today):
The order of appearance for addresses in transaction outputs in the same block follows the order of transaction outputs in the raw block data.
If you make a less complex description that unambiguously describes your way of handling collisions I'll agree with you.
I don't have a strong preference. Abe does use enough characters to distinguish from addresses later in the block. I think the code was simpler before I did this. I don't think conceptually either way is more complex; you choose between defining "order of appearance" and "same or earlier block". I strongly suggest finding an example and checking blockchain.info's treatment of same-block collisions, since that site obviously has a lot of work put into it and may be unwilling to change. And let's try to include piuk.
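To compare against the "order of appearance" rule, here is a rough sketch of the "same or earlier block" alternative, in which an address new in a block gets enough characters to distinguish it from every other address in that block or earlier ones. The function name `firstbits_by_block` is hypothetical, the treatment of same-block addresses differing only by case (neither gets a base representation) is an assumption, and this is not a copy of Abe's code.

```python
from typing import Dict, List, Optional


def firstbits_by_block(blocks: List[List[str]]) -> Dict[str, Optional[str]]:
    """blocks: per-block lists of addresses, in chain order."""
    earlier = set()  # lower-cased prefixes of addresses from earlier blocks
    result: Dict[str, Optional[str]] = {}
    for block in blocks:
        # Distinct addresses making their first appearance in this block.
        new: List[str] = []
        for addr in block:
            if addr not in result and addr not in new:
                new.append(addr)
        # How many new addresses share each lower-cased prefix.
        counts: Dict[str, int] = {}
        for addr in new:
            low = addr.lower()
            for n in range(1, len(low) + 1):
                counts[low[:n]] = counts.get(low[:n], 0) + 1
        for addr in new:
            low = addr.lower()
            result[addr] = next(
                (low[:n] for n in range(1, len(low) + 1)
                 if low[:n] not in earlier and counts[low[:n]] == 1),
                None,
            )
        earlier.update(counts)  # adds this block's prefixes
    return result
```

With the toy strings `1Abc` and `1Abd` in a single block, this gives `1abc` and `1abd`, whereas the order-of-appearance rule would give the first one just `1`.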