With a 16-bit checksum, it is possible to fit everything in 5 words.
Knowing the block, we know how many transactions it contains, so we know how many bits to use for encoding the transaction index.
Knowing the transaction, we know how many bits to use for encoding the TxOut index.
So if we take 24 bits for the block height (16 777 216 blocks), the transaction index currently needs about 10 bits, and the TxOut index will be 1 bit most of the time (OP_RETURN + change).
So this gives us
24 + 10 + 1 + 16 = 51 bits with a 16-bit checksum (5 words)
I would like to find a more efficient way of encoding the block height though (instead of hardcoding 24 bits).
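A quick way to check this budget is sketched below. It assumes 11 bits per word (a 2048-word list, which the post does not state explicitly), and the helper names `bits_needed` and `words_needed` are made up for illustration.

```python
import math

BITS_PER_WORD = 11  # assumption: 2048-word list


def bits_needed(count: int) -> int:
    """Fewest bits that can index `count` items, never fewer than 1."""
    return max(1, (count - 1).bit_length())


def words_needed(total_bits: int) -> int:
    """Number of words required to carry `total_bits` bits."""
    return math.ceil(total_bits / BITS_PER_WORD)


height_bits = 24
tx_index_bits = bits_needed(1000)  # a block with about 1000 txs
txout_bits = bits_needed(2)        # OP_RETURN + change
checksum_bits = 16

total = height_bits + tx_index_bits + txout_bits + checksum_bits
print(total, words_needed(total))
```

Under these assumptions 5 words carry 55 bits, so the 51-bit layout leaves 4 bits to spare.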
Good idea. This saves a lot.
You certainly don't need 24 bits for the block height. It takes more than 300 years to reach that height. All cryptography in its current form should have been broken long before that. 22 bits (70 years) would be enough.
The TxOut index is controlled by the user, so let's assume it is usually 1 bit (because people want a short address).
I want at least 20 bits of checksum (1 in a million error tolerance).
I prefer to leave 1 version bit for future extension (otherwise we may need to use a completely different word list for a future encoding scheme).
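As a rough check of these figures, here is a small sketch. It assumes the 10-minute target block interval and counts from height 0, so 22 bits come out nearer 80 years than 70 (the 70-year figure presumably counts from the current height). It also assumes an ideal checksum whose values are uniformly distributed. The function names are hypothetical.

```python
MINUTES_PER_BLOCK = 10               # assumption: target block interval
MINUTES_PER_YEAR = 365.25 * 24 * 60


def years_covered(height_bits: int) -> float:
    """Years until the block height no longer fits in `height_bits` bits."""
    return (2 ** height_bits) * MINUTES_PER_BLOCK / MINUTES_PER_YEAR


def undetected_error_rate(checksum_bits: int) -> float:
    """Chance that a random corruption still passes an ideal checksum."""
    return 1 / 2 ** checksum_bits


for bits in (22, 24):
    print(bits, "bits:", round(years_covered(bits)), "years")
print("20-bit checksum:", undetected_error_rate(20))
```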
-----------------
So a 5-word address would have
1 version bit
22 block height bits
1 to 11 txIndex bits
1 to 11 txOutputIndex bits
at least 20 checksum bits
The txIndex and txOutputIndex will always use the fewest possible bits, leaving more room for the checksum.
For example, if a block is known to have only 500 txs, the txIndex will only take 9 bits, so the txOutputIndex may take at most 3 bits. If there are only 3 outputs (2 bits) in the tx, the checksum will have 55 - 1 - 22 - 9 - 2 = 21 bits.
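The example above can be reproduced with a few lines. This is only a sketch: it assumes 11 bits per word, and `bits_needed` and `checksum_bits` are illustrative names, not part of any existing scheme.

```python
BITS_PER_WORD = 11   # assumption: 2048-word list
VERSION_BITS = 1
HEIGHT_BITS = 22


def bits_needed(count: int) -> int:
    """Fewest bits that can index `count` items, never fewer than 1."""
    return max(1, (count - 1).bit_length())


def checksum_bits(num_words: int, txs_in_block: int, outputs_in_tx: int) -> int:
    """Bits left over for the checksum after the other fields are placed."""
    used = (VERSION_BITS + HEIGHT_BITS
            + bits_needed(txs_in_block) + bits_needed(outputs_in_tx))
    return num_words * BITS_PER_WORD - used


# 500 txs -> 9 bits, 3 outputs -> 2 bits, so 55 - (1 + 22 + 9 + 2) = 21
print(checksum_bits(5, 500, 3))
```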
EDIT: If there are not enough bits left to fully encode txOutputIndex, the earlier outputs are assumed. For example, if there are 14 outputs (4 bits) in the tx but only 2 bits are left for txOutputIndex, we could still encode the first 4 outputs with 5 words.
Similarly, if a block has more than 2048 txs, a 5-word address is still valid if it refers to one of the first 2048 txs in the block and to the first or second output.
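One way to read this truncation rule is sketched below. It assumes 11 bits per word, zero-based indices, and that the txIndex is capped so at least 1 bit always remains for the txOutputIndex. The function names are hypothetical.

```python
BITS_PER_WORD = 11        # assumption: 2048-word list
FIXED_BITS = 1 + 22       # version bit + block height bits
MIN_CHECKSUM_BITS = 20


def bits_needed(count: int) -> int:
    """Fewest bits that can index `count` items, never fewer than 1."""
    return max(1, (count - 1).bit_length())


def field_widths(num_words: int, txs_in_block: int, outputs_in_tx: int):
    """Return (txIndex bits, txOutputIndex bits), truncated to what fits."""
    spare = num_words * BITS_PER_WORD - FIXED_BITS - MIN_CHECKSUM_BITS
    tx_bits = min(bits_needed(txs_in_block), spare - 1)  # keep 1 bit for the output
    out_bits = min(bits_needed(outputs_in_tx), spare - tx_bits)
    return tx_bits, out_bits


def fits(num_words, txs_in_block, outputs_in_tx, tx_index, out_index) -> bool:
    """True if this (zero-based) tx and output can be addressed in `num_words`."""
    tx_bits, out_bits = field_widths(num_words, txs_in_block, outputs_in_tx)
    return tx_index < 2 ** tx_bits and out_index < 2 ** out_bits


print(field_widths(5, 1000, 14))   # 14 outputs, but only 2 bits are left
print(fits(5, 1000, 14, 999, 3))   # fourth output: still addressable
print(fits(5, 1000, 14, 999, 4))   # fifth output: needs 6 words
print(fits(5, 3000, 2, 2047, 1))   # tx 2048 of 3000, second output
```

A decoder that knows the block and the transaction can recompute the same widths, so the truncation does not need to be signalled in the address itself.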
------------------
If an address can't fit in a 5-word address, it would need 6 words anyway. So a 6-word address would have
1 version bit
22 block height bits
1 to 22 txIndex bits
1 to 22 txOutputIndex bits
at least 20 checksum bits
22 txIndex bits should be enough for an 800MB block. If that's still not enough, it could be extended to 7 words in a similar way.
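The 800MB figure is consistent with an average transaction size of about 200 bytes. That size is an assumption made here for the check, not something stated in the thread.

```python
TX_INDEX_BITS = 22
AVG_TX_BYTES = 200   # assumption: rough average transaction size

max_txs = 2 ** TX_INDEX_BITS            # transactions addressable with 22 bits
block_bytes = max_txs * AVG_TX_BYTES    # block size that would hold them all

print(max_txs, "txs,", block_bytes / 2 ** 20, "MiB")
```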
It is also valid to encode a 5-word address in 6 words, for extra checksum security.
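To make the layouts concrete, here is a minimal packing sketch. Several things in it are assumptions, since the thread does not specify them: 11 bits per word, the field order (version, height, txIndex, txOutputIndex, checksum), and a checksum taken from the leading bits of SHA-256 over the packed fields. `pack` is a hypothetical name, and it returns word indices rather than words because no word list is given here.

```python
import hashlib

BITS_PER_WORD = 11        # assumption: 2048-word list
VERSION_BITS = 1
HEIGHT_BITS = 22
MIN_CHECKSUM_BITS = 20


def pack(height, tx_index, tx_bits, out_index, out_bits, num_words, version=0):
    """Pack the fields plus a checksum into `num_words` 11-bit word indices."""
    payload_bits = VERSION_BITS + HEIGHT_BITS + tx_bits + out_bits
    checksum_bits = num_words * BITS_PER_WORD - payload_bits
    if checksum_bits < MIN_CHECKSUM_BITS:
        raise ValueError("fewer than 20 checksum bits left; use more words")

    fields = ((version, VERSION_BITS), (height, HEIGHT_BITS),
              (tx_index, tx_bits), (out_index, out_bits))
    payload = 0
    for value, width in fields:
        if not 0 <= value < 2 ** width:
            raise ValueError("field does not fit its width")
        payload = (payload << width) | value

    # Assumed checksum: leading bits of SHA-256 over the packed fields.
    digest = hashlib.sha256(payload.to_bytes((payload_bits + 7) // 8, "big")).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - checksum_bits)

    combined = (payload << checksum_bits) | checksum
    mask = 2 ** BITS_PER_WORD - 1
    words = [(combined >> (BITS_PER_WORD * i)) & mask
             for i in reversed(range(num_words))]
    return words, checksum_bits


# Same output (block 300000, tx 42 of up to 512, output 1 of up to 4),
# once as a 5-word address and once padded to 6 words.
five, cs5 = pack(300000, 42, 9, 1, 2, num_words=5)
six, cs6 = pack(300000, 42, 9, 1, 2, num_words=6)
print(len(five), "words,", cs5, "checksum bits")
print(len(six), "words,", cs6, "checksum bits")
```

With these choices the sixth word adds 11 checksum bits (21 becomes 32) to the same address.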