Adding to point (2). To achieve maximum entropy, it is essential that no word is more or less likely to be selected than any other and that each selection is independent of every other. Some people erroneously attempt to think up their own words or select them from random pages of some book.
Agreed. I made my own version of the Diceware list years ago to counter this problem. 10 000 words is indeed generous. Even as a native English speaker I wouldn't care to push much beyond 1000 words.
These days I use the English 2048-word list supplied with BIP0039:
abandon ability able about above ... zero zone zoo
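Drawing words uniformly and independently from such a list can be sketched in a few lines of Python. This is only an illustration: `generate_passphrase` and `DEMO_WORDS` are names I made up, and the eight-word list is a stand-in built from the words quoted above, not the full 2048-word list.

```python
import secrets

def generate_passphrase(wordlist, n_words):
    """Pick n_words uniformly and independently (with replacement) from wordlist."""
    if len(set(wordlist)) != len(wordlist):
        # A duplicated word would be twice as likely to be drawn as the others.
        raise ValueError("wordlist contains duplicates")
    # secrets.choice draws from the operating system's CSPRNG.
    return [secrets.choice(wordlist) for _ in range(n_words)]

# Stand-in list for illustration only (8 words = 3 bits per word).
DEMO_WORDS = ["abandon", "ability", "able", "about", "above", "zero", "zone", "zoo"]

print(" ".join(generate_passphrase(DEMO_WORDS, 12)))
```

Words may repeat in the output. That is intended: rejecting repeats would make the selections dependent on one another.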
Certainly, shortcuts can cost entropy and while method obscurity may increase security, it will typically do so in a non-quantifiable way. Relying on one's intuition regarding the difficulty of divining an obscure method is to abandon a foundational premise of information theory.
However, I'd like to highlight key-stretching as a fair source of additional security for a true brainwallet. In essence, one simply forgets the last few words of their passphrase and brute-forces them whenever access is required.
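A rough sketch of that recovery step follows. It assumes you have something public to check candidates against (for a brainwallet, the address would do), and it uses a plain SHA-256 `fingerprint` as a stand-in for the real passphrase-to-address derivation. The function names and the tiny word list are hypothetical, chosen only to keep the example runnable.

```python
import hashlib
from itertools import product

def fingerprint(words):
    # Stand-in for the real (and slower) passphrase -> key -> address derivation.
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()

def recover_suffix(known_words, n_forgotten, wordlist, target):
    """Try every possible ending until the fingerprint matches the target."""
    for suffix in product(wordlist, repeat=n_forgotten):
        candidate = list(known_words) + list(suffix)
        if fingerprint(candidate) == target:
            return candidate
    return None  # no ending of that length matches

# Stand-in list for illustration only, not the full 2048-word list.
DEMO_WORDS = ["abandon", "ability", "able", "about", "above", "zero", "zone", "zoo"]

full = ["zoo", "able", "zero", "about", "zone"]
target = fingerprint(full)                      # recorded once, up front
recovered = recover_suffix(full[:3], 2, DEMO_WORDS, target)
print(" ".join(recovered))
```

The search costs at most `len(wordlist) ** n_forgotten` trials, so with a 2048-word list each forgotten word adds 11 bits of work for you, while an attacker still has to search the whole passphrase.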
I'd also like to expand on "sufficiently safe" here.
Selecting 12 words randomly and uniformly from a pool of 10 000 words gives 12 * log2(10000) = 159.45 bits of entropy (to 2 d.p.). Roughly speaking, there are as many equally plausible 12-word passphrases as there are Bitcoin addresses. Assuming the entropy of the passphrase is not reduced as it is converted into a private key, such a private key will be no less effective in securing a Bitcoin output than a standard random key.
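For anyone who wants to check the arithmetic or try other list sizes, here is a minimal sketch. The function name is mine, and the 160-bit figure is my assumption for the size of the hash in a standard address.

```python
import math

def passphrase_entropy_bits(n_words, pool_size):
    """Entropy of n_words drawn uniformly and independently from pool_size words."""
    return n_words * math.log2(pool_size)

ADDRESS_BITS = 160  # assumption: a standard address encodes a 160-bit hash

bits = passphrase_entropy_bits(12, 10_000)
print(f"{bits:.2f} bits")                                   # 159.45 bits
print(f"{ADDRESS_BITS - bits:.2f} bits short of {ADDRESS_BITS}")
```

The formula only holds if the selection really is uniform and independent. Words you picked yourself will fall short of it by an amount nobody can measure.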
Selecting 12 words from a pool of just 2048 yields 12 * log2(2048) = 12 * 11 = 132 bits of entropy.
Even 9 words from 2048 gives 99 bits of entropy. We're well past the point of general cryptographic recommendation here but as far as a convenience/security tradeoff is concerned, I believe there are cases where 9 words would be a reasonable choice. Extending your earlier point of reference: as of block #387287, approximately 2^83.71 hashes have been calculated by miners in Bitcoin's lifetime, and such a hash is computationally cheaper than converting a private key to an address.
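To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It takes the cumulative work as 2^83.71 hashes and, generously to the attacker, treats one passphrase trial as costing no more than one hash. The variable names are mine.

```python
import math

PASSPHRASE_BITS = 9 * math.log2(2048)   # 9 words from a 2048-word list = 99 bits
NETWORK_LOG2_HASHES = 83.71             # log2 of all hashes mined up to block #387287

# How many times the network's entire lifetime work would be needed
# to enumerate every 9-word passphrase once.
ratio = 2 ** (PASSPHRASE_BITS - NETWORK_LOG2_HASHES)
print(f"about {ratio:,.0f} times the network's lifetime work")
```

That comes out on the order of 40 000 lifetimes of mining to cover the whole space, or half that on average to hit one particular passphrase.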
[1] Most new Electrum seeds are 13 words from the pool of 2048 words I linked to above. One might expect such a seed to have 13 * 11 = 143 bits of entropy but some of the data is dedicated to a checksum/version-number and the final word is underutilised (usually begins 'ab' or 'ac').