When you create a new wallet, certain Bitcoin wallets generate a mnemonic phrase that you have to remember or write down.
For all intents and purposes, knowing the mnemonic is virtually the same as knowing your private key.
Have you ever wondered how the mnemonic phrase is obtained?

What I want to explore in this post is the correspondence between the initial entropy and the actual words of the mnemonic. The aim is to understand the principles, not to go into programming details.
We will derive the mnemonic starting from random entropy, using tools everyone can find and use on the internet.
The first thing to realize is that there are several implementations of the mnemonic phrase. The standard that Trezor and other hardware wallets use is called BIP-39. This is what we will explore here. We will follow the book "Mastering Bitcoin" by Andreas Antonopoulos and this scheme in particular:
https://i.pinimg.com/564x/e3/e9/b5/e3e9b5ca7128eb03d526a342904ef27d.jpg
Electrum has its own standard, and even different implementations in different versions of the code, so we won't go into Electrum mnemonics in this post.

The Standard
BIP-39 has a standardized list of words. It contains 2048 English words. There are similar lists in several other languages if you prefer to have a seed in your native language.
You can see it on GitHub:
https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt
It goes like this: abandon, ability, able, about, above, ...
Now if we chose 12 of these words at random, the number of possible ordered sequences of 12 words drawn from the 2048 words would be $2048^{12}$, which is equal to $(2^{11})^{12} = 2^{132}$. So 12 words can encode 132 bits (Source: https://en.bitcoin.it/wiki/Mnemonic_phrase). However, as we will see below, not all combinations are valid: the last 4 bits are a checksum determined by the first 128 bits, so a 12-word mnemonic carries 128 bits of entropy rather than 132.
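The arithmetic above can be checked with a few lines of Python. This is only a sketch; the variable names are mine, and the split into entropy and checksum bits assumes the BIP-39 rule of one checksum bit per 32 bits of entropy.

```python
# Each word is one of 2048 = 2**11 choices, so one word encodes 11 bits.
WORDLIST_SIZE = 2048
WORD_COUNT = 12

bits_per_word = WORDLIST_SIZE.bit_length() - 1  # 11
total_bits = bits_per_word * WORD_COUNT         # 132

# BIP-39 uses 1 checksum bit per 32 entropy bits, so total = 33 * checksum.
checksum_bits = total_bits // 33
entropy_bits = total_bits - checksum_bits

print(WORDLIST_SIZE ** WORD_COUNT == 2 ** total_bits)  # True
print(total_bits, checksum_bits, entropy_bits)         # 132 4 128
```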
Let's generate mnemonic words from a random source of entropy, such as coin flips. For the record, I didn't want to go through the trouble of flipping coins, but you can if you want to.

The Steps
1. For this purpose I picked a random 128-bit number from this website. It uses atmospheric noise, so I figure it should be good enough for this demonstration. The number I got is this:
01000100010010011100010101001010100001101000100101101111000010101100100101100001110011000001100100100110100000110010000000010001
I then converted this binary number to hexadecimal:
4449C54A86896F0AC961CC1926832011
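If you would rather not convert by hand, here is a minimal Python sketch of the same conversion. The variable names are my own; the bit string is the one from above, split in two only to keep the lines short.

```python
# The 128 random bits from step 1, written as two 64-bit halves.
entropy_bits = (
    "0100010001001001110001010100101010000110100010010110111100001010"
    "1100100101100001110011000001100100100110100000110010000000010001"
)

entropy_int = int(entropy_bits, 2)            # read the string as a base-2 number
entropy_hex = format(entropy_int, "032X")     # 32 hex digits, zero-padded, uppercase
entropy_bytes = entropy_int.to_bytes(16, "big")  # the same value as 16 raw bytes

print(entropy_hex)  # 4449C54A86896F0AC961CC1926832011
```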
2. The next step is to create a checksum of the above number. How is the checksum created? By applying the SHA256 function to this number and taking the first 4 bits of the result. (You can do it online here: https://www.fileformat.info/tool/hash.htm ; make sure you take the binary hash, so that the 16 bytes the hex digits stand for are hashed rather than the hex text itself.)
SHA256(4449C54A86896F0AC961CC1926832011) = 55e802188d4450c11f6b39c4a108395b706855fe56a5e00c1effd33fe2fbe354
The SHA256 output is shown as a hexadecimal number, in which each digit stands for 4 bits.
Now we take the first four bits of the output, which is the digit 5 in hex form (0101 in binary form), and that's our checksum.
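The same checksum can be computed locally with Python's standard hashlib module. This is a sketch; the helper name checksum_bits is mine and not part of any library. The important detail is that the hash runs over the 16 raw bytes, not over the 32-character hex text.

```python
import hashlib


def checksum_bits(entropy: bytes) -> str:
    """Return the first (entropy length in bits / 32) bits of SHA-256(entropy)."""
    n_bits = len(entropy) * 8 // 32            # 4 bits for 128 bits of entropy
    digest = hashlib.sha256(entropy).digest()  # hash the raw bytes
    digest_bits = "".join(format(byte, "08b") for byte in digest)
    return digest_bits[:n_bits]


entropy = bytes.fromhex("4449C54A86896F0AC961CC1926832011")
print(hashlib.sha256(entropy).hexdigest())  # should match the hash quoted above
print(checksum_bits(entropy))               # the 4 checksum bits
```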
3. Next we add these four bits to the end of the original random number, like this:
01000100010010011100010101001010100001101000100101101111000010101100100101100001110011000001100100100110100000110010000000010001
0101
4. We started with 128 bits, and now we have 132 bits. Next, this sequence should be divided into 12 groups of 11 bits each. Like this:
01000100010 01001110001 01010010101 00001101000 10010110111 10000101011
00100101100 00111001100 00011001001 00110100000 11001000000 00100010101
We have ended up with these 12 groups, each consisting of 11 bits. Next we map these 12 binary numbers to 12 words.
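Appending the checksum and cutting the result into 11-bit groups can be sketched in Python as follows. The variable names are mine, and the checksum value 0101 is simply taken from the step above rather than recomputed.

```python
# 128 entropy bits from step 1, split in two only for readability.
entropy_bits = (
    "0100010001001001110001010100101010000110100010010110111100001010"
    "1100100101100001110011000001100100100110100000110010000000010001"
)
checksum = "0101"  # first 4 bits of the SHA256 hash, as found above

full_bits = entropy_bits + checksum  # 132 bits in total

# Slice the 132-bit string into consecutive chunks of 11 bits.
groups = [full_bits[i:i + 11] for i in range(0, len(full_bits), 11)]

for group in groups:
    print(group)
```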
5. When I was trying to figure out the next step, I got really confused. First, I thought that it would be enough to convert each of these 12 binary numbers into decimal numbers. That is in fact what is needed; the only catch is how the word list is numbered, which comes up in the next step.
Finally, thanks to this JavaScript implementation (https://github.com/iancoleman/jsbip39/), I realized that one can use a function called parseInt() to parse the 11 bits into an integer, which then serves as an index to select the corresponding word. Like this:
parseInt("01000100010", 2) = 546 in JavaScript
int("01000100010", 2) = 546 in Python
The second argument, 2, indicates that the first argument is written in binary. You can do all this parsing in JavaScript, or in Python if you have Python installed. If you don't have Python installed, you can do it online, for example here:
https://www.python.org/shell/
So now we run all 12 binary numbers from above:
01000100010 --> 546
01001110001 --> 625
01010010101 --> 661
00001101000 --> 104
10010110111 --> 1207
10000101011 --> 1067
00100101100 --> 300
00111001100 --> 460
00011001001 --> 201
00110100000 --> 416
11001000000 --> 1600
00100010101 --> 277
and we get the indices.
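Rather than parsing the groups one at a time, all 12 can be converted in one go. A small Python sketch, with variable names of my choosing:

```python
# The 12 groups of 11 bits from step 4.
groups = """
01000100010 01001110001 01010010101 00001101000 10010110111 10000101011
00100101100 00111001100 00011001001 00110100000 11001000000 00100010101
""".split()

# int(text, 2) reads each group as a base-2 number.
indices = [int(group, 2) for group in groups]

print(indices)
# [546, 625, 661, 104, 1207, 1067, 300, 460, 201, 416, 1600, 277]
```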
6. Next, we match the indices with the words from the BIP-39 list. However, there's one last catch: the line numbers shown on GitHub go from 1 to 2048. The above indices go from 0 to 2047, so we have to add 1 to the above numbers to find the right line:
546+1 = 547 --> dust
625+1 = 626 --> evolve
and so on...
The end result is
dust evolve famous artist notice lyrics
cereal define bomb cross siege cargo
7. Let us check if this is indeed a correct mnemonic phrase. We go to this online tool:
https://iancoleman.io/bip39/
and we enter our 12 words:
It is indeed a correct phrase. Otherwise there would have been an error.
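The word lookup and the validity check can also be sketched in Python. This is only an illustration under a few assumptions: the function names are mine, the sketch handles 12-word phrases only, and it expects a local copy of english.txt from the GitHub link above (the file name and location are up to you). Since a Python list is indexed from 0, no +1 is needed here.

```python
import hashlib
from pathlib import Path


def load_wordlist(path):
    """Read a BIP-39 word list file (one word per line) into a list."""
    words = Path(path).read_text(encoding="utf-8").split()
    if len(words) != 2048:
        raise ValueError("expected 2048 words")
    return words


def indices_to_mnemonic(indices, wordlist):
    """Map 0-based indices to words and join them with spaces."""
    return " ".join(wordlist[i] for i in indices)


def is_valid_mnemonic(mnemonic, wordlist):
    """Check the 4-bit checksum of a 12-word phrase."""
    words = mnemonic.split()
    if len(words) != 12:
        return False  # this sketch only covers 12-word phrases
    try:
        # Turn every word back into its 11-bit index.
        bits = "".join(format(wordlist.index(w), "011b") for w in words)
    except ValueError:
        return False  # a word is not in the list
    entropy_bits, checksum = bits[:128], bits[128:]
    entropy = int(entropy_bits, 2).to_bytes(16, "big")
    digest = hashlib.sha256(entropy).digest()
    # Compare the top 4 bits of the first hash byte with the stored checksum.
    return format(digest[0], "08b")[:4] == checksum


# Example usage, assuming english.txt sits next to the script:
# words = load_wordlist("english.txt")
# print(indices_to_mnemonic([546, 625, 661, 104, 1207, 1067,
#                            300, 460, 201, 416, 1600, 277], words))
# print(is_valid_mnemonic("dust evolve famous artist notice lyrics "
#                         "cereal define bomb cross siege cargo", words))
```

The check simply reverses the derivation: words back to indices, indices back to 132 bits, and then the last 4 bits are compared with a freshly computed checksum.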
Let's try entering 12 random words from the list. Chances are the phrase won't be accepted, like the following sequence I chose randomly:
affair attend bone weird wagon midnight
rookie mercy fan abstract siren right
And indeed it isn't a correct sequence: invalid mnemonic. Most random combinations are rejected because their checksum is wrong: the last 4 bits have to match the SHA256 of the first 128 bits, so only about 1 in 16 random 12-word sequences is valid.
Thanks for following. I hope you will find this little tutorial useful. If there is an error somewhere in this derivation, I would appreciate your help to correct it.