Author

Topic: From a Random Number to Mnemonic Phrase (Read 408 times)

full member
Activity: 434
Merit: 246
May 02, 2018, 11:47:13 PM
#9
Just select the view "Programmer" in the main menu of this application, then select "Bin", paste the string "01000100010" in the binary format, after that select "Dec" and you will get the string "546" in the decimal format.

Very cool, thanks for the tip. Had I known that, it would have made my life much easier while writing this post.

Nice explanation, i really llike your simple guide/article.
But AFAIK BIP39 choose those random bits/mnemonic using cryptographically secure pseudo-random number generator (CSPRNG), not atmospheric noise.

Thanks a lot, ETFbitcoin. And thanks for mentioning CSPRNG. It is important to know that the pseudo-random generator used in wallets of this type is secure. The atmospheric noise was just for demonstration, a simple way for me to select some entropy without going into other aspects of the problem.
legendary
Activity: 2674
Merit: 2334
The chances of generating a valid seed like this is 1 in 16.
This means 15 out of 16 seeds won't work  Undecided
So, does this mean that the initial entropy is diminished something like 16 times?

Not exactly.
A space of 2048 words x 12 words would equal an entropy of 132 bits (2132 combinations).
Due to the fact that that 15 (28-1) out of every 16 (28) seeds are discarded, the actual entropy is at (slightly less than) 124 bits (2124 combinations).

This is definetely secure enough (at the current technological level).
The initial entropy of the digit encapsulated by BIP-39 is 128 bit. This value is not diminished in any case. The last four bits of the 132-bit combination count of the mnemonic phrase are an extra bits. The developers of BIP-39 decided to use them as a checksum, but theoretically these bits could be omitted, taking for example four zero bits instead of them.
legendary
Activity: 2674
Merit: 2334
The second argument 2 is to indicate the binary nature of the first argument. You can do all this parsing in javascript, or in python if you have python installed. If you don't have python installed, you can do it online, for example here:

https://www.python.org/shell/

So now we run all 12 binary numbers from above:
Code:
01000100010 --> 546
01001110001 --> 625
01010010101 --> 661
00001101000 --> 104
10010110111 --> 1207
10000101011 --> 1067
00100101100 --> 300
00111001100 --> 460
00011001001 --> 201
00110100000 --> 416
11001000000 --> 1600
00100010101 --> 277

and we get the indices.

You don't need to install Python to convert a binary digits into a decimal digits. You can use the standard calculator pre-installed on your operation system. For example, on Windows 7:



Just select the view "Programmer" in the main menu of this application, then select "Bin", paste the string "01000100010" in the binary format, after that select "Dec" and you will get the string "546" in the decimal format.


And it happens that it isn't a correct sequence. Invalid mnemonic. Many random combinations aren't accepted, I guess, because of the wrong checksum.
Yes, the last word contains the checksum of the entire mnemonic phrase.
newbie
Activity: 36
Merit: 0
Thank you, butka this topic is awesome, almost all knowledge about how to create mnemonic in one post!
legendary
Activity: 1624
Merit: 2481
The chances of generating a valid seed like this is 1 in 16.
This means 15 out of 16 seeds won't work  Undecided
So, does this mean that the initial entropy is diminished something like 16 times?

Not exactly.
A space of 2048 words x 12 words would equal an entropy of 132 bits (2132 combinations).
Due to the fact that that 15 (28-1) out of every 16 (28) seeds are discarded, the actual entropy is at (slightly less than) 124 bits (2124 combinations).

This is definetely secure enough (at the current technological level).
full member
Activity: 434
Merit: 246
The chances of generating a valid seed like this is 1 in 16.
This means 15 out of 16 seeds won't work  Undecided
So, does this mean that the initial entropy is diminished something like 16 times?

I am reading mastering Bitcoin as well, and your posts often call my attention to book passages which I had not given due importance.
Like your previous post about flipping coins. You are doing a very good job.

Thanks for sharing your thoughts.
Thanks, bitmover. While Mastering Bitcoin is fantastic a book, many aspects are not covered in too much detail. I guess it is impossible to do so because of the complexity of the matter. But here, regarding the mnemonics, I wanted to really understand how it goes. I only I hope I've got it right now.
legendary
Activity: 2352
Merit: 6089
bitcoindata.science
Butka, I really like your posts.

I am reading mastering Bitcoin as well, and your posts often call my attention to book passages which I had not given due importance.
Like your previous post about flipping coins. You are doing a very good job.

Thanks for sharing your thoughts.
legendary
Activity: 1624
Merit: 2481
Many random combinations aren't accepted, I guess, because of the wrong checksum.

Correct.
The chances of generating a valid seed like this is 1 in 16.
This means 15 out of 16 seeds won't work  Undecided
full member
Activity: 434
Merit: 246
When you want to create new wallet, certain Bitcoin wallets define a mnemonic phrase that you have to remember or write down.

For all intents and purposes, knowing the mnemonic is virtually the same as knowing your private key.

Have you ever wondered how the mnemonic phrase is obtained?

What I want to explore with this post is this correspondence between the initial entropy and the actual words of the mnemonics. The aim is to understand the principles, not to go into programming details.

We will try to derive the mnemonic starting from a random entropy and we will use tools everyone can find and use on the internet.

First thing to realize is there are several implementation of the mnemonic phrase. The standard that Trezor and other hardware wallets use is called Bip-39. This is what we will explore here. We will follow the book "Mastering Bitcoin" by Andreas Antonopoulos and this scheme in particular:

https://i.pinimg.com/564x/e3/e9/b5/e3e9b5ca7128eb03d526a342904ef27d.jpg

Electrum has its own standard, and even different implementations in different versions of the code, so we won't go into it in Electrum mnemonics in this post.

The Standard

Bip-39 has a standardized list of words. It contains 2048 words from the standard English language. There are similar lists in several other languages if you prefer to have a seed in your native language.

You can see it on GitHub:

https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt

It goes like this:



Now if we chose 12 of these words randomly, that would mean that the number of possible combination to distribute 12 words out of 2048 words is 204812, which is
equivalent to 2132. So with 12 words, there are 132 bits of entropy (Source: https://en.bitcoin.it/wiki/Mnemonic_phrase). However, as we will see below, not all combinations work, so there is slightly less entropy than 132 bits.

Let's generate mnemonic words by using a random source of entropy. Like flipping coins. For the record, I didn't want to go through the trouble of flipping coins, but you can if you want to.

The Steps

1. For this purpose I picked up a random 128 bit number from this website. It uses atmospheric noise, so I figure, it should be good enough for this demonstration. This number I got is this:

Code:
01000100010010011100010101001010100001101000100101101111000010101100100101100001110011000001100100100110100000110010000000010001

Now I converted this binary to a hexadecimal number:

Code:
4449C54A86896F0AC961CC1926832011

2. The next step is to create a checksum of the above number. How is the checksum created? By applying a SHA256 function on this number and taking the first 4 bits. (You can do it online here: https://www.fileformat.info/tool/hash.htm ; make sure you take binary hash )

Code:
SHA256(4449C54A86896F0AC961CC1926832011)  =  55e802188d4450c11f6b39c4a108395b706855fe56a5e00c1effd33fe2fbe354

SHA256 returns a hexadecimal number, each digit is in fact 4 bits.

Now we take the first four bits of the output, which is the number 5 in hex form (0101 in decimal form), and that's our checksum.

3. Next we add these four bits to the end of the original random number, like this:

0100010001001001110001010100101010000110100010010110111100001010110010010110000 11100110000011001001001101000001100100000000100010101

4. We started with 128 bits, and now we have 132 bits. Next, this sequence should be divided into 12 groups of 11 bits each. Like this.

Code:
01000100010 01001110001 01010010101 00001101000 10010110111 10000101011 
00100101100 00111001100 00011001001 00110100000 11001000000 00100010101

We have ended up with these 12 groups, each consisting of 11 bits. Next we map these 12 binary numbers to 12 words.

5. When I was trying to figure out the next step, I got really confused. First, I thought that it would be enough to convert each of these 12 binary numbers into decimal numbers. But this wouldn't work.

Finally, thanks to this javascript implementation (https://github.com/iancoleman/jsbip39/), I realized that one should use a function called parseInt() to parse the 11 bits into an integer, which then serves as an index to select the corresponding word. Like this:

Code:
parseInt("01000100010", 2) = 546 injavascript

int("01000100010", 2) = 546 in python

The second argument 2 is to indicate the binary nature of the first argument. You can do all this parsing in javascript, or in python if you have python installed. If you don't have python installed, you can do it online, for example here:

https://www.python.org/shell/

like this:



So now we run all 12 binary numbers from above:
Code:
01000100010 --> 546
01001110001 --> 625
01010010101 --> 661
00001101000 --> 104
10010110111 --> 1207
10000101011 --> 1067
00100101100 --> 300
00111001100 --> 460
00011001001 --> 201
00110100000 --> 416
11001000000 --> 1600
00100010101 --> 277

and we get the indices.

5. Next, we match the indices with the words from the Bip-39 list. However, there's one last catch: the word list indices go from 1 to 2048. The above indices go from 0 to 2047, so we have to add +1 to the above numbers:

546+1 = 547 -- > dust



625+1 = 626 --> evolve

and so on...

The end result is

dust evolve famous artist notice lyrics
cereal define bomb cross siege cargo



6. Let us check if this is indeed a correct menmonic phrase. We go to this online tool:

https://iancoleman.io/bip39/

and we enter our 12 words:



It is indeed a correct phrase. Otherwise there would have been an error.  

Let's try to insert some random 12 words from the list, chances are the phrase won't be successful, like the following sequence I chose randomly:

affair attend bone weird wagon midnight
rookie mercy fan abstract siren right



And it happens that it isn't a correct sequence. Invalid mnemonic. Many random combinations aren't accepted, I guess, because of the wrong checksum.

Thanks for following. I hope you will find this little tutorial useful. If there is an error somewhere in this derivation, I would appreciate your help to correct it.

Jump to: