
Topic: How do I identify the valid checksums for BIP39 if I generate 11/12 of the words?

hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
Actually, the von Neumann method even works with extremely biased coins, it will just take more time and tosses to get enough valid outcomes.
Yes, I knew that, but thanks for pointing it out and for the explanation anyway. The von Neumann method only breaks down with a coin that lands on the same side 100% of the time, but then it's also apparent that by rule 2 you'd always have to discard the toss results and could never progress at all.

I just wouldn't feel comfortable using a heavily biased coin for such tosses, and therefore wrote that "slightly biased" is still fine with this method, even though "heavily biased" would've worked fine, too.

Probability stuff is somewhat non-intuitive, at least for my wet brain v1.0beta.
legendary
Activity: 2604
Merit: 2353
You can achieve fair results even from slightly biased coins (or unknowingly biased tossing habits):
  • Toss the coin twice.
  • If the results match, start over, forgetting both results.
  • If the results differ, use the first result, forgetting the second.
Actually, the von Neumann method even works with extremely biased coins, it will just take more time and tosses to get enough valid outcomes.
Fundamentally, it just relies on basic probability laws (assuming independent tosses):
p(HT) = p(H) × p(T) = p(TH), and p(H) + p(T) = 1
so
p(T) = 1 - p(H)

p(HT) = p(TH) = p(H) x (1 - p(H))

The probability is the same for HT and TH whatever p(H) and p(T) are. So even if p(H) = 99%, HT and TH are exactly equally likely, just as H and T are with a perfectly fair coin. If you keep only the HT and TH pairs, about half of the retained pairs will be HT and half TH, so keeping only the first toss of each pair gives you about half H and half T.
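The pairing rule can be checked with a quick simulation. This is a minimal Python sketch, not a tool from the thread: the function name von_neumann and the 90% heads bias are illustrative assumptions, and Python's random module stands in for a physical coin.

Code:
```python
import random


def von_neumann(tosses):
    """Debias coin tosses (1 = heads, 0 = tails) with von Neumann's trick.

    Tosses are read in non-overlapping pairs: matching pairs (HH, TT) are
    discarded, and differing pairs (HT, TH) contribute their first toss.
    """
    kept = []
    for first, second in zip(tosses[0::2], tosses[1::2]):
        if first != second:
            kept.append(first)
    return kept


# Simulated, heavily biased coin with p(H) = 0.9 (an illustrative assumption).
rng = random.Random(2024)
biased = [1 if rng.random() < 0.9 else 0 for _ in range(200_000)]
fair = von_neumann(biased)

print(f"raw heads ratio:      {sum(biased) / len(biased):.3f}")  # near 0.900
print(f"debiased heads ratio: {sum(fair) / len(fair):.3f}")      # near 0.500
print(f"bits kept: {len(fair)} of {len(biased)} tosses")         # about 18,000
```

The cost of a heavy bias shows up in the last line: each pair survives with probability 2 × 0.9 × 0.1 = 0.18, so far more tosses are needed per usable bit.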

So it's certainly the easiest and safest method to create a seed if you use it along with Odolvlobo's procedure, IMO.
hero member
Activity: 714
Merit: 1010
Crypto Swap Exchange
I've seen discussions about bias in dice rolls or coin flips. For instance: Scientists Destroy Illusion That Coin Toss Flips Are 50–50, showing the coin "landed with the same side facing upward as before the toss 50.8 percent of the time". The article also shows ways to avoid this, but this small bias isn't something I'd worry about. No attacker is going to find out how you flipped a coin, and brute-force the entire 128 flips.

You can achieve fair results even from slightly biased coins (or unknowingly biased tossing habits):
  • Toss the coin twice.
  • If the results match, start over, forgetting both results.
  • If the results differ, use the first result, forgetting the second.

If you're very paranoid (or simply want to mask off any potential bias), you can XOR your "random sequence" of coin tosses with another supposedly "random sequence" that is produced by another method, like a CSPRNG or a HWRNG or rolling dice.
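A minimal sketch of the XOR idea in Python, assuming both sequences have the same length and were produced independently of each other. The names xor_bits, coin_bits and csprng_bits are hypothetical, and secrets.randbits stands in for "another method". If at least one of two independent sources is uniformly random, their XOR is uniformly random too, so the mix is no weaker than the better of the two.

Code:
```python
import secrets


def xor_bits(a, b):
    """Combine two equal-length bit sequences with bitwise XOR."""
    if len(a) != len(b):
        raise ValueError("both sequences must have the same length")
    return [x ^ y for x, y in zip(a, b)]


coin_bits = [1, 0, 1, 1, 0, 0, 1, 0]                    # hypothetical coin-toss results
csprng_bits = [secrets.randbits(1) for _ in coin_bits]  # independent second source
mixed = xor_bits(coin_bits, csprng_bits)
print(mixed)
```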
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
CSPRNG is better than your hand.
My hand is a lot easier to verify than the random number generator inside a piece of hardware.

Quote
1. see that you flipped '1' too many times and decide to write a lie on paper that you flipped '0' somewhere in the middle, because it seems more random.
2. the opposite of (1)
3. get bored in the middle of the process and decide to add some bits by yourself.
That's just dumb. It's possible to do this properly, and you shouldn't do it if you don't understand randomness.

I've seen discussions about bias in dice rolls or coin flips. For instance: Scientists Destroy Illusion That Coin Toss Flips Are 50–50, showing the coin "landed with the same side facing upward as before the toss 50.8 percent of the time". The article also shows ways to avoid this, but this small bias isn't something I'd worry about. No attacker is going to find out how you flipped a coin, and brute-force the entire 128 flips.
hero member
Activity: 686
Merit: 1341
✔️ CoinJoin Wallet
Why is this not a more common way for generating 12 words? 

CSPRNG is better than your hand.

When flipping a coin, you will probably, without meaning to, make one or more of the following mistakes:

1. see that you flipped '1' too many times and decide to write a lie on paper that you flipped '0' somewhere in the middle, because it seems more random.
2. the opposite of (1)
3. get bored in the middle of the process and decide to add some bits by yourself.
legendary
Activity: 4522
Merit: 3426
If you select 12 words, there's a high probability that your seed phrase doesn't pass the checksum.
I intend to flip a coin 121 times, convert to BIP39 words and then enter them into my HW with a random 12th word until it accepts the mnemonic as valid (passing checksum). This way I'm not relying on the HW RNG.

Do this to simplify the process. It will give you the full 128 bits of entropy and should make finding the checksum easier:

1. After flipping 121 times to get the first 11 words, flip 7 more times to get a number between 0 and 127.
2. Multiply that number by 16.
3. Exactly one of the 16 words, the word corresponding to that number or one of the 15 that follow it, will give you a valid seed phrase when used as the 12th word. Try each of them until you find the one that works.

For example,

Flip 121 times to get 11 words: raccoon weird maze affair stomach fall whisper direct unveil chase enhance

Flip 7 times to get the number 101. 101 x 16 is 1616. Trying each of the words corresponding to numbers 1616 - 1631, you will find that 1619 "skill" works as the 12th word.

So, the phrase is: raccoon weird maze affair stomach fall whisper direct unveil chase enhance skill

This works because a 12-word phrase only has 4 bits of checksum, which means that once you have all the entropy bits, you only need to find the 1 out of 16 possible bit combinations that matches the correct checksum.
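odolvlobo's shortcut can be sketched in Python as follows. This is an illustrative sketch, not code from the thread: the function names are hypothetical, bits are passed as strings of '0'/'1', and word indices are 0-based positions in the BIP39 English word list (index 0 is "abandon"). The checksum for 128-bit entropy is taken as the first 4 bits of SHA256 over the 16 raw entropy bytes.

Code:
```python
import hashlib


def checksum4(entropy: bytes) -> str:
    """First 4 bits of SHA256(entropy): the BIP39 checksum for 128-bit entropy."""
    return format(hashlib.sha256(entropy).digest()[0] >> 4, "04b")


def twelfth_word_index(first_121_bits: str, seven_bits: str) -> int:
    """Try word indices n*16 .. n*16 + 15, where n is the 7-bit number.

    All 16 candidates carry the same 7 entropy bits and differ only in their
    last 4 (checksum) bits, so exactly one of them matches SHA256(entropy).
    """
    if len(first_121_bits) != 121 or len(seven_bits) != 7:
        raise ValueError("need 121 + 7 bits")
    n = int(seven_bits, 2)                                   # 0..127
    entropy = int(first_121_bits + seven_bits, 2).to_bytes(16, "big")
    for candidate in range(n * 16, n * 16 + 16):
        if format(candidate, "011b")[7:] == checksum4(entropy):
            return candidate                                 # 0-based word index
    raise AssertionError("unreachable: one candidate always matches")
```

As a sanity check, the standard BIP39 test vectors for all-zero and all-one 128-bit entropy end in "about" (index 3) and "wrong" (index 2037), which is what twelfth_word_index("0" * 121, "0000000") and twelfth_word_index("1" * 121, "1111111") return.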
legendary
Activity: 3472
Merit: 10611
I intend to flip a coin 121 times, convert to BIP39 words and then enter them into my HW with a random 12th word until it accepts the mnemonic as valid (passing checksum). This way I'm not relying on the HW RNG.
There are a couple of problems with this idea.
The obvious one is that security-wise you want to generate at least 128 bits of entropy, so reducing it to something like 121 bits is not a good idea.

Additionally, to do what you described, the hardware wallet would need to have implemented a special procedure that first accepts an invalid, shorter mnemonic and then brute-forces it into a valid one. That's what it would have to do: add the missing 11 bits (the missing word), check for validity, and if that fails, increment and repeat. As far as I know, they don't have such a feature, and there is no good reason to implement one either, because more than one word can be added to get a valid checksum. The wallet would then also need another feature to determine which of the permutations is the intended mnemonic, since it could be a user trying to recover an already used mnemonic that is missing its last word.
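To see why more than one last word can pass the checksum, this sketch (hypothetical function name, 0-based word indices, Python's hashlib assumed) tries all 2048 possible 12th words after 121 fixed bits. Exactly one word passes for each of the 128 values of the 7 entropy bits it carries, so 128 of the 2048 candidates are valid, and each of them encodes different entropy.

Code:
```python
import hashlib


def valid_last_words(first_121_bits: str) -> list:
    """Return every 0-based word index that completes a valid 12-word phrase."""
    valid = []
    for idx in range(2048):                                # every possible 12th word
        bits = first_121_bits + format(idx, "011b")        # 132 bits in total
        entropy = int(bits[:128], 2).to_bytes(16, "big")
        checksum = format(hashlib.sha256(entropy).digest()[0] >> 4, "04b")
        if bits[128:] == checksum:
            valid.append(idx)
    return valid


candidates = valid_last_words("0" * 121)   # 11 x "abandon", as an example
print(len(candidates))  # 128: one valid word per value of the 7 entropy bits
```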
legendary
Activity: 2380
Merit: 5213
Why is this not a more common way for generating 12 words?  
Your seed phrase represents a large random number and the standard method is that you first generate that random number.
There's nothing preventing you from going directly to the word list first. If you use a method in which words are selected completely randomly, you can achieve the same security.


I intend to flip a coin 121 times, convert to BIP39 words and then enter them into my HW with a random 12th word until it accepts the mnemonic as valid (passing checksum). This way I'm not relying on the HW RNG.
By flipping the coin 121 times, you actually generate random entropy. This means that you use the common method, and the only difference is that you select the last 7 bits in a different way.
By common method, I mean generating the entropy first and then going to the word list.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Why is this not a more common way for generating 12 words?
Probably because it's a lot of work, and not recommended because people will start cherry-picking words to form a sentence.

Quote
I intend to flip a coin 121 times, convert to BIP39 words and then enter them into my HW with a random 12th word until it accepts the mnemonic as valid (passing checksum). This way I'm not relying on the HW RNG.
Why stop flipping there, if you can keep flipping coins and find the nearest word that matches the checksum?
newbie
Activity: 18
Merit: 5
If you select 12 words, there's a high probability that your seed phrase doesn't pass the checksum.
Instead, you can select 11 words and then try to find a word which leads to a valid BIP39 seed phrase. By valid, I mean it passes the checksum.
This is completely feasible, but it's not a common method for generating a seed phrase.
If you insist on generating your seed phrase in this way, you should make sure that the words are picked 100% randomly.

Why is this not a more common way for generating 12 words? 

I intend to flip a coin 121 times, convert to BIP39 words and then enter them into my HW with a random 12th word until it accepts the mnemonic as valid (passing checksum). This way I'm not relying on the HW RNG.

This seems to me to be much better than trying to use tools like SeedSigner and https://iancoleman.io/bip39/.

member
Activity: 104
Merit: 120
Hello, yes, I was talking about booting Tails OS from a USB drive on a PC or laptop that already has an OS installed on it. But thank you for the clarification and the additional pointers.
legendary
Activity: 2268
Merit: 18775
Question, what are your thoughts about putting into a Linux Tails Distribution on a Windows machine via a USB drive?
Do you mean running Tails as a virtual machine within Windows? Or do you mean bypassing Windows altogether and simply booting the computer from the Tails USB? I wouldn't recommend the former, but I suspect you are talking about the latter.

If you boot to Tails, therefore completely ignoring Windows, and never connect to the internet or any other methods of communication while within Tails, then this is certainly a safer option than simply using Windows, and a good option if you cannot dedicate a device to be permanently airgapped. It would be even better if you can physically disconnect any connectivity hardware (unplug Ethernet cables, disconnect WiFi modules, etc.) and better still if you can physically disconnect any persistent storage (such as your hard drive(s)) while you are using Tails. But obviously the best option would be if you can dedicate an old machine to do this on which will never boot Windows or go online ever again.
member
Activity: 104
Merit: 120
Thank you for the suggestion. Question, what are your thoughts about putting into a Linux Tails Distribution on a Windows machine via a USB drive? I'm considering trying to use a persistent drive on a Tails distribution and not connect the Tails OS to any internet connection and then run it through this os. Are you aware of any possible security issues with this configuration? Thanks
legendary
Activity: 2268
Merit: 18775
Glad you got it all figured out.

For future, if you are planning on using this method (coin flips, calculate checksum, convert to seed phrase manually) to generate a seed phrase, then you should do it on a device which is permanently airgapped. That means it does not have an internet connection and it will never have an internet connection again. Even better if you physically remove things like the WiFi card and Bluetooth chip to ensure it has no wireless connectivity whatsoever. You should also make sure the device is completely clean, which means formatting it and installing a clean OS on it. If you are going through all this trouble anyway, then you would probably be better served simply installing a reputable open source Linux distro rather than Windows and Linux on top. There are a number of very easy to use Linux distros. Mint is probably the closest to Windows in terms of look and feel.
member
Activity: 104
Merit: 120
That did it! Thank you very much! So, in summary (for future readers), on a Windows machine I had to jump through a few hoops to get things set up:

- I first had to enable Windows Subsystem for Linux

- I then downloaded Kali from the Microsoft store.

- Next I had to set up Kali and create a username and password.

- I next had to switch to the root user via the sudo su command

- Finally, I ran the apt-get install libdigest-sha-perl command in the WSL window in sudo su mode. It installed everything needed to run the following command, which produced the correct SHA256 hash of my binary input:

└─# echo -n "11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100" | shasum -a 256 -0
931258d717865a310cfc24a9161b21f4c0d02e0bb4cf12894516170a10e72339 ^-

Thanks again to everyone who helped me along here.  It was very educational!

legendary
Activity: 2268
Merit: 18775
I'm wondering why the discrepancy is occurring with hosseinimr93's SHA256 digest as from what I understood from
Because the -0 argument tells it to run in bits mode, but in your command you are not feeding it a string of bits, but a string of bytes. You need to feed it the entropy in 0s and 1s as I said before:
Code:
echo -n "11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100" | shasum -a 256 -0

Try this command and see if you get the correct checksum.
member
Activity: 104
Merit: 120
Hi again everyone,

I was hoping to get someone to double-check the hash of the entropy I generated, which converts to the hex value F2B173C5D5AAFFEBB80425FF5FE2057C.

As per hosseinimr93's post, this translates to a SHA256 digest of 931258d717865a310cfc24a9161b21f4c0d02e0bb4cf12894516170a10e72339

Also, with the help of o_e_l_e_o, I was able to run the following command and successfully install the needed package in my Linux install on Windows after switching to root with su:

apt-get install libdigest-sha-perl

I then ran the following command but got a different SHA256 digest, as shown below:

└─# echo -n F2B173C5D5AAFFEBB80425FF5FE2057C | shasum -a 256 -0
362695f3d7e699ecdae3536168fdc0f4e5696a1ee278c4800a626c0bac70746c ^-

I'm wondering why the discrepancy is occurring with hosseinimr93's SHA256 digest as from what I understood from
o_e_l_e_o :

"-a selects an algorithm, in this case 256. -0 tells it to read the input as bits, which is necessary when computing a checksum as above."

TIA

legendary
Activity: 1512
Merit: 7340
Farewell, Leo
sha256sum won't work in this case, because it does not have an option to treat the input as bits.
My bad. I was thinking in terms of hexadecimal. You can append a "9" to the 128-bit number and then convert the 132-bit number to ones and zeroes, can't you?
legendary
Activity: 2268
Merit: 18775
Perhaps these versions don't include the shasum command?  Or perhaps the Windows versions don't?
Again, I have absolutely no idea about Windows, but on a pure Linux machine you could try the following command to install the necessary packages. It may or may not work on your Linux for Windows:
Code:
apt-get install libdigest-sha-perl

I've had problems with shasum in the past. Try sha256sum
sha256sum won't work in this case, because it does not have an option to treat the input as bits.

I'm not sure what o_e_l_e_o's command does. To me, echo -n "hello world" | shasum -a 256 -0 is executed normally, but it gives another result
-a selects an algorithm, in this case 256. -0 tells it to read the input as bits, which is necessary when computing a checksum as above.
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
-bash: shasum: command not found
I've had problems with shasum in the past. Try sha256sum:
Code:
echo -n "hello world" | sha256sum

This will return you the SHA256 hash of the bytes of "hello world":
Code:
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9  -

I'm not sure what o_e_l_e_o's command does. To me, echo -n "hello world" | shasum -a 256 -0 is executed normally, but it gives another result:
Code:
$ echo -n "hello world" | shasum -a 256 -0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 ^-

Edit: -0 means that it reads in bits mode. So, I presume that it treats the input as binary, and converts it later to bytes to hash it.
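The difference between the two results can be reproduced in Python. The sketch below is an approximation of shasum's bits mode written for illustration (the function name is hypothetical): shasum's documentation says that in this mode ASCII '0' and '1' are read as bits and all other characters are ignored. So "hello world" hashes as an empty message, and the hex string F2B173C5D5AAFFEBB80425FF5FE2057C that was fed to shasum -0 in this thread is reduced to just its '1', '0', '0' characters, a 3-bit message, which explains its unexpected digest. Unlike shasum, this sketch only accepts whole bytes.

Code:
```python
import hashlib


def shasum256_bits_mode(text: str) -> str:
    """Rough equivalent of `echo -n TEXT | shasum -a 256 -0`.

    Only '0' and '1' characters are kept as bits; everything else is ignored.
    hashlib cannot hash a partial byte, so the bit count must be a multiple of 8.
    """
    bits = "".join(ch for ch in text if ch in "01")
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8 for hashlib")
    data = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    return hashlib.sha256(data).hexdigest()


# "hello world" contains no '0' or '1', so bits mode hashes an empty message.
print(shasum256_bits_mode("hello world"))
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Hashing the text itself, as sha256sum does, gives a different digest.
print(hashlib.sha256(b"hello world").hexdigest())
# b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```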
member
Activity: 104
Merit: 120
Update: I've installed both Debian and Kali for Windows and upgraded both distros. That said, when I enter your command, it seems the Windows version of Linux does not recognize shasum, as the output states the following:

└─$ echo -n "hello world" | shasum -a 256 -0
-bash: shasum: command not found

Perhaps these versions don't include the shasum command?  Or perhaps the Windows versions don't?

Either way any suggestions on this issue or also any recommendations on any specific Linux distributions that can let me perform the above would be appreciated.  Thanks.
member
Activity: 104
Merit: 120
Hmm, you all may end up forcing me into the world of Linux and Python after all. Either way, I'm going to first try turning on Windows Subsystem for Linux just so I can try to run those commands as you mentioned. I'll give this a try and report back. Thanks!
legendary
Activity: 2268
Merit: 18775
I also don't use Windows, but a quick internet search suggests that there is no obvious way to use Windows PowerShell to compute the hash you need. If you were running Linux, then you would just open a terminal and enter the following very simple command:
Code:
echo -n "11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100" | shasum -a 256 -0

Which will return the following output:
Code:
931258d717865a310cfc24a9161b21f4c0d02e0bb4cf12894516170a10e72339

And then you take the first character of the digest (9), convert it to 1001, and append it as your checksum.
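The append-and-split step can be written as a short Python sketch. The function name mnemonic_indices is hypothetical; it returns 0-based indices into the BIP39 English word list rather than words, so a separately obtained copy of that list is assumed for the final lookup.

Code:
```python
import hashlib


def mnemonic_indices(entropy_bits: str) -> list:
    """Append the 4-bit checksum and split 132 bits into twelve 11-bit word indices."""
    if len(entropy_bits) != 128 or set(entropy_bits) - {"0", "1"}:
        raise ValueError("expected exactly 128 '0'/'1' characters")
    entropy = int(entropy_bits, 2).to_bytes(16, "big")
    first_hex_char = hashlib.sha256(entropy).hexdigest()[0]   # e.g. '9'
    checksum = format(int(first_hex_char, 16), "04b")          # '9' -> '1001'
    full = entropy_bits + checksum                              # 132 bits
    return [int(full[i:i + 11], 2) for i in range(0, 132, 11)]
```

For the entropy in this thread, the first index is 1941 (binary 11110010101) and the last is 1993 (binary 11111001001, the final word discussed in this thread).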

Here's another open source tool you can use to input your coin flips and generate your seed phrase: https://bitcointalksearch.org/topic/handydandy-a-tool-to-work-with-entropy-5373505
legendary
Activity: 3472
Merit: 10611
Windows is very limited in using commands and stuff like that to compute hashes, etc. Linux is better. But in any case it is a lot better if you learn and use a programming language instead of trying to make it work with commands. Something like Python is easy to learn and you can use it for such purposes, not to mention there are many open source projects on github.com in python you can use.
member
Activity: 104
Merit: 120
Hey, no not yet. However the whole thing for me here is that I want to understand what's occurring in the background and how it happens so that I can learn for myself and be comfortable with what's going on and why.

If anyone out there can suggest built-in Windows programs or commands I can use offline to convert my 128-bit entropy to a hexadecimal number and then perform a SHA256 hash on the hexadecimal output, it would be very helpful. Thank you.
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
Have you tried out my software? You won't have to mess with hex values, hash functions, or mnemonic standards at all. That's technical stuff that happens in the background. You'll just flip a coin and submit the results. Once done, it'll return your BIP39 seed phrase, as well as some addresses of every type (legacy, nested segwit, native segwit) with their respective private keys.

Alternative to my software (and more reviewed): https://iancoleman.io/bip39/
member
Activity: 104
Merit: 120
Hi BlackHatCoiner / all,

I agree there are several software items I am trusting.  The trick I'm trying to pull off is minimizing my sphere of trust to only encompass the essentials and of course as you said avoid the RNG.  That said I'm only interested in performing the OS stuff relating to generating the BIP 39 in an offline machine that will never see the internet which should help reduce that trust further. 

As to what I'm ultimately trying to do is figure out exactly how best to take my own derived entropy that I create offline and then create a 12 word BIP 39.  To me it sounds like flipping a coin 128x or rolling dice is the way to go (at least for testing).  Where I get confused is once I have that 128 bits (ones and zeros) what exactly do I do in Windows to:

1) identify the Hex value for the 1s and 0s of the entropy (offline), and then

2) once I have the Hex value, how do I compute the SHA256 hash of the entropy's hex value in Windows (offline), so I can then convert the first character of this SHA256 digest back to binary (the first 4 bits) to get the checksum that I need to append to the 128 bits of entropy.

TIA
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
That said however I am not a coder and not presently looking to be one but rather someone who likes to tinker with bitcoin and is trying to get to the point where I can create my own entropy and generate my own BIP 39 seed word without any reliance on the available software that does it.
Okay, but note that you do rely on lots of things that I'm not sure you're aware of. First of all, you rely on Microsoft. There's absolutely no way to prove that your OS won't betray you, unless Microsoft released the source code and let developers across the world confirm they haven't inserted a backdoor. Secondly, there's a higher chance that your OS has a 0-day, compared to Linux, because it isn't so broadly reviewed. Thirdly, you rely on software developers, cryptographers, and mathematicians.

But, yes. You do avoid the firmware RNG.

However specifically how to structure those bytes into a file format that I can use a built in Windows tool
With coding. But since that's not your field, I recommend you to use my software. Otherwise, tell me exactly what you want to do. You might be a victim of an XY problem.
legendary
Activity: 2268
Merit: 18775
i.e. 11 randomly selected BIP 39 words
Again, please don't do this. It isn't secure, like, at all.

Alternatively I imagine I could simply roll a 16-sided die to get 32 unique hex values
I wouldn't use dice at all. Dice are more prone to bias than coins, the bias takes longer to detect, and is also harder to detect. All of these things become more true the more faces your dice has. It would take hundreds of rolls to be relatively sure of detecting even a fairly large bias on a 16 sided dice. It will be simpler, quicker, and more secure to flip a coin 128 times.
member
Activity: 104
Merit: 120
Thank you again, BlackHatCoiner. I certainly appreciate all of your replies. That said, I am not a coder and am not presently looking to be one, but rather someone who likes to tinker with bitcoin and is trying to get to the point where I can create my own entropy and generate my own BIP 39 seed words without any reliance on the available software that does it. The idea is that, as a non-coder, I can't verify code independently (and even if I could, I am not an encryption expert who fully understands all the intricacies there).

That all said, I feel I'm right on the cusp of knowing how to build my own BIP 39 seed word list for use with an offline wallet that supports this standard and PSBT files compatible with Bitcoin Core. In my mind, this is probably one of the safest ways to transact with sovereignty, and that's what I'd like to achieve here. I'm just getting stuck in a few areas as a non-coder Windows user. I can easily create my own entropy, and I understand that I need to figure out how to perform some of the steps that you mentioned here:

"You don't hash the hexadecimal, and that's why you don't need to convert the binaries to hexadecimal. Hash functions take input as bytes. You need to convert your 128-bit string to bytes, and then hash that. It's just that most libraries do this conversion in the background, which brings some confusion."

As far as how to convert my 128-bit entropy to bytes, I understand that would mean simply splitting it into 8-bit chunks. However, I don't know specifically how to put those bytes into a file format that a built-in Windows tool can use. I think my method of execution is off, as I was apparently wrong to save the ones and zeros in a Notepad .txt file. So if I understand you right, what I need to do is somehow create a binary file of my entropy. I'm wondering how I would go about that and, once it's done, how I would structure the binary file so that its bytes can be digested by the SHA256 hash function, hopefully with a built-in Windows tool.
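One way to produce such a binary file is sketched below in Python (the file name entropy.bin is a hypothetical choice). The key point is that the file must hold the 16 raw bytes the bits represent, not 128 typed '0'/'1' characters. Running certutil -hashfile entropy.bin SHA256 on the resulting file should then hash the same bytes as the last printed line.

Code:
```python
import hashlib
from pathlib import Path

# The 128 coin-flip bits from this thread, as typed characters.
bits = "11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100"
assert len(bits) == 128

# Pack the bits into 16 raw bytes (8 bits per byte, most significant bit first).
data = int(bits, 2).to_bytes(len(bits) // 8, "big")

# Write the raw bytes, NOT the '0'/'1' characters, to a file.
out = Path("entropy.bin")          # hypothetical file name
out.write_bytes(data)

# A Notepad .txt file would instead hold 128 ASCII characters (128 bytes),
# whose hash has nothing to do with the BIP39 checksum.
print(len(data), len(bits.encode("ascii")))   # 16 128
print(hashlib.sha256(out.read_bytes()).hexdigest())
```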

Thanks.
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
Thanks but the idea here was for me to learn how to do as much as possible myself
Building the software from scratch requires a certain degree of technical competence. If you don't feel confident with that, I strongly recommend that you either use someone else's code that you have read, or study software engineering until you do feel confident enough.

That said, can you give me a walk through on how I would do that step manually on windows?  TY
First of all, I want to make it clear that I don't want you to trust me. I want you to verify me. The code isn't difficult to read. Most of it happens in Form1.cs. I make use of the NBitcoin and Bitcoin.Net libraries which are broadly used in other software too.

There are two ways to execute this program. One's to import the source code in Visual Studio 2019, and then have it compiled. The easier way is to download CoinFlippedSeed-v0.3.zip, make sure that the SHA-1 of the zip is 4DA93F3D72A9EB65282650E15D4E3C288A28FD71*, unzip the binaries and run CoinFlippedSeed.exe.

*You can try to skip the integrity verification part (that is the SHA-1 verification) for the moment, just to try out the software, but it's important to do it regularly on most of the software you install. It makes sure that the binaries aren't compromised. Do it if you're about to create a Bitcoin wallet with funds deposited.
member
Activity: 104
Merit: 120
Thanks but the idea here was for me to learn how to do as much as possible myself and to avoid putting trust in any particular software relating to building your own bitcoin wallet offline.  That said, can you give me a walk through on how I would do that step manually on windows?  TY
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
Do you have any suggestions on the best way to do this on an offline Windows machine?
Hash bytes? Sure, but there are programs that let you make a seed yourself completely, not just for this part. That's one I've written: https://github.com/AngeloMetal/CoinFlippedSeed

Also, when you asked "Windows box? You mean Windows Forms in Visual Studio?" I simply meant a Windows PC.  Thanks.
The above program works on Windows.
member
Activity: 104
Merit: 120
Hi BlackHatCoiner,

Thanks for the tips. When you stated "You need to convert your 128-bit string to bytes, and then hash that. It's just that most libraries do this conversion in the background, which brings some confusion", do you have any suggestions on the best way to do this on an offline Windows machine?

Also, when you asked "Windows box? You mean Windows Forms in Visual Studio?" I simply meant a Windows PC.  Thanks.
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
2)   Convert the binary 128 bit string to hexadecimal.
3)   Perform a SHA 256 hash of the hexadecimal.
You don't hash the hexadecimal, and that's why you don't need to convert the binaries to hexadecimal. Hash functions take input as bytes. You need to convert your 128-bit string to bytes, and then hash that. It's just that most libraries do this conversion in the background, which brings some confusion.

Alternatively I imagine I could simply roll a 16-sided die to get 32 unique hex values and skip steps 1 and 2
Note that a 16-sided die is likely to be more prone to bias than a 6-sided die, and a 6-sided die more prone than a 2-sided coin. You should run a chi-squared test to check this.
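A minimal chi-squared check in plain Python, as a sketch: the function name is hypothetical, the simulated rolls stand in for your own tallies, and 24.996 is the standard table value for 15 degrees of freedom (16 faces minus 1) at the 5% significance level. A statistic above that value suggests bias; a value below it does not prove the die is fair, and small biases need many rolls to show up.

Code:
```python
import random


def chi_squared(counts):
    """Pearson chi-squared statistic against a uniform (fair-die) expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Simulated rolls of a fair 16-sided die (replace with your own tallies).
rng = random.Random(7)
rolls = [rng.randrange(16) for _ in range(1600)]
counts = [rolls.count(face) for face in range(16)]

stat = chi_squared(counts)
# Critical value for 15 degrees of freedom at the 5% level (standard table).
CRITICAL_15_DF_5_PERCENT = 24.996
print(f"chi-squared = {stat:.2f}; bias suspected: {stat > CRITICAL_15_DF_5_PERCENT}")
```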

That all said, can anyone here give me some insight with how I would perform steps 2 and 3 on a windows box (ideally offline)?
Windows box? You mean Windows Forms in Visual Studio?
member
Activity: 104
Merit: 120
Thanks all.  So if I'm getting this right and I wanted to simply create my own independent entropy for a BIP 39 12 word seed, I can do it by performing the following steps:

1)   Take 128 bit entropy (i.e. 11 randomly selected BIP 39 words and identifying their 11 bit codes + 7 random bits - or perhaps just 128 coin flips).
2)   Convert the binary 128 bit string to hexadecimal.
3)   Perform a SHA 256 hash of the hexadecimal.
4)   Convert this SHA 256 hex digest to a binary number and take the first 4 bits of this binary number output as the checksum.
5)   Append the checksum identified in step 4 to the entropy from step 1 and deconstruct the 132 bits into 12 groupings of 11 bits to get the BIP 39 12 word lists.

Alternatively, I imagine I could simply roll a 16-sided die to get 32 unique hex values and skip steps 1 and 2, but I would need to add a step between 4 and 5 above to convert the hex I rolled into binary to append the checksum.

That all said, can anyone here give me some insight with how I would perform steps 2 and 3 on a windows box (ideally offline)?


legendary
Activity: 2380
Merit: 5213
The correct checksum is 0001, so the last word is 11111000001.
The correct checksum is 1001 and the last 11 bits are 11111001001.
I think you made a typo, because the last word is still "Weird" and your final result is correct.
legendary
Activity: 4522
Merit: 3426
1)   I first generated a random 128 bit entropy as such:

11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100
2)   I next performed a hash of the entropy by saving it in a notepad.txt file then performing the following command:  certutil -hashfile test.txt SHA256

3)   The resulting hash is: bc4f595b36de2533832a47bf66535612688d81594449693bed9414180ab7cad4

4)   The first 4 bits of the hash would be 1011.  This is my understanding as I believe that when converting from hexadecimal to binary you must always represent each binary value with four bits.  In this example, b is converted to binary as 1011.  

The correct checksum is 0001, so the last word is 11111000001. The phrase is verify merit vapor prize quiz volume theme lucky young yellow life weird

Everything you did looks OK, except that you cannot use Notepad to create the file being hashed, because it stores a text version and not the binary itself. If you save a hex value instead of binary with Notepad, you may be able to use "CertUtil -decodehex ..." to convert it to binary for the SHA256 calculation.

You can use this site to check your results: https://iancoleman.io/bip39/
legendary
Activity: 2268
Merit: 18775
(BIP 39 word "west")
As hosseinimr93 has pointed out, your checksum is incorrect. The correct final word should be "weird", not "west".

So if I understand it right then, the only requirement for a valid 12th word for this 12 word BIP 39 phrase would have to contain 1011 at the end of their bit pattern.  That would mean that in addition to the BIP 39 word "west" that I chose two other options could have been either  “earth” number 555 decimal / 1000101011 binary and also the word “maximum” number 1099 / binary 10001001011  Is this correct?
Ignoring the fact that you calculated the checksum incorrectly, your understanding here is wrong. There is exactly one word ("weird") which will be a valid final word for the 128 bits of entropy you have selected. There are other words you could replace "weird" with and still have a valid 12-word seed phrase, but since the last word contains 7 bits of entropy as well as 4 bits of checksum, choosing one of these other words gives you a different 128 bits of entropy. Further, if you choose one of these other valid words, there is no guarantee that the 4-bit checksum will be the same, given that you are changing the entropy.

For example, the entropy you have given above encodes this seed phrase:
Code:
verify merit vapor prize quiz volume theme lucky young yellow life weird

This is also a valid seed phrase:
Code:
verify merit vapor prize quiz volume theme lucky young yellow life debris

Weird encodes the following: 11111001001
Debris encodes the following: 00111000011
The last 4 bits of each (1001 for weird, 0011 for debris) are the checksums.

Two different valid words, but with different entropy and different checksums.
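A short Python sketch of the point above, using only the two bit patterns given in the post: for a 12 word phrase the final word is 7 entropy bits followed by 4 checksum bits, so splitting it shows why swapping "weird" for "debris" changes both the entropy and the checksum. `split_last_word` is a hypothetical helper name, and the printed index is simply the 11-bit value in decimal (its 0-based position in the word list).

```python
def split_last_word(word_bits: str, checksum_len: int = 4) -> tuple[str, str]:
    """Split the final word's 11 bits into (entropy bits, checksum bits).

    For a 12 word phrase the checksum is 4 bits, so the last word holds
    7 bits of entropy followed by 4 bits of checksum.
    """
    if len(word_bits) != 11:
        raise ValueError("each BIP-39 word encodes exactly 11 bits")
    return word_bits[:-checksum_len], word_bits[-checksum_len:]


# The two final words from the example above (bit patterns as given in the post).
for name, bits in [("weird", "11111001001"), ("debris", "00111000011")]:
    entropy_part, checksum_part = split_last_word(bits)
    print(f"{name}: index {int(bits, 2)}, entropy {entropy_part}, checksum {checksum_part}")
```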
legendary
Activity: 2380
Merit: 5213
3)   The resulting hash is: bc4f595b36de2533832a47bf66535612688d81594449693bed9414180ab7cad4
Your calculation is wrong.
You need to hash your entropy with the SHA256 function as hex (binary) input, not as text.

First, you need to convert your entropy to a hexadecimal number.
The result is F2B173C5D5AAFFEBB80425FF5FE2057C.

That hex number needs to be hashed with the SHA256 function.
The result is 931258d717865a310cfc24a9161b21f4c0d02e0bb4cf12894516170a10e72339

If you convert the result to a binary number, the first 4 bits are 1001.
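The three steps above can be sketched in Python as follows (an illustration, not the poster's own code; `bits_to_hex` and `checksum_12_words` are hypothetical names). The zero-padding in both `format` calls matters: each hex digit must become exactly 4 bits, which is also the point raised in step 4 of the quoted post. As a sanity check, all-zero entropy gives checksum 0011, which matches the published BIP-39 test vector ("abandon" eleven times followed by "about").

```python
import hashlib

# Example entropy from this thread.
ENTROPY_BITS = (
    "1111001010110001011100111100010111010101101010101111111111101011"
    "1011100000000100001001011111111101011111111000100000010101111100"
)


def bits_to_hex(bits: str) -> str:
    """Binary string -> hex string, 4 bits per hex digit, leading zeros kept."""
    return format(int(bits, 2), f"0{len(bits) // 4}x")


def checksum_12_words(bits: str) -> str:
    """First 4 bits of SHA-256 over the raw entropy bytes (128-bit entropy only)."""
    if len(bits) != 128:
        raise ValueError("expected 128 bits of entropy")
    # Hash the bytes the hex stands for, not the hex text itself.
    digest_hex = hashlib.sha256(bytes.fromhex(bits_to_hex(bits))).hexdigest()
    # One hex digit is exactly 4 bits; pad so e.g. '1' becomes '0001'.
    return format(int(digest_hex[0], 16), "04b")


print("entropy (hex):", bits_to_hex(ENTROPY_BITS).upper())
print("checksum bits:", checksum_12_words(ENTROPY_BITS))
```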
legendary
Activity: 3472
Merit: 10611
Would you be able to give me an idea how I could perform the checksum on a windows box for my entropy example?
Sorry, I have no idea.

Quote
My (apparently mis) understanding from the previous replies was that you take the SHA 256 digest of the 128 bit entropy then use the first 4 bits of that as the checksum occupying the last four bits of the 12th word.
That part is correct. The misunderstanding is after you computed and appended the checksum to the end and when you start changing your entropy.
member
Activity: 104
Merit: 120
Thank you for the reply. I guess I must have misunderstood some of the previous replies. Would you be able to give me an idea how I could perform the checksum on a windows box for my entropy example? My (apparently mis) understanding from the previous replies was that you take the SHA 256 digest of the 128 bit entropy then use the first 4 bits of that as the checksum occupying the last four bits of the 12th word. In this case the first hexadecimal value from said SHA 256 digest was b and when converting b hex into binary it's 1011 which I appended to the end of the original 128 bit entropy. Perhaps I'm not calculating the checksum correctly? Thanks.
legendary
Activity: 3472
Merit: 10611
The original word-list(s) are found here:
https://github.com/bitcoin/bips/blob/master/bip-0039/bip-0039-wordlists.md

Quote
*The 12th word can have several different possible words as all that needs to be present in the last word is the four bits of the 11 bit pattern for the 12th word.
~
So if I understand it right then, the only requirement for a valid 12th word for this 12 word BIP 39 phrase would have to contain 1011 at the end of their bit pattern.  That would mean that in addition to the BIP 39 word "west" that I chose two other options could have been either  “earth” number 555 decimal / 1000101011 binary and also the word “maximum” number 1099 / binary 10001001011  Is this correct?
That's not how it works.
The last 4 bits are the checksum of the 128-bit entropy, not arbitrary bits. This means that if you change even a single bit inside the 128-bit entropy, the 4-bit checksum will usually change as well.

I think you misunderstood the previous comments. They are talking about collisions. If choosing "maximum" instead of "west" gives you a correct mnemonic, you are manually brute forcing the words to find a collision. In that case it is not just about the last word: you can change any other bit inside the 128-bit entropy. For example, you could change the 5th word and still have the same last word (and the same other 10 words).
member
Activity: 104
Merit: 120
Hi everyone and thank you for the excellent feedback. Just to be sure I understand things properly, I’ve gone ahead and outlined my understanding in a step-by-step write-up of how I believe one could calculate their 12 word BIP 39 seed. Please let me know if I got this correct.


A few key points I took away from you all are:

* You need 128 bits of entropy for BIP39 for a 12 word seed phrase

*Each BIP 39 word has an 11 bit code (earth = # 555 or 1000101011 in binary) that I believe is located here: https://github.com/hatgit/BIP39-wordlist-printable-en

*The 128 bits of entropy for BIP39 also require an additional 4 bits for a checksum in the 12th word. This checksum is placed in the last 4 bits of that 11 bit word.

*To obtain the additional 4 bits for the checksum you need to perform a SHA256 hash on the 128 bits of entropy and then take the first 4 bits of this hash and append it to the 128 bits which gives you a total of 132 bits. 

*Once these 132 bits have been created, you then split them into twelve 11 bit groups and identify the BIP 39 words that correspond to those bit patterns.

*The 12th word can have several different possible words as all that needs to be present in the last word is the four bits of the 11 bit pattern for the 12th word.

With all that said, here is what I did to confirm my understanding of the above.  Please let me know if there are any obvious errors. Note that this is just an example entropy and nothing I will ever use to generate my own seed.  Thank you all for your help!

1)   I first generated a random 128 bit entropy as such:

11110010101100010111001111000101110101011010101011111111111010111011100000000100001001011111111101011111111000100000010101111100

2)   I next hashed the entropy by saving it in a Notepad .txt file and then running the following command:  certutil -hashfile test.txt SHA256

3)   The resulting hash is: bc4f595b36de2533832a47bf66535612688d81594449693bed9414180ab7cad4

4)   The first 4 bits of the hash would be 1011.  This is my understanding as I believe that when converting from hexadecimal to binary you must always represent each binary value with four bits.  In this example, b is converted to binary as 1011. 

5)   Next I appended the 4 bits derived from the first hexadecimal digit of the hash, as follows: ENT+CS = 111100101011000101110011110001011101010110101010111111111110101110111000000001000010010111111111010111111110001000000101011111001011

6)   Divide the resulting 132 bits into the following lists:

11110010101
10001011100
11110001011
10101011010
10101111111
11110101110
11100000000
10000100101
11111111010
11111111000
10000001010
11111001011 (BIP 39 word "west")
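Step 6 (and the checksum step before it) can be reproduced with a minimal Python sketch like this, assuming 128-bit entropy given as a '0'/'1' string; `entropy_to_groups` is a hypothetical name. The first 11 groups depend only on the entropy, so they should match the list above; only the final group depends on the checksum, which here is computed over the raw bytes rather than a text file.

```python
import hashlib

# Example entropy from this thread.
ENTROPY_BITS = (
    "1111001010110001011100111100010111010101101010101111111111101011"
    "1011100000000100001001011111111101011111111000100000010101111100"
)


def entropy_to_groups(bits: str) -> list[str]:
    """Append the BIP-39 checksum to 128 entropy bits and split into 11-bit groups."""
    if len(bits) != 128:
        raise ValueError("expected 128 bits of entropy")
    raw = int(bits, 2).to_bytes(16, "big")
    first_byte = hashlib.sha256(raw).digest()[0]
    checksum = format(first_byte >> 4, "04b")  # top 4 bits of the hash
    full = bits + checksum  # 132 bits
    return [full[i:i + 11] for i in range(0, len(full), 11)]


for group in entropy_to_groups(ENTROPY_BITS):
    print(group, int(group, 2))  # bit pattern and its 0-based word index
```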

So if I understand it right then, the only requirement for a valid 12th word for this 12 word BIP 39 phrase would have to contain 1011 at the end of their bit pattern.  That would mean that in addition to the BIP 39 word "west" that I chose two other options could have been either  “earth” number 555 decimal / 1000101011 binary and also the word “maximum” number 1099 / binary 10001001011  Is this correct?

Also, with respect to the way I computed the hash of the 128 bits, I did the following: I entered all the 1s and 0s into a Notepad file and saved it with a .txt extension. I then ran CertUtil on that file, which gave me the SHA256 digest above. Does this produce the correct hash of the binary stream? I’m not sure if I did this correctly. Thank you.
legendary
Activity: 2268
Merit: 18775
When you consider 11 fixed words and randomly select the 12th word, then yes, the numbers become exact rather than averages: for any given first 7 bits (not 8, as you have used) of the last word, there is exactly 1 combination of the last 4 bits which is valid.

When approaching the problem from OP's point of view, randomly selecting words and hoping for a valid seed phrase, it becomes an average. If you were to take a 12 word seed phrase and cycle through all possibilities for the first word (for example), there is no guarantee that you would end up with 128 valid seed phrases, due to the unpredictable nature of the checksum.
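To see the "average" case described above, this Python sketch (hypothetical helper names; 0-based word indices stand in for words) keeps words 2 to 12 fixed and counts how many of the 2048 candidates for the first word pass the checksum. Each candidate changes the entropy and therefore the hash, so the count is not fixed: on average it is 2048/16 = 128, but it varies from phrase to phrase.

```python
import hashlib


def is_valid_12(indices: list[int]) -> bool:
    """True if 12 word indices (0..2047) pass the BIP-39 checksum."""
    value = 0
    for index in indices:
        value = (value << 11) | index  # 12 x 11 = 132 bits
    entropy, checksum = value >> 4, value & 0xF
    digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()
    return digest[0] >> 4 == checksum


def count_valid_first_words(indices: list[int]) -> int:
    """Keep words 2-12 fixed and count how many of the 2048 first words pass."""
    return sum(is_valid_12([first] + indices[1:]) for first in range(2048))


# Placeholder phrase: the all-zero BIP-39 test vector ("abandon" x 11, "about").
example = [0] * 11 + [3]
print(count_valid_first_words(example))
```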
legendary
Activity: 2380
Merit: 5213
There are (on average) 128 words which will be a valid checksum for a 12 word seed phrase. It is 8 words (on average) for 24 word seed phrases.
Thanks for the correction. I edited that post.
But isn't that exactly 128 words for the 12 word seed phrase and exactly 8 words for the 24 word seed phrase?

Let's say I have the first 11 words of a 12 word seed phrase and the last word is unknown.
There are 128 possibilities for the first 7 bits of the last word and 16 possibilities for its last 4 bits.

There's a 1/128 chance that the first 7 bits of the word I choose are 0000000.
There's a 1/128 chance that the first 7 bits of the word I choose are 0000001.
There's a 1/128 chance that the first 7 bits of the word I choose are 0000010.
...


If the first 7 bits are 0000000, there's 1 possibility for the last 4 bits that make the seed phrase valid. The chance is 1/16.
If the first 7 bits are 0000001, there's 1 possibility for the last 4 bits that make the seed phrase valid. The chance is 1/16.
If the first 7 bits are 0000010, there's 1 possibility for the last 4 bits that make the seed phrase valid. The chance is 1/16.
...

Therefore the chance of having a valid BIP39 seed phrase is always 1/16 (128 out of 2048 words)
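The argument above can be checked mechanically. In this Python sketch (hypothetical names; the first 11 words are given as 0-based indices), each of the 128 possible 7-bit prefixes of the last word fixes the 128-bit entropy, which in turn fixes exactly one 4-bit checksum. The total is therefore exactly 128 for any choice of the first 11 words.

```python
import hashlib


def checksum4(entropy: int) -> int:
    """Top 4 bits of SHA-256 over a 128-bit entropy value (12 word case)."""
    return hashlib.sha256(entropy.to_bytes(16, "big")).digest()[0] >> 4


def valid_suffixes_per_prefix(first_11: list[int]) -> dict[int, list[int]]:
    """Map each 7-bit prefix of the last word to the 4-bit suffixes that pass."""
    base = 0
    for index in first_11:
        base = (base << 11) | index  # 11 words x 11 bits = 121 bits
    table = {}
    for prefix in range(128):
        entropy = (base << 7) | prefix  # 121 + 7 = 128 bits of entropy
        expected = checksum4(entropy)
        table[prefix] = [suffix for suffix in range(16) if suffix == expected]
    return table


table = valid_suffixes_per_prefix([0] * 11)  # placeholder first 11 words
print(sum(len(suffixes) for suffixes in table.values()))  # 128
```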
legendary
Activity: 2268
Merit: 18775
Anyway, if you have the first 11 words and you want to have valid BIP39 seed phrase, there are 8 words that can be used as the 12th word.
There are (on average) 128 words which will be a valid checksum for a 12 word seed phrase. It is 8 words (on average) for 24 word seed phrases.

Specifically what I'm trying to do is print out a list of the 2048 bip39 words and randomly select 12 to create my own offline generated seed.
Don't do this! It is an incredibly insecure method of generating a seed phrase. You will not and can not choose words randomly, despite your best efforts. Humans are not random. Whatever seed phrase you end up with at the end of this process will not represent 128 bits of entropy.

I'm trying to ensure true randomness in seed creation and this seems to be the only way I can come up with outside of being able to independently verify the code from wallet manufacturers etc.
Do not select words. Instead, flip a fair coin 128 times to create your entropy, calculate and append the 4 bit checksum, and then encode that 132 bit number into the corresponding words. For each 11 bit section you will need to convert to decimal and then add 1 before looking up the word on the BIP39 word list, since the list is numbered from 1 while the 11 bit values start at 0.
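A minimal Python sketch of this procedure, assuming the flips are recorded as a string of 'H' and 'T' with H = 1 and T = 0 (any fixed mapping works, as long as you stick to it). It returns both the 0-based indices used in code and the 1-based numbers for a printed list that starts counting at 1. `flips_to_word_numbers` is a hypothetical name.

```python
import hashlib


def flips_to_word_numbers(flips: str) -> tuple[list[int], list[int]]:
    """Turn 128 coin flips into 0-based word indices and 1-based list numbers.

    Assumption for this sketch: 'H' = 1 and 'T' = 0.
    """
    flips = flips.upper()
    if len(flips) != 128 or set(flips) - {"H", "T"}:
        raise ValueError("expected exactly 128 H/T results")
    bits = flips.replace("H", "1").replace("T", "0")
    raw = int(bits, 2).to_bytes(16, "big")
    checksum = format(hashlib.sha256(raw).digest()[0] >> 4, "04b")
    full = bits + checksum  # 132 bits
    indices = [int(full[i:i + 11], 2) for i in range(0, 132, 11)]
    # A printed list numbered from 1 needs +1 on every index.
    return indices, [i + 1 for i in indices]


indices, numbers = flips_to_word_numbers("T" * 128)  # placeholder flips
print(indices)
print(numbers)
```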
legendary
Activity: 4522
Merit: 3426
1) How does one identify the corresponding bit pattern from the BIP 39 word list? Is it as simple as finding the full BIP 39 word list, with the patterns in alphabetical order? For example, would I be correct to assume that the first word alphabetically on the BIP 39 list is abandon and so the 11 bit pattern would be 00000000001, whereas the second word alphabetically is ability, which should correlate to 00000000010?

The words in the BIP-39 word lists are in a specific order, but I wouldn't depend on them being in alphabetical order. And, as hosseinimr93 pointed out, the first word is 0. Here are the "official" lists: BIP 39 Word Lists

2) Do you know an easy way to identify the SHA 256 hash of a 128 bit stream offline on a Windows PC or an Android device?

Most languages have cryptography libraries on Windows and Android that include a variety of hash calculations. Windows has the CertUtil command, if that is what you are looking for.
legendary
Activity: 2380
Merit: 5213
1) How does one identify the corresponding bit pattern from the BIP 39 word list? Is it as simple as finding the full BIP 39 word list, with the patterns in alphabetical order? For example, would I be correct to assume that the first word alphabetically on the BIP 39 list is abandon and so the 11 bit pattern would be 00000000001, whereas the second word alphabetically is ability, which should correlate to 00000000010?
Yes. Just take note that the first word (abandon) represents 00000000000 and the second word (ability) represents 00000000001.


2) Do you know an easy way to identify the SHA 256 hash of a 128 bit stream offline on a Windows PC or an Android device?
If you are familiar with Python programming, you can use the hashlib library.
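For example, a small offline checksum function using Python's standard `hashlib` module might look like this. It is a sketch following the BIP-39 rule that the checksum length is the entropy length divided by 32, so it gives 4 bits for 12 words and 8 bits for 24 words; the function name is hypothetical.

```python
import hashlib


def bip39_checksum_bits(entropy: bytes) -> str:
    """BIP-39 checksum: the first ENT/32 bits of SHA-256(entropy).

    ENT is the entropy length in bits: 128 (12 words) up to 256 (24 words).
    """
    ent = len(entropy) * 8
    if ent not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128-256 bits in steps of 32")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return digest_bits[: ent // 32]


entropy = bytes.fromhex("F2B173C5D5AAFFEBB80425FF5FE2057C")  # example from this thread
print(bip39_checksum_bits(entropy))
```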
member
Activity: 104
Merit: 120
Excellent reply!  I do have a few questions for you though:

1) How does one identify the corresponding bit pattern from the BIP 39 word list? Is it as simple as finding the full BIP 39 word list, with the patterns in alphabetical order? For example, would I be correct to assume that the first word alphabetically on the BIP 39 list is abandon and so the 11 bit pattern would be 00000000001, whereas the second word alphabetically is ability, which should correlate to 00000000010?

2) Do you know an easy way to identify the SHA 256 hash of a 128 bit stream offline on a Windows PC or an Android device?

Thank you very much for the excellent reply!
legendary
Activity: 4522
Merit: 3426
Specifically what I'm trying to do is print out a list of the 2048 bip39 words and randomly select 12 to create my own offline generated seed.

The right way to do it is to follow BIP-39:

  • 1. Generate 128 random bits.
  • 2. Compute the SHA-256 hash of the 128 bits.
  • 3. Append the first 4 bits of the hash to the 128 bits, giving you 132 bits.
  • 4. Split the 132 bits into 12 11-bit values.
  • 5. Generate the phrase by using the 11-bit values as indexes into the list of 2048 words.
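The five steps can be sketched in Python as below, assuming the standard library's `secrets` module is an acceptable randomness source for step 1. That is an assumption: part of this thread is about not trusting software randomness, in which case you would pass your own 16 bytes (e.g. from coin flips) to `entropy_to_indices`, a hypothetical name. Mapping indices to words needs the official 2048-word English list, which is not included here.

```python
import hashlib
import secrets


def entropy_to_indices(entropy: bytes) -> list[int]:
    """Steps 2-4: hash, append checksum, split into 11-bit word indices."""
    ent = len(entropy) * 8
    if ent not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128-256 bits in steps of 32")
    cs = ent // 32  # 4 checksum bits for 128 bits of entropy
    first_hash_byte = hashlib.sha256(entropy).digest()[0]  # step 2
    checksum = first_hash_byte >> (8 - cs)  # top cs bits (cs is at most 8)
    value = (int.from_bytes(entropy, "big") << cs) | checksum  # step 3
    n_words = (ent + cs) // 11  # 132 bits -> 12 words
    # Step 4: peel off 11-bit groups, most significant first.
    return [(value >> (11 * (n_words - 1 - k))) & 0x7FF for k in range(n_words)]


# Step 1: 128 random bits from the operating system's CSPRNG.
indices = entropy_to_indices(secrets.token_bytes(16))
print(indices)  # step 5: look these up (0-based) in the 2048-word list
```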

However, there is another way similar to what you want to do, but the result may be less secure depending on how random your input is:

  • 1. Select 11 words from the word list. Duplicates are acceptable.
  • 2. Concatenate the indexes into a 121 bit string.
  • 3. Add another 7 bits, random, 0, or whatever.
  • 4. Compute the SHA-256 hash of the 128 bits.
  • 5. Append the first 4 bits of the hash to the 7 bits, giving you the index of the 12th word.
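A Python sketch of this second method, with the 11 chosen words given as 0-based indices and the 7 extra bits as an integer in 0..127 ("random, 0, or whatever"); `twelfth_word_index` is a hypothetical name. The result's top 7 bits are the extra bits and its low 4 bits are the checksum.

```python
import hashlib


def twelfth_word_index(first_11: list[int], extra_7: int) -> int:
    """Build the 12th word index from 11 chosen word indices plus 7 extra bits."""
    if len(first_11) != 11 or not all(0 <= i < 2048 for i in first_11):
        raise ValueError("need 11 word indices in 0..2047")
    if not 0 <= extra_7 < 128:
        raise ValueError("extra_7 must fit in 7 bits")
    entropy = 0
    for index in first_11:
        entropy = (entropy << 11) | index  # step 2: 121 bits
    entropy = (entropy << 7) | extra_7  # step 3: 128 bits
    digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()  # step 4
    checksum = digest[0] >> 4
    return (extra_7 << 4) | checksum  # step 5: 7 bits + 4 checksum bits


print(twelfth_word_index([0] * 11, 0))  # placeholder words and extra bits
```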

Finally, here is a minor variation of the previous method. Again, the security depends on how the words are chosen:

  • 1. Select 11 words from the word list. Duplicates are acceptable.
  • 2. Determine the 128 words that would be valid as the 12th word.
  • 3. Choose one.
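Step 2 does not strictly require trying all 2048 words. As a sketch, each of the 128 candidates can be built directly from a 7-bit value plus its checksum, and Python's `secrets.choice` can handle step 3. The helper name is hypothetical, and the first 11 indices in the usage line are a placeholder.

```python
import hashlib
import secrets


def valid_twelfth_words(first_11: list[int]) -> list[int]:
    """All 128 valid last-word indices for the given first 11 word indices."""
    base = 0
    for index in first_11:
        base = (base << 11) | index  # 121 bits from the chosen words
    candidates = []
    for extra_7 in range(128):
        entropy = (base << 7) | extra_7  # 128 bits of entropy
        checksum = hashlib.sha256(entropy.to_bytes(16, "big")).digest()[0] >> 4
        candidates.append((extra_7 << 4) | checksum)
    return candidates


first_11 = [0] * 11  # placeholder; replace with your own 11 indices
choice = secrets.choice(valid_twelfth_words(first_11))  # step 3: choose one
print(choice)
```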
legendary
Activity: 2380
Merit: 5213
Maybe if I try to clarify a little bit further as far as to what I'm trying to do exactly it might be able to give us a better picture of whether it's feasible or not.
I fully understand what you are trying to achieve.


Specifically what I'm trying to do is print out a list of the 2048 bip39 words and randomly select 12 to create my own offline generated seed. Can this be feasibly done?
If you select 12 words, there's a high probability that your seed phrase doesn't pass the checksum.
Instead, you can select 11 words and then try to find a word which leads to a valid BIP39 seed phrase. By valid, I mean it passes the checksum.
This is completely feasible, but it's not a common method for generating a seed phrase.
If you insist on generating your seed phrase in this way, you should make sure that the words are picked completely at random.

For generating a BIP39 seed phrase, I would start with a random 128 bit entropy instead of directly going to the word list.
member
Activity: 104
Merit: 120
Maybe if I try to clarify a little further what exactly I'm trying to do, it will give us a better picture of whether it's feasible or not. Specifically what I'm trying to do is print out a list of the 2048 bip39 words and randomly select 12 to create my own offline generated seed. Can this be feasibly done? I'm trying to ensure true randomness in seed creation and this seems to be the only way I can come up with outside of being able to independently verify the code from wallet manufacturers etc. Thanks.
legendary
Activity: 2380
Merit: 5213
Specifically I'm interested in generating a 12 word seed but my understanding is that the 12th word would be a checksum...
This is not true.
The checksum isn't the last word. The checksum is the last 4 bits.
Each word encodes 11 bits. The first 7 bits of the last word are generated randomly and its last 4 bits are the checksum.
So, if you have 11 words, to select the 12th word you have to test different words until you find a valid one. As I already said in my previous post, 128 out of the 2048 words will lead to a valid BIP39 seed phrase.
member
Activity: 104
Merit: 120
Thank you for the reply. Just to clarify, what I'm attempting to do is generate my own offline bip39 seeds. Specifically I'm interested in generating a 12 word seed, but my understanding is that the 12th word would be a checksum and therefore I would need to figure out a way to easily identify the viable checksum options, which is what I'm attempting to do here. Thank you very much for your assistance.
legendary
Activity: 2380
Merit: 5213
You have 11 words and you want to select the 12th word, so the BIP39 seed phrase passes the checksum. Am I right?

If I have understood you correctly, first of all note that that's not how a BIP39 seed phrase is generated.
It's not that 11 words are generated and then the 12th word is selected. Instead you generate a random number and your seed phrase represents that number. Your seed phrase provides 128 bits of entropy and 4 bits are added as the checksum.

Anyway, if you have the first 11 words and you want a valid BIP39 seed phrase, there are 128 words that can be used as the 12th word.
To find them, you can use a brute-force method: test all 2048 words one by one.
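The brute-force search described here might look like this in Python: every index 0..2047 is tried as the last word and kept if the 132 bits pass the checksum. Word indices stand in for words (hypothetical helper names); with the official English list loaded as a Python list, `wordlist[i]` would give the word for each result.

```python
import hashlib


def passes_checksum(indices: list[int]) -> bool:
    """True if 12 word indices (0..2047) pass the BIP-39 checksum."""
    value = 0
    for index in indices:
        value = (value << 11) | index  # 132 bits
    entropy, checksum = value >> 4, value & 0xF
    digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()
    return digest[0] >> 4 == checksum


def brute_force_last_word(first_11: list[int]) -> list[int]:
    """Try every possible 12th word and keep the ones that pass."""
    return [w for w in range(2048) if passes_checksum(first_11 + [w])]


valid = brute_force_last_word([0] * 11)  # placeholder first 11 words
print(len(valid), valid[:5])
```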

The post has been edited. Thanks o_e_l_e_o for the correction.
member
Activity: 104
Merit: 120
Hello all, I was hoping someone can help me identify the best way to determine what the correct checksum would be in a BIP39 seed phrase when I've generated 11 of the 12 words. Thanks in advance for your support!