Topic: BIP39 how software found the checksum? (Read 446 times)

full member
Activity: 260
Merit: 129
December 20, 2020, 07:54:47 PM
#16
Thank you for your answers! It's OK for me now.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 19, 2020, 05:29:34 PM
#15
Now, using my 128 bits (with the random 7 bits I added to the OP's original 121 bits of entropy):
Code:
import binascii
import hashlib

myhexstring = "A89EC4E8327577411A104FBECBBF6AEC"
myhash = hashlib.sha256(binascii.unhexlify(myhexstring)).digest()
print(binascii.hexlify(myhash).decode())  # .decode() prints plain hex rather than b'...' on Python 3

Output should be:
Code:
cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8

I used something different to convert the SHA256 checksum to hexadecimal (the contents of this Stack Overflow question). I thought it would work since the digest has the type 'bytes', but now I know it produces completely different results. Thanks!
HCP
legendary
Activity: 2086
Merit: 4363
December 19, 2020, 04:08:16 PM
#14
Take note that HCP uses slightly different entropy: the OP and a few other users use 1000110 for the last 7 bits, but HCP uses 1101100 instead.
That's correct... the OP had just "bruteforced" a correct word... and worked backwards from there... so I ignored the 7 bits from their post, as it wasn't correctly generated... I simply generated 7 random bits and used those.



So that explains how I arrived at a different result. My SHA256 function was wrong, which means I cannot use the hashlib.sha256() function for computing checksums.
It's quite possible that your code was calculating the hash of the string, rather than the actual byte values... it's a common issue that people come across when trying to calculate hash values... I've done the same thing plenty of times using Python when I forget to work with bytes ;) hashlib.sha256().digest() works fine:

Essentially (using your example with the OP's 128 bits):
Code:
import binascii
import hashlib

myhexstring = "a89ec4e8327577411a104fbecbbf6ac6"
myhash = hashlib.sha256(binascii.unhexlify(myhexstring)).digest()
print(binascii.hexlify(myhash).decode())  # .decode() prints plain hex rather than b'...' on Python 3

Output should be:
Code:
bf4b881634e88ff59c29caa582413ee050bb1ce9a72272cebe9491f28e474e03

This is different to the hash you got:
SHA256(0xa89ec4e8327577411a104fbecbbf6ac6) = 0xfdd211682c17f1af399453a541827c07


Now, using my 128 bits (with the random 7 bits I added to the OP's original 121 bits of entropy):
Code:
import binascii
import hashlib

myhexstring = "A89EC4E8327577411A104FBECBBF6AEC"
myhash = hashlib.sha256(binascii.unhexlify(myhexstring)).digest()
print(binascii.hexlify(myhash).decode())  # .decode() prints plain hex rather than b'...' on Python 3

Output should be:
Code:
cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 19, 2020, 04:10:30 AM
#13
Then we SHA256 that (https://emn178.github.io/online-tools/sha256.html - Set the input type to HEX):
Code:
cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8

So that explains how I arrived at a different result. My SHA256 function was wrong, which means I cannot use the hashlib.sha256() function for computing checksums.
HCP
legendary
Activity: 2086
Merit: 4363
December 19, 2020, 03:08:04 AM
#12
Yeah... OP seems to be doing this backwards and needs to re-read BIP39: https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki

Specifically the "Generating the mnemonic" section... it explains pretty clearly how it works. You don't start with 11 words and then "bruteforce" the 12th word (7 bits entropy + 4 bit checksum).

As per the BIP:
Quote
First, an initial entropy of ENT bits is generated

noting that ENT is defined as:
Quote
The mnemonic must encode entropy in a multiple of 32 bits. With more entropy security is improved but the sentence length increases. We refer to the initial entropy length as ENT. The allowed size of ENT is 128-256 bits.

So, we need an ENT value between 128 bits and 256 bits, in a multiple of 32... in the OP's case, we need 128 bits of ENT (for 12 words). Unfortunately, the OP only generated 121 bits:
Code:
1010100010011110110001001110100000110010011101010111011101000001000110100001000001001111101111101100101110111111011010101

You CANNOT get from here to a valid 12 word seed... you're missing 7 bits... so use your dice and get another 7 bits and get to a total of 128 bits first before you concern yourself with the checksum!!
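
As a rough illustration of why 121 bits cannot work, the sizes the BIP allows can be tabulated with a short sketch. The function name `mnemonic_layout` is made up for this example; the formulas (checksum = ENT / 32 bits, 11 bits per word) are the ones quoted from BIP39.

```python
def mnemonic_layout(ent_bits):
    """Return (checksum_bits, word_count) for a given entropy length."""
    # BIP39 only allows 128..256 bits of entropy, in multiples of 32
    if ent_bits % 32 != 0 or not 128 <= ent_bits <= 256:
        raise ValueError(f"{ent_bits} bits is not a valid ENT")
    checksum_bits = ent_bits // 32
    word_count = (ent_bits + checksum_bits) // 11
    return checksum_bits, word_count


for ent in range(128, 257, 32):
    cs, words = mnemonic_layout(ent)
    print(f"ENT={ent}  CS={cs}  words={words}")
```

Calling `mnemonic_layout(121)` raises a `ValueError`, which is the code equivalent of "get another 7 bits first".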




OK, so for illustration purposes, let's say that the OP went ahead and generated the other 7 bits required... and ended up with 128 bits like so:
Code:
10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011101100




Now, following the BIP:
Quote
A checksum is generated by taking the first ENT / 32 bits of its SHA256 hash.


So, we SHA256 our ENT... in this instance, we'll convert our binary to hex:
Code:
10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011101100
===>
Code:
A89EC4E8327577411A104FBECBBF6AEC
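
For anyone who prefers to script the binary ==> hex step rather than use a converter, here is a minimal Python sketch. The variable names are only for illustration; the padding width is computed so that leading zero bits would not be dropped.

```python
bits = "10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011101100"

# int(bits, 2) parses the binary string; each hex digit covers 4 bits,
# and the explicit width keeps any leading zeros
hex_width = len(bits) // 4
entropy_hex = format(int(bits, 2), f"0{hex_width}X")
print(entropy_hex)
```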




Then we SHA256 that (https://emn178.github.io/online-tools/sha256.html - Set the input type to HEX):
Code:
cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8




Then we convert that HEX result back to binary:
Code:
cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8
=====>
Code:
1100111110111100110010010001110110110000110000110010100000110001010101110100110101010010000001001001010100110000110111100010010101110011010101001011101011101110010010101100011101101000101110101110010000000001101011100011001100001100111110111010010010101000
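
The hex ==> binary step can be sketched the same way. The one thing to watch is zero-padding: a hash whose first hex digit is 0 to 7 starts with one or more 0 bits, and a naive conversion drops them, which shifts every bit position. The names below are illustrative only.

```python
digest_hex = "cfbcc91db0c32831574d52049530de257354baee4ac768bae401ae330cfba4a8"

# 4 bits per hex digit; the width pads the result to the full 256 bits
digest_bits = format(int(digest_hex, 16), f"0{len(digest_hex) * 4}b")

print(len(digest_bits))   # 256
print(digest_bits[:4])    # 1100
```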




We need the first ENT / 32 bits... 128 / 32 = 4... so we want 4 bits... == 1100

We add that to our original 128 bits:
Code:
101010001001111011000100111010000011001001110101011101110100000100011010000100000100111110111110110010111011111101101010111011001100
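
The "take the first ENT / 32 bits and append them" step, sketched in Python under the assumption that the entropy is already available as hex. This is just one way to do it (shifting the digest as a big integer); the variable names are not from any library.

```python
import hashlib

entropy = bytes.fromhex("A89EC4E8327577411A104FBECBBF6AEC")
ent_bits = len(entropy) * 8      # 16 bytes * 8 = 128
cs_len = ent_bits // 32          # 128 / 32 = 4

digest = hashlib.sha256(entropy).digest()
# Treat the 256-bit digest as one integer and keep only its top cs_len bits
checksum = int.from_bytes(digest, "big") >> (256 - cs_len)

entropy_bits = format(int.from_bytes(entropy, "big"), f"0{ent_bits}b")
full_bits = entropy_bits + format(checksum, f"0{cs_len}b")
print(full_bits[-cs_len:], len(full_bits))
```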




Then divide into twelve 11 bit chunks to derive our words:
Code:
10101000100
11110110001
00111010000
01100100111
01010111011
10100000100
01101000010
00001001111
10111110110
01011101111
11011010101
11011001100

The twelfth word is 11011001100 = 1740 = sunset
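
Splitting into 11-bit chunks and turning each chunk into a wordlist number is a one-liner each in Python. This sketch only produces the numbers; looking them up requires the BIP39 English wordlist, which is not reproduced here.

```python
full_bits = "101010001001111011000100111010000011001001110101011101110100000100011010000100000100111110111110110010111011111101101010111011001100"

# Walk the 132-bit string in steps of 11 bits
chunks = [full_bits[i:i + 11] for i in range(0, len(full_bits), 11)]
# Each chunk is a number from 0 to 2047
indices = [int(chunk, 2) for chunk in chunks]
print(indices)
```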




So our final derived 12 word mnemonic is: possible wage deliver gossip first party hair antique salute fuel survey sunset




We can confirm this by using Ian Coleman's Mnemonic Convertor: https://iancoleman.io/bip39/

First, paste the 12 words into the BIP39 Mnemonic box, and then tick the box that says "Show Entropy Details":


You can see the calculated entropy, and the binary checksum etc.
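
The same confirmation can be sketched offline in Python. To avoid reproducing the whole wordlist, this version works on the wordlist numbers from this thread rather than on the words themselves; `is_valid_12_word` is a made-up helper name, and it only handles the 12-word (128-bit) case.

```python
import hashlib


def is_valid_12_word(indices):
    """Check the 4-bit checksum of a 12-word mnemonic given as wordlist numbers."""
    bits = "".join(format(i, "011b") for i in indices)    # 12 * 11 = 132 bits
    entropy_bits, checksum_bits = bits[:128], bits[128:]
    entropy = int(entropy_bits, 2).to_bytes(16, "big")
    digest = hashlib.sha256(entropy).digest()
    expected = format(digest[0] >> 4, "04b")               # first 4 bits of the hash
    return checksum_bits == expected


# possible wage deliver gossip first party hair antique salute fuel survey
first_11 = [1348, 1969, 464, 807, 699, 1284, 834, 79, 1526, 751, 1749]
print(is_valid_12_word(first_11 + [1740]))   # 1740 = sunset
```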
legendary
Activity: 2128
Merit: 1293
There is trouble abrewing
December 17, 2020, 11:13:51 AM
#11
He has complete access to his entropy. I think he's just confusing the terminology here.

the problem sounds more like using the wrong approach to me. OP is selecting words at random instead of generating a random entropy then converting that to words. that is why he has only 121 bits of entropy not 128 (11 words * 11 bits).
the method should be generating 128 bits then converting that to words.

an alternative could be using the brute force as OP already guessed but from a starting point. meaning choose 12 words then change the last until the checksum is valid (increment the index one at a time). that way the entropy size could actually be 132 bit (instead of 128). and the approach would be similar to what electrum does when generating its seeds.
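
A minimal sketch of that brute-force idea, assuming the 11 words are given as wordlist numbers (taken from the OP's table). Instead of stopping at the first hit, it collects every 12th-word number that passes the checksum; `valid_last_words` is a hypothetical helper name.

```python
import hashlib


def valid_last_words(first_11):
    """Return every 12th-word number that yields a valid 4-bit checksum."""
    prefix = "".join(format(i, "011b") for i in first_11)    # 121 bits
    valid = []
    for candidate in range(2048):
        bits = prefix + format(candidate, "011b")             # 132 bits
        entropy = int(bits[:128], 2).to_bytes(16, "big")
        checksum = hashlib.sha256(entropy).digest()[0] >> 4   # first 4 bits
        if candidate & 0b1111 == checksum:
            valid.append(candidate)
    return valid


first_11 = [1348, 1969, 464, 807, 699, 1284, 834, 79, 1526, 751, 1749]
candidates = valid_last_words(first_11)
print(len(candidates))   # 128: one valid word per choice of the 7 free bits
```

Because each 7-bit prefix has exactly one matching checksum, incrementing from any starting number reaches a valid word within 16 steps.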
legendary
Activity: 1946
Merit: 1427
December 17, 2020, 07:17:00 AM
#10
What I'm understanding is that the 12th word can't be predicted from the first 11 words. I need to choose a random 12th word, compute something, get the checksum, and choose the word that validates my seed?
If you have 11 words, for finding a valid 12th word, you have to brute-force it. Because for 12th word, you need the checksum and for computing the checksum, you need the first 7 bits of the 12th word as well.

He has complete access to his entropy. I think he's just confusing the terminology here.


Convert it to a buffer or something, calculate the length (16), multiply by 8, we get 128.


Can you detail this step please? Thank you for your time.


P.S.: So I need to start from a complete 128-bit seed (12 words), and not 121 bits (11 words). So I need to brute-force the last 7 bits to calculate the 4-bit checksum...
Yes, you need to hash your entire entropy. (which is not 12 words, it's just 128 bits.)

Quote
So I need to brute-force the last 7 bits to calculate the 4-bit checksum...
Why?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 16, 2020, 08:03:30 PM
#9

You already have 11 words: possible wage deliver gossip first party hair antique salute fuel survey miracle, the 12th word is a combination of 1000110 and  a 4 bit checksum. [...]



This 1000110 is what I'm looking for.

How do I compute it from 11 random words? Or from 12 random words?


What I'm understanding is that the 12th word can't be predicted from the first 11 words. I need to choose a random 12th word, compute something, get the checksum, and choose the word that validates my seed?

You should generate 7 bits of the 12th word, and then put them after the bits for your first 11 words. Take note of the bit string at this point.

After that, download the page https://iancoleman.io/bip39/, open it on an offline computer, and select the "Show Entropy Details" checkbox. There is then a box called "Entropy". Put your bit string in that field as a hexadecimal number (convert binary ==> hexadecimal). The page will automatically compute the checksum for you, and therefore the final 4 bits of the 12th word, and will show you the correct seed phrase for this entropy.



Now that I think of it, the reason I arrived at a different checksum than you is that Ian Coleman's page computes the SHA256 sum and the first bits differently from me; the checksum displayed on his page is definitely correct.
legendary
Activity: 2380
Merit: 5213
December 16, 2020, 08:00:31 PM
#8
What I'm understanding is that the 12th word can't be predicted from the first 11 words. I need to choose a random 12th word, compute something, get the checksum, and choose the word that validates my seed?
If you have 11 words, for finding a valid 12th word, you have to brute-force it. Because for 12th word, you need the checksum and for computing the checksum, you need the first 7 bits of the 12th word as well.

It's not a good idea to select 11 words and find a valid 12th word. The best method is to select the first 128 bits (preferably using dice), calculate the checksum and then find the corresponding words.
full member
Activity: 260
Merit: 129
December 16, 2020, 07:31:15 PM
#7

You already have 11 words: possible wage deliver gossip first party hair antique salute fuel survey miracle, the 12th word is a combination of 1000110 and  a 4 bit checksum. [...]



This 1000110 is what I'm looking for.

How do I compute it from 11 random words? Or from 12 random words?


What I'm understanding is that the 12th word can't be predicted from the first 11 words. I need to choose a random 12th word, compute something, get the checksum, and choose the word that validates my seed?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 16, 2020, 07:01:06 PM
#6
I think I'm confused now. How did you end up with 0111?


I took the bytes of the 128-bit entropy and converted it to hex. Then I put the 128-bit hex through SHA256, which is supposed to return 128 bits (hashlib.sha256().digest() returned me 128 bits[1], but I think it may not be doing it right. I really think it should be 32 bytes/256 bits no?). The four bits at the lower end of the SHA256 hash are 0x7, or 0111.

[1] Disregard the items in parentheses, I just did a size check for the hashed entropy again, and it came out as 32 bytes.
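
A quick way to check both points (digest size, and which end the checksum comes from) is sketched below. BIP39 uses the first bits of the hash, which is the high nibble of the first byte; the low nibble of the last byte is shown only for comparison. Variable names are illustrative.

```python
import hashlib

entropy = bytes.fromhex("a89ec4e8327577411a104fbecbbf6ac6")
digest = hashlib.sha256(entropy).digest()

print(len(digest))                            # 32 bytes = 256 bits
first_4 = format(digest[0] >> 4, "04b")       # high nibble of the first byte
last_4 = format(digest[-1] & 0x0F, "04b")     # low nibble of the last byte
print(first_4, last_4)
```

For a 12-word phrase only `first_4` is the checksum.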
legendary
Activity: 1946
Merit: 1427
December 16, 2020, 06:54:54 PM
#5

You already have 11 words: possible wage deliver gossip first party hair antique salute fuel survey miracle, the 12th word is a combination of 1000110 and a 4 bit checksum. So the last word 10001101011 "miracle" is actually invalid.

Your first 128 bits are 10101000100  11110110001   00111010000  01100100111  01010111011  10100000100  01101000010  00001001111  10111110110  01011101111  11011010101  1000110. This is the 128-bit entropy. It is 0xa89ec4e8327577411a104fbecbbf6ac6 in hex.

Your checksum is first_4_bits(SHA256(entropy)), or for your example:

SHA256(0xa89ec4e8327577411a104fbecbbf6ac6) = 0xfdd211682c17f1af399453a541827c07

We take the first 4 bits of this hex, which is 7 ==> 0111. Then we append this to the entropy to get:

 10101000100  11110110001   00111010000  01100100111  01010111011  10100000100  01101000010  00001001111  10111110110  01011101111  11011010101  10001100111

I think I'm confused now. How did you end up with 0111?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 16, 2020, 06:44:22 PM
#4
Don't think your hash function is right.

It isn't, OP must have generated the last 4 bits with the dice instead of deriving the checksum.



BIP39 12-word seed phrases have 128 bits of entropy and a 4-bit checksum for a total of 132 bits, and the 12th word includes the 128/32 = 4 bits of the checksum. Each word in the seed groups log2(2048) = 11 bits together because there are 2048 possible words in the wordlist.

You already have 11 words: possible wage deliver gossip first party hair antique salute fuel survey miracle, the 12th word is a combination of 1000110 and a 4 bit checksum. So the last word 10001101011 "miracle" is actually invalid.

Your first 128 bits are 10101000100  11110110001   00111010000  01100100111  01010111011  10100000100  01101000010  00001001111  10111110110  01011101111  11011010101  1000110. This is the 128-bit entropy. It is 0xa89ec4e8327577411a104fbecbbf6ac6 in hex.

Your checksum is first_4_bits(SHA256(entropy)), or for your example:

SHA256(0xa89ec4e8327577411a104fbecbbf6ac6) = 0xfdd211682c17f1af399453a541827c07

We take the first 4 bits of this hex, which is 7 ==> 0111. Then we append this to the entropy to get:

 10101000100  11110110001   00111010000  01100100111  01010111011  10100000100  01101000010  00001001111  10111110110  01011101111  11011010101  10001100111


Convert it to a buffer or something, calculate the length (16), multiply by 8, we get 128.


Can you detail this step please? Thank you for your time.

Your bits have to be converted into a byte array in order to pass it to SHA256. In Python you can convert from binary string representation ==> integer ==> hex ==> bytearray to do this.

Code:
# format(..., '032x') zero-pads to 32 hex digits (128 bits), so leading zero bits are kept and there is no '0x' prefix to strip
bytearray.fromhex(format(int('10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011000110', 2), '032x'))

Other languages have a similar way to convert to a bytes array.
full member
Activity: 260
Merit: 129
December 16, 2020, 06:36:53 PM
#3

Convert it to a buffer or something, calculate the length (16), multiply by 8, we get 128.


Can you detail this step please? Thank you for your time.


P.S.: So I need to start from a complete 128-bit seed (12 words), and not 121 bits (11 words). So I need to brute-force the last 7 bits to calculate the 4-bit checksum...
legendary
Activity: 1946
Merit: 1427
December 16, 2020, 06:29:04 PM
#2
Yes, it's entirely possible.

Take your entropy.

10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011000110

Convert it to a buffer or something, calculate the length (16), multiply by 8, we get 128.

divide by 32, we get 4.

hash 10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011000110

first to buffer
168, 158, 196, 232, 50, 117, 119, 65, 26, 16, 79, 190, 203, 191, 106, 198

hash -> 191, 75, 136, 22, 52, 232, 143, 245, 156, 41, 202, 165, 130, 65, 62, 224, 80, 187, 28, 233, 167, 34, 114, 206, 190, 148, 145, 242, 142, 71, 78, 3

convert the hash to binary

take 0,4

you get 1011.

Don't think your hash function is right.

I get bf4b881634e88ff59c29caa582413ee050bb1ce9a72272cebe9491f28e474e03
back to binary is
Code:
1011111101001011100010000001011000110100111010001000111111110101100111000010100111001010101001011000001001000001001111101110000001010000101110110001110011101001101001110010001001110010110011101011111010010100100100011111001010001110010001110100111000000011 
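
The steps in this post (bits ==> buffer ==> length ==> hash ==> first bits) could look roughly like this in Python. It is a sketch with illustrative variable names, not the code that produced the numbers above.

```python
import hashlib

bits = "10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011000110"

# "first to buffer": 128 bits become 16 raw bytes
buf = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(list(buf))

ent = len(buf) * 8       # 16 * 8 = 128
cs_len = ent // 32       # 128 / 32 = 4

# Hash the raw bytes, not the text of the bit string
digest = hashlib.sha256(buf).digest()
digest_bits = "".join(format(b, "08b") for b in digest)
print(digest_bits[:cs_len])
```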
full member
Activity: 260
Merit: 129
December 16, 2020, 05:34:31 PM
#1
Hello,

I want to calculate the checksum for a BIP39 seed by hand. My first 11 words are all generated with dice, but for the last word I need to do a checksum (no worry, it's just a fake seed).


I have these 11 words:

Word       No. in list   No. in binary
possible   1348          10101000100
wage       1969          11110110001
deliver    464           00111010000
gossip     807           01100100111
first      699           01010111011
party      1284          10100000100
hair       834           01101000010
antique    79            00001001111
salute     1526          10111110110
fuel       751           01011101111
survey     1749          11011010101
miracle    1131          10001101011   <==== I need to find this one
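
The binary column can be reproduced from the list numbers with a small Python sketch (numbers copied from the table; variable names are illustrative). Joining the 11 chunks shows why only 121 bits are available at this point.

```python
# possible wage deliver gossip first party hair antique salute fuel survey
indices = [1348, 1969, 464, 807, 699, 1284, 834, 79, 1526, 751, 1749]

# "011b" means binary, zero-padded to 11 digits
chunks = [format(i, "011b") for i in indices]
entropy_11_words = "".join(chunks)

print(chunks)
print(len(entropy_11_words))   # 121
```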


Here is the entropy for my 11 words:
1010100010011110110001001110100000110010011101010111011101000001000110100001000001001111101111101100101110111111011010101
Here is the correct 12-word entropy:
10101000100111101100010011101000001100100111010101110111010000010001101000010000010011111011111011001011101111110110101011000110
Checksum that I must find:
1011



So my final seed must be:
101010001001111011000100111010000011001001110101011101110100000100011010000100000100111110111110110010111011111101101010110001101011

So the last 11bits "10001101011" are in decimal "1131" corresponding here to "miracle"

If I understand this https://learnmeabitcoin.com/technical/mnemonic page:
"This checksum is created by hashing the entropy through SHA256, which gives us a unique fingerprint for our entropy. We then take 1 bit of that hash for every 32 bits of entropy, and add it to the end of our entropy."


So I have my 11-word entropy:
1010100010011110110001001110100000110010011101010111011101000001000110100001000001001111101111101100101110111111011010101
I hash it with SHA-256, with this hex result:
4C3FA7A784B345C6BA9ECA9FCFEEAF36E9BE00D0A2406B88DB61F609137B8F68
Convert it to binary:
100110000111111101001111010011110000100101100110100010111000110101110101001111011001010100111111100111111101110101011110011011011101001101111100000000011010000101000100100000001101011100010001101101101100001111101100000100100010011011110111000111101101000

Where do I find "10001101011"?

Is it possible to predict the checksum from my 11 words? Or do I need to brute-force it?
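
One way to see what is being asked for: the 12th word's 11 bits are 7 bits of entropy followed by the 4-bit checksum, so its list number can be pulled apart with two bit operations. A minimal sketch, using the number for "miracle" from the table above:

```python
index = 1131                      # "miracle"

entropy_part = index >> 4         # upper 7 bits: still part of the entropy
checksum_part = index & 0b1111    # lower 4 bits: the checksum

print(format(entropy_part, "07b"), format(checksum_part, "04b"))   # 1000110 1011
```

The 7 entropy bits are free to choose; only the last 4 bits are derived from the hash.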
