But the way you posed that option as an alternative to zfill makes me think you don't fully understand the source of this bias. I hope you don't find the following explanation tiresome; I just know that if it were me in your shoes, I'd really appreciate someone taking pains to help me understand the potential hole in my thinking:
Let's imagine a toy version of this problem where you're trying to generate just 4 (rather than 128) bits of entropy with a coin. Let's not go down the rabbit hole of coin fairness, entropy extraction, and tossing technique. (I have a sometimes-sophomoric, or, as my wife would say, "bloody stupid" sense of humor, especially when I'm in a good mood, so "tossing technique" just gave me the giggles. Don't toss while flipping coins, yeah?)
There are only 16 possible ways for 4 coin-tosses-in-a-row to end up, so let's put them all in a table (ignore the last 4 columns for now):
Outcome | Pattern | Binary | Decimal | f1($pcv) | f2($pcv) | f3($pcv) | f4($pcv) |
#1 | HHHH | 1111 | 15 | "0b1111" | "1111" | "1111" | "HHHH" |
#2 | HHHT | 1110 | 14 | "0b1110" | "1110" | "1110" | "HHHT" |
#3 | HHTH | 1101 | 13 | "0b1101" | "1101" | "1101" | "HHTH" |
#4 | HHTT | 1100 | 12 | "0b1100" | "1100" | "1100" | "HHTT" |
#5 | HTHH | 1011 | 11 | "0b1011" | "1011" | "1011" | "HTHH" |
#6 | HTHT | 1010 | 10 | "0b1010" | "1010" | "1010" | "HTHT" |
#7 | HTTH | 1001 | 9 | "0b1001" | "1001" | "1001" | "HTTH" |
#8 | HTTT | 1000 | 8 | "0b1000" | "1000" | "1000" | "HTTT" |
#9 | THHH | 0111 | 7 | "0b111" | "111" | "0111" | "THHH" |
#10 | THHT | 0110 | 6 | "0b110" | "110" | "0110" | "THHT" |
#11 | THTH | 0101 | 5 | "0b101" | "101" | "0101" | "THTH" |
#12 | THTT | 0100 | 4 | "0b100" | "100" | "0100" | "THTT" |
#13 | TTHH | 0011 | 3 | "0b11" | "11" | "0011" | "TTHH" |
#14 | TTHT | 0010 | 2 | "0b10" | "10" | "0010" | "TTHT" |
#15 | TTTH | 0001 | 1 | "0b1" | "1" | "0001" | "TTTH" |
#16 | TTTT | 0000 | 0 | "0b0" | "0" | "0000" | "TTTT" |
The first four columns are: 1. the possibility/outcome # (1 through 16, so, every possible outcome accounted for), 2. the heads-or-tails pattern corresponding to each outcome (H = heads, T = tails), 3. the same heads-or-tails pattern but in base-2 (1 = heads, 0 = tails), and 4. the heads-or-tails pattern converted from base-2 into base-10.
The last four columns involve the imaginary variable $pcv (previous column's value) and the following definitions:
f1 = lambda x: bin(x)
f2 = lambda x: x[2:]
f3 = lambda x: x.zfill(4)
f4 = lambda x: x.replace('0', 'T').replace('1', 'H')
Now, the first thing to ask when looking at that table is whether there are any "bad" or "low entropy" outcomes in it. The answer is no: as long as each outcome is equally probable, any given outcome is just as "entropic" as any other. HTTH is just as good as TTTT, savvy? That means that, tempting though it is, it's a mistake to use the string length of the values in the f2($pcv) column to decide whether or not you "have enough entropy" (to be clear, it's a mistake to base that decision on anything about a single outcome). If you discard outcomes whenever len(f2($pcv)) != 4, all you're doing is ensuring that the pattern will always begin with a heads instead of a tails (check the table to confirm that).
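If it helps, here's a quick sketch (reusing the f1/f2 definitions above; the loop is mine, not from your script) showing exactly what that discard rule keeps:

```python
f1 = lambda x: bin(x)   # int -> "0b..." string
f2 = lambda s: s[2:]    # strip the "0b" prefix

# Enumerate all 16 outcomes and apply the "discard when len != 4" rule.
# The survivors are exactly decimal 8 through 15 -- i.e., every pattern
# whose first toss was heads. The rule introduces bias; it doesn't
# filter out "low entropy" results.
survivors = [n for n in range(16) if len(f2(f1(n))) == 4]
print(survivors)  # [8, 9, 10, 11, 12, 13, 14, 15]
```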
The second thing to consider is whether the final patterns in the f4($pcv) column ever disagree with the source patterns in column 2. They don't. So you can rest assured that the zfill(128) in your original script was correct (that is, it was only restoring 0s that, in some sense, were there from the start and were "lost" during an int-to-str conversion). You can/should pat yourself on the back for getting it right the first time.
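You can check that claim mechanically, too. This little round-trip (again, my sketch) pushes all 16 outcomes through f1–f4 and compares the result against the column-2 pattern:

```python
f1 = lambda x: bin(x)
f2 = lambda s: s[2:]
f3 = lambda s: s.zfill(4)
f4 = lambda s: s.replace('0', 'T').replace('1', 'H')

for n in range(16):
    # Column 2: the original pattern, built directly from 4 fixed-width bits.
    source_pattern = format(n, '04b').replace('0', 'T').replace('1', 'H')
    # Column 8: the pattern after the bin -> strip -> zfill -> translate chain.
    assert f4(f3(f2(f1(n)))) == source_pattern  # never disagrees
```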
There are at least two other bugs in your script though (beyond the bias bug, which we'll call Bug #1):
Bug #2
You need a zfill(32) before passing the hexadecimal representation of your entropy into bytes.fromhex (otherwise, you'll occasionally pass it an odd number of nibbles, which will cause it to raise a ValueError).
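Here's a minimal repro (variable names are mine; I'm assuming your entropy is an int at this point). Any 128-bit value whose top nibble happens to be zero trips it:

```python
entropy = 0x0c399b43e82bfbc07f2fe3fd1f13ed41  # 128 bits, top nibble 0
hex_str = hex(entropy)[2:]                    # only 31 nibbles -- odd length

raised = False
try:
    bytes.fromhex(hex_str)                    # ValueError on odd-length input
except ValueError:
    raised = True

entropy_bytes = bytes.fromhex(hex_str.zfill(32))  # the fix: always 32 nibbles
```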
Bug #3
You need a zfill(256) at the end of the line that calculates your sha256_bin variable (otherwise, your checksum, and therefore the last word of your mnemonic, will be wrong 50% of the time).
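In other words (a sketch with assumed variable names; the fix is the trailing zfill):

```python
import hashlib

entropy_bytes = bytes.fromhex('cc399b43e82bfbc07f2fe3fd1f13ed41')
digest_hex = hashlib.sha256(entropy_bytes).hexdigest()

# Without the zfill, any leading zero bits of the digest vanish during the
# int -> bin round trip, and a uniformly random 256-bit hash starts with a
# 0 bit half the time -- hence "wrong 50% of the time".
sha256_bin = bin(int(digest_hex, 16))[2:].zfill(256)
checksum = sha256_bin[:4]  # first 4 bits checksum 128 bits of entropy
```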
For an example, try manually setting your entropy to cc399b43e82bfbc07f2fe3fd1f13ed41.
Your script will spit out the following mnemonic: slow smoke special space sausage then witness wise wonder weasel win lobster
Ian's script (and mine) will spit out this instead: slow smoke special space sausage then witness wise wonder weasel win lion
The reason your wonder weasel only managed to win a lobster instead of a lion is that you're miscalculating the checksum bits as 1001 instead of 0001.
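If you want to see Bug #3 in isolation with that entropy (names are mine, but the computation matches what I described above):

```python
import hashlib

entropy_bytes = bytes.fromhex('cc399b43e82bfbc07f2fe3fd1f13ed41')
digest_hex = hashlib.sha256(entropy_bytes).hexdigest()

buggy = bin(int(digest_hex, 16))[2:]           # leading zero bits dropped
fixed = bin(int(digest_hex, 16))[2:].zfill(256)

print(buggy[:4])  # 1001 -- the wrong checksum (lobster)
print(fixed[:4])  # 0001 -- the right one (lion)
```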
(I'm going to be too busy to respond for a while. I'll take a peek at this thread again in a few weeks' time.)