
Topic: PY21 - A simple BIP39 mnemonic generator in PYTHON (Read 507 times)

hero member
Activity: 510
Merit: 4005
Thanks for the great explanation.
No problem.

I have carefully revisited the post you've written, and I am confident I have understood exactly the bugs you've mentioned.
Nice.

Now, having corrected the issues, I kindly ask you to review the changes below, so I can then safely update the OP.
Looks correct to me...

One trick I've picked up over the years is hash-based cumulative output testing. In this case, that means running your code many times in a loop on a deterministic sequence of random-looking inputs and feeding each generated mnemonic into a running hash. This is the test harness I used to calculate that hash by running your code against a million deterministic inputs:

Code:
#!/usr/bin/env python3

import hashlib

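def bip39(sixteen: bytes) -> list[str]:

    # Body elided: paste the BIP39 implementation under test here. It takes
    # 16 bytes of entropy and returns the 12 mnemonic words as a list.
    raise NotImplementedError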

cumulative_hash = hashlib.sha256()

for test_index in range(1000 * 1000):

    mnemonic = bip39(hashlib.sha256(str(test_index).encode()).digest()[:16])

    cumulative_hash.update(''.join(mnemonic).encode())

print(cumulative_hash.hexdigest())

Its output was:

Code:
7827bfa44f0ccc8dce1e490ffe6f3009b6ad30c55b8578ebdd1ee327b36c28eb

If I run my own BIP39 code through that same process, then I get the same result, so I'm pretty confident that if mnemonic-correctness issues do remain, then they remain in both of our implementations.
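The same cumulative-hash idea also works for comparing two implementations head-to-head, without needing the wordlist file. Below is a minimal sketch, assuming only that both implementations can be made to return word indices (0..2047) instead of words; the function names are hypothetical. One side follows the bit-string approach used in the OP, the other packs everything into a single integer.

Code:
```python
#!/usr/bin/env python3
# Sketch: cross-check two independent 12-word BIP39 calculations with a
# cumulative hash. Both functions (hypothetical names) return word indices
# (0..2047) instead of words, so no wordlist file is needed.
import hashlib


def indices_via_bit_strings(entropy: bytes) -> list[int]:
    # String approach: 128 entropy bits + first 4 bits of SHA256, cut into 11-bit groups
    entropy_bin = bin(int.from_bytes(entropy, 'big'))[2:].zfill(128)
    sha256_bin = bin(int.from_bytes(hashlib.sha256(entropy).digest(), 'big'))[2:].zfill(256)
    final = entropy_bin + sha256_bin[:4]
    return [int(final[idx:idx + 11], 2) for idx in range(0, len(final), 11)]


def indices_via_integers(entropy: bytes) -> list[int]:
    # Integer approach: shift the 4 checksum bits in, then mask off 11 bits per word
    checksum = hashlib.sha256(entropy).digest()[0] >> 4
    combined = (int.from_bytes(entropy, 'big') << 4) | checksum
    return [(combined >> (index * 11)) & 2047 for index in reversed(range(12))]


def cumulative_digest(implementation, count: int = 10_000) -> str:
    running = hashlib.sha256()
    for test_index in range(count):
        # Same deterministic inputs as the harness above: sha256(str(i))[:16]
        entropy = hashlib.sha256(str(test_index).encode()).digest()[:16]
        indices = implementation(entropy)
        # Separators keep one mnemonic's indices from running into the next
        running.update((','.join(map(str, indices)) + ';').encode())
    return running.hexdigest()


if __name__ == '__main__':
    first = cumulative_digest(indices_via_bit_strings)
    second = cumulative_digest(indices_via_integers)
    print(first)
    print('implementations agree' if first == second else 'implementations DISAGREE')
```

If the two digests ever differ, bisecting the loop range quickly finds the first input on which the implementations disagree.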
hero member
Activity: 560
Merit: 1060

Sometimes time flies and you forget important things, like responding to a fellow developer.

Bug #1 <~>

Bug #2 <~>

Bug #3 <~>

Thanks for the great explanation. I have carefully revisited the post you've written, and I am confident I have understood exactly the bugs you've mentioned.

Now, having corrected the issues, I kindly ask you to review the changes below, so I can then safely update the OP.

Code:
# contact: [email protected]
from secrets import token_hex
from hashlib import sha256

# read bip39 wordlist from file and load it into a list
with open("bip39_wordlist.txt", "r") as bip39_wordlist_file:
    # splitlines() avoids a trailing empty entry if the file ends with a newline
    bip39_wordlist = bip39_wordlist_file.read().splitlines()

# entropy
entropy = bin(int(token_hex(16), 16))[2:].zfill(128)
   
print('---------')
print('ENTROPY: ')
print('---------')
print(entropy)

# calculate SHA256
sha256_hex = sha256(bytes.fromhex(hex(int(entropy,2))[2:].zfill(32))).hexdigest()
sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)

# calculate checksum
checksum = sha256_bin[0:4]

# final seed to be converted into BIP39 mnemonic
final = entropy + checksum

num_of_words = 12
word_length = len(final) // num_of_words

# calculate mnemonic
res = []
for idx in range(0, len(final), word_length):
    res.append(final[idx : idx + word_length])

mnemonic = []
for idx in range(0, num_of_words):
    binary_place = res[idx]
    decimal_place = int(binary_place,2)
    mnemonic.append(bip39_wordlist[decimal_place])

print('\n-------------')   
print('BIP39 PHRASE: ')
print('-------------')
for w, word in enumerate(mnemonic, start=1):
    print(str(w) + ': ' + word)
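A known-answer test is a good complement to that kind of review. Here is a sketch that wraps the calculation from the script above in a function (the name `mnemonic_indices` is hypothetical) returning word indices instead of words, and checks it against the all-zero 128-bit vector from the official BIP39 test vectors, which should give "abandon" eleven times followed by "about".

Code:
```python
# Sketch: the calculation from the script above, wrapped in a function
# (hypothetical name) that takes the entropy as a 128-character bit string and
# returns word indices, so it can be checked against published test vectors
# without the wordlist file.
from hashlib import sha256


def mnemonic_indices(entropy: str) -> list[int]:
    if len(entropy) != 128 or set(entropy) - {'0', '1'}:
        raise ValueError('entropy must be a 128-character string of 0s and 1s')
    # zfill(32) keeps bytes.fromhex() happy when the entropy has leading zeros
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(32))).hexdigest()
    # zfill(256) restores leading zero bits that bin() drops
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    final = entropy + sha256_bin[0:4]      # 132 bits = 12 words x 11 bits
    return [int(final[idx:idx + 11], 2) for idx in range(0, len(final), 11)]


# BIP39 test vector: all-zero entropy should give "abandon" (index 0) eleven
# times followed by "about" (index 3).
print(mnemonic_indices('0' * 128) == [0] * 11 + [3])
```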
   
member
Activity: 239
Merit: 53
New ideas will be criticized and then admired.
-snip-
Still, it looks as if given an entropy, the mnemonic generated doesn't match the one provided by Ian Coleman on his website...
For the "24 words" (not OP's code), that's because its checksum generator doesn't follow BIP39 standard which should be the first 8 (entropy/32) bits of the entropy's SHA256 hash.
Reference: https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic

Changing the checksum generator to something like this should fix it:
Code:
def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(64))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[0:8]

For the "12 words" (also not OP's code), aside from the checksum, it's using 256 bits of entropy instead of the standard 128 bits for a 12-word mnemonic.
Just adjust the code above to match the correct sizes:
Code:
def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(32))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[0:4]
And set the correct entropy size:
Code:
def generate_entropy():
-snip-
    return entropy[-128:]

Those are just "band-aid" solutions, BTW, since the rest of the script was written specifically around that BIP39-incompatible code.
You are right, I have updated it to the BIP39 standard.
member
Activity: 315
Merit: 17
Just wondering; can someone explain why this
Code:
# entropy
from secrets import choice
entropy = ""
for _ in range(128):
    entropy += choice(['0','1'])
print (entropy)
is bad (if it is), and why the code given by the other members is better (if it is)?
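As far as I can tell, building the string one `secrets.choice` at a time isn't biased: each draw is an independent, uniformly distributed bit from the operating system's CSPRNG, and the result is always exactly 128 characters, so no zfill is needed. Its main downsides are speed and verbosity. A sketch comparing it with a single `secrets.randbits(128)` call formatted to a fixed width (function names are hypothetical):

Code:
```python
from secrets import choice, randbits


def entropy_by_choice() -> str:
    # 128 independent uniform draws: always exactly 128 characters
    return ''.join(choice('01') for _ in range(128))


def entropy_by_randbits() -> str:
    # One 128-bit draw; the '0128b' format pads to a fixed width, so leading
    # zero bits are kept instead of being dropped as bin() would do
    return format(randbits(128), '0128b')


for fn in (entropy_by_choice, entropy_by_randbits):
    sample = fn()
    print(fn.__name__, len(sample), sample[:16] + '...')
```

Both produce a uniformly random 128-character bit string; the fixed-width format is simply a shorter way to get the same thing.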

Thanks:)
legendary
Activity: 2618
Merit: 6452
Self-proclaimed Genius
-snip-
Still, it looks as if given an entropy, the mnemonic generated doesn't match the one provided by Ian Coleman on his website...
For the "24 words" (not OP's code), that's because its checksum generator doesn't follow BIP39 standard which should be the first 8 (entropy/32) bits of the entropy's SHA256 hash.
Reference: https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic

Changing the checksum generator to something like this should fix it:
Code:
def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(64))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[0:8]

For the "12 words" (also not OP's code), aside from the checksum, it's using 256 bits of entropy instead of the standard 128 bits for a 12-word mnemonic.
Just adjust the code above to match the correct sizes:
Code:
def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(32))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[0:4]
And set the correct entropy size:
Code:
def generate_entropy():
-snip-
    return entropy[-128:]

Those are just "band-aid" solutions, BTW, since the rest of the script was written specifically around that BIP39-incompatible code.
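For reference, the rule in the BIP39 spec is CS = ENT / 32 checksum bits, taken from the front of SHA256(entropy). Below is a sketch that handles every allowed entropy size by working on raw bytes rather than bit strings (the function name is hypothetical). The two printed values correspond to the all-zero 128-bit and 256-bit test vectors, whose mnemonics end in "about" and "art" respectively.

Code:
```python
# Sketch: BIP39 checksum for any allowed entropy size, working on raw bytes.
# CS = ENT / 32 checksum bits are taken from the front of SHA256(entropy).
from hashlib import sha256


def bip39_checksum_bits(entropy: bytes) -> str:
    ent = len(entropy) * 8
    if ent not in (128, 160, 192, 224, 256):
        raise ValueError('entropy must be 128, 160, 192, 224 or 256 bits')
    cs = ent // 32
    first_byte = sha256(entropy).digest()[0]   # CS <= 8, so one byte is enough
    return format(first_byte >> (8 - cs), f'0{cs}b')


print(bip39_checksum_bits(bytes(16)))  # 128-bit entropy -> 4 checksum bits
print(bip39_checksum_bits(bytes(32)))  # 256-bit entropy -> 8 checksum bits
```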
member
Activity: 315
Merit: 17


12 words

Code:
from secrets import token_hex, choice
from hashlib import sha256, sha3_256, blake2b
from ecdsa import SECP256k1, SigningKey

def read_wordlist(file_path):
    try:
        with open(file_path, "r") as bip39_wordlist_file:
            return bip39_wordlist_file.read().split('\n')
    except FileNotFoundError:
        print("Error: bip39_wordlist.txt not found.")
        exit(1)

def entropy_words(wordlist):
    random_words = [choice(wordlist) for _ in range(12)]
    phrase = ' '.join(random_words)
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    phrase_hash = chosen_hash((phrase + salt).encode()).hexdigest()
    entropy_bits = bin(int(phrase_hash, 16))[2:].zfill(256)
    return entropy_bits

def random_point():
    sk = SigningKey.generate(curve=SECP256k1)
    vk = sk.verifying_key
    point = vk.to_string().hex()
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    point_hash = chosen_hash((point + salt).encode()).hexdigest()
    entropy_bits = bin(int(point_hash, 16))[2:].zfill(256)
    return entropy_bits

def generate_entropy():
    entropy_sources = [
        bin(int(token_hex(32), 16))[2:],
        entropy_words(bip39_wordlist),
        random_point()
    ]
    entropy = ''.join(entropy_sources)
    while len(entropy) < 256:
        entropy_sources = [
            bin(int(token_hex(32), 16))[2:],
            entropy_words(bip39_wordlist),
            random_point()
        ]
        entropy += ''.join(entropy_sources)
    return entropy[-256:]

def checksum(entropy):
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    sha256_hex = chosen_hash((bytes.fromhex(hex(int(entropy, 2))[2:].zfill(64)) + salt.encode())).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[0:8]

def generate_mnemonic(entropy, checksum, wordlist):
    final = entropy + checksum
    num_of_words = 12
    word_length = len(final) // num_of_words
    res = [final[idx:idx + word_length] for idx in range(0, len(final), word_length)]
   
    mnemonic = [wordlist[int(binary_place, 2) % len(wordlist)] for binary_place in res]
    return mnemonic

def entropy_percentage(entropy):
    num_ones = entropy.count('1')
    return (num_ones / 256) * 100


bip39_wordlist = read_wordlist("bip39_wordlist.txt")
entropy = generate_entropy()
checksum = checksum(entropy)
mnemonic = generate_mnemonic(entropy, checksum, bip39_wordlist)
entropy_percentage = entropy_percentage(entropy)

print('---------')
print('ENTROPY: ')
print('---------')
print(entropy)

print('\n-------------')   
print('BIP39 PHRASE: ')
print('-------------')
for w in range(0, len(mnemonic)):
    print(str(w+1) + ': ' + mnemonic[w])

print('\n-------------------')
print('ENTROPY PERCENTAGE:')
print('-------------------')
print(f'{entropy_percentage:.2f}%')

Still, it looks as if given an entropy, the mnemonic generated doesn't match the one provided by Ian Coleman on his website...
member
Activity: 315
Merit: 17
It’s true, now the three of them combine.
Ok:)
member
Activity: 239
Merit: 53
New ideas will be criticized and then admired.
Interesting code.
You assume you have a better entropy source when you randomly choose one of three random entropy sources. But is that really the case?

Moreover, I don't get:
Code:
   choice(entropy_sources)
if you don't store the result of the function.
Do you mean:

Code:
   entropy_sources=choice(entropy_sources)

Cheers
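If the goal is for several sources to all contribute, one common approach (rather than picking one at random or slicing a concatenation) is to hash them together and keep as many bits as needed. A minimal sketch, assuming SHA-256 as the combiner and two example sources; the function name is hypothetical, and the length prefixes keep the boundaries between sources unambiguous. A combiner like this is generally considered at least as unpredictable as the best independent source, but it can't add unpredictability that none of the sources have, and with `secrets` already available it adds little in practice.

Code:
```python
import hashlib
import os
import secrets


def combine_sources(sources: list[bytes], n_bytes: int = 16) -> bytes:
    # Hash every source (with a 4-byte length prefix) into one SHA-256 digest
    if not 1 <= n_bytes <= 32:
        raise ValueError('SHA-256 yields at most 32 bytes')
    h = hashlib.sha256()
    for chunk in sources:
        h.update(len(chunk).to_bytes(4, 'big'))
        h.update(chunk)
    return h.digest()[:n_bytes]


# Two example sources; 16 bytes = 128 bits of entropy for a 12-word mnemonic
entropy = combine_sources([secrets.token_bytes(32), os.urandom(32)])
print(entropy.hex())
```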

It’s true, now the three of them combine.
member
Activity: 315
Merit: 17
Interesting code.
You assume you have a better entropy source when you randomly choose one of three random entropy sources. But is that really the case?

Moreover, I don't get:
Code:
   choice(entropy_sources)
if you don't store the result of the function.
Do you mean:

Code:
   entropy_sources=choice(entropy_sources)

Cheers
member
Activity: 239
Merit: 53
New ideas will be criticized and then admired.
Since this is a topic about entropy, I made some interesting additions: I modified the script by adding a random secp256k1 point, a random passphrase, randomly salted entropy methods, randomly chosen hashes to avoid predictable patterns, and a random concatenation order.
The closer the entropy percentage is to 50%, the better the entropy, "in theory".
What do you think?


updated to the BIP39 standard

24 words
Code:
from secrets import token_hex, choice
from hashlib import sha256, sha3_256, blake2b
from ecdsa import SECP256k1, SigningKey

def read_wordlist(file_path):
    try:
        with open(file_path, "r") as bip39_wordlist_file:
            return bip39_wordlist_file.read().split('\n')
    except FileNotFoundError:
        print("Error: bip39_wordlist.txt not found.")
        exit(1)

def entropy_words(wordlist):
    random_words = [choice(wordlist) for _ in range(12)]
    phrase = ' '.join(random_words)
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    phrase_hash = chosen_hash((phrase + salt).encode()).hexdigest()
    entropy_bits = bin(int(phrase_hash, 16))[2:].zfill(256)
    return entropy_bits

def random_point():
    sk = SigningKey.generate(curve=SECP256k1)
    vk = sk.verifying_key
    point = vk.to_string().hex()
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    point_hash = chosen_hash((point + salt).encode()).hexdigest()
    entropy_bits = bin(int(point_hash, 16))[2:].zfill(256)
    return entropy_bits

def generate_entropy():
    entropy_sources = [
        bin(int(token_hex(32), 16))[2:].zfill(256),
        entropy_words(bip39_wordlist),
        random_point()
    ]
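    entropy = ''.join(entropy_sources)
    # NOTE: the first 256 characters are exactly the (zfilled) token_hex() bits,
    # so the entropy_words() and random_point() output is discarded here
    return entropy[:256]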

def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(64))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[:8]

def generate_mnemonic(entropy, checksum, wordlist):
    final = entropy + checksum
    num_of_words = 24
    word_length = len(final) // num_of_words
    res = [final[idx:idx + word_length] for idx in range(0, len(final), word_length)]
    mnemonic = [wordlist[int(binary_place, 2)] for binary_place in res]
    return mnemonic

def entropy_percentage(entropy):
    num_ones = entropy.count('1')
    return (num_ones / 256) * 100

bip39_wordlist = read_wordlist("bip39_wordlist.txt")
entropy = generate_entropy()
checksum = checksum(entropy)
mnemonic = generate_mnemonic(entropy, checksum, bip39_wordlist)
entropy_percentage = entropy_percentage(entropy)

print('---------')
print('ENTROPY: ')
print('---------')
print(entropy)

print('\n-------------')   
print('BIP39 PHRASE: ')
print('-------------')
for w in range(0, len(mnemonic)):
    print(str(w+1) + ': ' + mnemonic[w])

print('\n-------------------')
print('ENTROPY PERCENTAGE:')
print('-------------------')
print(f'{entropy_percentage:.2f}%')

12 words

Code:
from secrets import token_hex, choice
from hashlib import sha256, sha3_256, blake2b
from ecdsa import SECP256k1, SigningKey

def read_wordlist(file_path):
    try:
        with open(file_path, "r") as bip39_wordlist_file:
            return bip39_wordlist_file.read().split('\n')
    except FileNotFoundError:
        print("Error: bip39_wordlist.txt not found.")
        exit(1)

def entropy_words(wordlist):
    random_words = [choice(wordlist) for _ in range(6)] 
    phrase = ' '.join(random_words)
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    phrase_hash = chosen_hash((phrase + salt).encode()).hexdigest()
    entropy_bits = bin(int(phrase_hash, 16))[2:].zfill(128) 
    return entropy_bits

def random_point():
    sk = SigningKey.generate(curve=SECP256k1)
    vk = sk.verifying_key
    point = vk.to_string().hex()
    hash_algorithms = [sha256, sha3_256, blake2b]
    chosen_hash = choice(hash_algorithms)
    salt = token_hex(16)
    point_hash = chosen_hash((point + salt).encode()).hexdigest()
    entropy_bits = bin(int(point_hash, 16))[2:].zfill(128) 
    return entropy_bits

def generate_entropy():
    entropy_sources = [
        bin(int(token_hex(16), 16))[2:], 
        entropy_words(bip39_wordlist),
        random_point()
    ]
    entropy = ''.join(entropy_sources)
    while len(entropy) < 128:
        entropy_sources = [
            bin(int(token_hex(16), 16))[2:],
            entropy_words(bip39_wordlist),
            random_point()
        ]
        entropy += ''.join(entropy_sources)
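    # NOTE: the last 128 characters always come from random_point(), so the
    # token_hex() and entropy_words() output is discarded here
    return entropy[-128:]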

def checksum(entropy):
    sha256_hex = sha256(bytes.fromhex(hex(int(entropy, 2))[2:].zfill(32))).hexdigest()
    sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)
    return sha256_bin[:4] 

def generate_mnemonic(entropy, checksum, wordlist):
    final = entropy + checksum
    num_of_words = 12
    word_length = len(final) // num_of_words
    res = [final[idx:idx + word_length] for idx in range(0, len(final), word_length)]
    mnemonic = [wordlist[int(binary_place, 2) % len(wordlist)] for binary_place in res]
    return mnemonic

def entropy_percentage(entropy):
    num_ones = entropy.count('1')
    return (num_ones / 128) * 100

bip39_wordlist = read_wordlist("bip39_wordlist.txt")
entropy = generate_entropy()
checksum = checksum(entropy)
mnemonic = generate_mnemonic(entropy, checksum, bip39_wordlist)
entropy_percentage = entropy_percentage(entropy)

print('---------')
print('ENTROPY: ')
print('---------')
print(entropy)

print('\n-------------')   
print('BIP39 PHRASE: ')
print('-------------')
for w in range(0, len(mnemonic)):
    print(str(w+1) + ': ' + mnemonic[w])

print('\n-------------------')
print('ENTROPY PERCENTAGE:')
print('-------------------')
print(f'{entropy_percentage:.2f}%')
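To put the printed "entropy percentage" in context, it can be computed for many independent samples from a generator that is already assumed to be good. A sketch, assuming `secrets.randbits` as that generator and an arbitrary sample count of 20,000 (the helper name is hypothetical): the values cluster around 50% with a spread of roughly 4.4 percentage points (the binomial standard deviation, 100 * sqrt(128 * 0.25) / 128 ≈ 4.42), so a single 128-bit run landing anywhere from about 40% to 60% is unremarkable.

Code:
```python
import statistics
from secrets import randbits


def ones_percentage(bits: int, width: int = 128) -> float:
    # Share of 1 bits in a fixed-width value, as a percentage
    return bin(bits).count('1') / width * 100


samples = [ones_percentage(randbits(128)) for _ in range(20_000)]
print(f'mean  {statistics.mean(samples):.2f}%')
print(f'stdev {statistics.stdev(samples):.2f} points')
print(f'min   {min(samples):.2f}%  max {max(samples):.2f}%')
```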
member
Activity: 315
Merit: 17
Hello,

Thanks for the experimentation. Although it may not be a huge improvement, I think it might be more natural to replace:

Code:
# entropy
entropy = bin(int(token_hex(16), 16))[2:]
while len(entropy) != 128:
    entropy = bin(int(token_hex(16), 16))[2:]

with
Code:
# entropy
entropy = ""
for i in range(128):
    entropy += bin(int(token_hex(1), 16))[-1]

Cheers
hero member
Activity: 510
Merit: 4005
Do you think it would be better if I generated more bits of entropy and then keeping 128 of them? Instead of zfill or repeating the process if bits are less than 128.
Yup, in the context of your script, slicing a length-128 string from a larger string of random 1s and 0s would work (that is, it would correct the bias, because the high-order bit would then have a 50/50 chance of being either 1 or 0).

But, the way that you posed that option as an alternative to zfill makes me think you don't fully understand the source of this bias. I hope you don't find the following explanation tiresome, I just know that if it were me in your shoes, I'd really appreciate someone taking pains to help me understand the potential hole in my thinking:

Let's imagine a toy version of this problem where you're trying to generate just 4 (rather than 128) bits of entropy with a coin. Let's not go down the rabbit hole of coin fairness, entropy extraction, and tossing technique. (I have a sometimes-sophomoric, or, as my wife would say, "bloody stupid" sense of humor, especially when I'm in a good mood, so "tossing technique" just gave me the giggles. Don't toss while flipping coins, yeah?)

There are only 16 possible ways for 4 coin-tosses-in-a-row to end up, so let's put them all in a table (ignore the last 4 columns for now):

Outcome  Pattern  Binary  Decimal  f1($pcv)  f2($pcv)  f3($pcv)  f4($pcv)
#1       HHHH     1111    15       "0b1111"  "1111"    "1111"    "HHHH"
#2       HHHT     1110    14       "0b1110"  "1110"    "1110"    "HHHT"
#3       HHTH     1101    13       "0b1101"  "1101"    "1101"    "HHTH"
#4       HHTT     1100    12       "0b1100"  "1100"    "1100"    "HHTT"
#5       HTHH     1011    11       "0b1011"  "1011"    "1011"    "HTHH"
#6       HTHT     1010    10       "0b1010"  "1010"    "1010"    "HTHT"
#7       HTTH     1001    9        "0b1001"  "1001"    "1001"    "HTTH"
#8       HTTT     1000    8        "0b1000"  "1000"    "1000"    "HTTT"
#9       THHH     0111    7        "0b111"   "111"     "0111"    "THHH"
#10      THHT     0110    6        "0b110"   "110"     "0110"    "THHT"
#11      THTH     0101    5        "0b101"   "101"     "0101"    "THTH"
#12      THTT     0100    4        "0b100"   "100"     "0100"    "THTT"
#13      TTHH     0011    3        "0b11"    "11"      "0011"    "TTHH"
#14      TTHT     0010    2        "0b10"    "10"      "0010"    "TTHT"
#15      TTTH     0001    1        "0b1"     "1"       "0001"    "TTTH"
#16      TTTT     0000    0        "0b0"     "0"       "0000"    "TTTT"

The first four columns are: 1. the possibility/outcome # (1 through 16, so, every possible outcome accounted for), 2. the heads-or-tails pattern corresponding to each outcome (H = heads, T = tails), 3. the same heads-or-tails pattern but in base-2 (1 = heads, 0 = tails), and 4. the heads-or-tails pattern converted from base-2 into base-10.

The last four columns involve the imaginary variable $pcv (previous column's value) and the following definitions:

Code:
f1 = lambda x: bin(x)
f2 = lambda x: x[2:]
f3 = lambda x: x.zfill(4)
f4 = lambda x: x.replace('0', 'T').replace('1', 'H')

Now, the first thing to think about when looking at that table is whether there are any "bad" or "low entropy" outcomes in it. The answer is no: as long as each outcome is equally probable, any given outcome is just as "entropic" as any other. HTTH is just as good as TTTT, savvy? That means that, tempting though it is, it's a mistake to use the string length of the values in the f2($pcv) column to decide whether or not you "have enough entropy" (to be clear, it's a mistake to base that decision on anything about a single outcome). If you discard outcomes whenever len(f2($pcv)) != 4, all you achieve is guaranteeing that the pattern always begins with heads instead of tails (check the table to confirm that).

The second thing to think about when looking at that table is whether the final patterns in the f4($pcv) column ever disagree with the source patterns in column 2. They don't. So, you can rest assured that the zfill(128) that was present in your original script was correct (that is, it was only restoring 0s that, in some sense, were there from the start and were "lost" during an int-to-str conversion). You can/should pat yourself on the back for getting it right the first time.
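The table can be replayed in a few lines. A sketch that applies the f1 to f4 definitions above to all 16 outcomes, confirms that f4(f3(f2(f1(x)))) always reproduces the original heads/tails pattern, and shows which outcomes survive a "discard unless len == 4" filter:

Code:
```python
# Same definitions as above (kept as lambdas to mirror the post)
f1 = lambda x: bin(x)
f2 = lambda x: x[2:]
f3 = lambda x: x.zfill(4)
f4 = lambda x: x.replace('0', 'T').replace('1', 'H')

for value in range(16):
    pattern = format(value, '04b').replace('0', 'T').replace('1', 'H')
    # zfill restores exactly the leading tails that bin() dropped
    assert f4(f3(f2(f1(value)))) == pattern

# The rejection filter keeps only values 8..15, i.e. outcomes #1..#8
kept = [f2(f1(v)) for v in range(16) if len(f2(f1(v))) == 4]
print(len(kept), kept)
print(all(s.startswith('1') for s in kept))   # every survivor starts with heads
```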

There are at least two other bugs in your script though (beyond the bias bug, which we'll call Bug #1):

Bug #2

You need a zfill(32) before passing the hexadecimal representation of your entropy into bytes.fromhex (otherwise, you'll occasionally pass it an odd number of nibbles, which will cause it to raise a ValueError).
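To see Bug #2 in isolation, here's a sketch using a made-up entropy value whose top nibble is 0 (an assumption chosen specifically to trigger the problem):

Code:
```python
# Example 128-bit value whose top nibble is 0 (made up to trigger the bug)
entropy = int('0' + 'f' * 31, 16)
raw_hex = hex(entropy)[2:]
print(len(raw_hex))                     # 31 (odd)

try:
    bytes.fromhex(raw_hex)
    rejected = False
except ValueError as error:
    rejected = True
    print('ValueError:', error)

# Padding to 32 hex digits restores the leading zero nibble
fixed = bytes.fromhex(raw_hex.zfill(32))
print(len(fixed), fixed.hex())
```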

Bug #3

You need a zfill(256) at the end of the line that calculates your sha256_bin variable (otherwise, your checksum, and therefore the last word of your mnemonic, will be wrong 50% of the time).

For an example, try manually setting your entropy to cc399b43e82bfbc07f2fe3fd1f13ed41.

Your script will spit out the following mnemonic: slow smoke special space sausage then witness wise wonder weasel win lobster

Ian's script (and mine) will spit out this instead: slow smoke special space sausage then witness wise wonder weasel win lion

The reason your wonder weasel only managed to win a lobster instead of a lion is that you're miscalculating the checksum bits as 1001 instead of 0001.
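The mechanism behind Bug #3 can be shown with a made-up digest (not the real SHA256 of the entropy above) whose first byte is 0x12, i.e. binary 0001 0010:

Code:
```python
# Made-up 256-bit digest starting with byte 0x12 (binary 0001 0010)
digest_hex = '12' + '00' * 31               # 64 hex digits = 256 bits

unpadded = bin(int(digest_hex, 16))[2:]     # bin() drops the three leading 0s
padded = unpadded.zfill(256)                # zfill(256) restores them

print(len(unpadded), unpadded[:4])          # 253 1001 -> wrong checksum bits
print(len(padded), padded[:4])              # 256 0001 -> correct checksum bits
```

Any digest whose first bit is 0 (half of all digests) shifts the checksum bits this way, which is where the "wrong 50% of the time" comes from.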

(I'm going to be too busy to respond for a while. I'll take a peek at this thread again in a few weeks' time.)
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Do you think it would be better if I generated more bits of entropy and then keeping 128 of them? Instead of zfill or repeating the process if bits are less than 128.

If you generate more rounds of entropy, then the amount you end up with will be the average of the entropy of all the rounds, in other words: (E(Loop1) + E(Loop2) + ...) / N. If you instead select the bits from a specific place in the total, the calculation becomes slightly different, but the same reasoning still applies.

It doesn't make much of a difference, and if the reason you are doing this is to cover for a potential insecurity in the random number generator, it will not work well. OpenSSL's RNG is much better for that purpose.
hero member
Activity: 560
Merit: 1060
Thanks PowerGlove, I find your answer very helpful.
I will check it in detail later today and I will comment, if I find anything important.
As you said, this is the approach that I like to follow. I code in order to understand something better.

Do you think it would be better if I generated more bits of entropy and then keeping 128 of them? Instead of zfill or repeating the process if bits are less than 128.
hero member
Activity: 510
Merit: 4005
Hello, I just updated the OP with this:

Code:
entropy = bin(int(token_hex(16), 16))[2:]
while len(entropy) != 128:
    entropy = bin(int(token_hex(16), 16))[2:]

so now, if the entropy is shorter than 128 bits, it repeats the process until it is 128 bits long.
You've introduced a small bias by doing that...

This is a pattern that I see pretty often: an attempt to make things "more secure" that ends up backfiring.

Think of it like this: by predicating that loop on the entropy being exactly 128 bits long, you've removed the possibility of your program ever generating a 128-bit value with the high-order bit set to 0. That means, in hexadecimal terms, your entropy will always start with an 8, 9, A, B, C, D, E, or F (0 through 7 are no longer possible). In BIP39 terms, the first word of your mnemonic will always start with one of the letters L through Z (no matter how many times you run your script, you'll never generate a mnemonic that starts with an A-word, for example).
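A quick way to confirm the bias empirically is to sample the retry loop many times. A sketch, with an arbitrary sample count of 1,000 (the function name is hypothetical): because bin() only produces 128 digits when the top bit is 1, every sample starts with a hex digit from 8 to f, and the first word index (the top 11 bits) is never below 1024, so the first half of the wordlist can never supply the first word.

Code:
```python
from secrets import token_hex


def entropy_with_retry() -> str:
    # The loop from the updated OP: retry until bin() yields 128 digits
    entropy = bin(int(token_hex(16), 16))[2:]
    while len(entropy) != 128:
        entropy = bin(int(token_hex(16), 16))[2:]
    return entropy


samples = [entropy_with_retry() for _ in range(1000)]
first_hex_digits = {format(int(s, 2), '032x')[0] for s in samples}
first_word_indices = [int(s[:11], 2) for s in samples]
print(sorted(first_hex_digits))          # only '8'..'f' can appear
print(min(first_word_indices) >= 1024)   # indices 0..1023 never start a phrase
```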

I think that what you were doing before with the zfill was better. (But, if you change it back, you should be aware that that will reveal a different bug that's hiding in your code: bytes.fromhex always expects a string with an even number of nibbles. That is, bytes.fromhex('abcd') will return b'\xab\xcd', but bytes.fromhex('abc') won't return b'\x0a\xbc' as you might expect; instead, it will raise a ValueError. So, there's a ~6% chance that your script will fail on any given invocation. Try it yourself: run the original version of your script again and again in a terminal, and you'll eventually hit that problem and get a traceback pointing to the offending line.)
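The "~6%" figure can be made exact. A sketch, assuming uniformly random 128-bit entropy: hex(x)[2:] has 32 - k digits when x has k leading zero nibbles (and is just "0" when x is 0), so the odd-length cases can be summed with exact fractions. The result is about 1/17, roughly 5.9%.

Code:
```python
from fractions import Fraction

p_odd = Fraction(0)
for k in range(32):                     # k = number of leading zero nibbles
    length = 32 - k
    p_k = Fraction(1, 16) ** k * Fraction(15, 16)
    if length % 2 == 1:
        p_odd += p_k
p_odd += Fraction(1, 16) ** 32          # x == 0 gives '0', length 1 (odd)

print(float(p_odd))                     # about 0.0588, i.e. roughly 1 in 17
```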

Please don't feel discouraged. It's cool that you're learning Bitcoin by programming little bits and pieces of it yourself, and then sharing your code: that's a really effective way to learn in my experience, so keep it up, dude.

I don't know about you, but I really appreciate seeing other people's takes on problems that I've recently solved myself, especially if they involve techniques that I may not be familiar with. So, here's my take on a 12-word BIP39 mnemonic generator in Python:

Code:
#!/usr/bin/env python3

# Don't use this for anything serious; it *probably* works as intended, but I haven't tested it much (I wrote it quickly for apogio).

import base64

import lzma

import secrets

import hashlib

bip39_words: list[str] = lzma.decompress(base64.b85decode('{Wp48S^xk9=GL@E0stWa761SMbT8$j;4?ZDBwYY7n21WR2~SMNj7l(^H;DHJAH%e&gKef~1EXSD0Us}VywB$hQSCL24wlCsKYCV<&UARZHOh)&>5DZ-~ie;w#}??6$%xTl&8@gxbgANqLq@*=qXFu|h=a2+%Xss$^84V29ubWU|Jis}ZeS^SfW}jww7nw9K&Cx%HKy;0Y37#)`+p3zY1bZ`7k+YIMR3u^uY^9ccN{kp=S?u6N4zisDC8gkIJo$^^m6wM^a$8H3bAYEsAIc966#xe{9j&hY0zxEiT${N!QqS^qX$el?jg>&rnPf$K-yrI;)DRZN%5N67_SO#n=yj@!>$4Hz)evAV5=sYEQS$z7tp@GUj@>>S;S=L=!OK{Jtif0|-p%w0w#bxt71|pkbpS}&H*pv#^=Et7p$ITGaVb29^yoJQ!zsLIcy9`T(hCW;CliZjzHlUHk5C~^7_nJ1EUY6*S-;1uIhueo7A2kS$(TLIuv1k&86r%i)z0*-`orY@!pD9im@rths?CuQ90QmSY+!iP@8%nQ%d2Mnv#BVLo#w-6V9c5aG@e2d9O-C!?y>GSCUs5I^54+iV}u#dj~6%nU^oJ!C{P_)~vNh$*wcdG_&J;l!^sRRdSF->(8X?IogT__SWD7&M;TnqE9)DN*23c>sJsQxXi3;5J6DDh(_H0P%C?nxktxwmY48AW19?`wuF+F(0QfgOO}Ojr7j3KfbBaY+=fiqt?}oNK#yaFde;{ARhlk@2*Yh;H++vcQgmvVuv4wKzHFO&Ru(cfJW=QuOvQu>M0i6U6^04JXbcxiR-dL7D_vpPfb6ZAR`2uHBG)agXV6VC2h%>;B+vg*v^Xsw3(RIEe8ls@DQbyP-deH3-Hb)O_0nXteO2F_5u@cpMV^*D4qiuNSK-`U!)6zJN$A6{Hhkj+D@VTMcYtaNfsu>Wn{j~^6xu22NN$^PmEmEl*lr~k&2zu1nq>@!y1*ypM67Y`iGod_AKq_r-*jlOyBOA{KNrg}5Ac0RMtO%k2jizS0XF!`YtPkPQ`ImjU%rS#KF}#sq+r|(L72zb1TC2;dOS~ikeihcHtjY!p{N`Z*Q+Cx&oG&bsXVozbAQj&R&`HmZr4)YnH?nY^m}If@mU2OBIw{C4^z1meuyEu;2#(@;2Osppn@|UV6U2xJvDC)cQ`(jFILY>=0`+;VLBTS|!3}0QE&E(EzgOrO8NIJOzT>^p}ISTqQSGa0npbKH^kf2A+%8xf8FWGBc;!h)Vd2IuWu!v6lNZ{yVZpjQ$0$y<=)J7sK?*>Vv>n?iROPvbL3X*R&VFHz8cX8#e-z!_RyxYIU<(6gQ_4b(OE?b)!x=RJf;~4bi0p=4VlSVImRd!l7a7cE+)`2rKn4pAw_LqwZx&RWLHTaB*I{-F6zLlIF;yxKK4;IvNZ6t|?^jhonkM3>Vwmf(Vk0@EB2?UeVdB;Yb|Kd;W%R3JWCWlnF(sPu^{(m8)`sb8503#A0k-zZPD`M9T*&C+y`J9JD_0->(14W>=Y!~_Y8&AfDjoh)LpK6g5a&}4OZ^ryKhc3JF+hme1Be32X32Sb{U6qNKLDpmXx~u;TgGnn6m@#M;b$CnfIq7~tRFqGO>Nx0>tP_m`QW>a`i!TRAZ%h#-uM7BMoC{dn$iVRKwHz-B)^APT2#W+FMad$_X#-BM+bg%Rf2oYfSf2eeAFo8OyWD>GfI>IduIKFJlGhObtF)718Oq4s#zK*RIUGa7k|k$4HOG>9RiJM;@2yK=NBbEC6*Kd&2VVqPUxacYwN@0mswW469o32{DJf=y*CU9V7)?7loOwYQ9_8JLIotvhv~&Wz%i0Z!05V1|QY@4L4jp;%JOM*dix({v*!!8Z=C&nDTk&RqqACXz9fO2kwe61+}5DG1WkN7Hp0~$f1@=n~e=))pi@!)Tac?masVzC?^p0gZWN%vmU+jb}&3M!$|S+EHr+mG
IU0i;)%3RPlu5L-Y@1H-swrrU`hjMq%4nK}+=J_f-N8;rWk>@YoZbR~hDP?=07HhnK7Yha4mRW0EOgl>Q5h&N?zt#)JjrC*Ls77Nt7Cp(!Js+pn%*(NBy}Z)CsN&bjh^gC{rL~LauR(COgtSd>bE4IHnvS$>ovdGkG|VU^PL1*cw*dz6l9aH($Kat5LTU9BKbaVBD}e6ZT~A9kcNB{KYbj~K(7{m$+D85|@#74?ms4(7n$~>n>pTX`Y%sMm%q37$sm}KmutW;E;=^VLusQ?lO6n`0TFjQAc`g)`IHG4G<%c8sRkb*u>;O8etvo0@=>SAhL77BuaNdi|LI0L>I=?nF^ZO%Qae<~ZG5pz#9ZAFRpFG%-Y8{Rmd*f!oYYX*iSsjQjl`TqEOaP?mMm@Y?2^|vWSQqhp#Ro~xw3f(fI6x=i)ocaQIs|Dvg?AX3|G_enP}P7N_?{nab4X|xO;Bvcny;Mgm&n;}#^N?ht)QPa=pK=OJT&Nc@`)s2y%=xX{IUe}??&l&?LPPJ^bTBK*!;mW_~-<`zW8)>h0fy7dX3AT&!LmnKb**OFB-IsN$aD^LPcO{na32FZCS;P!p)sbnR{<1H8&Io*XH>^%{X5H?vtC_<5Yl)4H{;T?eR=xo4zMgGxrMG9p@1UtXLg|+1$C`Eo$g|Lx#e~g(V3q>lQZ}FvGh-tAjMoOTg=^RKYDoUIaU;*WhzRh@}-9nE11k#_w`r+2d7A@2gw`AU7O5Nt!32Lo0+$xOMdW67%8`w@1^w*jNQ!&uSKfge%M28%OO{Ws}_mw5k2)UX#8EE`$J&ery(PFRTPRe+8ME$%+0(8%omCTo5iz3vJ+obP2NWfe{?nUg>q@;gbs0zwWscj2jCr*C32c$lxQ+=4Exp3VPq2xzPUNKpQcy~h@N$a;FQxng%f-{rN>P_yc?!P-}$P_?H9E>1g+ea9y4H{m9vJ169o>9(N|lI$0M}Uv$@25BS_d%2FJ&gv3Sf0%>(OD;tg#mF(VF6GK7(DK*r9%wu?|AusPFcGjp-H&mfb5eGek@LU1s(Jk*`7!ur*UU_J@K4Ie>Su?<+apZL;q7d8Z?MJJ{Ew(x%93*MI1EI#&X%iGGdJ9!(|UhI+i*`+>jZn?h;K-w}>M~lFsBSVmjSI*Py=BlvSgU|RO*TT1Fa$2+@ie+S506rPh`84?h7$4MDe6a2Od4|YI$7k1`A|K_FixJ(d#P~6R>uB8!?^*`G~OIz3q_gyhleNle+a{7n>fPT}9i)}$yGwM~j_|YsLYVAJhKBb?=7uZ*07(_qNd_N5&8E8#c2Wq82=IEJY5s;WXJU|dRpVRJ1*7P~Kwj{V%l9Wc;(#+4XM)!#giyS6FL_7O`DvOWUo6zO(FggIxM-;oSqeP>I^_M`6)E1WsCDM2O~XMf-t1*O+P)yvRgmiUb)kFO@;xD+2G{@qPq#);P$7Wt00F=#yJi3YrO+j%vBYQl0ssI200dcD')).decode('ascii').split('\n')

entropy: int = secrets.randbits(128)

checksum: int = hashlib.sha256(entropy.to_bytes(16, 'big')).digest()[0] >> 4

combined: int = (entropy << 4) | checksum

mnemonic: list[str] = [bip39_words[(combined >> (index * 11)) & 2047] for index in reversed(range(12))]

print(f'Entropy (128 bits, hexadecimal): {hex(entropy)[2:].zfill(32)}')

print(f'Checksum (4 bits, binary): {bin(checksum)[2:].zfill(4)}')

print(f'Mnemonic (12 words, english): {" ".join(mnemonic)}')
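The shift-and-mask indexing in the script above can be checked on its own. A sketch with arbitrary example indices: packing 12 indices of 11 bits each into one integer and unpacking them with `(combined >> (index * 11)) & 2047` over `reversed(range(12))` returns them in their original order, most significant word first.

Code:
```python
# Arbitrary example word indices (each fits in 11 bits: 0..2047)
indices = [5, 2047, 0, 1024, 17, 300, 1999, 42, 7, 512, 1500, 3]

combined = 0
for i in indices:
    combined = (combined << 11) | i      # append 11 bits per word

unpacked = [(combined >> (index * 11)) & 2047 for index in reversed(range(12))]
print(unpacked == indices)               # True
print(combined.bit_length() <= 132)      # True: 12 words x 11 bits
```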

And here's the same script but generalized to produce either 12, 15, 18, 21, or 24-word mnemonics:

Code:
#!/usr/bin/env python3

# Don't use this for anything serious; it *probably* works as intended, but I haven't tested it much (I wrote it quickly for apogio).

import base64

import lzma

import secrets

import hashlib

bip39_words: list[str] = lzma.decompress(base64.b85decode('{Wp48S^xk9=GL@E0stWa761SMbT8$j;4?ZDBwYY7n21WR2~SMNj7l(^H;DHJAH%e&gKef~1EXSD0Us}VywB$hQSCL24wlCsKYCV<&UARZHOh)&>5DZ-~ie;w#}??6$%xTl&8@gxbgANqLq@*=qXFu|h=a2+%Xss$^84V29ubWU|Jis}ZeS^SfW}jww7nw9K&Cx%HKy;0Y37#)`+p3zY1bZ`7k+YIMR3u^uY^9ccN{kp=S?u6N4zisDC8gkIJo$^^m6wM^a$8H3bAYEsAIc966#xe{9j&hY0zxEiT${N!QqS^qX$el?jg>&rnPf$K-yrI;)DRZN%5N67_SO#n=yj@!>$4Hz)evAV5=sYEQS$z7tp@GUj@>>S;S=L=!OK{Jtif0|-p%w0w#bxt71|pkbpS}&H*pv#^=Et7p$ITGaVb29^yoJQ!zsLIcy9`T(hCW;CliZjzHlUHk5C~^7_nJ1EUY6*S-;1uIhueo7A2kS$(TLIuv1k&86r%i)z0*-`orY@!pD9im@rths?CuQ90QmSY+!iP@8%nQ%d2Mnv#BVLo#w-6V9c5aG@e2d9O-C!?y>GSCUs5I^54+iV}u#dj~6%nU^oJ!C{P_)~vNh$*wcdG_&J;l!^sRRdSF->(8X?IogT__SWD7&M;TnqE9)DN*23c>sJsQxXi3;5J6DDh(_H0P%C?nxktxwmY48AW19?`wuF+F(0QfgOO}Ojr7j3KfbBaY+=fiqt?}oNK#yaFde;{ARhlk@2*Yh;H++vcQgmvVuv4wKzHFO&Ru(cfJW=QuOvQu>M0i6U6^04JXbcxiR-dL7D_vpPfb6ZAR`2uHBG)agXV6VC2h%>;B+vg*v^Xsw3(RIEe8ls@DQbyP-deH3-Hb)O_0nXteO2F_5u@cpMV^*D4qiuNSK-`U!)6zJN$A6{Hhkj+D@VTMcYtaNfsu>Wn{j~^6xu22NN$^PmEmEl*lr~k&2zu1nq>@!y1*ypM67Y`iGod_AKq_r-*jlOyBOA{KNrg}5Ac0RMtO%k2jizS0XF!`YtPkPQ`ImjU%rS#KF}#sq+r|(L72zb1TC2;dOS~ikeihcHtjY!p{N`Z*Q+Cx&oG&bsXVozbAQj&R&`HmZr4)YnH?nY^m}If@mU2OBIw{C4^z1meuyEu;2#(@;2Osppn@|UV6U2xJvDC)cQ`(jFILY>=0`+;VLBTS|!3}0QE&E(EzgOrO8NIJOzT>^p}ISTqQSGa0npbKH^kf2A+%8xf8FWGBc;!h)Vd2IuWu!v6lNZ{yVZpjQ$0$y<=)J7sK?*>Vv>n?iROPvbL3X*R&VFHz8cX8#e-z!_RyxYIU<(6gQ_4b(OE?b)!x=RJf;~4bi0p=4VlSVImRd!l7a7cE+)`2rKn4pAw_LqwZx&RWLHTaB*I{-F6zLlIF;yxKK4;IvNZ6t|?^jhonkM3>Vwmf(Vk0@EB2?UeVdB;Yb|Kd;W%R3JWCWlnF(sPu^{(m8)`sb8503#A0k-zZPD`M9T*&C+y`J9JD_0->(14W>=Y!~_Y8&AfDjoh)LpK6g5a&}4OZ^ryKhc3JF+hme1Be32X32Sb{U6qNKLDpmXx~u;TgGnn6m@#M;b$CnfIq7~tRFqGO>Nx0>tP_m`QW>a`i!TRAZ%h#-uM7BMoC{dn$iVRKwHz-B)^APT2#W+FMad$_X#-BM+bg%Rf2oYfSf2eeAFo8OyWD>GfI>IduIKFJlGhObtF)718Oq4s#zK*RIUGa7k|k$4HOG>9RiJM;@2yK=NBbEC6*Kd&2VVqPUxacYwN@0mswW469o32{DJf=y*CU9V7)?7loOwYQ9_8JLIotvhv~&Wz%i0Z!05V1|QY@4L4jp;%JOM*dix({v*!!8Z=C&nDTk&RqqACXz9fO2kwe61+}5DG1WkN7Hp0~$f1@=n~e=))pi@!)Tac?masVzC?^p0gZWN%vmU+jb}&3M!$|S+EHr+mG
IU0i;)%3RPlu5L-Y@1H-swrrU`hjMq%4nK}+=J_f-N8;rWk>@YoZbR~hDP?=07HhnK7Yha4mRW0EOgl>Q5h&N?zt#)JjrC*Ls77Nt7Cp(!Js+pn%*(NBy}Z)CsN&bjh^gC{rL~LauR(COgtSd>bE4IHnvS$>ovdGkG|VU^PL1*cw*dz6l9aH($Kat5LTU9BKbaVBD}e6ZT~A9kcNB{KYbj~K(7{m$+D85|@#74?ms4(7n$~>n>pTX`Y%sMm%q37$sm}KmutW;E;=^VLusQ?lO6n`0TFjQAc`g)`IHG4G<%c8sRkb*u>;O8etvo0@=>SAhL77BuaNdi|LI0L>I=?nF^ZO%Qae<~ZG5pz#9ZAFRpFG%-Y8{Rmd*f!oYYX*iSsjQjl`TqEOaP?mMm@Y?2^|vWSQqhp#Ro~xw3f(fI6x=i)ocaQIs|Dvg?AX3|G_enP}P7N_?{nab4X|xO;Bvcny;Mgm&n;}#^N?ht)QPa=pK=OJT&Nc@`)s2y%=xX{IUe}??&l&?LPPJ^bTBK*!;mW_~-<`zW8)>h0fy7dX3AT&!LmnKb**OFB-IsN$aD^LPcO{na32FZCS;P!p)sbnR{<1H8&Io*XH>^%{X5H?vtC_<5Yl)4H{;T?eR=xo4zMgGxrMG9p@1UtXLg|+1$C`Eo$g|Lx#e~g(V3q>lQZ}FvGh-tAjMoOTg=^RKYDoUIaU;*WhzRh@}-9nE11k#_w`r+2d7A@2gw`AU7O5Nt!32Lo0+$xOMdW67%8`w@1^w*jNQ!&uSKfge%M28%OO{Ws}_mw5k2)UX#8EE`$J&ery(PFRTPRe+8ME$%+0(8%omCTo5iz3vJ+obP2NWfe{?nUg>q@;gbs0zwWscj2jCr*C32c$lxQ+=4Exp3VPq2xzPUNKpQcy~h@N$a;FQxng%f-{rN>P_yc?!P-}$P_?H9E>1g+ea9y4H{m9vJ169o>9(N|lI$0M}Uv$@25BS_d%2FJ&gv3Sf0%>(OD;tg#mF(VF6GK7(DK*r9%wu?|AusPFcGjp-H&mfb5eGek@LU1s(Jk*`7!ur*UU_J@K4Ie>Su?<+apZL;q7d8Z?MJJ{Ew(x%93*MI1EI#&X%iGGdJ9!(|UhI+i*`+>jZn?h;K-w}>M~lFsBSVmjSI*Py=BlvSgU|RO*TT1Fa$2+@ie+S506rPh`84?h7$4MDe6a2Od4|YI$7k1`A|K_FixJ(d#P~6R>uB8!?^*`G~OIz3q_gyhleNle+a{7n>fPT}9i)}$yGwM~j_|YsLYVAJhKBb?=7uZ*07(_qNd_N5&8E8#c2Wq82=IEJY5s;WXJU|dRpVRJ1*7P~Kwj{V%l9Wc;(#+4XM)!#giyS6FL_7O`DvOWUo6zO(FggIxM-;oSqeP>I^_M`6)E1WsCDM2O~XMf-t1*O+P)yvRgmiUb)kFO@;xD+2G{@qPq#);P$7Wt00F=#yJi3YrO+j%vBYQl0ssI200dcD')).decode('ascii').split('\n')

mnemonic_length: int = 12

assert mnemonic_length in {12, 15, 18, 21, 24}

mnemonic_factor: int = mnemonic_length // 3

entropy: int = secrets.randbits(mnemonic_factor * 32)

checksum: int = hashlib.sha256(entropy.to_bytes(mnemonic_factor * 4, 'big')).digest()[0] >> (8 - mnemonic_factor)

combined: int = (entropy << mnemonic_factor) | checksum

mnemonic: list[str] = [bip39_words[(combined >> (index * 11)) & 2047] for index in reversed(range(mnemonic_length))]

print(f'Entropy ({mnemonic_factor * 32} bits, hexadecimal): {hex(entropy)[2:].zfill(mnemonic_factor * 8)}')

print(f'Checksum ({mnemonic_factor} bits, binary): {bin(checksum)[2:].zfill(mnemonic_factor)}')

print(f'Mnemonic ({mnemonic_length} words, english): {" ".join(mnemonic)}')
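The integer-based generator above can be checked against a fixed input without the embedded wordlist by stopping at the word indices. The sketch below reuses the same shift-and-mask arithmetic. `mnemonic_indices` is a hypothetical helper name, and it takes the entropy as bytes instead of calling `secrets.randbits`, so it can be fed known inputs such as all-zero entropy.

```python
import hashlib

def mnemonic_indices(entropy: bytes) -> list[int]:
    """Return the 11-bit BIP39 word indices for 16, 20, 24, 28 or 32 bytes of entropy."""
    assert len(entropy) in {16, 20, 24, 28, 32}
    mnemonic_factor = len(entropy) // 4          # 4 for 12 words, 8 for 24 words
    mnemonic_length = mnemonic_factor * 3
    value = int.from_bytes(entropy, 'big')
    # The checksum is the first mnemonic_factor bits of SHA-256(entropy).
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - mnemonic_factor)
    combined = (value << mnemonic_factor) | checksum
    # Read the 11-bit groups starting from the most significant end.
    return [(combined >> (index * 11)) & 2047 for index in reversed(range(mnemonic_length))]

# All-zero 128-bit entropy: eleven 0s followed by a final index of 3.
print(mnemonic_indices(bytes(16)))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
```

With the wordlist loaded as in the post, `[bip39_words[i] for i in mnemonic_indices(...)]` turns the indices into words. For the all-zero input, that gives the well-known "abandon ... abandon about" phrase (indices 0 and 3 in the English list).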
hero member
Activity: 560
Merit: 1060
It would be a small improvement to remove zfill, since it does little other than pad with zeros, which isn't needed when SHA256 hashing follows. It probably should be removed either way: if the entropy is shorter than required, the entropy generation should simply be run again. In a similar vein, a sanity check on the entropy before the SHA256 hashing would improve security as well.

Hello, I just updated the OP with this:

Code:
entropy = bin(int(token_hex(16), 16))[2:]
while len(entropy) != 128:
    entropy = bin(int(token_hex(16), 16))[2:]

So now, if the entropy is shorter than 128 bits, the script repeats the process until it is 128 bits long.

Thanks for the feedback.
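A caveat about the retry loop: `bin()` drops leading zeros, so the string is shorter than 128 characters exactly when the top bit of the 128-bit draw is 0. The loop therefore only ever accepts values in [2^127, 2^128), which is 127 bits of effective entropy instead of 128. The sketch below contrasts the posted loop with padding via `zfill(128)`. The two function names are illustrative only.

```python
from secrets import token_hex

def entropy_by_retry() -> str:
    # The loop from the post: redraw until bin() yields exactly 128 digits.
    entropy = bin(int(token_hex(16), 16))[2:]
    while len(entropy) != 128:
        entropy = bin(int(token_hex(16), 16))[2:]
    return entropy

def entropy_by_padding() -> str:
    # Keep every draw and restore the leading zeros that bin() drops.
    return bin(int(token_hex(16), 16))[2:].zfill(128)

retry_samples = [entropy_by_retry() for _ in range(1000)]
padded_samples = [entropy_by_padding() for _ in range(1000)]

# Every accepted retry sample starts with '1' ...
print(all(sample[0] == '1' for sample in retry_samples))  # True
# ... while padded samples start with '0' roughly half the time.
print(sum(sample[0] == '0' for sample in padded_samples))
```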
legendary
Activity: 3038
Merit: 4418
It would be a small improvement to remove zfill, since it does little other than pad with zeros, which isn't needed when SHA256 hashing follows. It probably should be removed either way: if the entropy is shorter than required, the entropy generation should simply be run again. In a similar vein, a sanity check on the entropy before the SHA256 hashing would improve security as well.

Edit: I stand corrected on the point about regenerating the entropy; its length should not be strictly enforced at 128 bits.
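The padding does more than feed the hash. In the script, the entropy bit string is sliced directly into 11-bit word indices, and `bytes.fromhex()` needs an even number of hex digits. Leading zeros dropped by `bin()` and `hex()` therefore affect both steps. A small check, using a hypothetical entropy value whose top 7 bits are zero:

```python
# Hypothetical 128-bit entropy value whose top 7 bits are zero.
value = 1 << 120

unpadded = bin(value)[2:]
padded = unpadded.zfill(128)
print(len(unpadded), len(padded))                  # 121 128

# The first word index comes from the first 11 characters of the bit string.
print(int(padded[:11], 2), int(unpadded[:11], 2))  # 8 1024

# The hash input has the same problem: without zfill(32) the hex string has
# an odd number of digits, and bytes.fromhex() rejects it.
try:
    bytes.fromhex(hex(value)[2:])
except ValueError as error:
    print('ValueError:', error)
print(len(bytes.fromhex(hex(value)[2:].zfill(32))))  # 16
```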
hero member
Activity: 560
Merit: 1060
As the title suggests, I developed an easy-to-use script that generates a BIP39 mnemonic.

I implemented it for fun. I don't plan to use it for real money.

The script:
Code:
# contact: [email protected]
from secrets import token_hex
from hashlib import sha256

# read bip39 wordlist from file and import in a list
with open("bip39_wordlist.txt", "r") as bip39_wordlist_file:
    bip39_wordlist = bip39_wordlist_file.read().splitlines()

# entropy
entropy = bin(int(token_hex(16), 16))[2:].zfill(128)
   
print('---------')
print('ENTROPY: ')
print('---------')
print(entropy)

# calculate SHA256
sha256_hex = sha256(bytes.fromhex(hex(int(entropy,2))[2:].zfill(32))).hexdigest()
sha256_bin = bin(int(sha256_hex, 16))[2:].zfill(256)

# calculate checksum
checksum = sha256_bin[0:4]

# final seed to be converted into BIP39 mnemonic
final = entropy + checksum

num_of_words = 12
word_length = len(final) // num_of_words

# calculate mnemonic
res = []
for idx in range(0, len(final), word_length):
    res.append(final[idx : idx + word_length])

mnemonic = []
for idx in range(0, num_of_words):
    binary_place = res[idx]
    decimal_place = int(binary_place,2)
    mnemonic.append(bip39_wordlist[decimal_place])

print('\n-------------')
print('BIP39 PHRASE: ')
print('-------------')
for position, word in enumerate(mnemonic, start=1):
    print(f'{position}: {word}')

How to run:
1. Create a file on your machine (for example, mnemonic_gen.py).
2. Copy-paste the code above into it.
3. Create another file called bip39_wordlist.txt and copy-paste the BIP39 English wordlist into it, one word per line.
4. Make sure both files are in the same directory.
5. Run python mnemonic_gen.py from that directory.
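Because the script indexes the wordlist by position, a pasted list with a missing word or a stray blank line silently produces wrong words. A minimal pre-flight check, assuming the file layout from the steps above (one word per line). `load_wordlist` is a hypothetical helper, not part of the script.

```python
from pathlib import Path

def load_wordlist(path: str = "bip39_wordlist.txt") -> list[str]:
    """Read one word per line and insist on exactly 2048 unique entries."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Strip stray spaces and drop blank lines that can sneak in when copy-pasting.
    words = [line.strip() for line in lines if line.strip()]
    if len(words) != 2048 or len(set(words)) != 2048:
        raise ValueError(
            f"expected 2048 unique words, found {len(words)} entries "
            f"({len(set(words))} unique)"
        )
    return words
```

Calling `load_wordlist()` once before generating a phrase turns a bad paste into an immediate error instead of a wrong mnemonic.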

Sample output:
Code:
---------
ENTROPY:
---------
11101110101000001011111101111000111100001001001100010000100001110011110100010011010100011000011001100100011111100100010111011100

-------------
BIP39 PHRASE:
-------------
1: upgrade
2: album
3: taste
4: thrive
5: country
6: drum
7: violin
8: health
9: major
10: catalog
11: multiply
12: ride
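The script's steps can also be run in reverse to sanity-check a generated phrase. Map each word back to its position in the wordlist (for example with `bip39_wordlist.index(word)`), rebuild the entropy-plus-checksum bits, and recompute the checksum. A minimal sketch working on those indices. `validate_indices` is a hypothetical helper, not something the script provides.

```python
from hashlib import sha256

def validate_indices(indices: list[int]) -> bool:
    """Return True if the 11-bit word indices carry a valid BIP39 checksum."""
    if len(indices) not in {12, 15, 18, 21, 24}:
        return False
    checksum_bits = len(indices) // 3                 # 4 bits for 12 words
    entropy_bits = len(indices) * 11 - checksum_bits  # 128 bits for 12 words
    combined = 0
    for index in indices:
        if not 0 <= index < 2048:
            return False
        combined = (combined << 11) | index
    entropy = combined >> checksum_bits
    checksum = combined & ((1 << checksum_bits) - 1)
    # Same checksum rule as the script: the first bits of SHA-256(entropy).
    digest = sha256(entropy.to_bytes(entropy_bits // 8, 'big')).digest()
    return checksum == digest[0] >> (8 - checksum_bits)

print(validate_indices([0] * 11 + [3]))  # True  ("abandon" x11 + "about")
print(validate_indices([0] * 12))        # False (wrong checksum)
```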

Extra notes:
1. The script uses the secrets module to generate entropy. It is backed by a CSPRNG and is the recommended way to generate cryptographically strong random numbers in Python; internally, it draws on os.urandom.
2. For best security, use it offline by running the script on an air-gapped device.
3. This is not a complete wallet. To convert the BIP39 phrase into a seed and derive the corresponding xpriv and xpub, you must import the phrase into an offline wallet of your choice.
4. This method is only recommended if you don't trust the entropy source of your device and want to use a CSPRNG on an air-gapped computer through Python libraries.
5. It is similar to Ian Coleman's BIP39 implementation in the sense that both must be executed offline. The difference lies in the libraries used: Ian's implementation uses JavaScript, whereas the script above uses Python libraries.
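Since all of the entropy comes from the secrets module anyway, one variant asks it for raw bytes with `secrets.token_bytes`. A byte string always has a fixed length, so the hex and binary round-trips, and the zero-padding they require, disappear. A sketch under that assumption. `entropy_and_checksum_bits` is a hypothetical name, and it returns the same two bit strings the script builds.

```python
import secrets
from hashlib import sha256

def entropy_and_checksum_bits(num_bytes: int = 16) -> tuple[str, str]:
    """Return (entropy, checksum) bit strings; num_bytes is one of 16, 20, 24, 28, 32."""
    raw = secrets.token_bytes(num_bytes)  # always exactly num_bytes long
    # A fixed-width format keeps leading zeros, like bin(...)[2:].zfill(128).
    entropy_bits = format(int.from_bytes(raw, 'big'), f'0{num_bytes * 8}b')
    checksum_len = num_bytes * 8 // 32    # 4 bits for 128-bit entropy
    # Hashing the raw bytes is equivalent to bytes.fromhex(hex(...).zfill(32)).
    checksum_bits = format(sha256(raw).digest()[0], '08b')[:checksum_len]
    return entropy_bits, checksum_bits

entropy, checksum = entropy_and_checksum_bits()
print(len(entropy), len(checksum))  # 128 4
```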