Topic: Introducing a version field to BIP39 Mnemonic Phrases.

legendary
Activity: 2870
Merit: 7490
1. I see. It looks like there's a major trade-off between preserving backward compatibility (not many things can be added) and breaking backward compatibility (less adoption/usage).
I don't think so. OP's proposal as it stands is 100% backwards compatible. If I generate a new 15 word seed phrase with the 32 bit versioning system, old software will still see the entire seed phrase as a valid BIP39 seed phrase and will recover the exact same wallets (provided of course I know my script type/derivation path, since old wallets won't be able to interpret the 32 extra bits). Any legacy seed phrases will still be recoverable by new software, provided there is a simple check box to indicate it is a legacy seed phrase so the software does not try to interpret the first 32 bits of entropy as versioning data.

Yes, I'm aware OP's proposal is backward compatible. What I actually wanted to say is:
1. With backward compatibility, there's not much room for further improvement or change. In this case, some people may think "Why bother accepting/using this proposal?".
2. Breaking backward compatibility could hamper adoption or usage.

Perhaps you could even stipulate that the new seed phrases must be either 15, 21, or 27 words long, and so any seed phrase which is 12, 18, or 24 words is immediately known to be a legacy seed phrase.

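As a reminder, BIP39 allows entropy from 128 to 256 bits in multiples of 32 bits. That means a BIP39 mnemonic can already be 12, 15, 18, 21 or 24 words long, so a 15- or 21-word phrase could also be a legacy one, and 27 words is not a BIP39 length at all.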
newbie
Activity: 7
Merit: 29
Perhaps you could even stipulate that the new seed phrases must be either 15, 21, or 27 words long, and so any seed phrase which is 12, 18, or 24 words is immediately known to be a legacy seed phrase.

Regarding false positive rates: with an 8-bit version field, a legacy seed phrase would be misread as a versioned one with probability nVersion/256, where nVersion is the number of supported versions.
Assuming support for three versions, this yields a false positive rate of less than 1.2% (3/256).
If this isn't enough, one option is to use the 24-bit general purpose field to mitigate this.
This has already been addressed by Lukechilds here: https://github.com/lukechilds/bip39-versioned?tab=readme-ov-file#false-positives
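
To make the arithmetic above concrete, here is a minimal sketch. The function name and the idea of treating extra general purpose bits as a fixed marker are my own assumptions for illustration; neither comes from the proposal or from the linked README.

```python
from fractions import Fraction


def false_positive_rate(supported_versions: int, checked_bits: int = 8) -> Fraction:
    """Chance that uniformly random leading bits of a legacy phrase
    happen to match one of the supported version values.

    checked_bits is the number of leading bits the software inspects
    (8 for the version byte alone). Hypothetical helper, not part of
    any specification.
    """
    space = 2 ** checked_bits
    if not 0 <= supported_versions <= space:
        raise ValueError("supported_versions must fit in the checked bit space")
    return Fraction(supported_versions, space)


# Three supported versions, 8-bit version field: 3/256.
rate_8 = false_positive_rate(3)
print(rate_8, f"{float(rate_8):.2%}")

# Assumption: if all 24 general purpose bits also had to match a fixed
# pattern, the software would effectively check 32 bits instead of 8.
rate_32 = false_positive_rate(3, checked_bits=32)
print(rate_32)
```

Using Fraction keeps the result exact, so the "less than 1.2%" claim can be checked without rounding issues.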
legendary
Activity: 2268
Merit: 18775
Since I am not so experienced, could you number 2-3 possible shortcomings of not having a versioning system, apart from the one mentioned in the quoted text above?
As we develop more and more script types, each with their own default derivation path, then not having a versioning system is becoming more and more problematic. It's why wallets such as Electrum have implemented a feature to scan all the commonly used script/path pairs for BIP39 seed phrases, because many users do not know where exactly their coins are stored and many wallets do their own thing such as derive P2WPKH addresses at m/44'/0'/0', use m/0' instead, and so on.

1. I see. It looks like there's a major trade-off between preserving backward compatibility (not many things can be added) and breaking backward compatibility (less adoption/usage).
I don't think so. OP's proposal as it stands is 100% backwards compatible. If I generate a new 15 word seed phrase with the 32 bit versioning system, old software will still see the entire seed phrase as a valid BIP39 seed phrase and will recover the exact same wallets (provided of course I know my script type/derivation path, since old wallets won't be able to interpret the 32 extra bits). Any legacy seed phrases will still be recoverable by new software, provided there is a simple check box to indicate it is a legacy seed phrase so the software does not try to interpret the first 32 bits of entropy as versioning data.

Perhaps you could even stipulate that the new seed phrases must be either 15, 21, or 27 words long, and so any seed phrase which is 12, 18, or 24 words is immediately known to be a legacy seed phrase.
legendary
Activity: 3472
Merit: 10611
The strength of a KDF is only a concern when the input is weak, for example when storing your users' login details, where a password could be just 6 digits.
However, in the BIP39 context our input is strong entropy that provides security at least equal to that of an ECC key (i.e. 128 bits), which means it no longer matters if the KDF is weak; it is not there to provide any kind of security. It is there to give us the ability to derive a different BIP32 seed if we want to, which is not even something BIP39 is popular for!
legendary
Activity: 2870
Merit: 7490
A few thoughts:
1. A long time ago I read that Bitcoin Core doesn't implement BIP39 due to a security issue[1], which isn't solved by your proposal.
2. If I understood your proposal correctly, does that mean a 12-word mnemonic has less than 128 bits of entropy due to the added version field?

[1] https://bitcoin.stackexchange.com/a/88244


[1] If the security issue you are referring to involves the use of PBKDF2, this could be addressed by implementing versioning, although it would break compatibility with non-versioned BIP39 software.

[2] Correct. With 128 bits of entropy, the mnemonic becomes 15 words when the 24-bit general-purpose field is used.
For a 12-word mnemonic phrase under versioned BIP39, you can attain a maximum of 120 bits of entropy (128 bits minus the 8-bit version field), whereas non-versioned BIP39 allows for 128 bits of entropy with a 12-word mnemonic.

1. I see. It looks like there's a major trade-off between preserving backward compatibility (not many things can be added) and breaking backward compatibility (less adoption/usage).

2. 120-bit should be fine since IIRC 112-bit security is still acceptable today.
newbie
Activity: 7
Merit: 29
Since I am not so experienced, could you number 2-3 possible shortcomings of not having a versioning system, apart from the one mentioned in the quoted text above?
From what I know, these are the main criticisms:
Quote from: aezeed
The lack of a version means that wallets may not necessarily know how to re-derive addresses during the recovery process. A lack of a birthday means that wallets don’t know how far back to look in the chain to ensure that they derive all the proper user addresses. Additionally, BIP39 uses a very weak KDF.
(https://pkg.go.dev/github.com/lightningnetwork/lnd/aezeed#section-readme)

Given the limitations of non-versioned BIP39 mnemonic phrases, it's equally important to consider the benefits that versioning could bring.
It's essential to design systems with forward compatibility in mind to ensure their long-term relevance and adaptability.



hero member
Activity: 560
Merit: 1060


I read the mail on the mailing list but there is one thing I haven't figured out. The idea of versioning, why do you think it is very important?

I saw the concerns here:

Quote
BIP39 seed phrases do not include a version number. This means that software should always know how to generate keys and addresses. BIP43 suggests that wallet software will try various existing derivation schemes within the BIP32 framework. This is extremely inefficient and rests on the assumption that future wallets will support all previously accepted derivation methods. If, in the future, a wallet developer decides not to implement a particular derivation method because it is deprecated, then the software will not be able to detect that the corresponding seed phrases are not supported, and it will return an empty wallet instead. This threatens users funds.

And I also saw gmaxwell commenting in 2017 that because of the lack of versioning he voted against using the BIP39.

Since I am not so experienced, could you number 2-3 possible shortcomings of not having a versioning system, apart from the one mentioned in the quoted text above?
newbie
Activity: 7
Merit: 29
Thanks for your feedback!

I am not sure whether the VF will be appended or prepended to the entropy. Because you said "prepending" but the quote below looks like the input to the hash function will be ENT + VF, not VF + ENT.
The VF is indeed prepended to the entropy. I agree that ENT + VF might be confusing; read it as (ENT + 32).
I was referring to the number of bits in the checksum, which is computed the same way as in BIP39.

The VF is split in 2 parts where:
Code:
part A = arbitrary bits (24 bits long)
part B = the version 00001101 (8 bits long).
So in part A, we can include any data we want? Because it looks like this adds 24 bits of "protection" against malicious activities. Isn't it? Supposing the 8 bits have a standard set of values, the total "protection" of the seed phrase will be:

128 bits of entropy + 24 arbitrary bits + 8 version bits (the latter will be somewhat standard)
Yes, the 24-bit general purpose field can include any data, which will be interpreted by the software depending on the version number.

The combined entropy of the seed phrase will look like:
24 general purpose bits + 8 version bits + entropy bits + checksum
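
As a rough illustration of that layout, the sketch below prepends the 32-bit VF to the entropy and computes the checksum the BIP39 way over the combined bits. The function name and parameters are hypothetical, and it returns 11-bit word indices rather than words so that no wordlist is needed.

```python
import hashlib


def versioned_word_indices(entropy: bytes, version: int, general_purpose: int = 0) -> list[int]:
    """Return the 11-bit word indices for VF || ENT || checksum.

    VF = 24 general purpose bits followed by 8 version bits, prepended
    to the entropy. Hypothetical helper; the field semantics are not
    defined here.
    """
    if not 0 <= version < 2 ** 8:
        raise ValueError("version must fit in 8 bits")
    if not 0 <= general_purpose < 2 ** 24:
        raise ValueError("general_purpose must fit in 24 bits")

    combined = general_purpose.to_bytes(3, "big") + version.to_bytes(1, "big") + entropy
    total_bits = len(combined) * 8
    # Stay within the lengths a non-versioned BIP39 wallet accepts.
    if total_bits not in range(128, 257, 32):
        raise ValueError("combined length must be 128..256 bits in steps of 32")

    checksum_bits = total_bits // 32  # "(ENT + 32) / 32" in the thread's notation
    digest = hashlib.sha256(combined).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - checksum_bits)

    value = (int.from_bytes(combined, "big") << checksum_bits) | checksum
    n_words = (total_bits + checksum_bits) // 11
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


# 128 bits of (all-zero, demo only) entropy with version 00001101.
indices = versioned_word_indices(bytes(16), version=0b00001101)
print(len(indices), indices[:3])
```

Because the result is just the BIP39 encoding of a 160-bit string, a non-versioned wallet would see an ordinary 15-word phrase.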


Using a passphrase on top of that, will it work symmetrically to the existing BIP39 pattern?
Yes, it'll work exactly the same as long as the KDF is the same.
Passphrases would work with other KDFs too as long as they support salting.
hero member
Activity: 560
Merit: 1060
First of all, thanks for your effort.

I have a few questions as well:

Question 1:
You suggest that VF is a 32-bit field that is concatenated with the initial entropy. I assume something like this:
Code:
ENT = 0010101...110 (128 bits long)
VF = 00001...101 (32 bits long).
I am not sure whether the VF will be appended or prepended to the entropy. Because you said "prepending" but the quote below looks like the input to the hash function will be ENT + VF, not VF + ENT.
Quote
A checksum is generated following the BIP39 method: taking the first (ENT + VF ) / 32 bits of the SHA256 hash of the combined entropy (initial entropy plus the 32-bit version field). This checksum is then appended to the combined entropy.

Question 2:
The VF is split in 2 parts where:
Code:
part A = arbitrary bits (24 bits long)
part B = the version 00001101 (8 bits long).

So in part A, we can include any data we want? Because it looks like this adds 24 bits of "protection" against malicious activities. Isn't it? Supposing the 8 bits have a standard set of values, the total "protection" of the seed phrase will be:

128 bits of entropy + 24 arbitrary bits + 8 version bits (the latter will be somewhat standard)

Question 3:
Using a passphrase on top of that, will it work symmetrically to the existing BIP39 pattern?
newbie
Activity: 7
Merit: 29
Thanks for your feedback!

Interesting proposal! A few questions:

How would you propose assigning your 8-bit version field? Something like this?

00000000 - P2PKH at m/44'/0'/0'
00000001 - P2SH-P2WPKH at m/49'/0'/0'
00000010 - P2WPKH at m/84'/0'/0'
00000011 - P2TR at m/86'/0'/0'

What if I want to use a script type/derivation path combo which isn't assigned a version number? What happens then? And what if I want to use the same seed phrase to generate both a P2PKH wallet and a P2WPKH wallet, for example?

I deliberately refrained from drafting specifications for the versions, as I believe that falls outside the scope of this proposal.
However, if I were to design a version dedicated to specifying derivation paths, I would consider utilizing the 24-bit general purpose field.

Personally, I envision two methods to achieve this:
  • Employ the entire 24-bit field to define a custom derivation path. This approach, however, would limit the mnemonic phrase to a single derivation path.
  • Alternatively, use 16 bits to designate standard derivation paths and allocate the remaining 8 bits for subversions (allowing for future expansion of these standard paths).

For example:
first bit     - m/44'/0'/0'
second bit - m/49'/0'/0'
third bit    - m/84'/0'/0'
fourth bit  - m/86'/0'/0'

where:
0001 0000 0000 0000 - m/86'/0'/0' only
1001 0000 0000 0000 - m/44'/0'/0' & m/86'/0'/0'
1111 0000 0000 0000 - all of the above derivation paths
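
The flag example above could be decoded along these lines. This is only a sketch of the second method: the table and function names are made up for illustration, bits are read from the most significant end as in the examples, and the other twelve bits are treated as unassigned.

```python
# Hypothetical assignment taken from the example: first (most significant)
# bit -> m/44', second -> m/49', third -> m/84', fourth -> m/86'.
STANDARD_PATHS = ["m/44'/0'/0'", "m/49'/0'/0'", "m/84'/0'/0'", "m/86'/0'/0'"]


def decode_paths(flags: int) -> list[str]:
    """Return the derivation paths whose flag bit is set in a 16-bit field."""
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("flags must fit in 16 bits")
    # 0x8000 is the first bit; shifting right walks through the later bits.
    return [path for i, path in enumerate(STANDARD_PATHS) if flags & (0x8000 >> i)]


print(decode_paths(0b0001_0000_0000_0000))
print(decode_paths(0b1001_0000_0000_0000))
print(decode_paths(0b1111_0000_0000_0000))
```

A bitmask lets one seed phrase point at several standard paths at once, which is exactly what the custom-path method gives up.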


I believe that, ultimately, a compromise is necessary between the flexibility of setting a custom derivation path and the capability to utilize multiple derivation paths simultaneously.


Taking 128 bits of entropy generating a 15 word seed phrase using your new system, I assume you are feeding the full 15 words in to PBKDF2? Or are you stripping out the 128 bits of entropy and converting to a "legacy" 12 word seed phrase before generating your wallet?

Yes, to maintain compatibility with non-versioned BIP39 wallets, the complete set of 15 words must be entered into PBKDF2.
Of course, this could be changed in future versions, albeit at the cost of breaking compatibility.
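
For reference, this is the standard BIP39 seed derivation that the full phrase would go through unchanged: PBKDF2-HMAC-SHA512 with 2048 iterations and the salt "mnemonic" plus the passphrase. The function name is mine, and the phrase in the usage lines is a placeholder string rather than a checksum-valid mnemonic.

```python
import hashlib
import unicodedata


def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte BIP39 seed from the whole mnemonic sentence.

    Every word, including those that carry the version field, is part
    of the PBKDF2 password, so old and new software derive the same seed.
    """
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = ("mnemonic" + unicodedata.normalize("NFKD", passphrase)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# Placeholder 15-word sentence for demonstration only (not checksum-valid).
phrase = " ".join(["abandon"] * 15)
print(bip39_seed(phrase).hex())
print(bip39_seed(phrase, "my passphrase").hex())
```

Note that the derivation never parses the words, which is why the version bits cannot change the resulting wallet unless a future version explicitly defines a different KDF.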

I disagree with your suggestion above to indicate whether a passphrase has been used in the general purpose field. One of the main uses of a passphrase is to add plausible deniability, which is eliminated if you indicate in your seed phrase that you have used a passphrase.

I agree that it'd be a bad idea.
legendary
Activity: 2268
Merit: 18775
Interesting proposal! A few questions:

How would you propose assigning your 8-bit version field? Something like this?

00000000 - P2PKH at m/44'/0'/0'
00000001 - P2SH-P2WPKH at m/49'/0'/0'
00000010 - P2WPKH at m/84'/0'/0'
00000011 - P2TR at m/86'/0'/0'

What if I want to use a script type/derivation path combo which isn't assigned a version number? What happens then? And what if I want to use the same seed phrase to generate both a P2PKH wallet and a P2WPKH wallet, for example?

Taking 128 bits of entropy generating a 15 word seed phrase using your new system, I assume you are feeding the full 15 words in to PBKDF2? Or are you stripping out the 128 bits of entropy and converting to a "legacy" 12 word seed phrase before generating your wallet?

I disagree with your suggestion above to indicate whether a passphrase has been used in the general purpose field. One of the main uses of a passphrase is to add plausible deniability, which is eliminated if you indicate in your seed phrase that you have used a passphrase.
newbie
Activity: 7
Merit: 29
A few thoughts:
1. A long time ago I read that Bitcoin Core doesn't implement BIP39 due to a security issue[1], which isn't solved by your proposal.
2. If I understood your proposal correctly, does that mean a 12-word mnemonic has less than 128 bits of entropy due to the added version field?

[1] https://bitcoin.stackexchange.com/a/88244


[1] If the security issue you are referring to involves the use of PBKDF2, this could be addressed by implementing versioning, although it would break compatibility with non-versioned BIP39 software.

[2] Correct. With 128 bits of entropy, the mnemonic becomes 15 words when the 24-bit general-purpose field is used.
For a 12-word mnemonic phrase under versioned BIP39, you can attain a maximum of 120 bits of entropy (128 bits minus the 8-bit version field), whereas non-versioned BIP39 allows for 128 bits of entropy with a 12-word mnemonic.
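
The numbers above follow from the BIP39 rule that every 3 words carry 32 payload bits plus 1 checksum bit. A small sketch of that bookkeeping, with a hypothetical function name and the field size passed in explicitly:

```python
def entropy_bits_left(words: int, field_bits: int) -> int:
    """Entropy bits remaining in a BIP39-length mnemonic after reserving
    field_bits for versioning data (8 for the version byte alone, 32 for
    version plus the 24-bit general purpose field)."""
    if words not in (12, 15, 18, 21, 24):
        raise ValueError("BIP39 mnemonics are 12, 15, 18, 21 or 24 words")
    payload_bits = words // 3 * 32  # words * 11 bits minus the checksum bits
    if field_bits > payload_bits:
        raise ValueError("field does not fit in the payload")
    return payload_bits - field_bits


print(entropy_bits_left(12, 0))   # non-versioned 12 words
print(entropy_bits_left(12, 8))   # 12 words with the 8-bit version field only
print(entropy_bits_left(15, 32))  # 15 words with the full 32-bit VF
```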
legendary
Activity: 2870
Merit: 7490
A few thoughts:
1. A long time ago I read that Bitcoin Core doesn't implement BIP39 due to a security issue[1], which isn't solved by your proposal.
2. If I understood your proposal correctly, does that mean a 12-word mnemonic has less than 128 bits of entropy due to the added version field?

[1] https://bitcoin.stackexchange.com/a/88244
newbie
Activity: 7
Merit: 29
I think it is best to address all shortcomings of BIP39 when proposing a new algorithm. Version only addresses one of them. I tried to address some more in my rough idea a couple of years ago: https://bitcointalksearch.org/topic/experimental-2-better-mnemonic-5330229
Hello, and thank you for your interest!

The primary criticisms of BIP39 that I have encountered include:
  • The absence of versioning.
  • The reliance on a fixed wordlist for checksum verification.

Implementing versioning and a general purpose field could offer several advantages, such as:
  • Specifying the derivation path.
  • Including the wallet's birthdate.
  • Modifying the Key Derivation Function (KDF).
  • Enhancing error detection and correction capabilities.
  • Indicating whether a passphrase is used.
  • Facilitating improved methods of seed XORing.
  • And more...
Regrettably, I don't foresee a method to maintain compatibility with legacy software (non-versioned BIP39) while simultaneously eliminating the requirement for a fixed wordlist.
The objective here is not to introduce a new algorithm, but rather to incorporate versioning while remaining consistent with the BIP39 standard.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
I think it is best to address all shortcomings of BIP39 when proposing a new algorithm. Version only addresses one of them. I tried to address some more in my rough idea a couple of years ago: https://bitcointalksearch.org/topic/experimental-2-better-mnemonic-5330229
newbie
Activity: 7
Merit: 29
I'm currently involved in a proposal to add a version field to BIP39 mnemonic phrases, an idea that isn't new but one I believe is worth exploring further.
I would really appreciate your feedback on this.
The details of the proposal are outlined in the mailing list, and I'm seeking insights and thoughts from the community on this concept.

For more information, please visit the mailing list at https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2024-January/022275.html

Looking forward to hearing your opinions!