Topic: Randomly picking 24 words from the BIP39 wordlist (Read 862 times)

legendary
Activity: 2268
Merit: 18775
How can you quote a statistic on something that is not considered a valid error pattern?
The statistics I quoted refer to any error involving the substitution of 8 characters. This is an entirely valid error pattern when the substitution of those characters is random. Substituting 8 consecutive characters with "aardvark" is not a valid error pattern because of the reasons you have quoted above, but it would still have the same 0.931 per billion chance of going undetected.

Emphasis mine:
This means that when 5 changed characters occur randomly distributed in the 39 characters of a P2WPKH address, there is a chance of 0.756 per billion that it will go undetected. When those 5 changes occur randomly within a 19-character window, that chance goes down to 0.093 per billion. As the number of errors goes up, the chance converges towards 1 in 2^30 = 0.931 per billion.
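The 0.931 per billion figure can be checked with a couple of lines of arithmetic. This is only a sketch: it assumes the limit is simply one over the number of possible checksum values, with the Bech32 checksum taken as 6 characters of 5 bits each. The names below are made up for illustration.

```python
# Bech32 checksum size: 6 characters, 5 bits per character (assumed here).
CHECKSUM_CHARS = 6
BITS_PER_CHAR = 5
checksum_bits = CHECKSUM_CHARS * BITS_PER_CHAR


def undetected_per_billion(bits):
    """Chance per billion that a fully random change still matches a checksum of this many bits."""
    return 1e9 / 2 ** bits


print(checksum_bits, "bit checksum")
print(f"{undetected_per_billion(checksum_bits):.3f} per billion")
```

The smaller figures quoted for 5 errors (0.756 and 0.093 per billion) come from the BIP173 analysis itself and are not reproduced by this simple limit.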
sr. member
Activity: 1190
Merit: 469
we are now replacing 8 consecutive characters with 8 random ones.  so it is not something that we know statistics on bech32 error detection about.
We do have statistics on that. They are summarized at the end of BIP173: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design

Before you can compute statistics on error detection you first have to define what you mean by an error pattern. Why? Because error detection is done on error patterns. If you don't define an error pattern then you can't have any idea of what an error is. Maybe you're right but i'm not sure.

We define an error pattern as a sequence of first one or more deletions, then swaps of adjacent characters, followed by substitutions, insertions, and duplications, in that order, all in specific positions, applied to a string with valid checksum that is otherwise randomly chosen. For insertions and substitutions we assume a uniformly random new character. For example, "delete the 17th character, swap the 11th character with the 12th character, and insert a random character in the 24th position" is an error pattern. "Replace the 43rd through 48th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.


Bech32 has a probability of 0 to incorrectly accept error patterns consisting of up to 4 substitutions—they are always detected.


Quote
For both address lengths, and considering 8 characters being substituted, then the chance of this going undetected by the checksum converges on 0.931 per billion.
How can you quote a statistic on something that is not considered a valid error pattern?


legendary
Activity: 2268
Merit: 18775
we are now replacing 8 consecutive characters with 8 random ones.  so it is not something that we know statistics on bech32 error detection about.
We do have statistics on that. They are summarized at the end of BIP173: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design

The lines we are interested in are length 39 and 59, which correspond to a 42 character P2WPKH address and a 62 character P2WSH address (3 additional characters for the non-data part "bc1"). For both address lengths, and considering 8 characters being substituted, then the chance of this going undetected by the checksum converges on 0.931 per billion.
sr. member
Activity: 1190
Merit: 469
Note that you are quoting from BIP350 which defines the Bech32m variant, which is used for version 1 segwit addresses (taproot, bc1p) and future versions of segwit addresses. These addresses can be between 14 and 74 characters long, so you can indeed have additions and deletions without invalidating the address.
yes, i suppose i am. it further goes on to say this too:

Replace the 43rd through 48th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.

they are replacing 6 consecutive characters with 8 random ones.  so it is not something that we know statistics on error detection about.

Quote
Version 0 segwit addresses (native segwit, bc1q) as were being discussed above have fixed lengths, so you can only have a deletion if you also have an addition of the same length, and vice versa.

fine. let's revise their statement just a bit to read as follows:

Replace the 43rd through 50th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.

we are now replacing 8 consecutive characters with 8 random ones.  so it is not something that we know statistics on bech32 error detection about.

string length didn't change.  Shocked
legendary
Activity: 2268
Merit: 18775
Note that you are quoting from BIP350 which defines the Bech32m variant, which is used for version 1 segwit addresses (taproot, bc1p) and future versions of segwit addresses. These addresses can be between 14 and 74 characters long, so you can indeed have additions and deletions without invalidating the address.

Version 0 segwit addresses (native segwit, bc1q) as were being discussed above have fixed lengths, so you can only have a deletion if you also have an addition of the same length, and vice versa.
sr. member
Activity: 1190
Merit: 469

What other errors are there? Bech32 addresses are of a fixed length (42 characters for P2WPKH, 62 characters for P2WSH or P2TR), so any error which adds or deletes a few characters will immediately result in an invalid address. The checksum will detect any error which affects up to 4 characters.

well i mean maybe this is splitting hairs but you're not defining what your universe of errors consists of. i think they have a definition of what an "error pattern" is:

We define an error pattern as a sequence of first one or more deletions, then swaps of adjacent characters, followed by substitutions, insertions, and duplications, in that order, all in specific positions, applied to a string with valid checksum that is otherwise randomly chosen.

so any error that does not fall into that particular definition is the answer to your question maybe.


Quote
The reason that Bech32 excludes the characters "1", "b", "i", and "o" is explained in BIP173:
The character set is chosen to minimize ambiguity according to this visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.
so it's basically what i said. for visual reasons.
legendary
Activity: 2268
Merit: 18775
it's only guaranteed to detect up to 4 "substitution errors".  a substitution error would be when you replace a g with a q maybe. i guess that's what it is. probably the most common error but still. the statement needs qualification.
What other errors are there? Bech32 addresses are of a fixed length (42 characters for P2WPKH, 62 characters for P2WSH or P2TR), so any error which adds or deletes a few characters will immediately result in an invalid address. The checksum will detect any error which affects up to 4 characters.

there can't be any other reason.
The reason that Bech32 excludes the characters "1", "b", "i", and "o" is explained in BIP173:
The character set is chosen to minimize ambiguity according to this visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.
sr. member
Activity: 1190
Merit: 469
It is guaranteed to detect up to 4 errors, and has less than a 1 in a billion chance of failing to detect more errors than that.
it's only guaranteed to detect up to 4 "substitution errors".  a substitution error would be when you replace a g with a q maybe. i guess that's what it is. probably the most common error but still. the statement needs qualification.



To be fair, reading, writing and speaking are 3 different actions.


well it seems the reason for excluding "b" in Bech32 is that it might be mistaken for the number "6". they look similar. a computer will never mistake one for the other. a person will never mistake one for the other if they are spoken since "six" and "bee" don't sound alike at all. which leaves the writing part. someone writes it down and then tries to read it. hence why they don't include the letter "b". there can't be any other reason.
legendary
Activity: 2870
Merit: 7490
wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked I'm surprised bech32 allows both of those letters since they look so similar.

I can see the confusion, although IMO Bech32 encoding is better than other base-32 encodings such as RFC 3548 Base32, which uses A-Z, 2-7 and =.

Well, Bitcoin addresses are not made to be written down or spelled by humans.

If that's the case then please explain:

Bech32 is an encoding scheme used to encode SegWit addresses and Lightning invoices. The Bech32 alphabet contains 32 characters, including lowercase letters a-z and the numbers 0-9, excluding the number 1 and the letters ‘b’, ‘i’, ‘o’ to avoid reader confusion.

To be fair, reading, writing and speaking are 3 different actions.
legendary
Activity: 2268
Merit: 18775
At least Bech32 detects them pretty accurately and not only one at a time even. As far as I remember Bech32 can detect where an error is and pinpoint it
It is guaranteed to detect up to 4 errors, and has less than a 1 in a billion chance of failing to detect more errors than that.

If that's the case then please explain:
Addresses aren't designed to be hand written, but they should still be double (or even triple) checked after you have copy and pasted them. And excluding one character from similar character pairs such as o and 0 helps to make the manual double checking process easier and more accurate.
sr. member
Activity: 1190
Merit: 469
Well, Bitcoin addresses are not made to be written down or spelled by humans.

If that's the case then please explain:

Bech32 is an encoding scheme used to encode SegWit addresses and Lightning invoices. The Bech32 alphabet contains 32 characters, including lowercase letters a-z and the numbers 0-9, excluding the number 1 and the letters ‘b’, ‘i’, ‘o’ to avoid reader confusion.

The Base58 symbol chart used in Bitcoin is specific to the Bitcoin project and is not intended to be the same as any other Base58 implementation used outside the context of Bitcoin (the characters excluded are: 0, O, I, and l)
hero member
Activity: 714
Merit: 1010
you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked I'm surprised bech32 allows both of those letters since they look so similar.
Well, Bitcoin addresses are not made to be written down or spelled by humans. You use copy/paste or risk errors. At least Bech32 detects them pretty accurately and not only one at a time even. As far as I remember Bech32 can detect where an error is and pinpoint it, see https://bitcoin.sipa.be/bech32/demo/demo.html

E.g. for bc1qyt4n4qvq86y33qfa7zts0va8kv6ls47kmuyw5e the two wrong characters are precisely indicated by above Bech32 decoder demo.
sr. member
Activity: 1190
Merit: 469

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e
I think that proves my point perfectly. Grin And the exact same character swap (g -> q) is easy to spot in the seed phrase in the word "ceilinq". Additionally, you can spot an error in a seed phrase like this without having the original to compare to. With an address, if you have no original to compare to then you can't spot anything at all.

wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked I'm surprised bech32 allows both of those letters since they look so similar.
legendary
Activity: 2268
Merit: 18775
then it also serves to discourage people from legitimate uses of the wordlist such as flipping a coin or rolling dice to create their seed phrase.
It doesn't, because you can use a piece of software to calculate the checksum for you. You must use a piece of software to turn your seed phrase in to private keys and addresses - doing this by hand is simply not feasible. So requiring that same piece of software to also calculate your checksum for you brings zero additional risk.

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e
I think that proves my point perfectly. Grin And the exact same character swap (g -> q) is easy to spot in the seed phrase in the word "ceilinq". Additionally, you can spot an error in a seed phrase like this without having the original to compare to. With an address, if you have no original to compare to then you can't spot anything at all.

If I were to have my cold wallet stolen, or if I were to steal one, what is the possibility of selecting seed words at random and gaining access.
If your plan is to randomly guess the seed phrase, then you don't need to steal the hardware wallet first. Just start generating seed phrases and checking for a balance, but note that the Earth will become uninhabitable by the dying sun expanding to a red giant long before you find a collision with any wallet, let alone a specific one.

With a library of 2048 seeds, presuming repeats are not allowed
Repeats are allowed. Provided your seed phrase was generated properly, then there is around a 1 in 31 chance of a repeated word in a 12 word seed phrase, and a 1 in 8 chance in a 24 word seed phrase.
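The 1 in 31 and 1 in 8 figures are a birthday-style calculation. A minimal sketch, assuming every word is an independent uniform pick from the 2048-word list (this ignores that the checksum bits inside the last word are determined by the rest, which barely changes the result). The function name is just for illustration.

```python
from fractions import Fraction

WORDLIST_SIZE = 2048


def p_repeat(n_words, size=WORDLIST_SIZE):
    """Probability that at least one word appears twice among n_words uniform picks."""
    p_all_distinct = Fraction(1)
    for i in range(n_words):
        # The (i+1)-th word must avoid the i distinct words already chosen.
        p_all_distinct *= Fraction(size - i, size)
    return 1 - p_all_distinct


for n in (12, 24):
    p = p_repeat(n)
    print(f"{n} words: about 1 in {float(1 / p):.1f}")
```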
legendary
Activity: 3472
Merit: 10611
~
Well, if I were writing code for a wallet that would be there.
That would both be pointless and a bad idea.
Nobody would use a wallet software to brute force a mnemonic so it is pointless to add such a feature.
Also it is a bad idea because it makes it harder on legitimate use cases like a normal user entering their seed phrase wrong and wanting to retry by fixing the words, typos, order, etc. Adding a delay would harm user experience.
member
Activity: 76
Merit: 35
A search stated that there are 2048 seeds in the library.
...
There are no such mechanisms in wallets not to mention that you could always write a script that searches the space on its own without needing the overhead of a wallet software. It goes without saying that it is a pointless code to write.

Well, if I were writing code for a wallet that would be there.
legendary
Activity: 3472
Merit: 10611
A search stated that there are 2048 seeds in the library.
You are making it too complex. A 12 word mnemonic represents 128 bits of entropy. The chance of finding the same entropy is 1 in 2^128. And a 24 word mnemonic represents 256 bits of entropy...
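To make the bit counts concrete, here is a small sketch of how BIP39 splits a phrase into entropy and checksum bits. It assumes the BIP39 rule that the checksum is one bit per 32 bits of entropy and that each word carries 11 bits. `bip39_layout` is a made-up helper name.

```python
def bip39_layout(entropy_bits):
    """Return (checksum_bits, word_count) for a BIP39 entropy size."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("BIP39 entropy must be 128..256 bits in steps of 32")
    checksum_bits = entropy_bits // 32
    # Every word encodes 11 bits of entropy-plus-checksum.
    word_count = (entropy_bits + checksum_bits) // 11
    return checksum_bits, word_count


for ent in (128, 256):
    cs, words = bip39_layout(ent)
    print(f"{words} words: {ent} bits entropy + {cs} bits checksum, 1 in 2^{ent} = 1 in {2 ** ent:.3e}")
```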

Quote
I suspect that after some number of tries each cold wallet will do something like: 
There are no such mechanisms in wallets not to mention that you could always write a script that searches the space on its own without needing the overhead of a wallet software. It goes without saying that it is a pointless code to write.
member
Activity: 76
Merit: 35
What are the chances of generating a valid seed phrase (or 24 mnemonic words) from the BIP39 wordlist of 2048 words?

I know the last word is a checksum generated from the first 23 words, but there's got to be some % chance you correctly guess a valid working seed phrase just from manually randomly picking out 24 words...

As I read the OP the question that popped up in my mind is:  If I were to have my cold wallet stolen, or if I were to steal one, what is the possibility of selecting seed words at random and gaining access.  I also presume that the seed is not global, but local to each wallet and to each account on the wallet.  A search stated that there are 2048 seeds in the library.

I entered some numbers in an Excel workbook for this.  With a library of 2048 seeds, presuming repeats are not allowed, the probability of getting the first one right is 1/2048.  For the second one, divide by 2047, then by 2046, etc.  By the time we get to the

11th word:  1 out of 2.58789 * 10^36
23rd word: 1 out of 1.27862 * 10^76

That is about the size of the private key.
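The spreadsheet figures can be reproduced in a few lines. This sketch computes the number of ordered word sequences without repeats (the assumption made in this post) next to the count with repeats allowed, which is what BIP39 actually permits according to the reply in this thread. The function names are illustrative only.

```python
import math

WORDLIST_SIZE = 2048


def guesses_without_repeats(n_words):
    """Ordered selections of n_words distinct words: 2048 * 2047 * ..."""
    return math.perm(WORDLIST_SIZE, n_words)


def guesses_with_repeats(n_words):
    """Ordered selections when any word may appear again: 2048 ** n."""
    return WORDLIST_SIZE ** n_words


for n in (11, 23):
    print(f"{n} words, no repeats:   {guesses_without_repeats(n):.5e}")
    print(f"{n} words, with repeats: {guesses_with_repeats(n):.5e}")
```

Since each word is 11 bits, the with-repeats count for 11 words is exactly 2^121 and for 23 words exactly 2^253.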

I suspect that after some number of tries each cold wallet will do something like: 
A) delete its private key(s).  Not the best, but at least the thief is not rewarded. 
B)  each time a bad sequence is provided slow down the response.  Start with, maybe 1 second of additional time, then double the time for each attempt.  It could write the number of attempts into a storage location and reset it upon getting the correct seed.

When I did a seed check with my wallet, there was a delay of a few seconds before it was ready for the next word.  That would introduce sufficient time to deter any thief.
Is this reasonable?  Or do I have a flaw in my understanding?
sr. member
Activity: 1190
Merit: 469
Checksums in mnemonic algorithms serve more purposes than just error detection.
hopefully so

Quote
And in Electrum they act as the version to announce the child address type that has to be derived and their derivation path.
that seems like a practical and useful use of it.

Quote from: ETFbitcoin
What exactly do you expect when the checksum only has 4-bit (for 12 words) and 8-bit (for 24 words) size?
i dont know  i guess i expected the probability of a false positive to be on the order of 1 in 2^32. is that so unreasonable?

A false positive error, or false positive, is a result that indicates a given condition exists (you entered the correct seed phrase) when it does not (you actually entered the wrong seed phrase).

Quote
And that's why some wallets force their users to verify and re-enter some/all of the generated words.
I'm not sure that completely solves the problem. But i guess it's better than nothing.  Shocked


Quote from: o_e_l_e_o
It also serves to discourage people from just opening up the wordlist and picking 12 or 24 words they like the look of, which as we all know is an incredibly insecure way of generating a seed phrase, but is one that we see people discussing as a possibility over and over again.
then it also serves to discourage people from legitimate uses of the wordlist such as flipping a coin or rolling dice to create their seed phrase.

Quote
With a seed phrase, the words themselves serve as a sort of checksum. For example, compare these two addresses which have 1 character swapped:
Code:
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e, bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

but it would be trivial to see if you had by just lining them up like i did above.  


Quote
And now look at this seed phrase, which has the exact same character swap:
Code:
decorate cactus vivid amazing endorse banana pipe train lazy viable ceilinq suffer

It is significantly easier to immediately spot the mistake in the seed phrase than it is in the address.
i don't see any mistake or character swap.  Huh not that i'm a wordlist junkie but those words all seem correctly spelled.

legendary
Activity: 1568
Merit: 6660
Bech32 is not really that complicated, in a way it is simpler to implement since it uses a multiple of 2 (32 as opposed to 58) so there is no need for an external library or a class like BigInteger for its computation.

You can convert to Hexadecimal and Base58 without using bignum division, just use base-58 logarithms:

Normally:

- Take hex number to convert and calculate the log58 and call it X
- Calculate log58(16)
- While X > 0:
-- Subtract X - log58(16) (equivalent to log58(hex/16))
-- Calculate 58^result (edit), store integer part in A, store fractional part in B
-- Multiply B by 58, this is your next base58 character, push it to the front of the Base58 string after lookup
-- Set X = log58 A

But the log58 can be converted into a log2 following a division by log2(58). Ie log58 X = log2(X)/log2(58).

And the nice thing about base 2 is that it can be optimized to use bit shifts. So an exponentiation is just a right-shift, and there's an extremely optimized log2 for Linux (and with some ASM mods, Windows and Mac) on this link: https://stackoverflow.com/questions/11376288/fast-computing-of-log2-for-64-bit-integers/11376759#11376759 (second answer).

You would need an array of uint64_t pointers to store all those bits without using a bignum, but there is hardly any performance penalty for doing that (as opposed to, eg. vector or lists).
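For comparison, here is a different and more conventional way to do Base58 without a bignum class: schoolbook base conversion over an array of small digits, carrying as each input byte is folded in. To be clear, this is not the logarithm method described above but a swapped-in technique, shown only as a sketch. Python integers happen to be arbitrary precision, but the code never holds a value larger than 255 * 58 plus a carry, so the same loop would work with fixed-width integers.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58-encode bytes using only small-integer arithmetic."""
    # Leading zero bytes are encoded as leading '1' characters.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    digits = []  # base-58 digits, least significant first
    for byte in data[zeros:]:
        carry = byte
        # Multiply the number so far by 256 and add the new byte.
        for i in range(len(digits)):
            carry += digits[i] * 256
            digits[i] = carry % 58
            carry //= 58
        while carry:
            digits.append(carry % 58)
            carry //= 58
    return "1" * zeros + "".join(ALPHABET[d] for d in reversed(digits))


print(b58encode(bytes.fromhex("00") + b"example payload"))
```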
legendary
Activity: 2268
Merit: 18775
It also serves to discourage people from just opening up the wordlist and picking 12 or 24 words they like the look of, which as we all know is an incredibly insecure way of generating a seed phrase, but is one that we see people discussing as a possibility over and over again.

With a seed phrase, the words themselves serve as a sort of checksum. For example, compare these two addresses which have 1 character swapped:
Code:
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e, bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

And now look at this seed phrase, which has the exact same character swap:
Code:
decorate cactus vivid amazing endorse banana pipe train lazy viable ceilinq suffer

It is significantly easier to immediately spot the mistake in the seed phrase than it is in the address.
legendary
Activity: 3472
Merit: 10611
wait so it is useless as an error detection tool? any mistake you make in one of the words there's a 1 in 16/1 in 256 chance it will not raise any flag and be happy to let you use that set of words. if that's the case then i'm not sure what purpose it serves.
Checksums in mnemonic algorithms serve more purposes than just error detection. For example in BIP39 it also works as padding since each word is 11 bits and 12 words would be 132 bits while it is easier to generate entropy that is a multiple of 8 bits. And in Electrum they act as the version to announce the child address type that has to be derived and their derivation path.
sr. member
Activity: 1190
Merit: 469

Based on checksum length, the chance is 1 in 2^4 for 12 words and 1 in 2^8 for 24 words. If you want 1 in 4 billion chance, the checksum length should be 36 bits, which equals 3.2727 words (so 4 words in practice).
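The 1 in 16 and 1 in 256 figures can be checked directly by counting how many of the 2048 possible last words pass the checksum for a given prefix. The sketch below works on word indices (0 to 2047) rather than the words themselves, so no wordlist file is needed. It assumes the BIP39 layout in which the checksum is the leading bits of SHA-256 over the entropy. `checksum_ok` and `valid_last_words` are illustrative names.

```python
import hashlib


def checksum_ok(indices):
    """Check the BIP39 checksum for a phrase given as a list of word indices."""
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33          # 4 for 12 words, 8 for 24 words
    ent_bits = total_bits - cs_bits
    value = 0
    for idx in indices:
        value = (value << 11) | idx     # concatenate the 11-bit words
    entropy = value >> cs_bits
    checksum = value & ((1 << cs_bits) - 1)
    digest = hashlib.sha256(entropy.to_bytes(ent_bits // 8, "big")).digest()
    return checksum == digest[0] >> (8 - cs_bits)


def valid_last_words(prefix):
    """All last-word indices that complete the prefix into a valid phrase."""
    return [w for w in range(2048) if checksum_ok(prefix + [w])]


print(len(valid_last_words([0] * 11)), "of 2048 last words are valid for a 12 word phrase")
print(len(valid_last_words([0] * 23)), "of 2048 last words are valid for a 24 word phrase")
```

So for any fixed set of preceding words, 1 in 16 (12 words) or 1 in 256 (24 words) choices of last word gives a valid phrase.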

wait so it is useless as an error detection tool? any mistake you make in one of the words there's a 1 in 16/1 in 256 chance it will not raise any flag and be happy to let you use that set of words. if that's the case then i'm not sure what purpose it serves.
sr. member
Activity: 1190
Merit: 469
so you've written a bech32 checksum tool? you must be a genius then. Shocked
It's basically a simple for loop bro, it is not rocket science!
there's a vast difference between a 10 or 20 line for loop that you might come up with off the seat of your pants to sort a list and this thing Angry the length of the code is not a barometer of how much knowledge is required to fully understand the algorithm.

Quote from: o_e_l_e_o
Yeah, I would echo what pooya87 has said above. If your wallet software is not clear which derivation path it is using, then you shouldn't be using that software, exactly because you will likely run in to problems trying to recover access to your coins on a different piece of software in the future. Stick to reputable open source wallet software and you will not run in to such problems.
in an ideal world yes, i don't disagree to steer clear of wallet software that is not open source and fully documented. thats best practice.

Quote
Not as far as I know. Any time someone has sent money to the wrong address it has either been they copied the entirely wrong address or they were subjected to clipboard malware, and did not bother to double check before hitting send.
what about someone accidentally typing in the wrong 24 or 12 word seed phrase but it is still valid? what's the chances of that? hopefully that's in the same 1 in 4 billion probability area too. if not then...not good. Angry
legendary
Activity: 2268
Merit: 18775
the majority of users have to go with whatever their wallet software decided for them. and unfortunately, it's not always obvious what exact derivation path is being used.
Yeah, I would echo what pooya87 has said above. If your wallet software is not clear which derivation path it is using, then you shouldn't be using that software, exactly because you will likely run in to problems trying to recover access to your coins on a different piece of software in the future. Stick to reputable open source wallet software and you will not run in to such problems.

they need to write the derivation path down along with their seed phrase, might need another titanium plate to record that.
This should really only be necessary if you are using a really weird derivation path, which as I said above, the vast majority of users should never do. There are tools out there which will scan the most common alternative derivation paths automatically for you in order to try to recover your coins. Electrum itself offers this functionality for BIP39 seed phrases.

so there must have not been anyone ever come here on the forum who said they put in their address and sent money to it but then found out they made a typo and the money is sitting there on the blockchain...the probability of that happening is too small to have ever happened.  Shocked good to know.
Not as far as I know. Any time someone has sent money to the wrong address it has either been they copied the entirely wrong address or they were subjected to clipboard malware, and did not bother to double check before hitting send.
legendary
Activity: 3472
Merit: 10611
so you've written a bech32 checksum tool? you must be a genius then. Shocked
It's basically a simple for loop bro, it is not rocket science!
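For anyone curious what that loop looks like, below is a sketch that follows the checksum verification in the BIP173 reference code (the polymod function, generator constants and character set are taken from there). The wrapper `bech32_checksum_ok` is a made-up name, and it only checks the checksum: it skips the mixed-case, length and witness-program rules a real decoder also enforces.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def bech32_polymod(values):
    """The BCH checksum core: one pass over the 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def bech32_hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_checksum_ok(addr):
    """True if the string has a valid Bech32 (BIP173) checksum."""
    addr = addr.lower()
    pos = addr.rfind("1")
    if pos < 1 or pos + 7 > len(addr):
        return False
    hrp, data_part = addr[:pos], addr[pos + 1:]
    if any(c not in CHARSET for c in data_part):
        return False
    data = [CHARSET.index(c) for c in data_part]
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


for a in ("bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e",
          "bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e"):
    print(a, bech32_checksum_ok(a))
```

The two addresses from this thread differ in a single character, so at most one of them can pass.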

and unfortunately, it's not always obvious what exact derivation path is being used.
Wrong. Any decent open source wallet has that information either already documented or it is easily extractable from the code. For example we already know what derivation paths bitcoin core and electrum use which are 2 good open source wallets.
If it is not "obvious" what derivation path a wallet is using that is because it is closed source and something that should not be used in first place.
sr. member
Activity: 1190
Merit: 469

A better option is for the majority of users to just stick to the BIP44/49/84 standards and not mess around with custom derivation paths unless they really understand what they are doing.

the majority of users have to go with whatever their wallet software decided for them. and unfortunately, it's not always obvious what exact derivation path is being used. but that's probably the first question they should ask. and then if it's not as you stated, they need to write the derivation path down along with their seed phrase, might need another titanium plate to record that. Shocked but it's surely better than not being able to retrieve your money.

Quote
For legacy addresses, the chance of an incorrect address with the correct checksum is 1 in 4,294,967,296. For segwit addresses, the checksum is guaranteed to detect any error affecting up to 4 characters, and has less than 1 in a billion chance of failing to detect more than that. So not quite 1 in trillions, but still incredibly safe.
so there must have not been anyone ever come here on the forum who said they put in their address and sent money to it but then found out they made a typo and the money is sitting there on the blockchain...the probability of that happening is too small to have ever happened.  Shocked good to know.
legendary
Activity: 2268
Merit: 18775
But it would be nice if important details like the derivation path was somehow possible to be encoded into the seed phrase.
As you point out, Electrum seed phrases do this. Basically, when Electrum generates a seed phrase, it then hashes it and checks if the hash starts with the correct version number. If not, it increments the entropy by 1 and tries again, until it reaches a seed phrase whose hash does start with the correct version number. That version number tells Electrum which script type and derivation path to use, which is why Electrum seed phrases are either legacy or segwit and will only ever recover one wallet, as opposed to BIP39 seed phrases which can use any script type at any derivation path and restore a near infinite number of wallets.
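The generate, hash, check, increment loop described above can be sketched roughly like this. Several things here are assumptions rather than Electrum's actual code: the HMAC key string "Seed version" and the hex prefixes "01" (standard) and "100" (segwit) are from memory of Electrum's source and should be verified against it, and the "phrase" is just the entropy written in hex as a stand-in for real wordlist encoding.

```python
import hashlib
import hmac


def version_hash(phrase):
    """HMAC-SHA512 of the phrase, hex encoded (key string is an assumption)."""
    return hmac.new(b"Seed version", phrase.encode("utf-8"), hashlib.sha512).hexdigest()


def find_versioned_entropy(start, prefix):
    """Increment the entropy until the phrase's hash starts with the version prefix."""
    entropy = start
    while True:
        phrase = format(entropy, "x")  # stand-in for encoding entropy as words
        if version_hash(phrase).startswith(prefix):
            return entropy, phrase
        entropy += 1


entropy, phrase = find_versioned_entropy(2 ** 100, "01")
print("tries needed:", entropy - 2 ** 100 + 1)
```

With a two hex digit prefix, about 1 in 256 candidates match, so the loop usually ends after a few hundred tries. A three digit prefix takes around 4096 on average.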

they need to store the derivation path along with it
A better option is for the majority of users to just stick to the BIP44/49/84 standards and not mess around with custom derivation paths unless they really understand what they are doing.

no i would not want it doing that. but what if i entered something that wasn't my address and it actually passed the checksum? hopefully the probability of that is on the order of 1 in trillions or even more.
For legacy addresses, the chance of an incorrect address with the correct checksum is 1 in 4,294,967,296. For segwit addresses, the checksum is guaranteed to detect any error affecting up to 4 characters, and has less than 1 in a billion chance of failing to detect more than that. So not quite 1 in trillions, but still incredibly safe.
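The 1 in 4,294,967,296 figure for legacy addresses is just 1 in 2^32, because the Base58Check checksum is 4 bytes long: the first 4 bytes of a double SHA-256 over the payload. A minimal sketch on raw bytes (Base58 text encoding left out, helper names invented for illustration):

```python
import hashlib


def checksum4(payload):
    """First 4 bytes of SHA-256(SHA-256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def add_checksum(payload):
    """Append the 4-byte checksum, as done before Base58 encoding."""
    return payload + checksum4(payload)


def verify(data):
    """Check that the trailing 4 bytes match the checksum of the rest."""
    return len(data) > 4 and checksum4(data[:-4]) == data[-4:]


raw = add_checksum(b"\x00" + bytes(20))      # version byte + dummy 20-byte hash
damaged = bytes([raw[0] ^ 1]) + raw[1:]      # flip one bit
print(verify(raw), verify(damaged))
print("possible checksum values:", 2 ** 32)
```

Unlike Bech32, this checksum gives no guarantee for small numbers of errors. Every corrupted payload has roughly the same 1 in 2^32 chance of slipping through.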
sr. member
Activity: 1190
Merit: 469
Exactly. Just like most people would skip over writing down their seed phrase twice like you suggest. They can't skip a hard coded checksum, however.
yeah i guess that's true.
Quote
And if checksums didn't exist, and someone comes saying their seed phrase isn't working, you have no idea if it is a problem with the seed phrase itself or if it is a problem with something they are doing with the seed phrase (passphrase, derivation path, etc.) By having a checksum, you can immediately narrow down the problem.
Yeah I guess i didn't think about it that way. But it would be nice if important details like the derivation path was somehow possible to be encoded into the seed phrase. Because from what I've seen, in a lot of cases that's where the problem is. they don't know the derivation path. and if you don't know that, you don't know anything. so not only does someone need to store their seed, they need to store the derivation path along with it (and make sure they wrote that down correctly too). with that said though, i think electrum stores the derivation path so the user doesn't have to worry about it. so that's cool.

Quote
Extremely useful. You don't want your wallet software accepting an incorrect address and allowing you to sign transactions to that incorrect address.
no i would not want it doing that. but what if i entered something that wasn't my address and it actually passed the checksum? hopefully the probability of that is on the order of 1 in trillions or even more.
 
Quote from: pooya87
Bech32 is not really that complicated, in a way it is simpler to implement since it uses a multiple of 2 (32 as opposed to 58) so there is no need for an external library or a class like BigInteger for its computation.
so you've written a bech32 checksum tool? you must be a genius then. Shocked

Quote from: odolvlobo
If you strongly believe that you have a better method, then the best thing to do would be to publish a BIP so that everyone can see the merits of your proposal. Those who like your proposal will adopt it, and those who don't will not.
apparently a huge amount of computer cpu time was spent on optimizing bech32. to make it as efficient as possible. it's on a different level than sha256 but even so, i have no idea how it works.  Grin

Quote
That's one of the beauties of open source development -- you can implement anything you want without the permission or approval of anyone else.
of course.




legendary
Activity: 4522
Merit: 3426
@larry_vw_1955

If you strongly believe that you have a better method, then the best thing to do would be to publish a BIP so that everyone can see the merits of your proposal. Those who like your proposal will adopt it, and those who don't will not.

That's one of the beauties of open source development -- you can implement anything you want without the permission or approval of anyone else.
legendary
Activity: 3472
Merit: 10611
you mean like the bech32 situation? it seems way more complicated than the sha256. why it needs to be, i don't know.
Bech32 is not really that complicated. In a way it is simpler to implement since it uses a power of 2 (32 as opposed to 58), so there is no need for an external library or a class like BigInteger for its computation.
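
To illustrate the point about not needing big-number arithmetic, here is a minimal sketch of Bech32 checksum verification in Python, following the reference algorithm published in BIP173. The function name `bech32_checksum_ok` is my own, and the sketch only checks the checksum: it skips the mixed-case and length rules that a real decoder must enforce, and it does not decode the witness program.

```python
# Bech32 checksum verification, based on the BIP173 reference algorithm.
# Everything fits in 30-bit integer arithmetic: shifts, masks and XORs.

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]


def bech32_polymod(values):
    """Compute the BCH checksum state over a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def bech32_hrp_expand(hrp):
    """Expand the human-readable part so it is covered by the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_checksum_ok(addr):
    """Return True if the string has a valid Bech32 checksum.

    Hypothetical helper for illustration: no mixed-case or length checks.
    """
    addr = addr.lower()
    pos = addr.rfind("1")  # the last '1' separates HRP from data
    if pos < 1 or pos + 7 > len(addr):
        return False
    hrp, data_part = addr[:pos], addr[pos + 1:]
    if any(c not in CHARSET for c in data_part):
        return False
    data = [CHARSET.index(c) for c in data_part]
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


# P2WPKH example address taken from the BIP173 test vectors.
good = "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
# Substitute a single character; a one-character error is always detected.
bad = good[:10] + "p" + good[11:]

print(bech32_checksum_ok(good))
print(bech32_checksum_ok(bad))
```

Note that the code only reports whether the checksum matches. It makes no attempt to correct the string, which is in line with the BIP173 recommendation quoted elsewhere in this thread.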

Quote
you must have heard of the bech32 length extension mutation weakness  Grin the tldr on that is that it was such a complicated thing that they couldn't foresee these type of issues in advance. not a good thing at all.
It is not really a serious issue, considering all bitcoin addresses have fixed lengths (P2WPKH, P2WSH and even Taproot addresses). It also didn't happen because of "being complicated"; it was simply an unforeseen case.
legendary
Activity: 2268
Merit: 18775
plus lets be honest most people probably skip over that step if it's not required.  Shocked
Exactly. Just like most people would skip over writing down their seed phrase twice like you suggest. They can't skip a hard coded checksum, however.

if it was that simple people wouldn't come on to this forum saying their seed phrase isnt "working".
And if checksums didn't exist, and someone comes saying their seed phrase isn't working, you have no idea if it is a problem with the seed phrase itself or if it is a problem with something they are doing with the seed phrase (passphrase, derivation path, etc.) By having a checksum, you can immediately narrow down the problem.

being able to detect up to 4 characters that are in error sounds good but if it can't fix it too then i'm not sure how useful it is.
Extremely useful. You don't want your wallet software accepting an incorrect address and allowing you to sign transactions to that incorrect address. And by showing you have an error, you know you've made a mistake in your process somewhere or have some malware and can re-examine your process before losing your coins to an incorrect address.
sr. member
Activity: 1190
Merit: 469
@larry_vw_1955
What you are forgetting is that adding any more error detection/correction algorithm in the seed phrase algorithms makes them that much more complicated
you mean like the bech32 situation? it seems way more complicated than the sha256. why it needs to be, i don't know.

Quote
and it could also increase the size of the final phrase. Complication is also not a good thing since it makes implementation harder and more error prone.
you must have heard of the bech32 length extension mutation weakness  Grin the tldr on that is that it was such a complicated thing that they couldn't foresee these type of issues in advance. not a good thing at all.
legendary
Activity: 3472
Merit: 10611
@larry_vw_1955
What you are forgetting is that adding any more error detection/correction algorithm in the seed phrase algorithms makes them that much more complicated and it could also increase the size of the final phrase. Complication is also not a good thing since it makes implementation harder and more error prone.
sr. member
Activity: 1190
Merit: 469

There's a very easy way for that. Double check (or triple check) what you wrote down.
if it was that simple people wouldn't come on to this forum saying their seed phrase isnt "working". maybe they should use a computer printer to print it out that way they are guaranteed to not make any transcription mistakes. the probability of making a transcription error is much greater than any security risk that might occur by using a computer printer at home.
Quote from: o_e_l_e_o
Arguably, you only want error detection and not error correction. ... In short, you don't want an error to accidentally be corrected to the wrong address, resulting in loss of funds.

well here is what they said:


This implements a BCH code that guarantees detection of any error affecting at most 4 characters and has less than a 1 in 10^9 chance of failing to detect more errors.


that is the part that sounds good. the part that sounds bad is the following:


Error correction

One of the properties of these BCH codes is that they can be used for error correction. An unfortunate side effect of error correction is that it erodes error detection: correction changes invalid inputs into valid inputs, but if more than a few errors were made then the valid input may not be the correct input. Use of an incorrect but valid input can cause funds to be lost irrecoverably. Because of this, implementations SHOULD NOT implement correction beyond potentially suggesting to the user where in the string an error might be found, without suggesting the correction to make.


being able to detect up to 4 characters that are in error sounds good but if it can't fix it too then i'm not sure how useful it is. might as well just use a simpler checksum mechanism. one that can't fix anything just detect.

Quote from: PrimeNumber7
You seem to have made a solid argument against using paper wallets, particularly paper wallets in which the secret (seed) is written by hand.
well if you read carefully, I said that you need to write it out twice. and then compare them visually to make sure they are the same thing. so it's not an argument against paper wallets that one might generate by rolling dice or flipping coins but you have to be REALLY careful. Grin
legendary
Activity: 2268
Merit: 18775
First of all. It is not just a missing last word that has 128 possibilities. Every word has 128 possibilities if it is missing, assuming that no others are also wrong or missing.
That's not quite right. Only the last word has exactly 128 possibilities, since for every final seven bits of entropy the last word provides, there will be exactly one word out of the 16 possibilities which has the correct checksum. When swapping out any other word, since the checksum is already fixed, there will be 128 possibilities on average (as opposed to exactly 128 words), since you cannot predict exactly how many possibilities will hash to the already fixed checksum.
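
A rough way to see the "exactly 128" versus "128 on average" distinction is to count it. The sketch below works on word indices (0 to 2047) instead of the words themselves so that it does not need the wordlist file; the helper names and the sample entropy are my own choices for illustration, not taken from any wallet.

```python
import hashlib


def entropy_to_indices(entropy):
    """Turn 16/20/24/28/32 bytes of entropy into BIP39 word indices."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (n - 1 - i))) & 0x7FF for i in range(n)]


def is_valid(indices):
    """Check the BIP39 checksum of a list of word indices."""
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33
    ent_bits = total_bits - cs_bits
    value = 0
    for i in indices:
        value = (value << 11) | i
    entropy = (value >> cs_bits).to_bytes(ent_bits // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (value & ((1 << cs_bits) - 1)) == expected


def count_valid_substitutions(indices):
    """For each position, count how many of the 2048 words keep the phrase valid."""
    counts = []
    for pos in range(len(indices)):
        trial = list(indices)
        hits = 0
        for word in range(2048):
            trial[pos] = word
            if is_valid(trial):
                hits += 1
        counts.append(hits)
    return counts


# Arbitrary fixed 128-bit entropy, used only so the run is repeatable.
phrase = entropy_to_indices(bytes(range(16)))
counts = count_valid_substitutions(phrase)
print("last position:", counts[-1])
print("other positions:", counts[:-1])
```

The last position should always report 128. The other eleven positions should scatter around 128, because there the candidate word changes the entropy while the 4 checksum bits stay fixed, so each candidate has roughly a 1 in 16 chance of matching.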

why? why is sha-256 an appropriate choice for a checksum? it was not designed for that purpose. all it has the ability to do is detect errors but not correct them right? so how is that appropriate? not being able to correct a certain minimal number of errors. it can do zero in that regard.
Arguably, you only want error detection and not error correction. The checksum used in Bech32 addresses can provide error correction, but no piece of wallet software implements it. The reason behind this is explained in BIP173. In short, you don't want an error to accidentally be corrected to the wrong address, resulting in loss of funds.
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
The checksum is not meant for you and your eyes. It is meant for the application so that whenever a user enters the checksum incorrectly into that tool's textbox it can automatically detect if there were any mistakes in the input so that the application can inform the user that something is wrong.
but we still need a way to make sure we wrote it down correctly. and you just admitted as much. since just by looking at 12 words, you can't tell whether there is any discrepancy in them or not.

also, they'll definitely know something is wrong whether the software detects it or not. when their balance doesn't show up.
You seem to have made a solid argument against using paper wallets, particularly paper wallets in which the secret (seed) is written by hand.


to further counter your argument, it would be possible to manually check if the checksum is valid “by hand”.
legendary
Activity: 2380
Merit: 5213
but we still need a way to make sure we wrote it down correctly.
There's a very easy way for that. Double check (or triple check) what you wrote down.
You can also recover your wallet from the seed before sending any funds to it, to make sure the seed phrase is correct and generates the same addresses.
sr. member
Activity: 1190
Merit: 469
The checksum is not meant for you and your eyes. It is meant for the application so that whenever a user enters the checksum incorrectly into that tool's textbox it can automatically detect if there were any mistakes in the input so that the application can inform the user that something is wrong.
but we still need a way to make sure we wrote it down correctly. and you just admitted as much. since just by looking at 12 words, you can't tell whether there is any discrepancy in them or not.

also, they'll definitely know something is wrong whether the software detects it or not. when their balance doesn't show up.
legendary
Activity: 3472
Merit: 10611
I then compare them. I see they are the same. no mistakes were made. we know that with 100% certainty. No need for any checksum. The eyes are good enough.
The checksum is not meant for you and your eyes. It is meant for the application so that whenever a user enters the checksum incorrectly into that tool's textbox it can automatically detect if there were any mistakes in the input so that the application can inform the user that something is wrong.
sr. member
Activity: 1190
Merit: 469

The purpose of a SHA-256 hash is to detect corruption of the data. It is a checksum. If you don't agree, then what is its purpose?

you could even require a valid seed phrase to be 46 words long by just duplicating the original 23 word phrase. then the last 23 words would be your checksum for the first 23 words. and they could detect AND fix errors.  Cheesy

You are contradicting yourself. Your 46-word phrase would detect an error, but it could not fix it because if the duplicates don't match, you don't know which one is wrong. So, it is no better than the BIP-39 checksum.

For example, lets say the seed phrase is hotel obvious agent lecture gadget evil jealous keen fragile before damp clarify


Now what I do is write it twice:

hotel obvious agent lecture gadget evil jealous keen fragile before damp clarify
hotel obvious agent lecture gadget evil jealous keen fragile before damp clarify

I then compare them. I see they are the same. no mistakes were made. we know that with 100% certainty. No need for any checksum. The eyes are good enough. Now lets say I was not paying attention and wrote it down like this:

hotel obvious agent lecture gadget evil jealous keen fragile before damp clarify
hotel obvious agent lecture gadget jealous before fragile damp clarify

I made some serious errors but its very easy to compare and fix. Try that with sha256 and see how long it takes you, if you can even fix it at all.
legendary
Activity: 4522
Merit: 3426
Quote
Bech32 does have better error detection, but that doesn't make BIP-39's error detection bad and SHA-256 is an appropriate choice for a checksum.
why? why is sha-256 an appropriate choice for a checksum? it was not designed for that purpose. all it has the ability to do is detect errors but not correct them right? so how is that appropriate? not being able to correct a certain minimal number of errors. it can do zero in that regard.

The purpose of a SHA-256 hash is to detect corruption of the data. It is a checksum. If you don't agree, then what is its purpose?

you could even require a valid seed phrase to be 46 words long by just duplicating the original 23 word phrase. then the last 23 words would be your checksum for the first 23 words. and they could detect AND fix errors.  Cheesy

You are contradicting yourself. Your 46-word phrase would detect an error, but it could not fix it because if the duplicates don't match, you don't know which one is wrong. So, it is no better than the BIP-39 checksum.
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
I was just curious on behalf of those that don't trust wallet software and wanted to be hardcore about it.
In general, I would say that you probably are not going to be better off "manually" selecting your seed. Many who attempt to generate a seed outside of a computer program will adopt a procedure that is not truly random and will generate a seed that is vulnerable to theft.

Further, in order to spend your coin, you will need to use some software that is used to generate and sign a transaction.
legendary
Activity: 2380
Merit: 5213
Because this checksum thing can't fix things if there were too many mistakes made like writing words in the wrong order and leaving a few of them out.
Your expectations from checksum are too high.
Checksum only helps detect errors. Checksum isn't supposed to eliminate the errors or correct them. You should always double check or triple check your seed phrase after writing it down to make sure there is no error.
sr. member
Activity: 1190
Merit: 469

First of all. It is not just a missing last word that has 128 possibilities. Every word has 128 possibilities if it is missing, assuming that no others are also wrong or missing.
ok.



Quote
Bech32 does have better error detection, but that doesn't make BIP-39's error detection bad and SHA-256 is an appropriate choice for a checksum.
why? why is sha-256 an appropriate choice for a checksum? it was not designed for that purpose. all it has the ability to do is detect errors but not correct them right? so how is that appropriate? not being able to correct a certain minimal number of errors. it can do zero in that regard.

Quote
Embedding a checksum may not be the best solution, but it is better than nothing. Also, your assumption that the software can correct a seed phrase is wrong. The software does not have enough information and would have to resort to brute force search. And that would only be practical if you know which words are wrong or missing.
which brings us back to the original question of why not just write down your seed phrase two times and ditch the checksum altogether. Because this checksum thing can't fix things if there were too many mistakes made like writing words in the wrong order and leaving a few of them out. You would be SOL. has anyone here ever been in the situation where they needed this checksum or else all their funds were lost? that seems like such a remote possibility as to not even be worried about happening.

you could even require a valid seed phrase to be 46 words long by just duplicating the original 23 word phrase. then the last 23 words would be your checksum for the first 23 words. and they could detect AND fix errors.  Cheesy
legendary
Activity: 4522
Merit: 3426
if there's only 128 possibilities for the last word then what's the point of having one since it is easily guessed. easily brute forced.

First of all. It is not just a missing last word that has 128 possibilities. Every word has 128 possibilities if it is missing, assuming that no others are also wrong or missing.

I think the checksum idea is a badly implemented one. Sha256 is good for checksums why? I think bech32 has a more robust checksum thing going on but I found it impossible to find a good explanation of it that made much sense.

Bech32 does have better error detection, but that doesn't make BIP-39's error detection bad and SHA-256 is an appropriate choice for a checksum.


Also the whole concept of a checksum embedded into your seed phrase is questionable since someone could write down a wrong seed phrase and the software could just correct it for them and they would never even know they were entering something wrong. i guess?

Embedding a checksum may not be the best solution, but it is better than nothing. Also, your assumption that the software can correct a seed phrase is wrong. The software does not have enough information and would have to resort to brute force search. And that would only be practical if you know which words are wrong or missing.
sr. member
Activity: 1190
Merit: 469
Are you saying it's bad that there would be 128 possibilities for the last word? What's the problem with that?
if there's only 128 possibilities for the last word then what's the point of having one since it is easily guessed. easily brute forced.


can't write down 11 words?
Quote
I don't see any reason for not writing the 12th word. But if you have written down 11 words and don't have the 12th word for any reason, it can be easily brute-forced and there wouldn't be a big problem.
why not just write down your seed words twice in a row on the same piece of paper. double the security. no checksum needed.

Quote from: o_e_l_e_o
And yes, it is important. If you don't have a checksum and import an incorrect seed phrase, then you have no idea you have imported an incorrect seed phrase. You could spend weeks or months trying to brute force a passphrase which doesn't exist, or searching weird and wonderful derivation paths, or who knows what else, trying to hunt down your wallet. With a checksum, you know immediately one of your words is wrong and can immediately narrow down your search significantly. Not to mention that brute forcing an incorrect seed phrase is also quicker with a checksum since you do not have to derive addresses and check them for balance for all the invalid phrases.


I think the checksum idea is a badly implemented one. Sha256 is good for checksums why? I think bech32 has a more robust checksum thing going on but I found it impossible to find a good explanation of it that made much sense.

Also the whole concept of a checksum embedded into your seed phrase is questionable since someone could write down a wrong seed phrase and the software could just correct it for them and they would never even know they were entering something wrong. i guess?

also well i could go on but you get the idea.

legendary
Activity: 2268
Merit: 18775
is that good or bad?
I wouldn't say it is either. It's just how the checksum works.

that really doesn't seem ideal. it makes me wonder about this whole checksum thing and if it's really all that important or just a gimmick.
That's because we are considering it backwards here. There are only 128 possible words if you are picking them manually. Since the last word of a 12 word seed phrase also contains 7 bits of entropy, then when generated properly there is exactly one word which provides the correct checksum for the provided entropy.

And yes, it is important. If you don't have a checksum and import an incorrect seed phrase, then you have no idea you have imported an incorrect seed phrase. You could spend weeks or months trying to brute force a passphrase which doesn't exist, or searching weird and wonderful derivation paths, or who knows what else, trying to hunt down your wallet. With a checksum, you know immediately one of your words is wrong and can immediately narrow down your search significantly. Not to mention that brute forcing an incorrect seed phrase is also quicker with a checksum since you do not have to derive addresses and check them for balance for all the invalid phrases.
legendary
Activity: 2730
Merit: 7065
If a wallet or the tool you use for generating the seed phrase is open-source and the code has been reviewed, there's nothing to worry about.
That depends on the quality of the people reviewing the software and all other community members and their abilities to spot vulnerabilities in a piece of code. And also how long it will take them to do it. 1 day, 1 month, 1 year, 10 years.... A vulnerability that gets discovered and patched in a day is totally different from something that's out there publicly for a year, for example.

Here is a good article that mentions a few interesting points:
https://thehackernews.com/2022/11/last-years-open-source-tomorrows.html

Quote
Finding open source vulnerabilities is typically done by the maintainers of the open source project, users, auditors, or external security researchers. But despite these great code-archaeologists helping secure our world, the community still struggles to find security flaws.

On average, it takes over 800 days to discover a security flaw in open source projects. For instance, the infamous Log4shell (CVE-2021-44228) vulnerability was undiscovered for a whopping 2649 days.

The analysis shows that 74% of security flaws are actually undiscovered for at least one year! Java and Ruby seem to have the most challenges here, as it takes the community more than 1000 days to find and disclose vulnerabilities.
legendary
Activity: 2380
Merit: 5213
that really doesn't seem ideal. it makes me wonder about this whole checksum thing and if it's really all that important or just a gimmick.
Are you saying it's bad that there would be 128 possibilities for the last word? What's the problem with that?


can't write down 11 words?
I don't see any reason for not writing the 12th word. But if you have written down 11 words and don't have the 12th word for any reason, it can be easily brute-forced and there wouldn't be a big problem.
sr. member
Activity: 1190
Merit: 469
there will be 8 valid final words for any given 23 words.
is that good or bad?

Quote
For a 12 word phrase which has 4 bits of checksum, there will be 2^7 = 128 possible valid final words.
that really doesn't seem ideal. it makes me wonder about this whole checksum thing and if it's really all that important or just a gimmick.

i guess the argument against checksums is if you store your seedphrase correctly there should be no need for error correction and I do tend to agree. can't write down 11 words? then you got bigger problems, such as not caring enough about your money.
legendary
Activity: 4522
Merit: 3426
Thanks to all for the response.
I was just curious on behalf of those that don't trust wallet software and wanted to be hardcore about it.

It is not easy to generate a bip-39 phrase without software because a SHA-256 hash is required. However, many wallets will allow you to use an invalid phrase, so simply picking 12 (or 24) random words is a viable method, but it is not as safe.

This page describes in simple terms how it is done: https://medium.com/coinmonks/mnemonic-generation-bip39-simply-explained-e9ac18db9477
legendary
Activity: 2380
Merit: 5213
I was just curious on behalf of those that don't trust wallet software and wanted to be hardcore about it.
If a wallet or the tool you use for generating the seed phrase is open-source and the code has been reviewed, there's nothing to worry about.
If you still want to generate the seed phrase yourself for any reason, it would be better to generate a random number and convert that to a seed phrase instead of picking words directly from the word list.
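
As a minimal sketch of that "random number first, words second" order, assuming Python's standard `secrets` module is an acceptable entropy source for the illustration: the function name `entropy_to_indices` is my own, and the output is a list of word indices (0 to 2047). Turning those into words needs the official BIP39 English wordlist, which is not included here.

```python
import hashlib
import secrets


def entropy_to_indices(entropy):
    """Append the SHA-256 checksum bits to the entropy and cut it into 11-bit word indices."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("entropy must be 128, 160, 192, 224 or 256 bits")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 bits for 12 words, 8 bits for 24 words
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (n - 1 - i))) & 0x7FF for i in range(n)]


# 128 bits of entropy gives a 12-word phrase, 256 bits gives 24 words.
entropy = secrets.token_bytes(16)
indices = entropy_to_indices(entropy)
print(indices)

# Known vector: all-zero entropy maps to "abandon" x11 followed by "about",
# which are indices 0 and 3 in the English wordlist.
print(entropy_to_indices(bytes(16)))
```

The point of doing it in this order is that every bit comes from the random source and the checksum is derived afterwards, so nothing depends on a person choosing words.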
jr. member
Activity: 36
Merit: 27
Thanks to all for the response.

I was just curious on behalf of those that don't trust wallet software and wanted to be hardcore about it.
legendary
Activity: 1596
Merit: 1288
Your chances are only about 0.4% (1 in 256), but if you do not trust how your wallet software chooses words, it is better to use different software.

If you are very skeptical and have no programming skills, it is best to use dice, a coin and a piece of paper, and then extract the results using: https://github.com/taelfrinn/Bip39-diceware

This is much better than relying on the human brain, which tends to look for patterns rather than generate truly random words.
legendary
Activity: 3668
Merit: 6382
I wouldn't recommend anyone to do this because our brains are a total disaster at creating anything random.

Exactly. One would pick some nice words, one could pick the words alphabetically, one may not know / care that those words can exist multiple times in a seed...

If somebody has issues with the seed generated by his wallet (words he doesn't know or doesn't want for some reason), a better way is to simply generate 1-2 more wallets until the words are good enough. The result is that the seed is still much better than one the user would have picked himself, word by word.
legendary
Activity: 2212
Merit: 7064
What are the chances of generating a valid seed phrase (or 24 mnemonic words) from the BIP39 wordlist of 2048 words?
There are websites that allow you to pick whatever 23 mnemonic words you want from the BIP39 wordlist, and then the last word is calculated so that the phrase has a valid checksum.
I wouldn't recommend anyone to do this because our brains are a total disaster at creating anything random.
One of the websites I saw before is called seedpicker, but do your own research and read the guide before using it:
https://seedpicker.net/calculator/last-word.html
legendary
Activity: 2268
Merit: 18775
Each word encodes 11 bits of data. As hosseinimr93 has pointed out, for a 24 word seed phrase the checksum is 8 bits. This means the final word has 3 bits which are not checksum, which gives 2^3 = 8 possible combinations. For each of these 8 combinations, there will be exactly one correct checksum, meaning there will be 8 valid final words for any given 23 words.

For a 12 word phrase which has 4 bits of checksum, there will be 2^7 = 128 possible valid final words.
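
The bit counting above can be written out as a few lines of arithmetic. This is only a sketch; the function name `bip39_layout` and the dictionary keys are made up for the illustration.

```python
def bip39_layout(num_words):
    """Split the bits of a BIP39 phrase into entropy and checksum."""
    if num_words not in (12, 15, 18, 21, 24):
        raise ValueError("BIP39 phrases have 12, 15, 18, 21 or 24 words")
    total_bits = 11 * num_words          # each word encodes 11 bits
    checksum_bits = total_bits // 33     # 1 checksum bit per 32 entropy bits
    entropy_bits = total_bits - checksum_bits
    free_bits_in_last_word = 11 - checksum_bits
    return {
        "entropy_bits": entropy_bits,
        "checksum_bits": checksum_bits,
        # one valid word for each value of the non-checksum bits
        "valid_last_words": 2 ** free_bits_in_last_word,
        # a fully random pick passes the checksum 1 time in this many
        "one_in": 2 ** checksum_bits,
    }


for n in (12, 24):
    print(n, bip39_layout(n))
```

For 24 words this gives 256 entropy bits, 8 checksum bits, 8 valid last words and a 1 in 256 chance for a fully random pick; for 12 words it gives 128 entropy bits, 4 checksum bits, 128 valid last words and 1 in 16.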

And of course, I have to ask, why are you manually picking words to create a seed phrase? Such a process leaves you with a very insecure seed phrase and liable to have all your coins stolen.
legendary
Activity: 2380
Merit: 5213
If you pick 24 words randomly, the probability of having a seed phrase which passes the checksum would be 1 in 256. For the last word, 8 out of the 2048 words would produce a valid seed phrase.

I know the last word is a checksum generated from the first 23 words,
The checksum isn't the last word. In a 24 word BIP39 seed phrase, the last 8 bits are the checksum. The first 3 bits of the last word are chosen randomly.
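
For anyone who wants to check the "8 out of 2048" figure, here is a rough sketch that tests every possible 24th word against a fixed set of 23. It uses word indices (0 to 2047) rather than the words themselves, so no wordlist file is needed; the helper names and the choice of first 23 indices are arbitrary and only for illustration.

```python
import hashlib


def is_valid(indices):
    """Return True if the word indices carry a correct BIP39 checksum."""
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33           # 8 for a 24-word phrase
    ent_bits = total_bits - cs_bits
    value = 0
    for i in indices:
        value = (value << 11) | i
    entropy = (value >> cs_bits).to_bytes(ent_bits // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (value & ((1 << cs_bits) - 1)) == expected


def valid_last_words(first_words):
    """List every final word index that completes the phrase with a valid checksum."""
    return [w for w in range(2048) if is_valid(list(first_words) + [w])]


# Any 23 indices will do; these are deliberately non-random.
first_23 = list(range(23))
candidates = valid_last_words(first_23)
print(len(candidates), candidates)
```

Each of the candidates starts with a different 3-bit prefix, which is the randomly chosen part of the last word; the remaining 8 bits are forced by the hash.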
jr. member
Activity: 36
Merit: 27
What are the chances of generating a valid seed phrase (or 24 mnemonic words) from the BIP39 wordlist of 2048 words?

I know the last word is a checksum generated from the first 23 words, but there's got to be some % chance you correctly guess a valid working seed phrase just from manually randomly picking out 24 words...