Randomly picking 24 words from the BIP39 wordlist

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: larry_vw_1955 on December 14, 2022, 08:22:06 PM

How can you quote a statistic on something that is not considered a valid error pattern?

The statistics I quoted refer to any error involving the substitution of 8 characters. This is an entirely valid error pattern when the substitution of those characters is random. Substituting 8 consecutive characters with "aardvark" is not a valid error pattern because of the reasons you have quoted above, but it would still have the same 0.931 per billion chance of going undetected.

Emphasis mine:

Quote from: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design

This means that when 5 changed characters occur randomly distributed in the 39 characters of a P2WPKH address, there is a chance of 0.756 per billion that it will go undetected. When those 5 changes occur randomly within a 19-character window, that chance goes down to 0.093 per billion. As the number of errors goes up, the chance converges towards 1 in 2^30 = 0.931 per billion.
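The limiting figure in that quote is plain arithmetic: a Bech32 checksum is 6 characters of 5 bits each, so a heavily corrupted string only slips through if all 30 checksum bits happen to line up. A quick check in Python (the variable names are just for illustration):

```python
# Bech32 checksum: 6 characters, 5 bits per character.
CHECKSUM_CHARS = 6
BITS_PER_CHAR = 5

checksum_bits = CHECKSUM_CHARS * BITS_PER_CHAR  # 30 bits in total
undetected = 1 / 2 ** checksum_bits             # chance a random corruption still passes
per_billion = undetected * 1e9

print(f"1 in 2^{checksum_bits} = {per_billion:.3f} per billion")
```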

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: o_e_l_e_o on December 14, 2022, 04:19:57 AM

Quote from: larry_vw_1955 on December 13, 2022, 07:18:09 PM

we are now replacing 8 consecutive characters with 8 random ones. so it is not something that we know statistics on bech32 error detection about.

We do have statistics on that. They are summarized at the end of BIP173: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design

Before you can compute statistics on error detection you have to first define what you mean by an error pattern. Why? Because error detection is done on error patterns. If you don't define an error pattern then you can't have any idea of what an error is. Maybe you're right but i'm not sure.

We define an error pattern as a sequence of first one or more deletions, then swaps of adjacent characters, followed by substitutions, insertions, and duplications, in that order, all in specific positions, applied to a string with valid checksum that is otherwise randomly chosen. For insertions and substitutions we assume a uniformly random new character. For example, "delete the 17th character, swap the 11th character with the 12th character, and insert a random character in the 24th position" is an error pattern. "Replace the 43rd through 48th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.

Bech32 has a probability of 0 to incorrectly accept error patterns consisting of up to 4 substitutions—they are always detected.

Quote

For both address lengths, and considering 8 characters being substituted, then the chance of this going undetected by the checksum converges on 0.931 per billion.

How can you quote a statistic on something that is not considered a valid error pattern?

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: larry_vw_1955 on December 13, 2022, 07:18:09 PM

we are now replacing 8 consecutive characters with 8 random ones. so it is not something that we know statistics on bech32 error detection about.

We do have statistics on that. They are summarized at the end of BIP173: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design

The lines we are interested in are length 39 and 59, which correspond to a 42 character P2WPKH address and a 62 character P2WSH address (3 additional characters for the non-data part "bc1"). For both address lengths, and considering 8 characters being substituted, then the chance of this going undetected by the checksum converges on 0.931 per billion.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: o_e_l_e_o on December 13, 2022, 03:48:44 AM

Note that you are quoting from BIP350 which defines the Bech32m variant, which is used for version 1 segwit addresses (taproot, bc1p) and future versions of segwit addresses. These addresses can be between 14 and 74 characters long, so you can indeed have additions and deletions without invalidating the address.

yes, i suppose i am. it further goes on to say this too:

Replace the 43rd through 48th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.

they are replacing 6 consecutive characters with 8 random ones. so it is not something that we know statistics on error detection about.

Quote

Version 0 segwit addresses (native segwit, bc1q) as were being discussed above have fixed lengths, so you can only have a deletion if you also have an addition of the same length, and vice versa.

fine. let's revise their statement just a bit to read as follows:

Replace the 43rd through 50th character with 'aardvark'" is not a valid error pattern, because the new characters are not random and there is no reason why this particular string is more likely than any other to be substituted.

we are now replacing 8 consecutive characters with 8 random ones. so it is not something that we know statistics on bech32 error detection about.

string length didn't change. Shocked

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Note that you are quoting from BIP350 which defines the Bech32m variant, which is used for version 1 segwit addresses (taproot, bc1p) and future versions of segwit addresses. These addresses can be between 14 and 74 characters long, so you can indeed have additions and deletions without invalidating the address.

Version 0 segwit addresses (native segwit, bc1q) as were being discussed above have fixed lengths, so you can only have a deletion if you also have an addition of the same length, and vice versa.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: o_e_l_e_o on December 12, 2022, 06:13:55 AM

What other errors are there? Bech32 addresses are of a fixed length (42 characters for P2WPKH, 62 characters for P2WSH or P2TR), so any error which adds or deletes a few characters will immediately result in an invalid address. The checksum will detect any error which affects up to 4 characters.

well i mean maybe this is splitting hairs but you're not defining what your universe of errors consists of. i think they have a definition of what an "error pattern" is:

We define an error pattern as a sequence of first one or more deletions, then swaps of adjacent characters, followed by substitutions, insertions, and duplications, in that order, all in specific positions, applied to a string with valid checksum that is otherwise randomly chosen.

so any error that does not fall into that particular definition is the answer to your question maybe.

Quote

The reason that Bech32 excludes the characters "1", "b", "i", and "o" is explained in BIP173:

Quote from: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#cite_note-4

The character set is chosen to minimize ambiguity according to this visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.

so it's basically what i said. for visual reasons.

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: larry_vw_1955 on December 11, 2022, 09:07:32 PM

it's only guaranteed to detect up to 4 "substitution errors". a substitution error would be when you replace a g with a q maybe. i guess that's what it is. probably the most common error but still. the statement needs qualification.

What other errors are there? Bech32 addresses are of a fixed length (42 characters for P2WPKH, 62 characters for P2WSH or P2TR), so any error which adds or deletes a few characters will immediately result in an invalid address. The checksum will detect any error which affects up to 4 characters.

Quote from: larry_vw_1955 on December 11, 2022, 09:07:32 PM

there can't be any other reason.

The reason that Bech32 excludes the characters "1", "b", "i", and "o" is explained in BIP173:

Quote from: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#cite_note-4

The character set is chosen to minimize ambiguity according to this visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: o_e_l_e_o on December 11, 2022, 03:18:15 AM

It is guaranteed to detect up to 4 errors, and has less than a 1 in a billion chance of failing to detect more errors than that.

it's only guaranteed to detect up to 4 "substitution errors". a substitution error would be when you replace a g with a q maybe. i guess that's what it is. probably the most common error but still. the statement needs qualification.

Quote from: ABCbits on December 11, 2022, 04:55:03 AM

To be fair, reading, writing and speaking are 3 different actions.

well it seems the reason for excluding "b" in Bech32 is that it might be mistaken for the number "6". they look similar. a computer will never mistake one for the other. a person will never mistake one for the other if they are spoken since "six" and "bee" don't sound alike at all. which leaves the writing part. someone writes it down and then tries to read it. hence why they don't include the letter "b". there can't be any other reason.

ABCbits

legendary

Activity: 2870

Merit: 7490

Quote from: larry_vw_1955 on December 10, 2022, 09:11:05 PM

wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked

I'm surprised bech32 allows both of those letters since they look so similar.

I can see the confusion, although IMO Bech32 encoding is better than other base-32 encodings such as RFC 3548 Base32, which uses A-Z, 2-7 and =.

Quote from: larry_vw_1955 on December 11, 2022, 12:24:46 AM

Quote from: Cricktor on December 10, 2022, 10:39:24 PM

Well, Bitcoin addresses are not made to be written down or spelled by humans.

If that's the case then please explain:

Bech32 is an encoding scheme used to encode SegWit addresses and Lightning invoices. The Bech32 alphabet contains 32 characters, including lowercase letters a-z and the numbers 0-9, excluding the number 1 and the letters ‘b’, ‘i’, ‘o’ to avoid reader confusion.

To be fair, reading, writing and speaking are 3 different actions.

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: Cricktor on December 10, 2022, 10:39:24 PM

At least Bech32 detects them pretty accurately and not only one at a time even. As far as I remember Bech32 can detect where an error is and pinpoint it

It is guaranteed to detect up to 4 errors, and has less than a 1 in a billion chance of failing to detect more errors than that.

Quote from: larry_vw_1955 on December 11, 2022, 12:24:46 AM

If that's the case then please explain:

Addresses aren't designed to be handwritten, but they should still be double (or even triple) checked after you have copied and pasted them. And excluding one character from similar character pairs such as o and 0 helps to make the manual double checking process easier and more accurate.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: Cricktor on December 10, 2022, 10:39:24 PM

Well, Bitcoin addresses are not made to be written down or spelled by humans.

If that's the case then please explain:

Bech32 is an encoding scheme used to encode SegWit addresses and Lightning invoices. The Bech32 alphabet contains 32 characters, including lowercase letters a-z and the numbers 0-9, excluding the number 1 and the letters ‘b’, ‘i’, ‘o’ to avoid reader confusion.

The Base58 symbol chart used in Bitcoin is specific to the Bitcoin project and is not intended to be the same as any other Base58 implementation used outside the context of Bitcoin (the characters excluded are: 0, O, I, and l)

Cricktor

hero member

Activity: 714

Merit: 1010

Quote from: larry_vw_1955 on December 10, 2022, 09:11:05 PM

Quote from: larry_vw_1955 on December 08, 2022, 08:56:25 PM

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked

I'm surprised bech32 allows both of those letters since they look so similar.

Well, Bitcoin addresses are not made to be written down or spelled by humans. You use copy/paste or risk errors. At least Bech32 detects them pretty accurately and not only one at a time even. As far as I remember Bech32 can detect where an error is and pinpoint it, see https://bitcoin.sipa.be/bech32/demo/demo.html

E.g. for bc1qyt4n4qvq86y33qfa7zts0va8kv6ls47kmuyw5e the two wrong characters are precisely indicated by above Bech32 decoder demo.
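For anyone who wants to see the checksum test itself rather than the demo page, here is a rough Python sketch modeled on the reference code in BIP173. It only checks the checksum and does not locate errors the way the demo does. The `checksum_ok` helper and the sample payload are made up for illustration. The three addresses from this thread differ from each other by at most two substitutions, so at most one of them can pass.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

def bech32_polymod(values):
    """BCH checksum state after feeding in a sequence of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk

def bech32_hrp_expand(hrp):
    """Spread the human-readable part ("bc") into 5-bit values."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def bech32_create_checksum(hrp, data):
    """Six 5-bit checksum values for the given 5-bit data."""
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

def checksum_ok(addr):
    """Hypothetical helper: True if a lowercase Bech32 string has a valid checksum."""
    pos = addr.rfind("1")
    if pos < 1 or pos + 7 > len(addr):
        return False
    hrp, rest = addr[:pos], addr[pos + 1:]
    if any(c not in CHARSET for c in rest):
        return False
    data = [CHARSET.find(c) for c in rest]
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1

# Arbitrary 5-bit payload (not a real witness program), just to get a valid checksum.
payload = [0] + [(7 * i + 3) % 32 for i in range(32)]
good = "bc1" + "".join(CHARSET[d] for d in payload + bech32_create_checksum("bc", payload))

# Substitute a single character: this is always detected.
bad = good[:10] + ("q" if good[10] != "q" else "p") + good[11:]

thread_addresses = [
    "bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e",
    "bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e",
    "bc1qyt4n4qvq86y33qfa7zts0va8kv6ls47kmuyw5e",
]

print(good, checksum_ok(good))
print(bad, checksum_ok(bad))
for a in thread_addresses:
    print(a, checksum_ok(a))
```

Note that this only covers the Bech32 checksum from BIP173. Bech32m (BIP350) uses a different final constant, which the sketch does not handle.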

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: o_e_l_e_o on December 10, 2022, 07:44:57 AM

Quote from: larry_vw_1955 on December 08, 2022, 08:56:25 PM

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

I think that proves my point perfectly. Grin

And the exact same character swap (g -> q) is easy to spot in the seed phrase in the word "ceilinq". Additionally, you can spot an error in a seed phrase like this without having the original to compare to. With an address, if you have no original to compare to then you can't spot anything at all.

wow that was tricky! i looked at it a bunch of times and they looked identical. Shocked

I'm surprised bech32 allows both of those letters since they look so similar.

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: larry_vw_1955 on December 08, 2022, 08:56:25 PM

then it also serves to discourage people from legitimate uses of the wordlist such as flipping a coin or rolling dice to create their seed phrase.

It doesn't, because you can use a piece of software to calculate the checksum for you. You must use a piece of software to turn your seed phrase into private keys and addresses - doing this by hand is simply not feasible. So requiring that same piece of software to also calculate your checksum for you brings zero additional risk.

Quote from: larry_vw_1955 on December 08, 2022, 08:56:25 PM

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

I think that proves my point perfectly. Grin

And the exact same character swap (g -> q) is easy to spot in the seed phrase in the word "ceilinq". Additionally, you can spot an error in a seed phrase like this without having the original to compare to. With an address, if you have no original to compare to then you can't spot anything at all.

Quote from: bkelly13 on December 08, 2022, 11:36:53 PM

If I were to have my cold wallet stolen, or if I were to steal one, what is the possibility of selecting seed words at random and gaining access.

If your plan is to randomly guess the seed phrase, then you don't need to steal the hardware wallet first. Just start generating seed phrases and checking for a balance, but note that the Earth will become uninhabitable by the dying sun expanding to a red giant long before you find a collision with any wallet, let alone a specific one.

Quote from: bkelly13 on December 08, 2022, 11:36:53 PM

With a library of 2048 seeds, presuming repeats are not allowed

Repeats are allowed. Provided your seed phrase was generated properly, then there is around a 1 in 31 chance of a repeated word in a 12 word seed phrase, and a 1 in 8 chance in a 24 word seed phrase.
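Those two figures are a birthday-problem calculation. A minimal sketch, assuming every word is an independent uniform draw from the 2048-word list (this ignores the small effect the checksum has on the last word; `p_repeat` is a made-up name):

```python
def p_repeat(words, wordlist_size=2048):
    """Probability that at least one word appears more than once."""
    p_all_distinct = 1.0
    for i in range(words):
        # The (i+1)-th word must avoid the i words already drawn.
        p_all_distinct *= (wordlist_size - i) / wordlist_size
    return 1 - p_all_distinct

for n in (12, 24):
    p = p_repeat(n)
    print(f"{n} words: p = {p:.4f}, about 1 in {1 / p:.1f}")
```

Under that assumption the results land at roughly 1 in 31 and 1 in 8, in line with the numbers above.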

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: bkelly13 on December 09, 2022, 01:04:31 AM

~
Well, if I were writing code for a wallet that would be there.

That would both be pointless and a bad idea.
Nobody would use wallet software to brute force a mnemonic so it is pointless to add such a feature.
Also it is a bad idea because it makes it harder on legitimate use cases like a normal user entering their seed phrase wrong and wanting to retry by fixing the words, typos, order, etc. Adding a delay would harm user experience.

bkelly13

member

Activity: 76

Merit: 35

Quote from: pooya87 on December 09, 2022, 12:30:22 AM

Quote from: bkelly13 on December 08, 2022, 11:36:53 PM

A search stated that there are 2048 seeds in the library.

...
There are no such mechanisms in wallets, not to mention that you could always write a script that searches the space on its own without needing the overhead of a wallet software. It goes without saying that it is pointless code to write.

Well, if I were writing code for a wallet that would be there.

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: bkelly13 on December 08, 2022, 11:36:53 PM

A search stated that there are 2048 seeds in the library.

You are making it too complex. A 12 word mnemonic represents 128 bits of entropy. The chance of finding the same entropy is 1 in 2¹²⁸. And a 24 word mnemonic represents 256 bits of entropy...
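The 128 and 256 come straight from the BIP39 layout: each word carries 11 bits, and the checksum is one bit for every 32 bits of entropy. A small sketch of that bookkeeping (`bip39_layout` is an illustrative name, not something from a library):

```python
def bip39_layout(word_count):
    """Return (entropy_bits, checksum_bits) for a BIP39 mnemonic length."""
    total_bits = word_count * 11
    # CS = ENT / 32, so ENT + CS = 33 * CS.
    checksum_bits = total_bits // 33
    entropy_bits = total_bits - checksum_bits
    return entropy_bits, checksum_bits

for words in (12, 15, 18, 21, 24):
    ent, cs = bip39_layout(words)
    print(f"{words} words: {ent} bits of entropy + {cs} checksum bits")
```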

Quote

I suspect that after some number of tries each cold wallet will do something like:

There are no such mechanisms in wallets, not to mention that you could always write a script that searches the space on its own without needing the overhead of a wallet software. It goes without saying that it is pointless code to write.

bkelly13

member

Activity: 76

Merit: 35

Quote from: jiamijiang on November 27, 2022, 10:03:58 AM

What are the chances of generating a valid seed phrase (or 24 mnemonic words) from the BIP39 wordlist of 2048 words?

I know the last word is a checksum generated from the first 23 words, but there's got to be some % chance you correctly guess a valid working seed phrase just from manually randomly picking out 24 words...

As I read the OP the question that popped up in my mind is: If I were to have my cold wallet stolen, or if I were to steal one, what is the possibility of selecting seed words at random and gaining access. I also presume that the seed is not global, but local to each wallet and to each account on the wallet. A search stated that there are 2048 seeds in the library.

I entered some numbers in an Excel workbook for this. With a library of 2048 seeds, presuming repeats are not allowed, the probability of getting the first one right is 1/2048. For the second one, divide by 2047, then by 2046, etc. By the time we get to the

11th word: 1 out of 2.58789 * 10^36
23rd word: 1 out of 1.27862 * 10^76

That is about the size of the private key.

I suspect that after some number of tries each cold wallet will do something like:
A) delete its private key(s). Not the best, but at least the thief is not rewarded.
B) each time a bad sequence is provided slow down the response. Start with, maybe 1 second of additional time, then double the time for each attempt. It could write the number of attempts into a storage location and reset it upon getting the correct seed.

When I did a seed check with my wallet, there was a delay of a few seconds before it was ready for the next word. That would introduce sufficient time to deter any thief.
Is this reasonable? Or do I have a flaw in my understanding?
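The spreadsheet figures in this post can be reproduced with exact integers. This sketch keeps the post's own assumption that a word cannot repeat and puts the count next to the plain 2048^n figure for comparison (`ordered_picks_without_repeats` is a made-up name):

```python
def ordered_picks_without_repeats(words, wordlist_size=2048):
    """Number of ordered word sequences when no word may appear twice."""
    total = 1
    for i in range(words):
        total *= wordlist_size - i
    return total

for n in (11, 23):
    no_repeat = ordered_picks_without_repeats(n)
    with_repeat = 2048 ** n  # every position free to be any word
    print(f"{n} words: {no_repeat:.5e} without repeats, {with_repeat:.5e} with repeats")
```

The two columns differ only slightly, so the no-repeats assumption barely changes the order of magnitude.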

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: pooya87 on December 07, 2022, 10:53:25 PM

Checksums in mnemonic algorithms serve more purposes than just error detection.

hopefully so

Quote

And in Electrum they act as the version to announce the child address type that has to be derived and their derivation path.

that seems like a practical and useful use of it.

Quote from: ETFbitcoin

What exactly do you expect when the checksum only has 4-bit (for 12 words) and 8-bit (for 24 words) size?

i dont know i guess i expected the probability of a false positive to be on the order of 1 in 2^32. is that so unreasonable?

A false positive error, or false positive, is a result that indicates a given condition exists (you entered the correct seed phrase) when it does not (you actually entered the wrong seed phrase).
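To put a number on that false-positive rate, a sketch that counts how many choices of last word give a valid BIP39 checksum for fixed earlier words. It works on 11-bit word indices rather than the actual wordlist, which is an assumption made to keep it self-contained, and both function names are hypothetical:

```python
import hashlib

def indices_valid(indices):
    """Check the BIP39 checksum for a list of 11-bit word indices (0..2047)."""
    total_bits = len(indices) * 11
    checksum_bits = total_bits // 33
    entropy_bits = total_bits - checksum_bits
    value = 0
    for idx in indices:
        value = (value << 11) | idx
    entropy = value >> checksum_bits
    checksum = value & ((1 << checksum_bits) - 1)
    digest = hashlib.sha256(entropy.to_bytes(entropy_bits // 8, "big")).digest()
    # The checksum is the leading bits of SHA-256 over the entropy bytes.
    expected = digest[0] >> (8 - checksum_bits)
    return checksum == expected

def count_valid_last_words(first_words):
    """How many of the 2048 possible last words complete a valid mnemonic."""
    return sum(indices_valid(first_words + [last]) for last in range(2048))

valid_24 = count_valid_last_words(list(range(100, 123)))  # 23 arbitrary indices
valid_12 = count_valid_last_words(list(range(100, 111)))  # 11 arbitrary indices
print(f"24 words: {valid_24} of 2048 last words are valid")
print(f"12 words: {valid_12} of 2048 last words are valid")
```

The last word of a 24-word phrase holds 3 entropy bits and 8 checksum bits, so 8 of the 2048 candidates pass (1 in 256). For 12 words it holds 7 entropy bits and 4 checksum bits, so 128 pass (1 in 16). That is a long way from 1 in 2^32.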

Quote

And that's why some wallets force their users to verify and re-enter some/all of the generated words.

I'm not sure that completely solves the problem. But i guess it's better than nothing. Shocked

Quote from: o_e_l_e_o

It also serves to discourage people from just opening up the wordlist and picking 12 or 24 words they like the look of, which as we all know is an incredibly insecure way of generating a seed phrase, but is one that we see people discussing as a possibility over and over again.

then it also serves to discourage people from legitimate uses of the wordlist such as flipping a coin or rolling dice to create their seed phrase.

Quote

With a seed phrase, the words themselves serve as a sort of checksum. For example, compare these two addresses which have 1 character swapped:
Code:
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e, bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

you didn't swap anything.
bc1qyt4n4qvg86y33qfa7zts0wa8kv6ls47kmuyw5e
bc1qyt4n4qvq86y33qfa7zts0wa8kv6ls47kmuyw5e

but it would be trivial to see if you had by just lining them up like i did above.

Quote

And now look at this seed phrase, which has the exact same character swap:
Code:
decorate cactus vivid amazing endorse banana pipe train lazy viable ceilinq suffer

It is significantly easier to immediately spot the mistake in the seed phrase than it is in the address.

i don't see any mistake or character swap. Huh

not that i'm a wordlist junkie but those words all seem correctly spelled.

NotATether

legendary

Activity: 1568

Merit: 6660

Quote from: pooya87 on December 04, 2022, 03:40:39 AM

Bech32 is not really that complicated, in a way it is simpler to implement since it uses a power of 2 (32 as opposed to 58) so there is no need for an external library or a class like BigInteger for its computation.
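The property that matters in the quote is that 32 = 2^5: bytes can be regrouped into 5-bit values with shifts and masks on a small accumulator, with no big-number division. A sketch modeled on the `convertbits` helper in the BIP173 reference code (treat it as an illustration rather than the reference implementation):

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup a sequence of frombits-wide values into tobits-wide values."""
    acc = 0    # small accumulator, never more than frombits + tobits - 1 bits
    bits = 0   # number of bits currently held in acc
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None  # input value does not fit in frombits
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            # Left-align the leftover bits in one final group.
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None  # leftover bits are not valid zero padding
    return ret

program = list(range(20))             # stand-in for a 20-byte witness program
five_bit = convertbits(program, 8, 5)
round_trip = convertbits(five_bit, 5, 8, pad=False)
print(len(five_bit), round_trip == program)
```

Twenty bytes are 160 bits, which split evenly into 32 groups of 5, so no padding is needed in this case.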

You can convert to Hexadecimal and Base58 without using bignum division, just use base-58 logarithms:

Normally:

- Take hex number to convert and calculate the log58 and call it X
- Calculate log58(16)
- While X > 0:
-- Subtract X - log58(16) (equivalent to log58(hex/16))
-- Calculate 58^result (edit), store integer part in A, store fractional part in B
-- Multiply B by 58, this is your next base58 character, push it to the front of the Base58 string after lookup
-- Set X = log58 A

But the log58 can be converted into a log2 following a division by log2(58). Ie log58 X = log2(X)/log2(58).

And the nice thing about base 2 is that it can be optimized to use bit shifts. So an exponentiation is just a left-shift, and there's an extremely optimized log2 for Linux (and with some ASM mods, Windows and Mac) on this link: https://stackoverflow.com/questions/11376288/fast-computing-of-log2-for-64-bit-integers/11376759#11376759 (second answer).

You would need an array of uint64_t pointers to store all those bits without using a bignum, but there is hardly any performance penalty for doing that (as opposed to, e.g., vectors or lists).
