Author

Topic: Typo-tolerant base58 for Bitcoin clients (Read 1414 times)

legendary
Activity: 2576
Merit: 1186
October 05, 2011, 01:40:13 PM
#14
Bitcoin shouldn't accept them automatically because people will then use the alternative characters in vanity addresses, and old clients (including many sites that verify addresses without using Bitcoin) won't be able to accept these "new" addresses.
Well, that's the point of the final poll option-- that in a few years, these could be adopted as valid.
administrator
Activity: 5222
Merit: 13032
October 05, 2011, 01:33:11 PM
#13
Bitcoin shouldn't accept them automatically because people will then use the alternative characters in vanity addresses, and old clients (including many sites that verify addresses without using Bitcoin) won't be able to accept these "new" addresses.
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
October 05, 2011, 12:08:12 PM
#12

I am guessing that you virtually engineered these addresses to be this close to one another with a brute-force search, and that you can't spend the bitcents sent to either of them.

With 32 bits of error correction, one in 4.2 billion errors is going to slip through the cracks.  Of course, if you make a program that is willing to try 4.2 billion (or more) times to find one, it's going to be able to find one.  In a practical sense, real typos that result in a valid address are going to be extremely rare, but a "did you mean" box would give the user control over that risk.

If an error correction scheme were adopted similar to the one I suggested, the odds of getting a wrong address go up considerably, mainly because the error correction algorithm will be searching a few tens of thousands of possible keys to find the right one.  But even then, 1-in-1000 odds are acceptable when the program is explicitly asking you to double-check its suggested correction.
legendary
Activity: 905
Merit: 1012
October 05, 2011, 11:50:09 AM
#11
This assumption is not safe. If a person is typing in a string they are reading from a piece of paper, when they read a "L" they have it in their heads as a letter "ell" and may type it as "l".
Which is why most base-58 codes use only 'L'. Satoshi, for whatever reason, decided to roll his own. Regardless, case *matters* for bitcoin addresses, so it is highly unlikely that someone would type 'l' for 'L', but type everything else correctly...
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
October 05, 2011, 11:45:37 AM
#10
The Base58check coding scheme used in Bitcoin has enough redundancy (32 bits) that correcting simple typos is a viable option.  A "did you mean _____?" prompt with the differences highlighted/bold (similar to what Google does if it thinks there's a typo in your search) would be a good idea.  I can't think of any good reason why not to offer this.

Error correction would involve a simple brute force search that exhausted all possibilities within the space of likely simple typos, including (for example:) one letter changed, two adjacent letters transposed, capitalization reversed on 1 to n letters (n=3?), and insertions/deletions of one character.

An invalid character in "IlO0" could be simply substituted with the most similar looking character before proceeding with the remaining steps.  If a 0 was really a Q, the brute force search will catch it.

I would mainly suggest only offering this in the UI.  The RPC interface should never auto-correct, at most it should offer a correction suggestion in its failure message.
sr. member
Activity: 416
Merit: 277
October 05, 2011, 11:41:35 AM
#9
It always uses "1", because "l" looks like "1", not "L". I'm assuming the error is in reading it off some print.

This assumption is not safe. If a person is typing in a string they are reading from a piece of paper, when they read a "L" they have it in their heads as a letter "ell" and may type it as "l".


When you see "O", how do you choose whether to change it to "0" or "o" and why?
"0" is not valid. Both "O" and "0" are interpreted as "o".
"O" and "0" could be interepreted as the valid "Q" though....

ByteCoin
legendary
Activity: 905
Merit: 1012
October 05, 2011, 11:40:37 AM
#8
I voted no but send people a warning. If you've ever used an iphone you'd understand how retarded autocorrect is.   Tongue
There's no reason to tell the user they're an idiot and can't read properly Wink

Thanks for the pull, Luke. That was on my list to fix as well.
legendary
Activity: 2576
Merit: 1186
October 05, 2011, 11:28:10 AM
#7
When you see "l", how do you choose whether to change it to "L" or "1" and why?
It always uses "1", because "l" looks like "1", not "L". I'm assuming the error is in reading it off some print.

When you see "O", how do you choose whether to change it to "0" or "o" and why?
"0" is not valid. Both "O" and "0" are interpreted as "o".
sr. member
Activity: 416
Merit: 277
October 05, 2011, 11:26:25 AM
#6
If the correction function only results in the replacement of an invalid character with the corresponding valid character then the problem is not relevant.
This is correct.

Ok.

When you see "l", how do you choose whether to change it to "L" or "1" and why?

ByteCoin
legendary
Activity: 2576
Merit: 1186
October 05, 2011, 11:10:49 AM
#5
If the correction function only results in the replacement of an invalid character with the corresponding valid character then the problem is not relevant.
This is correct.
sr. member
Activity: 416
Merit: 277
October 05, 2011, 11:04:48 AM
#4
It does not make guesses that could be wrong. There is a checksum to prevent that.

This is impossible in general as the checksum does not guarantee the invalidity of an address that differs by the smallest amount from a valid address.

For instance
https://blockexplorer.com/address/1ByteCoinAddressesMatch1kpCWNXmHKW
https://blockexplorer.com/address/1ByteCoinAddressesMatch1kpCxNXmHKW
are both valid addresses even though they differ by only one character.

Similar collisions can be found for omitting one character.

If a new address type is introduced, I suggest the adoption of a shorter error detecting code which provides some guarantees.

If the correction function only results in the replacement of an invalid character with the corresponding valid character then the problem is not relevant. However, if the address contains a lowercase "L" then there's some uncertainty whether it should be converted into "1" or "L" and potentially both might result in valid addresses. I suppose the validity of both interpretations could be checked but this would complicate matters and finding a full set of test cases for the code would not be trivial.

ByteCoin
legendary
Activity: 2576
Merit: 1186
October 05, 2011, 10:48:00 AM
#3
I voted no but send people a warning. If you've ever used an iphone you'd understand how retarded autocorrect is.   Tongue
That doesn't apply here. If the correction is wrong, it still won't work. That's what the checksum is for.
hero member
Activity: 686
Merit: 500
Wat
October 05, 2011, 10:28:20 AM
#2
I voted no but send people a warning. If you've ever used an iphone you'd understand how retarded autocorrect is.   Tongue

legendary
Activity: 2576
Merit: 1186
October 05, 2011, 10:23:08 AM
#1
One of my pull requests for bitcoin{d,-qt} makes it tolerate typo'd base58 data-- specifically, zero and uppercase 'o' are treated as a lowercase 'o'; and lowercase 'L', pipe, and exclamation point are treated as a one.

Gavin thinks such a change needs a consensus, so please vote. Smiley

Edit: Please note: this would only tolerate typos when it is certain to be correct. It does not make guesses that could be wrong. There is a checksum to prevent that.
Jump to: