
Topic: Need some feedback on a search space limiter option

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I have to test it, but I think it could be simplified by using the Levenshtein distance to find words that are "similar". That is rather straightforward and fast.

My bip39validator project has a Levenshtein distance checker if you are interested in copying the algorithm (it's MIT-licensed and on GitHub).

But you should probably also exclude words that do not share a *contiguous group of chars* (more than 1 char) with the given word, even if their distance is low. I'm not sure how you would adapt LD for that.
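
One way to get that behaviour without changing LD itself is to run a separate substring check next to it. The following is only a minimal sketch in Python, not the code from bip39validator or FinderOuter; the function names, the `max_distance=2` threshold and the sample words are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def shares_substring(a: str, b: str, min_len: int = 2) -> bool:
    """True if a contiguous run of min_len chars from a also occurs in b."""
    return any(a[i:i + min_len] in b for i in range(len(a) - min_len + 1))


def similar_words(target, wordlist, max_distance=2, min_len=2):
    """Keep words that are close in edit distance AND share a substring.

    max_distance and min_len are illustrative defaults, not values
    taken from any existing tool.
    """
    return [w for w in wordlist
            if levenshtein(target, w) <= max_distance
            and shares_substring(target, w, min_len)]


sample = ["aim", "arm", "art", "chair", "hair", "affair"]
print(similar_words("air", sample))
```

With these settings "arm" and "art" are dropped even though their distance to "air" is 2, because neither contains "ai" or "ir", and "affair" is dropped because its distance is 3.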
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
I think that's enough implementing for now so I'll release the new version very soon. I just have to cross some tees and dot some eyes. I ended up embedding the search space inside each option's window instead of a separate window. It's just under an "advanced" expander.
I also added similar letters to the Base58 option. It uses a hard-coded list of characters that I think look similar. You can see the whole list here, and feel free to add to it. Some pairs may seem silly or impossible to confuse, but people always have the option to remove them from the list. Here is a preview:


And of course it also exists for BIP39 words, but it works best for English words because of the very low threshold I use. The threshold is hard-coded: I didn't want to complicate this feature, so I didn't add any option to the UI for the user to change it.
However, if the threshold is increased by modifying the code, it could also work for other languages, such as French in the preview below. But as you can see, the results start to become less accurate.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
this would be a huge amount of work I guess to go through the word list 2048 times and pick out similar looking words.
I have to test it, but I think it could be simplified by using the Levenshtein distance to find words that are "similar". That is rather straightforward and fast.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
A really great option would be a feature to add words "similar to" a given word. So if I entered the word "air", then it would add air, aim, arm, art, chair, hair, pair, liar, affair, repair, unfair, etc., but this would be a huge amount of work I guess to go through the word list 2048 times and pick out similar looking words.
If you take all words that contain at least 2 out of 3 characters (from the word "air"), you get a list of 633 words:
Code:
ability
absorb
abstract
absurd
accident
achieve
acid
acoustic
acquire
across
action
actor
actress
addict
address
admit
advice
aerobic
affair
afford
afraid
again
agree
aim
air
airport
aisle
alarm
alert
alien
already
alter
amateur
amazing
anchor
ancient
anger
angry
animal
another
answer
antique
anxiety
apart
appear
approve
april
arch
arctic
area
arena
argue
arm
armed
armor
army
around
arrange
arrest
arrive
arrow
art
artefact
artist
artwork
assist
attitude
attract
auction
audit
author
average
avoid
aware
awkward
axis
bachelor
banner
bar
barely
bargain
barrel
basic
betray
bird
birth
bitter
board
boring
bracket
brain
brand
brass
brave
bread
brick
bridge
brief
bright
bring
brisk
broccoli
cabin
camera
capital
captain
car
carbon
card
cargo
carpet
carry
cart
casino
category
caution
cereal
certain
chair
champion
chapter
charge
chronic
cigar
cinnamon
circle
claim
clarify
confirm
consider
coral
crack
cradle
craft
cram
crane
crash
crater
crawl
crazy
cream
credit
cricket
crime
crisp
critic
crucial
cruise
crystal
cupboard
curious
curtain
danger
daring
daughter
debris
decorate
decrease
deliver
denial
depart
derive
describe
despair
detail
diagram
dial
diamond
diary
differ
digital
dilemma
dinner
dinosaur
direct
dirt
disagree
discover
disease
disorder
display
distance
divert
divorce
domain
draft
dragon
drama
drastic
draw
dream
drift
drill
drink
drip
drive
during
dwarf
dynamic
eager
early
earn
earth
easily
either
electric
elevator
embark
embrace
enrich
entire
era
erase
erosion
eternal
exercise
expire
explain
extra
fabric
faint
faith
family
farm
fashion
father
fatigue
favorite
feature
february
federal
festival
fiber
figure
filter
final
finger
fire
firm
first
fiscal
flavor
forward
fragile
frame
friend
fringe
fruit
furnace
gain
gallery
garage
garbage
garden
garlic
garment
gather
general
giant
ginger
giraffe
girl
glare
gorilla
grab
grace
grain
grant
grape
grass
gravity
great
grid
grief
grit
guard
guitar
habit
hair
hammer
hamster
harbor
hard
harsh
harvest
hazard
heart
hire
history
holiday
hospital
hybrid
idea
ignore
illegal
image
imitate
impact
improve
increase
indicate
indoor
industry
infant
inform
inhale
inherit
initial
injury
inmate
inner
inquiry
insane
inspire
install
intact
interest
iron
island
isolate
ivory
jaguar
jar
junior
kangaroo
labor
ladder
large
later
latin
laundry
lawsuit
layer
leader
learn
leisure
leopard
liar
liberty
library
lizard
lunar
lyrics
machine
magic
maid
mail
main
major
mansion
marble
march
margin
marine
market
marriage
master
material
matrix
matter
maximum
measure
mechanic
media
merit
minor
miracle
mirror
misery
mistake
mixture
monitor
moral
morning
mountain
naive
napkin
narrow
nation
nature
near
negative
neither
neutral
normal
nuclear
obtain
opera
orange
orbit
orchard
ordinary
organ
orient
original
orphan
ostrich
pair
panic
panther
paper
parade
parent
park
parrot
party
patient
patrol
pattern
pear
pelican
permit
phrase
physical
piano
picture
pioneer
pizza
plastic
polar
popular
portion
practice
praise
predict
prepare
price
pride
primary
print
priority
prison
private
prize
profit
program
provide
purchase
purity
pyramid
quality
quarter
rabbit
raccoon
race
rack
radar
radio
rail
rain
raise
rally
ramp
ranch
random
range
rapid
rare
rate
rather
raven
raw
razor
ready
real
reason
rebuild
recall
receive
recipe
region
regular
relax
release
relief
remain
remind
repair
repeat
replace
require
resist
retire
retreat
reunion
reveal
review
reward
rib
ribbon
rice
rich
ride
ridge
rifle
right
rigid
ring
riot
ripple
risk
ritual
rival
river
road
roast
romance
rookie
rotate
royal
runway
rural
sail
satisfy
satoshi
scare
scatter
scissors
scorpion
scrap
script
search
security
seminar
senior
series
service
share
sheriff
shiver
shrimp
silver
similar
siren
sister
situate
skirt
smart
social
solar
soldier
spare
spatial
special
spider
spirit
spray
spread
spring
square
squirrel
stadium
stairs
start
strategy
strike
sugar
surface
surprise
sustain
swarm
swear
tail
target
taxi
thrive
tiger
timber
tired
tornado
tortoise
tourist
toward
track
trade
traffic
tragic
train
transfer
trap
trash
travel
tray
treat
trial
tribe
trick
trigger
trim
trip
typical
umbrella
unaware
unfair
uniform
universe
upgrade
urban
valid
vanish
vapor
various
verify
version
veteran
viable
vibrant
victory
village
vintage
virtual
virus
visa
visual
vital
wait
warfare
warm
warrior
water
wear
weather
weird
whisper
winner
winter
wire
wrap
wrist
write
yard
year
zebra
If you limit this list to words of no more than 4 characters, you end up with 75:
Code:
acid
aim
air
arch
area
arm
army
art
axis
bar
bird
car
card
cart
cram
dial
dirt
draw
drip
earn
era
farm
fire
firm
gain
girl
grab
grid
grit
hair
hard
hire
idea
iron
jar
liar
maid
mail
main
near
pair
park
pear
race
rack
rail
rain
ramp
rare
rate
raw
real
rib
rice
rich
ride
ring
riot
risk
road
sail
tail
taxi
trap
tray
trim
trip
visa
wait
warm
wear
wire
wrap
yard
year
Even though not many words are really similar from a human perspective, it's easy to make this list.
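
A filter like this takes only a few lines. The sketch below is in Python and is an assumption about how such a list could be produced, not the script actually used for the lists above: it counts distinct shared letters, and the short sample stands in for the full 2048-word list.

```python
def shares_letters(target: str, word: str, min_shared: int = 2) -> bool:
    """True if word contains at least min_shared distinct letters of target."""
    return len(set(target) & set(word)) >= min_shared


def candidates(target, wordlist, min_shared=2, max_len=None):
    """Filter wordlist by shared letters and, optionally, by maximum length."""
    return [w for w in wordlist
            if shares_letters(target, w, min_shared)
            and (max_len is None or len(w) <= max_len)]


# Small illustrative sample; replace with the full BIP39 English list.
sample = ["ability", "aim", "air", "zebra", "cruise", "affair"]

print(candidates("air", sample))             # every sample word shares 2+ letters
print(candidates("air", sample, max_len=4))  # only the short ones remain
```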
legendary
Activity: 2268
Merit: 18711
Q 1. What do you think of how it looks and the process?
Looks good and straightforward to use. Great addition to your software.

When you hit "Add all" it might be better to simply have a single line which says "All 2048 BIP39 words" or something similar, rather than list all 2048 words, so the user can easily confirm it is indeed searching all possible words and they haven't accidentally deleted some?

Q 2. Is there any other option you think I should add?
Maybe an option to combine those two. For example, words which both start with "d" and end in "e", in case the middle of the word is illegible?
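
To illustrate what I mean, here is a minimal Python sketch of a combined filter. It is not FinderOuter code; the `matches` helper and its parameter names are made up for the example, and the sample is just a few words rather than the whole list.

```python
def matches(word: str, prefix: str = "", suffix: str = "", contains: str = "") -> bool:
    """True if word satisfies every condition that was supplied.

    Empty strings act as "no restriction", so the three rules can be
    combined freely.
    """
    return (word.startswith(prefix)
            and word.endswith(suffix)
            and contains in word)


# Illustrative sample only.
sample = ["drive", "derive", "describe", "divorce", "disease", "dial", "grace", "cruise"]

# Starts with "d" and ends in "e", middle illegible:
print([w for w in sample if matches(w, prefix="d", suffix="e")])
```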

A really great option would be a feature to add words "similar to" a given word. So if I entered the word "air", then it would add air, aim, arm, art, chair, hair, pair, liar, affair, repair, unfair, etc., but this would be a huge amount of work I guess to go through the word list 2048 times and pick out similar looking words.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
I'm working on a new option for FinderOuter to limit the search space in each recovery option. I'm currently working on the concept and want to know what you think.
For example the mnemonic recovery looks like the following pictures.

User clicks Start to perform some basic checks and find out how many words are missing. It then creates a set of "steps", one per missing word, to set which words can be used in that missing place and so limit the search space.

For first missing word (grace) it adds words that start with "gr":


User clicks the next button (>) to move on to the next missing word.

For second missing word (cruise) it adds words that contain letters "is":


User clicks Finish button to finish up and create a SearchSpace object to be sent to the brute force service.
As a result the search space is now limited to 976 permutations instead of 4,194,304.
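
The unrestricted figure is 2048 × 2048 = 4,194,304, and the limited figure is the product of the number of words allowed in each step. As a rough Python sketch of that arithmetic (this is not the actual SearchSpace class; the helper name and the tiny sample list are assumptions for illustration):

```python
from math import prod


def search_space_size(steps) -> int:
    """Number of combinations to test: product of each step's word count."""
    return prod(len(step) for step in steps)


# Two missing words with no restriction: every BIP39 word in both places.
print(2048 ** 2)  # 4194304

# Tiny illustrative sample standing in for the 2048-word list.
sample = ["grace", "grid", "grit", "cruise", "risk", "wrist", "air"]

step1 = [w for w in sample if w.startswith("gr")]  # first missing word
step2 = [w for w in sample if "is" in w]           # second missing word

print(search_space_size([sample, sample]))  # unrestricted on the sample
print(search_space_size([step1, step2]))    # restricted on the sample
```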

User can also add custom words one by one, or add all possible words.
Q 1. What do you think of how it looks and the process?
Q 2. Is there any other option you think I should add?

Q 3. I'm also not sure whether I should add this option as a new window like the picture below shows, or just extend the main window's height and add the option right there under an advanced expander (I'm leaning toward the latter myself, although my implementation so far uses the former).

