Father lost his Electrum wallet, and remembers some of the words in the seed - page 2.

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: o_e_l_e_o on November 22, 2020, 04:45:29 AM

You don't need to use PBKDF2 for this. A single SHA256 hash is all that is needed to calculate the checksum. PBKDF2 is only used on the whole seed phrase as the first step towards generating private keys.

correct but checksum for Electrum mnemonics are a bit different from BIP39.
the checksum requires computation of HMAC-SHA512 of the mnemonic, if it is using English and is 12 words and since the HMAC key is a small one then it is technically two SHA512 compressions (1x inner pad + mneminic, 1x outer pad + hash of previous round).

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: ileikmath on November 21, 2020, 07:44:41 PM

My dad gave 5 words with extreme certainty were part of the seed, he gave me 5 more words he said were probably in the seed. I made no assumptions as to which index those words should be placed in the seed. That leaves 2 wild cards still, up to 7 (scary case, haha)

This is probably an impossible task, then, if your father does not know the order of the words.

Unscrambling a 12 word seed phrase in which all the words are known has 12! possibilities = 479,001,600
However, at least two of your words are unknown, so there are 2,048 possibilities for each of those 2 words.
This would put the number of possibilities at 12!*2048*2048 = 2,009,078,326,886,400.

Even splitting this work up between all your available core, best case scenario you are still looking at several years of non-stop computing to crack the seed phrase. If there is any doubt as to the 5 "probable" words, then there is no point in even trying.

Quote from: NotATether on November 21, 2020, 07:05:21 PM

If the last word is known, you still have to pass the mnemonic through PBKDF2 and then get the first 32 bytes of the SHA256 hash of each BIP39 seed (entropy) of the 2048*2048*66 combinations I mentioned in my previous post, because there is no way to see if the checksum is correct without knowing the SHA256 hash.

You don't need to use PBKDF2 for this. A single SHA256 hash is all that is needed to calculate the checksum. PBKDF2 is only used on the whole seed phrase as the first step towards generating private keys.

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: ileikmath on November 21, 2020, 07:44:41 PM

My dad gave 5 words with extreme certainty were part of the seed, he gave me 5 more words he said were probably in the seed. I made no assumptions as to which index those words should be placed in the seed. That leaves 2 wild cards still, up to 7 (scary case, haha)

it sounds like the order and position of the 10 words you have is also unknown, which means the number of combinations to check is even larger than initially speculated and it makes it impossible to recover.
do you know how he is recovering these words? for example is it from memory or was it written down or is it a file? there could be other options to recover in different cases which may be easier to look into than trying to brute force what you have at this stage.

HCP

legendary

Activity: 2086

Merit: 4363

Obviously, the total number of possibilities with 2 missing words is a lot less than the total search space of 12 word seeds... I was simply pointing out that "12 factorial" was not the correct starting point.

Quote from: bitmover on November 21, 2020, 06:33:17 PM

Is it possible to reduce the dictionary attack range by brute forcing only combinations that would pass the checksum requirements? Maybe someone already made a program to do that?
It would narrow a bit the possibilities.

Most of the utilities simply create the combinations dynamically, rather than generating a dictionary of every possible combination first... which is rather wasteful, given it might take only a relatively small number to find the correct one.

So basically, the idea is that you create a given combination, calculate if the checksum is valid and discard the seed mnemonic if it fails... otherwise move on to the more computationally intensive task of actually deriving keys/addresses etc.

At the end of the day... trying to find 2 missing words is actually relatively trivial, regardless of whether the position of the missing words is known, but only if the 10 you already have are in the correct order. A script should be able to find the correct combination of 12 words in a relatively short amount of time.

However, 7 words is going to be, for all intents and purposes, "impossible"... you'd be looking at a timeframe measured in centuries or thousands of years Undecided

ileikmath

newbie

Activity: 7

Merit: 4

Thanks so much for the help so far everyone!

Quote

Does your father know the location of the two missing words? That would cut down the search time massively.

Probably not, I asked a few hours ago but he still hasn't responded. I doubt it

Quote

Does your father know the master public key or some of the early addresses which were generated by the seed phrase?

I doubt it, what's the relation to a master public key to an address? Just finished cross referencing electrum statements with emails and texts, I found the address I believe he owns. It still has an unspent txo fortunately. I'll get started on that soon.

Quote

...so there are actually 276824064 ways to insert two missing words into a 10 word phrase.

Ahh, probability theory has humbled me again... Thanks!

My dad gave 5 words with extreme certainty were part of the seed, he gave me 5 more words he said were probably in the seed. I made no assumptions as to which index those words should be placed in the seed. That leaves 2 wild cards still, up to 7 (scary case, haha)

Again, thanks to everyone who has helped, I think I have enough information now to run the btcrecover tool. I'll update if it works

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: bitmover on November 21, 2020, 06:33:17 PM

Quote from: HCP on November 21, 2020, 06:00:55 PM

Unfortunately, you are wrong... a 12 word seed has 2048^12 possibilities => 5444517870735015415413993718908291383296 total possibilities (the actual number of valid possibilities is a bit less due to the checksum requirements etc).

Is it possible to reduce the dictionary attack range by brute forcing only combinations that would pass the checksum requirements? Maybe someone already made a program to do that?
It would narrow a bit the possibilities.

It's a 12 word phrase so there are 4 bits of checksum at the end, that means only 512 * 2048 * 66 valid combinations to search in if the last word is the one lost. The remaining 4 bits can be derived by constructing the checksum.

If the last word is known, you still have to pass the mnemonic through PBKDF2 and then get the first 32 bytes of the SHA256 hash of each BIP39 seed (entropy) of the 2048*2048*66 combinations I mentioned in my previous post, because there is no way to see if the checksum is correct without knowing the SHA256 hash. You can however skip the heavy computation that comes after it for invalid phrases, namely looking for its master public key or address given as input.

bitmover

legendary

Activity: 2352

Merit: 6089

bitcoindata.science

Quote from: ileikmath on November 21, 2020, 03:31:42 PM

Before I go any further, I wanted to check if there was a smarter way of doing this kind of dictionary recovery attack.

Quote from: HCP on November 21, 2020, 06:00:55 PM

Unfortunately, you are wrong... a 12 word seed has 2048^12 possibilities => 5444517870735015415413993718908291383296 total possibilities (the actual number of valid possibilities is a bit less due to the checksum requirements etc).

Is it possible to reduce the dictionary attack range by brute forcing only combinations that would pass the checksum requirements? Maybe someone already made a program to do that?
It would narrow a bit the possibilities.

HCP

legendary

Activity: 2086

Merit: 4363

Quote from: ileikmath on November 21, 2020, 03:31:42 PM

Correct me if i'm wrong, but a 12 word seed has 12 factorial (479001600) possibilities, and since i'm missing two of those words, that leaves the dictionary size squared as roughly 4 million times factor.

Unfortunately, you are wrong... a 12 word seed has 2048^12 possibilities => 5444517870735015415413993718908291383296 total possibilities (the actual number of valid possibilities is a bit less due to the checksum requirements etc).

Quote from: NotATether on November 21, 2020, 04:51:53 PM

Quote from: o_e_l_e_o on November 21, 2020, 04:27:10 PM

It also has multi-device support, so you can split the work across all your available cores.

How is this activated? I couldn't find any command line option for specifying IP addresses or servers except for --worker, and that doesn't look like you can pass different server IPs to it.

The multiple devices don't talk to each other... you provide the --worker argument as "n/m" (ie. --worker 1/5) on each worker and it will divide the password search space up into "m" chunks and assign chunk "n" to that particular worker.

If you watch the demo of using Vast.ai with multiple instances here: https://www.youtube.com/watch?v=8Zqc-2Te3zQ&list=PL7rfJxwogDzmd1IanPrmlTg3ewAIq-BZJ&index=13&t=1220s

You can see him say "they don't communicate at all" and some of the other workers are still running, even when one had found the password.

The written instructions for this demo is here: https://github.com/3rdIteration/btcrecover/blob/master/docs/Usage_Examples/2020-10-06_Multi-GPU_with_vastai/Multi-GPU_with_vastai.md

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Specifically, you need the seedrecover.py script included in this repo. Follow the instructions at https://github.com/3rdIteration/btcrecover/blob/master/docs/Seedrecover_Quick_Start_Guide.md . Make sure you also pip install groestlcoin-hash, and apt install python3-pyopencl for GPU support. If the clusters are running Windows I think you can just install numpy and pyopencl from pip. Then pass --enable-opencl as an argument to seedrecover.py to use the GPUs.

Then give seedrecover.py an address (and how deep it was listed in the electrum window which they call "how many addresses were generated before it"). If you have the master public key use that instead because it's quicker.

Then insert all the words in the seed phrase you remember, in the correct order. You should probably run this on the Tesla node if it's the fastest one.

Quote from: o_e_l_e_o on November 21, 2020, 04:27:10 PM

It also has multi-device support, so you can split the work across all your available cores.

How is this activated? I couldn't find any command line option for specifying IP addresses or servers except for --worker, and that doesn't look like you can pass different server IPs to it.

Quote from: HCP on November 21, 2020, 06:00:55 PM

Quote from: ileikmath on November 21, 2020, 03:31:42 PM

Correct me if i'm wrong, but a 12 word seed has 12 factorial (479001600) possibilities, and since i'm missing two of those words, that leaves the dictionary size squared as roughly 4 million times factor.

Unfortunately, you are wrong... a 12 word seed has 2048^12 possibilities => 5444517870735015415413993718908291383296 total possibilities (the actual number of valid possibilities is a bit less due to the checksum requirements etc).

But the number of combinations for a twelve word seed with two missing words is 2048**2 = 4194304, but depending on which words were forgotten (they could be in the middle of the phrase, not necessarily at the end, or next to each other), we have to multiply this by 12 nCr 2 = 66, so there are actually 276824064 ways to insert two missing words into a 10 word phrase.

Of course this number is less because of checksums, and even less if the last word is one of the words forgotten, maybe OP can clarify if this is the case.

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Does your father know the location of the two missing words? That would cut down the search time massively.

Does your father know the master public key or some of the early addresses which were generated by the seed phrase? If he does, then the quickest way to do this is going to be to use this piece of software: https://github.com/3rdIteration/btcrecover/

It also has multi-device support, so you can split the work across all your available cores.

ileikmath

newbie

Activity: 7

Merit: 4

Hello all, and thanks to any who are able to help, my father remembers most of the words (say m=10 for example), and I'm fairly certain it was a 12 word seed.

Correct me if i'm wrong, but a 12 word seed has 12 factorial (479001600) possibilities, and since i'm missing two of those words, that leaves the dictionary size squared as roughly 4 million times factor.

I'm familiar with python, and thankfully electrum uses a pretty capable python console. But just generating all permutations killed my program. I redid it in Haskell (NOT A PRO AT HASKELL tho I love what little I know) and was able to generate ~~50GB list of all permutations in 33 minutes, but still need the 4 million substitutions of words in the dictionary so my plan of just having a text file containing all possible phrase ideas and having python run through that is seemingly less feasible.

I'm familiar with multithreading, tho in C, not python. and have access to a large computer cluster if need be (~~44 CPU cores in one node, 24 cores in the GPU node w/ 4xTesla, and another 48 cores on an AMD node)

Before I go any further, I wanted to check if there was a smarter way of doing this kind of dictionary recovery attack.

Please and thank you for any time spent helping

Topic: Father lost his Electrum wallet, and remembers some of the words in the seed - page 2. (Read 485 times)