Pages:
Author

Topic: Bruteforce partial electrum seed words (Read 378 times)

hero member
Activity: 2702
Merit: 716
Nothing lasts forever
February 11, 2021, 12:38:29 PM
#29
what you are describing is the definition of parallelism and is used by any modern CPU that has more than one core. you don't need to write or run different programs, you just write once but split the workload between multiple CPU cores and each core performs the same task on a different data but separately and in "parallel".
CPUs can have between 2 to 64 cores.

Thats great, I'll read more kn how to execute a set of instructions on a particular core and how to divide the load in % to different cores. If you have any leads let me know.

Quote
with that in mind i don't think it is hard to recover a seed with missing 4 words.

It actually depends on various factors whether it's hard to recover the missing words or not. For instance

if you know the sequence then the possible combinations for entropy of 128 bit is
2048*12 (for one word)
2048*2048*2048*2048*12 (for 4 words)

If you dont know the sequence
2048*12! (for one word)
2048*2048*2048*2048*12! (for 4 words)
where ! is factorial

and that is too many combinations. Also, if the entropy is higher the number of combinations drastically increases.
Someone correct me if I am wrong.
legendary
Activity: 2128
Merit: 1293
There is trouble abrewing
February 11, 2021, 12:26:44 PM
#28
Whenever I am in such a situation where I have to perform the loops faster I do the below thing. Let me know if it can work well in this scenario as well.

So let's consider that 4 words will take 964 days according to you. So when we write a program to bruteforce the seed one program will take 964 days.
Now let's restrict that program to limited amount of combination and create another program with another set of combinations such that both programs have different set of combinations and are executed in parallel.
In this way both will be executed simultaneously and we will get the results in half of it's time meaning 964/2 = 482 days.

If we repeat this and create 10 such programs then the speed of results will be 10 times faster. Even if we don't run it in the same machine, we can run them in different machines.
So 964/10 = 96.4 days.

How do you find the above approach ?

what you are describing is the definition of parallelism and is used by any modern CPU that has more than one core. you don't need to write or run different programs, you just write once but split the workload between multiple CPU cores and each core performs the same task on a different data but separately and in "parallel".
CPUs can have between 2 to 64 cores.

GPUs have hundreds of cores and can run a lot more operations in parallel which is why using GPU in brute forcing is very common and the speed gain is huge.

with that in mind i don't think it is hard to recover a seed with missing 4 words.
hero member
Activity: 2702
Merit: 716
Nothing lasts forever
February 11, 2021, 11:23:36 AM
#27
I know it's possible bruteforce 4 missing words
Are you sure that this is possible? I know that for two electrum missing words, it can take around 20 seconds on an average pc. For three words it'll take 20*2048 = 40960 seconds which is equal with ~11.3 hours. But for 4 words... Oh boy. It'll take around 23,142 hours which is 964 days.

Whenever I am in such a situation where I have to perform the loops faster I do the below thing. Let me know if it can work well in this scenario as well.

So let's consider that 4 words will take 964 days according to you. So when we write a program to bruteforce the seed one program will take 964 days.
Now let's restrict that program to limited amount of combination and create another program with another set of combinations such that both programs have different set of combinations and are executed in parallel.
In this way both will be executed simultaneously and we will get the results in half of it's time meaning 964/2 = 482 days.

If we repeat this and create 10 such programs then the speed of results will be 10 times faster. Even if we don't run it in the same machine, we can run them in different machines.
So 964/10 = 96.4 days.

How do you find the above approach ?

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.
You can surely reduce it, by a lot. But still, brute forcing by not knowing 4 out of 12 words isn't meant to be found.

Well that's very true. Bruteforcing 4 words is completely waste of time unless we know a possible combination of words and the sequence in which those words might have been placed.
Otherwise it's completely a waster of time trying out so many combinations.
legendary
Activity: 952
Merit: 1385
February 11, 2021, 11:04:35 AM
#26
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Yes, I know the public address.

OK, I will try to do this tomorrow.

Did you have time to do that?
Thanks!


Yes, I have prepared a new release: https://github.com/PawelGorny/lostword/releases/tag/v0.8.0
Now it works with a candidates on the given positions, for example:
Code:
POOL
bc1q0v5q36eaculyrykjnjsyuey6ctd3802ft4jdcc
6
brother
window master canal cat
?
black master remove cat
pitch
hill
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
February 11, 2021, 04:14:08 AM
#25
~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! Shocked Wink
FWIW if I comment out the mnemonic to BIP32 seed part and just validate the checksums of each permutation in FinderOuter, it can perform ~7.2 million checks per second on my corei3 CPU.

I should point out that the reason why the overall mnemonic recovery option is slow is because the algorithm that comes after checksum validation is expensive (to derive the child keys), and also there is an issue with my implementation of Elliptic Curve Cryptography that puts a lot of pressure on GC slowing down the process.
newbie
Activity: 18
Merit: 1
February 11, 2021, 03:30:58 AM
#24
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.



I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

I couldn't understand the problem. Say you have 7.txt with 10 seed words. You just need two more words to make 12 word seed. After lotta combination and permutations and filtering bad seeds and getting final address and looking up on blockchain for utxo. Is that what you are doing?

There are couple of questions
1. Is each file corresponds to one wallet?
2. Do you know the positions of missing seed words in each file?
3. Are these 12 word mnemonic seeds or 15 or 24? (If they are 15 words or more and your answer to 2 is no then you can lose all hopes to crack them, most probably.)

If you can answer that, they it's kinda easy to comeup with bruteforce solutions...
HCP
legendary
Activity: 2086
Merit: 4361
February 11, 2021, 03:27:06 AM
#23
OOPS... I had accidentally left in the "to_seed()" call... which takes the mnemonic and generates the actual seed... including the computationally expensive code:
Code:
stretched = hashlib.pbkdf2_hmac(
    "sha512", mnemonic_bytes, passphrase_bytes, PBKDF2_ROUNDS
)

If I comment that part out... it runs "a little bit" faster Wink Roll Eyes

Quote
Start: 2021-02-11 21:13:07.520000
10000 Seeds 2021-02-11 21:13:10.346000
621 Valid Seeds 2021-02-11 21:13:10.346000
20000 Seeds 2021-02-11 21:13:13.092000
1204 Valid Seeds 2021-02-11 21:13:13.092000
30000 Seeds 2021-02-11 21:13:15.903000
1832 Valid Seeds 2021-02-11 21:13:15.903000
40000 Seeds 2021-02-11 21:13:18.769000
2442 Valid Seeds 2021-02-11 21:13:18.769000
50000 Seeds 2021-02-11 21:13:21.671000
3041 Valid Seeds 2021-02-11 21:13:21.671000
...

~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! Shocked Wink
legendary
Activity: 3472
Merit: 10611
February 10, 2021, 11:04:47 PM
#22
The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125
That's the problem with using a library to brute force, they are not designed to be the most efficient. They are just optimized to perform one operation in reasonable time. From a quick look this part is being repeated pointlessly and it has obvious expensive processes such as the memory lookup even if it is a binary search, the conversion between the words and bits, working with strings, unnecessary checks,...
https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L126-L156

What you want to do is to first convert all your words to their binary representation without having any memory issues.
13 file * 10 word each * 4 byte to hold bits = 520 byte total
Now you create the permutation of these bits and compute their hashes. Working with integers is significantly faster than working with strings.
HCP
legendary
Activity: 2086
Merit: 4361
February 10, 2021, 07:02:21 PM
#21
f you are ONLY validating the checksum of each permutation then your speed must be in the millions per second rate. 10k is way too low for that because essentially you are just computing SHA256 (for BIP39) or SHA512 (for Electrum) and your CPU is capable of computing millions per second of these hashes.
Which would probably be true if you had a pre-existing list of all the seed word permutations and/or seed value...

But I ran into memory issues attempting to pre-compute everything, so it has to generate each permutation, then attempt to convert it to a seed while checking the checksum etc... I'm sure the way the algorithm is generating the seed permuations is probably pretty poor. It's basically just iterating some lists.

The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125

But I'm sure there are probably ways to make that more efficient.


For what it's worth... the script finished overnight:
Quote
...
67100000 Seeds 2021-02-11 10:22:16.525000
1049432 Valid Seeds 2021-02-11 10:22:16.525000
End: 2021-02-11 10:22:17.004000
Total: 21:02:31.711000
21hrs and "found" 1,049,432 valid seeds Tongue


interestingly... the last 12.5 million or so seed permutations didn't create any valid seeds:
Quote
...
54520000 Seeds 2021-02-11 10:10:36.431000
1049059 Valid Seeds 2021-02-11 10:10:36.431000
54530000 Seeds 2021-02-11 10:11:02.463000
1049432 Valid Seeds 2021-02-11 10:11:02.463000
54540000 Seeds 2021-02-11 10:11:02.990000
1049432 Valid Seeds 2021-02-11 10:11:02.990000
54550000 Seeds 2021-02-11 10:11:03.514000
1049432 Valid Seeds 2021-02-11 10:11:03.514000
54560000 Seeds 2021-02-11 10:11:04.035000
...
I guess the limited number of words available in position 12 just didn't make any valid checksums with the other available words!
newbie
Activity: 47
Merit: 0
February 10, 2021, 02:17:22 PM
#20
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Yes, I know the public address.

OK, I will try to do this tomorrow.

Did you have time to do that?
Thanks!
newbie
Activity: 47
Merit: 0
February 10, 2021, 02:08:13 PM
#19
If I'm understanding what you're writing... you have something like:

- possible_word1(list of ~4-10 words)
- possible_word2(list of ~4-10 words)
- possible_word3(list of ~4-10 words)
...
- possible_word12(list of ~4-10 words)

and from the sounds of it, you actually 13 words? so, this is the "old" electrum seed format? Huh

It could also be a BIP39 seed with a one-word BIP38 password.  OP did not clarify if each file corresponds to a word in the seed so it's possible that the order is also unknown and this is no different from having one file full of words.

Actually, we don't even know which wordlist is used which makes a big difference if the seed phrase is for a custom wordlist (otherwise I would not see the point of having a file full of BIP39 or Electrum words  Huh)
 
With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits... I'm sure it's in the realms of reality to be able to do it within a matter of days? Huh It really depends on how big your search space is... what are the exact number of words you have in each position? Huh

Ultra modern processors (AMD Ryzen/ anything using Zen microarchitecture and Intel ice lake 10xxx and later) have hardware accelerated SHA256 instructions which you can call from C using the __asm__ keyword: SHA256RNDS2, SHA256MSG1 and SHA256MSG2.

The seed is 13 words from the English dictionary of electrum 2.0.3 version.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
February 10, 2021, 12:25:23 AM
#18
If I'm understanding what you're writing... you have something like:

- possible_word1(list of ~4-10 words)
- possible_word2(list of ~4-10 words)
- possible_word3(list of ~4-10 words)
...
- possible_word12(list of ~4-10 words)

and from the sounds of it, you actually 13 words? so, this is the "old" electrum seed format? Huh

It could also be a BIP39 seed with a one-word BIP38 password.  OP did not clarify if each file corresponds to a word in the seed so it's possible that the order is also unknown and this is no different from having one file full of words.

Actually, we don't even know which wordlist is used which makes a big difference if the seed phrase is for a custom wordlist (otherwise I would not see the point of having a file full of BIP39 or Electrum words  Huh)
 
With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits... I'm sure it's in the realms of reality to be able to do it within a matter of days? Huh It really depends on how big your search space is... what are the exact number of words you have in each position? Huh

Ultra modern processors (AMD Ryzen/ anything using Zen microarchitecture and Intel ice lake 10xxx and later) have hardware accelerated SHA256 instructions which you can call from C using the __asm__ keyword: SHA256RNDS2, SHA256MSG1 and SHA256MSG2.
legendary
Activity: 3472
Merit: 10611
February 09, 2021, 11:38:32 PM
#17
So, at ~10k seeds/minute... maybe ~1677 minutes to go through the full search space (only 4 words per slot, 12 slots)... which is around 28hours. Undecided

The main slow downs will be:
- More words in a given position... that's going to ramp up the total search space
- Actually generating private keys/addresses to check, as these are computationally a lot slower than just checking if a seed is valid.

With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits...
If you are ONLY validating the checksum of each permutation then your speed must be in the millions per second rate. 10k is way too low for that because essentially you are just computing SHA256 (for BIP39) or SHA512 (for Electrum) and your CPU is capable of computing millions per second of these hashes. Translating your code to C shouldn't change much about its speed because the slowness is most probably in the algorithm you are using not the language.
HCP
legendary
Activity: 2086
Merit: 4361
February 09, 2021, 07:35:48 PM
#16
If I'm understanding what you're writing... you have something like:

- possible_word1(list of ~4-10 words)
- possible_word2(list of ~4-10 words)
- possible_word3(list of ~4-10 words)
...
- possible_word12(list of ~4-10 words)

and from the sounds of it, you actually 13 words? so, this is the "old" electrum seed format? Huh

Anyway, assuming just 12 words for now, your possible seed combinations are: #word1_words * #word2_words * #word3_words * ... * #word12_words. Doesn't sound like much, until you do the math... even with "best case" of just 4 possible words in each position... that's 4^12 or 16,777,216 permutations.

Rough experience tells me that around ~6% of those will be "valid" (ie. the checksum matches)... so you're looking at something like ~1,006,632 "valid" seeds that you'd need to check for a match against your public address.


Anyway, I wrote a quick python script that takes in 12 files named 1.txt, 2.txt, 3.txt, ..., 12.txt (for my demo, each file has only 4 words) and then just iterates through the words in each position... it generates over 10,000 seeds/minute... and is finding (as expected) slightly more than 6% of those to be "valid".
Quote
Start: 2021-02-10 13:19:45.293000
10000 Seeds 2021-02-10 13:20:29.445000
621 Valid Seeds 2021-02-10 13:20:29.445000
20000 Seeds 2021-02-10 13:21:11.167000
1204 Valid Seeds 2021-02-10 13:21:11.167000
30000 Seeds 2021-02-10 13:21:55.523000
1832 Valid Seeds 2021-02-10 13:21:55.523000
40000 Seeds 2021-02-10 13:22:38.955000
2442 Valid Seeds 2021-02-10 13:22:38.955000
50000 Seeds 2021-02-10 13:23:23.073000
3041 Valid Seeds 2021-02-10 13:23:23.073000
60000 Seeds 2021-02-10 13:24:08.397000
3670 Valid Seeds 2021-02-10 13:24:08.397000
70000 Seeds 2021-02-10 13:24:53.846000
4308 Valid Seeds 2021-02-10 13:24:53.846000
80000 Seeds 2021-02-10 13:25:37.352000
4911 Valid Seeds 2021-02-10 13:25:37.352000
90000 Seeds 2021-02-10 13:26:23.622000
5543 Valid Seeds 2021-02-10 13:26:23.622000
100000 Seeds 2021-02-10 13:27:09.700000
6180 Valid Seeds 2021-02-10 13:27:09.700000
...

So, at ~10k seeds/minute... maybe ~1677 minutes to go through the full search space (only 4 words per slot, 12 slots)... which is around 28hours. Undecided

The main slow downs will be:
- More words in a given position... that's going to ramp up the total search space
- Actually generating private keys/addresses to check, as these are computationally a lot slower than just checking if a seed is valid.

With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits... I'm sure it's in the realms of reality to be able to do it within a matter of days? Huh It really depends on how big your search space is... what are the exact number of words you have in each position? Huh
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
February 08, 2021, 11:32:27 PM
#15
I don't see an option to use them in FinderOuter, any idea?
The option is found under "Missing mnemonic" and you have to select the "Electrum" type from the second drop  down on the right, then select the wallet type that Electrum was using from the drop down that appears after that. Make sure to enter the correct BIP32 path. Example 6 and 7 are Electrum mnemonics missing 1 and 2 words respectively.


A couple of points:
* Electrum mnemonics are 12 words by default
* You have to know the position of the missing word. But if it is only 1 word you can manually change the position from start to end and click Find 12 times, it should all take 12 seconds.
* You have to know a derived child from that wallet (it can be a child private key, public key or an address)
* Setting the path correctly is very important. When setting the wallet type (Electrum mnemonic type) FinderOuter automatically sets the correct path for that type but you have to enter the index of the child key you entered (eg. if you entered the 10th address you have to add 9 to the end of the path)
legendary
Activity: 952
Merit: 1385
February 08, 2021, 05:31:59 PM
#14
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Yes, I know the public address.

OK, I will try to do this tomorrow.
newbie
Activity: 47
Merit: 0
February 08, 2021, 05:31:12 PM
#13
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Yes, I know the public address.
legendary
Activity: 952
Merit: 1385
February 08, 2021, 05:22:58 PM
#12
Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.
member
Activity: 180
Merit: 38
February 08, 2021, 05:20:40 PM
#11
Well you could do it in Linux but i'm using JavaScript mostly.
You can also use NodeJS or Python.
Node is build on Chrome's V8 JavaScript engine. 

You can also just type the words into a hardcoded array that would be the easiest way so you can omit the filereader.
Then you can use a library like bitcoinjs or jsbtc either one will work.
All the functions are readily available.

Here are some examples for bitcoinjs-lib: https://github.com/bitcoinjs/bip39 <--- i prefer this one
You can see here some examples for jsbtc: https://github.com/bitaps-com/jsbtc/blob/master/test/jsbtc.test.js
newbie
Activity: 47
Merit: 0
February 08, 2021, 05:09:07 PM
#10
Here is how i do.
Put all words in a text file and load it into the filereader.
Split() strip() and trim() into array.
Then you rotate the words randomly and validate the resulting mnemonic.
If true verify it to the target address.

https://i.ibb.co/16k8tBb/mnemonics.png

I thought about doing that, but I don't know how to do it (newbie in Linux..).
1. How to make a combined text to be checked? note that I have 13 text files, each text file has a different word list.
2. How to check the text with all the possible seeds?
Pages:
Jump to: