Author

Topic: What's the expected speed for BIP39 passphrase recovery using CPU? (Read 273 times)

legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
May I ask what did you user to write your code? I mean - which documentation, description of algorithm etc.?
There is no documentation, I make it up as I go. Basically I've already implemented a lot of cryptography algorithms (SHA2, SHA3, HMAC-SHA, scrypt, Blake, PBKDF2, etc.) so I am fairly familiar with how they work under the hood. All I do in FinderOuter is "messing" with the algorithm to increase the efficiency.

For example in BIP38, the first step is to use scrypt which uses PBKDF2 which uses HMACSHA256 to initially derive a 8192 byte long key. But HMAC-SHA-2 algorithms are designed in a way that the first block of each round is the same (inner and outer pads) so it can be pre-computed and that way we can easily skip 510 block compressions per password.
legendary
Activity: 952
Merit: 1386
May I ask what did you user to write your code? I mean - which documentation, description of algorithm etc.?
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
I finally managed to get some free time and after 7 days of nonstop work I finally implemented and optimized BIP38 password recovery option (21148ed, 9a270ff). I've decided to not implement AES and use .net implementation. I will probably add an alternative during the CPU intrinsic changes (v1.0+).

For now it lacks parallelism but it will be added before releasing FinderOuter 0.13.0.

The speed is 3.5 checks per second per thread on my core i3 6100. I would love to benchmark it on newer CPUs but I suppose a 12th generation Intel core-i9 should be able to compute 70 passwords/sec after parallelism is added. But that's a pure guess based on number of cores and CPU clock.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I'm not familiar with wallet file passwords but it sounds like something that would also include AES.

It does, but in the case of wallet passwords, the output of the PBKDF2 are the AES keys and IVs. The private keys which are generated later are encrypted with these, so there's no AES encryption/decryption involved during the key-stretching phase.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
For comparison, high-end GPUs can brute-force PBKDF2 (the wallet.dat password-to-keys stretching) at 4.5k-6k passes per second.
I'm not familiar with wallet file passwords but it sounds like something that would also include AES.

Your measure on CPU isn't so bad,
To be clear the speed above is not just PBKDF2, it is PBKDF2 + BIP32 KDF including EC mult + child Private to Public key + RIPEMD160 & SHA256 Hash (for address address comparison) among other smaller things.

Speed gains are mainly found by optimizing the SHA512 implementation you are using - often this involves writing it in C with asm directives. The rest of the KDF simply call a bunch of HMACs which themselves call SHA512 thousands of times.
Although that's true but I have to disagree here. I consider optimization like above to be the least effective and only to be used in the "last squeeze" step when we want to get every last bit of performance.
In other words optimization does not start from the code or language, it starts from the algorithm. For example in this case it doesn't matter much if SHA512 is optimized any more than it is now if the "HMACs are simply called", instead we can modify the algorithm to easily double the speed by avoiding 4,096 block compressions out of 8,192 each round, and that's just part of it.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
For comparison, high-end GPUs can brute-force PBKDF2 (the wallet.dat password-to-keys stretching) at 4.5k-6k passes per second. Your measure on CPU isn't so bad, it just shows that PBKDF2 can't really be accelerated much using accelerated floating-point units like GPUs or FPGAs. i.e. it doesn't scale well vertically, only horizontally with more devices.

Speed gains are mainly found by optimizing the SHA512 implementation you are using - often this involves writing it in C with asm directives. The rest of the KDF simply call a bunch of HMACs which themselves call SHA512 thousands of times. This paper is a good start: https://core.ac.uk/download/pdf/186473296.pdf
HCP
legendary
Activity: 2086
Merit: 4363
Honestly, those numbers really do seem surprisingly "good"... I guess that is not entirely unexpected given that, while it's not explicitly stated in the BIP itself, I vaguely recall reading somewhere that they were trying to balance "cost" with "user experience".

The idea being that it would "slow" attempts to bruteforce, without making an enduser have to wait for stupid amounts of time while just trying to use the system "normally"... remembering that regardless of whether or not a user explicitly specifies a passphrase, there is still a default "passphrase" value used.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
3rdIteration have few benchmark on BIP 39,
https://github.com/3rdIteration/BIP39-Passphrase-Testing
https://github.com/3rdIteration/btcrecover/blob/master/docs/GPU_Acceleration.md

It uses different metric with different detail, so i've no idea whether it's faster than yours.
Thanks, that is very useful. I think the values can be converted.
From the looks of it this is brute forcing the passphrase assuming it consists of words (international+accessories+university), FinderOuter currently has only one option and it assumes your passphrase is a random set of characters (H?s=3) which would make a very small difference in my implementation; and the rest is the same. Basically how the initial data for inner SHA512 compression is set has to be changed to copy bytes from a predefined array (won't be expensive).

So the first picture here with corei5-3380 translates to 625 pass/sec and second to 1218 pass/sec. If my understanding is correct then I'm quite happy with my ~1500 pass/sec on corei3-6100.

The only thing I had to adjust was the BIP32 path which slows things down a bit since it has 2 extra indexes compared to my initial benchmark. I believe that ETH addresses are less expensive to compute since they use Keccak256 instead of RIPEMD160 of SHA256, ECC is the same since it uses the same curve as bitcoin.
17,576 pass in 12 sec is 1,464 pass/sec. The rate will be the same regardless of the passphrase length.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
I recently added the BIP39 passphrase recovery option to FinderOuter and am now working on its parallelism, while testing the speed I'm getting a surprisingly low speed of 1500 pass/sec. Although my CPU is old and it is expected to see a very low speed due to the algorithm being "expensive" but I'm wondering how does this compare to other tools out there.

3rdIteration have few benchmark on BIP 39,
https://github.com/3rdIteration/BIP39-Passphrase-Testing
https://github.com/3rdIteration/btcrecover/blob/master/docs/GPU_Acceleration.md

It uses different metric with different detail, so i've no idea whether it's faster than yours.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
@ranochigo is correct, but the main reason for BIP38 slowness is using a much more expensive key derivation function called scrypt which also unlike PBKDF2 can't be exploited for optimization. On top of that the costparam of 16384 is meant to make it even more expensive.
For comparison, on an un-specialized code scrypt can generate 12 keys per second on my very old CPU. This also doesn't include AES and the extra steps of BIP38.

Although I haven't started on BIP38 option yet, I believe there are ways to optimize it. With a quick look at bip38-cracker project on GitHub the code doesn't seem to be "specialized" for this matter. Hopefully for v0.13.0.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
Have you seen I'm BIP38 curious, please help me out!?
I can try ~20 passwords per second with my current setup
The fastest cracker we have, Dirbaio, can do 20 tries/second.
Actually there was this other repo that was linked earlier ( https://github.com/cscott/bip38-cracker ) that is quite faster, probably because it uses scrypt-jane. ~

It took ~20 hours on three n1-highcpu-16 machines on Google Compute. Each one did ~50 passwords per second, 150 total.
It cost around $38 overall.
BIP38 isn't the same as BIP39. HMAC-SHA512 is involved with the hashing of the mnemonic. Passphrase for BIP39 is included in the salt of that.

BIP38 encrypts the string.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'm getting a surprisingly low speed of 1500 pass/sec.
That's already a lot more than I expected.

Have you seen I'm BIP38 curious, please help me out!?
I can try ~20 passwords per second with my current setup
The fastest cracker we have, Dirbaio, can do 20 tries/second.
Actually there was this other repo that was linked earlier ( https://github.com/cscott/bip38-cracker ) that is quite faster, probably because it uses scrypt-jane. ~

It took ~20 hours on three n1-highcpu-16 machines on Google Compute. Each one did ~50 passwords per second, 150 total.
It cost around $38 overall.



BIP38 isn't the same as BIP39.
You're absolutely right. Shame on me for confusing those. I had too much on my mind this morning.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
If you have used any BIP39 passphrase (extra/extension words) recovery tools using your CPU please let me know what was the speed you got, preferably in terms of pass/second.

I recently added the BIP39 passphrase recovery option to FinderOuter and am now working on its parallelism, while testing the speed I'm getting a surprisingly low speed of 1500 pass/sec. Although my CPU is old and it is expected to see a very low speed due to the algorithm being "expensive" but I'm wondering how does this compare to other tools out there.
This is surprising because I'm already exploiting PBKDF2 weaknesses and reducing the work by almost 45% so I don't think there is that much room left for further optimization.
Jump to: