BIP38 python problem - page 4. | Bitcointalksearch.org

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: n0nce on September 03, 2021, 05:18:23 PM

I would like to inform you that a BIP is a Bitcoin Improvement Proposal, so more like a protocol specification, and does not need a reference implementation in any language, especially not in Python, since Bitcoin Core is mostly written in C and C++.
Like, do you expect every BIP that is 'taken seriously' to have a reference implementation in every programming language? ;)

You are confusing the "reference implementation of Bitcoin" with the "reference implementation of BIPs". They are not the same, so a BIP's reference implementation doesn't have to be in C++. In fact, it should be in the language the author of the BIP is most familiar with, so that they can write the most readable, bug-free code possible.

Implementations in other languages are usually written by other developers who volunteer, so that readers of the BIP can choose whatever language helps them understand the algorithm better. If a BIP doesn't have other implementations, it just means there haven't been any volunteers yet, and you can write one if you want to contribute.

n0nce

hero member

Activity: 924

Merit: 5950

not your keys, not your coins!

Quote from: larry_vw_1955 on July 25, 2021, 09:24:01 PM

I guess the real problem, and let me get on my soapbox here, is that BIP38 wasn't taken very seriously, and thus they don't even have a standard reference implementation for current versions of Python lol.

I would like to inform you that a BIP is a Bitcoin Improvement Proposal, so more like a protocol specification, and does not need a reference implementation in any language, especially not in Python, since Bitcoin Core is mostly written in C and C++.
Like, do you expect every BIP that is 'taken seriously' to have a reference implementation in every programming language? ;)

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: pooya87 on August 06, 2021, 10:48:05 PM

Well, that is something that someone who knows Python and can see all your code can answer. I don't know Python; write it in C#, show me the code, and I'll tell you where you went wrong :P

Everything seems to be working now, including that Unicode test vector. It's working! :o

Easteregg69

sr. member

Activity: 1579

Merit: 267

\u03D2\u0301\u0000\U00010400\U0001F4A9

Superstring in the wild if you ask me.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: NotATether on August 07, 2021, 08:31:19 AM

Break down the problem and test the Unicode NFC normalization functionality independently from the AES unit test.

The Unicode NFC normalization is handled by a Python package.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: bob123 on August 07, 2021, 07:55:52 AM

Then you should not implement it yourself. Especially given that you don't really have a clue what you are doing.

I already have implemented it myself. It's just not working with one of the test vectors that's all.

Quote

No offense here, but if you have to ask what AES means or what an initialization vector is, you shouldn't implement this yourself.
Either use an existing library or pay someone who understands what he is doing to create such a tool for you.

Yeah, I know. But I'm trying to figure out why that one test vector isn't working while all the other ones are. And I highly suspect it has to do with the fact that the test vector uses a Unicode passphrase! Here, take a look:

\u03D2\u0301\u0000\U00010400\U0001F4A9

Then they said something even more confusing:


Note: The non-standard UTF-8 characters in this passphrase should be NFC normalized to result in a passphrase of 0xcf9300f0909080f09f92a9 before further processing

Whoever wrote that BIP really must have been high as a kite. Because I'm telling you, it doesn't make any sense at all. It's like they're speaking in a different language or something.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: larry_vw_1955 on August 06, 2021, 02:59:52 AM

I need it to work for all possible things it's supposed to including unicode strings of arbitrary complexity.

Break down the problem and test the Unicode NFC normalization functionality independently from the AES unit test.

The AES test should not care whether the input is valid ASCII/Windows-1252/Latin-1/UTF-8/16/32/"Unicode" (you should not call character input that), because AES operates at the byte level. Therefore, it makes sense for AES test vectors to be completely random bytes.
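To make that concrete, here is a known-answer test that feeds AES nothing but raw bytes. It is a toy, pure-Python AES-256 single-block encryptor written only for this illustration: the function names are hypothetical, it is not constant-time, and it is not meant for real keys. In practice a vetted library (for example pycryptodome, which comes up elsewhere in this thread) is the better choice. The check uses the FIPS-197 Appendix C.3 vector, and encrypting each 16-byte block on its own like this, with no IV and no padding, is what the ECB settings discussed in this thread amount to for BIP38's two blocks.

```python
# Toy AES-256 single-block encryption (FIPS-197), for learning only:
# not constant-time, not audited, not for real keys.


def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox() -> list:
    """Derive the S-box: multiplicative inverse, then the AES affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_256(key: bytes) -> list:
    """AES-256 key schedule: 60 four-byte words, i.e. 15 round keys."""
    if len(key) != 32:
        raise ValueError("AES-256 needs exactly 32 key bytes (no padding)")
    words = [list(key[i:i + 4]) for i in range(0, 32, 4)]
    rcon = 1
    for i in range(8, 60):
        temp = list(words[i - 1])
        if i % 8 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord + SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        elif i % 8 == 4:
            temp = [SBOX[b] for b in temp]  # extra SubWord used only by 256-bit keys
        words.append([a ^ b for a, b in zip(words[i - 8], temp)])
    return words


def aes256_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Encrypt exactly one 16-byte block: no IV, no padding, no chaining (ECB per block)."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    words = expand_key_256(key)
    state = list(block)  # column-major: byte (row r, column c) is state[4*c + r]

    def add_round_key(rnd: int) -> None:
        for c in range(4):
            for r in range(4):
                state[4 * c + r] ^= words[4 * rnd + c][r]

    add_round_key(0)
    for rnd in range(1, 15):  # AES-256 has 14 rounds
        state[:] = [SBOX[b] for b in state]  # SubBytes
        # ShiftRows: row r is rotated left by r positions
        state[:] = [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]
        if rnd != 14:  # MixColumns is skipped in the final round
            for c in range(4):
                a0, a1, a2, a3 = state[4 * c:4 * c + 4]
                state[4 * c:4 * c + 4] = [
                    gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                    a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                    a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                    gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
                ]
        add_round_key(rnd)
    return bytes(state)


# Known-answer test from FIPS-197 Appendix C.3: raw bytes in, raw bytes out.
key = bytes(range(32))
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
print(aes256_encrypt_block(key, plaintext).hex())  # FIPS-197 lists 8ea2b7ca516745bfeafc49904b496089
```

Feeding the same function derivedhalf2 as the key and bitcoinprivkey[0...15] xor derivedhalf1[0...15] as the block, using the values from larry_vw_1955's TestingOneTwoThree log, should reproduce encryptedhalf1. That is a quick way to check the AES step in isolation from the passphrase handling.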

bob123

legendary

Activity: 1624

Merit: 2509

Quote from: larry_vw_1955 on August 06, 2021, 02:59:52 AM

... maybe some people are happy enough when it works with passphrases like "satoshi" but I need it to work for all possible things it's supposed to including unicode strings of arbitrary complexity.

Then you should not implement it yourself. Especially given that you don't really have a clue what you are doing.

No offense here, but if you have to ask what AES means or what an initialization vector is, you shouldn't implement this yourself.
Either use an existing library or pay someone who understands what he is doing to create such a tool for you.

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: larry_vw_1955 on August 06, 2021, 02:59:52 AM

Well, how come when I normalized it in NFC form and then UTF-8 encoded it, it didn't work, whereas all the other test vectors did?

Well, that is something that someone who knows Python and can see all your code can answer. I don't know Python; write it in C#, show me the code, and I'll tell you where you went wrong :P

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: pooya87 on August 05, 2021, 10:21:33 PM

Quote from: ?? on ??

Once again, the lack of explanation, documentation and thoroughness in things rears its ugly head.

And once again, the BIP is not supposed to explain everything, especially the basic stuff. For example, it didn't explain how to do Base58 encoding; you either already knew that or, if you didn't, you looked for a document explaining it somewhere else.
String normalization is the same. You should just google how it works. Programming languages usually provide an easy way to do it; for instance, in C# we simply call the string method Normalize(mode).
https://en.wikipedia.org/wiki/Unicode_equivalence
https://stackoverflow.com/questions/47094155/how-to-normalize-python-3-unicode-string

Well, how come when I normalized it in NFC form and then UTF-8 encoded it, it didn't work, whereas all the other test vectors did? Luckily they included a test vector that uses Unicode; otherwise I would have thought it worked 100%, that is until ... maybe some people are happy enough when it works with passphrases like "satoshi", but I need it to work for all possible things it's supposed to, including Unicode strings of arbitrary complexity.

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: ?? on ??

Once again, the lack of explanation, documentation and thoroughness in things rears its ugly head.

And once again, the BIP is not supposed to explain everything, especially the basic stuff. For example, it didn't explain how to do Base58 encoding; you either already knew that or, if you didn't, you looked for a document explaining it somewhere else.
String normalization is the same. You should just google how it works. Programming languages usually provide an easy way to do it; for instance, in C# we simply call the string method Normalize(mode).
https://en.wikipedia.org/wiki/Unicode_equivalence
https://stackoverflow.com/questions/47094155/how-to-normalize-python-3-unicode-string

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: pooya87 on August 04, 2021, 11:52:13 PM

Quote from: ?? on ??

I think it may be a bug in Python :o

There is no bug in Python; the bug is in your usage of AES. You are using one of the modes that does not encrypt each block individually, so when you encrypt the first block (first 16 bytes) you get the correct result, but when you encrypt the second block (second 16 bytes) you get the wrong result. Then you concatenate the two parts and encode them with Base58, so your string ends up looking like that.
Use the mode I told you about above (ECB) and it should fix your issue.
Learn about AES modes here: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_(ECB)

You can see this in the data in your Base58 string; the last part, the second block you encrypt, is the one that differs:

Code:

0142-c0-e957a24a-d357fafb81c71f8375a9a4d0ac02bad5-f6c87c4b459fabe34c0c314b33708ec3
0142-c0-e957a24a-d357fafb81c71f8375a9a4d0ac02bad5-30d5c2d250fed0ce62b993841bb5ccac

Good call! It was just a typo.

I did:
Do AES256Encrypt(block = bitcoinprivkey[16...31] xor derivedhalf1[0...15], key = derivedhalf2), call the 16-byte result encryptedhalf2

when it was supposed to be:
Do AES256Encrypt(block = bitcoinprivkey[16...31] xor derivedhalf1[16...31], key = derivedhalf2), call the 16-byte result encryptedhalf2
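A minimal sketch of the corrected step 3 and step 4 inputs, assuming the private key and derivedhalf1 are already available as bytes (the helper names are hypothetical). Note that the spec's inclusive ranges [0...15] and [16...31] become the half-open Python slices [0:16] and [16:32]; keeping both slices side by side makes the typo above easy to spot. The values come from the TestingOneTwoThree log in this post.

```python
# Hypothetical helpers illustrating BIP38 steps 3 and 4 (the inputs to AES only).


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def aes_input_blocks(privkey: bytes, derivedhalf1: bytes) -> tuple:
    """Return the two 16-byte blocks that get AES-encrypted with derivedhalf2."""
    block1 = xor_bytes(privkey[0:16], derivedhalf1[0:16])    # bitcoinprivkey[0...15] xor derivedhalf1[0...15]
    block2 = xor_bytes(privkey[16:32], derivedhalf1[16:32])  # bitcoinprivkey[16...31] xor derivedhalf1[16...31]
    return block1, block2


# Values from the TestingOneTwoThree (uncompressed) log in this post.
privkey = bytes.fromhex("CBF4B9F70470856BB4F40F80B87EDB90865997FFEE6DF315AB166D713AF433A5")
derivedhalf1 = bytes.fromhex("f87648a6b42fdd86ef6837a249cde15318f264d43a859b610e78ea63d51cb2d3")
block1, block2 = aes_input_blocks(privkey, derivedhalf1)

# As big-endian integers these correspond to the data= and data2= lines of the log.
print(int.from_bytes(block1, "big"))
print(int.from_bytes(block2, "big"))
```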



Passphrase: TestingOneTwoThree
Encrypted key: 6PRVWUbkzzsbcVac2qwfssoUJAN1Xhrg6bNk8J7Nzm5H7kxEbn2Nh2ZoGg
WIF key: 5KN7MzqK5wt2TP1fQCYyHBtDrXdJuXbUzm4A9rKAteGu3Qi5CVR
HEX key: CBF4B9F70470856BB4F40F80B87EDB90865997FFEE6DF315AB166D713AF433A5
Computed BTC address: 1Jq6MksXQVWzrznvZzxkV6oY57oWXD9TXB
Encryption steps: 


1. Compute the Bitcoin address (ASCII), and take the first four bytes of SHA256(SHA256()) of it. Let's call this addresshash.
addresshash is  b'\xe9W\xa2J'

Function name: normalize_string using unicodedata.normalize package
-------------------------

2. Derive a key from the passphrase using scrypt
Parameters: passphrase is the passphrase itself encoded in UTF-8 and normalized using Unicode Normalization Form C (NFC). salt is addresshash from the earlier step, n=16384, r=8, p=8, length=64 (n, r, p are provisional and subject to consensus)
scrypt =  b'\xf8vH\xa6\xb4/\xdd\x86\xefh7\xa2I\xcd\xe1S\x18\xf2d\xd4:\x85\x9ba\x0ex\xeac\xd5\x1c\xb2\xd3\xe6\x0b\xf4K\xfb)\xd5C\xbb\xa2J\xfc\xcc\xfa\xdb\xfcn\xf91/\xcc\xcfX\x9f\xa5\xea\x13f\xec!\xe4\xc0'
scrypt_hash_hex =  f87648a6b42fdd86ef6837a249cde15318f264d43a859b610e78ea63d51cb2d3e60bf44bfb29d543bba24afcccfadbfc6ef9312fcccf589fa5ea1366ec21e4c0
Let's split the resulting 64 bytes in half, and call them derivedhalf1 and derivedhalf2.
derivedhalf1 =  b'\xf8vH\xa6\xb4/\xdd\x86\xefh7\xa2I\xcd\xe1S\x18\xf2d\xd4:\x85\x9ba\x0ex\xeac\xd5\x1c\xb2\xd3'
derivedhalf2 =  b'\xe6\x0b\xf4K\xfb)\xd5C\xbb\xa2J\xfc\xcc\xfa\xdb\xfcn\xf91/\xcc\xcfX\x9f\xa5\xea\x13f\xec!\xe4\xc0'

3. Do AES256Encrypt(block = bitcoinprivkey[0...15] xor derivedhalf1[0...15], key = derivedhalf2), call the 16-byte result encryptedhalf1
key =  b'\xe6\x0b\xf4K\xfb)\xd5C\xbb\xa2J\xfc\xcc\xfa\xdb\xfcn\xf91/\xcc\xcfX\x9f\xa5\xea\x13f\xec!\xe4\xc0'
data=  68470520909420510445936813703370586819
encryptedhalf1=  b'\xd3W\xfa\xfb\x81\xc7\x1f\x83u\xa9\xa4\xd0\xac\x02\xba\xd5'

4. Do AES256Encrypt(block = bitcoinprivkey[16...31] xor derivedhalf1[16...31], key = derivedhalf2), call the 16-byte result encryptedhalf2
data2=  210910838195062625062692659411739509110
encryptedhalf2=  b'\xf6\xc8|KE\x9f\xab\xe3L\x0c1K3p\x8e\xc3'

5. The encrypted private key is the Base58Check-encoded concatenation of the following, which totals 39 bytes without Base58 checksum: 0x01 0x42 + flagbyte + salt + encryptedhalf1 + encryptedhalf2
Using uncompressed address
object_id_prefix=  b'\x01B'
flagbyte_byte=  b'\xc0'
salt=addresshash= b'\xe9W\xa2J'
private_key_encrypted_bytes=  b'\x01B\xc0\xe9W\xa2J\xd3W\xfa\xfb\x81\xc7\x1f\x83u\xa9\xa4\xd0\xac\x02\xba\xd5\xf6\xc8|KE\x9f\xab\xe3L\x0c1K3p\x8e\xc3'

Base 58 encoded encrypted private key 
6PRVWUbkzzsbcVac2qwfssoUJAN1Xhrg6bNk8J7Nzm5H7kxEbn2Nh2ZoGg
58

Known encrypted key 
6PRVWUbkzzsbcVac2qwfssoUJAN1Xhrg6bNk8J7Nzm5H7kxEbn2Nh2ZoGg
58
SUCCESS keys match


Passphrase: TestingOneTwoThree
Encrypted key: 6PYNKZ1EAgYgmQfmNVamxyXVWHzK5s6DGhwP4J5o44cvXdoY7sRzhtpUeo
WIF key: L44B5gGEpqEDRS9vVPz7QT35jcBG2r3CZwSwQ4fCewXAhAhqGVpP
HEX key: CBF4B9F70470856BB4F40F80B87EDB90865997FFEE6DF315AB166D713AF433A5
Computed BTC address: 164MQi977u9GUteHr4EPH27VkkdxmfCvGW
Encryption steps: 


1. Compute the Bitcoin address (ASCII), and take the first four bytes of SHA256(SHA256()) of it. Let's call this addresshash.
addresshash is  b'C\xbeAy'

Function name: normalize_string using unicodedata.normalize package
-------------------------

2. Derive a key from the passphrase using scrypt
Parameters: passphrase is the passphrase itself encoded in UTF-8 and normalized using Unicode Normalization Form C (NFC). salt is addresshash from the earlier step, n=16384, r=8, p=8, length=64 (n, r, p are provisional and subject to consensus)
scrypt =  b's\x1e\xf3\xc77\xb5]\xf4\x99\x8bD\xfa\x8aTz?8\xdfBM\xa2@\xde8\x9b\x11\xd1\x87[\xa4wg//\xe8\x1b\x052\xb5\x95\x0e>\xa6\xff\xf9,e\xd4g\xaa}\x05Ii\x82\x1d\xe24Oz\x86\xd4%i'
scrypt_hash_hex =  731ef3c737b55df4998b44fa8a547a3f38df424da240de389b11d1875ba477672f2fe81b0532b5950e3ea6fff92c65d467aa7d054969821de2344f7a86d42569
Let's split the resulting 64 bytes in half, and call them derivedhalf1 and derivedhalf2.
derivedhalf1 =  b's\x1e\xf3\xc77\xb5]\xf4\x99\x8bD\xfa\x8aTz?8\xdfBM\xa2@\xde8\x9b\x11\xd1\x87[\xa4wg'
derivedhalf2 =  b'//\xe8\x1b\x052\xb5\x95\x0e>\xa6\xff\xf9,e\xd4g\xaa}\x05Ii\x82\x1d\xe24Oz\x86\xd4%i'

3. Do AES256Encrypt(block = bitcoinprivkey[0...15] xor derivedhalf1[0...15], key = derivedhalf2), call the 16-byte result encryptedhalf1
key =  b'//\xe8\x1b\x052\xb5\x95\x0e>\xa6\xff\xf9,e\xd4g\xaa}\x05Ii\x82\x1d\xe24Oz\x86\xd4%i'
data=  245794453406607058042499921876840260015
encryptedhalf1=  b'p\xe4\xa0\x80_\x15\xa7~\xfcs\x8fy@h\xd8\x83'

4. Do AES256Encrypt(block = bitcoinprivkey[16...31] xor derivedhalf1[16...31], key = derivedhalf2), call the 16-byte result encryptedhalf2
data2=  253253421257611663847159858968675763394
encryptedhalf2=  b'|)\x85\xa6\x94_\x7f\xe0\xdb?u\xdc0^\xaf|'

5. The encrypted private key is the Base58Check-encoded concatenation of the following, which totals 39 bytes without Base58 checksum: 0x01 0x42 + flagbyte + salt + encryptedhalf1 + encryptedhalf2
Using compressed address
object_id_prefix=  b'\x01B'
flagbyte_byte=  b'\xe0'
salt=addresshash= b'C\xbeAy'
private_key_encrypted_bytes=  b'\x01B\xe0C\xbeAyp\xe4\xa0\x80_\x15\xa7~\xfcs\x8fy@h\xd8\x83|)\x85\xa6\x94_\x7f\xe0\xdb?u\xdc0^\xaf|'

Base 58 encoded encrypted private key 
6PYNKZ1EAgYgmQfmNVamxyXVWHzK5s6DGhwP4J5o44cvXdoY7sRzhtpUeo
58

Known encrypted key 
6PYNKZ1EAgYgmQfmNVamxyXVWHzK5s6DGhwP4J5o44cvXdoY7sRzhtpUeo
58
SUCCESS keys match

The good news is I got it working with "No compression, no EC multiply" test vectors 1 and 2, as well as their "Compression, no EC multiply" counterparts.

The bad news is it doesn't work with "No compression, no EC multiply" Test vector 3. That one uses a strange passphrase.

Passphrase ϓ␀𐐀💩 (\u03D2\u0301\u0000\U00010400\U0001F4A9; GREEK UPSILON WITH HOOK, COMBINING ACUTE ACCENT, NULL, DESERET CAPITAL LETTER LONG I, PILE OF POO)

along with the following note:

Note: The non-standard UTF-8 characters in this passphrase should be NFC normalized to result in a passphrase of 0xcf9300f0909080f09f92a9 before further processing

None of that really makes any sense to me. 0xcf9300f0909080f09f92a9 is a hexadecimal number, not a string. Once again, the lack of explanation, documentation and thoroughness in things rears its ugly head.
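The "0x..." value in that note is not meant to be used as a number: it is the expected UTF-8 byte sequence of the NFC-normalized passphrase, written out in hex. A minimal sketch with Python's standard unicodedata module shows the two steps (normalize the str first, then encode to bytes). One thing that may be worth double-checking, as an assumption rather than something the thread confirms was the problem: the escapes have to be interpreted by Python, as in the string literal below; the same text read from a file or via input() would contain literal backslashes and produce different bytes.

```python
import unicodedata

# BIP38 test vector 3 passphrase, written with Python escape sequences.
passphrase = "\u03D2\u0301\u0000\U00010400\U0001F4A9"

# NFC composes U+03D2 + U+0301 into the single code point U+03D3.
normalized = unicodedata.normalize("NFC", passphrase)
print([f"U+{ord(ch):04X}" for ch in normalized])  # ['U+03D3', 'U+0000', 'U+10400', 'U+1F4A9']

# The "0x..." in the BIP is just these UTF-8 bytes written in hex;
# this bytes object is what goes into scrypt as the password.
passphrase_bytes = normalized.encode("utf-8")
print(passphrase_bytes.hex())  # cf9300f0909080f09f92a9
```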

[moderator's note: consecutive posts merged]

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: ?? on ??

I think it may be a bug in Python :o

There is no bug in Python; the bug is in your usage of AES. You are using one of the modes that does not encrypt each block individually, so when you encrypt the first block (first 16 bytes) you get the correct result, but when you encrypt the second block (second 16 bytes) you get the wrong result. Then you concatenate the two parts and encode them with Base58, so your string ends up looking like that.
Use the mode I told you about above (ECB) and it should fix your issue.
Learn about AES modes here: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_(ECB)

You can see this in the data in your Base58 string; the last part, the second block you encrypt, is the one that differs:

Code:

0142-c0-e957a24a-d357fafb81c71f8375a9a4d0ac02bad5-f6c87c4b459fabe34c0c314b33708ec3
0142-c0-e957a24a-d357fafb81c71f8375a9a4d0ac02bad5-30d5c2d250fed0ce62b993841bb5ccac
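The dash-separated line above can be reproduced by Base58Check-decoding the known encrypted key and slicing the 39-byte payload at the field boundaries from step 5 of the spec (0x01 0x42 + flagbyte + salt + encryptedhalf1 + encryptedhalf2). A minimal sketch using only hashlib; the function names are hypothetical, and reading bit 0x20 of the flagbyte as "compressed" follows BIP38's flagbyte definition (0xc0 uncompressed, 0xe0 compressed in the logs in this thread).

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256d(data: bytes) -> bytes:
    """SHA256(SHA256(data)), used for the Base58Check checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58check_decode(s: str) -> bytes:
    """Decode a Base58Check string and verify its 4-byte checksum."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_zeros = len(s) - len(s.lstrip("1"))  # each leading '1' is a 0x00 byte
    raw = b"\x00" * leading_zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
    payload, checksum = raw[:-4], raw[-4:]
    if sha256d(payload)[:4] != checksum:
        raise ValueError("bad Base58Check checksum")
    return payload


def split_bip38_non_ec(payload: bytes) -> dict:
    """Split a 39-byte non-EC-multiply BIP38 payload into its fields."""
    if len(payload) != 39 or payload[:2] != b"\x01\x42":
        raise ValueError("not a non-EC-multiply BIP38 payload")
    flag = payload[2]
    return {
        "prefix": payload[0:2].hex(),
        "flagbyte": f"{flag:02x}",
        "compressed": bool(flag & 0x20),
        "salt": payload[3:7].hex(),  # the addresshash
        "encryptedhalf1": payload[7:23].hex(),
        "encryptedhalf2": payload[23:39].hex(),
    }


fields = split_bip38_non_ec(
    base58check_decode("6PRVWUbkzzsbcVac2qwfssoUJAN1Xhrg6bNk8J7Nzm5H7kxEbn2Nh2ZoGg")
)
order = ("prefix", "flagbyte", "salt", "encryptedhalf1", "encryptedhalf2")
print("-".join(fields[k] for k in order))  # same layout as the first line of the Code: block
```

Comparing the decoded fields of the known key with your own output, field by field, narrows a mismatch down to a single step.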

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: HCP on August 03, 2021, 07:18:52 PM

Unfortunately, most of the test vectors I've seen don't get as finely grained as the individual steps in an algorithm... instead they'll focus on the "initial input" and the "final output" :-/

In any case, you might want to have a look at this: https://github.com/steve-vincent/bip38

Granted, it's Python 2.7... but it's a "working" BIP38 python project and might offer some insights as a starting point.

Yeah, thanks for the suggestion about trying an older Python implementation. I was hoping not to have to install an old version of Python just to see how this thing works. That's a pretty sad state of affairs, but at least it might yield some better insight. Sigh, I really did not want to have to install an old version of Python lol.

Quote from: pooya87 on August 03, 2021, 11:19:05 PM

Technically since BIP38 is just using a couple of other functions you don't need the intermediary values for each step, the initial input and final output are more than enough. This is also why there aren't that many test vectors either.

Yeah, but how do you program it in Python? You see, Python did some crazy thing where they changed things a lot: there was a version 3 and then an old version 2, and they aren't even alike and ...

Quote from: NotATether on August 04, 2021, 06:40:57 AM

What the hay, maybe I'll just write the missing test vectors myself. I don't have much else to do this week and I've also seen stuff like BIP32 & BIP39 missing test vectors for some of their steps as well so it would be beneficial for the community. I might even be able to merge these by PR into the official BIP spec.

That would be very excellent, thanks.

[moderator's note: consecutive posts merged]

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: HCP on August 03, 2021, 07:18:52 PM

Unfortunately, most of the test vectors I've seen don't get as finely grained as the individual steps in an algorithm... instead they'll focus on the "initial input" and the "final output" :-/

In any case, you might want to have a look at this: https://github.com/steve-vincent/bip38

Granted, it's Python 2.7... but it's a "working" BIP38 python project and might offer some insights as a starting point.

What the hay, maybe I'll just write the missing test vectors myself. I don't have much else to do this week and I've also seen stuff like BIP32 & BIP39 missing test vectors for some of their steps as well so it would be beneficial for the community. I might even be able to merge these by PR into the official BIP spec.

pooya87

legendary

Activity: 3472

Merit: 10611

Technically since BIP38 is just using a couple of other functions you don't need the intermediary values for each step, the initial input and final output are more than enough. This is also why there aren't that many test vectors either.

But you can always create your own test vectors if you like: take any existing implementation that has been tested, add breakpoints at different lines (or print the values, if the language lets you), and then use those values.
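Along those lines, here is a minimal sketch that prints intermediate values for steps 1 and 2 (addresshash and the two scrypt halves) using only Python's hashlib. The helper names are hypothetical, and hashlib.scrypt assumes a Python build linked against OpenSSL 1.1 or newer. Its output can be compared line by line with the addresshash and derivedhalf lines in larry_vw_1955's TestingOneTwoThree log.

```python
import hashlib


def address_hash(address: str) -> bytes:
    """Step 1: first 4 bytes of SHA256(SHA256(address as ASCII))."""
    return hashlib.sha256(hashlib.sha256(address.encode("ascii")).digest()).digest()[:4]


def derive_halves(passphrase_bytes: bytes, salt: bytes) -> tuple:
    """Step 2: scrypt with n=16384, r=8, p=8, 64-byte output, split in two halves."""
    derived = hashlib.scrypt(passphrase_bytes, salt=salt, n=16384, r=8, p=8,
                             maxmem=64 * 1024 * 1024, dklen=64)
    return derived[:32], derived[32:]  # derivedhalf1, derivedhalf2 (the AES key)


# Test vector 1; the passphrase is plain ASCII, so NFC normalization changes nothing.
salt = address_hash("1Jq6MksXQVWzrznvZzxkV6oY57oWXD9TXB")
derivedhalf1, derivedhalf2 = derive_halves(b"TestingOneTwoThree", salt)
print("addresshash  =", salt.hex())
print("derivedhalf1 =", derivedhalf1.hex())
print("derivedhalf2 =", derivedhalf2.hex())
```

Because scrypt always returns exactly 64 bytes here, derivedhalf2 is always a 32-byte AES-256 key and never needs padding.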

HCP

legendary

Activity: 2086

Merit: 4363

Unfortunately, most of the test vectors I've seen don't get as finely grained as the individual steps in an algorithm... instead they'll focus on the "initial input" and the "final output" :-/

In any case, you might want to have a look at this: https://github.com/steve-vincent/bip38

Granted, it's Python 2.7... but it's a "working" BIP38 python project and might offer some insights as a starting point.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: NotATether on July 27, 2021, 11:09:49 AM

Don't try to code AES encrypt/decrypt calls on Python unless you have dozens of man-hours to spare on debugging. It's literally a pain in the !!! and I know that because I coded this sort of thing before.

Thanks for the advice, but it got to me too late. I've already spent way too many hours struggling with it, and the main problem I run into is that there's not a single test vector I can find that shows the output of each step of the algorithm, so I can't pinpoint where I'm going wrong. Can you imagine that? Not a single test vector that shows each step and its output. BIP38 is something else!

I mean, the spec seems innocent enough, the way they list it out step by step. Problem is, when you can't see an example it really sucks balls.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Don't try to code AES encrypt/decrypt calls on Python unless you have dozens of man-hours to spare on debugging. It's literally a pain in the !!! and I know that because I coded this sort of thing before.

First of all, if you don't have a fixed-width key, you have to pick a character to pad the key with, since for AES-256 it must be 32 bytes long, and then you have to somehow remove the padding when you decrypt (impossible to do if the key text is completely random, since no character is suitable for padding in the first place, not even \x00).

Then you have the issue of serializing this to bytes(), because the encryption function will choke on regular strings, and if you're not careful with encoding and use the wrong encode() and decode() codecs, you'll end up with ciphertext that the decryption function can't decrypt because some (maybe even just one) of the bytes are wrong.

You cannot just work with bytes either, because a large number of functions don't accept that type and require str instead, such as input() and, of course, padding (since you can't mix strings with bytes!).

This is complicated even further in Python 3, where str holds Unicode text rather than raw bytes, so every conversion to bytes needs an explicit encoding such as UTF-8.

Finally, there are a myriad of "pycrypto" clone libraries that all conflict with each other if they are installed in the same environment, causing ImportErrors and all kinds of weird bugs, which forces you to make a virtualenv for pycrypto work. Though I hear people are using the "pycryptodome" package nowadays.

larry_vw_1955

sr. member

Activity: 1190

Merit: 469

Quote from: pooya87 on July 25, 2021, 10:28:31 PM

What the quote means is that you use AES with the following settings:
key size = 256 (bit) (the derived key from scrypt)
mode = ECB (each block is encrypted individually)
IV = new byte[16] (empty initialization vector)
padding = none

I will definitely give that a try. Not sure why they couldn't make that clearer in the specification, though; I guess they expect mind readers to be looking over their docs. Oh, and thanks! I'll report back if I get it working.

Topic: BIP38 python problem - page 4. (Read 1052 times)