Author

Topic: Algorithm for going from a paper backup code to actual keys (Read 1516 times)

legendary
Activity: 1232
Merit: 1094
I consider it one of the reasons there's never been a money-destroying bug in Armory.

Fair enough.  The only situation where compressed keys would happen would be if they are generated by a non-Armory utility.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
It looks like Armory addresses use uncompressed keys when generating addresses. 

Wouldn't it be better to use compressed ones? 

It should definitely detect money sent to both (for example, if "someone" was generating the key sequences manually :p).

When I finally get around to updating the wallet format, I will support compressed public keys.  Despite its simplicity, Armory's wallet file format and code has been extremely well-tested and I didn't want to risk breaking something to make that update.    I consider it one of the reasons there's never been a money-destroying bug in Armory.

At the time that I started writing the wallet code, I didn't even know compressed public keys existed.  It was especially frustrating, because I went through the effort to implement a Satoshi-wallet-migration process into Armory, and then it became useless since Armory couldn't handle the private key format.  Oh well.
legendary
Activity: 1232
Merit: 1094
It looks like Armory addresses use uncompressed keys when generating addresses. 

Wouldn't it be better to use compressed ones? 

It should definitely detect money sent to both (for example, if "someone" was generating the key sequences manually :p).
legendary
Activity: 1232
Merit: 1094
You can get something really close to what you're looking for by:

(1) Go into "Expert" usermode
(2) Go into your wallet and click on "Addresses Used"
(3) Create 1,000 new addresses
(4) Go back to your wallet and "Backup Individual Keys"
(5) Click "Include Unused (Address Pool)"
(6) Mess with the options to get the display you want

I know that window doesn't have all the best options for displaying things, but it sounds like it has enough for you. 

Thanks, I will have a look.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I have written some java code to convert paper backup codes into a sequence of keys.

Would there be an easy way to generate a file with the 4 lines of text, the unique ID and then the first 1000 public keys (in address format)?

I did 2 "manually" and the first 5 addresses matched (it would be nice if you could copy/paste the 4 lines Smiley)

I could read in those files and then check that I generate the right sequence of public keys (and that the unique id is correct).

Ofc, simply tests that I generate a valid sequence of public/private pairs is enough to make sure most isn't lost.

You can get something really close to what you're looking for by:

(1) Go into "Expert" usermode
(2) Go into your wallet and click on "Addresses Used"
(3) Create 1,000 new addresses
(4) Go back to your wallet and "Backup Individual Keys"
(5) Click "Include Unused (Address Pool)"
(6) Mess with the options to get the display you want

I know that window doesn't have all the best options for displaying things, but it sounds like it has enough for you. 
legendary
Activity: 1232
Merit: 1094
I have written some java code to convert paper backup codes into a sequence of keys.

Would there be an easy way to generate a file with the 4 lines of text, the unique ID and then the first 1000 public keys (in address format)?

I did 2 "manually" and the first 5 addresses matched (it would be nice if you could copy/paste the 4 lines Smiley)

I could read in those files and then check that I generate the right sequence of public keys (and that the unique id is correct).

Ofc, simply tests that I generate a valid sequence of public/private pairs is enough to make sure most isn't lost.
legendary
Activity: 1232
Merit: 1094
Ahh, very cool.  Very similar to Shamir's Secret Sharing.  I implemented the finite-field math in Armory to do SSS but it's extremely ineffiicient and was more of an exercise in highly SLOC-efficient arbitrary-integer math in python (it's basically all list comprehensions and recursion... extremely compact).  It is in the "backupcenter" branch which will hopefuly be merged soon.  But I never quite figured out how to do it with arbitrary data, since I can't use 28 since it's not a prime.

I read a suggestion to use 9 bits per symbol.

You could have

64 bytes (since they are your data, 256 never occurs)
8 * 9 bits (so 9 bytes)

This would let you correct up to 4 symbol errors.  It would also let you correct up to 8 symbol errors, if you knew which 8.

You only need to count from

0x00 to 0xE0.  You would only need one extra symbol.  It would be illegal in the lower nibble and occur very rarely in the upper nibble (and again only in the CRC part).

Alternatively, if you increased your base to 23 symbol, the you could represent 0 to 528.  This would allow you to have a parity bit for every byte.

Illegal parity symbols could be immediately marked as "erased", which means that it counts as a known error (and you can have 8 of them).

Quote
Is there a composite prime (p*q) such that (p-1)*(q-1) == 256?

Finite fields work for 2^8, but you have to multiply polynomials.  I didn't look massively into it, but I think the concept of polynomial is important for finding where there errors are.

Having said that, it may not be worth the effort.  Detecting errors and correcting 2 symbol errors should be sufficient in practice, unless people are memorizing the key.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I know that in some variants/parameters of RS, you end up changing all the bytes, and you need an RS library to decode it.  I'd really prefer the code to simply be appended data (and I'm pretty sure there's a way to do it with RS).

I think the big benefit of RS is that it can correct runs of errors.  Things like parity checking can only handle a few bits at a time.

It is a finite field thing so it only works if you have a prime power as a mod (so your symbols could be base 2^8).  However, non-prime mods makes calculations harder.

You can do it without changing the data.  However, I am not sure if that breaks the error finding part of the correction.

The idea is that you create a polynomial that passes through all the points in your data (x, y) for (x = 0 to x = 63).  You then compute the value of the polynomial for x = 64 to 67, for 4 extra symbols.

When receiving you do the same thing.  If the extra bytes fall on the polynomial that shows there is no errors.

If 3 of the 4 are on the polynomial, then that means that one of them is probably wrong.  You have 67 out of 68 agreement on the answer.

If they are all wrong, then some of the errors are in the main data.

You then need to find which ones are wrong.  This is where all the various mappings come into it.

Ahh, very cool.  Very similar to Shamir's Secret Sharing.  I implemented the finite-field math in Armory to do SSS but it's extremely ineffiicient and was more of an exercise in highly SLOC-efficient arbitrary-integer math in python (it's basically all list comprehensions and recursion... extremely compact).  It is in the "backupcenter" branch which will hopefuly be merged soon.  But I never quite figured out how to do it with arbitrary data, since I can't use 28 since it's not a prime.

Is there a composite prime (p*q) such that (p-1)*(q-1) == 256?  Seems like that would work.  But I'm also not here to reinvent anything, I'm just curious for my own mathematical understanding (I'm not familar with RS at all before now).

EDIT: Of course there is:  p=q=17, leads to a finite field with modulus 289, and should have (17-1)*(17-1) = 256 elements. EDIT2: oops, p and q have to be different... nm
legendary
Activity: 1232
Merit: 1094
I know that in some variants/parameters of RS, you end up changing all the bytes, and you need an RS library to decode it.  I'd really prefer the code to simply be appended data (and I'm pretty sure there's a way to do it with RS).

I think the big benefit of RS is that it can correct runs of errors.  Things like parity checking can only handle a few bits at a time.

It is a finite field thing so it only works if you have a prime power as a mod (so your symbols could be base 2^8).  However, non-prime mods makes calculations harder.

You can do it without changing the data.  However, I am not sure if that breaks the error finding part of the correction.

The idea is that you create a polynomial that passes through all the points in your data (x, y) for (x = 0 to x = 63).  You then compute the value of the polynomial for x = 64 to 67, for 4 extra symbols.

When receiving you do the same thing.  If the extra bytes fall on the polynomial that shows there is no errors.

If 3 of the 4 are on the polynomial, then that means that one of them is probably wrong.  You have 67 out of 68 agreement on the answer.

If they are all wrong, then some of the errors are in the main data.

You then need to find which ones are wrong.  This is where all the various mappings come into it.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Indeed.  Two errors is kinda tough to deal with.  But that's why there's also a WalletID check after you're done entering stuff.  So even if Armory corrects wrong, the user manually error-checks the result.

By the way, this isn't exactly "CRC".  It's just the simple error correction used in addresses.   However, I'll be updating the backup stuff soon (feel free to checkout the backupcenter branch and test it for me!).  I don't see why I can't switch the error catching code.  I had wanted to do Reed-Solomon, but I didn't want to pull in a huge extra library for it (and I didn't find a lot of good ones in python).  RS in C++ would be easy to pull in through SWIG, so that would work if I needed it.    My only requirement for the error-correcting code is that the original data be unmodified.  I know that in some variants/parameters of RS, you end up changing all the bytes, and you need an RS library to decode it.  I'd really prefer the code to simply be appended data (and I'm pretty sure there's a way to do it with RS).
legendary
Activity: 1232
Merit: 1094
I was looking into writing a decoder and was going to extend it to handle 2 character errors.

Select 2 characters (36 * 35 options)
Set those characters to anything (15 * 15 options)

This gives 283500 checks.  This isn't that much of a processor load.

However, the CRC is only 16 bits, which is only 65536 options.

This means that fixing 2 character errors is always capable to moving this code to a valid code.

I setup the dialog background to

White: needs more letters
Yellow: 1 error
Orange: 2 errors
Red: >2 errors

However, it almost never gets past orange Smiley, since it can find a solution within 2 changes.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
The persistent blockchain stuff will solve about 27 problems all at once, and also improve flexibility in that respect.

So, there will be 2 full databases of the blockchain, or is the plan to drop bitcoind as verifier?

For the first version, there will be two full databases.  Then once that's stable (because that's easiest, and has some other uses), then I will implement a partial-DB version which will store just a fraction of the data.  In the future that will include pruning, too.  Though, pruning is being bypassed for now (though I'm putting the hooks in) because I rely on having the history to produce the full transaction ledger.  I can't turn on pruning until the wallet (or some other file) starts storing that history.
legendary
Activity: 1232
Merit: 1094
The persistent blockchain stuff will solve about 27 problems all at once, and also improve flexibility in that respect.

So, there will be 2 full databases of the blockchain, or is the plan to drop bitcoind as verifier?
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Slightly off topic, have you seen this commit.

It allows raw block dumps over the RPC interface (to/from hex strings).  Would that be worth using to save you from having to manually read the files from the disk?

Armory doesn't even use the RPC interface, unless you use the auto-bitcoind feature.  It communicates with bitcoind/bitcoin-qt as a peer (over localhost).  So "getblock" messages have always been available to Armory, it just hasn't been needed yet.  Armory doesn't store the whole blocks, it only saves the block and tx hashes and where to find that data in the blk*.dat files.  Thus, I have no choice but to scan the blockfiles.

However, the "persistent blockchain" stuff will change that.  Now Armory will maintain it's own block database, and I will no longer have any reliance on blk*.dat files.  In fact,  switching to getblock network messages was exactly how I was planning to get the data.  It's because direct reliance on blk*.dat files has caused countless problems, in addition to forcing bitcoind/bitcoin-Qt to have to be run on the same system running Armory (though some users made it work remotely using network mounts, like sshfs).  The persistent blockchain stuff will solve about 27 problems all at once, and also improve flexibility in that respect.

But of course, it's a lot... of ... work...
legendary
Activity: 1232
Merit: 1094
Slightly off topic, have you seen this commit.

It allows raw block dumps over the RPC interface (to/from hex strings).  Would that be worth using to save you from having to manually read the files from the disk?
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Thanks.

I didn't realise that the info was so direct.  I thought you did some post processing to go from the info in the paper wallet to the actual wallet keys.  I guess that's a waste since the paper wallet has no protection.

You do have a system for the "printer-safe" version?

So, basically, for the paper wallets;

Each line is a hex string (with different letter to number mappings)
36 letters = 32 (value) + 4 (hash)

This means that an error will be missed one time in 65536, since the hash is 2 bytes, so the checking the wallet id is important.

If the CRC mismatches, it tries every possible value for each letter, keeping the others constant, and rechecks the hash.  This fixes 1 letter errors.

This is 16 bytes per line which give 32 bytes each for the chain code and private key.

Do you object to people using the system for other software?

Oh, that was just the deterministic wallet algorithm.  The code for actually converting the paperbackup info to the PrivKey(0) and Chaincode is:

Code:
###### Typing-friendly Base16 #####
#  Implements "hexadecimal" encoding but using only easy-to-type
#  characters in the alphabet.  Hex usually includes the digits 0-9
#  which can be slow to type, even for good typists.  On the other
#  hand, by changing the alphabet to common, easily distinguishable,
#  lowercase characters, typing such strings will become dramatically
#  faster.  Additionally, some default encodings of QRCodes do not
#  preserve the capitalization of the letters, meaning that Base58
#  is not a feasible options
NORMALCHARS  = '0123 4567 89ab cdef'.replace(' ','')
EASY16CHARS  = 'asdf ghjk wert uion'.replace(' ','')
hex_to_base16_map = {}
base16_to_hex_map = {}
for n,b in zip(NORMALCHARS,EASY16CHARS):
   hex_to_base16_map[n] = b
   base16_to_hex_map[b] = n

def binary_to_easyType16(binstr):
   return ''.join([hex_to_base16_map[c] for c in binary_to_hex(binstr)])

def easyType16_to_binary(b16str):
   return hex_to_binary(''.join([base16_to_hex_map[c] for c in b16str]))


def makeSixteenBytesEasy(b16):
   if not len(b16)==16:
      raise ValueError, 'Must supply 16-byte input'
   chk2 = computeChecksum(b16, nBytes=2)
   et18 = binary_to_easyType16(b16 + chk2)
   return ' '.join([et18[i*4:(i+1)*4] for i in range(9)])

def readSixteenEasyBytes(et18):
   b18 = easyType16_to_binary(et18.strip().replace(' ',''))
   b16 = b18[:16]
   chk = b18[ 16:]
   b16new = verifyChecksum(b16, chk)
   if len(b16new)==0:
      return ('','Error_2+')
   elif not b16new==b16:
      return (b16new,'Fixed_1')
   else:
      return (b16new,None)

The SecurePrint stuff hasn't been released yet, but the basis of it is here.  It's a 56-bit random key with an 8-bit checksum, forced through the same KDF that is used for wallet encryption.  But it uses a hardcoded 16 MB of RAM to guarantee sufficient stretching.  On my i5-2500k that it takes a little over 0.5s to stretch the SecurePrint key into the encryption key.   I questioned whether 56-bits was sufficient, but I needed to avoid having a key that was as long as the data it was protecting (BIP 32 will use only a 128-bit seed, so the SecurePrint code is half of that).  Given the stretching, I determined that it was. 
legendary
Activity: 1232
Merit: 1094
Thanks.

I didn't realise that the info was so direct.  I thought you did some post processing to go from the info in the paper wallet to the actual wallet keys.  I guess that's a waste since the paper wallet has no protection.

You do have a system for the "printer-safe" version?

So, basically, for the paper wallets;

Each line is a hex string (with different letter to number mappings)
36 letters = 32 (value) + 4 (hash)

This means that an error will be missed one time in 65536, since the hash is 2 bytes, so the checking the wallet id is important.

If the CRC mismatches, it tries every possible value for each letter, keeping the others constant, and rechecks the hash.  This fixes 1 letter errors.

This is 16 bytes per line which give 32 bytes each for the chain code and private key.

Do you object to people using the system for other software?
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
The code that does it is here:

https://github.com/etotheipi/BitcoinArmory/blob/master/cppForSwig/EncryptionUtils.cpp#L672

Unlike BIP 32, each private key is dependent on the previous key, instead of dependent on the parent.  The exact equation is:

PrivKey(i+1) = (ChainCode XOR Hash256(PubKey(i)) * PrivKey(i)
PubKey(i+1) = (ChainCode XOR Hash256(PubKey(i)) * PubKey(i)

Where * represents multiplication-mod-N (where N is the order of the secp256k1 group), and * represents scalar multiplication of an elliptic curve point.  Recall that PrivKey is a 32-byte integer, and PubKey is a (x,y) pair of 32-byte integers representing a point on the secp256k1 elliptic curve.

I do not remember why I chose [Chain XOR Hash(Pub)] instead of [Hash(Chain XOR Pub)].  The latter would've been preferable, but the one used is still sufficient.
legendary
Activity: 1232
Merit: 1094
Is there a description of the algorithm somewhere?

This is obviously a 2 step process.

- paper backup codes to root private key
- root private key to deterministic key[ i ]

I know the latest version has a system where the paper backup code is combined with a "printer safe" system.
Jump to: