Topic: Proposal: Base58 encoded HD Wallet root key with optional encryption - page 3.

member
Activity: 116
Merit: 10
This thread hasn't had a lot of activity recently. Here's what I'd say:

Bloom filter based on SHA(SHA(privkey)) is interesting. Allows for plausible deniability by encoding multiple wallets (different wallets are accessed by using a different password). If OP wants to change the spec to a simple bloom filter, I'd be in support of that.

I guess two passwords are sufficient, and the collision chance is significantly lower than with the four-password proposal I gave earlier in this thread. However, I would like some more feedback from others before making this change.
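
For what it's worth, a quick sanity check of that comparison (assuming, and this is my assumption, that the four-password variant would use the same 32-bit field with its own optimal bit count per password):
Code:
from math import exp, log

m = 32.0                       # bits in the filter
for n in (2, 4):               # number of passwords inserted
    k = round(m / n * log(2))  # optimal set bits per password: 11 and 6
    fp = (1 - exp(-k * n / m)) ** k
    print(n, k, fp)            # n=2 -> ~0.000458, n=4 -> ~0.0216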

English description:
Create a real password P1 and a "fake" password P2. The user can specify P2 if they want a "fake" password that will decrypt to a valid wallet. If the user doesn't want a "fake" password, P2 is randomly generated (so all users' bloom filters look alike). For each password P1...Pn:
    1. Decrypt the wallet and calculate the master secret key Kn
    2. Calculate SHA(SHA(Kn)) and call this Hn (Hn[0:4] is currently used in the spec as a wallet checksum)
    3. For each of the first 11 bytes of Hn (call each byte Hn_i):
         a. Take the lowest 5 bits of Hn_i; call this number D (it ranges from 0 to 31)
         b. Set the Dth bit of the bloom filter to 1

Seems simple enough.

I think the existing date field is sufficient for optimization. I doubt it would be worth the effort to put more detail in the wallet spec. I would like to hear from gmaxwell and other client devs on this, since they're the authority on SPV optimization. Yes, we can optimize scan times by putting restrictions on the order of wallet generation, but I think that belongs in a separate BIP.

I posted a question to the bitcoin-dev mailing list asking for feedback on HD wallet import strategies. I haven't had any replies.

member
Activity: 98
Merit: 10
This thread hasn't had a lot of activity recently. Here's what I'd say:

Bloom filter based on SHA(SHA(privkey)) is interesting. Allows for plausible deniability by encoding multiple wallets (different wallets are accessed by using a different password). If OP wants to change the spec to a simple bloom filter, I'd be in support of that. Proposed algorithm:

let m (number of bits in field) = 32
let n (number of passwords inserted) = 2 (assuming 1 "real" password and 1 "fake" password. This is only for optimization; the algorithm supports more)

To minimize the probability of a false collision, we want to use (m/n)ln(2) "hash functions" per password, so

let k (number of set bits per password) = (32/2)ln(2) = 11.09 ≈ 11

The probability of a false positive is (1-e^(-kn/m))^k = (1-e^(-22/32))^11 ≈ 0.000458
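
Those numbers are easy to reproduce (a throwaway snippet, nothing spec-specific):
Code:
from math import exp, log

m, n, k = 32, 2, 11
print(m / n * log(2))              # 11.09..., rounded to k = 11
print((1 - exp(-k * n / m)) ** k)  # 0.000458...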

Now, this is much larger than the chance of a false positive with just straight up SHA(SHA(privkey)), but the ability to add arbitrary passwords is pretty awesome. I would like to encourage discussion on this. I am honestly on the fence about this one; both are good solutions.

SHA(SHA(privkey)):
    pro: Very low chance of the user accidentally opening the wallet with a wrong password
    con: If compelled to decrypt the wallet, there's no easy way to avoid actually giving up your real wallet info
Bloom filter:
    pro: Very easy to add a fake/secondary wallet (which you can actually put a little bit of money in) with any fake password you want. Plausible deniability.
    con: Higher chance of a mistyped password being accepted as valid.

Pseudocode:
Code:
import hashlib

bloom = 0  # 32-bit integer bloom filter
# Any number of passwords works; to preserve plausible deniability for all
# users, the spec should mandate a randomly generated fake password if the
# user doesn't want one.
valid_passwords = [user_password, fake_password]
for password in valid_passwords:
    privkey = decrypt_wallet(password).calculate_master_secret()  # spec-defined, left abstract
    digest = hashlib.sha256(hashlib.sha256(privkey).digest()).digest()
    for i in range(11):
        # the low 5 bits of each of the first 11 digest bytes select a bit 0..31
        bloom |= 1 << (digest[i] & 0x1F)

English description:
Create a real password P1 and a "fake" password P2. The user can specify P2 if they want a "fake" password that will decrypt to a valid wallet. If the user doesn't want a "fake" password, P2 is randomly generated (so all users' bloom filters look alike). For each password P1...Pn:
    1. Decrypt the wallet and calculate the master secret key Kn
    2. Calculate SHA(SHA(Kn)) and call this Hn (Hn[0:4] is currently used in the spec as a wallet checksum)
    3. For each of the first 11 bytes of Hn (call each byte Hn_i):
         a. Take the lowest 5 bits of Hn_i; call this number D (it ranges from 0 to 31)
         b. Set the Dth bit of the bloom filter to 1
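
For completeness, the check side would be the mirror image: recompute the 11 bit positions for whatever password the user typed, and require that every one is already set in the filter. A sketch, reusing the same abstract decrypt_wallet() as in the pseudocode above:
Code:
import hashlib

def password_may_be_valid(bloom, password):
    # A single unset bit proves the password wrong; all 11 set means it is
    # probably right (false-positive rate ~0.0458%, as computed earlier).
    privkey = decrypt_wallet(password).calculate_master_secret()  # spec-defined, left abstract
    digest = hashlib.sha256(hashlib.sha256(privkey).digest()).digest()
    return all(bloom >> (digest[i] & 0x1F) & 1 for i in range(11))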


I think the existing date field is sufficient for optimization. I doubt it would be worth the effort to put more detail in the wallet spec. I would like to hear from gmaxwell and other client devs on this, since they're the authority on SPV optimization. Yes, we can optimize scan times by putting restrictions on the order of wallet generation, but I think that belongs in a separate BIP.
legendary
Activity: 1400
Merit: 1013
I look forward to your blog post. Please also address my concern there: the customer would need a more sophisticated wallet that maintains his master public key at the business and his last-used sequence index across his devices / wallet instances; otherwise this converges to a more complicated but equivalent model of the business providing the address.
http://bitcoinism.blogspot.com/2014/01/business-accounting-and-bitcoin-privacy.html
hero member
Activity: 836
Merit: 1030
bits of proof
This all seems needlessly complex just to facilitate reasonable scan times. How does Electrum do it from just a seed?

You do not need to get into the details of why this is hard; just recognize that the only reason we have a key birth date is that it reduces scan time. This lesson was learned importing ordinary (WIF) keys and re-scanning wallets.

Now, developers who implement an HD scan will recognize that a birth date alone is not sufficient for reasonable scan time: at least a look-ahead window is needed to master the task, and eventually the birth date might be reset if a start index is also there. Certainly these assumptions require that wallet software using HD complies with them. I do not see why this would be a problem, or why it is important to protect freedom of key-sequence use. The point of HD is to avoid key reuse and loss through missing backups. Both can be achieved even if there are restrictions on which key to use next.
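
For readers following along, the look-ahead idea in sketch form; derive_address() and address_used_on_chain() are stubs for the wallet's derivation and blockchain lookup, and the window size is only illustrative:
Code:
GAP = 100  # look-ahead window: stop after this many consecutive unused keys

def recover_used_indices(derive_address, address_used_on_chain, start_index=0):
    used, unused_run, i = [], 0, start_index
    while unused_run < GAP:
        if address_used_on_chain(derive_address(i)):
            used.append(i)
            unused_run = 0  # any hit re-opens the look-ahead window
        else:
            unused_run += 1
        i += 1
    return used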
member
Activity: 98
Merit: 10
This all seems needlessly complex just to facilitate reasonable scan times. How does Electrum do it from just a seed?

Re. bloom filters: I think we may have a different idea of how to implement that. My suggestion:
SHA(SHA(privkey)) is used, instead of as a checksum, as a bloom filter element.
The user enters any other passwords they want to "allow", and the corresponding private key is generated. SHA(SHA(privkey2)) is also added to the bloom filter.

The user, then, can choose an arbitrary number of wallets to encode, with arbitrary passwords. This makes memorization easy, and they won't have to tell anyone "Oh yeah, my password is #!fb3$". They can just make it "jesus" or "1234" or whatever and put a few mBTC in that wallet. Fewer wallets means better password-checking reliability, of course.
hero member
Activity: 836
Merit: 1030
bits of proof
But instead of making us ask, why not just share your scanning algorithm here for the people reading along?

The algorithm is outlined in this thread, and I open-sourced the BOP Bitcoin implementation that includes it. I continue improving it free of charge for the community and have even implemented this proposal there. Could I substantiate my claims any better?
hero member
Activity: 836
Merit: 1030
bits of proof
Certainly, my solution uses a minuscule key space; having that is the whole point of its assumptions.

Well, I retreat from arguing with an algorithm that is not implemented and postpone discussion of this topic until I see an actual implementation of what you describe or until you recognize the nature of the problem.

Regarding brute forcing the checksum: it does not really matter if it takes days or weeks. It is a precautionary measure to create a fake solution in addition to the real one. One only has to have it ready in case the plausible denial should ever be exercised, which is not a planned event in the near future. The harder the problem, the more plausible the denial. As said earlier, passphrase input could be BIP39 encoded; this way any brute-forced bit pattern will look equally plausible.
member
Activity: 116
Merit: 10
So you create 10 million addresses off the HD root (let's ignore that this is already a computation-intensive task), then you ask a giant bloom filter you previously populated with addresses used on the block chain, and bingo, you figure that 100,000 (1%) of them were possibly used on the block chain. Does that tell you anything about the funds on them? Nope. You still need to fetch 100,000 using some traditional database index on address. Does that tell you the funds on them? Nope, since even those you fetched might be spent already, and you would need to also fetch the transactions eventually spending those outputs. Is this sufficient? Nope, you are merely done with 1/400th of the key space. This is what you call a feasible algorithm? Not me.

There is no need to populate the scanner with 10M addresses; I merely used that number to indicate that it scales very well, countering your statement on page 4 of this thread:

I cannot see how you implement scalable discovery of funds associated with a root without these or similar assumptions.

The argument that I have to scan the entire 2^32 keyspace is silly. The scanner only needs to move the window along until it reaches the top block, and you're not even going to get close to the 2^32 address limit in any dimension of the tree. I'm quite sure your solution doesn't scan the 2^32 keyspace either.

BTW, I am familiar with bloom filters and use them in my Bitcoin implementation, not only for SPV support.

The way you asked the question and the fact that you referred to a bloom filter as a 'complex concept' made me think you hadn't used bloom filters before. My bad.

If brute forcing the checksum for an alternate password is not feasible with 4 bytes then reduce the checksum length instead of introducing a more complex concept.

In any event, brute forcing a password to fit a checksum is silly. This could take hours, even days. I'd have to reduce the checksum down to 2 or even 1 byte, depending on the KDF used, to make this a feasible method of finding an alternate password.
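
The arithmetic behind that, with the per-guess KDF cost as a pure assumption:
Code:
KDF_SECONDS = 0.1  # assumed cost of one password-to-key derivation (illustrative)

for checksum_bytes in (1, 2, 4):
    tries = 2 ** (8 * checksum_bytes)  # expected guesses to match the checksum
    print(checksum_bytes, tries * KDF_SECONDS / 3600, "hours")
# 1 byte: ~0.007 h; 2 bytes: ~1.8 h; 4 bytes: ~119,305 h (~13.6 years)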

You should rather be more curious about how I solved the problem of figuring out the funds for an HD root instead of proclaiming, without any practical experience, that the parameters I propose are not needed.

I'm not 'proclaiming' anything. I'm merely offering an algorithm that scales fine. I'm sure your implementation works magnificently for your HD wallet usage, but that doesn't mean it works for everyone. But instead of making us ask, why not just share your scanning algorithm here for the people reading along?

hero member
Activity: 836
Merit: 1030
bits of proof
Just a note for thread readability to others trying to follow this:
payee = recipient ; payor = sender of payment

Payee appears to be used to refer to the payor in this thread?

My fault. I used customer = payee although the customer in the discussion was the payor.
member
Activity: 130
Merit: 10
Just a note for thread readability to others trying to follow this:
payee = recipient ; payor = sender of payment

Payee appears to be used to refer to the payor in this thread?
member
Activity: 116
Merit: 10
What do you build the bloom filter on? What does a hit tell you, and how does that help you to reconstruct funds accessible to keys derivable from the HD root? Forget the tree for the moment; just enlighten me on how you would handle a flat key space of size 2^32 used in random order.

Granted, you wouldn't be able to handle a 2^32 space in random order without constructing a bloom filter of that size, but when you create a ridiculously large 'window' of, say, 10 million addresses and start scanning with that, you basically no longer care much about the fact that payments may come in out of sequence even if you give the addresses out in sequence (mostly because the out-of-order-ness will be in the hundreds or thousands range).

You start constructing a second bloom filter when you begin to hit addresses that are getting close to the upper limit you're currently scanning (say, 500,000 under your 10M boundary) and slowly keep adding to the second filter. When you haven't found a positive hit in the first filter for X blocks / transactions / whatever metric you want to use, you can discard it and start working on the third filter, etc.
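
In sketch form (toy parameters; the seeded-SHA256 indexing is a stand-in for whatever hash the real filter uses; BIP37 filters, for instance, use murmur3):
Code:
import hashlib

M, K = 1 << 23, 6  # filter bits and hashes per element (toy numbers, not spec)

def bloom_bits(addr):
    # derive K bit positions from seeded SHA256 of the address bytes
    return [int.from_bytes(hashlib.sha256(bytes([s]) + addr).digest()[:8],
                           "big") % M for s in range(K)]

def bloom_add(filt, addr):
    for b in bloom_bits(addr):
        filt |= 1 << b
    return filt

def bloom_match(filt, addr):
    return all(filt >> b & 1 for b in bloom_bits(addr))

# Rotation: while scanning with the current filter (covering key indices
# [lo, lo + 10M)), any confirmed hit within ~500k of the top also starts
# feeding addresses into the next filter; once the current filter has gone
# X blocks without any hit, discard it and promote the next one.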

As for what you build a bloom filter on, it's the same kind of bloom filter that SPV clients use to request transactions from peers. They send a bit field with bits set at indices generated by hash values of the address used (either sender or receiver). The number of indices (what I called hash functions previously) and the bloom filter size can be varied. In the case of Bitcoin-Qt / bitcoinj, the filter is limited to 36,000 bytes maximum. It's a flat array, so lookups are constant-time regardless of size or number of elements covered.

The filter guarantees that all addresses you're looking for will pass the filter, but the reverse is not true. It's possible that unrelated addresses can pass the filter (and in bitcoin-qt / bitcoinj this is actually used to preserve privacy, by dirtying up the filter a bit).

It's a really fast and lightweight way to filter big amounts of data down to mostly what you're looking for plus some cruft. Then you can move on to more expensive filters, like hash tables and finally doing exact matches.

Here's the bloom filter wiki if you want to read up on it some more: http://en.wikipedia.org/wiki/Bloom_filter
hero member
Activity: 836
Merit: 1030
bits of proof
We do not have 1 or 10 million possible addresses but 2^32, which is 429 times 10 million. Scale your numbers accordingly. Also keep in mind that retrieving those 1% false positives is not cheap; plus, the lookup is usually not just for transactions benefiting an address but also for the transactions that spend those outputs.

I think you will return to these suggestions after you have attempted to implement an HD scan. I did.

There's no reason to scan all possible addresses at the same time. You could make a sliding window with a bloom filter the same way you would with your 100-address lookahead. That 1% will mostly be culled by the hash table that sits behind the bloom filter. And the last step, branch reconstruction, culls 100% of the false positives that manage to get through.

Either way, there's no need to include a lookahead count in the encoding.

As for the sequence start, what's the value of having a non-zero sequence start number?

I do not get it.

What do you build the bloom filter on? What does a hit tell you, and how does that help you to reconstruct funds accessible to keys derivable from the HD root? Forget the tree for the moment; just enlighten me on how you would handle a flat key space of size 2^32 used in random order.
member
Activity: 116
Merit: 10
We do not have 1 or 10 million possible addresses but 2^32, which is 429 times 10 million. Scale your numbers accordingly. Also keep in mind that retrieving those 1% false positives is not cheap; plus, the lookup is usually not just for transactions benefiting an address but also for the transactions that spend those outputs.

I think you will return to these suggestions after you have attempted to implement an HD scan. I did.

There's no reason to scan all possible addresses at the same time. You could make a sliding window with a bloom filter the same way you would with your 100-address lookahead. That 1% will mostly be culled by the hash table that sits behind the bloom filter. And the last step, branch reconstruction, culls 100% of the false positives that manage to get through.

Either way, there's no need to include a lookahead count in the encoding.

As for the sequence start, what's the value of having a non-zero sequence start number?

hero member
Activity: 836
Merit: 1030
bits of proof
BTW, even instantiating 2^32 addresses is a significant task (2^32 HmacSHA512 + 2^32 EC additions + 2^31 EC multiplications + 2 * SHA256 + RIPEMD160); then you have a few hash lookups into a 4 GB bloom filter to identify the 1% false positives, that is 42 million random accesses to a database.

And all you have with that are transactions that deposited to the address. Then you need further passes to find those transactions spending those deposits.

Go for it.
hero member
Activity: 836
Merit: 1030
bits of proof
I think it's impossible to enforce that addresses are used in sequence, considering that customers won't pay in sequence, refund addresses given to suppliers could be funded months after issuance, etc.
Assuming the master public key is not disclosed, only the owner of the master key is capable of adding to the set of known members of the sequence. He can impose a maximum gap by not disclosing more than w new addresses until one of them is used on the block chain.

So how would you scan the blockchain to reconstruct the tree? Well, it's mostly an issue of scale. Using look-ahead windows is quite a narrow view and will cause problems very quickly.

So instead, construct a bloom filter that will filter the blockchain from a few days after the birth of the root key. Constructing a bloom filter that scans for a million possible addresses takes 1,198,132 bytes, with 6 hash functions, for a false-positive rate of 1%. No need to keep all those addresses in memory.

Second, you can construct a hash table keyed by a 4-byte partial hash of the address, which then gives you the hierarchy where the key lives. If you get a hit in the hash table, you can reconstruct the branch and do a full address match. Should it hit, you can keep the branch in memory and even expand your coverage, should the address live near an edge of the tree currently covered by the bloom filter / hash table.

Even if you were to expand this to 10 million addresses, you'd still only need 11,981,322 bytes of memory for the bloom filter. The hash table is a little heavier, but is still quite manageable.

In other words, this scales quite well and does a far better job at finding relevant transactions than any kind of window based scanning can ever do.

We do not have 1 or 10 million possible addresses but 2^32, which is 429 times 10 million. Scale your numbers accordingly. Also keep in mind that retrieving those 1% false positives is not cheap; plus, the lookup is usually not just for transactions benefiting an address but also for the transactions that spend those outputs.

I think you will return to these suggestions after you have attempted to implement an HD scan. I did.
member
Activity: 116
Merit: 10
I think it's impossible to enforce that addresses are used in sequence, considering that customers won't pay in sequence, refund addresses given to suppliers could be funded months after issuance, etc.

So how would you scan the blockchain to reconstruct the tree? Well, it's mostly an issue of scale. Using look-ahead windows is quite a narrow view and will cause problems very quickly.

So instead, construct a bloom filter that will filter the blockchain from a few days after the birth of the root key. Constructing a bloom filter that scans for a million possible addresses takes 1,198,132 bytes, with 6 hash functions, for a false-positive rate of 1%. No need to keep all those addresses in memory.

Second, you can construct a hash table keyed by a 4-byte partial hash of the address, which then gives you the hierarchy where the key lives. If you get a hit in the hash table, you can reconstruct the branch and do a full address match. Should it hit, you can keep the branch in memory and even expand your coverage, should the address live near an edge of the tree currently covered by the bloom filter / hash table.
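
A sketch of that second stage; derive_address(path) is a hypothetical stub for the wallet's HD derivation, and the 4-byte key truncation is as described:
Code:
def build_partial_index(derive_address, paths):
    # map a 4-byte partial hash of each address to the derivation paths behind it
    index = {}
    for path in paths:
        index.setdefault(derive_address(path)[:4], []).append(path)
    return index

def confirm_hit(index, derive_address, candidate):
    # candidate already passed the bloom filter; do the exact match here
    for path in index.get(candidate[:4], []):
        if derive_address(path) == candidate:
            return path  # true hit: reconstruct / keep the branch in memory
    return None          # false positive culled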

Even if you were to expand this to 10 million addresses, you'd still only need 11,981,322 bytes of memory for the bloom filter. The hash table is a little heavier, but is still quite manageable.
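
Those byte counts follow directly from the standard bloom-filter sizing formulas:
Code:
from math import log

def bloom_bytes(n, p):
    # optimal filter size in bytes for n elements at false-positive rate p
    return int(-n * log(p) / log(2) ** 2 / 8)

print(bloom_bytes(1_000_000, 0.01))   # 1198132
print(bloom_bytes(10_000_000, 0.01))  # 11981322
# hash count: k = (m/n) * ln 2 = ~6.6, used as 6 in the post above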

In other words, this scales quite well and does a far better job at finding relevant transactions than any kind of window based scanning can ever do.
hero member
Activity: 836
Merit: 1030
bits of proof
There is another reason why the receiver has to provide a payment address with the bill: by doing so, the receiver declares that he is in (exclusive) control of the key that belongs to that address, so a payment made to it is received by him. It is not deniable.

The receiver, however, has to be able to revoke that declaration for the future, since he might lose keys or have legitimate reasons for using a new master key to generate the addresses (be it for performance or a merger). The simplest form of implicit revocation is to agree that with any new bill, and the new address that comes with it, all previous addresses are no longer valid for payment. There should also be an explicit expiry for the bill, so a new one with a new address is issued if the previous one is not paid.

Add the requirement that new bills indicate the net outstanding amount, and the option that a bill may offer more than one new address, and we are all happy.
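
As a sketch of what those bill rules might look like as data (field and function names are mine, not from any spec):
Code:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Bill:
    issued: datetime
    expires: datetime     # explicit expiry; a new bill is issued if unpaid by then
    addresses: list       # one or more fresh payment addresses
    net_outstanding: int  # net amount still owed

def valid_payment_addresses(bills, now):
    # the newest bill implicitly revokes every earlier address
    latest = max(bills, key=lambda b: b.issued)
    return latest.addresses if now < latest.expires else []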

Last message on-topic:
https://bitcointalksearch.org/topic/m.4180842
hero member
Activity: 836
Merit: 1030
bits of proof
The business would also have to re-send the master public key in case the customer does not have it at hand, and we are back to square one.
Not really. An xpub + index hint does not restrict the customer's client in the same way a single address or static list of addresses restricts the client.
I do not see the fundamental difference between xpub + index and an address list to choose from. The business has to limit the index range it monitors for performance reasons; that is effectively giving an address list.

Actually, the business has to send the xpub with every bill; otherwise we would need yet another, more complex protocol for key revocation in case a previously used key was compromised.

I think that, for reasons of performance on the receiver's side and simplicity and privacy on the payer's side, it makes sense for the receiver to generate a choice of addresses valid for any payment from the time point of that bill onward.

And yes, the receiver should track the outstanding amount by netting all previous bills against funds received on any previously communicated address. That, however, is a requirement of bookkeeping anyway.
legendary
Activity: 1400
Merit: 1013
The business would also have to re-send the master public key in case the customer does not have it at hand, and we are back to square one.
Not really. An xpub + index hint does not restrict the customer's client in the same way a single address or static list of addresses restricts the client.