Pages:
Author

Topic: Deterministic wallets - page 8. (Read 48413 times)

sr. member
Activity: 360
Merit: 251
April 23, 2013, 04:48:43 AM
This isn't exactly a use case, true. It is just something that is possible if I is independent of K_par. I still have to see an argument why I should depend on K_par. There may well be some arguments but I haven't seen any yet.

Well, one argument why we should depend on K_par is that with the updated BIP32 we allow either type-1
 or type-2 derivation from the current node.
With your proposal we will have a tree of hashchains c_i=hash(c_par,i) where you derive deterministically the keypair (k_i,K_i) from each c_i (BTW I'm not sure why you'd want to split I to I_L and I_R instead of having single 256-bits I that's used both for deriving the current keypair and the next chaincode), so indeed it gives the benefit over the OP because you can share some c_i in the middle of the hashchain with another person.
But if you unlink the chaincodes and from the keys, then you canot disallow type-2 derivation when the input to the CKD is just the pseudorandom hash value c_i, meaning that logically k_i=CKD(k_par,c_i), unless you could tag c_i somehow, or enforce the derivation type via CKD(k_par,c_par,i) as BIP32 does now, but then the property that the chaincodes and keys are unlinked becomes problematic.
In general, if the chaincodes and keys are unlinked then you could prepare an entire wallet layout starting from a master chaincode without any keys, and then for each master privkey that you plug in you would get different keys in the tree. I'm not so sure whether having the ability to do such things is a good idea.

It doesn't matter how K'_par is created by A, whether deterministically or not. The point is that B reuses the same I_L (or I'_L=I_L) to derive K'_i as he did to derive K_i.

This isn't exactly a use case, true. It is just something that is possible if I is independent of K_par. I still have to see an argument why I should depend on K_par. There may well be some arguments but I haven't seen any yet.

If you want a use case: A has (Kpar,cpar) and publishes Kpar. B instantiates itself and contacts A: "I want to be your sales agent". A increments i for the new agent, computes I = HMAC-SHA512(cpar,i), sends I to B (encrypted and authenticated communication). Now B is fully instantiated. B considers I as his random "name" that A has chosen for him. A has a table of number-name pairs (i,I). This table can be fully reconstructed from cpar. B can derive his public key K_i that he will work with by himself, from his "name" I and the public Kpar. Later in time A may come up with a new line of business, which required separate accounting. That's when A creates a new K'par (deterministally or not, as he choses). A publishes K'par. But A doesn't want to create new "names" for his sales agents. The sales agents that choose to participate in the new business line can start right away without further communication with A. They just derive their K'i from K'par and their name I.

I still don't see how exactly it could be that "A creates a new K'par" and follow the same old hashchain of chaincodes with the new K'par, since if we use (say) the multiplicative variant then there's a correspondence between the keys and chaincodes, so the chaincode that created Kpar doesn't create K'par, and if K'par is created with a different chaincode then the subsequent hashchain would also be different. I think that your proposal is simply missing the relevant details on how wallet structures can be created. If you fully specify what exactly is allowed to be created in wallet.dat, and then show how your practical use case applies, then it'd be easier for us to consider your ideas.
member
Activity: 104
Merit: 10
April 23, 2013, 01:33:10 AM
Scenario is this: Server A is holding (K_par,c_par). Server B is being set-up in a different location and is supposed to hold (K_i,c_i). All chaincodes are to be kept secret by the servers holding them, i.e. c_par shall only be known to A (not B) and c_i shall only be known to A and B (no third party). It is irrelevant here who knows K_par and K_i. So B does not derive anything itself. Instead, A does the derivation of (K_i,c_i) and sends the result to B. Of course, the communication of A to B must be authenticated. The point is that on top of that at least c_i must (in the general case) be sent in encrypted form.

So in this scenario, why wouldn't you need to send c_i in encrypted form with your proposal?

In the scenario itself you always need to send c_i, and if you send c_i then it has to be encrypted. I am assuming that the scenario above has been completed and, at a later time, A wants to establish a second tree of pubkeys (for a different purpose, or update). First of all, I made a mistake in my proposal. It should be
Code:
(IL,IR) = HMAC-SHA512(cpar,i)
so neither of IL,ci=IR depends on Kpar. Let's assume multiplicative derivation, i.e. K_i=I_L*K_par.
In the above scenario I forgot to say that B knows "his" I_L, i.e. the I_L for which K_i=I_L*K_par. This implies that B knows K_par (before, I said this was irrelevant - its not). In fact, I assume K_par is allowed to be public. With this proposal, A creates a new K'_par and sends it to B, and B computes K'_i=I_L*K'_par, re-using his I_L. We do not need an encryption simply because c_i is not re-transmitted, only K'_par is. And I argue that K'_par does not need to be encrypted.

What do you mean by "A creates a new K'_par and sends it to B" ?
Do you mean that A uses c_parpar to compute HMAC-SHA512(c_parpar,i') to get I_L' and I_R' and derive K'_par from I_L' ? If so, B wouldn't have I_L' and I_R', so the encrypted I_R' has to be sent to B ? If I misunderstood then please try to explain it a bit more clearly?
Also, what you're describing isn't exactly a "use case", could you please elaborate on the practical objective that we'd achieve here?

It doesn't matter how K'_par is created by A, whether deterministically or not. The point is that B reuses the same I_L (or I'_L=I_L) to derive K'_i as he did to derive K_i.

This isn't exactly a use case, true. It is just something that is possible if I is independent of K_par. I still have to see an argument why I should depend on K_par. There may well be some arguments but I haven't seen any yet.

If you want a use case: A has (Kpar,cpar) and publishes Kpar. B instantiates itself and contacts A: "I want to be your sales agent". A increments i for the new agent, computes I = HMAC-SHA512(cpar,i), sends I to B (encrypted and authenticated communication). Now B is fully instantiated. B considers I as his random "name" that A has chosen for him. A has a table of number-name pairs (i,I). This table can be fully reconstructed from cpar. B can derive his public key K_i that he will work with by himself, from his "name" I and the public Kpar. Later in time A may come up with a new line of business, which required separate accounting. That's when A creates a new K'par (deterministally or not, as he choses). A publishes K'par. But A doesn't want to create new "names" for his sales agents. The sales agents that choose to participate in the new business line can start right away without further communication with A. They just derive their K'i from K'par and their name I.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 03:11:38 PM
Scenario is this: Server A is holding (K_par,c_par). Server B is being set-up in a different location and is supposed to hold (K_i,c_i). All chaincodes are to be kept secret by the servers holding them, i.e. c_par shall only be known to A (not B) and c_i shall only be known to A and B (no third party). It is irrelevant here who knows K_par and K_i. So B does not derive anything itself. Instead, A does the derivation of (K_i,c_i) and sends the result to B. Of course, the communication of A to B must be authenticated. The point is that on top of that at least c_i must (in the general case) be sent in encrypted form.

So in this scenario, why wouldn't you need to send c_i in encrypted form with your proposal?

In the scenario itself you always need to send c_i, and if you send c_i then it has to be encrypted. I am assuming that the scenario above has been completed and, at a later time, A wants to establish a second tree of pubkeys (for a different purpose, or update). First of all, I made a mistake in my proposal. It should be
Code:
(IL,IR) = HMAC-SHA512(cpar,i)
so neither of IL,ci=IR depends on Kpar. Let's assume multiplicative derivation, i.e. K_i=I_L*K_par.
In the above scenario I forgot to say that B knows "his" I_L, i.e. the I_L for which K_i=I_L*K_par. This implies that B knows K_par (before, I said this was irrelevant - its not). In fact, I assume K_par is allowed to be public. With this proposal, A creates a new K'_par and sends it to B, and B computes K'_i=I_L*K'_par, re-using his I_L. We do not need an encryption simply because c_i is not re-transmitted, only K'_par is. And I argue that K'_par does not need to be encrypted.

What do you mean by "A creates a new K'_par and sends it to B" ?
Do you mean that A uses c_parpar to compute HMAC-SHA512(c_parpar,i') to get I_L' and I_R' and derive K'_par from I_L' ? If so, B wouldn't have I_L' and I_R', so the encrypted I_R' has to be sent to B ? If I misunderstood then please try to explain it a bit more clearly?
Also, what you're describing isn't exactly a "use case", could you please elaborate on the practical objective that we'd achieve here?
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 11:23:08 AM
And here are the unit test cases

https://github.com/bitsofproof/supernode/blob/lean/api/src/test/resources/ExtendedKey.json

of serialization both production and test, using two hierarchy levels and indices 0x80000000, 0x80000001, 0, 1
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 11:05:39 AM
I give up arguing for being explicit, and will also follow those conventions just as if they were specified.

Please don't - we really need to work together to get the conventions and specifications at least "singing from the same hymn book".

(maybe "guidelines" to go along with the specifications?)

No worries. I am willing to comply with what is written into the proposal. I tried to include more, if not applicable for the spec, then lets add a recommendations section enumerating, that:

1. use consecutive indexes at generation
2. record highest generated index per level
3. do not use more than 3 levels
4. record time point an extend key (level) was created

Any other?

I actually implemented the spec as close as it can get here: (not yet merged to main of bits of proof):

https://github.com/bitsofproof/supernode/blob/lean/api/src/main/java/com/bitsofproof/supernode/api/ExtendedKey.java
legendary
Activity: 1072
Merit: 1189
April 22, 2013, 10:58:53 AM
I don't want to specify how wallet software should work.
OK, you do not want to specify but expect that they follow conventions.
I give up arguing for being explicit, and will also follow those conventions just as if they were specified.

If you have a reason to deviate, do so. If you don't have a reason to deviate, don't.

I'm just saying that applications and their expectations will differ, and it is unreasonable that they will all behave in the same way to the smallest detail. And that's just fine.

I feel that how different applications deal with backups and recovery is such a detail. Perhaps it makes sense to add a section with recommendations about this, but I can't expect everyone to follow them. An application may just start with a new seed after a time and tell the user to make a new backup. It may keep track of the necessary gap to do a scanning recovery, and warn the user when some maximal configured gap is exceeded. It may rely on a local fast-accessible indexed UTXO to search a huge amount of addresses. It may automatically backup the highest used indexes to dropbox. I don't think we know the best practices here, so I don't see a reason to fix them.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
April 22, 2013, 10:53:45 AM
I give up arguing for being explicit, and will also follow those conventions just as if they were specified.

Please don't - we really need to work together to get the conventions and specifications at least "singing from the same hymn book".

(maybe "guidelines" to go along with the specifications?)
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 10:49:27 AM
I don't want to specify how wallet software should work.
OK, you do not want to specify but expect that they follow conventions.
I give up arguing for being explicit, and will also follow those conventions just as if they were specified.
legendary
Activity: 1072
Merit: 1189
April 22, 2013, 10:20:40 AM
I don't want to specify how wallet software should work.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
April 22, 2013, 10:19:52 AM
This does strike me as heading towards "over-engineering" - I think the simpler it can be kept the better.

Great designs *allow* for extension but don't try and be everything to everyone all at once.
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 10:17:08 AM
Keys generated within a single sub account might also use any random index.

That's just complicating things for yourself, but yes, you can.

grau, I think the part you are missing is that there is a "convention" tied in with BIP 32. 

I think the proposal misses to make these conventions explicit, and I tried to point out what problems this imposes. In fact I would want to go further, not calling them conventions but be explicit part of the "standard".

The suggestions I made would put further limits on how the proposal is used for the sake of better scan performance and simpler backup procedures.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
April 22, 2013, 09:51:57 AM
Keys generated within a single sub account might also use any random index.

That's just complicating things for yourself, but yes, you can.

grau, I think the part you are missing is that there is a "convention" tied in with BIP 32.  You could do arbitrary things with BIP 32, and use key gaps of 1,000,000 keys, or use arbitrary 32-byte strings for your child indices, or go deeper than three levels.

But a "standard" implementation of BIP 32 will not do that.  BIP 32 specifies a set of tools that let you do all these arbitrarily complex things, but also describes a convention that avoids the problems you are talking about.  Don't skip wallets, don't use arbitrary 32-byte IDs, don't go past level 3.  If you violate any of these things, you won't be compatible with other implementations of BIP 32, and you'll be forced to address all the problems you are bringing up.  

Look at BIP 32 as a flexible, extensible language for addresses tress and also a prescription for using that is generally accepted.  By generally accepted, I mean that if you follow this prescription, you should be able to import your root private key into Armory, Bitcoin-Qt, Electrum, Multibit, etc, and it will work.  

EDIT: But the simplicity and flexibility of the design allows for us to expand into more complicated things, later, or for devs to do their own "proprietary" deviation of this prescription, and not have to come up with a whole new deterministic wallet algorithm (if they don't mind being incompatible with other apps). 
legendary
Activity: 1072
Merit: 1189
April 22, 2013, 09:45:41 AM
Keys generated within a single sub account might also use any random index.

That's just complicating things for yourself, but yes, you can.
member
Activity: 104
Merit: 10
April 22, 2013, 09:15:00 AM
I expect additive and multiplicative derivation to be equivalent in all "theoretical" aspects of security and anonymity. By "theoretical" I mean disregarding timing attacks, complexity, etc.

Maybe this was too fast. If we re-use multipliers with different pubkeys there seems to be a difference. Consider
Code:
K_i=I_L*K_par, K'_i=I_L*K'_par
vs
Code:
K_i=I_L*G+K_par, K'_i=I_L*G+K'_par
Assume all these pubkeys appear on the blockchain. By computing all differences of all pairs of pubkeys on the blockchain an attacker who does not know I_L can link the pair (K_i,K'_i) to (K_par,K'_par) in the second scheme because K_i-K'_i=K_par-K'_par, but not in the first scheme where K_i-K'_i=I_L*(K_par-K'_par).
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 09:02:37 AM
The time point a key was created would (if serialized) also greatly aid the reconstruction of the wallet since it would reduce the scan to blocks after that time point.
member
Activity: 104
Merit: 10
April 22, 2013, 08:59:53 AM
Scenario is this: Server A is holding (K_par,c_par). Server B is being set-up in a different location and is supposed to hold (K_i,c_i). All chaincodes are to be kept secret by the servers holding them, i.e. c_par shall only be known to A (not B) and c_i shall only be known to A and B (no third party). It is irrelevant here who knows K_par and K_i. So B does not derive anything itself. Instead, A does the derivation of (K_i,c_i) and sends the result to B. Of course, the communication of A to B must be authenticated. The point is that on top of that at least c_i must (in the general case) be sent in encrypted form.

So in this scenario, why wouldn't you need to send c_i in encrypted form with your proposal?

In the scenario itself you always need to send c_i, and if you send c_i then it has to be encrypted. I am assuming that the scenario above has been completed and, at a later time, A wants to establish a second tree of pubkeys (for a different purpose, or update). First of all, I made a mistake in my proposal. It should be
Code:
(IL,IR) = HMAC-SHA512(cpar,i)
so neither of IL,ci=IR depends on Kpar. Let's assume multiplicative derivation, i.e. K_i=I_L*K_par.
In the above scenario I forgot to say that B knows "his" I_L, i.e. the I_L for which K_i=I_L*K_par. This implies that B knows K_par (before, I said this was irrelevant - its not). In fact, I assume K_par is allowed to be public. With this proposal, A creates a new K'_par and sends it to B, and B computes K'_i=I_L*K'_par, re-using his I_L. We do not need an encryption simply because c_i is not re-transmitted, only K'_par is. And I argue that K'_par does not need to be encrypted.
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 08:57:42 AM
Where does the 2^32 figure come from?
The proposal does not require consecutive generation of the keys but defines a 32 bit integer as index.

Sure, but I was talking about the default wallet layout, with consecutive indices (1,2,3,...) for each subaccount.
And I believe that other layouts that users might choose to use would be just as easy to scan.
Keys generated within a single sub account might also use any random index.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 08:51:41 AM
Where does the 2^32 figure come from?
The proposal does not require consecutive generation of the keys but defines a 32 bit integer as index.

Sure, but I was talking about the default wallet layout, with consecutive indices (1,2,3,...) for each subaccount.
And I believe that other layouts that users might choose to use would be just as easy to scan.
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 08:38:51 AM
Where does the 2^32 figure come from?
The proposal does not require consecutive generation of the keys but defines a 32 bit integer as index.

The reason I don't like putting a maximum index in the serialized form is because that doesn't tell you what to do when you run out of indices.

An answer could be: re-use keys. This reduces privacy, but it does not have to be without prior warning: e.g. the client could warn if 90% of key interval is used that the interval should be increased and wallet backup created to avoid reduced privacy as pool exhausted.

I see that the look ahead scan you describe would also practically work (provided increasing not random indices are used), but like the maximum better because it supports the use case of an audit as the serialized key would provide the set of potentially used keys without heuristics involving the block chain content.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 08:27:19 AM
I still believe that for the default wallet layout (subaccounts at level 1 and linear path for each subaccount) and for other non-pathological layouts, you could let your computer run for a couple of hours and retrieve all your privkeys, because the depths of the branches (including branches that have keys that were never active on the blockchain) would be quite small.

I guess a couple of hours will not cut it. Even if you have an UTXO indexed by keys (that is not even the case for the Satoshi client), it is still prohibitive to scan 2^32 possible generated keys. I believe that you either store what much smaller subset was potentially used, or you can not practically reconstruct the set of keys used, which defeats a purpose of the proposal, that is not to have the need of regular backups.

Where does the 2^32 figure come from? I thought that we're talking about a user who has a backup of the master node, and he just needs to scan his subaccount (depth 1) branches to a reasonable depth (say 10000 or 100000 or 1000000 payment addresses in each subaccount, assuming that the user didn't have more than 1 million customers). Why would such a task take 2^32 time?
Pages:
Jump to: