
Topic: Deterministic wallets - page 9.

sr. member
Activity: 360
Merit: 251
April 22, 2013, 08:13:30 AM
Seems to me that additive vs multiplicative derivation is not the point. Instead it's about how i and c_par are combined. Using c_par^i instead of i*c_par looks different at first sight.

I don't think that c_par^i would give you pseudorandom keypairs. These thought experiments didn't explicitly specify how to derive c_i from c_par, but it appears that you would be claiming that the sequence (c_{i-1})^i for consecutive i=1,2,3,... is indistinguishable from random data, which isn't true. If we cannot argue that the sequence is pseudorandom, then we cannot prove that we're secure against related-key attacks. So the bottom line is that we'd have to stick with HMAC-SHA2 anyway...
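
For contrast, the HMAC-based chaincode derivation that we can actually argue about looks like this in Python (a minimal sketch; the serialization of K_par and i is simplified here, not the exact BIP32 encoding):

Code:
import hmac, hashlib

def next_chaincode(c_par: bytes, K_par_ser: bytes, i: int) -> bytes:
    # c_i is the right half of HMAC-SHA512(key=c_par, data=K_par || i).
    # Under the standard PRF assumption on HMAC-SHA512, the sequence of
    # chaincodes is indistinguishable from random, unlike (c_{i-1})^i.
    I = hmac.new(c_par, K_par_ser + i.to_bytes(4, 'big'),
                 hashlib.sha512).digest()
    return I[32:]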
legendary
Activity: 1072
Merit: 1181
April 22, 2013, 08:02:24 AM
I still believe that for the default wallet layout (subaccounts at level 1 and linear path for each subaccount) and for other non-pathological layouts, you could let your computer run for a couple of hours and retrieve all your privkeys, because the depths of the branches (including branches that have keys that were never active on the blockchain) would be quite small.

I guess a couple of hours will not cut it. Even if you have a UTXO set indexed by keys (which is not even the case for the Satoshi client), it is still prohibitive to scan 2^32 possible generated keys. I believe that you either store which much smaller subset was potentially used, or you cannot practically reconstruct the set of keys used, which defeats a purpose of the proposal, namely removing the need for regular backups.

My purpose for BIP32 was indeed to make using many keys viable, without needing frequent backups. That means backups must protect you against loss of coins. It doesn't mean it should be trivial to restore from a backup, or to reconstruct your wallet from the blockchain alone.

When recovering from a backup, use some lookahead like 100 or 1000 keys when scanning (depending on the purpose). If a gap larger than the lookahead occurs, I'm sure you'll notice transactions/money missing after some date; just retry with a larger lookahead and you're fine. Also note that a large lookahead is only necessary for external subaccounts (not for change addresses, as you don't expect gaps there at all).
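
To make the lookahead idea concrete, here is a minimal Python sketch of such a scan (the two helper functions are hypothetical placeholders, not a real API):

Code:
# Gap-limit recovery sketch. derive_address() and has_blockchain_history()
# are hypothetical placeholders for the wallet's derivation code and a
# blockchain index; they are not part of any real API.
def recover_subaccount(account, lookahead=100):
    used, gap, i = [], 0, 0
    while gap < lookahead:
        if has_blockchain_history(derive_address(account, i)):
            used.append(i)     # key i was seen on the chain
            gap = 0            # reset the gap counter
        else:
            gap += 1           # one more consecutive unused key
        i += 1
    return used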

Alternatively, an automatic backup with low privacy risk, containing just the maximal indexes, could be used as well, as iddo suggests. I'm not sure that's necessary in practice though.

The reason I don't like putting a maximum index in the serialized form is that it doesn't tell you what to do when you run out of indices.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 07:53:09 AM
I'm not sure that I understood your use case, could you please explain the exact scenario more clearly? If you had an ancestor chaincode and pubkey, then you can derive the descendant chaincode and there's no need to encrypt it and send it to you; and if you don't have any ancestor chaincode, then an encrypted chaincode must be sent to you, no?

Maybe we do gain some security by having c_i depend on K_par, because commonly just the hashed address of K_par is published (until the coins get spent).

The scenario is this: Server A holds (K_par,c_par). Server B is being set up in a different location and is supposed to hold (K_i,c_i). All chaincodes are to be kept secret by the servers holding them, i.e. c_par shall only be known to A (not B), and c_i shall only be known to A and B (no third party). It is irrelevant here who knows K_par and K_i. So B does not derive anything itself; instead, A does the derivation of (K_i,c_i) and sends the result to B. Of course, the communication from A to B must be authenticated. The point is that, on top of that, at least c_i must (in the general case) be sent in encrypted form.

So in this scenario, why wouldn't you need to send c_i in encrypted form with your proposal?
member
Activity: 104
Merit: 10
April 22, 2013, 07:45:50 AM
This makes me wonder whether it's still worth changing, though the implementation complexity argument still holds.

I expect additive and multiplicative derivation to be equivalent in all "theoretical" aspects of security and anonymity. By "theoretical" I mean disregarding timing attacks, complexity, etc.

I've seen people (not only newbies) confuse the discrete log problem with computing n-th roots. The latter problem is easy: from n*K and n you can compute K. This kind of confusion may lead people to wrongly assume that it is impossible to infer K_par from c_i*K_par and c_i, and they would take this as some sort of "anonymity feature" of multiplicative derivation. With additive derivation everybody can see immediately that K_par can be inferred from K_par + c_i*G and c_i.

This would be an argument for additive derivation: it can fool fewer users, and its properties (and non-properties) are easier to explain.
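
To make the confusion concrete, here is a small demonstration with the pure-Python ecdsa package (an assumed dependency, with toy values throughout): both recoveries are just one modular inverse or one point addition away.

Code:
from ecdsa import SECP256k1   # pure-Python EC library (pip install ecdsa)

G, n = SECP256k1.generator, SECP256k1.order
K_par = G * 123456789          # parent public key (toy private key)

# "n-th roots" are easy: from m*K_par and m, recover K_par with one
# modular inverse (pow(m, -1, n) needs Python 3.8+).
m = 42
mK = K_par * m
assert mK * pow(m, -1, n) == K_par

# Additive derivation hides just as little: from K_par + c_i*G and c_i,
# recover K_par by adding the negated tweak, since -(c_i*G) = (n-c_i)*G.
c_i = 987654321                # publicly known tweak (toy value)
K_i = K_par + G * c_i
assert K_i + G * (n - c_i) == K_par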
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 07:44:11 AM
I still believe that for the default wallet layout (subaccounts at level 1 and linear path for each subaccount) and for other non-pathological layouts, you could let your computer run for a couple of hours and retrieve all your privkeys, because the depths of the branches (including branches that have keys that were never active on the blockchain) would be quite small.

I guess a couple of hours will not cut it. Even if you have a UTXO set indexed by keys (which is not even the case for the Satoshi client), it is still prohibitive to scan 2^32 possible generated keys. I believe that you either store which much smaller subset was potentially used, or you cannot practically reconstruct the set of keys used, which defeats a purpose of the proposal, namely removing the need for regular backups.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 07:25:24 AM
I suggest setting and serializing a maximum i for each extended key. Such a serialized string would define the wallet keys without the need for regular backups, until the maximum number of keys is exceeded.

Huh? Why would regular backups be needed? The major advantage of deterministic wallets is that backups aren't needed (see here).
Assume you lost the wallet storing the unspent transactions, but you have the root key for the BIP32 generator in cold storage. Theoretically nothing is lost, since you can scan the block chain for all outputs that can be spent with descendants of the root key; practically, you are not able to do this if you do not know which sequences were used.

Therefore you either have to back up the sequences used or, more simply, set a limit on them beforehand that is of a practicable magnitude for a scan.

As Pieter mentioned in the link that I gave you, it should be very practical to scan in almost all wallet layouts, since the branching factor and depth are quite small.

The assumption there is that if a key with a sequence number is not on the block chain, then higher sequences were not used, right?

I think that would not hold. A shop usually offers unique addresses to receive payment for individual orders. Some orders never get paid, so their keys will not be present in the block chain, yet keys with higher sequence numbers might be. A reconstruction of the wallet would fail at such a gap.

Right, good point.

I still believe that for the default wallet layout (subaccounts at level 1 and linear path for each subaccount) and for other non-pathological layouts, you could let your computer run for a couple of hours and retrieve all your privkeys, because the depths of the branches (including branches that have keys that were never active on the blockchain) would be quite small.

I'd say that if the user chooses a peculiar wallet layout, then it's up to him to consider whether future backups might be important.

Furthermore, the client software could let you do a "public-insensitive" backup that contains just your wallet layout (just the indices i, or the indices i and the pubkeys), which doesn't contain any sensitive info (chaincodes and privkeys), so you could back up just the layout from time to time, if you wish to be extra safe.
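
A sketch of what such a layout-only backup might contain (the format and field names here are purely illustrative, not a proposal):

Code:
import json

# Hypothetical "public-insensitive" layout backup: indices (and optionally
# pubkeys), but no chaincodes or privkeys. Field names are illustrative.
layout_backup = {
    "subaccounts": [
        {"path": "m/0", "max_index_used": 137},
        {"path": "m/1", "max_index_used": 12},
    ]
}
print(json.dumps(layout_backup, indent=2))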
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 07:04:26 AM
I say this is all implementation specific, and doesn't belong in the serialization.
I thought the advantage of this proposal was a wallet that provides:

1. a high number of keys, to support privacy;
2. no need for regular backups, at least to avoid key loss.

If the serialization format (which is the ultimate backup) fails to deliver enough information to recreate, within a practical time span, all keys that might have outputs, then it fails the second goal.

If you added a maximum sequence number to the serialization of the extended key, then backups would only be needed if a new extended child key (subaccount) is created, or if the maximum is exhausted and needs to be increased to retain the same privacy level.

The client could also warn of the loss of privacy and reuse already-used keys when hitting the maximum sequence number of known keys, until the user increases the maximum and creates a new backup.
legendary
Activity: 1072
Merit: 1181
April 22, 2013, 06:57:23 AM
If K_i = (i*c_par)*K_par then you get the same phenomenon, right? The difference would be (i-i')*c_par*K_par, which is recognizable if (K_par,c_par) is reused.

You're right. Even here, addition doesn't really change anything.
legendary
Activity: 1072
Merit: 1181
April 22, 2013, 06:54:50 AM
Why does constant time public derivation give security against timing attacks? Isn't it the case that public derivation doesn't use the privkeys, and therefore cannot leak privkeys? Maybe you meant security against leakage of just the chaincodes? I probably simply misunderstood what you're saying there, so it'd be helpful if you explain where exactly we need protection from timing attacks.

I was wrong about this, as the problem of key exposure doesn't really exist for public key derivation, where these operations are required. It doesn't hold for chaincodes either, as these aren't tweaked.

This makes me wonder whether it's still worth changing, though the implementation complexity argument still holds.
legendary
Activity: 1072
Merit: 1181
April 22, 2013, 06:46:44 AM
I don't think storing the maximal branching inside the serialized format is the right way to go. That doesn't mean wallets can't keep such information separately, but the problem is that you'll always hit more cases that aren't covered. Will you have separate limits for public and private derivation? Will you limit the number of levels below a node? The specific branching factors of all of those nodes? What if someone wants to use a non-standard, more complex structure? Inherently, how the tree structure is used as a wallet, and how keys are picked in it, depends on the implementation anyway.

For example, say you do this, and limit only at the direct level what the largest index of a subnode is. You're giving out addresses to pay to, a unique one for each invoice. Some are never paid, so you end up with gaps. At some point, you reach the maximal index. What now? Will you switch to a new account? Create an extra tree level? Tell the user to create a new backup? All those problems are exactly the same as in the case where you don't know the maximum index but do have a maximal gap: when the gap gets exceeded, tell the user it is now larger and he needs a new backup.

I say this is all implementation specific, and doesn't belong in the serialization.
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 06:39:47 AM
I suggest setting and serializing a maximum i for each extended key. Such a serialized string would define the wallet keys without the need for regular backups, until the maximum number of keys is exceeded.

Huh? Why would regular backups be needed? The major advantage of deterministic wallets is that backups aren't needed (see here).
Assume you lost the wallet storing the unspent transactions, but you have the root key for the BIP32 generator in cold storage. Theoretically nothing is lost, since you can scan the block chain for all outputs that can be spent with descendants of the root key; practically, you are not able to do this if you do not know which sequences were used.

Therefore you either have to back up the sequences used or, more simply, set a limit on them beforehand that is of a practicable magnitude for a scan.

As Pieter mentioned in the link that I gave you, it should be very practical to scan in almost all wallet layouts, since the branching factor and depth are quite small.

The assumption there is that if a key with a sequence number is not on the block chain, then higher sequences were not used, right?

I think that would not hold. A shop usually offers unique addresses to receive payment for individual orders. Some orders never get paid, so their keys will not be present in the block chain, yet keys with higher sequence numbers might be. A reconstruction of the wallet would fail at such a gap.
member
Activity: 104
Merit: 10
April 22, 2013, 06:26:56 AM
I'm not sure that I understood your use case, could you please explain the exact scenario more clearly? If you had an ancestor chaincode and pubkey, then you can derive the descendant chaincode and there's no need to encrypt it and send it to you; and if you don't have any ancestor chaincode, then an encrypted chaincode must be sent to you, no?

Maybe we do gain some security by having c_i depend on K_par, because commonly just the hashed address of K_par is published (until the coins get spent).

The scenario is this: Server A holds (K_par,c_par). Server B is being set up in a different location and is supposed to hold (K_i,c_i). All chaincodes are to be kept secret by the servers holding them, i.e. c_par shall only be known to A (not B), and c_i shall only be known to A and B (no third party). It is irrelevant here who knows K_par and K_i. So B does not derive anything itself; instead, A does the derivation of (K_i,c_i) and sends the result to B. Of course, the communication from A to B must be authenticated. The point is that, on top of that, at least c_i must (in the general case) be sent in encrypted form.

I am having trouble answering your other questions because I'm not sure what exactly the word "you" refers to.
sr. member
Activity: 360
Merit: 251
April 22, 2013, 05:36:09 AM
With the additive variant, don't you get I_L*G from the parent and child pubkeys, rather than I_L itself? I'm still not sure what the advantages of the multiplicative variant are.

We actually just had a discussion about this, and I'm in favor of changing this. The proposal would be:
  • k_i = k_par + I_L
  • K_i = K_par + I_L*G

I've been holding back on changing this, as speed wasn't really a priority. However, it is a lot easier to do the required operations for public derivation in constant time than for the multiplicative case, which means it's easier to secure against timing attacks (and we are, very intensively, dealing with private material here). Also, it means you need almost exactly the same primitives as ECDSA in the first place.

It is not less secure, as you still need to solve the ECDLP to find I_L from I_L*G (which is exposed via the public key). Assume a weakened version of the scheme where it's just K_i = K_par + (i*c_par)*G, and assume there is a transaction with inputs corresponding to i values i1 and i1+1, and another transaction with i values i2 and i2+1: an attacker could detect that these were created by someone using the same chain code twice. This would not be possible in the multiplicative version, and also not in the real scheme, where the 'tweak' I_L is computed using an HMAC.

Anyone object to changing this?

Why does constant time public derivation give security against timing attacks? Isn't it the case that public derivation doesn't use the privkeys, and therefore cannot leak privkeys? Maybe you meant security against leakage of just the chaincodes? I probably simply misunderstood what you're saying there, so it'd be helpful if you explain where exactly we need protection from timing attacks.

Yes, obviously with a variant that uses K_i=K_par+(i*c_par)*G you could detect c_par*G from the sub-branch with i1,i1+2 compared to the sub-branch with i2,i2+1, and maybe you could do much worse stuff as well. The OP also uses the safe version with hash(seed|i) to derive the next values. You can see that the formal proof in post #62 fails for K_i=K_par+(i*c_par)*G, because we couldn't claim that the r_i that's used for (sk_i,pk_i)=keygen(r_i) is pseudorandom. As you mentioned in post #103, if the attacker learns I_L then it doesn't seem like such a big deal, and as mentioned here the attacker needs to solve the ECDLP to learn even just I_L, so the additive variant appears to be provably secure, as far as I can see. But of course it wouldn't hurt to show the finalized BIP32 to some elliptic-curve crypto experts and ask for more peer review, before including it in the Satoshi client.
member
Activity: 104
Merit: 10
April 22, 2013, 05:25:39 AM
Sure. I assume the attacker can only observe the blockchain. For any transaction with multiple inputs, he subtracts the pubkeys used in any pair of inputs. If he finds two transactions that result in the same difference, they are very likely keys generated from the same extended parent (the difference would be (i-i')*c_par*G). Note that he can't find c_par or i or i' itself, just a likelihood that c_par is the same. Also note that this doesn't work in the real scheme, as not i*c_par but HMAC-SHA512(key=c_par, msg=K_par || i) is used, which guarantees no tweak will be reused.

This was just a thought experiment to see how much we'd have to weaken the construction before the add-vs-multiply makes any difference.

What does the corresponding weaker multiplicative scheme that you are comparing it with look like?
If K_i = (i*c_par)*K_par then you get the same phenomenon, right? The difference would be (i-i')*c_par*K_par, which is recognizable if (K_par,c_par) is reused. Seems to me that additive vs multiplicative derivation is not the point. Instead it's about how i and c_par are combined. Using c_par^i instead of i*c_par looks different at first sight.
legendary
Activity: 1072
Merit: 1181
April 22, 2013, 04:44:08 AM
It is not less secure, as you still need to solve the ECDLP to find I_L from I_L*G (which is exposed via the public key). Assume a weakened version of the scheme where it's just K_i = K_par + (i*c_par)*G, and assume there is a transaction with inputs corresponding to i values i1 and i1+1, and another transaction with i values i2 and i2+1: an attacker could detect that these were created by someone using the same chain code twice. This would not be possible in the multiplicative version, and also not in the real scheme, where the 'tweak' I_L is computed using an HMAC.

Can you specify what information you are assuming your attacker has? Does he know c_par and is just scanning all pubkeys he gets to see for a difference of c_par*G? Or how does his "detection" work?

Sure. I assume the attacker can only observe the blockchain. For any transaction with multiple inputs, he subtracts the pubkeys used in any pair of inputs. If he finds two transactions that result in the same difference, they are very likely keys generated from the same extended parent (the difference would be (i-i')*c_par*G). Note that he can't find c_par or i or i' itself, just a likelihood that c_par is the same. Also note that this doesn't work in the real scheme, as not i*c_par but HMAC-SHA512(key=c_par, msg=K_par || i) is used, which guarantees no tweak will be reused.

This was just a thought experiment to see how much we'd have to weaken the construction before the add-vs-multiply makes any difference.
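
A Python sketch of that thought experiment, using the ecdsa package and toy values; note the attacker never learns c_par, i or i', yet sees the same point difference twice:

Code:
from ecdsa import SECP256k1
from ecdsa.ellipticcurve import Point

curve, n, Gj = SECP256k1.curve, SECP256k1.order, SECP256k1.generator

def pt(k):
    # k*G as an affine Point (easier to negate explicitly)
    P = Gj * k
    return Point(curve, P.x(), P.y())

def neg(P):
    # point negation: (x, y) -> (x, -y mod p)
    return Point(curve, P.x(), (-P.y()) % curve.p())

c_par = 1111                   # the reused secret chain code (toy value)

def weak_child(k_par, i):
    # the deliberately weakened scheme: K_i = K_par + (i*c_par)*G
    return pt(k_par) + pt(i * c_par % n)

# Two unrelated parents (private keys 222 and 333) sharing c_par:
d1 = weak_child(222, 8)  + neg(weak_child(222, 7))
d2 = weak_child(333, 51) + neg(weak_child(333, 50))
assert d1 == d2                # both differences equal c_par*G: linkable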
member
Activity: 104
Merit: 10
April 22, 2013, 03:09:24 AM
The proposal would be:
  • k_i = k_par + I_L
  • K_i = K_par + I_L*G
[ ... ]
It is not less secure, as you still need to solve the ECDLP to find I_L from I_L*G (which is exposed via the public key). Assume a weakened version of the scheme where it's just K_i = K_par + (i*c_par)*G, and assume there is a transaction with inputs corresponding to i values i1 and i1+1, and another transaction with i values i2 and i2+1: an attacker could detect that these were created by someone using the same chain code twice. This would not be possible in the multiplicative version, and also not in the real scheme, where the 'tweak' I_L is computed using an HMAC.

Can you specify what information you are assuming your attacker has? Does he know c_par and is just scanning all pubkeys he gets to see for a difference of c_par*G? Or how does his "detection" work?
sr. member
Activity: 360
Merit: 251
April 22, 2013, 02:59:21 AM
I suggest setting and serializing a maximum i for each extended key. Such a serialized string would define the wallet keys without the need for regular backups, until the maximum number of keys is exceeded.

Huh? Why would regular backups be needed? The major advantage of deterministic wallets is that backups aren't needed (see here).
Assume you lost the wallet storing the unspent transactions, but you have the root key for the BIP32 generator in cold storage. Theoretically nothing is lost, since you can scan the block chain for all outputs that can be spent with descendants of the root key; practically, you are not able to do this if you do not know which sequences were used.

Therefore you either have to back up the sequences used or, more simply, set a limit on them beforehand that is of a practicable magnitude for a scan.

As Pieter mentioned in the link that I gave you, it should be very practical to scan in almost all wallet layouts, since the branching factor and depth are quite small.
hero member
Activity: 836
Merit: 1030
bits of proof
April 22, 2013, 02:10:22 AM
I suggest setting and serializing a maximum i for each extended key. Such a serialized string would define the wallet keys without the need for regular backups, until the maximum number of keys is exceeded.

Huh? Why would regular backups be needed? The major advantage of deterministic wallets is that backups aren't needed (see here).
Assume you lost the wallet storing the unspent transactions, but you have the root key for the BIP32 generator in cold storage. Theoretically nothing is lost, since you can scan the block chain for all outputs that can be spent with descendants of the root key; practically, you are not able to do this if you do not know which sequences were used.

Therefore you either have to back up the sequences used or, more simply, set a limit on them beforehand that is of a practicable magnitude for a scan.
legendary
Activity: 1072
Merit: 1181
April 21, 2013, 06:21:38 PM
Pieter: what's the difference between what you just posted and what iddo posted?
Nothing, timing coincidence.

Quote
Also, aren't timing attacks supposed to already be "covered" if you are using a vetted implementation? I'm not opposed to making it easier to implement, but I thought that was part of the reason to use, say, OpenSSL: they already diligently cover a bunch of these side-channel concerns.

OpenSSL's ECDSA_sign probably guarantees this (I haven't verified), but since generic EC multiplication isn't required by ECDSA_sign (only by ECDSA_verify), it's not guaranteed to have a constant-time version (and a variable-time one is certainly faster).
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
April 21, 2013, 06:02:26 PM
It's described by gmaxwell in the OP of this thread.

Thanks for that. Just for myself, I copied it below, writing the subscripts out explicitly (the capitalized values are EC points, the lowercase ones scalars).

As currently in BIP32, you would still have I = HMAC-SHA512(Key=c_par, Data=K_par || i)
And then:

k_i = k_par + I_L
K_i = K_par + I_L*G
c_i = I_R
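
In code, that derivation looks roughly like the following sketch (the serialization of K_par and the index is simplified here, not the exact BIP32 encoding):

Code:
import hmac, hashlib

# secp256k1 group order
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def ckd_priv(k_par: int, c_par: bytes, K_par_ser: bytes, i: int):
    # I = HMAC-SHA512(Key=c_par, Data=K_par || i)
    I = hmac.new(c_par, K_par_ser + i.to_bytes(4, 'big'),
                 hashlib.sha512).digest()
    I_L = int.from_bytes(I[:32], 'big')
    k_i = (k_par + I_L) % N        # k_i = k_par + I_L
    c_i = I[32:]                   # c_i = I_R
    return k_i, c_i                # K_i = k_i*G = K_par + I_L*G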


Pieter: what's the difference between what you just posted and what iddo posted?

Also, aren't timing attacks supposed to already be "covered" if you are using a vetted implementation? I'm not opposed to making it easier to implement, but I thought that was part of the reason to use, say, OpenSSL: they already diligently cover a bunch of these side-channel concerns.
