Topic: Hardware Bitcoin wallet - a minimal Bitcoin wallet for embedded devices - page 2. (Read 44696 times)

member
Activity: 78
Merit: 11
Chris Chua
An update on features: I've added support for multiple wallets and hidden (plausibly deniable) wallets. 2 KB of EEPROM should be able to store 10 wallets; 4 KB of EEPROM should be able to store 23 wallets. The LPC11U24 comes in 2 KB and 4 KB variants.

If you decide to use TAPR for a hardware license you're making the hardware design more restrictive than the software design. Hardware is expensive to develop, and not always all that reusable. You also have the issue where the copyrightable parts of hardware, the board layout and schematic, are actually nowhere near as important as the design calculations that went into them. So licensing for hardware doesn't buy you much. For firmware, though, by putting your code under the GPL, and especially the GPL-3 with the patent provisions, you make sure that someone who decides to develop the "hardware bitcoin wallet 2" will contribute back to the community in a way that permissive licenses don't.

I did intentionally choose a permissive licence because I wanted other people to be able to reuse portions of my code with few restrictions. I don't think someone else will release a closed-source wallet since the Bitcoin community isn't going to accept the whole "trust me, there isn't a backdoor in this binary" thing. Maybe the hardware should be licensed under a more permissive licence (than TAPR) to match the firmware?

One thing to keep in mind is that the output impedance of the whole circuit is different than the zener itself. If you look at the footnote on page 19 of the TL431 datasheet it points out that the dynamic impedance of the circuit in two-resistor mode is actually Z=|Zka|(1-R1/R2), essentially because the reference resistors to set the output voltage are influenced by a change in the voltage they are supposed to set. That equation is also a ratio - it's totally determined by the required output voltage. Vref=Vout*R2/(R1+R2) -> R1/R2=Vout/Vref-1 -> Z=|Zka|(2-Vout/Vref) -> Z=|0.5ohms max|(2-3.3V/2.5V)=0.34ohms and 0.136ohms typical (0.2ohm Zka)

So basically in our case we can go for an even worse-performing active zener than expected.

Your other issue will be the reference voltage temperature coefficient. A change in output voltage implies a change in input current of course, so what we really want is the change in reference voltage due to change in temperature due to change in current: three chained derivatives. You'd also need to work out the thermal inertia of the chip+board. However a quick glance at page 22 of that TL431 datasheet pretty quickly shows this really shouldn't be an issue. With the SOT-89 package your change in temperature from 0mA to 100mA is only a few degrees, which at worst seems to imply a change in voltage of only a few mV. Also, the change in temperature due to the crypto isn't terribly predictable measured against equally big changes in ambient temperature from the room, as well as conducted into the device from the computer. I don't see any reason to worry about it. Just make sure the load resistor used in conjunction with the zener has similarly good properties, which is easy as resistors with downright magical specs are cheap and available.
Looking at TI's datasheet (http://www.ti.com/lit/ds/symlink/tl431a.pdf), I see a "+" instead of a "-" on page 19. Am I missing something? But if dynamic impedance is a problem, what if we used an extra PNP transistor (like on Figure 21 of the datasheet) as the shunt? If I understand the circuit correctly, the transistor amplifies current changes from the TL431, theoretically decreasing dynamic impedance by the current gain h_FE of the transistor. That would result in an insanely low dynamic impedance, though there are probably stability problems involved with putting an amplifier in the TL431's control loop. As an additional bonus, it would further decrease the thermal effects you mention.
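For reference, here is a rough numeric check of the two sign conventions discussed above. It is only a sketch: the function names are made up for illustration, and the 2.5 V reference, 3.3 V output, and 0.5/0.2 ohm Zka figures are simply the ones quoted in this thread, not values checked against a specific part.

```python
# Closed-loop dynamic impedance of a TL431 with a two-resistor divider.
V_REF = 2.5   # volts, nominal reference voltage used in the quoted post
V_OUT = 3.3   # volts, target supply rail


def divider_ratio(v_out, v_ref=V_REF):
    """R1/R2 derived from Vref = Vout * R2 / (R1 + R2)."""
    return v_out / v_ref - 1.0


def dynamic_impedance(z_ka, v_out, sign=+1):
    """|Z| = |Zka| * (1 + sign * R1/R2).

    sign=-1 reproduces the formula as written in the quoted post,
    sign=+1 is the form with a "+" that the reply reads in the datasheet.
    """
    return abs(z_ka) * (1.0 + sign * divider_ratio(v_out))


for label, z_ka in (("max", 0.5), ("typical", 0.2)):
    minus = dynamic_impedance(z_ka, V_OUT, sign=-1)
    plus = dynamic_impedance(z_ka, V_OUT, sign=+1)
    print(f"{label}: '-' form {minus:.3f} ohm, '+' form {plus:.3f} ohm")
```

With the "-" form this gives the 0.34 ohm and 0.136 ohm figures from the quoted post; with the "+" form the same inputs give roughly double that, so the sign matters when deciding how good the shunt has to be.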



I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)



˙ɐıןɐɹʇsnɐ uı pǝsn pɹɐɔʇɹɐɯs ʇɹodsuɐɹʇ ɔıןqnd ɐ '(/nɐ˙ɯoɔ˙ıʞʎɯ˙ʍʍʍ//:dʇʇɥ) "ıʞʎɯ" oʇ ɹɐןıɯıs ooʇ s,ʇı 'ʎןǝʇɐunʇɹnoɟun


FTFY
To the rest of the world "mikey" would be a good name. :D
How about "CoinKey" or "BitKey"? Both are google-able, and both have .org domain names available. "CoinKey" also suggests that the wallet can be used with an alternative cryptocurrency such as Namecoin, since it (presumably) also uses ECDSA for coin transfer.



I have one comment related to your choice of CPUs.

It seems like you have a wide power margin in your power budget, in fact so wide that you are thinking of plainly burning this power to increase the resistance to DPA & DFA (Differential Power Analysis & Differential Fault Analysis).

I propose that you could burn that power in a more useful way. If you choose a CPU that supports SIMD instructions you can greatly increase the resistance to those side-channel attacks. I'm aware of AVR32 controllers with SIMD extensions (apparently mature/obsoleted) as well as versions of ARM processors with NEON extensions.

The basic attack defense mode is to use 4-way SIMD streams to simultaneously compute:

1) two copies of the desired cryptographic result
2) two copies of a benchmark cryptographic problem with a known result

The benchmark problem is randomly selected from a library of known solutions, and the allocation of the problems to the vector registers is also made randomly.

The papers I remember seeing about the above techniques were concerned with symmetric crypto as well as RSA asymmetric crypto. I would presume that the results carry over to elliptic-curve asymmetric crypto.

If the ARMs with NEON are too expensive or otherwise unsuitable the techniques are still valid when used in a plain CPU. In that case care must be taken to prevent the compilers from optimizing and reordering the instruction streams by the use of the appropriate "__builtin" functions (in GCC).
Those ARM processors with NEON are in a much more powerful (and much more expensive) class than the Cortex-M0 microcontrollers I'm looking at using. I think what you're suggesting is to use SIMD to detect fault analysis, since any injected faults will affect the benchmark problem. But I'm not that worried about fault analysis, because the attack model I have in my head is that of a remotely compromised host computer. This means that timing attacks and power analysis attacks are an issue, because a typical computer has the ability to measure time and power consumption. But I can't think of any way a typical computer could inject faults into a USB device, especially if that device has its own clock and filters its power supply.



Wow! I was just thinking about something like that the other day. Thanks for the link! BTW, the OP is a great idea too ;)
I should probably say this now: the idea for a hardware Bitcoin wallet isn't mine, nor are the core ideas (like the use of a deterministic wallet). This sort of thing has been suggested multiple times on this forum. I was expecting someone to beat me to an actual implementation, and was surprised when it didn't happen.
donator
Activity: 1736
Merit: 1014
Let's talk governance, lipstick, and pigs.
Wow! I was just thinking about something like that the other day. Thanks for the link! BTW, the OP is a great idea too ;)
donator
Activity: 2772
Merit: 1019
I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)



Unfortunately, it's too similar to "myki" (http://www.myki.com.au/), a public transport smartcard used in Australia.

or the "makey makey": https://www.youtube.com/watch?feature=player_embedded&v=rfQqh7iCcOU#!

I couldn't care less, screw trademarks, before you find a name that way you're old as Stallman with a white beard.



legendary
Activity: 2128
Merit: 1073
I have one comment related to your choice of CPUs.

It seems like you have a wide power margin in your power budget, in fact so wide that you are thinking of plainly burning this power to increase the resistance to DPA & DFA (Differential Power Analysis & Differential Fault Analysis).

I propose that you could burn that power in a more useful way. If you choose a CPU that supports SIMD instructions you can greatly increase the resistance to those side-channel attacks. I'm aware of AVR32 controllers with SIMD extensions (apparently mature/obsoleted) as well as versions of ARM processors with NEON extensions.

The basic attack defense mode is to use 4-way SIMD streams to simultaneously compute:

1) two copies of the desired cryptographic result
2) two copies of a benchmark cryptographic problem with a known result

The benchmark problem is randomly selected from a library of known solutions, and the allocation of the problems to the vector registers is also made randomly.

The papers I remember seeing about the above techniques were concerned with symmetric crypto as well as RSA asymmetric crypto. I would presume that the results carry over to elliptic-curve asymmetric crypto.

If the ARMs with NEON are too expensive or otherwise unsuitable the techniques are still valid when used in a plain CPU. In that case care must be taken to prevent the compilers from optimizing and reordering the instruction streams by the use of the appropriate "__builtin" functions (in GCC).
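The scheme above can be sketched roughly as follows. This is only an illustration of the bookkeeping (two real computations, two known-answer computations, random lane assignment, compare at the end). The names BENCHMARKS, crypto_op and protected_compute are hypothetical, SHA-256 stands in for the real primitive, and the four "lanes" here run one after another rather than in actual SIMD registers.

```python
import hashlib
import secrets

# Hypothetical library of benchmark problems with precomputed answers.
BENCHMARKS = [(m, hashlib.sha256(m).digest()) for m in (b"kat-0", b"kat-1", b"kat-2")]


def crypto_op(data):
    """Stand-in for the real cryptographic primitive."""
    return hashlib.sha256(data).digest()


def protected_compute(data, op=crypto_op):
    """Run 2 real + 2 benchmark computations in randomly assigned lanes."""
    bench_in, bench_out = secrets.choice(BENCHMARKS)   # random benchmark problem
    lanes = [("real", data), ("real", data), ("bench", bench_in), ("bench", bench_in)]
    secrets.SystemRandom().shuffle(lanes)              # random lane allocation
    results = [(kind, op(value)) for kind, value in lanes]
    real = [r for kind, r in results if kind == "real"]
    bench = [r for kind, r in results if kind == "bench"]
    # Any disagreement means something disturbed the computation.
    if real[0] != real[1] or any(b != bench_out for b in bench):
        raise RuntimeError("fault detected: discarding result")
    return real[0]
```

The result is released only if both real copies agree and both benchmark copies match the stored answer, so a disturbance in any one lane causes the output to be discarded.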
legendary
Activity: 1120
Merit: 1164
I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)



Unfortunately, it's too similar to "myki" (http://www.myki.com.au/), a public transport smartcard used in Australia.

Spelled differently though, that's enough for google. I know what you mean about names though... I've got a project of my own and the best I've come up with is STEGanographic BAKups...

Sounds good. I'd also suggest you use the GPL-2 or -3 for the firmware license. In a project like this the bulk of the work is really the firmware, so reasonably strong protections are probably a good thing, especially the patent clauses in the GPL-3. Your current license is permissive, yet by being non-standard it can still cause problems for other projects trying to re-use the code.
I chose the BSD "2-clause" licence because it was one of the most permissive licences accepted by the Open Source Initiative. I believe it is exactly the same in spirit as the MIT/expat licence (maybe I should use that?). If software patents are a problem, why not use the Apache License v2.0? It's compatible with GPL-3 (http://www.gnu.org/licenses/license-list.html#apache2), so there would be fewer re-use problems.

That's a good idea I think. However you still have the problem that someone can take the firmware, put it in their own product, and then distribute it under their own license *not* subject to the patent provisions; it only protects you against patents that contributors assert.

If you decide to use TAPR for a hardware license you're making the hardware design more restrictive than the software design. Hardware is expensive to develop, and not always all that reusable. You also have the issue where the copyrightable parts of hardware, the board layout and schematic, are actually nowhere near as important as the design calculations that went into them. So licensing for hardware doesn't buy you much. For firmware, though, by putting your code under the GPL, and especially the GPL-3 with the patent provisions, you make sure that someone who decides to develop the "hardware bitcoin wallet 2" will contribute back to the community in a way that permissive licenses don't.

However, those LPC11U's are really hard to get; everything better than 32KB of program memory seems to involve long lead times. It's a new product and they're probably having a hard time ramping up production. They've probably also pre-sold most of their initial production.

Picking something with a dev kit available at sparkfun is probably a good idea. That said price is an issue. How much *ram* do you think you can live with?
I was looking at the LPC11U24, which seems quite widely available (there are also easy-to-use dev kits out there eg. http://www.sparkfun.com/products/11045). Though it only has 32KB of program memory, I've found that Cortex-M0 code is much smaller than AVR 8-bit. For example, the platform-independent portion of hardware Bitcoin wallet compiles to 22.8KB on AVR 8-bit, but compiles to only 11.5 KB on Cortex-M0 (using the yagarto toolchain).

As for RAM requirements, anything greater than 2 KB is probably enough. Currently, I've calculated maximum RAM usage to be ~1.5 KB.

Ah, that's much more reasonable then, still maybe a bit tight, but if you can live with it for a few months I'm sure the better LPC's will be available then.

Going back to the zener idea, I forgot about synthetic active zeners. Chips like the LT1431 have as little as 0.1ohms dynamic resistance, which would be plenty good enough. Reasonably cheap too, $3.76 to $1.91 in bulk and it'd replace the 3.3V regulator you'd need anyway. Disconnecting the power supply would be more like $0.5 though.
Is there any reason why the dirt-cheap TL431 couldn't be used? Looking at datasheets, the TL431s from various manufacturers have a typical output impedance of between 100 and 300 mohm. And they're cheaper than a 3.3V LDO which I was going to have to use anyway.

Good idea. The Texas Instruments TL431CPK, the SOT-89 package, is only $0.43 in singles and $0.16 in bulk, and the SOT-89 package has a very good 52degC/W thermal resistance to ambient.

One thing to keep in mind is that the output impedance of the whole circuit is different than the zener itself. If you look at the footnote on page 19 of the TL431 datasheet it points out that the dynamic impedance of the circuit in two-resistor mode is actually Z=|Zka|(1-R1/R2), essentially because the reference resistors to set the output voltage are influenced by a change in the voltage they are supposed to set. That equation is also a ratio - it's totally determined by the required output voltage. Vref=Vout*R2/(R1+R2) -> R1/R2=Vout/Vref-1 -> Z=|Zka|(2-Vout/Vref) -> Z=|0.5ohms max|(2-3.3V/2.5V)=0.34ohms and 0.136ohms typical (0.2ohm Zka)

So basically in our case we can go for an even worse-performing active zener than expected.

Your other issue will be the reference voltage temperature coefficient. A change in output voltage implies a change in input current of course, so what we really want is the change in reference voltage due to change in temperature due to change in current: three chained derivatives. You'd also need to work out the thermal inertia of the chip+board. However a quick glance at page 22 of that TL431 datasheet pretty quickly shows this really shouldn't be an issue. With the SOT-89 package your change in temperature from 0mA to 100mA is only a few degrees, which at worst seems to imply a change in voltage of only a few mV. Also, the change in temperature due to the crypto isn't terribly predictable measured against equally big changes in ambient temperature from the room, as well as conducted into the device from the computer. I don't see any reason to worry about it. Just make sure the load resistor used in conjunction with the zener has similarly good properties, which is easy as resistors with downright magical specs are cheap and available.
donator
Activity: 1736
Merit: 1014
Let's talk governance, lipstick, and pigs.
I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)



˙ɐıןɐɹʇsnɐ uı pǝsn pɹɐɔʇɹɐɯs ʇɹodsuɐɹʇ ɔıןqnd ɐ '(/nɐ˙ɯoɔ˙ıʞʎɯ˙ʍʍʍ//:dʇʇɥ) "ıʞʎɯ" oʇ ɹɐןıɯıs ooʇ s,ʇı 'ʎןǝʇɐunʇɹnoɟun


FTFY
To the rest of the world "mikey" would be a good name. :D
member
Activity: 78
Merit: 11
Chris Chua
I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)



Unfortunately, it's too similar to "myki" (http://www.myki.com.au/), a public transport smartcard used in Australia.



Sounds good. I'd also suggest you use the GPL-2 or -3 for the firmware license. In a project like this the bulk of the work is really the firmware, so reasonably strong protections are probably a good thing, especially the patent clauses in the GPL-3. Your current license is permissive, yet by being non-standard it can still cause problems for other projects trying to re-use the code.
I chose the BSD "2-clause" licence because it was one of the most permissive licences accepted by the Open Source Initiative. I believe it is exactly the same in spirit as the MIT/expat licence (maybe I should use that?). If software patents are a problem, why not use the Apache License v2.0? It's compatible with GPL-3 (http://www.gnu.org/licenses/license-list.html#apache2), so there would be fewer re-use problems.

However, those LPC11U's are really hard to get; everything better than 32KB of program memory seems to involve long lead times. It's a new product and they're probably having a hard time ramping up production. They've probably also pre-sold most of their initial production.

Picking something with a dev kit available at sparkfun is probably a good idea. That said price is an issue. How much *ram* do you think you can live with?
I was looking at the LPC11U24, which seems quite widely available (there are also easy-to-use dev kits out there eg. http://www.sparkfun.com/products/11045). Though it only has 32KB of program memory, I've found that Cortex-M0 code is much smaller than AVR 8-bit. For example, the platform-independent portion of hardware Bitcoin wallet compiles to 22.8KB on AVR 8-bit, but compiles to only 11.5 KB on Cortex-M0 (using the yagarto toolchain).

As for RAM requirements, anything greater than 2 KB is probably enough. Currently, I've calculated maximum RAM usage to be ~1.5 KB.

Going back to the zener idea, I forgot about synthetic active zeners. Chips like the LT1431 have as little as 0.1ohms dynamic resistance, which would be plenty good enough. Reasonably cheap too, $3.76 to $1.91 in bulk and it'd replace the 3.3V regulator you'd need anyway. Disconnecting the power supply would be more like $0.5 though.
Is there any reason why the dirt-cheap TL431 couldn't be used? Looking at datasheets, the TL431s from various manufacturers have a typical output impedance of between 100 and 300 mohm. And they're cheaper than a 3.3V LDO which I was going to have to use anyway.
legendary
Activity: 1120
Merit: 1164

I've finally implemented the persistent entropy pool. It was more work than I thought because the persistent entropy pool doesn't play nice when trying to format non-volatile storage - if the entropy pool is stored in non-volatile storage, it gets mangled in the process, so I've added a way to temporarily store it in RAM.

Cool! I'm actually kinda surprised how directly my code actually translated over.

Nope. It's something which is dependent on which microcontroller we choose, so measurements should probably be done after a choice is made.

Fair enough. Probably quite uC dependent. It'd be nice if we could compare different uC's, but that'd likely be a fair amount of effort better devoted elsewhere.

With hardware, I consider myself merely a hobbyist. I have designed and built many circuits in the past, but I've never designed anything which was "mass-produced". That's another reason why I wanted to keep things simple; I don't think I could design something with exotic features like self-destruct on tamper detection. It was always my intent to open-source everything, so TAPR looks good.

Sounds good. I'd also suggest you use the GPL-2 or -3 for the firmware license. In a project like this the bulk of the work is really the firmware, so reasonably strong protections are probably a good thing, especially the patent clauses in the GPL-3. Your current license is permissive, yet by being non-standard it can still cause problems for other projects trying to re-use the code.

I have thought a lot about the requirements and it's quite clear that the ATmega328 isn't going to cut it. Even now, I'm close to running out of program flash. Having a look around, I reckon the LPC11Uxx series from NXP (see http://www.nxp.com/products/microcontrollers/cortex_m0/lpc11u00/) is appropriate: low power, low cost, integrated USB, integrated EEPROM and good development tools. What do you think?

I like the idea of 32-bit ARM for futureproofing and making it easier to re-use code.

However, those LPC11U's are really hard to get; everything better than 32KB of program memory seems to involve long lead times. It's a new product and they're probably having a hard time ramping up production. They've probably also pre-sold most of their initial production.

Picking something with a dev kit available at sparkfun is probably a good idea. That said price is an issue. How much *ram* do you think you can live with?

I looked around on digikey for stuff meeting the criteria of ARM, 128KB flash, 48-LQFP package, and 3.3V power requirements and there actually are only about a dozen different chips available, and not all have USB and ADC's. That said, you can probably use a bit-bang USB implementation if you can find one, and for the ADC as long as your random error source has a decent voltage swing you can use a capture and compare module instead.

I thought of another way to mask power signatures. With some transistors and a large-ish (> 100 uF) capacitor, a device can temporarily disconnect its own USB power line. It can then operate in two phases:
  • A discharging phase, where the device does cryptographic operations while USB power is disconnected, using the capacitor as a temporary power supply
  • and a charging phase, where the device sleeps while USB power charges the capacitor.
For example, if the microcontroller/regulator can tolerate a 1 volt drop and current consumption is 15 milliamp, then a 100 uF capacitor allows the discharging phase to be about ~7 ms. If the capacitor can tolerate a current of 100 milliamp, then the charging phase is only 1 ms. This results in a small (13%) reduction in overall performance, but I don't think anyone will care if a signing operation takes 1.1 second instead of 1.0 second.

This method can be applied to any cryptographic algorithm. An attacker can still measure power consumption, but they can only measure the average power consumption over each discharging phase. For a 7 ms discharging phase, this equates to 100,000s of clock cycles. There's probably not a lot of useful information in that. If more masking is required, the microcontroller can dissipate some power in a resistor, either to compensate for the power consumption of its ALU, or to inject some noise into an attacker's measurements.

Well, the simple thing isn't to discharge power into a resistor, but rather measure your own power supply voltage and at the end of the crypto-cycle use a variable amount of power to reach a specified voltage at a specified time.

Note that any switch to disconnect the uC from the USB power will have capacitance from drain to source in the off state. That said, that capacitance is usually in the order of dozens to hundreds of pF and can easily be shunted to ground by a few capacitors.


Going back to the zener idea, I forgot about synthetic active zeners. Chips like the LT1431 have as little as 0.1ohms dynamic resistance, which would be plenty good enough. Reasonably cheap too, $3.76 to $1.91 in bulk and it'd replace the 3.3V regulator you'd need anyway. Disconnecting the power supply would be more like $0.5 though.
 

@someone42
You should give your device and protocol a name.

That way client software can advertise itself as supporting 'YourName' devices.
Also people can then clone your hardware and advertise their devices as 'YourName' compatible.
I'm bad at naming things. I chose "hardware Bitcoin wallet" as a working title because it's direct. At one time I thought "Solidcoin" (solid: it's a solid device you can physically hold in your hand, coin: from Bitcoin) would be a good name, but that name's already taken. If you or anyone else can come up with a good name, I'll adopt it.

mikey's actually kinda clever. "mikey" isn't googlable of course, but mikey bitcoin, or mikey bitcoin wallet doesn't have any hits.
legendary
Activity: 980
Merit: 1003
I'm not just any shaman, I'm a Sha256man
I'm bad at naming things. I chose "hardware Bitcoin wallet" as a working title because it's direct. At one time I thought "Solidcoin" (solid: it's a solid device you can physically hold in your hand, coin: from Bitcoin) would be a good name, but that name's already taken. If you or anyone else can come up with a good name, I'll adopt it.

Solidcoin would indeed be a very poor choice.

I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name Drug dealing code word for a micro-btc)


fixed that for ya :P
donator
Activity: 2772
Merit: 1019
I'm bad at naming things. I chose "hardware Bitcoin wallet" as a working title because it's direct. At one time I thought "Solidcoin" (solid: it's a solid device you can physically hold in your hand, coin: from Bitcoin) would be a good name, but that name's already taken. If you or anyone else can come up with a good name, I'll adopt it.

Solidcoin would indeed be a very poor choice.

I suggest "mikey", following connotations:

 * usb key form factor
 * stores "my key" to my money
 * holds mikes (street name for a micro-btc)

donator
Activity: 2772
Merit: 1019
Definitely an interesting concept.

But before using any sort of embedded device to generate keys, I'd want some confidence in its random number generator. (I haven't looked at the hardware or code, and I don't have the expertise to evaluate its randomness even if I did.) But the research using EFF SSL data that indicates a lot of SSL certificates in the wild use bad random number generators, probably generated on embedded devices, makes me really hesitant to use an embedded device to generate my bitcoin keys without a lot of scrutiny.

well, if the device has capability to restore a seed, you could also generate the seed somewhere else.
member
Activity: 78
Merit: 11
Chris Chua
Regardless of how you collect your random data you should be also saving the state of the random number generator so you can restore it on the next power up. Currently the getRandom256() function collects new random bytes from the hardware generator, hashes them with sha256, and then returns that result. It really needs to operate with a persistent entropy pool to protect against a HWRNG failure. FWIW I just created an un-tested patch that implements this. See https://github.com/retep/hardware-bitcoin-wallet/compare/persistent-pool-prng
I've finally implemented the persistent entropy pool. It was more work than I thought because the persistent entropy pool doesn't play nice when trying to format non-volatile storage - if the entropy pool is stored in non-volatile storage, it gets mangled in the process, so I've added a way to temporarily store it in RAM.

Have you tried to measure the actual current consumption vs. time yet?
Nope. It's something which is dependent on which microcontroller we choose, so measurements should probably be done after a choice is made.

Sure thing. What hardware experience do you currently have? I would be interested in starting a set of specs and then schematics.

For licensing the TAPR Open Hardware License looks reasonable: http://www.tapr.org/ohl.html - out of respect for my employer, I do want to make it clear I'm not trying to make any money out of this project. At the same time, my employer is also quite clear that I own the IP to things I develop unrelated to my job.
With hardware, I consider myself merely a hobbyist. I have designed and built many circuits in the past, but I've never designed anything which was "mass-produced". That's another reason why I wanted to keep things simple; I don't think I could design something with exotic features like self-destruct on tamper detection. It was always my intent to open-source everything, so TAPR looks good.

I have thought a lot about the requirements and it's quite clear that the ATmega328 isn't going to cut it. Even now, I'm close to running out of program flash. Having a look around, I reckon the LPC11Uxx series from NXP (see http://www.nxp.com/products/microcontrollers/cortex_m0/lpc11u00/) is appropriate: low power, low cost, integrated USB, integrated EEPROM and good development tools. What do you think?

Using the system sound card as our threat model, we really have to filter out, well, practically everything. Even the high frequency stuff needs to be filtered as we can't predict what non-linearities exist in the system that would downconvert, say, 1MHz to 5KHz. That said, ferrites are cheap, and as you say, we've got a lot more room for stuff like that than on a smartcard. We'll have to come up with a plausible attack in detail to figure out just how many dB of reduction is required; unlike the projects I normally work on cost is a factor. :)
I thought of another way to mask power signatures. With some transistors and a large-ish (> 100 uF) capacitor, a device can temporarily disconnect its own USB power line. It can then operate in two phases:
  • A discharging phase, where the device does cryptographic operations while USB power is disconnected, using the capacitor as a temporary power supply
  • and a charging phase, where the device sleeps while USB power charges the capacitor.
For example, if the microcontroller/regulator can tolerate a 1 volt drop and current consumption is 15 milliamp, then a 100 uF capacitor allows the discharging phase to be about ~7 ms. If the capacitor can tolerate a current of 100 milliamp, then the charging phase is only 1 ms. This results in a small (13%) reduction in overall performance, but I don't think anyone will care if a signing operation takes 1.1 second instead of 1.0 second.

This method can be applied to any cryptographic algorithm. An attacker can still measure power consumption, but they can only measure the average power consumption over each discharging phase. For a 7 ms discharging phase, this equates to 100,000s of clock cycles. There's probably not a lot of useful information in that. If more masking is required, the microcontroller can dissipate some power in a resistor, either to compensate for the power consumption of its ALU, or to inject some noise into an attacker's measurements.
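As a rough check of the timing figures above, here is the arithmetic written out. It is a back-of-the-envelope sketch that assumes an ideal capacitor and constant current in each phase; the helper name phase_time is made up for illustration.

```python
def phase_time(capacitance_f, delta_v, current_a):
    """Time for a capacitor to change by delta_v at constant current: t = C * dV / I."""
    return capacitance_f * delta_v / current_a


C = 100e-6     # 100 uF capacitor
DROOP = 1.0    # volts the microcontroller/regulator can tolerate

discharge = phase_time(C, DROOP, 15e-3)    # crypto runs from the capacitor at 15 mA
charge = phase_time(C, DROOP, 100e-3)      # device sleeps while USB recharges at 100 mA
overhead = charge / (charge + discharge)   # fraction of wall-clock time spent charging

print(f"discharge {discharge * 1e3:.2f} ms, charge {charge * 1e3:.2f} ms, overhead {overhead:.0%}")
```

This gives a discharging phase of about 6.7 ms, a charging phase of 1 ms, and about 13% of the time spent charging, which matches the figures above.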

@someone42
You should give your device and protocol a name.

That way client software can advertise itself as supporting 'YourName' devices.
Also people can then clone your hardware and advertise their devices as 'YourName' compatible.
I'm bad at naming things. I chose "hardware Bitcoin wallet" as a working title because it's direct. At one time I thought "Solidcoin" (solid: it's a solid device you can physically hold in your hand, coin: from Bitcoin) would be a good name, but that name's already taken. If you or anyone else can come up with a good name, I'll adopt it.
legendary
Activity: 1120
Merit: 1164
Wow, sorry but you're talking Greek to me, retep.
What's the point in storing previously used random numbers? What's a HWRNG failure? I suppose the acronym means HardWare Random Number Generator, but still don't understand what you mean.

Electronics designers do like their Greek letters. :)

So basically, HWRNG stands for HardWare Random Number Generator. There are a lot of ways to get hardware to generate random numbers, but the one thing in common with all of them is they can all fail, and you'll never know it if they do. So by storing a "seed" you have some pre-existing source of randomness. Every time a random number is generated you take the seed, and combine it with the allegedly random bits the hardware gives you to create a new random number. Specifically you do this with a cryptographic hash function like SHA256.

Now, let's suppose the seed turns out to not be random. This will happen the first time the device is powered up. No worries, the hardware will produce some randomness.

Now, let's suppose you've been using the device for a few years, and one day the hardware random number generator fails. Again, no worries! So long as the attacker doesn't know what your random seed is, you can use it to derive unpredictable, if not strictly speaking random, numbers for as long as you want. All you actually care about is that the attacker can't guess what number you used; the fact that the entropy of the number is lower than specified isn't a problem.

Of course, there are nuances like how do you take the current seed, and generate a random number from it in such a way that the attacker can't guess the seed, but you get the idea.

For further explanation, look up the Yarrow algorithm: http://www.schneier.com/yarrow.html
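To make the idea concrete, here is a minimal sketch of mixing a stored seed with hardware bytes through SHA256. It is not the construction used in the patch and it is not Yarrow; the function name `next_random` and the `b"output"` / `b"reseed"` labels are made up for illustration.

```python
import hashlib

def next_random(seed, hw_bytes):
    """Return (output, new_seed) from the stored seed and fresh hardware bytes.

    Output and new seed are hashed with different labels, so knowing an
    output does not directly reveal the seed that will be stored next.
    """
    pool = seed + hw_bytes
    output = hashlib.sha256(b"output" + pool).digest()
    new_seed = hashlib.sha256(b"reseed" + pool).digest()
    return output, new_seed

# Simulate a failed HWRNG that is stuck returning zeros.
seed = hashlib.sha256(b"example stored seed").digest()
stuck_hw = bytes(32)

first, seed = next_random(seed, stuck_hw)
second, seed = next_random(seed, stuck_hw)

# The outputs still differ because the stored seed moves forward every call.
print(first.hex())
print(second.hex())
```

In a real device the new seed would have to be written back to non-volatile storage before the output is used, otherwise a power cycle could replay the same state.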
legendary
Activity: 1120
Merit: 1164
The way my current implementation achieves constant time operation is to insert dummy operations instead of using branches. I believe this is like the "wait until maximum-time, while doing something cpu-intensive" approach, except that the waiting is interleaved instead of happening all at the end. However, it has the advantage that program flow is independent of secret data, so it also hides most (but not all) of the power consumption signature.

That's a good technique. Have you tried to measure the actual current consumption vs. time yet?

I had previously assumed that power analysis using a computer motherboard wouldn't be feasible for a hardware Bitcoin wallet. I assumed this because usually fast ADCs (at least the clock rate of the microcontroller) are required to do power analysis, and I doubt motherboards have 48 Msps ADCs in them. However, your comments have prompted me to question this. A point multiplication operation involves 256 point addition and point doubling operations which take place over some time (0.5 s to 4 s depending on the speed/power of the microcontroller). Sampling over each point addition/doubling would only require a < 1 ksps ADC and could possibly reveal information about secret data. Although I still think a power analysis virus is a bit exotic, it would be prudent to design some power analysis countermeasures into a hardware Bitcoin wallet.

It also occurred to me that another possible avenue of attack would be for the virus to record audio from the computer's line-in port. On most computers a lot of power-supply noise gets into the audio subsystem, and yet at the same time the ADCs connected to it are fast and precise.

About the hardware design: yes, I'd really appreciate help. Software mistakes can be fixed with a commit; hardware mistakes can require expensive redesign or rework.

Sure thing. What hardware experience do you currently have? I would be interested in starting a set of specs and then schematics.

For licensing the TAPR Open Hardware License looks reasonable: http://www.tapr.org/ohl.html - out of respect for my employer, I do want to make it clear I'm not trying to make any money out of this project. At the same time, my employer is also quite clear that I own the IP to things I develop unrelated to my job.

Yeah, using simulation of a simple zener regulator circuit, I couldn't get better than a factor of ~3 reduction in power consumption signature.

I think a mostly software solution is possible. A relatively large (compared to what's possible on a smartcard) amount of passive filtering can be used to mask high-frequency power consumption signatures (with a cutoff in the 10s of kHz).

Yeah, fix things where it's easy, in software, first, and then fix them where it's hard.

Using the system sound card as our threat model, we really have to filter out, well, practically everything. Even the high frequency stuff needs to be filtered as we can't predict what non-linearities exist in the system that would downconvert, say, 1MHz to 5kHz. That said, ferrites are cheap, and as you say, we've got a lot more room for stuff like that than on a smartcard. We'll have to come up with a plausible attack in detail to figure out just how many dB of reduction is required; unlike the projects I normally work on, cost is a factor. Smiley

Of course, if cost is a problem, this can be a differentiation thing as well. Sell one version with "pretty-damn-good" security that leaves out the filtering and another "god-damn-bankvault" version that puts it back in.

The bankvault version can have all sorts of fun paranoia. For instance, if you have a sealed box, the best way to protect against it being opened is to put a light sensor and a light in the box, and trigger the alarm if the light level changes.

Say, for example, an addition: x + y requires i + d_i milliamp of current over one clock cycle, but the addition of ~x + ~y requires i - d_i milliamp of current over one clock cycle. As long as every necessary addition x + y is paired with a dummy addition ~x + ~y, total current consumption is always i milliamp of current over two clock cycles. Since the core clock of microcontrollers is in the MHz range, the cycle-to-cycle variations in power consumption are not visible to an attacker; thanks to the filtering, only power consumption over hundreds of cycles is visible. The challenge is then to "pair" every necessary operation with a dummy operation that quickly compensates for the power consumption of the necessary operation.

Yup. Pretty standard techniques, although we'll have to be careful to actually verify that the techniques really do work.

After seeing this: https://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2007/blh36_cdl28_dct23/blh36_cdl28_dct23/index.html, I believe it is possible to bit-bang a USB host interface. So I will probably include a USB port on the wallet, for you to plug a USB keyboard into.

Nice! A keyboard is going to be a very slick interface, although do consider leaving a backup option for laptop users.

Incidentally, a keyboard makes possible a mode of operation that I think some users might like: brainwallet. The idea here is that you don't store the private key on the device at all. Rather each time you want to make a transaction, you connect the device up to a secure computer, type in your passphrase, have the device turn that into the wallet seed privkey, and then it tells the computer the public key portion of the seed. When you're done, the device forgets the privkey.

Now if the device falls into wrong hands, it can't even be used to show that you made transactions at all with that wallet, let alone steal coins from the wallet.
legendary
Activity: 1120
Merit: 1164
If I understand your patch correctly, you are adding a persistent (across power cycles) entropy pool so that in the event of an undetected HWRNG failure, getRandom256() will degrade gracefully instead of immediately giving crappy random numbers.

Exactly.

This is a good idea, but you're using the encrypted read/write functions to save/restore the pool state. I understand this is to ensure that the pool state is secret. But it is a problem because the encryption key the encrypted read/write functions use depends on which wallet is loaded. In some cases (e.g. formatting non-volatile storage), random numbers are required when a wallet isn't even loaded (i.e. no encryption key is set).

The most general and secure fix I could think of for this is to define a "global" wallet encryption key, which exists solely to encrypt the pool state and its checksum. Actually, now that I think about it, the global wallet encryption key can also be used to encrypt the wallet names/versions. But this will require some significant changes elsewhere in the codebase. I will fully incorporate your suggestions, but only after I add support for multiple wallets.

Ah, I see what you mean. That said, in the interim it's probably acceptable to simply store the random seed unencrypted. An attacker who gets hardware access to read it can only use that knowledge to find out what the random values will be in the future; past values are protected by the one-way property of SHA256. Any attacker who gains such access can do far worse things.

That said I should add some documentation to my patch making this clear, as well as think through this issue carefully.

Also, in general we should write a clear statement as to what types of attacks we're trying to prevent in the first place.
member
Activity: 78
Merit: 11
Chris Chua
Regardless of how you collect your random data you should be also saving the state of the random number generator so you can restore it on the next power up. Currently the getRandom256() function collects new random bytes from the hardware generator, hashes them with sha256, and then returns that result. It really needs to operate with a persistent entropy pool to protect against a HWRNG failure. FWIW I just created an un-tested patch that implements this. See https://github.com/retep/hardware-bitcoin-wallet/compare/persistent-pool-prng
If I understand your patch correctly, you are adding a persistent (across power cycles) entropy pool so that in the event of an undetected HWRNG failure, getRandom256() will degrade gracefully instead of immediately giving crappy random numbers.

This is a good idea, but you're using the encrypted read/write functions to save/restore the pool state. I understand this is to ensure that the pool state is secret. But it is a problem because the encryption key the encrypted read/write functions use depends on which wallet is loaded. In some cases (e.g. formatting non-volatile storage), random numbers are required when a wallet isn't even loaded (i.e. no encryption key is set).

The most general and secure fix I could think of for this is to define a "global" wallet encryption key, which exists solely to encrypt the pool state and its checksum. Actually, now that I think about it, the global wallet encryption key can also be used to encrypt the wallet names/versions. But this will require some significant changes elsewhere in the codebase. I will fully incorporate your suggestions, but only after I add support for multiple wallets.
member
Activity: 78
Merit: 11
Chris Chua
How small are you thinking a final version may be? As small as a USB stick with 2-line display?
I won't really know until I choose a display and lay out a PCB, but I estimate a final version will have a size of about 1.5x the length/width/height of a typical USB stick.


I'm looking through the code. I guess the ECDSA and bignum algorithms were implemented more or less from scratch? I might use some of your algorithms for the C library I'm making. I was looking at other libraries including OpenSSL, which is what the C++ client uses, but none easily allow me to include only the specific parts I need into my library.
If you do choose to use the code, keep in mind that:
  • bigMultiplyVariableSizeNoModulo() is horribly slow. If you are going to hand-optimise or reimplement one function, choose this.
  • The "constant execution time" property of the algorithms may not carry over to a contemporary CPU because of caches and branch prediction. I don't have to worry about caching/branch prediction because none of the microcontrollers I'm looking at have these features.



The problem with random sleep periods is that you can still determine the underlying time taken through simple statistical averaging in many circumstances. Possibly a better solution would be to start a timer at the beginning of any cryptographic operation, and then once the operation finishes wait until a fixed maximum-time has elapsed. That way every operation takes the same amount of time. Adding a further random delay on top of this fixed delay wouldn't be a bad idea either, to help mask any residual leakage (for instance, suppose that the time measurement turns out to be flawed, and there is still a remainder which it does not detect).
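The fixed maximum-time idea can be sketched like this. It is a host-side illustration only: the helper name `run_with_fixed_duration` is made up, and on a microcontroller the wait would be driven by a hardware timer rather than `time.sleep`.

```python
import time

def run_with_fixed_duration(operation, max_seconds):
    """Run operation(), then wait until max_seconds have passed since the start.

    max_seconds must be a profiled worst case; overrunning it is treated as
    an error, because it would mean the padding no longer hides the timing.
    """
    start = time.monotonic()
    result = operation()
    remaining = max_seconds - (time.monotonic() - start)
    if remaining < 0:
        raise RuntimeError("operation exceeded its fixed time budget")
    time.sleep(remaining)  # placeholder for the padding period
    return result

# Example: a trivial stand-in for a signing operation, padded to 50 ms.
signature = run_with_fixed_duration(lambda: sum(range(1000)), 0.05)
print(signature)
```

The sleep here only hides timing. It does nothing for power consumption, since an idle padding period looks very different from a busy one.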

If the attacker can also determine the power consumption of your hardware wallet, you have to hide that as well. Unfortunately, in the case of a USB-powered wallet this is actually a realistic attack, as some motherboards can measure how much power a USB device is actually consuming. To protect against this, the "makeup" time period should do something CPU-intensive as well, perhaps an additional, randomly chosen, cryptographic operation (don't pick the same operation, with the same keys, in case there turns out to be a key-dependent power consumption).

Finally, at the hardware level you can choose a power supply with a fixed current consumption on the input. A zener diode and resistor voltage regulator, operated correctly, has a nearly zero change in input current for a change in output current and you can do even better with something more sophisticated.

For the OP: I'm an electronics designer and I'd be interested in working further on the hardware design.
The way my current implementation achieves constant time operation is to insert dummy operations instead of using branches. I believe this is like the "wait until maximum-time, while doing something cpu-intensive" approach, except that the waiting is interleaved instead of happening all at the end. However, it has the advantage that program flow is independent of secret data, so it also hides most (but not all) of the power consumption signature.
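As a rough illustration of "dummy operations instead of branches", here is a sketch using modular exponentiation as a stand-in for elliptic curve point multiplication (square plays the role of point doubling, multiply the role of point addition). It is not the wallet's actual code, and Python integers are not constant-time, so this only shows the control-flow structure.

```python
def modexp_always_multiply(base, exponent, modulus, bits=256):
    """Left-to-right square-and-multiply that multiplies on every iteration.

    The multiplication is always performed; the exponent bit only decides,
    through a mask, whether its result is kept. There is no branch on
    secret data. The exponent must be non-negative and fit in `bits` bits.
    """
    result = 1
    for i in reversed(range(bits)):
        result = (result * result) % modulus
        product = (result * base) % modulus  # computed whether needed or not
        bit = (exponent >> i) & 1
        mask = -bit                          # 0 if bit is 0, all ones if bit is 1
        result = (product & mask) | (result & ~mask)
    return result

# Sanity check against Python's built-in modular exponentiation.
print(modexp_always_multiply(3, 0xDEADBEEF, 1000003) == pow(3, 0xDEADBEEF, 1000003))
```

Every loop iteration does the same sequence of operations, so the work is interleaved rather than padded at the end, which is the distinction drawn above.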

I had previously assumed that power analysis using a computer motherboard wouldn't be feasible for a hardware Bitcoin wallet. I assumed this because usually fast ADCs (at least the clock rate of the microcontroller) are required to do power analysis, and I doubt motherboards have 48 Msps ADCs in them. However, your comments have prompted me to question this. A point multiplication operation involves 256 point addition and point doubling operations which take place over some time (0.5 s to 4 s depending on the speed/power of the microcontroller). Sampling over each point addition/doubling would only require a < 1 ksps ADC and could possibly reveal information about secret data. Although I still think a power analysis virus is a bit exotic, it would be prudent to design some power analysis countermeasures into a hardware Bitcoin wallet.

About the hardware design: yes, I'd really appreciate help. Software mistakes can be fixed with a commit; hardware mistakes can require expensive redesign or rework.

See, that's the catch, seemingly OK zeners can't meet this spec. For instance all of the BZX84 series zeners have maximum impedance of greater than 10Ohms, even if they look OK otherwise. Unfortunately, it looks like zeners that meet our requirements don't actually exist; the minimum zener impedance on digikey for a 3.3V zener is 3Ohms, and that spec only applies if you are drawing 380mA.

So let's go back to a software/hardware solution. Suppose we can get the change in current down to 0.25mA by clever software that carefully uses the same amount of power no matter what the cryptographic engine is doing. The above equation now gives us 18Ohms at the maximum, and as it turns out the $0.20 225mW MMBZ5226BLT1G is almost suitable at 28Ohms (measured at 20mA). Solving for I with IR1 held at 0.05mA, we get 0.18mA maximum, i.e. this is the maximum change in uC current allowed before the host could detect it. So go work on the software some more!

Now what happens if the zener fails? Specifically, suppose the zener's impedance rises, such that the change in current through R1 increases. The only way we would know is if the voltage across R1 increases, so basically the uC now has to monitor the change in its own voltage supply as it begins a cryptographic operation. That change would be equal to V=I(R1*Rz)/(R1+Rz)=3.4mV, tiny! Fortunately 10-bit ADCs are quite common, so 3.3V/1024=3.22mV - we probably can't detect a subtle failure, but anything major is detectable. Such a design doesn't even have to be all that expensive if you use the other side of the R1 resistor as your Vref, which is OK because we only care about the difference between the two. Total implementation would be four 0.1% resistors, $0.20 in volume.


Ugly 'eh? What looked so simple, just isn't. I gotta admit, I wasn't expecting a combined hardware/software implementation to be mandatory myself.
Yeah, using simulation of a simple zener regulator circuit, I couldn't get better than a factor of ~3 reduction in power consumption signature.

I think a mostly software solution is possible. A relatively large (compared to what's possible on a smartcard) amount of passive filtering can be used to mask high-frequency power consumption signatures (with a cutoff in the 10s of kHz).

Say, for example, an addition: x + y requires i + d_i milliamp of current over one clock cycle, but the addition of ~x + ~y requires i - d_i milliamp of current over one clock cycle. As long as every necessary addition x + y is paired with a dummy addition ~x + ~y, total current consumption is always i milliamp of current over two clock cycles. Since the core clock of microcontrollers is in the MHz range, the cycle-to-cycle variations in power consumption are not visible to an attacker; thanks to the filtering, only power consumption over hundreds of cycles is visible. The challenge is then to "pair" every necessary operation with a dummy operation that quickly compensates for the power consumption of the necessary operation.
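A small sketch of the pairing idea, under the loud assumption that current draw tracks the Hamming weight (number of 1 bits) of the operands. That leakage model is a common simplification and is not something measured on this hardware; the 32-bit word size and function names are also just for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit ALU word

def hamming_weight(value):
    """Number of 1 bits in the low 32 bits of value."""
    return bin(value & MASK32).count("1")

def paired_add(x, y):
    """Return (real, dummy): the needed sum and the complemented dummy sum."""
    real = (x + y) & MASK32
    dummy = ((~x & MASK32) + (~y & MASK32)) & MASK32
    return real, dummy

def operand_weight_of_pair(x, y):
    """Total operand Hamming weight across the real and dummy additions."""
    return (hamming_weight(x) + hamming_weight(y)
            + hamming_weight(~x) + hamming_weight(~y))

# Whatever the operands are, the pair always presents 64 one-bits in total.
for x, y in [(0, 0), (1, 2), (0xFFFFFFFF, 0x12345678), (0xDEADBEEF, 0xCAFEBABE)]:
    print(hex(x), hex(y), operand_weight_of_pair(x, y))
```

Note that only the operands balance exactly in this model. The results of the two additions are not exact complements of each other, which is one reason the technique still needs to be verified by measurement.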



Seriously, 0,40USD is nothing. Even if it adds more than 1USD per unit, it's still nothing. Just use USB, it's the current standard, people wouldn't need adapters, you could use the same USB port to connect to the device that will provide the transactions to be signed, and maybe one day you could make it capable of writing the encrypted seed out for backup purposes. Wink
After seeing this: https://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2007/blh36_cdl28_dct23/blh36_cdl28_dct23/index.html, I believe it is possible to bit-bang a USB host interface. So I will probably include a USB port on the wallet, for you to plug a USB keyboard into.
hero member
Activity: 630
Merit: 500
Wow, sorry but you're talking Greek to me, retep.
What's the point in storing previously used random numbers? What's a HWRNG failure? I suppose the acronym means HardWare Random Number Generator, but still don't understand what you mean.
legendary
Activity: 1120
Merit: 1164
Good point to remember.
As suggested here, the device could have an embedded AM antenna and use the background noise as entropy source.

Regardless of how you collect your random data you should be also saving the state of the random number generator so you can restore it on the next power up. Currently the getRandom256() function collects new random bytes from the hardware generator, hashes them with sha256, and then returns that result. It really needs to operate with a persistent entropy pool to protect against a HWRNG failure. FWIW I just created an un-tested patch that implements this. See https://github.com/retep/hardware-bitcoin-wallet/compare/persistent-pool-prng
legendary
Activity: 1120
Merit: 1164
Thanks retep. I was thinking the same thing: Instead of a random delay, you could profile the algorithms to estimate the longest time and add a delay up until this time each time the algorithms are run. An additional random delay would be good also. As for the power problem, wouldn't it be best to focus on the hardware to mask this? With the algorithms there may be detectable power variations for the actual different operations computed right?

It depends on who the attacker is. If you assume they have access to your computer, as is the case with a virus, your job is a lot easier and hardware or software solutions to keep the power draw constant are fine. However, if you are trying to defeat an attacker who has the actual device, then a hardware solution is trivial to attack by just taking the device apart. At that point though, there are a *lot* of possible attacks which this design can't prevent anyway.

I'd argue for fixing the problem in software first, and then once you've done that, determine if a cheap hardware fix is possible too. Having said that, a USB powered device with this fix is pretty easy to make as USB provides +5V and most uC's will run on 3.3V. If one of the easy-to-use FTDI FT-X chips is used for the uC<->USB communication we have no issue as well, as those chips have 3.3V-tolerant IO. The only issue left is the LCD screen, most of them run off of 5V. Still, simple level translation isn't hard and won't add much cost.

BTW, for the non-electrical engineers out there, this is how the zener-regulator works, and why it makes the current constant:



Vs represents the 5V supplied by the computer over USB, and R2 represents your uC. DZ is the zener. In operation, the effective value of R2 changes as the uC draws different amounts of current. At one extreme, R2 might be nearly infinite if the uC is in sleep mode; at the other extreme, the most current the uC can draw will look like an R2 of a few dozen ohms. What we want is for the current being drawn out of Vs to be fixed, regardless of the value of R2.

The zener diode has a property called breakdown where if the voltage applied to it in the reverse direction increases over a certain threshold, it starts conducting current exponentially. Suppose we pick a 3.3V zener, and R2 is an open circuit (the uC is off). If the zener is not conducting any current, the voltage across it will rise, and as the voltage reaches 3.3V, it will start conducting current again. This means we can pick the value of R1 assuming a given amount of current flows through it. Let's assume 30mA: R1=(Vs-Vz)/I=(5V-3.3V)/30mA=73Ohms. Now the zener is happily allowing 30mA of current to flow through it, which causes 30mA of current to flow through R1, and there is a voltage drop across R1 of 2.2V, just enough that 3.3V is across the zener.

Now let's suppose the uC starts doing a cryptographic operation. It's now consuming 20mA of current computing a bitcoin transaction. If the zener continues to draw 30mA of current, the total would be 50mA, and the voltage drop across R1 would be (20mA+30mA)*73Ohms=3.65V. However, 5V-3.65V=1.35Volts, way less than the 3.3V breakdown of the zener. Instead what happens is as the uC draws more current, the zener draws less, so the total current between the two stays the same. Now we have the zener drawing 10mA and the uC drawing 20mA.
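The bias-point arithmetic can be reproduced as follows. One caveat worth flagging: taking the stated 5V and 3.3V literally, (Vs-Vz)/I works out to about 57 ohms, whereas the 73 ohm value used in the post corresponds to a 2.2V drop at 30mA. The sketch keeps 73 ohms for the later steps so that the figures match the rest of the discussion; the variable names are just for illustration.

```python
# Bias-point arithmetic for the resistor + zener regulator.
V_S = 5.0        # USB supply, volts
V_Z = 3.3        # zener breakdown voltage, volts
I_TOTAL = 30e-3  # chosen constant input current, amps
I_UC = 20e-3     # uC current during a cryptographic operation, amps
R1_POST = 73.0   # series resistor value used in the post, ohms

# Series resistor implied directly by the stated voltages and current.
r1_from_voltages = (V_S - V_Z) / I_TOTAL

# Drop across the post's 73 ohm resistor at 30 mA.
drop_at_30ma = I_TOTAL * R1_POST

# Hypothetical case: the zener keeps its 30 mA while the uC adds 20 mA.
drop_if_zener_unchanged = (I_UC + I_TOTAL) * R1_POST
node_voltage_if_unchanged = V_S - drop_if_zener_unchanged

# What actually happens in regulation: the zener gives up current to the uC.
i_zener_in_regulation = I_TOTAL - I_UC

# Constant power drawn from the USB port.
p_in = V_S * I_TOTAL

print(f"R1 from stated voltages: {r1_from_voltages:.1f} ohm")
print(f"drop across 73 ohm at 30 mA: {drop_at_30ma:.2f} V")
print(f"drop if zener kept 30 mA: {drop_if_zener_unchanged:.2f} V")
print(f"node voltage in that case: {node_voltage_if_unchanged:.2f} V")
print(f"zener current in regulation: {i_zener_in_regulation * 1e3:.0f} mA")
print(f"input power: {p_in * 1e3:.0f} mW")
```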

An external observer, such as a wallet-stealing virus that can ask the computer for how much power the USB port is drawing, now has no idea how much power the uC is drawing no matter what it's doing. The disadvantage of course being that the power consumed is constant, even when the wallet isn't doing anything. Still, 150mW isn't a lot of power.


Of course, the above example is a bit hand-wavy; what actually happens is more subtle. A zener's current-to-voltage relationship is steep rather than a hard threshold, roughly of the form I=k(V-Vth)^n with k and n being device-specific coefficients. Effectively, the total current drawn is *not* actually constant, it's just that the change in current draw is small.

Let's assume that the computer we're using has the ability to measure the current draw on an individual USB port with a precision (not accuracy!) of 0.1%, and the range of that measurement is one standard USB load, 100mA. 100mA * 0.1%=0.1mA, so if we keep our *change* in current draw to less than half of that, we can guarantee that there will be no detectable change in the current measured. We'll assume that the uC draws 25mA fully on, or a change in uC current from off to on of 25mA. In the small signal model a zener looks like a resistor, so our circuit is now:



R1 is the 73Ohms we calculated earlier, and Rz is the impedance of the zener (also known as the zener differential resistance). Our load now looks like a current source, and the input voltage disappears. What we want is for IR1 to be less than 0.05mA for I=25mA. So:

I = IR1 + IRz -> I = IR1 + V/Rz

Now what's V? Simply I*(R1*Rz)/(R1+Rz), so:

I = IR1 + I*(R1*Rz)/((R1+Rz)*Rz) -> Rz = I*R1/(I-IR1) - R1 = 0.146Ohms
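Plugging the numbers into the final expression is a quick way to check the 0.146 ohm result. The function name is made up for this sketch; the formula is the one derived above.

```python
def max_zener_impedance(i_load_change, i_r1_limit, r1):
    """Largest small-signal zener impedance Rz that keeps the change in R1
    current at i_r1_limit for a given change in load current.

    Implements Rz = I*R1/(I - IR1) - R1.
    """
    return i_load_change * r1 / (i_load_change - i_r1_limit) - r1

I_CHANGE = 25e-3     # uC current change, off to fully on, amps
I_R1_LIMIT = 0.05e-3 # half of the 0.1 mA measurement resolution, amps
R1 = 73.0            # series resistor, ohms

rz = max_zener_impedance(I_CHANGE, I_R1_LIMIT, R1)
print(f"required zener impedance: {rz:.3f} ohm")
```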

See, that's the catch, seemingly OK zeners can't meet this spec. For instance all of the BZX84 series zeners have maximum impedance of greater than 10Ohms, even if they look OK otherwise. Unfortunately, it looks like zeners that meet our requirements don't actually exist; the minimum zener impedance on digikey for a 3.3V zener is 3Ohms, and that spec only applies if you are drawing 380mA.

So let's go back to a software/hardware solution. Suppose we can get the change in current down to 0.25mA by clever software that carefully uses the same amount of power no matter what the cryptographic engine is doing. The above equation now gives us 18Ohms at the maximum, and as it turns out the $0.20 225mW MMBZ5226BLT1G is almost suitable at 28Ohms (measured at 20mA). Solving for I with IR1 held at 0.05mA, we get 0.18mA maximum, i.e. this is the maximum change in uC current allowed before the host could detect it. So go work on the software some more!
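Both figures in this paragraph follow from the same current-divider relation. In the sketch below the 0.18mA value is reproduced as the largest uC current change that keeps the change in R1 current at 0.05mA with a 28 ohm zener; the function names are illustrative.

```python
def max_zener_impedance(i_load_change, i_r1_limit, r1):
    """Rz = I*R1/(I - IR1) - R1, the largest tolerable zener impedance."""
    return i_load_change * r1 / (i_load_change - i_r1_limit) - r1

def max_load_change(i_r1_limit, r1, rz):
    """Largest load-current change I for which IR1 = I*Rz/(R1+Rz) stays at
    i_r1_limit (the same relation solved for I)."""
    return i_r1_limit * (r1 + rz) / rz

R1 = 73.0             # series resistor, ohms
I_R1_LIMIT = 0.05e-3  # allowed change in input current, amps

# With software holding the uC current change to 0.25 mA:
rz_needed = max_zener_impedance(0.25e-3, I_R1_LIMIT, R1)

# With a real 28 ohm part, how much uC current change can be tolerated?
i_max = max_load_change(I_R1_LIMIT, R1, 28.0)

print(f"zener impedance needed: {rz_needed:.2f} ohm")
print(f"max uC current change with 28 ohm: {i_max * 1e3:.3f} mA")
```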

Now what happens if the zener fails? Specifically, suppose the zener's impedance rises, such that the change in current through R1 increases. The only way we would know is if the voltage across R1 increases, so basically the uC now has to monitor the change in its own voltage supply as it begins a cryptographic operation. That change would be equal to V=I(R1*Rz)/(R1+Rz)=3.4mV, tiny! Fortunately 10-bit ADCs are quite common, so 3.3V/1024=3.22mV - we probably can't detect a subtle failure, but anything major is detectable. Such a design doesn't even have to be all that expensive if you use the other side of the R1 resistor as your Vref, which is OK because we only care about the difference between the two. Total implementation would be four 0.1% resistors, $0.20 in volume.
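A rough check of the monitoring numbers. The post does not say which I and Rz went into the 3.4mV figure, so the inputs below (0.18mA and 28 ohms from the previous step) are an assumption; they give a value of the same order.

```python
def parallel(r_a, r_b):
    """Resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)

R1 = 73.0         # series resistor, ohms
RZ = 28.0         # zener impedance (assumed input), ohms
I_CHANGE = 0.18e-3  # uC current change (assumed input), amps

ADC_BITS = 10
V_FULL_SCALE = 3.3  # ADC reference, volts

# Supply voltage change seen by the uC: V = I * (R1 || Rz).
delta_v = I_CHANGE * parallel(R1, RZ)

# Size of one ADC step.
lsb = V_FULL_SCALE / (2 ** ADC_BITS)

print(f"supply change: {delta_v * 1e3:.2f} mV")
print(f"ADC step:      {lsb * 1e3:.2f} mV")
print(f"change in ADC steps: {delta_v / lsb:.2f}")
```

With a healthy zener the change is only about one ADC step, which is why only a gross rise in zener impedance would show up clearly.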


Ugly 'eh? What looked so simple, just isn't. I gotta admit, I wasn't expecting a combined hardware/software implementation to be mandatory myself.