Timeout
Strictly speaking, the device itself is just a state machine, so I don't think defining a timeout is required. The computer can reset the device at any time by sending an "Initialize" message, as I defined above.
From the computer's perspective, any "OK, nothing happened" timeout is entirely implementation-dependent, so we don't need to specify it. A confirmation dialog can stay open for an unlimited time and nothing goes wrong.
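To illustrate why no timeout is needed, here is a minimal sketch of a device modelled as a pure state machine. The state names and every event except "Initialize" (StartAction, ButtonPressed, and the reply strings) are hypothetical placeholders. The point is only that "Initialize" returns the device to a known state from anywhere, so a pending confirmation can wait indefinitely without harm.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states; the real device defines its own.
    IDLE = auto()
    WAITING_FOR_USER = auto()


class Device:
    """Sketch of the device as a pure state machine: no timers anywhere."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def handle(self, event: str) -> str:
        # "Initialize" is accepted in every state and always resets the
        # device, so the computer can recover without any timeout.
        if event == "Initialize":
            self.state = State.IDLE
            return "Initialized"
        # Hypothetical action that needs on-device confirmation; the
        # device may stay in WAITING_FOR_USER for an unlimited time.
        if self.state is State.IDLE and event == "StartAction":
            self.state = State.WAITING_FOR_USER
            return "WaitingForUser"
        # Hypothetical local event: the user presses the confirm button.
        if self.state is State.WAITING_FOR_USER and event == "ButtonPressed":
            self.state = State.IDLE
            return "Done"
        return "Failure"
```

A computer that gave up waiting would simply send "Initialize" the next time it talks to the device; the device never has to decide on its own that the user has left.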
In 'Standard message flow, first paragraph' you mention that the computer initiates the conversation, but with the PIN and OTP the device sends the PinRequest / OtpRequest. Do these occur only as responses to the GetEntropy computer request?
PinRequest and OtpRequest can basically appear as a response to any sensitive call. GetEntropy was just one example.
I personally implement PIN/OTP handling in bitkeylib (Python) independently of the call itself. When the device responds with PinRequest or OtpRequest, the library simply asks the user for input.
C: GetEntropy()
D: Entropy(entropy)
or
C: GetEntropy()
D: OtpRequest()
C: OtpAck(otp)
D: Entropy(entropy)
or
C: GetEntropy()
D: OtpRequest()
C: OtpAck(otp)
D: PinRequest()
C: PinAck(pin)
D: Entropy(entropy)
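The call-independent handling described above can be sketched as a loop that keeps answering PinRequest/OtpRequest until some other response arrives. This is not bitkeylib's actual code: `call`, `send`, `ask_pin`, `ask_otp` and `scripted_device` are hypothetical names, and messages are modelled as simple (name, payload) tuples rather than PB objects.

```python
from typing import Callable, List, Tuple

# Hypothetical message model: (message name, payload).
Message = Tuple[str, object]


def call(
    send: Callable[[Message], Message],
    request: Message,
    ask_pin: Callable[[], str],
    ask_otp: Callable[[], str],
) -> Message:
    """Send a request and transparently answer any PIN/OTP prompts.

    The loop neither knows nor cares which request triggered the prompts
    or in which order they arrive; it keeps answering until the device
    returns a response that is neither PinRequest nor OtpRequest.
    """
    response = send(request)
    while True:
        name = response[0]
        if name == "PinRequest":
            response = send(("PinAck", ask_pin()))
        elif name == "OtpRequest":
            response = send(("OtpAck", ask_otp()))
        else:
            return response


def scripted_device(replies: List[Message]):
    """Fake device for testing: returns canned replies, records requests."""
    it = iter(replies)
    sent: List[Message] = []

    def send(message: Message) -> Message:
        sent.append(message)
        return next(it)

    return send, sent
```

Replaying the third flow above with the fake device:

```python
send, sent = scripted_device(
    [("OtpRequest", None), ("PinRequest", None), ("Entropy", b"\x01\x02")]
)
result = call(send, ("GetEntropy", None), ask_pin=lambda: "1234", ask_otp=lambda: "abcd")
# result is ("Entropy", b"\x01\x02")
```

Because the prompts are handled in one place, the same loop covers every order of PinRequest and OtpRequest, as well as calls that need neither.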
Also, is it valid to have the PinRequest/PinAck before the OtpRequest/OtpAck?
Yes, all these combinations can happen. Asking for the OTP before the PIN does make sense, though: it makes guessing or brute-forcing the PIN almost impossible, because every attempt requires a unique OTP.
Does the USB byte stream have guaranteed integrity? I.e., is it necessary to have a checksum for each 64-byte message, or is that taken care of by the transport protocol? If required, we could simply add a single checksum byte equal to XOR(payload bytes).
USB HID is a very simple protocol and yes, it guarantees ordering and integrity, so there is no need to add a checksum manually.
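For comparison, the one-byte XOR checksum proposed in the question would take only a few lines. This sketch assumes the checksum is appended as the last byte of each packet, which the question does not specify; the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_checksum(payload: bytes) -> int:
    """One-byte checksum: XOR of all payload bytes (0 for an empty payload)."""
    return reduce(xor, payload, 0)


def append_checksum(packet: bytes) -> bytes:
    """Append the checksum as an extra trailing byte (assumed placement)."""
    return packet + bytes([xor_checksum(packet)])


def verify(packet_with_checksum: bytes) -> bool:
    # XOR over the payload and its own checksum byte is 0 when they agree.
    return xor_checksum(packet_with_checksum) == 0
```

Such a checksum is weak: swapping two bytes, for example, leaves the XOR unchanged and goes undetected. That is one more reason to rely on the transport's own integrity guarantees, as the answer suggests, rather than add a hand-rolled check.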
Also, you state that the PB message is chunked into 64-byte packets. When they are 'dechunked', how do you know the boundaries of the packets so you can stitch them back together into the original PB messages? Do you need a 'this protobuf message is chunked over X packets' header at the beginning of the PB message?
You're right, I completely forgot to describe the "magic character" and "message size" fields used when encoding a PB message into the stream. I'll complete the documentation above now.
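A minimal sketch of how a magic character plus message size lets the receiver find message boundaries regardless of how the bytes were chunked. The thread does not yet define the real values, so the magic byte `#`, the 4-byte big-endian length, and the resynchronisation strategy (skip bytes until the magic byte reappears) are all hypothetical choices.

```python
import struct
from typing import List

# Hypothetical framing: the real magic character and length width are
# not specified in this thread.
MAGIC = b"#"
HEADER = struct.Struct(">cI")  # magic byte + 4-byte big-endian length


def encode(message: bytes) -> bytes:
    """Prefix a serialized PB message with the magic byte and its length."""
    return HEADER.pack(MAGIC, len(message)) + message


class StreamDecoder:
    """Rebuilds whole messages from bytes arriving in arbitrary chunks."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes; return every message completed so far."""
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= HEADER.size:
            magic, length = HEADER.unpack_from(self._buffer)
            if magic != MAGIC:
                # Out of sync: drop one byte and look for the next magic byte.
                del self._buffer[0]
                continue
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # header seen, but the body has not fully arrived yet
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

For example, feeding `encode(b"first") + encode(b"second message")` to a decoder three bytes at a time yields `[b"first", b"second message"]`. The length prefix is what makes this work, since a serialized protobuf message does not carry its own length.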
I am just thinking about how to make the 'chunking' and 'dechunking' as simple and unambiguous as possible. Ideally it should be a dumb transport layer that understands nothing of what is in the messages.
That's exactly how I defined it. There are three layers:
1. Transport - it just transports bytes from one side to the other. I want to have only one transport, based on USB HID, in the specs, but technically it is possible to use a serial port or system sockets as well. As I said, I'm already testing the protocol using named pipes. Part of the transport specification is, for example, the single byte defining the message length (the report ID, in HID terms).
2. Stream encoding/decoding. It adds a magic character plus the PB message length, to make decoding the stream as easy as possible. I'll complete the docs above.
3. Payload encoding/decoding. I proposed Protocol Buffers, which seems to be a pretty good choice for our needs.
The whole messaging stack doesn't need to understand the transferred messages themselves, so everything is very flexible.
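Layer 1 can be sketched under the assumption stated in point 1: a single leading byte (the HID report ID) carries the number of valid bytes, leaving 63 data bytes per 64-byte report. The zero padding and the helper names `to_reports` / `from_reports` are hypothetical. The transport only moves opaque bytes; it never inspects the stream framing or the PB payload.

```python
from typing import Iterable, List

REPORT_SIZE = 64            # one USB HID report
MAX_DATA = REPORT_SIZE - 1  # byte 0 carries the number of valid data bytes


def to_reports(stream: bytes) -> List[bytes]:
    """Split an encoded byte stream into fixed-size 64-byte reports.

    Byte 0 of each report holds how many of the following bytes are
    valid; the remainder is zero padding (an assumption of this sketch).
    """
    reports = []
    for offset in range(0, len(stream), MAX_DATA):
        chunk = stream[offset:offset + MAX_DATA]
        padding = bytes(MAX_DATA - len(chunk))
        reports.append(bytes([len(chunk)]) + chunk + padding)
    return reports


def from_reports(reports: Iterable[bytes]) -> bytes:
    """Concatenate the valid bytes of each report back into one stream."""
    out = bytearray()
    for report in reports:
        length = report[0]
        out.extend(report[1:1 + length])
    return bytes(out)
```

A 200-byte stream, for instance, becomes four reports carrying 63, 63, 63 and 11 data bytes. Whatever `from_reports` returns can be handed unchanged to the layer 2 stream decoder, which is exactly the separation described in the list above.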