Inspired by BitMessage, I would like to develop an 'offline-messaging' solution that is NSA-resistant, and this is an extract from a very early draft of a paper I am writing. I would love some feedback
tl;dr- All tools that claim to keep people safe online must be studied at length by the community
- Any claim of anonymity should be evaluated objectively within the context of a clearly defined threat model
- My effort is to simplify the problem of a decentralised-messaging solution to that of a distributed hash table, something well understood and extensively studied in the literature.
- Motivated by recent advances (such as Octopus Secure & Anonymous DHT lookup http://hatswitch.org/~nikita/papers/octopus-icdcs12.pdf)
2. System modelThe networkWe model the system as a collection of N nodes. Each node maintains at least one connection to another, such that all N nodes can be arranged to form a connected graph.
Threat modelIn line with related work[refs] we opt to exclude consideration of a global active adversary – i.e. one who is capable of controlling the entire network and observing all network traffic. Our threat model, as in [Oct11] and [Salsa06], is that of a partial adversary. The adversary is able to control at most 20% of nodes on our network and act in an arbitrarily malicious manner. Communications can be intercepted, manipulated, forged or silently dropped on any links under the control of the adversary.
Design goalsThe major functional goal is to simply enable one party to efficiently send a message to another, where the recipient can be reasonably assumed to be ‘offline’. It follows that a sender will need to transmit a message to the network where it will be subsequently stored for later retrieval by the intended recipient.
The mode of message retrieval remains to be determined and we note that it could operate as either push (like SMS) or pull (like email). We shall therefore state that for the sake of simplicity, we will consider only a pull approach.
Our focus is on solving the messaging problem within the context of our specified threat model. End-users should be offered the best protection possible against leaks of their personal data and meta-data, on an increasingly surveilled and hostile internet. The ‘amount of protection’ offered by a solution will be evaluated objectively, using the properties of anonymity [Pfitz05].
3. A distributed hash table as an overlay networkIn this section we examine the general case where a DHT forms the underlying overlay network for asynchronous messaging systems.
A simple DHTAt this point, we deliberately refrain from specifying any single DHT implementation upon which to implement our solutions. There are many in the literature - of particular interest are the renowned Kademlia [Kadem] and the more recent Octopus [Oct11].
We opt to ignore the implementation details and model a simplistic DHT as one which provides the following primitives:
DHT-GET(key) # find the value associated with key
DHT-PUT(key, value) # store a value with the specified key
It is generally the case [Refs] that the key is derived directly from the value itself by performing some hash function. A key of this form will be denoted as a Content Hash Key (CHK), in common with [Freenet02].
Limitations of CHK based DHTsWhilst the simple DHT scales well and enables the efficient storage and retrieval of data, it is too restrictive to be used as-is for our overlay network. Critically we note that CHKs prevent the use cases which rely on having access to a key whilst having no knowledge about the value associated with it. Forcing recipients to have prior knowledge of each message before they can be retrieved is clearly counterproductive.
To solve this we must introduce another value-keying scheme – the Random Hash Key (RHK). RHKs bear no relation to the values with which they are associated. They can be considered as simply a random sequence of bytes.
Messaging primitives and message-slotsWe can introduce the following primitives for the sending and receiving of messages, implemented analogously to DHT-PUT and DHT-GET:
MSG-SEND(msg-slot, msg) # DHT-PUT(msg-slot, msg)
MSG-GET(msg-slot) # DHT-GET(msg-slot)
We should stress that each message is uniquely identified by its message-slot (RHK) and that in order for two parties to communicate they must previously agree on which message-slots to use.
For example, Alice can send a message to Bob, over a previously agreed message-slot (msg-slot
Bob), as follows:
Alice) MSG-SEND(msg-slot_Bob, “hello Bob”)
Bob) MSG-GET(msg-slot_Bob)
Deterministic message-slotsSince a message-slot can only ever be used once, it is important that ‘fresh’ message slots be communicated to the other party on a regular basis. A better solution would be to use a deterministic slot-generation mechanism, in which case only the initial ‘seed’ has to be shared. The following example shows how this could work:
- We assume that Alice previously received a random seed value from Bob (seedBob) and Bob similarly received seedAlice from Alice. The seed is an integer of at least 128 bits.
- The seed values allow Alice and Bob to agree on a message-slot sequence, [msg-slot0, msg-slot1, ... ], over which to communicate. A simple sequence can be defined as follows:
msg-slot_0 = SHA256( seed_Alice + 0 )
msg-slot_1 = SHA256( seed_Alice + 1 )
...
msg-slot_i = SHA256( seed_Alice + i )
Keeping messages privateSince all values stored in the DHT are readable by anyone, the only way to keep messages private is by employing encryption. Prior to engaging in communication, two parties must therefore agree on a message-slot sequence and a set of cryptographic keys to use.
In order to simplify the process we propose a pragmatic solution, based on the idea of ‘deterministic wallets’ in Bitcoin [Refs], where a sequence of public-private key pairs are deterministically generated from a seed value. To see how this could work in practice, let us reconsider the example of Alice sending a message to Bob:
- 1. Bob generates a random secret 256 bit number, which we denote as private-seedBob
- 2. He then calculates an elliptic curve point, public-seedBob and shares it with Alice
public-seed_Bob = private-seed_Bob × G # where G is the ECC generator
- 3. Alice calculates the sequence of public keys [public-key0, public-key1, ...]:
public-key_0 = public-seed_Bob + 1×G
public-key_1 = public-seed_Bob + 2×G
...
public-key_i = public-seed_Bob + (i+1)×G
- 4. Both parties can calculate the sequence of message-slots [msg-slot0, msg-slot1, ... ]:
msg-slot_i = SHA256( public-key_i ) # for any integer i
- 5. Alice encrypts a message with the next unused public-key and sends it to the appropriate message-slot
- 6. To decrypt messages, Bob calculates the sequence of private-keys, [private-key0, private-key1, ...], as follows:
private-key_0 = G×(private-seed_Bob + 1)
private-key_1 = G×(private-seed_Bob + 2)
...
private-key_i = G×(private-seed_Bob + i + 1)
Starting a conversation: handshake over a back-channelEvery communication medium has, to some small degree, a setup-phase where the conversation participants share enough information in order to proceed. For example, email conversations start by sharing knowledge of email addresses while users of BitMessage must share their public-keys. From the recipient’s point of view, the setup-phase is passive. It is enough to publish an email address or a BitMessage public-key to invite people to converse with you.
However, In our system the process requires a ‘handshake’. A recipient, Bob, has to actively provide seed values to anyone he wishes to receive messages from. This is especially advantageous as an anti-SPAM measure:
- One cannot send unsolicited messages
- The moment a recipient starts receiving SPAM, the conversation can be terminated by simply refusing to check for further messages from the sender.
The drawback of this approach is that it requires this extra handshake step before a message can be sent to a recipient. Although this is not a huge problem, it may be a cause of frustration for some users who just want a simple email-like paradigm, where a textual address is all that is needed to send someone a message. We will later show how this can be achieved through the use of message relay providers.
One-one communication (private messaging)Having previously exchanged their seed values, Bob can proceed to send a message to Alice.
- He will identify Alice’s next empty message-slot before perform a MSG-SEND operation on it.
- Alice will regularly check her messages by iterating through the message-slot sequence, retrieving all messages, until she arrives at an empty slot.
We assume that there is some back-channel over which Alice and Bob can ‘discover’ each other and exchange their respective message-slot/public-key generation seeds.
- We deliberately neglect to support this functionality at this stage to keep the semantics simple and easy to reason about.
- We will later argue how this limitation can be overcome, without bloating our semantics.
One-many communication (publishing)To publish a message to a group of people we need to transmit (over a back-channel) the public-seed value, as well as any cryptographic keys, to every intended recipient. This allows a recipient to know which message-slots to check and how to decipher any messages that arrive.
However, the problem with each member being aware of the message-slot sequence is that it leaves the publisher open to attack, namely:
- A member could attempt to forge messages from the publisher
- A member could perform a denial of service attack by purposely ‘using-up’ message-slots
To solve this we need to introduce a new DHT value-keying scheme called Signed Hash Keys (SHKs). Each SHK will include the public-key of the authorised entity, who has the sole ability to perform a DHT-PUT to keys of this form.
Many-many communication (forum)We can consider the ‘forum’ mode of operation, where there are many senders and recipients in a single conversation, as follows:
- A ‘forum’ is rooted at a single message
- The message includes a public-seed to describe the message-slot sequence of replies
- Recipients can not only read the original message, but all of the existing replies as well.
- New replies can also be created by anyone
- Which will also include a public-seed for its own replies
- Similar paradigm to Reddit [Ref]
The problem with this is that it could become quite chaotic. There is no protection against SPAM nor denial-of-service replies (maliciously using up large quantities of message-slots).
Some things we could do:
- Limit the ability to send lots of replies by requiring proof-of-work (as in [BitMessage])
- Empower the forum creator (the OP) with the ability to assign a forum authority who will control the ability to make replies
- The authority could be blind to the content of the reply and just prevent the user from sending too many replies too quickly.
4. Improvements- Message relay providers
- Proof of work / anti-spam
- Perfect forward secrecy
- Darknet via Message tunnels
...