Topic: Detecting lost records in distributed hash tables (Read 1080 times)

Distributed hash tables (DHTs) are a way to store key -> value pairs in a distributed way.

In a Kademlia system, each node selects a random id.  The distance between two nodes is the XOR of their ids.  Likewise, the distance between a node and a key is the XOR of the key and the node's id.
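As a minimal sketch (assuming ids are held as Python ints), the distance metric is just:

```python
def xor_distance(a: int, b: int) -> int:
    """XOR metric: distance between two node ids, or between a node id and a key."""
    return a ^ b

# The metric is symmetric, and a node is at distance 0 from its own id.
```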

To find a value, you ask the 20 nodes you know of that are nearest to the value's key to tell you the nearest nodes to the key that they know of.  Each reply gives you the IPs of nodes that are nearer the target than the set of nodes you asked, since each node has more detail about the part of the id space near itself.  You repeat until no closer nodes turn up.
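That lookup can be sketched as an iterative loop.  This is a simplification (real Kademlia also limits query parallelism), and `query` is a hypothetical RPC that returns a node's closest known contacts for a key:

```python
def iterative_find(key, contacts, query, k=20):
    """Repeatedly ask the k closest known nodes for nodes even closer
    to the key, until no unqueried candidates remain."""
    shortlist = sorted(set(contacts), key=lambda n: n ^ key)[:k]
    queried = set()
    while True:
        pending = [n for n in shortlist if n not in queried]
        if not pending:
            return shortlist  # converged: the k closest nodes we could find
        for n in pending:
            queried.add(n)
            shortlist.extend(query(n, key))
        # keep only the k closest candidates seen so far
        shortlist = sorted(set(shortlist), key=lambda n: n ^ key)[:k]
```

On a toy network where each node knows the nodes one bit-flip away, the loop homes in on the target id one bit at a time.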

A DHT could be used to store lots of data about the blockchain.  It is all key -> value.  A merkle tree is basically lots of key -> value pairs: the key is the 256-bit hash of a node and the value is its 2 children (so 512 bits).

Extra data could also be added to the DHT.  The keys would always be sha(sha(value)).
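A minimal sketch of that keying rule, using SHA-256 (the function name is illustrative):

```python
import hashlib

def dht_key(value: bytes) -> bytes:
    """Key under which a value is stored: sha(sha(value))."""
    return hashlib.sha256(hashlib.sha256(value).digest()).digest()

# A merkle node's value is its two 256-bit children concatenated,
# so its key is the double hash of that 512-bit concatenation.
```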

If the block header chain is used as root, then you can prove that a particular key is supposed to exist by giving the merkle branch from the header down to the key.  This is the key property that allows detecting missing key value pairs.
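Checking such a proof is just hashing up the branch.  A sketch, assuming each parent is the double SHA-256 of its two children and the branch is given as (sibling hash, leaf-side-is-left) pairs from the leaf upward:

```python
import hashlib

def dsha(b: bytes) -> bytes:
    """sha(sha(value)), as used for the DHT keys."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_branch(root: bytes, leaf: bytes, branch) -> bool:
    """Walk from the leaf to the root, hashing in each sibling.
    branch: list of (sibling, is_left) pairs, where is_left says the
    running hash is the left child at that level."""
    h = leaf
    for sibling, is_left in branch:
        h = dsha(h + sibling) if is_left else dsha(sibling + h)
    return h == root
```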

If the key is 256 bits, then the routing table has 256 buckets, one per distance range.  The ranges are powers of 2, so the further away, the larger the range.

Bucket 0 would be for exact matches
Bucket 1 would be for a distance of 1
Bucket 2 would be for distances of 2 - 3
Bucket 3 would be for distances of 4 - 7
and so on.
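The bucket for a given distance is simply the bit length of the XOR distance, as a sketch:

```python
def bucket_index(distance: int) -> int:
    """Bucket 0 is an exact match; bucket i covers distances
    2**(i-1) through 2**i - 1."""
    return distance.bit_length()
```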

Each node could broadcast 2 routing distances to all neighbours, giving 2 routing tables.  One holds the shortest known distance to someone who wants a particular key, and the other the distance to someone who has it.

If you ask a node for a value, it will ask a node nearer and so on.  You would include proof that the key is actually in the table by tracing back to a root.

If the node gets the value, then done. 

However, if after some timeout the node cannot get the value, you update your routing table to reflect that.  This may show a different neighbour as nearer the key.  You can then ask them.

Eventually, you either get the key value pair, or all your neighbours fail to get the key for you.  Effectively, this sets your distance to infinity.  Also, since your neighbours are following the same process (you proved it was a valid key), the infinite distance to that key should flood the network.  If the key value pair is eventually found, you will get a copy as the routing tables refresh.

If not, this tells the whole network that a key (with proof that it should exist) has gone missing.  The block containing the key would be marked as bad, and maybe the chain forked in front of it.  The age of the block would matter here; a block that is a few months old might be grandfathered in.

The distance to someone who wants a key is so that you can send them new keys if you find them without having to flood the whole network.

A node might decide to check transactions at random while connected.  They could set how much CPU to allocate.  They would also upload their entire database to the network over and over again.  Again, they would set the maximum upload rate.

However, this requires extending the system so that everything can be verified locally.  The root of a tree over the unspent transaction output set would need to be included in every block header, so a node can check that it was updated in a valid way.

This tree needs to allow summing of its branches.  This allows verifying that the total coin supply is as expected.
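One hypothetical way to make the branches summable is to have each parent commit to both children's hashes and both subtree totals; the function name and the 8-byte integer encoding here are illustrative, not part of the post:

```python
import hashlib

def dsha(b: bytes) -> bytes:
    """sha(sha(value)), matching the DHT keying rule."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def sum_parent(left_hash: bytes, left_sum: int,
               right_hash: bytes, right_sum: int):
    """Sum-tree parent: its hash commits to both child hashes and both
    child subtree totals, so any branch proof also proves each child's
    contribution to the grand total."""
    total = left_sum + right_sum
    h = dsha(left_hash + left_sum.to_bytes(8, "big")
             + right_hash + right_sum.to_bytes(8, "big"))
    return h, total
```

The root's total can then be checked against the expected coin supply, and any tampering with a subtree total changes the root hash.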

The system would also need to be adjusted so that it can handle chain re-orgs.  The assumption is that the header chain is a solid root reference.