
Topic: Proof of Storage to make distributed resource consumption costly. - page 3.

sr. member
Activity: 461
Merit: 251
Quote
Also it's important to note that C may not store the whole tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all of the tree, S would need to make many timing measurements of C's responses, averaging the elapsed time.
IIUC, for an n-level tree, you'd save yourself from storing

(1 - 2^-n) / (2 - 2^-n)  ≈  1/2

of the nodes in the tree.  So could you just assume everybody is cheating and require an extra level?
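A quick sanity check of that fraction, assuming it counts the internal nodes of a full binary tree with 2^n leaves as a share of all of its nodes (a sketch, not taken from the post):

Code:
# Check that (2^n - 1) internal nodes out of (2^(n+1) - 1) total nodes
# matches the closed form quoted above and approaches 1/2 as n grows.
for n in (1, 2, 4, 8, 16, 30):
    leaves = 2 ** n
    internal = leaves - 1                  # nodes above the leaf level
    total = leaves + internal              # 2**(n+1) - 1 nodes overall
    fraction = internal / total
    closed_form = (1 - 2 ** -n) / (2 - 2 ** -n)
    assert abs(fraction - closed_form) < 1e-12
    print(n, round(fraction, 6))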
hero member
Activity: 555
Merit: 654
Gmaxwell: Your idea is excellent, but it needs some changes to prevent an obvious query-forwarding attack.

Suppose client C wants to prove to the server S that it keeps the data. When client C connects to server S, it also connects to some client D, with C acting as the server in that connection. Then server S sends the seed to client C, and client C forwards the same seed to client D; each query is forwarded as well.
S cannot detect that C is cheating, because it is in fact checking D's storage. Therefore the seed must be chosen by a fair coin-flip or similar protocol.

New proposed Setup Protocol

C chooses a seed c and sends it to S.
S chooses a seed s and sends it to C.
Both agree that seed = H(c | s).
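The coin flip blocks the forwarding attack because, in the C<->D session, D will insist on mixing in its own randomness, so C cannot force D's table to be built on the seed it agreed with S. A minimal sketch of the seed agreement, assuming SHA-256 for H and "|" meaning byte concatenation (commitment/ordering details are not covered in the post and are left out here):

Code:
import hashlib
import os

def choose_contribution() -> bytes:
    # Each side picks its own random contribution.
    return os.urandom(32)

def combined_seed(c: bytes, s: bytes) -> bytes:
    # seed = H(c | s): neither side alone controls the result.
    return hashlib.sha256(c + s).digest()

c = choose_contribution()   # chosen by C and sent to S
s = choose_contribution()   # chosen by S and sent to C

# Both ends derive the same table seed from the two contributions.
print(combined_seed(c, s).hex())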

Also it's important to note that C may not store the whole tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all of the tree, S would need to make many timing measurements of C's responses, averaging the elapsed time.

staff
Activity: 4242
Merit: 8672
Quote
So each node requires a gigabyte for their own storage table, plus one gigabyte for each peer they connect to?
One gigabyte (assuming that security level; it was a random number) per peer that they are using a storage proof to get priority access to. So they don't necessarily need to have one per outbound peer.  The idea would be that your peers would prioritize inbound connections that are providing a storage proof; everyone else's connectivity would depend on other criteria.

There is no "their own" storage requirement; the whole basis of this idea is to be memoryless on the server.

Nodes could pick their favorite couple of peers to use storage proofs on, and if they get bumped off other peers during a DoS attack, it's not the end of the world.

Quote
Would a node only require a storage proof for new nodes and drop the requirement after they have established a certain amount of good behaviour?
The problem there is that an attacker could just slowly build up proof-free connections to the whole network and still saturate things.

Long term I expect our behavior with inbound connections to look something like:  When a new connection comes in and we're full, randomly drop a peer (including, possibly, the new one) based on a priority scheme.  The priority scheme could reserve a number of slots for each of several different kinds of priority.  For example, some slots should be reserved for the longest-connected nodes— since having a long uptime connection is a kind of scarce resource and some attackers come in bursts.  Other slots should be reserved for peers which have the lowest minimum round trip time (because being close to you on the network can't be faked), come from the same subnet as you, come from many distinct subnets, are often the first to relay you new blocks, have an IP that when hashed with a secret is a low value, etc.  You can list out a bunch of different groups to protect... then randomly (weighted by goodness or whatever) kick nodes that don't make the cut.
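A rough sketch of that kind of slot-reservation eviction, assuming invented criteria names, slot counts, and peer fields (uptime, min_rtt, has_storage_proof); it only illustrates the "protect a few classes, then randomly kick from the rest" shape, not any particular client's logic:

Code:
import random

# Each reserved class protects its top SLOTS_PER_CLASS peers from eviction.
RESERVED = {
    "longest_uptime":  lambda p: p["uptime"],
    "lowest_min_rtt":  lambda p: -p["min_rtt"],
    "storage_proof":   lambda p: 1.0 if p["has_storage_proof"] else 0.0,
}
SLOTS_PER_CLASS = 4

def pick_peer_to_drop(peers):
    # Protect the best peers in each reserved class...
    protected = set()
    for score in RESERVED.values():
        best = sorted(peers, key=score, reverse=True)[:SLOTS_PER_CLASS]
        protected.update(id(p) for p in best)
    # ...then kick a random peer from whatever is left, weighted so that
    # "worse" (here: lower-uptime) peers are more likely to be dropped.
    candidates = [p for p in peers if id(p) not in protected]
    if not candidates:
        candidates = list(peers)
    weights = [1.0 / (1.0 + p["uptime"]) for p in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]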

One problem with that kind of litany of criteria is that you'll still end up with clients that just don't meet any of the protected classes— they won't appear to be highly useful good nodes, because they're not highly useful (esp SPV clients). They may not be near any of their peers on the network, they may be freshly started and so have low uptime.  So it would be good to have a way for nodes to opt into priority without creating an easy avenue for attackers.... thus ideas like a proof of storage.
legendary
Activity: 1400
Merit: 1013
So each node requires a gigabyte for their own storage table, plus one gigabyte for each peer they connect to?

Would a node only require a storage proof for new nodes and drop the requirement after they have established a certain amount of good behaviour?
staff
Activity: 4242
Merit: 8672
Motivation:
Consider a mutually distrusting decentralized network where each node has some finite capacity, like Bitcoin's P2P network, with 100,000 nodes.

An attacker wants to soak up all the connection capacity of all the nodes in order to funnel new peers to malicious nodes the attacker controls.  Each node can perform actions to make resource usage costly. For example, each node could reject additional connections from single IPs or subnets, requiring the attacker to have access to more address space (a scarce resource) in order to use up that node's capacity. Or they could prioritize peers that present a costly Bitcoin bond (e.g. a Fidelity bond/SIN).

The problem there is that the attacker's effectiveness scales with the number of victims— e.g. each IP or bond or whatever lets them use 100k sockets— since the nodes are mutually distrusting and thus can't collaborate to reject an attacker who makes one connection per IP to every single node in the network.

The network could respond by requiring a scarce resource to connect— e.g. a POW or a Bitcoin fee. But an on-connect fee doesn't really map well to ongoing resource utilization.  You could require a periodic fee (e.g. intermittent POW), but setting the rates so that it doesn't favor attackers is hard (e.g. an attacker might not mind spending half a bitcoin to attack the whole network, but regular users might not tolerate a 0.0001-per-connect cost).

Also, most POW schemes are subject to huge speedups by more efficient implementations. E.g. an FPGA or even a GPU implementation will be enormously faster than a Java one used in most clients.  A memory hard function may narrow the gap, but common memory hard functions are just as memory hard to validate as they are to produce (though it's possible to do otherwise), and increasing the server's resource utilization is not a great way to address a resource exhaustion attack.

What might be useful is a POW function which maps well to the kind of resource usage that holding a connection represents and is also less vulnerable to speedup for specialized attackers.

One idea that has been suggested for this would be to have a storage hard function: some function that forces you to store a bunch of data and continue to prove you're still storing it while the connection is up.  Making one that is storage hard for the client but not the server seemed pretty tricky and eluded a number of people for some time. I had a prior idea which required fully homomorphic encryption to create a sequential function with a trap-door property to allow random access... which might have worked, but it was so complicated that no one would ever bother with it. Fortunately I've since had a better idea:

A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g. {Left, Right} = H(seed) and so on, to build up the target size worth of data.  The client takes the final leaves of this tree-structured pseudo-random function, appends their indexes, and sorts the data by value (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply look up the value for a given index, but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).

(There are some possible optimizations in the client, like common prefix compression, but they're only small optimizations because the function is strongly pseudorandom)
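A small runnable sketch of the whole construction, assuming SHA-256 standing in for Blake2/H(), 16-byte child values, and a toy table of 2^16 leaves (the post's real target is on the order of a gigabyte); the function names and parameters here are illustrative, not from the post:

Code:
import hashlib
import os
import random
from bisect import bisect_left

def leaf_value(seed: bytes, index: int, levels: int) -> bytes:
    # Walk the tree-structured PRF from the seed down to one leaf:
    # H(node) yields the {Left, Right} children as the two halves of the digest.
    node = seed
    for bit in range(levels - 1, -1, -1):
        digest = hashlib.sha256(node).digest()
        node = digest[16:] if (index >> bit) & 1 else digest[:16]
    return node

# Client: compute every leaf once and keep them sorted by value, so that
# value -> index lookups are cheap (this is the storage-hard part).
def build_table(seed: bytes, levels: int):
    return sorted((leaf_value(seed, i, levels), i) for i in range(2 ** levels))

def answer_challenge(table, value: bytes) -> int:
    pos = bisect_left(table, (value,))
    assert pos < len(table) and table[pos][0] == value, "incomplete table"
    return table[pos][1]

# Server: memoryless apart from the seed; each challenge costs only
# `levels` (= log2(size)) hash evaluations.
def challenge_and_verify(seed: bytes, levels: int, respond) -> bool:
    index = random.randrange(2 ** levels)
    value = leaf_value(seed, index, levels)
    return respond(value) == index

if __name__ == "__main__":
    LEVELS = 16                     # toy size for the example
    seed = os.urandom(32)
    table = build_table(seed, LEVELS)
    ok = challenge_and_verify(seed, LEVELS, lambda v: answer_challenge(table, v))
    print("proof accepted:", ok)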

A gigabyte table wouldn't be too insanely burdensome for many kinds of clients, but would require a terabyte for an attacker to use resources on 1000 nodes. Such a scheme could be offered optionally to clients (e.g. if you perform a storage proof you get a preferred connection slot) and so it might provide some value in making some attacks ineffective.