
Topic: Proof of Storage to make distributed resource consumption costly. - page 2. (Read 24046 times)

member
Activity: 111
Merit: 10
Is it possible to use this in a decentralised system where the nodes exchange random numbers with each other and store lots of smaller hash trees of the numbers, each hash tree corresponding to an individual node they want to communicate with?
Or are there ways of cheating this?


Oh, sorry, I think this is what you are describing?
full member
Activity: 169
Merit: 100
Firstbits : 1Hannes
No, go review it again. That's not what's being described here. The PRNG has to have a special tree structure in order to make Alice's cost of producing an exact-match query logarithmic in the tree size. You can trade off additional computation for space, but because the function is random this isn't terribly effective.

Apparently I don't understand it either  Embarrassed.

Despite my ignorance, I want to make the following suggestion. Perhaps, if you could point out the weakness in this scheme it would help me see what I missed.

I think Alice's cost of producing the query can be made independent of the tree size. Simply require Bob to fill a table with the values of
H(seed + index), where index ranges from 0 to N and H is a hash function (preferably a memory-hard one). He can then sort these results as before.
 
Alice can easily choose a random index i and compute H(seed + i) at constant cost, then challenge Bob to produce i given H(seed + i). If Bob stored the sorted table, a binary search lets him find the correct entry in O(log N) time. If not, he must recompute the table at O(N) cost.
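A minimal sketch of this flat-table variant, with plain SHA-256 standing in for the (preferably memory-hard) H and a toy table size; both choices are illustrative, not part of the proposal:

    import hashlib, os, random
    from bisect import bisect_left

    N = 1 << 16                                  # toy table size

    def H(seed: bytes, index: int) -> bytes:
        # The post suggests a memory-hard hash; SHA-256 keeps this runnable.
        return hashlib.sha256(seed + index.to_bytes(8, "big")).digest()

    # Bob: fill the table once and sort it by value.
    seed = os.urandom(16)
    table = sorted((H(seed, i), i) for i in range(N))
    values = [v for v, _ in table]

    # Alice: one hash, independent of N, to pose the challenge.
    i = random.randrange(N)
    challenge = H(seed, i)

    # Bob: binary search, O(log N); without the table he'd redo O(N) hashes.
    pos = bisect_left(values, challenge)
    assert values[pos] == challenge and table[pos][1] == i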

As far as I can see, the Bloom Filter Tree optimization that cryddit described will still work against this, but perhaps the fact that Alice's cost is now O(1) means that we can simply assume all legitimate clients use this optimization and demand 100 GB of nominal storage, knowing that only ~1 GB of actual storage (or whatever the true ratio is) needs to be consumed in order to respond to her queries in a timely manner.

Secondly, I'm not convinced that the query-forwarding attack that Sergio describes is actually an attack. If it costs me 1 GB of storage to be able to connect to X, and Y wants to connect to me at a cost of 1 GB, why shouldn't I be able to offload the cost? If we don't set up the connection in the secure manner outlined by Sergio, a timely response to a Proof of Storage query essentially proves either that I have stored the data OR that I am maintaining an open connection to someone else who has. Both of these are costly. So, if I want to open 5000 connections using this attack, I first need to convince 5000 clients to connect to me, and I have to maintain those connections.
member
Activity: 111
Merit: 10
Is it possible to use this in a decentralised system where the nodes exchange random numbers with each other and store lots of smaller hash trees of the numbers, each hash tree corresponding to an individual node they want to communicate with?
Or are there ways of cheating this?
staff
Activity: 4284
Merit: 8808
No, go review it again. That's not what's being described here. The PRNG has to have a special tree structure in order to make Alice's cost of producing an exact-match query logarithmic in the tree size. You can trade off additional computation for space, but because the function is random this isn't terribly effective.
legendary
Activity: 924
Merit: 1132
So let me see if I get this right.  We suppose that Bob wants to connect to Alice's server.


Alice picks a nonce N0, adds Bob and Alice's Server IDs to it,
   hashes it, and communicates the hash H0 to Bob.
Bob picks a nonce N1, adds Alice and Bob's Server IDs to it,
   hashes it, and communicates the hash  H1 to Alice.

Alice reveals her nonce, so Bob can verify that the constructed
    value is the preimage to the hash H0.
Bob reveals his nonce, so Alice can verify that the constructed
    value is the preimage to the hash H1.

Now both of them can calculate N = N0 ^ N1.
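For concreteness, a sketch of the commit-and-reveal exchange, assuming 32-byte nonces, SHA-256, and placeholder byte strings for the IDs (the exact encoding and concatenation order aren't specified above):

    import hashlib, os

    def commit(nonce: bytes, id1: bytes, id2: bytes) -> bytes:
        # H(nonce || IDs); the concatenation order is an assumption.
        return hashlib.sha256(nonce + id1 + id2).digest()

    bob_id, alice_id = b"bob-id", b"alice-server-id"  # placeholders

    n0 = os.urandom(32)                   # Alice's nonce
    h0 = commit(n0, bob_id, alice_id)     # H0, sent to Bob first
    n1 = os.urandom(32)                   # Bob's nonce
    h1 = commit(n1, alice_id, bob_id)     # H1, sent to Alice

    # After both reveal, each side verifies the other's commitment...
    assert commit(n0, bob_id, alice_id) == h0
    assert commit(n1, alice_id, bob_id) == h1
    # ...and both derive the same seed, which neither could bias alone.
    N = bytes(a ^ b for a, b in zip(n0, n1))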

Alice picks an address A (say) 32 bits long, then performs the following:

    initialize V (a hash-wide value) with N
    for each bit B in A
        V = hash (V,B);

and asks Bob, "What is the address of V_final?"

And now Bob must exhaustively compute (or have stored) the
hash tree in order to find the address.  
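A runnable sketch of the challenge itself, using SHA-256 and a 16-bit toy address so that Bob's exhaustive search finishes quickly (the description above suggests 32 bits):

    import hashlib, os, random

    ADDR_BITS = 16   # toy width; 32 bits in the description above

    def walk(n: bytes, addr: int) -> bytes:
        v = n                                    # initialize V with N
        for k in range(ADDR_BITS):               # for each bit B in A
            b = (addr >> k) & 1
            v = hashlib.sha256(v + bytes([b])).digest()
        return v

    n = os.urandom(32)                           # the shared N = N0 ^ N1
    a = random.getrandbits(ADDR_BITS)            # Alice's secret address
    v_final = walk(n, a)                         # costs Alice only ADDR_BITS hashes

    # Bob, without a stored table, must try addresses exhaustively.
    assert next(x for x in range(1 << ADDR_BITS)
                if walk(n, x) == v_final) == a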


My immediate thought is that the burden on Bob is asymmetric, but not by as much as you think.  

If I am Bob, I will set up a Bloom Filter Tree.  Bob needs to exhaustively compute the hash tree once in order to initialize the filters, but can then use a set of filters much smaller than the whole tree to efficiently narrow the address search.  He can spend less than one percent of the memory you thought you were requiring and still find the address with less than ten thousand hashes.
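To make that concrete, here is a toy, flat variant of the trick: one small Bloom filter per address-prefix bucket rather than a full filter tree, so the actual structure cryddit has in mind may differ. All parameters are illustrative; the real memory/hashing trade-off depends on how the false-positive rate is tuned:

    import hashlib, os, random

    ADDR_BITS, BUCKET_BITS = 16, 6               # toy sizes
    PER_BUCKET = 1 << (ADDR_BITS - BUCKET_BITS)

    def walk(n: bytes, addr: int) -> bytes:      # same walk as in the sketch above
        v = n
        for k in range(ADDR_BITS):
            v = hashlib.sha256(v + bytes([(addr >> k) & 1])).digest()
        return v

    class TinyBloom:
        def __init__(self, m):                   # m bits, two hash probes
            self.m, self.bits = m, 0
        def _probes(self, item):
            d = hashlib.sha256(b"bf" + item).digest()
            return (int.from_bytes(d[:4], "big") % self.m,
                    int.from_bytes(d[4:8], "big") % self.m)
        def add(self, item):
            for p in self._probes(item):
                self.bits |= 1 << p
        def __contains__(self, item):
            return all(self.bits >> p & 1 for p in self._probes(item))

    n = os.urandom(32)

    # One full pass over the tree to initialize the filters (Bob pays this once).
    filters = [TinyBloom(4 * PER_BUCKET) for _ in range(1 << BUCKET_BITS)]
    for a in range(1 << ADDR_BITS):
        filters[a >> (ADDR_BITS - BUCKET_BITS)].add(walk(n, a))

    def find_address(v: bytes) -> int:
        # Recompute only the buckets whose filter claims to hold v;
        # false positives just cost some extra hashing.
        for b, f in enumerate(filters):
            if v in f:
                base = b << (ADDR_BITS - BUCKET_BITS)
                for a in range(base, base + PER_BUCKET):
                    if walk(n, a) == v:
                        return a
        raise KeyError("not a leaf value")

    secret = random.getrandbits(ADDR_BITS)
    assert find_address(walk(n, secret)) == secret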

legendary
Activity: 2618
Merit: 1007
This could provide a basis for some sort of distributed file sharing, but you still need much more to make it work. 
How?

This is only useful to rank/score peers; the data generated is quite random... you might be able (by brute-forcing it) to generate some meaningful data from time to time (then this would be kind of a compression mechanism), but the content of the data definitely is NOT the focus here, rather the guaranteed waste of resources, which is thus "invested" into the relationship with you. This could help with bootstrapping issues, as a completely new peer has no data to offer and will have to leech initially. With Proof of Storage as described here, he can make sure that he will stay connected longer, as it would otherwise be quite a waste of resources for him to leech and leave.

Doing a hash tree over a file would then mean that the construction of my tree needs to be randomized too, otherwise the counterparty would compute the same tree and throw away the file. I thought that you wanted a mechanism to make sure that a random client actually DOES have the full block chain stored... This proposal is something different, more along the lines of the NashX proposal, where both parties show something of high value before proceeding further. Before, it was resources like data on one end and nothing from a new peer on the other end. Now the new peer can at least "invest" resource exhaustion on his end of the deal, which shows that he is committed to this relationship.

I'm not too sure that this can beat tit-for-tat though, as after some time you probably want to move from Proof of Storage (exhaustion) to something like seed ratios or similar.

Also, I'm not too sure that you can infer the willingness to play fair from the willingness to waste resources.
Imagine this scheme in BitTorrent: seeds accept new leechers only if they either have at least 10% of a file available or agree to waste 1 GB (random numbers). Now I, as a new node, can connect to a handful of seeds and load my first 10%. This does not mean that I'm willing to contribute anything back to the swarm in the sense of upload. With tit-for-tat I'd get the whole file eventually even without uploading anything, but I'd be severely choked by any peer out there quite fast. With resource exhaustion, I just need to lie that I am a fresh new node and will get everything. My cost is the disk space plus the energy used for the calculations, BUT there is no gain on the seeder's end and also none for the network! The only way to deter me from doing this would be to make the attack more expensive than participating. Given current electricity and HDD prices, there will be a lot of controversy about the "correct" size and computational requirements (you could build a tree where you use a much more "difficult" hash).
legendary
Activity: 1094
Merit: 1006
A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. like Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g. {Left, Right} = H(seed) and so on, to build up the target size worth of data. The client takes the final leaves of the tree, appends their indices, and sorts the data (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply look up the value for a given index, but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).
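A minimal sketch of the scheme; it reads {Left, Right} = H(seed) as splitting a SHA-512 output into two 32-byte children, and the tree depth and encoding are assumptions:

    import hashlib, os, random
    from bisect import bisect_left

    LEVELS = 16                                  # 2**16 leaves; illustrative

    def children(node: bytes):
        # {Left, Right} = H(node): split one wide hash into two children.
        d = hashlib.sha512(node).digest()
        return d[:32], d[32:]

    # Client: expand the seed into the full tree and keep the sorted leaves.
    seed = os.urandom(32)
    level = [seed]
    for _ in range(LEVELS):
        level = [half for node in level for half in children(node)]
    table = sorted((v, i) for i, v in enumerate(level))
    values = [v for v, _ in table]

    # Server: recompute one leaf with only LEVELS (= log2 size) hashes.
    def leaf_at(seed: bytes, index: int) -> bytes:
        node = seed
        for k in reversed(range(LEVELS)):
            left, right = children(node)
            node = right if (index >> k) & 1 else left
        return node

    idx = random.randrange(1 << LEVELS)
    challenge = leaf_at(seed, idx)               # cheap for the server

    # Client: answer the challenge with a binary search over sorted leaves.
    pos = bisect_left(values, challenge)
    assert values[pos] == challenge and table[pos][1] == idx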

I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be? I am now convinced that most (if not all) of current humanity's problems can be solved by genius mathematical algorithms.

PS.
I wonder if this couldn't somehow be used in file sharing, by the way?
This could provide a basis for some sort of distributed file sharing, but you still need much more to make it work. 
staff
Activity: 4284
Merit: 8808
I initially thought "Proof of Storage" would mean that A has a large file that B previously had, then A periodically proves to B that she still has the full file available.
It's utterly trivial to do that: just ask that they tell you parts of the file from time to time. You can even do this without remembering the file yourself if you just compute a hash tree over it and remember the root, and they give you the hash fragments.
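A sketch of that spot check, assuming SHA-256 and a power-of-two number of chunks; A keeps only the 32-byte root:

    import hashlib

    def h(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    def merkle_root(chunks):
        level = [h(c) for c in chunks]
        while len(level) > 1:
            level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        return level[0]

    def merkle_branch(chunks, i):
        # Sibling hashes on the path from leaf i up to the root.
        level, branch = [h(c) for c in chunks], []
        while len(level) > 1:
            branch.append(level[i ^ 1])
            level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
            i //= 2
        return branch

    def verify(root, chunk, i, branch):
        node = h(chunk)
        for sib in branch:
            node = h(sib + node) if i & 1 else h(node + sib)
            i //= 2
        return node == root

    chunks = [bytes([b]) * 1024 for b in range(8)]    # toy 8-chunk "file"
    root = merkle_root(chunks)                        # all that A remembers
    i = 5                                             # A's random challenge
    assert verify(root, chunks[i], i, merkle_branch(chunks, i))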

But that wouldn't be very useful as a decentralized anti-dos mechanism: one copy of the file would allow you to prove to as many concurrent peers as you want.  If, instead, the peers ask you to store some unique file for them to address that issue— they have to undertake the cost of actually sending you that file which is contrary to the goal of using this as anti-dos (and also you have the risk that Sergio addressed of someone doing that in order to invisibly delegate their proof work to someone else).

What this adds to that idea is the ability to send a few bytes that the peer can use to make a file of arbitrary size, and then they can convince you that they have actually done so... without you having to do non-trivial communication, computation, or storage. The price is, indeed, that the information stored isn't terribly useful. (You could perhaps replace the hash with some other kind of abstractly useful function "folding proteins" or something— but in general double-use for proof-of-whatever subsidizes the cost of attacking, since now your attack can be paid for by the secondary benefit of the barrier function, and other useful functions have some risk of surprising optimizations that strong cryptographic hashes lack).
legendary
Activity: 2618
Merit: 1007
PS.
I wonder if this couldn't somehow be used in file sharing, by the way?
Likely not, as you are probably more interested in having counterparties that also share files with you than ones that provably waste hard drive space.

I initially thought "Proof of Storage" would mean that A has a large file that B previously had, then A periodically proves to B that she still has the full file available.

The current idea is rather along the lines of "proof of wasted HDD space" (until CPUs catch up and it becomes faster to just generate the tables on the fly than to look them up from disk) and is intended to be used for deciding which connections to drop based on how "invested" other nodes are in you. This likely needs to be a longer-term relationship than the usual file-sharing time frame to pay off.
sr. member
Activity: 461
Merit: 251
I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be?
Hey, give credit where it's due Smiley  This guy in particular oozes genius ideas.
legendary
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. like Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g. {Left, Right} = H(seed) and so on, to build up the target size worth of data. The client takes the final leaves of the tree, appends their indices, and sorts the data (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply look up the value for a given index, but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).

I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be? I am now convinced that most (if not all) of current humanity's problems can be solved by genius mathematical algorithms.

PS.
I wonder if this couldn't somehow be used in file sharing, by the way?
hero member
Activity: 555
Merit: 654
Note I invented Proof-of-Harddrive in my Bitcoin : The Digital Kill Switch thread several months prior. Thanks for finding a useful application of it, even if you may prefer DRAM. I had to abandon it as a substitute for PoW because it doesn't provide the entropy for selecting which peer is next.

Could you provide a link to the exact post where you elaborate the idea of Proof-of-Harddrive?
hero member
Activity: 518
Merit: 521
Note I invented Proof-of-Harddrive in my Bitcoin : The Digital Kill Switch thread several months prior. Thanks for finding a useful application of it, even if you may prefer DRAM. I had to abandon it as a substitute for PoW because it doesn't provide the entropy for selecting which peer is next.
legendary
Activity: 1596
Merit: 1100
Related:

Similar to each bitcoin P2P peer's nonce value, which serves as a node's unique id, I've occasionally wished for each side of the P2P protocol to offer a random seed in "version" and "verack."

(a DH exchange is probably too much to wish for, in the context of bitcoin P2P, heh heh)

hero member
Activity: 555
Merit: 654
side-topic: Also a node could prioritize nodes which store the full block chain. This can be done with a variety of protocols known as Proof of Storage and Proof of Retrievability.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
Great idea!

If implemented, it would probably make sense to create a little protocol for the server and client to negotiate the amount of storage needed to "rent" a connection slot. Maybe the server reports something like "I've got 0 slots free, min/median/max proof-of-storage for my other connected nodes is 0/1MB/11MB."
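Hypothetically, that negotiation could be a single extra handshake message; every field name below is invented for illustration:

    # Hypothetical slot-negotiation messages; all field names are made up.
    server_hello = {
        "free_slots": 0,
        "proof_of_storage_bytes": {   # among currently connected nodes
            "min": 0,
            "median": 1 * 1024**2,    # 1 MB
            "max": 11 * 1024**2,      # 11 MB
        },
    }
    # A prospective client might then offer to prove storage just above
    # the median, hoping to displace the least-invested connected peer.
    client_offer = {"proof_of_storage_bytes": 2 * 1024**2}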
staff
Activity: 4284
Merit: 8808
I think Sergio means the client could outsource the computation to another client.
Okay, I thought perhaps. But Sergio's protocol improvement prevents unwitting outsourcing. Voluntary outsourcing is okay, I think, as a scarce resource is still being consumed by someone involved in the activity.
legendary
Activity: 1120
Merit: 1152
The timing comment leaves me feeling a bit confused. I don't hope to prevent outsourcing: if someone is storing the data then that creates the desired scarcity. If the client instead wants to store no data but perform a billion computations (or what have you) per query, then I think that's okay: the queries are super cheap for the server and honest clients, so they can be performed frequently. I think the parameters can be chosen so that the storage-less solution of recomputing most of the answer space for every query is pretty costly (which would also achieve the goal of limiting abuse).

I think Sergio means the client could outsource the computation to another client.
staff
Activity: 4284
Merit: 8808
Also it's important to note that C may not store all the tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all the tree or not, S would need to make many time measurements of C's responses, averaging the elapsed time.
IIUC, for an n-level tree, you'd save yourself from storing

(1 - 2^-n) / (2 - 2^-n)  ≈  1/2

of the nodes in the tree.  So you could just assume everybody is cheating and require an extra level?
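That fraction is just the share of internal nodes in a full binary tree with 2^n leaves; a quick sanity check:

    # Internal nodes over total nodes in a full binary tree with 2**n leaves.
    n = 20
    internal, total = 2**n - 1, 2**(n + 1) - 1
    assert abs(internal / total - (1 - 2**-n) / (2 - 2**-n)) < 1e-12
    print(internal / total)   # 0.4999995...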

Yea this was what I was referring to with "There are some possible optimizations in the client, like common prefix compression, but they're only small optimizations because the function is strongly pseudorandom".

The timing comment leaves me feeling a bit confused. I don't hope to prevent outsourcing: if someone is storing the data then that creates the desired scarcity. If the client instead wants to store no data but perform a billion computations (or what have you) per query, then I think that's okay: the queries are super cheap for the server and honest clients, so they can be performed frequently. I think the parameters can be chosen so that the storage-less solution of recomputing most of the answer space for every query is pretty costly (which would also achieve the goal of limiting abuse).