
Topic: Proof of Storage

sr. member
Activity: 364
Merit: 264
April 24, 2013, 05:15:15 PM
#18
Another potential contender: BitTorrent's "sync service".

http://labs.bittorrent.com/experiments/sync/get-started.html

Quote
P2P Protocol

BitTorrent Sync synchronizes your files using a peer-to-peer (P2P) protocol. This protocol is very effective for transferring large files across multiple devices, and is very similar to the powerful protocol used by applications like µTorrent and BitTorrent. The data is transferred in pieces from each of the syncing devices, and BitTorrent Sync chooses the optimal algorithm to make sure you have a maximum download and upload speed during the process.

The devices you set up to sync are connected directly using UDP, NAT traversal and UPnP port mapping. We also provide such additional methods of ensuring connectivity as relay and tracker servers. If your devices are on the same local network, BitTorrent Sync will use your LAN for faster synchronization.
Security

BitTorrent Sync was designed with privacy and security in mind. All the traffic between devices is encrypted with an AES cipher and a 256-bit key created on the basis of the secret: a random string (20 bytes or more) that is unique for every folder.

It’s our priority to make sure that nobody has unauthorized access to your folders. That’s why there are no 3rd party servers involved when syncing your files. All the files are stored only on your trusted devices, controlled and managed solely by you.

For the same reason we provide you with a quick and easy way to manage secrets. You can regularly change them and invite people by sharing a one-time secret instead of distributing a permanent one.
Secret

The secret is a randomly generated 21-byte key. It is Base32-encoded in order to be readable by humans. BitTorrent Sync uses /dev/random (Mac, Linux) and the Crypto API (Windows) in order to produce a completely random string. This authentication approach is significantly stronger than a login/password combination used by other services. That's why using a secret generated by BitTorrent Sync is very safe and secure.

If you want even more security, BitTorrent Sync gives you a way to use a custom secret. Just create your own secret, encode it with Base64, and enter it in the secret field for BitTorrent Sync. Note that a custom secret should be more than 40 characters long.
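A minimal Python sketch of the secret handling quoted above, assuming Python's `os.urandom` as a stand-in for /dev/random and the Crypto API. The quote does not say how the 256-bit AES key is derived from the secret, so `derive_aes256_key` (plain SHA-256 of the secret) is a hypothetical placeholder, not BitTorrent Sync's actual scheme.

```python
import base64
import hashlib
import os


def generate_secret(num_bytes: int = 21) -> str:
    """Random folder secret, Base32-encoded so a human can read and copy it."""
    raw = os.urandom(num_bytes)  # OS cryptographic RNG
    # b32encode pads to a multiple of 8 characters; the padding carries no entropy.
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def derive_aes256_key(secret: str) -> bytes:
    """HYPOTHETICAL derivation: hash the secret down to a 256-bit (32-byte) key."""
    return hashlib.sha256(secret.encode("ascii")).digest()


secret = generate_secret()
key = derive_aes256_key(secret)
print(len(secret), len(key))  # 34 32
```

Twenty-one bytes is 168 bits of entropy; Base32 carries 5 bits per character, so the secret is 34 characters once the padding is stripped.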
Peer Discovery

In order to find proper peers that have the same secret, Sync uses:

    Local peer discovery. All peers inside the local network are discovered by sending broadcast packets. If there are peers with the same secret, they respond to the broadcast message and connect.
    Peer exchange (PEX). When two peers are connected, they exchange information about other peers they know.
    Known hosts (folder settings). If you have a known host with a static ip:port, you can specify this in the Sync client so that it connects to the peer using this information.
    DHT. Sync uses DHT to distribute information about itself and obtain information about other peers with this secret. Sync sends SHA2(Secret):ip:port to DHT to announce itself and gets a list of peers by asking DHT for the key SHA2(Secret).
    BitTorrent tracker. BitTorrent Sync can use a specific tracker server to facilitate peer discovery. The tracker server sees the combination of SHA2(secret):ip:port and helps peers connect directly. The BitTorrent Sync tracker also acts like a STUN server and can help do a NAT traversal for peers so that they can establish a direct connection even behind a NAT.
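The DHT announce/lookup step can be sketched as follows, assuming SHA-256 for "SHA2" and using an in-memory dictionary (`ToyDHT`, hypothetical) in place of a real DHT. The point it illustrates: peers holding the same secret meet under the same key, while the DHT itself only ever sees the hash.

```python
import hashlib
from collections import defaultdict


def folder_key(secret: str) -> bytes:
    # "SHA2" in the quoted text; SHA-256 is assumed here.
    return hashlib.sha256(secret.encode("utf-8")).digest()


class ToyDHT:
    """In-memory stand-in for a real DHT: key -> set of announced "ip:port" strings."""

    def __init__(self) -> None:
        self._table: dict[bytes, set[str]] = defaultdict(set)

    def announce(self, key: bytes, addr: str) -> None:
        self._table[key].add(addr)

    def get_peers(self, key: bytes) -> set[str]:
        # .get avoids creating empty entries for keys nobody announced
        return set(self._table.get(key, set()))


dht = ToyDHT()
shared = "HYPOTHETICALFOLDERSECRET"
dht.announce(folder_key(shared), "198.51.100.7:41000")
dht.announce(folder_key(shared), "203.0.113.9:41000")

# A peer that knows the secret finds the others by asking for SHA2(secret).
print(sorted(dht.get_peers(folder_key(shared))))
print(dht.get_peers(folder_key("SOMEOTHERSECRET")))  # set()
```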

We recommend that you use a tracker server instead of DHT for reasons of faster response and NAT traversal, so peers have a higher probability of networking directly.

I'll need to read it in detail / actually try it out to comment on whether something like this can actually apply to this scheme.
newbie
Activity: 18
Merit: 0
April 24, 2013, 03:26:25 PM
#17
Alternatively, concatenate the timestamp/signature of the last block onto the file contents, and hash that? Would that work?

Yes, that will require disk access, but we're assuming that disk rate is faster than network rate.

Interesting, I'll have to think about that. My initial thought is that it wouldn't work, because then how would the other nodes know what the original file contents are?
newbie
Activity: 18
Merit: 0
April 24, 2013, 03:22:32 PM
#16
Then verification consists of actually sending it. Not very bandwidth-efficient, indeed.

No, you only need to send a small segment of the file, so that is the whole bandwidth requirement.
sr. member
Activity: 364
Merit: 264
April 24, 2013, 03:21:13 PM
#15
Alternatively, concatenate the timestamp/signature of the last block onto the file contents, and hash that? Would that work?

Yes, that will require disk access, but we're assuming that disk rate is faster than network rate.
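A minimal sketch of the "hash the file contents together with the last block" idea, assuming the last block's hash stands in for its "timestamp/signature" (the post does not pin this down). Because the block hash does not exist until the block is found, the proof cannot be computed ahead of time; note that the verifier still needs its own copy of the segment, which is exactly the objection raised in the reply.

```python
import hashlib
import hmac


def storage_proof(segment: bytes, last_block_hash: bytes) -> bytes:
    # Concatenate the file data with fresh chain data, then hash.
    return hashlib.sha256(segment + last_block_hash).digest()


def verify_storage_proof(my_copy: bytes, last_block_hash: bytes, proof: bytes) -> bool:
    # The verifier needs its own copy of the segment to recompute the hash.
    expected = storage_proof(my_copy, last_block_hash)
    return hmac.compare_digest(expected, proof)


segment = b"example file segment"
block_1 = hashlib.sha256(b"hypothetical block 1 header").digest()
block_2 = hashlib.sha256(b"hypothetical block 2 header").digest()

proof = storage_proof(segment, block_1)
print(verify_storage_proof(segment, block_1, proof))  # True
# A proof computed for an older block does not verify against the new one.
print(verify_storage_proof(segment, block_2, proof))  # False
```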
newbie
Activity: 18
Merit: 0
April 24, 2013, 03:19:04 PM
#14
Sure, you can verify authenticity of data (if I understand correctly, this is just a glorified Merkle tree/blockchain). But how does http://upload.wikimedia.org/wikipedia/commons/8/8c/Hashlink_timestamping.svg actually prevent people from just storing the hash of the data?


The data and the hash would have to be the same size.  Or, if you like, the data *is* the hash.
newbie
Activity: 18
Merit: 0
April 24, 2013, 03:11:14 PM
#13
Yep, but you can verify somebody else stores the same data only if you have it already as well. What you do is simple cookie authentication:

a: generates a cookie like this: { r: random_bytes , sig: ecc_sign(random_bytes) }
b: cookie.r == ecc_verify(cookie.sig), if ok then computes sha256(cookie + data_checked + cookie) and sends it to a
a: does the same sha256 to verify b does indeed stores the data
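The cookie exchange above can be sketched in Python as follows. This is a minimal sketch: the ECC signature step (`ecc_sign`/`ecc_verify`) is omitted because Python's standard library has no ECC, so only the fresh random challenge and the `sha256(cookie + data + cookie)` response are shown. Function names are hypothetical.

```python
import hashlib
import hmac
import os


def make_cookie() -> bytes:
    # a: fresh random challenge. The post also signs it with ECC so b can check
    # who issued it; that step is omitted here.
    return os.urandom(32)


def respond(cookie: bytes, data: bytes) -> bytes:
    # b: sha256(cookie + data + cookie) over the data it claims to store
    return hashlib.sha256(cookie + data + cookie).digest()


def check(cookie: bytes, my_copy: bytes, response: bytes) -> bool:
    # a: recompute with its own copy and compare in constant time
    return hmac.compare_digest(respond(cookie, my_copy), response)


data = b"hypothetical stored file contents"
cookie = make_cookie()
print(check(cookie, data, respond(cookie, data)))  # True: b really has the data

# b keeping only sha256(data) cannot produce the right answer for a fresh cookie.
only_hash = hashlib.sha256(data).digest()
print(check(cookie, data, respond(cookie, only_hash)))  # False
```

As the post says, a can run this check only because it holds the data itself.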

Imho it's impossible to check that somebody owns the data without having it as well: if you have only the hash of the data, they could have only the hash too, and so on. If you can come up with a scheme to actually verify storage without having the data too, I'd be very interested in how to do that!

There's also Bytecoin mentioned on the wiki, but its description was too vague for me.


I think one-way accumulators allow you to verify a bunch of data using a small amount of data. The small data piece is what is stored in the blockchain. I think Bloom filters are similar, but I am not an expert in either.
newbie
Activity: 18
Merit: 0
April 24, 2013, 03:08:59 PM
#12
Sorry, I'm not sure I follow you.  The proof of work is retrieving the file piece asked for, so you can't delete file pieces otherwise you won't be able to provide the proof of work and receive your coins.  Since random file pieces are asked for, you can't delete any pieces.

An amazing amount of data is stored but rarely fetched. Your approach might not find out until it's too late. Also, it requires needless (slow) network traffic to test. I'm supposing your proof-of-stake is related to (number of blocks stored) * (time stored/block), so gaming the system might be to a malicious node's advantage.


OK, I think I follow you. The problem is that malicious nodes might gamble, deleting some file segments to gain more space. Since file segments are rarely validated, the gamble might pay off for a long while. You can't increase the validation rate because you can't bog down the network with tons of traffic.

Possible solutions:
-invalidate all or most of a node's coins if it is caught without a requested file segment.
-only award coins once an entire file has been validated, but then you need a way to keep track of which segments have been checked.
-don't worry about it. If you delete 20% of the data you are supposed to have, it's not much space, and for each 10-minute block the odds of requesting a deleted segment are 20%, so you would be discovered quickly.
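The "discovered quickly" claim can be checked with a short calculation, assuming the node is challenged once per 10-minute block and each challenged segment is drawn uniformly and independently. The chance of surviving n challenges with 20% of segments deleted is 0.8^n, and the expected number of challenges until the first miss is 1/0.2 = 5 (a geometric distribution). Helper names are hypothetical.

```python
def prob_undetected(deleted_fraction: float, challenges: int) -> float:
    """Chance a node that deleted `deleted_fraction` of its segments passes every challenge."""
    return (1.0 - deleted_fraction) ** challenges


def expected_challenges_to_detection(deleted_fraction: float) -> float:
    """Mean of a geometric distribution: 1/p challenges until the first miss."""
    return 1.0 / deleted_fraction


p = 0.2  # 20% of segments deleted, as in the post
print(round(prob_undetected(p, 10), 4))     # 0.1074  (~100 minutes of blocks)
print(prob_undetected(p, 144) < 1e-13)      # True    (one day of blocks)
print(expected_challenges_to_detection(p))  # 5.0     (~50 minutes)
```

Under these assumptions a node that deleted 20% of its data is very unlikely to last a day; if each node is challenged less often than every block, the numbers scale accordingly.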
sr. member
Activity: 364
Merit: 264
April 24, 2013, 02:58:27 PM
#11
My thought of bandwidth was to simply use a 24-hour (or some nice fraction of the block retarget interval) rolling bandwidth. Self-reporting will not work, because of the possibility of malicious nodes. Rather, each node will report the bandwidth that it "sees" coming from all other nodes connected to it (one instantaneous measurement every block, averaged over the time interval) - therefore, if there are at least some honest nodes on the network, then reporting by malicious nodes will show up as a discrepancy. Not sure about the particular details, but guys with better math skills can take it from there.
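One way to fill in the math left open above, as a sketch only: for each block, take the median of what other nodes observed from a given node (self-reports ignored), flag reports that differ from that median by more than a tolerance, and keep a rolling average of the per-block medians. The median as the robust aggregate, the 50% tolerance, and all names are assumptions; the approach only works if honest observers are the majority for each node.

```python
from collections import defaultdict, deque
from statistics import mean, median

# reports[observer][subject] = bandwidth (bits/s) that observer measured from subject
Reports = dict[str, dict[str, float]]


def block_estimates(reports: Reports) -> dict[str, float]:
    """Per-node bandwidth for one block: median of what *other* nodes observed."""
    seen_by_others: dict[str, list[float]] = defaultdict(list)
    for observer, seen in reports.items():
        for subject, bw in seen.items():
            if subject != observer:  # self-reports are ignored
                seen_by_others[subject].append(bw)
    return {subject: median(values) for subject, values in seen_by_others.items()}


def discrepancies(reports: Reports, tolerance: float = 0.5) -> set[tuple[str, str]]:
    """(observer, subject) pairs whose report is off by more than `tolerance` (relative)."""
    estimates = block_estimates(reports)
    flagged = set()
    for observer, seen in reports.items():
        for subject, bw in seen.items():
            est = estimates.get(subject)
            if est and abs(bw - est) / est > tolerance:
                flagged.add((observer, subject))
    return flagged


class RollingBandwidth:
    """Average of per-block estimates over a window (144 blocks ~ 24 h at 10 min/block)."""

    def __init__(self, window_blocks: int = 144) -> None:
        self._history: dict[str, deque] = defaultdict(lambda: deque(maxlen=window_blocks))

    def add_block(self, reports: Reports) -> None:
        for subject, bw in block_estimates(reports).items():
            self._history[subject].append(bw)

    def average(self, subject: str) -> float:
        history = self._history.get(subject)
        return mean(history) if history else 0.0


reports = {
    "A": {"M": 100.0, "B": 50.0},
    "B": {"M": 110.0, "A": 80.0},
    "C": {"M": 90.0, "A": 70.0},
    "M": {"M": 1000.0, "A": 10.0},  # M inflates itself and underreports A
}
print(discrepancies(reports))  # both of M's reports are flagged
rolling = RollingBandwidth()
rolling.add_block(reports)
print(rolling.average("M"))  # 100.0
```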
newbie
Activity: 18
Merit: 0
April 24, 2013, 02:54:55 PM
#10
OK, it seems there was some interest. I'm just working this out now. Since I'm only a Bitcoin beginner, I admit this might not pan out.

The block chain is basically just like Bitcoin's; it's the same size. Each block has an accumulator that includes all the file segments stored during a 10-minute interval. Though the total size of files stored on the network might be huge, the accumulator is small. In proof of work, nodes validate that the hash is a solution for a block. In proof of storage, nodes validate that the retrieved file segment is in the accumulator of the appropriate block.

jimhsu,
I remember hearing about Wuala before. I didn't know they went centralized. Thanks.

Redundancy is an issue as mentioned.  But, it seems like you can use any amount of redundancy you want, as long as people are adding storage space to the network.

Also, proof of bandwidth would need to be validated, so maybe there are bandwidth tests in order to join the network. I hadn't thought of bandwidth issues before, but they would need to be worked out.
Red
full member
Activity: 210
Merit: 115
April 24, 2013, 02:45:01 PM
#9
Sorry, I'm not sure I follow you.  The proof of work is retrieving the file piece asked for, so you can't delete file pieces otherwise you won't be able to provide the proof of work and receive your coins.  Since random file pieces are asked for, you can't delete any pieces.

An amazing amount of data is stored but rarely fetched. Your approach might not find out until it's too late. Also, it requires needless (slow) network traffic to test. I'm supposing your proof-of-stake is related to (number of blocks stored) * (time stored/block), so gaming the system might be to a malicious node's advantage.
newbie
Activity: 18
Merit: 0
April 24, 2013, 02:40:18 PM
#8
The blockchain stores one small number for every file. It is an accumulator which has all the file pieces added to it. To verify file segment storage, nodes retrieve the accumulator from the blockchain and check that the file segment is in the accumulator.

That is an interesting concept. Thanks for the reference to a Bloom Filter!

The problem I see is that while you know they stored the data, you don't know if that store displaced something else (they could delete the data afterwards).

Would you know on the next store? Does every byte stored need to go through the Bloom filter each time? I haven't read the article yet (TL;DR).

An alternate solution is just to pose a proof-of-storage piece of work. Obviously there would need to be redundant storage of each data block. Just ask each redundant node to re-hash the block using a random seed that you choose. They should all match.

Sorry, I'm not sure I follow you.  The proof of work is retrieving the file piece asked for, so you can't delete file pieces otherwise you won't be able to provide the proof of work and receive your coins.  Since random file pieces are asked for, you can't delete any pieces.
sr. member
Activity: 364
Merit: 264
April 24, 2013, 02:30:38 PM
#7
Here's a publication on Wuala, and why it ultimately scrapped the distributed approach in favor of centralization:
http://www.eurecom.fr/fr/publication/3772/download/rs-publi-3772.pdf

1. Better data-deduplication. This is clearly a concern with downloading the full blockchain (no way would a user ever require 10000 copies of his or her files, no matter how important).

Modify the client to allow a "blockchain coverage" option. If blockchain coverage is adjusted down, delete the most redundant blocks in the local blockchain. If coverage is adjusted up, download the least redundant blocks to the local blockchain. (We need a way to measure redundancy somehow, perhaps by having the client software broadcast a count of the blocks "that I own".) Tie the "blockchain coverage" setting directly to "allocated storage size" (below).
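A sketch of the "blockchain coverage" rule above, assuming storage is measured in fixed-size blocks and that per-block replica counts are already known (the post notes that obtaining them is still an open question). `adjust_coverage` is a hypothetical helper.

```python
def adjust_coverage(
    local: set[str], replicas: dict[str, int], capacity_blocks: int
) -> tuple[set[str], set[str]]:
    """Return (blocks_to_delete, blocks_to_download) for a new storage allocation.

    replicas[block] = how many copies the network is believed to hold (assumed known).
    """
    if len(local) > capacity_blocks:
        # Coverage adjusted down: drop the blocks with the most copies elsewhere.
        most_redundant = sorted(local, key=lambda b: (-replicas.get(b, 0), b))
        return set(most_redundant[: len(local) - capacity_blocks]), set()
    # Coverage adjusted up: fetch the least-replicated blocks we don't hold yet.
    candidates = [b for b in replicas if b not in local]
    least_redundant = sorted(candidates, key=lambda b: (replicas[b], b))
    return set(), set(least_redundant[: capacity_blocks - len(local)])


local = {"a", "b", "c"}
replicas = {"a": 10, "b": 2, "c": 5, "d": 1, "e": 3}
print(adjust_coverage(local, replicas, 2))  # ({'a'}, set())
print(adjust_coverage(local, replicas, 4))  # (set(), {'d'})
```

Sorting on (replica count, block id) keeps the choice deterministic when two blocks are equally redundant.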

2. Bandwidth limitations. Users putting up, say 1TB, over a puny 768 kbps connection. "Proof of storage" also requires a "proof of bandwidth" and "proof of uptime" component. Storage with unreliable bandwidth is useless.

I propose a function like this: "Storage rate" = Allocated storage size * 24-hour rolling bandwidth (units: bits^2/sec). Tie this into "difficulty". Clients that contribute either more bandwidth or more storage are rewarded w/ blocks.
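The proposed "storage rate" is simple arithmetic; here is a sketch with the units made explicit (bytes converted to bits, bandwidth in bits/s). How the rate ties into "difficulty" is not specified, so `reward_shares`, which splits rewards in proportion to each node's rate, is a hypothetical stand-in.

```python
def storage_rate(allocated_bytes: int, rolling_bandwidth_bps: float) -> float:
    """Allocated storage (bits) x 24-hour rolling bandwidth (bits/s) -> bits^2/s."""
    return allocated_bytes * 8 * rolling_bandwidth_bps


def reward_shares(rates: dict[str, float]) -> dict[str, float]:
    """HYPOTHETICAL: each node's share of block rewards is proportional to its rate."""
    total = sum(rates.values())
    return {node: rate / total for node, rate in rates.items()}


rates = {
    "big_disk_slow_link": storage_rate(10**12, 768e3),    # 1 TB over 768 kbps
    "small_disk_fast_link": storage_rate(10**11, 100e6),  # 100 GB over 100 Mbps
}
print({n: round(s, 3) for n, s in reward_shares(rates).items()})
# {'big_disk_slow_link': 0.071, 'small_disk_fast_link': 0.929}
```

Because the rate is a product, a tenth of the storage on a link roughly 130 times faster earns about 13 times more, which is the intended penalty for the "1TB over 768 kbps" case.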

3. Centralization efficiency. Having clients communicate with a central server for distributed data is inefficient. Cryptocoins inherently solve this problem. But then there's the blockchain problem again.

4. Simplicity. Centralization is simple, for the reasons above. Then again, cryptocoins have this down also.
sr. member
Activity: 364
Merit: 264
April 24, 2013, 02:23:07 PM
#6
So something like a crypto version of Wuala (before it got bought by LaCie)? I would be up for that.

What would be a "block" in this case? A unit of storage (1GB, 1TB, ?) Encrypted file segments (as transactions)? What will addresses correspond to? Cool idea regardless.

A full blockchain definitely won't work though, so the client will have to be coded with that in mind.
Red
full member
Activity: 210
Merit: 115
April 24, 2013, 02:21:38 PM
#5
The blockchain stores one small number for every file. It is an accumulator which has all the file pieces added to it. To verify file segment storage, nodes retrieve the accumulator from the blockchain and check that the file segment is in the accumulator.

That is an interesting concept. Thanks for the reference to a Bloom Filter!

The problem I see is that while you know they stored the data, you don't know if that store displaced something else (they could delete the data afterwards).

Would you know on the next store? Does every byte stored need to go through the Bloom filter each time? I haven't read the article yet (TL;DR).

An alternate solution is just to pose a proof-of-storage piece of work. Obviously there would need to be redundant storage of each data block. Just ask each redundant node to re-hash the block using a random seed that you choose. They should all match.
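The redundant-node re-hash check can be sketched like this, assuming the challenger picks a fresh random seed and that a strict majority of replicas is honest. The post only says the answers "should all match", so the majority rule for deciding which answer is correct is an assumption, and the names are hypothetical.

```python
import hashlib
import os
from collections import Counter
from typing import Optional


def rehash(seed: bytes, block: bytes) -> bytes:
    # Each replica hashes its copy of the block together with the challenger's seed.
    return hashlib.sha256(seed + block).digest()


def audit(responses: dict[str, bytes]) -> tuple[Optional[bytes], set[str]]:
    """Return (majority answer, nodes that disagree). No strict majority -> (None, all nodes)."""
    answer, votes = Counter(responses.values()).most_common(1)[0]
    if votes * 2 <= len(responses):
        return None, set(responses)
    return answer, {node for node, r in responses.items() if r != answer}


block = b"hypothetical data block"
seed = os.urandom(16)  # fresh each time, so answers cannot be cached
responses = {
    "node_a": rehash(seed, block),
    "node_b": rehash(seed, block),
    "node_c": rehash(seed, block),
    "node_d": rehash(seed, b"corrupted or missing copy"),
}
answer, suspects = audit(responses)
print(suspects)  # {'node_d'}
```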
newbie
Activity: 18
Merit: 0
April 24, 2013, 01:22:27 PM
#4

I see validation as:

The blockchain stores one small number for every file. It is an accumulator which has all the file pieces added to it. To verify file segment storage, nodes retrieve the accumulator from the blockchain and check that the file segment is in the accumulator.
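The post proposes a one-way accumulator; as a simpler stand-in, this sketch swaps in a Merkle tree (also mentioned later in the thread), whose root plays the role of the "one small number" per file. A node holding only the root can check a retrieved segment plus about log2(n) sibling hashes. All names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _pad(level: list[bytes]) -> list[bytes]:
    # Odd-sized level: duplicate the last hash so every node has a sibling.
    return level + [level[-1]] if len(level) % 2 else level


def merkle_root(segments: list[bytes]) -> bytes:
    level = [_h(s) for s in segments]
    while len(level) > 1:
        level = _pad(level)
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(segments: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool says whether the sibling is on the left."""
    level = [_h(s) for s in segments]
    proof = []
    while len(level) > 1:
        level = _pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_segment(root: bytes, segment: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    acc = _h(segment)
    for sibling, sibling_is_left in proof:
        acc = _h(sibling + acc) if sibling_is_left else _h(acc + sibling)
    return acc == root


segments = [b"seg0", b"seg1", b"seg2", b"seg3", b"seg4"]
root = merkle_root(segments)  # the "one small number" stored on chain
proof = merkle_proof(segments, 2)
print(verify_segment(root, b"seg2", proof))    # True
print(verify_segment(root, b"forged", proof))  # False
```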
hero member
Activity: 631
Merit: 501
April 24, 2013, 01:15:56 PM
#3
Not to mention a HUGE block chain...
Did I mention huge?
legendary
Activity: 1205
Merit: 1010
April 24, 2013, 01:14:25 PM
#2
It's a difficult problem. The main issue is that file segments can be stored only at a limited number of nodes, which prevents most nodes from verifying the 'proof'. So basically there is no proof, at least not as solid a proof as in proof-of-work and proof-of-stake, where every node can easily verify it.
newbie
Activity: 18
Merit: 0
April 24, 2013, 01:02:37 PM
#1
I imagine a kind of free "distributed Dropbox" with huge capacity.

Is it possible to use proof of storage for a new coin? It would serve as a free distributed filesystem for people who need extra space. People who have extra space would earn coins.

If it's even possible, there are probably other ways to do it, but here is my way:

-nodes store segments of a file that someone wants backed up
-for each block, nodes agree on a random file segment in a distributed way.
-nodes retrieve the file segment asked for.
-nodes check that the file segment matches a one-way accumulator (or bloom filter?)
-nodes get coins in proportion to the amount of hard drive space they are contributing
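A sketch of two of the steps above, under stated assumptions: "agree on a random file segment in a distributed way" is modeled by every node deriving the same index from the previous block's hash, and "coins in proportion to hard drive space" by a proportional split of a hypothetical 50-coin block reward. Both choices and all names are hypothetical.

```python
import hashlib


def challenged_segment(prev_block_hash: bytes, file_id: str, num_segments: int) -> int:
    """Every node derives the same 'random' segment index from public chain data."""
    digest = hashlib.sha256(prev_block_hash + file_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_segments


def split_reward(block_reward: float, contributed_bytes: dict[str, int]) -> dict[str, float]:
    """Coins in proportion to the storage each node contributes."""
    total = sum(contributed_bytes.values())
    return {node: block_reward * size / total for node, size in contributed_bytes.items()}


prev = hashlib.sha256(b"hypothetical previous block").digest()
idx = challenged_segment(prev, "backup-0001", num_segments=1000)
print(0 <= idx < 1000)  # True, and identical on every node that sees `prev`
print(split_reward(50.0, {"alice": 300, "bob": 100}))  # {'alice': 37.5, 'bob': 12.5}
```

Deriving the index from the previous block means nobody can predict the challenge before that block exists, yet no extra messages are needed to agree on it.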