Topic: Idea for an anonymous file-sharing system (Read 276 times)

sr. member
Activity: 378
Merit: 250
March 08, 2018, 11:58:26 AM
#5
A major hurdle for the adoption of GNUnet and similar systems is the morality of even remotely helping out child molesters or Nazis.

Censorship resistance is attractive, but not unless users can select and enforce their own policies. This is the problem which needs to be solved. Anon filesharing exists, we just don't want it.
legendary
Activity: 2254
Merit: 1278
What I propose instead is something like the following system. (Note however that this is only a half-baked idea...)

I'll just chip in with a strong (experience-led) exhortation to consider including (at least optional) descriptive metadata.

Cheers

Graham
jr. member
Activity: 112
Merit: 5
I'm with the poster above that there are indeed projects already developed for this purpose.

One is the Oyster Protocol (PRL). Testnet A was a success and will be reopened to the public soon. Testnet B is live right now, with Mainnet slated for 6 April 2018.

From the whitepaper:
Quote
The Oyster Protocol enables websites to silently generate traffic revenue as visitors perform
Proof of Work for a decentralized storage ledger.
Introduction
Despite the exponential growth of the internet, mechanisms for monetizing web content have
remained stagnant. Advertisements intrude on privacy, distract from the intended content, and
break design continuity in websites. Due to a general disregard and negative sentiment
towards online advertisements, ad blockers have become mainstream. They have become so
mainstream that content publishers are pushing back by blocking and limiting viewers from
content if ad blockers are detected. Publishers are losing either a large amount of money from
ad blockers, or a large amount of viewers from ad block retaliation mechanisms. Therefore the
entire advertising scene gradually morphed into an ineffective, inefficient and intrusive ordeal
without foresight and wholistic solution deployment.
In parallel, there is currently no storage service that is both convenient and private. If you
choose convenience, then you are opting for a standard cloud storage company which
precludes privacy and anonymity. Closed source software means you can never truly take what
they say for granted. If you choose privacy, then you will seldom find an accessible and straight
forward web interface with a simple ‘upload’ button.
The Oyster Protocol is a true two-birds-one-stone proposition. The Protocol introduces a
radically different approach to getting content publishers and content consumers to reach
equilibrium and cooperation. As a consequence, anyone with a web browser can store and
retrieve files in a decentralized, anonymous, secure, and reliable manner.
legendary
Activity: 2226
Merit: 1589
Do not die for Putin
Sorry if I am missing the point because I only partially understand your post, but there are a few blockchain-based projects out there with developments focused on encryption and anonymity, at least one of them with an API.

One of those is Enigma https://enigma.co/

Quoting:
"Abstract
A peer-to-peer network, enabling different parties to jointly store and run computations
on data while keeping the data completely private. Enigma’s computational
model is based on a highly optimized version of secure multi-party computation,
guaranteed by a verifiable secret-sharing scheme. For storage, we use a modified
distributed hashtable for holding secret-shared data. An external blockchain
is utilized as the controller of the network, manages access control, identities and
serves as a tamper-proof log of events. Security deposits and fees incentivize operation,
correctness and fairness of the system. Similar to Bitcoin, Enigma removes
the need for a trusted third party, enabling autonomous control of personal data.
For the first time, users are able to share their data with cryptographic guarantees
regarding their privacy."


Another one is NuCypher https://www.nucypher.com/blockchain.html

Quoting:

"NuCypher KMS is a decentralized Key Management System (KMS) that addresses the limitations
of using consensus networks to securely store and manipulate private, encrypted data [1]. It provides
encryption and cryptographic access control, performed by a decentralized network, leveraging
proxy re-encryption [2]. Unlike centralized KMS as a service solutions, it doesn’t require trusting a
service provider. NuCypher KMS enables sharing of sensitive data for both decentralized and centralized
applications, providing security infrastructure for applications from healthcare to identity
management to decentralized content marketplaces. NuCypher KMS will be an essential part of
decentralized applications, just as SSL/TLS is an essential part of every secure web application."

Would any of these address the use case you are discussing?
administrator
Activity: 5222
Merit: 13032
February 28, 2018, 11:44:31 PM
#1
The state of anonymous file-sharing (and anonymous Web hosting) is very poor. The most commonly-used solution is Tor hidden services, but those have terrible security. They are vulnerable to intersection, timing, and DoS attacks. Plus, Tor is fundamentally centralized, relying on a fixed set of Tor directory authorities to manage the network. I have no doubt whatsoever that the NSA & friends could easily find the true IP address of any Tor hidden service. I think that they only hold off on doing so in most cases because they like to build a false sense of security while holding that tool in reserve.

The ultimate solution to this is IMO to switch from a network architecture of "point-to-point" to a network architecture of "distributed data-store". Instead of having clients talk to a server somewhere (even behind 7 proxies), you should have the "server" upload their data to some "anonymous cloud", and then have clients download the data from that cloud, without ever needing to have any sort of connection to the server machine. This nicely addresses the most serious attacks against Tor: intersection & timing attacks against the server are much more difficult, since the server does not need to be online or sending data at the same time as the client, and DoS attacks are handled by the system itself.

Freenet and GNUnet are distributed data-store systems. Freenet even has a number of websites and social networks which function on the data-store model. It is possible to redo nearly every website under this model, though it is a major change.

But one major problem with Freenet and GNUnet is that their security (especially in Freenet's case) is ad hoc: they basically jam the system with a bunch of obfuscation and hope that it works. I have no confidence whatsoever in their security as a result. They're both probably especially vulnerable to sybil attacks when used in their opennet modes. They're also very slow, and they would probably fail to provide censorship-resistance if seriously challenged.



What I propose instead is something like the following system. (Note however that this is only a half-baked idea...)

Data stores

There are a handful of data-store servers, each internally centralized. The job of each data-store is to maintain a key-value store and make it available for download, either in full (via something like rsync) or via a private information retrieval (PIR) scheme. PIR allows clients to download the values for one or more keys from the server without giving the server any information about which keys were downloaded, providing the client with perfect anonymity even when the entire connection is observed by an attacker.
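To make the PIR idea concrete, here is a toy sketch of the classic two-server XOR-based scheme. Note that this is a swapped-in illustration, not what the post specifies: it assumes each data-store is mirrored on two replicas that do not collude, whereas a single-server design would need a computational PIR scheme instead. All function names here are made up for the example.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(db, query_bits):
    """Server side: XOR together every block whose query bit is set.

    The server only sees a random-looking bit vector, so on its own
    it learns nothing about which block the client wants.
    """
    acc = bytes(len(db[0]))
    for block, bit in zip(db, query_bits):
        if bit:
            acc = xor_bytes(acc, block)
    return acc


def make_queries(n, index):
    """Client side: two bit vectors that differ only at the wanted index."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def pir_fetch(db_a, db_b, index):
    """Fetch block `index` from two replicas (assumed non-colluding)."""
    q1, q2 = make_queries(len(db_a), index)
    # Every block selected by both queries cancels out; only block
    # `index`, selected by exactly one of them, survives the XOR.
    return xor_bytes(server_answer(db_a, q1), server_answer(db_b, q2))
```

The cost is that each server touches about half of its blocks per query, which is part of why a few fast data-stores fit this model better than many small nodes.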

Data store descriptors

Clients will download "data-store descriptors" describing a number of data-stores. Eg:
Code:
Data-store alpha
Public key: xxx
IPs: a.b.c.d, e.f.g.h
Download-Cost: 1 mSatoshi/B
Upload-Cost: 5 mSatoshi/B

Data-store beta
...

It is not important that clients have some particular combination of data-stores. They can download as many of these descriptors as they want, whenever they come across them. The core software for this system might come with some built-in, but more could be added by the user.
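As a rough sketch of how a client might read descriptors in the format shown above: the parser below keeps every "Name: value" field as a raw string, since the post does not define the field syntax any further. The `Descriptor` class and `parse_descriptors` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    name: str
    fields: dict = field(default_factory=dict)

    def ips(self):
        """Split the comma-separated IPs field into a list."""
        return [ip.strip() for ip in self.fields.get("IPs", "").split(",") if ip.strip()]


def parse_descriptors(text):
    """Parse blocks that start with a 'Data-store <name>' line."""
    stores = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Data-store "):
            current = Descriptor(name=line[len("Data-store "):])
            stores[current.name] = current
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current.fields[key.strip()] = value.strip()
    return stores


SAMPLE = """\
Data-store alpha
Public key: xxx
IPs: a.b.c.d, e.f.g.h
Download-Cost: 1 mSatoshi/B
Upload-Cost: 5 mSatoshi/B
"""

stores = parse_descriptors(SAMPLE)
```

Because descriptors are just merged into a dictionary by name, a client can keep adding new ones whenever it comes across them.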

Data-stores can charge for uploads and downloads. This can be done perfectly anonymously using blinded bearer certificates, or less-perfectly via eg. Bitcoin Lightning.

Uploading data

You want to upload song.mp3.

1. Encrypt it with a random key.
2. Break it into fixed-size chunks, say 16kB in size.
3. Choose at least 3, but maybe more, data-stores that you know about.
4. Download all or a large random selection of recently-uploaded data on each of the chosen data-stores.
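5. Randomly classify each of your chosen data-stores as either Original or Derived, but at least one must be Original. (Derived means you reuse a block already on that data-store, taken from what you downloaded in step 4; Original means you upload a new block there.)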
6. Assume that you're using exactly 3 data-stores. Let your data be D, and the data at each of the data-stores be X, Y, and Z. Between 0 and 2 of X, Y, and Z will already be known. Randomly select the not-yet-known values so that D = X+Y+Z. For example, if you chose Y as Derived and X&Z as Original, you would randomly choose X&Z such that X+Z = D-Y. Prepare to upload the new data block(s) to the data-store(s). (You can use any reversible operation to combine the data; maybe addition isn't ideal.)
7. Repeat steps 5-6 for each block of data.
8. Create and prepare to upload your metadata block, which will have a table like:
Code:
Block# Store1_Key Store2_Key Store3_Key
     1        xyz        abc        def
     2        123        456        789
...
If your table is more than the block size, you can put a pointer to a continuation block at the end of it. (Or structure it as a tree.)

Finally, you should upload all of the blocks that you have prepared to upload, but you should do it in a random order and spread out over time. The more time you put between each block, the more difficult it will be to connect the blocks together.
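Steps 5-6 can be sketched roughly as follows. This is only an illustration under some assumptions: it uses XOR as the reversible combining operation (the post leaves the choice open), and `split_block` and `combine` are made-up names. A `None` entry marks an Original data-store; a bytes entry is the existing block reused from a Derived data-store.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def split_block(data, known):
    """Return one component per data-store whose XOR equals `data`.

    `known` has one entry per chosen data-store: existing block bytes
    for a Derived store, or None for an Original store.
    """
    originals = [i for i, part in enumerate(known) if part is None]
    if not originals:
        raise ValueError("at least one data-store must be Original")
    # Start from the data and fold in every already-known block.
    remainder = data
    for part in known:
        if part is not None:
            remainder = xor_bytes(remainder, part)
    parts = list(known)
    # All but one Original component are purely random ...
    for i in originals[:-1]:
        parts[i] = secrets.token_bytes(len(data))
        remainder = xor_bytes(remainder, parts[i])
    # ... and the last one is whatever makes the XOR come out to `data`.
    parts[originals[-1]] = remainder
    return parts


def combine(parts):
    """Recombine components into the original block."""
    acc = bytes(len(parts[0]))
    for part in parts:
        acc = xor_bytes(acc, part)
    return acc
```

For example, `split_block(d, [None, y, None])` corresponds to choosing Y as Derived and X and Z as Original: it returns new X and Z blocks to upload and leaves Y untouched.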

Then you'll get a CHK URI that you can give to people which looks something like:
CHK@store1+store2+store3,key1,key2,key3,decryption_key
eg. CHK@alpha+beta+gamma,SVbD9~HM,nzf3AX45,yFCBc-A4,bA7qLNJR7IXRKn6uS5PAySjIM6azPFvK~18kSi6bbNQ
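A minimal sketch of splitting such a URI into its parts, assuming exactly the layout shown above (store names joined by "+", then one key per store, then the decryption key, all comma-separated). `ChkUri` and `parse_chk` are hypothetical names.

```python
from typing import List, NamedTuple


class ChkUri(NamedTuple):
    stores: List[str]
    keys: List[str]
    decryption_key: str


def parse_chk(uri):
    """Split 'CHK@s1+s2+s3,k1,k2,k3,decryption_key' into its fields."""
    prefix = "CHK@"
    if not uri.startswith(prefix):
        raise ValueError("not a CHK URI")
    parts = uri[len(prefix):].split(",")
    stores = parts[0].split("+")
    keys = parts[1:-1]
    # There must be exactly one key per listed data-store.
    if len(parts) < 3 or len(keys) != len(stores):
        raise ValueError("need one key per data-store plus a decryption key")
    return ChkUri(stores, keys, parts[-1])


EXAMPLE = (
    "CHK@alpha+beta+gamma,SVbD9~HM,nzf3AX45,yFCBc-A4,"
    "bA7qLNJR7IXRKn6uS5PAySjIM6azPFvK~18kSi6bbNQ"
)
uri = parse_chk(EXAMPLE)
```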

PIR schemes don't give anonymous uploading natively, so there will need to be some onion routing thing between you and the server. But higher latency is OK here, and there are alternatives to Tor's naïve onion routing such as Riffle, so I think that this can be made very anonymous.

Downloading data

You were given a URI like the one above which leads to song.mp3.

1. You need to have previously downloaded descriptors for all of the data-stores in the URI.
2. From each data-store, download the listed keys using the anonymous PIR scheme, and add the data together. This will get you the first block, which lists all of the others.
3. Download all of the other blocks in the same way.
4. Once you have all of the blocks, concatenate them together and decrypt them with the decryption key in the URI.
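The download steps might look roughly like this. It is a sketch with several assumptions: data-stores are plain in-memory dictionaries instead of PIR servers, XOR stands in for "adding the data together", the metadata block is a NUL-padded ASCII table with one whitespace-separated row of keys per data block (a made-up encoding), and the final decryption step is left out. `demo_put` exists only to populate the toy stores.

```python
import os


def xor_all(parts):
    """XOR a list of equal-length byte strings together."""
    acc = bytes(len(parts[0]))
    for part in parts:
        acc = bytes(a ^ b for a, b in zip(acc, part))
    return acc


def fetch_block(stores, names, keys):
    """Step 2: fetch one component per data-store and combine them."""
    return xor_all([stores[name][key] for name, key in zip(names, keys)])


def parse_table(meta):
    """Decode the (hypothetical) metadata encoding into rows of keys."""
    text = meta.rstrip(b"\0").decode("ascii")
    return [line.split() for line in text.splitlines() if line.strip()]


def download(stores, names, root_keys):
    """Steps 2-4 without decryption: returns the concatenated ciphertext."""
    table = parse_table(fetch_block(stores, names, root_keys))
    return b"".join(fetch_block(stores, names, row) for row in table)


def demo_put(stores, names, key_row, data):
    """Test helper: store `data` as random components that XOR to it."""
    parts = [os.urandom(len(data)) for _ in names[:-1]]
    parts.append(xor_all(parts + [data]))
    for name, key, part in zip(names, key_row, parts):
        stores[name][key] = part
```

In a real client each `stores[name][key]` lookup would be a PIR query, and the result of `download` would still need to be decrypted with the key from the URI.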

Plausible deniability and censorship-resistance for data-stores

The key advantage of this scheme compared to things like Freenet is the plausible deniability and censorship-resistance for the ones storing the data. On Freenet, if you're running a node and someone gives you a CHK that they say is a copyright violation or whatever, it is technically possible for you to expunge that CHK from your node, and so maybe you could be forced to do so. Same for Tor hidden-service DHT participants.

But for a data-store in this scheme, if someone gives you a CHK that they demand be removed, they can say that some data in your data-store is being used by that CHK, but they can't say whether that data belongs to that CHK. The data may have been uploaded by someone else entirely, and if you delete it, you may break the original CHK which is totally legitimate, as well as any others which subsequently used that data. It's like creating new content by pasting together words cut out from a newspaper. I suspect that this aspect will make the system totally immune to DMCA takedowns and similar.

Because of this plausible deniability and censorship-resistance, an increased level of centralization can be accepted. You can more reasonably have a few dozen extremely fast, powerful data-stores rather than thousands of nodes on home Internet connections. This eliminates sybil attacks (on nodes) and improves the speed of the system. And while there are few data-stores, they are not an integral part of the system as a whole (ie. they don't "vote" or anything), and they can be fairly easily replaced if necessary.

Extra thoughts

It's maybe not necessary for each block's components to be stored on separate data-stores.

A CHK will stop working if any of its data-stores goes down. I wonder if, instead of addition, you could use an error-recovery scheme such that you only need 3 of 4 components of each block, or something like that.

Data-store blocks might have an expiration, but it should be either uniform across the data-store or very coarse-grained.

Data-store keys should be short, do not need to be unpredictable, and do not need to be user-definable. Data-stores might assign sequential keys starting at 0, and fill in gaps as blocks expire.
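One way a data-store could hand out sequential keys and refill gaps is sketched below, assuming keys are plain integers and the lowest free key is always reused first. `KeyAllocator` is a hypothetical name, not part of any existing software.

```python
import heapq


class KeyAllocator:
    """Hands out the smallest unused integer key, reusing expired ones."""

    def __init__(self):
        self._next = 0   # next never-used key
        self._free = []  # min-heap of keys released by expired blocks

    def allocate(self):
        # Prefer filling a gap left by an expired block.
        if self._free:
            return heapq.heappop(self._free)
        key = self._next
        self._next += 1
        return key

    def release(self, key):
        """Call when the block stored under `key` expires."""
        heapq.heappush(self._free, key)
```

Filling gaps keeps keys short, since the key space never grows beyond the peak number of live blocks.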