
Topic: Tahoe-lafs and Bitcoin Integration Bounty (210 BTC pledged) (Read 13059 times)

hero member
Activity: 714
Merit: 500
Another interesting project along these lines is DataHaven.NET. The developers recently created a currency-based marketplace for P2P encrypted storage, and have stated they're interested in integrating Tahoe-LAFS. However, it's not clear how they'll accomplish this.

Interesting. Though it does look a little like the project is dying. Also where is the github/source link?

One problem with their market approach is that they are effectively price fixing. It would be better to have the market decide the price of storage based on the provider's reputation/capabilities/performance etc.

It would be pretty sweet to have semi-autonomous applications that are able to negotiate their own data storage contracts via a nice API.
newbie
Activity: 19
Merit: 0
Another interesting project along these lines is DataHaven.NET. The developers recently created a currency-based marketplace for P2P encrypted storage, and have stated they're interested in integrating Tahoe-LAFS. However, it's not clear how they'll accomplish this.

I've been experimenting with integrating Tahoe with Electrum for syncing plugin data. It's not intended to become a general data storage solution, but it would greatly benefit from there being an accounting or quota system built into Tahoe, as suggested ITT.

I figure the wallet storage grid could be volunteer driven in the short-term until something is worked out. Still, my concern is with backing up Tahoe directory writecaps, of the form:

Code:
URI:DIR2:65sm34olspfc5msodmptuckpne:yrpwyrha3sqlx5qst3w7l724ebbqyotobsjaoyi7pamt52xrfo5q

The above URI is the key to your files stored in Tahoe, much as a Bitcoin private key is the key to your stored Bitcoin. You can't afford to lose it.

The following bash function hex encodes the Tahoe writecap then passes it through Electrum's mnemonic system. The number of words generated is large but manageable. In cases where only temporary storage space on a grid is needed, the number of words may be irrelevant.

Code:
#!/bin/bash

# -----------------------------------------------------------------------------
# LAFSify: derive an Electrum wallet from any given Tahoe-LAFS writecap
#
# requires expect, hURL, and jq
# https://github.com/atweiden/dotfiles/blob/master/_functions.d/functions/lafsify.sh
# -----------------------------------------------------------------------------

function lafsify() {
echo -n 'Enter your Tahoe-LAFS writecap: '; read WRITECAP
SEED=$(hURL -s --no-color --HEX $WRITECAP)
expect <<EOF
  spawn electrum -w electrum-lafs.dat restore -o -C
  expect "Password*" {
    send "\r"
  }
  expect "fee*" {
    send "\r"
  }
  expect "gap*" {
    send "\r"
  }
  expect "seed*" {
    send "$SEED\r"
  }
  expect eof
EOF
MNEMONIC=$(electrum -w electrum-lafs.dat getseed -o | jq -M '.mnemonic')
echo "Your Tahoe-LAFS writecap: $WRITECAP"
echo "Your resulting Electrum wallet seed: $SEED"
echo "Your resulting Electrum wallet mnemonic: $MNEMONIC"
}

function unlafsify() {
echo -n 'Enter your Electrum-LAFS mnemonic: '; read MNEMONIC
expect <<EOF
  spawn electrum -w electrum-lafs.dat restore -o -C
  expect "Password*" {
    send "\r"
  }
  expect "fee*" {
    send "\r"
  }
  expect "gap*" {
    send "\r"
  }
  expect "seed*" {
    send "$MNEMONIC\r"
  }
  expect eof
EOF
SEED=$(electrum -w electrum-lafs.dat getseed -o | jq -M '.seed')
WRITECAP=$(hURL -s --no-color --hex $SEED)
echo "Your starting Electrum wallet mnemonic: $MNEMONIC"
echo "Your starting Electrum wallet seed: $SEED"
echo "Your resulting Tahoe-LAFS writecap: $WRITECAP"
}

Using
Code:
URI:DIR2:65sm34olspfc5msodmptuckpne:yrpwyrha3sqlx5qst3w7l724ebbqyotobsjaoyi7pamt52xrfo5q
...

we get the 57-word mnemonic:

"knee drink survive keep glove story existence brown sunlight jump suddenly slide hate gun group thorn curve family been energy squeeze brain visit strain passion spill driver take pierce end sway unable wrinkle youth grey forgive dirty tough impossible ship throne born need suddenly march reason caress moan carve drive soar shirt team weary began movie string"

Running it backwards works as well.
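
For readers without hURL, the hex-encoding step on its own can be sketched in Python. This is only an illustration: the function names are made up for this example, it assumes hURL's output is plain hex of the writecap's ASCII bytes, and the Electrum mnemonic step is not reproduced here.

```python
import binascii

# Writecap quoted earlier in this post.
WRITECAP = "URI:DIR2:65sm34olspfc5msodmptuckpne:yrpwyrha3sqlx5qst3w7l724ebbqyotobsjaoyi7pamt52xrfo5q"


def writecap_to_hex(writecap):
    """Hex-encode the ASCII bytes of a writecap (hypothetical helper)."""
    return binascii.hexlify(writecap.encode("ascii")).decode("ascii")


def hex_to_writecap(seed_hex):
    """Reverse of writecap_to_hex: decode hex back into the writecap string."""
    return binascii.unhexlify(seed_hex).decode("ascii")


if __name__ == "__main__":
    seed = writecap_to_hex(WRITECAP)
    print(seed)
    # The round trip should give back exactly the original writecap.
    print(hex_to_writecap(seed) == WRITECAP)
```

Since every character becomes two hex digits, the seed is twice as long as the writecap, which is why the resulting mnemonic is so long.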

The more interesting aspect of this is it gives each Tahoe writecap its own deterministic Bitcoin wallet. It's also possible to make a simple brainwallet with it, but with Electrum you get a fully deterministic wallet with more features.

Assuming the tahoe-client is running on a trusted local device, it may be possible for the writecap to autonomously negotiate for space in the storage grid to which it belongs. In particular, because the writecap is self-aware of its own Electrum seed, it could in theory create and authorize a micropayment channel with the tahoe grid. There are many workable variations to this approach.

Only the user, and the machine controlled by the user, would know the Electrum wallet seed, because the seed would be the writecap. The writecap would need to stay a secret, but if so, the user could pay for a storage subscription by transferring coins into the writecap wallet.

Payment is an entirely separate issue. Fundamentally, giving a person 57 words to write down or memorize, then assuring him that his writecap will always be around with his files, regardless of what needs to happen on the server side, could be an issue for those seeking reliable long-term storage. I'm not clear on what happens if the grid introducer must be moved or replaced, or if this too could happen autonomously. Maybe it's possible to run an "ongoing" grid of some kind. Maybe the grid is created with X introducer.furl but later changes to Y introducer.furl, and all previously participating storage-node operators would update their node to point to Y introducer.furl. I don't know enough about Tahoe to say whether this negates the location of previously-created writecaps or if this is even a concern.
legendary
Activity: 2618
Merit: 1007
The pledgers seem to be rather quiet ;)
The project itself sits on donations of a few hundred BTC, check their page on archive.org for their old donation address that was switched without any comment recently. The funds are still there.
full member
Activity: 235
Merit: 100
My interest is not in getting the pledged funds, but in having a distributed storage service w/o central control :)
hero member
Activity: 714
Merit: 500
The pledgers seem to be rather quiet ;)
full member
Activity: 235
Merit: 100
Watching this with a plan to allocate some development power to the project once the team frees up.
hero member
Activity: 714
Merit: 500
We would also require some price discovery mechanism to establish a marketplace for storage/bandwidth.
hero member
Activity: 714
Merit: 500
Just thought I would say that I am more than happy to work on this for 210 BTC :D

Sorry for the necro, but this thread reminds me that this kind of idea is really really important.

Can anyone point me to any ongoing and active effort to build an incentivized file-storage P2P network?




legendary
Activity: 2618
Merit: 1007
As you can see it took only a second to download and emitted only one byte to stdout, but it downloaded and verified the integrity of the 128 KiB segment that contained that byte.

It takes 1 second for YOU to download 128 KiB (and for your counterpart(s) to upload 128 KiB). Also this generates heavy traffic for nodes - I would test them once per hour or even more often. This means there would be significant outgoing traffic if I host a few TB only for testing that I still have the files.

CPU time is cheaper than bandwidth. It would make more sense to precompute CRC32 or Adler32 hashes of random ranges in each file, distribute these and request these hashes. Once you run out or run low, you can do a full verify: download the file and generate new hashes. This could be tuned to happen once a month or even less frequently, depending on your personal rating of that node's reliability.
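
A rough Python sketch of that idea, to make it concrete. Nothing here is Tahoe-LAFS code: the function names, the range-length limit and the in-memory "storage node" are all assumptions made up for illustration, and CRC32 is used only because it is cheap, not because it is cryptographically strong.

```python
import random
import zlib


def precompute_challenges(data, count, max_len=4096, seed=None):
    """Owner side: pick random ranges and record their CRC32 before uploading."""
    rng = random.Random(seed)
    challenges = []
    for _ in range(count):
        offset = rng.randrange(len(data))
        length = rng.randint(1, min(max_len, len(data) - offset))
        crc = zlib.crc32(data[offset:offset + length]) & 0xFFFFFFFF
        challenges.append((offset, length, crc))
    return challenges


def answer_challenge(stored, offset, length):
    """Storage node side: checksum the requested range of its local copy."""
    return zlib.crc32(stored[offset:offset + length]) & 0xFFFFFFFF


def spot_check(challenge, stored):
    """Owner side: compare the node's answer with the precomputed checksum."""
    offset, length, expected = challenge
    return answer_challenge(stored, offset, length) == expected


if __name__ == "__main__":
    original = bytes(range(256)) * 64
    challenges = precompute_challenges(original, count=5, seed=1)
    print(all(spot_check(c, original) for c in challenges))
```

Each precomputed range should only be asked once, since a node that has seen a challenge could keep just the answer. That is why the owner eventually runs out and has to do a full download to generate a fresh batch.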
sr. member
Activity: 312
Merit: 265
You can create a Bounty jar on a site I launched today: https://booster.io
newbie
Activity: 20
Merit: 0

Correct, to verify every bit of every share you need to download every bit of every share. It's expensive - hopefully future versions of Tahoe-LAFS will implement a probabilistic "proof of retrievability" protocol like the one you suggest.

Downloading only a subset of a file is already implemented, but there isn't a command implemented that says "pick a random segment of this file, download it from that server, and let me know if it passed integrity checks". You can approximate it with the current Tahoe-LAFS client, like this (the following lines that begin with "$" are me typing in stuff as though I were using a bash prompt):

1. Pick a random spot in the file. Let's say the file size is 29153345 bytes (about 29 MB):

$ FILESIZE=29153345
$ python -c "import random; print(random.randrange(0, $FILESIZE))"
2451799

2. Fetch the segment that contains that point. Segments are (unless you've tweaked the configuration in a way that nobody does) 128 KiB in size, so this will download the 128 KiB of the file that contains byte number 2451799 and check the integrity of all 128 KiB:

$ curl --range 2451799-2451799 http://localhost/uri/URI:CHK:jwq3f6lkcioyxeuxlt3exlulqe:sccvpp27agfz32lqjghq2djaxetcuo7luko5dhrpdgs7bfidbasa:1:1:29153345 | hexdump -C
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     1    0     1    0     0      0      0 --:--:--  0:00:01 --:--:--     0
00000000  a9                                                |.|
00000001

As you can see it took only a second to download and emitted only one byte to stdout, but it downloaded and verified the integrity of the 128 KiB segment that contained that byte.
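
To see which part of the file that request actually covers, here is a small Python sketch of the segment arithmetic. It assumes the default 128 KiB segment size mentioned above and that segments are simply numbered from the start of the file; the function name is made up for this example.

```python
SEGMENT_SIZE = 128 * 1024  # default segment size mentioned above (assumed unchanged)


def segment_for_offset(offset, filesize, segment_size=SEGMENT_SIZE):
    """Return (segment index, first byte, end byte exclusive) for a byte offset."""
    if not 0 <= offset < filesize:
        raise ValueError("offset is outside the file")
    index = offset // segment_size
    start = index * segment_size
    # The last segment of a file may be shorter than segment_size.
    end = min(start + segment_size, filesize)
    return index, start, end


if __name__ == "__main__":
    # Byte 2451799 of the 29153345-byte file from the example above.
    print(segment_for_offset(2451799, 29153345))  # (18, 2359296, 2490368)
```

So the one-byte request above causes bytes 2359296 through 2490367 to be fetched and checked.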

If you are using multiple servers, using Tahoe-LAFS's awesome erasure coding feature to spread out the data among multiple servers, then this will download the data from the 3 fastest servers (unless you've changed the default setting from "3" to some other number). There is no good way to force it to download the data from specific servers in order to test them -- it always picks the fastest servers. You can see which server(s) it used by looking at the "Recent Uploads and Downloads" page on the web user interface, which will also tell you a bunch of performance statistics about this download.

In short, this feature is *almost* there. We just need someone to write some code to do this automatically in the client (which is written in Python) instead of as a series of bash commands. Also this code should download one (randomly chosen) block from every server it can find instead of from just the three fastest servers, and it should print out a useful summary of what it tried and which servers had good shares.
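
As a starting point only, the two bash steps could be combined in Python roughly like this. It goes through the web gateway just as the curl command does, so it still cannot choose which servers are tested; that part would need changes inside the client. The function names are invented for this sketch, and the gateway address is whatever your node listens on ("http://localhost" is taken from the curl example).

```python
import random
import urllib.request


def build_spot_check_request(gateway, cap, filesize, rng=random):
    """Pick a random byte and build a one-byte HTTP Range request for it."""
    offset = rng.randrange(filesize)
    url = "%s/uri/%s" % (gateway.rstrip("/"), cap)
    headers = {"Range": "bytes=%d-%d" % (offset, offset)}
    return offset, urllib.request.Request(url, headers=headers)


def fetch_byte(request, timeout=60):
    """Send the request. Per the description above, the gateway downloads and
    verifies the whole segment containing the byte before answering."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    cap = ("URI:CHK:jwq3f6lkcioyxeuxlt3exlulqe:"
           "sccvpp27agfz32lqjghq2djaxetcuo7luko5dhrpdgs7bfidbasa:1:1:29153345")
    offset, request = build_spot_check_request("http://localhost", cap, 29153345)
    print(offset, request.full_url, request.get_header("Range"))
    # fetch_byte(request) would perform the download against a running node.
```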

Oh, there is a different function which does print out a useful summary of results -- the "verify" feature. But, that downloads and tests every block instead of just one randomly chosen block. Another way to implement this is to add an option to that to indicate how many blocks it should try:

$ time tahoe check --verify URI:CHK:jwq3f6lkcioyxeuxlt3exlulqe:sccvpp27agfz32lqjghq2djaxetcuo7luko5dhrpdgs7bfidbasa:1:1:29153345
Summary: Healthy
 storage index: 7qhuoagk4z4ugsjkjgjcre6sx4
 good-shares: 1 (encoding is 1-of-1)
 wrong-shares: 0
real    1m2.705s
user    0m0.570s
sys     0m0.060s

donator
Activity: 2058
Merit: 1054
Hm. Money may not have been considered in the original design, but motivated attackers were. A volunteer platform is different to a monetized platform, but I don't think introducing money changes the threat model for existing components, or at least not in the way you seem to think it would.
The question is, motivated to do which attack? I can understand protection of the secrecy of the data. But without consideration of money, I don't see how protection against faking work would be up to snuff.

But as I said before, I know very little about Tahoe-LAFS. I described the kind of issues it will have to deal with, if it's up to the task, more power to it.
newbie
Activity: 5
Merit: 0
Let's say I'm the original uploader. I'm keeping the verify-cap (and may or may not keep the file), and at some later time I want to verify that node A is storing the file as he's supposed to. Do I need A to send the entire file to me? Bandwidth is expensive, and this means each verification is expensive. So I can't do it very often, let's be generous and say once a month. So I upload the file to A and pay him. He chooses not to store it, and one month later he fails the verification. What penalty does he have? What recourse do I have? If I pay after the fact, what makes sure I pay? Since it's to a large extent about redundancy, I can choose to pay only in the contingency that I require the download. [...] If it's expensive, it also means I pay for the verification more than for the storage.

For the system to make any sense at all verification needs to be cheap. I was thinking some sort of probabilistic test where I quiz on X random bits, if I have a copy that's cheap, and the probability to pass the test without the file (or at least so large a portion of it that he may as well keep the whole thing) is slim. Then I can do verifications more rapidly which limit the room for manipulations. But unless there's some fancy cryptographic way to conduct the test without the entire copy, it still means I need to rely on other nodes to verify each other, and they also need to be incentivized to do that. And there needs to be some sort of conflict resolution (based on a probabilistic model) for times that I'm not around (even if I normally operate a trusted node, it could be down for maintenance or something).

Correct, to verify every bit of every share you need to download every bit of every share. It's expensive - hopefully future versions of Tahoe-LAFS will implement a probabilistic "proof of retrievability" protocol like the one you suggest.

I suspect penalties for misbehaviour will be part of a future accounting system (https://tahoe-lafs.org/trac/tahoe-lafs/wiki/NewAccountingDesign).

I'll say again - a volunteer platform is entirely different than a monetized platform. When money is involved people will do everything they can to manipulate the system, steal and obtain pay for no work done. There needs to be a system resistant to manipulations, and I'm still not at all convinced Tahoe-LAFS is at that level.

Hm. Money may not have been considered in the original design, but motivated attackers were. A volunteer platform is different to a monetized platform, but I don't think introducing money changes the threat model for existing components, or at least not in the way you seem to think it would.
donator
Activity: 4760
Merit: 4323
Interesting project.

+1

I don't know anything about it other than what I've read in this thread, but if there are people who will pay BTC for storage space, I'm in for 1 TB.
hero member
Activity: 714
Merit: 500
Interesting project.
donator
Activity: 2058
Merit: 1054
This isn't clear at all. Is the ciphertext known only to the file owner? If so, does this mean that only the owner can verify? And even if everyone can verify, again - what incentive do they have to do this? I don't want a system where the owner needs to continuously operate a node to keep everyone honest. He should be able to upload, pay for storage for X period with Y redundancy, forget about it, let the system keep itself in check, and connect at a later time to download the file as needed. Is that really not a challenge?

Sorry if I wasn't clear. Only a verify-cap can verify - neither being the file's "owner" nor possessing the ciphertext is sufficient. The original uploader (or, "owner") has a verify-cap if she decides to keep it, and anyone she shares it with also has it. But you're right that the only users with incentive to verify a file are those that care about its integrity; Tahoe-LAFS isn't designed to "keep itself in check". If it was, you'd have to rely on some subset of the storage servers for integrity.
If I understand this correctly, it's worse than I thought.

Let's say I'm the original uploader. I'm keeping the verify-cap (and may or may not keep the file), and at some later time I want to verify that node A is storing the file as he's supposed to. Do I need A to send the entire file to me? Bandwidth is expensive, and this means each verification is expensive. So I can't do it very often, let's be generous and say once a month. So I upload the file to A and pay him. He chooses not to store it, and one month later he fails the verification. What penalty does he have? What recourse do I have? If I pay after the fact, what makes sure I pay? Since it's to a large extent about redundancy, I can choose to pay only in the contingency that I require the download. Also, what stops A from not storing the file, but rather redownload it from another node on each verification? If downloading each time is cheaper than storing it, A can cut costs and I lose the redundancy I paid for. If it's expensive, it also means I pay for the verification more than for the storage.

For the system to make any sense at all verification needs to be cheap. I was thinking some sort of probabilistic test where I quiz on X random bits, if I have a copy that's cheap, and the probability to pass the test without the file (or at least so large a portion of it that he may as well keep the whole thing) is slim. Then I can do verifications more rapidly which limit the room for manipulations. But unless there's some fancy cryptographic way to conduct the test without the entire copy, it still means I need to rely on other nodes to verify each other, and they also need to be incentivized to do that. And there needs to be some sort of conflict resolution (based on a probabilistic model) for times that I'm not around (even if I normally operate a trusted node, it could be down for maintenance or something).
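
To put rough numbers on "slim", here is a back-of-the-envelope sketch in Python. It assumes a simplified model that is not taken from any existing protocol: the node keeps a fraction of the file's blocks, each quiz asks for one block chosen uniformly at random and independently of the others, and the node fails as soon as it is asked for a block it dropped.

```python
import math


def pass_probability(fraction_held, challenges):
    """Chance that a node holding only fraction_held of the blocks
    answers every one of the independent random challenges."""
    return fraction_held ** challenges


def challenges_needed(fraction_held, max_pass_probability):
    """Smallest number of challenges that pushes the cheater's chance of
    passing down to max_pass_probability or below."""
    if not 0 < fraction_held < 1:
        raise ValueError("fraction_held must be strictly between 0 and 1")
    if not 0 < max_pass_probability < 1:
        raise ValueError("max_pass_probability must be strictly between 0 and 1")
    return math.ceil(math.log(max_pass_probability) / math.log(fraction_held))


if __name__ == "__main__":
    # A node that silently dropped half the file, quizzed on 10 blocks.
    print(pass_probability(0.5, 10))
    # Quizzes needed so that a node keeping 90% passes at most 1% of the time.
    print(challenges_needed(0.9, 0.01))
```

Under this model a node has to keep nearly everything to have a realistic chance of passing repeated quizzes, which is the "may as well keep the whole thing" effect.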

I'll say again - a volunteer platform is entirely different than a monetized platform. When money is involved people will do everything they can to manipulate the system, steal and obtain pay for no work done. There needs to be a system resistant to manipulations, and I'm still not at all convinced Tahoe-LAFS is at that level.
newbie
Activity: 5
Merit: 0
This isn't clear at all. Is the ciphertext known only to the file owner? If so, does this mean that only the owner can verify? And even if everyone can verify, again - what incentive do they have to do this? I don't want a system where the owner needs to continuously operate a node to keep everyone honest. He should be able to upload, pay for storage for X period with Y redundancy, forget about it, let the system keep itself in check, and connect at a later time to download the file as needed. Is that really not a challenge?

Sorry if I wasn't clear. Only a verify-cap can verify - neither being the file's "owner" nor possessing the ciphertext is sufficient. The original uploader (or, "owner") has a verify-cap if she decides to keep it, and anyone she shares it with also has it. But you're right that the only users with incentive to verify a file are those that care about its integrity; Tahoe-LAFS isn't designed to "keep itself in check". If it was, you'd have to rely on some subset of the storage servers for integrity.
donator
Activity: 2058
Merit: 1054
I suppose the verifying node will itself need a copy of the file (or the part of the file it's verifying)? Then either the owner of the file will need his own node to store the files and verify (which somewhat defeats the purpose), or he will need to rely on other nodes to quiz each other. What incentive do they have for this, and in case of a conflict, who should be believed? What are the penalties for failure to verify?
The verifying node only needs a secure hash of the original (shares+ciphertext) for a file - a verify-cap is exactly this. To actually verify that whatever the other nodes are storing matches the original, all shares must be downloaded, hashed and the result compared with the verify-cap. There's no need for the sort of majority consensus protocol that I think you're suggesting.
This isn't clear at all. Is the ciphertext known only to the file owner? If so, does this mean that only the owner can verify? And even if everyone can verify, again - what incentive do they have to do this? I don't want a system where the owner needs to continuously operate a node to keep everyone honest. He should be able to upload, pay for storage for X period with Y redundancy, forget about it, let the system keep itself in check, and connect at a later time to download the file as needed. Is that really not a challenge?
newbie
Activity: 5
Merit: 0
I suppose the verifying node will itself need a copy of the file (or the part of the file it's verifying)? Then either the owner of the file will need his own node to store the files and verify (which somewhat defeats the purpose), or he will need to rely on other nodes to quiz each other. What incentive do they have for this, and in case of a conflict, who should be believed? What are the penalties for failure to verify?

The verifying node only needs a secure hash of the original (shares+ciphertext) for a file - a verify-cap is exactly this. To actually verify that whatever the other nodes are storing matches the original, all shares must be downloaded, hashed and the result compared with the verify-cap. There's no need for the sort of majority consensus protocol that I think you're suggesting.
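
A deliberately simplified Python illustration of that download, hash and compare idea. This is not Tahoe-LAFS's actual format: a single SHA-256 over the shares stands in for the real verify-cap, and the function names are invented for the example.

```python
import hashlib


def make_fingerprint(shares):
    """Stand-in for a verify-cap: one SHA-256 digest over all shares, in order."""
    digest = hashlib.sha256()
    for share in shares:
        # Length-prefix each share so that share boundaries affect the hash.
        digest.update(len(share).to_bytes(8, "big"))
        digest.update(share)
    return digest.hexdigest()


def verify(downloaded_shares, fingerprint):
    """Re-hash whatever the servers returned and compare with the kept fingerprint."""
    return make_fingerprint(downloaded_shares) == fingerprint


if __name__ == "__main__":
    shares = [b"share-0 bytes", b"share-1 bytes", b"share-2 bytes"]
    kept = make_fingerprint(shares)  # the only thing the verifier has to keep
    print(verify(shares, kept))
    print(verify([b"share-0 bytes", b"tampered", b"share-2 bytes"], kept))
```

The point is only that the verifier keeps a short hash rather than the file, and that no voting between nodes is involved.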

Maybe there are already solutions for all of this, and if so it's great. But again, a fully monetized platform is a league of its own in the requirements for a comprehensive verification protocol. As I understand Tahoe-LAFS is not currently a fully monetized platform, so I have my doubts that such a verification protocol has been developed.

I can't think of any new requirements that monetization would impose upon the file verification protocol.
donator
Activity: 2058
Merit: 1054
I don't know much about Tahoe-LAFS, but if there's no built-in incentive to provide storage, I'm guessing there's also no built in verification that you're actually storing what you say you are.
In Tahoe-LAFS there's a verify-capability associated with each file that lets you do an integrity check over the ciphertext and erasure coded shares for that file. A server can only lie about storing a file until another node attempts to verify it, and this can easily be scheduled to happen periodically.
I suppose the verifying node will itself need a copy of the file (or the part of the file it's verifying)? Then either the owner of the file will need his own node to store the files and verify (which somewhat defeats the purpose), or he will need to rely on other nodes to quiz each other. What incentive do they have for this, and in case of a conflict, who should be believed? What are the penalties for failure to verify?

Maybe there are already solutions for all of this, and if so it's great. But again, a fully monetized platform is a league of its own in the requirements for a comprehensive verification protocol. As I understand Tahoe-LAFS is not currently a fully monetized platform, so I have my doubts that such a verification protocol has been developed.