Sia - Siafund Redemption Deadline: June 1st, 2015 - page 10.

msin

legendary

Activity: 1470

Merit: 1004

Disappointing to hear, hope you can find a solution.

Taek

hero member

Activity: 543

Merit: 501

Treechains are not particularly similar to the quorums that Sia builds, Sia's quorum's [were] built out of a paxos-like message passing system. Treechains are a bunch of merge-mined POW blockchains that follow some special rules to build a tree.

Unforturnately everything that I thought was working for quorums is still broken. The problem is that when you have hundreds or millions of quorums, an attacker with 50% strength will be able to compromise at least a handful of quorums in the 70-80% range, and the attacker can pull successful attacks on those. It'd be easy to solve that problem if you could provide a proof-of-public-data, but the only way to prove that is to have the data yourself. Otherwise you're trusting whoever says the data is public or not public, and you'd have to go with the majority, which sometimes is an attacker.

We could weaken our assumptions to 33% global corruption of storage, and then increase the quorum size, and then it's much less likely for an attacker to break 51% strength in any quorum. But then you have failure modes that might destroy everything. IE if an attacker does manage to break 51% control of a particular quorum, he'll be able to continually inflate (by lying) the amount of storage he has, and eventually take 51% control of every quorum.

The source of the problem comes from the fact that not every node knows about every transaction. With a file storage system, you have to be able to handle many, many, many transactions, or have some way to allow the system to change state without using a transaction. But if you're doing proof-of-storage, and someone edits a file, the rest of the network has to know that the file has been edited so that they can change how they read a storage proof. Therefore, I don't think it's possible to edit a file without having a transaction. If most of the files are computer backups or large static files (videos, music, etc.), it might be okay but the real goal is to support files like web pages. Ideally you'd never need an HDD again (an SDD + large amount of RAM would be sufficient).

So I guess we have two options then:

1. Everyone sees every transaction
2. Not everyone sees every transaction

In the case of 1, the amount of transactions would need to be limited, and you'd have fees and they'd get pretty large. Something like this could potentially be done right on top of Bitcoin. This would probably only be useful for large _and_ important files. I don't really see Netflix using something like this, but it could be a worthwhile alternative to bittorrent (which suffers from the same problem of having files that are very hard to edit).

In the case of 2, you have to have some way of knowing that everything is okay, even though you can't see all of the transactions. The general strategy is to have a large but finite number of people verifying each piece of information, and you randomly assign who gets to verify what. You have some metric for protecting against sybil attacks during your assignment. (storage in our case). The problem is that if you're relying on something like a blockchain, you need to assume that each subset of verifiers is able to stay honest.

Consider the situation where 51% are dishonest. They are the majority, and so they can tell the network that things look like X. The network doesn't know, so it just has to accept X as true. The 49% honest people complain, saying that things actually look like Y. The dishonest group can be completely dishonest (change balances and such), because they don't ever tell anyone what Y is (Y is just a hash to keep things scalable). They just keep it to themselves and successfully corrupt the network.

So say you give the honest hosts some tool to claim that Y doesn't exist and is a made up hash. Then, in honest quorums you can have a dishonest set of hosts claim that the honest hosts are releasing data which doesn't exist. The network now needs to verify somehow that the honest hosts really have made the data public and aren't actually dishonest hosts in disguise. The only way to do this is for the network to download all of the data and verify for itself that the data exists. But now dishonest hosts will always claim that they can't see the data, and the rest of the network is always forced to download the data to verify that it's publicly available. So you end up with a situation that's not any different from a single blockchain.

Moon math aside, there doesn't seem to be any way around this. That's where Sia is stuck right now. The quorum's in the currently available whitepaper suffer from a related problem. Someone who manages to take control of >50% of a large number of quorums can start kicking out honest hosts, which will result in fines for the honest hosts and also result in a higher percentage of the hosts across the network as a whole being dishonest. Sia becomes pretty vulnerable once someone gets around 33% of the network.

=======

So let's say that we accept our limitations of 33%. What happens when someone breaks 33%? Unfortunately, they get a lot of power. In Bitcoin, when someone breaks 50% they control the blockchain for a while, they can attempt double spends and they can block transactions, but they can't spend other people's money. But in a network where the majority of hosts don't see all of the data, someone who manages to break a portion of the data CAN spend money from other wallets. They just lie about the state of the network, because they don't have to provide signatures of accuracy.

Even if we were able to prove that the dishonest hosts hadn't dishonestly spent a wallet's money (eg without a signature) (this proof could be done using snarks/scip), the dishonest hosts could still refuse to reveal what data had made it into the blockchain. The biggest issue here is that honest transactions could make it into the blockchain, and while the snarks would verify that everything was legal, honest hosts might not be able to know if their transactions had gone through. A dishonest party could prove that they had the network in a legal state without ever revealing what that state is. So the rest of the world just sees this giant black void that they can't reason about or make transactions through. The wallets within it might as well not have any coins.

Undecided

=======

So, what do we do? I'm going to keep looking at this but it doesn't seem like highly scalable storage is going to work in a decentralized way on today's technologies. I don't think it's reasonable to assume that nobody will ever get to 33% control (even without pools) because Bitcoin has a single factory that's got something like 25% of the mining power. Sia might fall to the same thing. It would be highly desirable to have some sort of response to the failure mode, such that if the honest set of people once again got above 67%, the network could quickly restore to a functioning, fully informed state.

Another big issue with proof-of-storage-for-consensus is that it's not progress free. You have to announce that you have the storage, you have to download files (and people have to be willing to upload the files to you - another issue we aren't certain how to solve). A malicous party controlling the network can just refuse to accept you as a contributor, and it won't matter how much storage you have because you will be ignored. This doesn't happen on Bitcoin because any random person can submit a proof of work. In Sia, this isn't currently possible.

=======

Dark days for Sia =/. But filecoin is also broken, permacoin is not efficient, maidsafe is opaque and very difficult to audit, and storj is underspecified. Cryptography is very hard, something I didn't properly appreciate until the past month or two. But Sia is not dead yet, we're just back to square 1 for the time being. The money hasn't run out yet and we won't give up until it does. Our total expenses since doing the crowd sale have been about 1/2 of what we raised. (through the crowd sale. We also have venture funding although that's in a completely different pool of money. Our responsibilities following the crowd sale is to make Sia, our responsibilities following venture funding is to make Nebulous Inc. a succesful corporation).

In the worst case, I think we will be able to create a less powerful system for peer to peer storage. Instead of selling storage to a broader network, you'd sell storage to individual peers. The peers would need to be responsible for picking reliable and honest hosts, which they could achieve by using reputation, randomness, and redundancy.

Huh

sorry guys I hope the next major update has more positive news. As always, happy to walk you through anything and explain exactly what's broken. Also happy to highlight questions you have about other systems, though it's very time consuming to take a close look.

Tobo

hero member

Activity: 763

Merit: 500

Quote from: Taek on October 03, 2014, 10:48:53 AM

I've also been reading a bit about Peter Todd's treechains concept, and it seems like that might be a valuable approach to doing things. Treechains are still early on in life but it might be a big step forward for cryptocurrency. As Peter Todd said: (paraphrasing) "Bitcoin is like the telephone. The infrastructure is very smart and the applications are dumb. Treechains are like the Internet. The infrastructure is very dumb and the applications are very smart." Having a much less inherently useful yet much more flexible infrastructure sounds very valuable.

This treechains concept sounds like your guys' quorum consensus. Are they similar to each other?

Breasal

hero member

Activity: 585

Merit: 500

https://forum.thesupernet.org/index.php?topic=163.0;topicseen

Taek

hero member

Activity: 543

Merit: 501

I'll be in touch with CfB soon.

I've been looking at our whitepaper and thinking that it might need to be spread into multiple papers. There are a lot of core concepts that don't depend on each other. It's intentionally designed this way to keep complexity low, which is critical for a cryptosystem, but that also means that we can probably split Sia into multiple papers, and this should really help with the credibility of the system as a whole. Additionally, it's a lot easier for someone to read a single paper about Proof-of-Capacity than it is for someone to read the entire Sia paper, especially if they're only interested in the capacity piece.

I've also been reading a bit about Peter Todd's treechains concept, and it seems like that might be a valuable approach to doing things. Treechains are still early on in life but it might be a big step forward for cryptocurrency. As Peter Todd said: (paraphrasing) "Bitcoin is like the telephone. The infrastructure is very smart and the applications are dumb. Treechains are like the Internet. The infrastructure is very dumb and the applications are very smart." Having a much less inherently useful yet much more flexible infrastructure sounds very valuable.

msin

legendary

Activity: 1470

Merit: 1004

Quote from: Breasal on October 03, 2014, 09:57:36 AM

Quote from: Tobo on October 02, 2014, 11:04:43 AM

CfB (Nxt core developer) is working on a distributed computing project. It might be able to help Sia on the efficiency of network computing.

- Jinn is a general purpose processor based on ternary logic with hardware support for distributed computing. This project also includes the Jiniri Limited (single-node) and Jiniri Unlimited (multi-node) emulators, in addition we are creating a monetized proof-of-concept game.

- http://www.jinnlabs.com/

Wow, this would be incredible if Jinn and Sia could collaborate! It seems that both the tech and their level of "coolness" as disruptors make them a pair made in heaven imo. Cheesy

Jinn is a very ambitious project, and if successful, many crypto projects will be using it (Ethereum, NxtAT, etc..)

Breasal

hero member

Activity: 585

Merit: 500

Quote from: Tobo on October 02, 2014, 11:04:43 AM

CfB (Nxt core developer) is working on a distributed computing project. It might be able to help Sia on the efficiency of network computing.

- Jinn is a general purpose processor based on ternary logic with hardware support for distributed computing. This project also includes the Jiniri Limited (single-node) and Jiniri Unlimited (multi-node) emulators, in addition we are creating a monetized proof-of-concept game.

- http://www.jinnlabs.com/

Wow, this would be incredible if Jinn and Sia could collaborate! It seems that both the tech and their level of "coolness" as disruptors make them a pair made in heaven imo. Cheesy

Tobo

hero member

Activity: 763

Merit: 500

CfB (Nxt core developer) is working on a distributed computing project. It might be able to help Sia on the efficiency of network computing.

- Jinn is a general purpose processor based on ternary logic with hardware support for distributed computing. This project also includes the Jiniri Limited (single-node) and Jiniri Unlimited (multi-node) emulators, in addition we are creating a monetized proof-of-concept game.

- http://www.jinnlabs.com/

mladen00

legendary

Activity: 2124

Merit: 1013

K-ing®

interesting concept

marek3ball-orig

member

Activity: 80

Merit: 10

Some good news? Storj is running fast

.

Djinou94

legendary

Activity: 1036

Merit: 1000

Cannot wait for the new whitepaper

Taek

hero member

Activity: 543

Merit: 501

A different update. I completely reworked the way that Sia handles files. This means we're going to need a completely new whitepaper, though only a few of the major concepts have changed.

The biggest benefit is that the total amount of bandwidth required to make Sia work has dropped dramatically. A downside is that now, when you add value to a file (for storing it), you can't remove it ever.

On the other side of things, we're working on building an ncurses-like interface for Sia, because the command line is pretty annoying. I'm hoping that'll be done soon, and we'll have an alpha 2.1, where nothing substantial has changed but at least there's an easier-to-use frontend.

I'm not sure when I'll have the next whitepaper ready. Maybe 2-3 weeks?

msin

legendary

Activity: 1470

Merit: 1004

Great, thanks for the update.

Taek

hero member

Activity: 543

Merit: 501

Apologizing for the triple post, but I'm happy to announce that we're ready to release the alpha v2!

We believe that we've sorted through all of the major issues, although file sizes are limited to 768kb, and you're not going to be able to be a participant on the network unless you have a public IP address or have manually set up port forwarding on your router. It's also linux only.

You can download the binary at siacoin.com/download.html, or fork the 'release' branch of our github repo.

It's definitely rough around the edges but everything seems to work without any major errors.

Let me know what you guys think and feel free to ask for help. I'll be poking my head into the IRC channel every now and then over the next few weeks, but the forum is still probably the best way to get my attention.

Taek

hero member

Activity: 543

Merit: 501

Going back to the random distrubution problem (it was bothering me enough):

The solution is to forcibly randomize one non-joining person in every single quorum. This is going to be super hand-wavy, but the overall gist is this:

When you join Sia, you are put in a random place where someone already exists. In addition, the quorum that you join has 1 additional person swapped out with someone random in the network.

If you control 1% of the network, and 50.5% of the quorum, all your random trials will only have you maintain 50.5% control of the quorum:

You roll the dice until you get a hit on your target quorum. Your target has 1 removed (the guy you replace), which has a 50% chance of being your own. Then one more is removed randomly. So in total, 2 are removed (-1.01 total for you), and you get +1 (because you are guaranteed to have hit the target quorum), and then you get +0.01 from the new guy (because the new guy has a 1% chance of being you; you control 1% of the network). So essentially by doing this, we prevent someone with 1% of the network from every being able to target a quorum.

What about 33%?

This equilibrium is around 2/3. You gain +1, then you lose 2 random. At 2/3, you lose an expected 4/3 quorum members. But then when bringing in the extra new guy, you have a 1/3 chance of it being yours. So the equilibrium allows a person at 33% to hold a 66% majority on a target quorum, though it's expensive to keep rolling to stay that high. At this point, files are at risk of being corrupted, because 66% is enough to corrupt files stored at the standard network value.

What about 50%?

The equilibrium is 75%, which still isn't enough to hit the 80% that's required to disrupt meta-quorum consensus. Though it's pretty uncomfortably close, requiring only a bit of luck to get to the 80%. Even at 80% though, you still only have a fraction of a percent of a chance of pulling off a corruption attack.

An even safer alternative is to swap out 2 random siblings for each new sibling that enters the network. This is more expensive to the quorum, and results in more overall bandwidth being used, but your equilibrium values fall a lot farther.

1% attack struggles to get past 1/3.
33% attack struggles to get past 55% (which is now ~safe from file corruption attacks)
50% attack struggles to get past 66% (now very safe from meta-quorum disruption)

=====

On the other end of the equation, you can try to kick out honest hosts from a quorum to stack it in your favor. But we have a convenient weapon: you can't kick out arbitrary honest hosts, you have to kick out _all_ honest hosts. It's an all-or-none procedure. When hosts are kicked from the network, you can replace them using the same entry process described above to keep the equilibrium values at the same points.

This does get a little messy, because a malicious party can keep kicking out full sets of honest hosts each time they get the majority in a quorum. That will add lots of load to the network, and is an effective DOS strategy, which we don't have a good solution to yet. Swapping out hosts is expensive, and it's one thing when the new guy is paying for it (like in the first situation), but it's a harder problem when you are kicking 20% (or 45%) of hosts who produced an honest block. Do you charge the whole quorum evenly (for not being well connected?)? You can't be sure which set is dishonest and you have to go with majority. So who pays for turbulence, or how can you force the two honest blocks to merge without introducing more error points into the consensus algorithm?

There might be a way to force multiple honest blocks to merge, but I'm highly wary of this approach because it could also be abusable in a way that results in forked quorums.

=====

I'm glad you brought this up, because our random distrubution algorithm did need a second look.

Taek

hero member

Activity: 543

Merit: 501

Astor, if you are interested, we'd be willing to interview you and see if you are a match for the team.

The protocol is largely incomplete, where in many places we've chosen the 'proof-of-concept' solution as opposed to the most optimal solution, because there's a lot of work that needs to happen before Sia is a fully optimized protocol. If you have a strong coding background (in any language), we'd be intereted in having you as an employee.

msin

legendary

Activity: 1470

Merit: 1004

Astor seems like a good potential dev for Sia.

Taek

hero member

Activity: 543

Merit: 501

Quote from: astor on September 03, 2014, 04:20:55 AM

My simulation says otherwise. Simulating an 8TB attack on a 8PB storage (1%), I could get a quorum majority using a 1% attack with only 43.7 trials per participant. For a 0.1% attack, I need 480 trials per resource.

To avoid this attack, you're going to have to make it so expensive to join that nobody will be able to afford it ;-)

This is not a normal binomial trial, but a process where the attacker chooses *where to withdraw* a participant. You only control the random placement.

Besides, the whitepaper states that the fee will be returned for well-behaving participants, so the sybil attack is essentially free. You might want to update that.

8TB is only .1% of 8PB. I'm not sure if your simulation did 1%, or 0.1%, but I'll look a lot closer at this. And for the record, a quorum is safe all the way up to 80% dishonesty. We do suffer the weakness where an attacker can choose who to withdraw. I'm not sure if you've included this in your simulation, but when you withdraw turbulence in introduced in the quorum you withdraw from to fight quorum manipulation. Withdrawing (still on the fence about the exact number) a host resulted in (1 or 2? I forget) additional participants being replaced at random. This feature was meant to protect the quorum that was being withdrawn from, but it will also protect on a much lesser scale the distribution of hosts in the network as a whole. The down payment cost is probably going to be in the ballpark of 1 week of earnings. The fine for leaving is going to be the cost of moving everyone around in your absence, and can't be avoided. The cost is reduced if you limit the amount of recovery bandwidth that is used (IE you upload your whole XGB to the next guy, instead of forcing them to do file repair).

43 trials per participant will cost something like 10 months of storage to pull off. At 1% of the network it's not terribly prohibitive but it's not trivial either. But keep in mind again that you only shot for quorum majority, which will not break Sia. Sia is designed to function at high probability even when 80% of the quorum is compromised. This works because of how quorums communicate to each other (and I'm actually re-writing this process right now, because this is what we're going to be implementing after the alpha release). When a quorum creates a new block, it talks to its siblings by having each host ping (3 or 5, haven't decided) random hosts in the sibling quorums. It's a low number to keep message volume down, but even at 3 messages per block per sibling, a quorum with only 20% honesty is going to have 60 chances of informing the other quorum. Assuming you've been manipulating around this quorum specifically, each of those 60 chances is going to have ~50% likelihood of reaching someone honest, giving a very comfortable margin. Even if the other quorum is also highly manipulated, (4/5)^60 still yields a favourable probability of each quorum learning about every block the other quorum creates.

And further, once a group of dishonest hosts have infected a quorum, they've only got 1 chance to pull off a dishonesty attack. As soon as it's made known that two conflicting blocks have been announced by the opposing quorum, the losers will be ejected from the quorum. If the block with the majority is fully honest (which can happen), the 20% honest hosts will be kicked from the quroum, but they will take an additional (20% or 40%) of the dishonest hosts along as a part of the quorum-protection. So the dishonest quorum will lose it's advantage. If the dishonest block release actually breaks rules (which is what you'd expect, why would you produce a conflicting honest block? - you don't gain anything if you're still following the rules of the network), then 100% of the dishonest hosts will be purged from the network. In this case, the turbulence doesn't need to trigger because it's clear that the hosts being kicked are dishonest, instead of being unsure. The 20% honest hosts can stick around then.

This still has problems that I'm aware of. Because the network redundancy for files is only about 2.5, a dishonest group achieving ~66% control of a quorum could potentially corrupt files. Additionally, depending on how the turbulence fees are set up, you could use the auto-turbulence to 'drain' quorums where you have the majority, effectively giving you an extra N rerolls on getting dishonest hosts into your target quorum, without having to pay for them.

My goal would be to make it prohibitively expensive for someone with less than 33% of the network to reach greater than 66% control of any quorum, and prohibitively expensive for someone with 49% control of the network to reach greater than 80% control of any quorum. We're busy this week, but next week I'll take some time out and research this better. I imagine someone's got a better way to manage random distributions than what we are using.

I think I would define 'prohibitively expensive' as 1 network-year of cost. So if the network is 8PB, 'prohibitively expensive' would mean spending the same amount it would cost to rent all 8PB for 1 year. But it's also worth looking at incentive. The 33% attack is only to be able to corrupt files, and typically if you are doing this you have a specific set of files that you want to corrupt. It's much harder to target a specific quorum than it is to target whichever quorum you are getting luckiest with. But additionally at 33% of the network you've got a huge profit incentive, and you're burning profits trying to attack the network.

I'll come back in a few weeks with a more mathematically oriented exploration of attacks regarding the random distribution process. I do realize that in it's current conception, it's not bullet proof.

Quote from: astor on September 03, 2014, 04:20:55 AM

I don't see a description of how the quorum works for the tree. What is described is a gossip protocol, but how will the tree reach *global* consensus on, say, the random generator?

I'm still thinking about the exact best approach. One solution I had that was functional was that each 'meta-quorum' has a nominated 'leaf-quorum' which performs the consensus and picks the exact values. Whatever block this 'leaf-quorum' creates is propagated and is used by the whole network to guarantee consensus. I think though there might be a way to achieve global consensus without nominating an explicit quorum. It's worth noting that global consensus is really only required for figuring out the macro-operations, like what the cost of storage is, and how many coins moved from meta-quorum A to meta-quorum B. If you are working in a binary tree, that means you really only need to be in consensus with your single sibling, and then you need a way to announce to your children what the consensus was. Your children don't have to all receive the block at the same time, they just have to know that "meta-quorum level 3, block #5100 was _this_". If one sibling gets it a block or two earlier than the other, it's not going to disrupt global consensus.

There is a 'reassure' protocol that will probably exist, which operates a lot like a 'trust but verify'. When you receive a meta block, you send random messages (and also receive random message) that verify everyone got the same block. This also helps to speed block propagation. If one quorum receives a different block from the rest, it's going to have 3 communications per honest host to figure that out.

Quote

====

For section 10, I'm not exaclty sure what problem you are discussing, but if an attacker creates a dishonest block, then an honest host inside of the attackers quorum will be able to prove that the block is dishonest (by showing a snapshot, and the previous honest blocks). An honest host can prove that a particular block is dishonest once a dishonest block has been created.

How? Can you spell out how the honest host will prove this? I'm saying that an attacker that controls the quorum can redefine what users store so that any proof of storage can be trivially calculated. For example the attacker simply defines that the users all store a long string of 0s. Calculating any merkle-tree challenge is then trivial.

"Controlling the quorum" is a very non-trivial task. That means you can create dishonest blocks without having the honest blocks propagate.

Lets walk through the steps:

1. A quorum creates a dishonest block, which either doesn't do proof of storage, or pays out money that the wallets didn't approve of, etc.
2. This quorum has honest hosts inside of it. These honest hosts will create an honest block, most likely without any signatures from the dishonest hosts. So there's this really empty, but honest block that's floating around.
3. The honest hosts contact their sibling quorum with 3 messages each, informing the sibling quorum of the block.
4. The sibling quorum bounces back a dishonest block.

There's now a dispute in progress. The sibling quorum will only accept a block if it can see the finer details of how the block was created (usually just hashes and macro-data are passed around, but now we have to demonstrate where the hashes came from).

5. The honest host provides the full block information. If the dishonest hosts cannot provide full information, they lose and are kicked. If they provide full, but dishonest, information:
6. The sibling quorum sends back the full block that the dishonest hosts received.

Important things to note:

each block has both a hash of the parent block, and a hash of the parent 'state'. The parent state is something that's saved infrequently into a snapshot, which can be used to rebuild the state and get the same original hash. Each block also contains the state hash after it was created. This means that the dishonest block _must_ align itself with the previous state, or it'll just be ignored as early or late.

7. The honest host uses its snapshot to build up the 'state' that the quorum was in prior to the dishonest block, and it sends the state to the sibling quorum. The sibling quorum can now use the previous state (which it can verify by hashing) plus the transactions in the dishonest block to prove to itself that the block is dishonest.

This process is bandwidth intense, depending on the size of the state, but the bandwidth required can be reduced by using merkle trees again. The honest host only needs to isolate the parts of the dishonest block that are dishonest.

Quote

Just so I understand, you're doing merkle-tree proofs on the reed solomon parts?

Right. So each host has some encoded-and-redundant piece that they need to maintain, and the merkle-root of that piece is known by everyone. A random 32byte piece within the slice they are holding is chosen, and they have to provide the information needed to prove on that piece.

Quote

Let me guess... if you still prove 64k blocks, then 700/16 = 44 levels of you have a 60bit addressable storage system (might be a bit excessive?). If you have something like that, I don't see how that affect my comment. I'm saying that if you prove 64k out of 8GB, then fetching that 64k must be 125.000 times as expensive as the reward for storing those 64k for the same amount of time.

Let's say a participant can choose to fetch the 64k from some other participant for cost 125.000, or store 64k*125.000 = 8GB for cost 125.000.

Let's assume a user is accessing random blocks, and churning through 8GB of storage once a month. With a block frequencey of 1/day, there is a 3% chance that the user will fetch the 64k block that is being proven, and where accessing the block must cost > the cost of storing 8GB for the block interval. So for a block interval of 1/day, the increased cost for this user is 3%/month.

(To offset this, I will do my own 31:30 reed salomon on top of your system so I can avoid touching the "poisoned block" during that day and avoid the cost)

But this assumes that you're only proving 64kB per day. That's a pretty low figure.

There's a way to simplify this further. Each proof covers 1 day of storage. So if you're downloading something, you have to pay for XGB * 1 day cost of storage, in case you end up downloading whatever is being proved in the current or next block. You have to pay this fee (initially), because the network can't be sure what storage will be targeted in the next block, but an attacker who is attempting to manipulate consensus can. At 64GB, and $5/TB/mo, this amounts to around 0.5 cents. At 1TB this is closer to 16 cents. Still manageable. If you are unlucky, you lose the 16 cents. If you are not unlucky, you get it back. Assuming that a user fetches a whole sector at a time (and that the sector is always the max size of 2MB), a user has a 2MB/1TB chance of losing that 16 cents, or 2*10^(-6) chance of losing the 16 cents. Not a big deal. The cost of downloading 2MB * 51 hosts (assuming the standard network redundancy of 2.5) is probably going to be higher than your expected payout of 0.000032 cents for getting unlucky.

Quote

Sorry, *changing*. Letting participants leave or join a quorum outside of the quorum protocol itself typically breaks the protocol.

I don't think it would break our protocol. It's designed to cope with arbitrary (and malicious) failures of nodes. Nodes can only join the quorum if they go through the random process, but nodes can leave the quorum whenever without breaking consensus. You don't need a minimum number of signatures to create a block. The algorithm insures that all remaining honest hosts will arrive at the same block during consensus, regardless of how many nodes have left (had failures) in the middle. Nodes who are joining can't jump in on the current block, they have to wait until they are confirmed by the network, and then they join at a designated block.

astor

newbie

Activity: 39

Merit: 0

Quote from: Taek on September 02, 2014, 05:02:44 PM

Quote from: astor on September 02, 2014, 04:24:39 PM

Just read your whitepaper. I am impressed by the guts, but not by your solution.

Maybe some of these issues have been discussed in this long, long thread, but:

- Your quorum is not a quorum when put in a tree!
- Assuming that attackers are randomly distributed is nonsense (sorry). The sybil attack is as old as time.
- Thus your binomal distribution calculations are simply wrong, because the assumptions are wrong (sybil attack).
- The whitepaper does not describe what the consensus tree looks like. Judging from the calculations in section 10, it seems like it could be a binary tree.
- A honest party cannot prove dishonesty with a properly programmed attacker using a simulated storage and controlling the (attacked) random generator (section 10).
- The number of messages required to keep consensus is very high.
- Did I mention sybil attacks? There is no protection!
- Section 6, what the cost of fetching a block from another participant needs to be is completely wrong in the whitepaper. It is off by 5 orders of magnitude! (by 100.000x!). This p0wns the consensus protocol, or this will indeed be an incredibly expensive storage platform.
- You allow changing the voter set in the consensus protocol without having consensus it seems.
- The proof of storage algorithm is way too naive when much, better algorithms exist.
- Nothing in the protocol requires a node from EVER SERVING DATA!
- Who is going to "fine" misbehaving nodes? The only thing you will see are invalid quorums where data is held hostage.

This is like a swiss cheese! Who reviewed this?

Alright lets go through this bit by bit. First off, the whitepaper hasn't been updated since May, so there are a few things I'll say here that are probably in conflict with what's actually written in the whitepaper. But most of the problems you bring up are not issues, you've merely misunderstood how we've addressed the problem.

First, and most important, the sybil attacks:

We don't assume that attackers are randomly distrubuted. We randomly distribute nodes as they enter the network. I actually don't see where this is discussed in the whitepaper- though it was in a previous version I may have forgotten to add it to this one. When a sibling joins the network, they replace a random existing sibing. The sibling that got replaced goes into the new quorum. This ensures that attackers are randomly distributed throughout the network - you have no control over where you are added when you join.

This is one of the reasons you have to pay a fee when joining the network. It makes it expensive to join multiple times and pick only the random slots that favor you. A large attacker with lots of resources can get lots of rerolls, but each reroll will be expensive and they need exponetentially more rerolls as they try to stack a particular quorum with a higher and higher percentage of corrupt nodes. Does this make sense? The attacker needs the down payments to keep rerolling, but they also need enough storage once they decide to stay in a quorum, because they will be performing storage proofs.

My simulation says otherwise. Simulating an 8TB attack on a 8PB storage (1%), I could get a quorum majority using a 1% attack with only 43.7 trials per participant. For a 0.1% attack, I need 480 trials per resource.

To avoid this attack, you're going to have to make it so expensive to join that nobody will be able to afford it ;-)

This is not a normal binomial trial, but a process where the attacker chooses *where to withdraw* a participant. You only control the random placement.

Besides, the whitepaper states that the fee will be returned for well-behaving participants, so the sybil attack is essentially free. You might want to update that.

Quote

====

What does the consensus tree look like? It'll probably be a binary tree, but that hasn't actually been decided. Could be 4 children per parent, or more. But probably just a binary tree.

I don't see a description of how the quorum works for the tree. What is described is a gossip protocol, but how will the tree reach *global* consensus on, say, the random generator?

Quote

====

For section 10, I'm not exaclty sure what problem you are discussing, but if an attacker creates a dishonest block, then an honest host inside of the attackers quorum will be able to prove that the block is dishonest (by showing a snapshot, and the previous honest blocks). An honest host can prove that a particular block is dishonest once a dishonest block has been created.

How? Can you spell out how the honest host will prove this? I'm saying that an attacker that controls the quorum can redefine what users store so that any proof of storage can be trivially calculated. For example the attacker simply defines that the users all store a long string of 0s. Calculating any merkle-tree challenge is then trivial.

Just so I understand, you're doing merkle-tree proofs on the reed solomon parts?

Quote

====

The number of messages required to keep consensus is high: yes unfortunately. At 128 nodes per quorum, assuming ~4kb per heartbeat, each participant will be doing on average somewhere around 512kb of bandwidth, and in attack cases could be doing up to 64mb of bandwidth per block. These are both bounded in ways that make consensus possible, but it's pretty expensive. I'm expecting the block rate to end up at around 1 per day, so participating nodes would be looking at consuming around 15mb of bandwidth per month, per 64GB of storage they are contributing. (The 64GB is a flexible number, we may up it to 256GB or something larger to reduce the total bandwidth the network is consuming).

Attack conditions are pretty severe, but we have a set of (undiscussed) approaches to the problem. DOS attacks are probably Sia's biggest weakness right now.

====

Section 6 has a dated way of doing proof of storage. Today, we use merkle trees + a saved-hash structure. It ends up adding a total of about 700 bytes per message to the consensus algorithm. Each heartbeat will have ~1kb of mandatory stuff, leaving 3kb for optional stuff. 3kb may end up not being enough, but remember that's 3kb _per sibling_, _per quorum_, which means that each quorum has a total of around 350kb per block that can be used for transactions, and there are going to be many quorums.

Let me guess... if you still prove 64k blocks, then 700/16 = 44 levels of you have a 60bit addressable storage system (might be a bit excessive?). If you have something like that, I don't see how that affect my comment. I'm saying that if you prove 64k out of 8GB, then fetching that 64k must be 125.000 times as expensive as the reward for storing those 64k for the same amount of time.

Let's say a participant can choose to fetch the 64k from some other participant for cost 125.000, or store 64k*125.000 = 8GB for cost 125.000.

Let's assume a user is accessing random blocks, and churning through 8GB of storage once a month. With a block frequencey of 1/day, there is a 3% chance that the user will fetch the 64k block that is being proven, and where accessing the block must cost > the cost of storing 8GB for the block interval. So for a block interval of 1/day, the increased cost for this user is 3%/month.

(To offset this, I will do my own 31:30 reed salomon on top of your system so I can avoid touching the "poisoned block" during that day and avoid the cost)

But this assumes that you're only proving 64kB per day. That's a pretty low figure.

Quote

====

"You allow hanging the voter set in the consensus protocol without having consensus"

I'm not sure what you mean by this.

Sorry, *changing*. Letting participants leave or join a quorum outside of the quorum protocol itself typically breaks the protocol.

Quote

====

Nothing in the protocol requires a node from ever serving data:

this is something that isn't discussed in the current whitepaper. There will be monetary incentives for serving data, and barring that, there will be a proof-of-data-transaction, which is a complicated and expensive process but forces a node to either serve data or be kicked. This currently isn't a high priority, I want to make sure the storage stuff all works before we move onto things like bandwidth, cost, and actually uploading files to users.

====

Fining misbehaving nodes:

a node that misbehaves will be kicked from the network, and have it will not recieve it's initial "down payment" as a refund. This is how nodes are fined. If a node gracefully leaves the network (there are a few cleanup things it needs to do - not fully discussed yet).

====

I realize that a lot of the details are left out. We're planning on releasing another paper in the next few months, namely an explicit specification for single-quorum consensus. (IE what the alpha v2 is going to look like). There's a lot of work left to do. Things that we are consciously leaving out for now:

+ People downloading wallets
+ Storing files that the network can't automatically repair (to defeat storage pool concerns)
+ The explicit way that things are priced
+ The explicit way that nodes in different quorums can look each other up

These are all important, and will be fully and formally defined eventually, but we're focusing on the more core problems first.

The things that we care about most right now:

+ Maintaining consensus in a way that is fully fault-tolerant
+ Preventing sybil attacks
+ preventing random number manipulation
+ secure proof of storage
+ transaction malleability
+ scalability

It's going to be much easier for me to discuss Sia with you if we focus on one problem at a time. I think the one you are most worried about is Sybil attacks, so lets focus on that until you are convinced that Sia is secure against them. (or until I am convinced that I've missed something important). Or if you want to focus on something else specifically (like total bandwidth consumption), we can do that. But just 1 problem at a time so the discussion can be more focused.

Taek

hero member

Activity: 543

Merit: 501

Quote from: astor on September 02, 2014, 04:24:39 PM

Just read your whitepaper. I am impressed by the guts, but not by your solution.

Maybe some of these issues have been discussed in this long, long thread, but:

- Your quorum is not a quorum when put in a tree!
- Assuming that attackers are randomly distributed is nonsense (sorry). The sybil attack is as old as time.
- Thus your binomal distribution calculations are simply wrong, because the assumptions are wrong (sybil attack).
- The whitepaper does not describe what the consensus tree looks like. Judging from the calculations in section 10, it seems like it could be a binary tree.
- A honest party cannot prove dishonesty with a properly programmed attacker using a simulated storage and controlling the (attacked) random generator (section 10).
- The number of messages required to keep consensus is very high.
- Did I mention sybil attacks? There is no protection!
- Section 6, what the cost of fetching a block from another participant needs to be is completely wrong in the whitepaper. It is off by 5 orders of magnitude! (by 100.000x!). This p0wns the consensus protocol, or this will indeed be an incredibly expensive storage platform.
- You allow changing the voter set in the consensus protocol without having consensus it seems.
- The proof of storage algorithm is way too naive when much, better algorithms exist.
- Nothing in the protocol requires a node from EVER SERVING DATA!
- Who is going to "fine" misbehaving nodes? The only thing you will see are invalid quorums where data is held hostage.

This is like a swiss cheese! Who reviewed this?

Alright lets go through this bit by bit. First off, the whitepaper hasn't been updated since May, so there are a few things I'll say here that are probably in conflict with what's actually written in the whitepaper. But most of the problems you bring up are not issues, you've merely misunderstood how we've addressed the problem.

First, and most important, the sybil attacks:

We don't assume that attackers are randomly distrubuted. We randomly distribute nodes as they enter the network. I actually don't see where this is discussed in the whitepaper- though it was in a previous version I may have forgotten to add it to this one. When a sibling joins the network, they replace a random existing sibing. The sibling that got replaced goes into the new quorum. This ensures that attackers are randomly distributed throughout the network - you have no control over where you are added when you join.

This is one of the reasons you have to pay a fee when joining the network. It makes it expensive to join multiple times and pick only the random slots that favor you. A large attacker with lots of resources can get lots of rerolls, but each reroll will be expensive and they need exponetentially more rerolls as they try to stack a particular quorum with a higher and higher percentage of corrupt nodes. Does this make sense? The attacker needs the down payments to keep rerolling, but they also need enough storage once they decide to stay in a quorum, because they will be performing storage proofs.

====

What does the consensus tree look like? It'll probably be a binary tree, but that hasn't actually been decided. Could be 4 children per parent, or more. But probably just a binary tree.

====

For section 10, I'm not exaclty sure what problem you are discussing, but if an attacker creates a dishonest block, then an honest host inside of the attackers quorum will be able to prove that the block is dishonest (by showing a snapshot, and the previous honest blocks). An honest host can prove that a particular block is dishonest once a dishonest block has been created.

====

The number of messages required to keep consensus is high: yes unfortunately. At 128 nodes per quorum, assuming ~4kb per heartbeat, each participant will be doing on average somewhere around 512kb of bandwidth, and in attack cases could be doing up to 64mb of bandwidth per block. These are both bounded in ways that make consensus possible, but it's pretty expensive. I'm expecting the block rate to end up at around 1 per day, so participating nodes would be looking at consuming around 15mb of bandwidth per month, per 64GB of storage they are contributing. (The 64GB is a flexible number, we may up it to 256GB or something larger to reduce the total bandwidth the network is consuming).

Attack conditions are pretty severe, but we have a set of (undiscussed) approaches to the problem. DOS attacks are probably Sia's biggest weakness right now.

====

Section 6 has a dated way of doing proof of storage. Today, we use merkle trees + a saved-hash structure. It ends up adding a total of about 700 bytes per message to the consensus algorithm. Each heartbeat will have ~1kb of mandatory stuff, leaving 3kb for optional stuff. 3kb may end up not being enough, but remember that's 3kb _per sibling_, _per quorum_, which means that each quorum has a total of around 350kb per block that can be used for transactions, and there are going to be many quorums.

====

"You allow hanging the voter set in the consensus protocol without having consensus"

I'm not sure what you mean by this.

====

Nothing in the protocol requires a node from ever serving data:

this is something that isn't discussed in the current whitepaper. There will be monetary incentives for serving data, and barring that, there will be a proof-of-data-transaction, which is a complicated and expensive process but forces a node to either serve data or be kicked. This currently isn't a high priority, I want to make sure the storage stuff all works before we move onto things like bandwidth, cost, and actually uploading files to users.

====

Fining misbehaving nodes:

a node that misbehaves will be kicked from the network, and have it will not recieve it's initial "down payment" as a refund. This is how nodes are fined. If a node gracefully leaves the network (there are a few cleanup things it needs to do - not fully discussed yet).

====

I realize that a lot of the details are left out. We're planning on releasing another paper in the next few months, namely an explicit specification for single-quorum consensus. (IE what the alpha v2 is going to look like). There's a lot of work left to do. Things that we are consciously leaving out for now:

+ People downloading wallets
+ Storing files that the network can't automatically repair (to defeat storage pool concerns)
+ The explicit way that things are priced
+ The explicit way that nodes in different quorums can look each other up

These are all important, and will be fully and formally defined eventually, but we're focusing on the more core problems first.

The things that we care about most right now:

+ Maintaining consensus in a way that is fully fault-tolerant
+ Preventing sybil attacks
+ preventing random number manipulation
+ secure proof of storage
+ transaction malleability
+ scalability

It's going to be much easier for me to discuss Sia with you if we focus on one problem at a time. I think the one you are most worried about is Sybil attacks, so lets focus on that until you are convinced that Sia is secure against them. (or until I am convinced that I've missed something important). Or if you want to focus on something else specifically (like total bandwidth consumption), we can do that. But just 1 problem at a time so the discussion can be more focused.

Topic: Sia - Siafund Redemption Deadline: June 1st, 2015 - page 10. (Read 68780 times)