My simulation says otherwise. Simulating an 8TB attack on a 8PB storage (1%), I could get a quorum majority using a 1% attack with only 43.7 trials per participant. For a 0.1% attack, I need 480 trials per resource.
To avoid this attack, you're going to have to make it so expensive to join that nobody will be able to afford it ;-)
This is not a normal binomial trial, but a process where the attacker chooses *where to withdraw* a participant. You only control the random placement.
Besides, the whitepaper states that the fee will be returned for well-behaving participants, so the sybil attack is essentially free. You might want to update that.
8TB is only .1% of 8PB. I'm not sure if your simulation did 1%, or 0.1%, but I'll look a lot closer at this. And for the record, a quorum is safe all the way up to 80% dishonesty. We do suffer the weakness where an attacker can choose who to withdraw. I'm not sure if you've included this in your simulation, but when you withdraw turbulence in introduced in the quorum you withdraw from to fight quorum manipulation. Withdrawing (still on the fence about the exact number) a host resulted in (1 or 2? I forget) additional participants being replaced at random. This feature was meant to protect the quorum that was being withdrawn from, but it will also protect on a much lesser scale the distribution of hosts in the network as a whole. The down payment cost is probably going to be in the ballpark of 1 week of earnings. The fine for leaving is going to be the cost of moving everyone around in your absence, and can't be avoided. The cost is reduced if you limit the amount of recovery bandwidth that is used (IE you upload your whole XGB to the next guy, instead of forcing them to do file repair).
43 trials per participant will cost something like 10 months of storage to pull off. At 1% of the network it's not terribly prohibitive but it's not trivial either. But keep in mind again that you only shot for quorum majority, which will not break Sia. Sia is designed to function at high probability even when 80% of the quorum is compromised. This works because of how quorums communicate to each other (and I'm actually re-writing this process right now, because this is what we're going to be implementing after the alpha release). When a quorum creates a new block, it talks to its siblings by having each host ping (3 or 5, haven't decided) random hosts in the sibling quorums. It's a low number to keep message volume down, but even at 3 messages per block per sibling, a quorum with only 20% honesty is going to have 60 chances of informing the other quorum. Assuming you've been manipulating around this quorum specifically, each of those 60 chances is going to have ~50% likelihood of reaching someone honest, giving a very comfortable margin. Even if the other quorum is also highly manipulated, (4/5)^60 still yields a favourable probability of each quorum learning about every block the other quorum creates.
And further, once a group of dishonest hosts have infected a quorum, they've only got 1 chance to pull off a dishonesty attack. As soon as it's made known that two conflicting blocks have been announced by the opposing quorum, the losers will be ejected from the quorum. If the block with the majority is fully honest (which can happen), the 20% honest hosts will be kicked from the quroum, but they will take an additional (20% or 40%) of the dishonest hosts along as a part of the quorum-protection. So the dishonest quorum will lose it's advantage. If the dishonest block release actually breaks rules (which is what you'd expect, why would you produce a conflicting honest block? - you don't gain anything if you're still following the rules of the network), then 100% of the dishonest hosts will be purged from the network. In this case, the turbulence doesn't need to trigger because it's clear that the hosts being kicked are dishonest, instead of being unsure. The 20% honest hosts can stick around then.
This still has problems that I'm aware of. Because the network redundancy for files is only about 2.5, a dishonest group achieving ~66% control of a quorum could potentially corrupt files. Additionally, depending on how the turbulence fees are set up, you could use the auto-turbulence to 'drain' quorums where you have the majority, effectively giving you an extra N rerolls on getting dishonest hosts into your target quorum, without having to pay for them.
My goal would be to make it prohibitively expensive for someone with less than 33% of the network to reach greater than 66% control of any quorum, and prohibitively expensive for someone with 49% control of the network to reach greater than 80% control of any quorum. We're busy this week, but next week I'll take some time out and research this better. I imagine someone's got a better way to manage random distributions than what we are using.
I think I would define 'prohibitively expensive' as 1 network-year of cost. So if the network is 8PB, 'prohibitively expensive' would mean spending the same amount it would cost to rent all 8PB for 1 year. But it's also worth looking at incentive. The 33% attack is only to be able to corrupt files, and typically if you are doing this you have a specific set of files that you want to corrupt. It's much harder to target a specific quorum than it is to target whichever quorum you are getting luckiest with. But additionally at 33% of the network you've got a huge profit incentive, and you're burning profits trying to attack the network.
I'll come back in a few weeks with a more mathematically oriented exploration of attacks regarding the random distribution process. I do realize that in it's current conception, it's not bullet proof.
I don't see a description of how the quorum works for the tree. What is described is a gossip protocol, but how will the tree reach *global* consensus on, say, the random generator?
I'm still thinking about the exact best approach. One solution I had that was functional was that each 'meta-quorum' has a nominated 'leaf-quorum' which performs the consensus and picks the exact values. Whatever block this 'leaf-quorum' creates is propagated and is used by the whole network to guarantee consensus. I think though there might be a way to achieve global consensus without nominating an explicit quorum. It's worth noting that global consensus is really only required for figuring out the macro-operations, like what the cost of storage is, and how many coins moved from meta-quorum A to meta-quorum B. If you are working in a binary tree, that means you really only need to be in consensus with your single sibling, and then you need a way to announce to your children what the consensus was. Your children don't have to all receive the block at the same time, they just have to know that "meta-quorum level 3, block #5100 was _this_". If one sibling gets it a block or two earlier than the other, it's not going to disrupt global consensus.
There is a 'reassure' protocol that will probably exist, which operates a lot like a 'trust but verify'. When you receive a meta block, you send random messages (and also receive random message) that verify everyone got the same block. This also helps to speed block propagation. If one quorum receives a different block from the rest, it's going to have 3 communications per honest host to figure that out.
====
For section 10, I'm not exaclty sure what problem you are discussing, but if an attacker creates a dishonest block, then an honest host inside of the attackers quorum will be able to prove that the block is dishonest (by showing a snapshot, and the previous honest blocks). An honest host can prove that a particular block is dishonest once a dishonest block has been created.
How? Can you spell out how the honest host will prove this? I'm saying that an attacker that controls the quorum can redefine what users store so that any proof of storage can be trivially calculated. For example the attacker simply defines that the users all store a long string of 0s. Calculating any merkle-tree challenge is then trivial.
"Controlling the quorum" is a very non-trivial task. That means you can create dishonest blocks without having the honest blocks propagate.
Lets walk through the steps:
1. A quorum creates a dishonest block, which either doesn't do proof of storage, or pays out money that the wallets didn't approve of, etc.
2. This quorum has honest hosts inside of it. These honest hosts will create an honest block, most likely without any signatures from the dishonest hosts. So there's this really empty, but honest block that's floating around.
3. The honest hosts contact their sibling quorum with 3 messages each, informing the sibling quorum of the block.
4. The sibling quorum bounces back a dishonest block.
There's now a dispute in progress. The sibling quorum will only accept a block if it can see the finer details of how the block was created (usually just hashes and macro-data are passed around, but now we have to demonstrate where the hashes came from).
5. The honest host provides the full block information. If the dishonest hosts cannot provide full information, they lose and are kicked. If they provide full, but dishonest, information:
6. The sibling quorum sends back the full block that the dishonest hosts received.
Important things to note:
each block has both a hash of the parent block, and a hash of the parent 'state'. The parent state is something that's saved infrequently into a snapshot, which can be used to rebuild the state and get the same original hash. Each block also contains the state hash after it was created. This means that the dishonest block _must_ align itself with the previous state, or it'll just be ignored as early or late.
7. The honest host uses its snapshot to build up the 'state' that the quorum was in prior to the dishonest block, and it sends the state to the sibling quorum. The sibling quorum can now use the previous state (which it can verify by hashing) plus the transactions in the dishonest block to prove to itself that the block is dishonest.
This process is bandwidth intense, depending on the size of the state, but the bandwidth required can be reduced by using merkle trees again. The honest host only needs to isolate the parts of the dishonest block that are dishonest.
Just so I understand, you're doing merkle-tree proofs on the reed solomon parts?
Right. So each host has some encoded-and-redundant piece that they need to maintain, and the merkle-root of that piece is known by everyone. A random 32byte piece within the slice they are holding is chosen, and they have to provide the information needed to prove on that piece.
Let me guess... if you still prove 64k blocks, then 700/16 = 44 levels of you have a 60bit addressable storage system (might be a bit excessive?). If you have something like that, I don't see how that affect my comment. I'm saying that if you prove 64k out of 8GB, then fetching that 64k must be 125.000 times as expensive as the reward for storing those 64k for the same amount of time.
Let's say a participant can choose to fetch the 64k from some other participant for cost 125.000, or store 64k*125.000 = 8GB for cost 125.000.
Let's assume a user is accessing random blocks, and churning through 8GB of storage once a month. With a block frequencey of 1/day, there is a 3% chance that the user will fetch the 64k block that is being proven, and where accessing the block must cost > the cost of storing 8GB for the block interval. So for a block interval of 1/day, the increased cost for this user is 3%/month.
(To offset this, I will do my own 31:30 reed salomon on top of your system so I can avoid touching the "poisoned block" during that day and avoid the cost)
But this assumes that you're only proving 64kB per day. That's a pretty low figure.
There's a way to simplify this further. Each proof covers 1 day of storage. So if you're downloading something, you have to pay for XGB * 1 day cost of storage, in case you end up downloading whatever is being proved in the current or next block. You have to pay this fee (initially), because the network can't be sure what storage will be targeted in the next block, but an attacker who is attempting to manipulate consensus can. At 64GB, and $5/TB/mo, this amounts to around 0.5 cents. At 1TB this is closer to 16 cents. Still manageable. If you are unlucky, you lose the 16 cents. If you are not unlucky, you get it back. Assuming that a user fetches a whole sector at a time (and that the sector is always the max size of 2MB), a user has a 2MB/1TB chance of losing that 16 cents, or 2*10^(-6) chance of losing the 16 cents. Not a big deal. The cost of downloading 2MB * 51 hosts (assuming the standard network redundancy of 2.5) is probably going to be higher than your expected payout of 0.000032 cents for getting unlucky.
Sorry, *changing*. Letting participants leave or join a quorum outside of the quorum protocol itself typically breaks the protocol.
I don't think it would break our protocol. It's designed to cope with arbitrary (and malicious) failures of nodes. Nodes can only join the quorum if they go through the random process, but nodes can leave the quorum whenever without breaking consensus. You don't need a minimum number of signatures to create a block. The algorithm insures that all remaining honest hosts will arrive at the same block during consensus, regardless of how many nodes have left (had failures) in the middle. Nodes who are joining can't jump in on the current block, they have to wait until they are confirmed by the network, and then they join at a designated block.