Based on these numbers (despite not factoring in sybil inputs), it seems clear that a high level of anonymity can be achieved by increasing the number of pooling stages to 10+, even if the attacker controls > 50% of nodes.
It depends, because 50% means your anonymity set is reduced by 50% on each round, as I explained in my other post above.
Example: if you are mixed with 10 others on each round and the attacker controls half of them, then only 5 will be anonymous (and one of the five might be you), so you have a 50% + 20% (1 in 5) = 70% chance of being non-anonymous per round. You will need more rounds or larger mix sizes.
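The back-of-envelope arithmetic above can be sketched in Python. Assumptions (mine, not from the thread): attacker-controlled peers simply drop out of the anonymity set, rounds are independent, and `per_round_exposure` is a hypothetical helper that reproduces the post's figures rather than a rigorous model.

```python
def per_round_exposure(peers: int, attacker_fraction: float) -> float:
    """Reproduce the post's per-round figure: attacker-controlled peers
    drop out of the anonymity set, then the adversary has a 1-in-k guess
    among the honest remainder."""
    honest = peers * (1 - attacker_fraction)   # 10 * 0.5 = 5 honest peers
    return attacker_fraction + 1 / honest      # 0.5 + 0.2 = 0.7

# Per-round chance of being non-anonymous, as in the example above
print(per_round_exposure(10, 0.5))             # 0.7

# Under the (strong) assumption that rounds are independent, the chance
# of being traced through every one of 10 rounds falls to roughly 3%,
# consistent with the earlier claim about using 10+ pooling stages.
print(per_round_exposure(10, 0.5) ** 10)       # ≈ 0.028
```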
This is actually not correct. A distinction needs to be made between the risk of being completely unmasked and the reduction in the size of the anonymity set in a pool.
As an illustration: say we have a ballot with only two voters. We would know with 50% certainty the identity behind each vote. The anonymity set is small, but the vote is still anonymous. The lack of certainty represents a break in the causal chain. This is important for various reasons, but it doesn't diminish the importance of having a large pool of anonymous identities (likewise for various reasons). So for strong anonymity we need some level of certainty of not being unmasked completely AND a sufficiently large pool of anonymous users.
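The two-voter illustration can be made concrete with the standard entropy-based "degree of anonymity" metric from the anonymity literature (my addition, not something from this thread): anonymity within a set is maximal when every member is equally likely.

```python
import math

def degree_of_anonymity(probs):
    """Entropy-based degree of anonymity: 1.0 means the subject is
    perfectly hidden within the set, 0.0 means fully unmasked.
    (A standard metric from the literature, not from this thread.)"""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    h_max = math.log2(len(probs))
    return h / h_max if h_max > 0 else 0.0

# Two-voter ballot: tiny anonymity set, yet maximal anonymity within it
print(degree_of_anonymity([0.5, 0.5]))            # 1.0
# Ten suspects, but one is 90% likely: larger set, much weaker anonymity
print(degree_of_anonymity([0.9] + [0.1 / 9] * 9))
```

This separates the two quantities in the post: the size of the set (here 2 vs 10) and the certainty of unmasking (the probability distribution over its members).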
I posit that the distinction is meaningless, because at economies-of-scale the outcomes are pushed out to the edges of the causality graph. At that scale the adversary doesn't have perfect identity data; rather, the NSA has statistically overlapping data sets (e.g. Tor breaks, browser fingerprints, etc.) that, when correlated, generate identities. The NSA is not just targeting a few millionaires to know where all the wealth is being stored (so the G20 can confiscate it after 2016 as the world descends into a nightmare debt collapse); rather, they are saving everything in Utah and targeting all the millionaires.
Anonymity is never an all-or-nothing proposition; rather, there are degrees of anonymity. That is why the distinction I made between privacy and anonymity upthread has blended and disappeared as we have discussed Darksend more. (That was your point too.)
Also you have to factor in Tor's own deanonymization rate, and the inputs that didn't use Tor at all are not anonymous. This reduces your anonymity set even if you use Tor yourself.
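That shrinkage can be sketched with a hypothetical helper (the parameter values below are illustrative, not measured rates):

```python
def surviving_anonymity_set(mix_size: int, tor_fraction: float,
                            tor_failure_rate: float) -> float:
    """Illustrative model: peers that skipped Tor are identified by IP
    outright, and some fraction of Tor users are deanonymized by
    Tor-level attacks; only the rest still hide you."""
    return mix_size * tor_fraction * (1 - tor_failure_rate)

# e.g. a 10-peer mix where 60% used Tor and 10% of Tor circuits fail:
print(surviving_anonymity_set(10, 0.6, 0.1))   # ≈ 5.4 effective peers
```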
This is important and I don't think the ramifications of IP addresses unmasking anonymity have been adequately discussed here yet.
What would be required to unmask an otherwise anonymous darksend transaction if the IP addresses were available at each of the compromised nodes?
I surmise that what you mean is: if a Darksend does not pass through a compromised Masternode, how can interception of the IP address by a Tor node impact the anonymity of a Darksend. Correct?
If so, then my analysis is that if you see the same IP address sending the input and signing the outputs, you still don't know which output it was, because the output signing is blinded cryptographically. But it depends on how the outputs are collected. If each output is first sent from its IP and then signed separately after collection, the output can be correlated to the IP. But if the outputs are blind-signed as they are collected, using ring signatures, then knowing the IP doesn't help the adversary.
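The blinding step described above can be sketched with a toy Chaum-style RSA blind signature. This is my illustration, not Darksend's actual construction, and the parameters are tiny and insecure; the point is only that the signer never sees the unblinded output, so its IP log cannot link IPs to outputs.

```python
import math
import random

# Tiny, insecure RSA parameters, for illustration only
p, q = 61, 53
n = p * q                         # modulus 3233
e = 17                            # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)               # private exponent (Python 3.8+)

def blind(m: int, n: int, e: int):
    """Client blinds its message with a random factor r."""
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            return (m * pow(r, e, n)) % n, r

def unblind(sig_blind: int, r: int, n: int) -> int:
    """Client strips the blinding factor from the signature."""
    return (sig_blind * pow(r, -1, n)) % n

m = 1234                          # stand-in for an output commitment
m_blind, r = blind(m, n, e)
sig_blind = pow(m_blind, d, n)    # signer sees only the blinded value
sig = unblind(sig_blind, r, n)

assert sig == pow(m, d, n)        # valid signature on the original m
assert pow(sig, e, n) == m        # anyone can verify it
```

A compromised collector running this scheme logs only `(IP, m_blind)` pairs, and `m_blind` is statistically independent of `m`, so the IP-to-output correlation described above fails.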
So we need to ask Evan whether he is using ring signatures.
However, even if he is using ring signatures, there is another way that interception of the IP can break anonymity.
When you spend the output of a Darksend, your IP can correlate your identity with that of the input, and thus anonymity is broken.
So yes, not obfuscating the IP breaks the anonymity of the Darksend.
Also there is another way to break the anonymity of the Darksend. If I merge two or more outputs of Darksends to form the inputs spent in a transaction, then I have revealed that those outputs share one identity (since that transaction will look different from a Darksend mix transaction, which has a constant denomination and a matching number of inputs and outputs).
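The merge heuristic above can be sketched with a toy transaction model (my simplification; field names and structure are hypothetical, not Dash's actual data structures):

```python
def looks_like_mix(tx: dict) -> bool:
    """Heuristic from the post: a Darksend mix transaction has a
    matching number of inputs and outputs, all at one denomination."""
    amounts = {out["amount"] for out in tx["outputs"]}
    return len(tx["inputs"]) == len(tx["outputs"]) and len(amounts) == 1

def links_outputs(tx: dict) -> bool:
    """Spending two or more mix outputs in one ordinary transaction
    marks those outputs as belonging to the same identity."""
    return not looks_like_mix(tx) and len(tx["inputs"]) > 1

mix_tx   = {"inputs": ["a", "b", "c"],
            "outputs": [{"amount": 10}, {"amount": 10}, {"amount": 10}]}
spend_tx = {"inputs": ["mix_out_1", "mix_out_2"],
            "outputs": [{"amount": 17}]}

print(looks_like_mix(mix_tx))    # True: uniform denomination, 3-in/3-out
print(links_outputs(spend_tx))   # True: two mix outputs merged by one spender
```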