
Topic: CoinJoin: Bitcoin privacy for the real world - page 12. (Read 294672 times)

member
Activity: 114
Merit: 12

2/ An attacker has 2 objectives:
      - deanonymization of entities
      - determination of the links between input and output of transactions

These 2 objectives are not orthogonal. They're mutually reinforcing. Every piece of information gained for one can be used for the other.


I think defining it as a submodular function of those two concerns is a great starting point. Not a lot to be done on #1, at least on-blockchain, is there?

If we are just comparing between what we have now and what we could have, I think it's vastly superior. For now, at least in the US, the feds snoop on all our credit card purchases, without a warrant. Probably enough evidence for a warrant, but not for conviction of any particular crime.

Getting beyond that with a reasonable amount of effort will be the real challenge.

p.s. Anyone know of any good annotated datasets of transactions? Unsupervised learning is notoriously hard to interpret.
sr. member
Activity: 384
Merit: 258
The "entropy" will depend upon the model of attacker. Start by enumerating those.

My attempt to collect a few random thoughts on the subject (long post).

1/ The first studies in this field modeled the blockchain as a transaction graph and/or an address graph. More generally, the extended blockchain ecosystem can be modeled as a graph whose nodes are:
- Txo: associated with a given amount and controlled by a given script
- Entity (human / organizational / machine): controls txos by controlling the associated scripts (alone or with other entities)
- Tx: acts like a "micro-mixer" of txos (amounts of input txos are mixed/split and forwarded to output txos)
Note: the description is purposefully simplified, but it should be enough for the discussion.
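A minimal sketch of the three node types described above, assuming Python dataclasses; the class and field names (`Txo`, `Tx`, `Entity`, `script` kept as an opaque string) are hypothetical choices for illustration, not an existing library:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txo:
    txid: str     # id of the transaction that created this output
    index: int    # position in that transaction's output list
    amount: int   # value in satoshis
    script: str   # locking script, kept as an opaque string (simplification)


@dataclass
class Tx:
    txid: str
    inputs: list   # Txo objects being spent
    outputs: list  # Txo objects being created

    def fee(self) -> int:
        # Whatever is not forwarded to an output is left as the miner fee.
        return sum(t.amount for t in self.inputs) - sum(t.amount for t in self.outputs)


@dataclass
class Entity:
    name: str
    scripts: set = field(default_factory=set)  # scripts this entity (co-)controls

    def controls(self, txo: Txo) -> bool:
        return txo.script in self.scripts


alice = Entity("alice", {"script_a"})
coin = Txo("t0", 0, 50_000, "script_a")
pay = Tx("t1", inputs=[coin],
         outputs=[Txo("t1", 0, 30_000, "script_b"), Txo("t1", 1, 19_000, "script_a2")])
print(pay.fee())             # 1000
print(alice.controls(coin))  # True
```

Edges of the graph are implicit here: a Tx links the Txos it spends to the Txos it creates, and an Entity links to every Txo whose script it controls.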

2/ An attacker has 2 objectives:
      - deanonymization of entities
      - determination of the links between input and output of transactions

These 2 objectives are not orthogonal. They're mutually reinforcing. Every piece of information gained for one can be used for the other.


Deanonymization of entities

3/ It often starts with side-channel attacks:
- information gathering (bitcoin addresses, ids, emails, ...) from various sources (forums, social networks, databases managed by exchanges, merchants, ...). This information allows an attacker to associate deanonymized entities with a subset of the txos. Solutions like stealth addresses help address this issue.
- network eavesdropping (like the one described here). It's a two-step process (at least):
      a - association of an IP address with a tx
      b - association of an entity (person, ...) with the IP address
      c - for more complex scenarios (mixed or coinjoin txs), the IP address has to be associated with a subset of the input txos of the tx.

4/ As stated by gmaxwell, starting from the txos associated with deanonymized entities, attackers want to follow the deanonymized funds forwards or backwards and expand their knowledge recursively. This leads us to the second objective.
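To make "follow the funds forwards or backwards recursively" concrete, here is a naive sketch, assuming a toy representation where `txs` maps a txid to (input txo ids, output txo ids); the function name `expand` and the `max_hops` cut-off are hypothetical. It marks every txo reachable within a few hops, with no heuristics at all, so it deliberately over-approximates what an attacker can actually conclude:

```python
from collections import deque


def expand(txs, seeds, max_hops=2):
    """Breadth-first walk from known txos; returns {txo_id: hops from a seed}."""
    created_by, spent_by = {}, {}
    for txid, (ins, outs) in txs.items():
        for o in outs:
            created_by[o] = txid
        for i in ins:
            spent_by[i] = txid
    seen = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        txo = queue.popleft()
        hops = seen[txo]
        if hops == max_hops:
            continue
        neighbours = []
        if txo in spent_by:    # forwards: outputs of the tx spending this txo
            neighbours += txs[spent_by[txo]][1]
        if txo in created_by:  # backwards: inputs of the tx that created it
            neighbours += txs[created_by[txo]][0]
        for n in neighbours:
            if n not in seen:
                seen[n] = hops + 1
                queue.append(n)
    return seen


txs = {"t0": (["z"], ["a"]), "t1": (["a"], ["b", "c"]), "t2": (["b"], ["d"])}
print(expand(txs, ["b"]))  # {'b': 0, 'd': 1, 'a': 1, 'c': 2, 'z': 2}
```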


Determination of links between inputs and outputs of transactions

5/ I think the problem can (should?) be addressed probabilistically. An attacker doesn't need 100% certainty before deciding on an action; she just needs to be above a given confidence threshold. If much of the analysis can be automated, the attacker can study several alternative hypotheses and decide which one seems best.

6/ Taint analysis is the first tool available for this kind of analysis. Its result is 100% certain (it's just a basic "read" of the blockchain), but it yields limited insight.

7/ You can use heuristics to enrich the information provided by taint analysis. The first studies in the domain used very simple heuristics, like multi-input transactions and shadow addresses (see the paper "Evaluating User Privacy in Bitcoin"), yet gave quite good results. This privacy issue has been addressed by avoiding address reuse and by new proposals like mixers or coinjoin txs. You can also use "best-guess" heuristics, like the one used by blockchain.info to determine which output is the payment and which one is the change.
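A minimal sketch of the two simple heuristics mentioned above, assuming txs are given in chronological order as (input addresses, output addresses). The multi-input heuristic merges all inputs of a tx into one cluster via union-find; the change guess is a simplified take on the shadow-address idea (if exactly one output address has never been seen before, assume it is change). Function names and the exact change rule are illustrative assumptions, not the paper's precise method:

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(txs):
    """txs: chronological list of (input_addresses, output_addresses)."""
    uf = UnionFind()
    seen = set()
    for ins, outs in txs:
        seen.update(ins)
        # Multi-input heuristic: all inputs of one tx share an owner.
        for addr in ins:
            uf.union(ins[0], addr)
        # Simplified shadow-address guess: a single never-seen output is change.
        fresh = [a for a in outs if a not in seen]
        if ins and len(outs) > 1 and len(fresh) == 1:
            uf.union(ins[0], fresh[0])
        seen.update(outs)
    groups = {}
    for addr in seen:
        groups.setdefault(uf.find(addr), set()).add(addr)
    return sorted(groups.values(), key=lambda g: sorted(g))


txs = [(["X1"], ["M1", "X2"]),        # both outputs new: no change guess
       (["A1", "A2"], ["M1", "A3"])]  # M1 reused, so A3 is guessed as change
print(cluster(txs))  # A1, A2, A3 end up in one cluster
```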

8/ The more you know about entities, the more sophisticated the heuristics you can use. For example, an attacker can use specific knowledge (two people live in the same city and are occasional users of localbitcoin) to infer that an input of a coinjoin tx is linked to a specific output (if she already knows which input and output are controlled by these two people). The attacker doesn't have 100% certainty about this inference, but it could be a reasonable hypothesis.

9/ Coinjoin has been proposed as an additional solution to strengthen privacy. As already stated by gmaxwell, it does not provide 100% anonymity, but it can provide better privacy. With proper design it increases the cost of retrieving a given quantity of information (links between inputs and outputs). But, as stated above, coinjoin remains attackable if you have side-channel information.

Here's a "dumb" example. The attacker is an intel. agency with access to a huge amount of side-channel info. As an analyst at this agency, I'm investigating a man (let's call him Charlie) suspected of financing a terrorist attack through repeated small bitcoin transactions. Our hypothesis is that the funds are received by another person (Mr. X), suspected of selling the coins on localbitcoin to gather dollars which will be used to buy materials for the attack (I told you, it's a dumb scenario). Today, I want to analyze a given coinjoin transaction because I know that Charlie controls one of its inputs. Let's say that this coinjoin tx has 3 inputs and 3 outputs, all with the same amount. The agency has a mass surveillance program which tells me that:
- input A is controlled by a woman suffering from cancer (information retrieved from her facebook account)
- input B is controlled by Charlie
- input C is controlled by a teenager (boy - information retrieved from snapchat)
- output D is controlled by a small e-commerce website running on tor and selling weed
- output E is controlled by a porn website
- output F is controlled by an unknown entity, for now.

Depending on the additional information I can access, I will be able to build different hypotheses with more or less confidence:
Hyp1) The woman bought some weed to ease her pain and the teenager watched some porn. Charlie may have sent some coins to Mr. X, and I should investigate output F more deeply.
Hyp2) If I know that Charlie smokes weed, maybe this tx is just a false positive for my investigation => Charlie bought some weed. The woman doesn't mind a porn movie from time to time and the teenager... bought a video game.
Hyp3) [use your own imagination here]
...
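The hypotheses above can be weighed mechanically. A toy sketch, assuming the analyst turns side-channel knowledge into made-up plausibility scores for each "input paid output" pair (the numbers in `score` and the 0.7 threshold are purely illustrative): with equal amounts there are 3! = 6 possible mappings, each weighted by the product of its pair scores and then normalised.

```python
from itertools import permutations

inputs = ["A", "B", "C"]   # A: woman, B: Charlie, C: teenager
outputs = ["D", "E", "F"]  # D: weed shop, E: porn site, F: unknown

# Illustrative, invented plausibility scores for "input i paid output o".
score = {
    ("A", "D"): 3, ("A", "E"): 1, ("A", "F"): 1,
    ("B", "D"): 1, ("B", "E"): 1, ("B", "F"): 3,
    ("C", "D"): 1, ("C", "E"): 3, ("C", "F"): 1,
}


def posteriors():
    """Weight each equal-amount mapping and normalise the weights to probabilities."""
    weights = {}
    for perm in permutations(outputs):
        mapping = tuple(zip(inputs, perm))
        w = 1
        for pair in mapping:
            w *= score[pair]
        weights[mapping] = w
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def link_probability(i, o):
    """Probability mass of all mappings in which input i pays output o."""
    return sum(p for m, p in posteriors().items() if (i, o) in m)


THRESHOLD = 0.7  # hypothetical confidence the analyst needs before acting
p = link_probability("B", "F")
print(round(p, 3), p >= THRESHOLD)  # 0.789 True
```

With all scores set to 1, every mapping gets 1/6 and P(B pays F) drops to 1/3: the coinjoin alone hides the link, and it is the side-channel scores that push Hyp1 above the threshold.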

I think you get the idea. This thought experiment illustrates a few interesting facts:
- to be effective, this kind of attack requires side-channel information. The more information you have, the more effective you are.
- let's set aside the "paranoid" scenario in which intel. agencies do mass surveillance just because they can, or because they pursue some nasty goals. This experiment shows that to do their job (investigating potentially serious threats), intel. agencies "have to" break the privacy of all users (I'm not arguing whether that's good or bad, I'm just drawing a logical conclusion, so please don't yell at me).
- to strengthen his privacy, Charlie could chain several coinjoin txs, but this solution has a major drawback: if coinjoin txs are rare among "normal" users, chaining several txs becomes a real red flag saying "Hey! I'm preparing (doing) something illegal".
- as proposed by other people, systematic coinjoin txs for all users would produce a combinatorial explosion. It's not perfect anonymity, but it would very significantly raise the cost of this kind of attack.
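A rough illustration of the "combinatorial explosion" point, assuming equal-amount joins: an n-party join has n! input-to-output assignments (log2(n!) bits), and under an idealised model where each round mixes with n-1 fresh participants and nothing leaks, a coin's possible origins multiply by n per round. The function names and the idealised chaining model are assumptions for illustration only:

```python
import math


def mappings(n):
    """Input->output assignments for an n-party join with equal amounts."""
    return math.factorial(n)


def bits(n):
    """The same count expressed in bits."""
    return math.log2(math.factorial(n))


def anonymity_set(n, rounds):
    """Idealised chaining: fresh participants every round, no overlap, no leaks."""
    return n ** rounds


for n in (2, 3, 5, 10):
    print(n, mappings(n), round(bits(n), 2), anonymity_set(n, 3))
```

Real joins fall short of this whenever amounts differ, participants overlap, or side channels prune mappings, which is exactly what the Charlie example exploits.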


Automated analysis (machine learning algorithms, ...)

10/ To my knowledge, few studies have used this kind of tool so far, but it could be very effective (see the paper "Unsupervised Approaches to Detecting Anomalous Behavior in the Bitcoin Transaction Network"). By trying to detect repeated patterns in the blockchain, this approach helps to infer additional information. The rationale is simple: some activities are associated with very specific patterns. For example, it's likely that transactions in which mining pools pay miners or gambling sites pay players have very specific characteristics.

Another example is provided by MTGox's source code, which was leaked a few months ago. It was quickly identified that there was an automatic process in place to split/merge amounts in their hot wallet. This knowledge was later used to detect that some funds were transferred according to this pattern (a few days before MTGox officially stated that some funds had been recovered). In this case the pattern was provided by a leak, but you get the idea of how this kind of information can be used to reach the attacker's 2 objectives.

11/ Patterns corresponding to automated (and non-random) processes should be the easiest to detect, but it does not seem impossible that human operations could also be detected by studying temporal or behavioral patterns. For example, if you live in Europe, you likely send your txs at very different GMT hours compared to American or Asian users.
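A toy sketch of the temporal idea, assuming the attacker has the UTC hours at which a cluster's txs appeared and assumes (hypothetically) that the user's activity is centred on 14:00 local time. It takes the circular mean of the hours (so 23:00 and 01:00 average to midnight, not noon) and converts the gap into a whole-hour UTC offset. The peak hour and function names are illustrative assumptions, not an established method:

```python
import math


def circular_mean_hour(hours):
    """Mean of clock hours on a 24h circle."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24


def guess_utc_offset(utc_hours, assumed_local_peak=14.0):
    """Offset (hours) that maps the activity centre onto the assumed local peak."""
    offset = assumed_local_peak - circular_mean_hour(utc_hours)
    offset = (offset + 12) % 24 - 12  # normalise into [-12, +12)
    return round(offset)


print(guess_utc_offset([9, 11, 13, 15, 17]))   # 1  (Europe-like)
print(guess_utc_offset([15, 17, 19, 21, 23]))  # -5 (US east coast-like)
print(guess_utc_offset([2, 4, 6, 8, 10]))      # 8  (Asia-like)
```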


Conclusion

12/ The current model provided by bitcoin (or by bitcoin with occasional coinjoin txs) is ultimately very close to the current situation of the interweb. Thus, it's quite straightforward to deduce the different levels of attacks and attackers:
- Group A : Intel. agencies
They have access to a massive amount of side-channel information already gathered from various sources. They will get the best results. Since not all side-channel information is obtained by legal (official) means, this information doesn't always allow an official (legal) reaction.
- Group B : Large corporations
They have access to some side-channel information by providing paid services that require users to supply personal information. They'll be able to extract additional information by merging these data with data from the blockchain, but at a lower level than group A. They'll have an incentive to monetize this information by selling it to other entities (marketing, ads, ...).
- Group C : Basically the rest of the world
We have occasional access to some side-channel information through our direct interactions with other bitcoiners. Individually, it'll be difficult to extract significant information by merging these data with data from the blockchain.

13/ A bitcoin bank located on an exotic island and providing off-chain transactions could remain the best solution for those wanting to hide illegal activities like money laundering.

14/ In this matter, I would say that the current bitcoin model doesn't change much compared to the current status quo (even if bitcoin remains a great technology and a great innovation).

Disclosure: English is not my mother tongue.
hero member
Activity: 658
Merit: 500
I think gmaxwell should clarify that the bitcoin coinjoin model is centralised whereas Darkcoin has decentralised coinjoin.
I think you should keep your garbage pump and dump crap out of this thread, put it someplace people won't annoy me by reporting it.

The things I described above in this thread can be implemented in a decentralized manner, as is described in some depth in post five. What darkcoin does doesn't sound decentralized at all (it depends on selected servers), but who's to say? Last I checked, the software was both closed source and not even working. When darkcoin was announced, however, what it claimed to be implementing was coinjoin.

Quote
looking like they are stalling
Bitcoin is openly developed software, anyone who wants to work on it can contribute to it, and last I checked none of the people who have ever worked on it are on your payroll. If you're honestly concerned about privacy in Bitcoin you could do some things to help improve it. Pumping some sketchy altcoin in the wrong sub-forum, however, is not going to help, nor is attacking people who have no responsibility to serve your interests.

Sorry Greg.
staff
Activity: 4284
Merit: 8808
I think gmaxwell should clarify that the bitcoin coinjoin model is centralised whereas Darkcoin has decentralised coinjoin.
I think you should keep your garbage pump and dump crap out of this thread, put it someplace people won't annoy me by reporting it.

The things I described above in this thread can be implemented in a decentralized manner, as is described in some depth in post five. What darkcoin does doesn't sound decentralized at all (it depends on selected servers), but who's to say? Last I checked, the software was both closed source and not even working. When darkcoin was announced, however, what it claimed to be implementing was coinjoin.

Quote
looking like they are stalling
Bitcoin is openly developed software, anyone who wants to work on it can contribute to it, and last I checked none of the people who have ever worked on it are on your payroll. If you're honestly concerned about privacy in Bitcoin you could do some things to help improve it. Pumping some sketchy altcoin in the wrong sub-forum, however, is not going to help, nor is attacking people who have no responsibility to serve your interests.

For some context for those confused about where this little OT tangent came from, someone wrote a fairly scathing analysis of DarkCoin, basically making a case that it's a substanceless effort promoted by misleading marketing. Unfortunately, in making their argument they linked back here... drawing along some vigilant defenders. I'll continue deleting any more darkcoin posts that show up.
sr. member
Activity: 448
Merit: 250
I think gmaxwell should clarify that the bitcoin coinjoin model is centralised whereas Darkcoin has decentralised coinjoin. So, I can't see that it is pointless. His support for ring signatures is academic in the sense of "nice in theory but frankly not workable in real life". Those who have experience with ring sigs know how buggy and alpha the software is.

Gmaxwell and his bitcoin devs should also realise that the IRS has already mapped out all significant bitcoin addresses to social security numbers, whilst they debate the alpha tech of ring sigs yet do nothing to fix the privacy issue. The bitcoin dev team are looking like they are stalling on privacy. It's time they do something about it instead of opining this and that.
legendary
Activity: 1400
Merit: 1013
Andytoshi and I spent some time trying to formalize a notion of "coinjoin entropy", i.e. how many mappings of inputs to outputs are possible given the values. One result of that discussion was the realization that if you allow for the possibility that coinjoin participants might also be paying each other, then basically all coinjoins have perfect entropy, because there is some payment matrix that permits any of the output parties to be any of the input parties.

We didn't actually solve the entropy question for the non-concurrent payment case, it's an interesting question.

I'm ready to accept that CoinJoin can be viable with outputs of different sizes.

Next question: what about information leaked via script types?

I think we can assume that in the near future there will be several types of scripts in common use: P2PKH, multisig, and stealth. We can also assume that the frequency with which these script types are used will vary between different classes of users (merchants vs customers).

Even if the amounts associated with inputs and outputs don't make the join trivially reversible, how do you stop the scripts from acting as a side channel that can deliver enough information to make the join reversible?
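A small sketch of how script types can act as that side channel, assuming equal amounts and a hypothetical attacker rule that each participant's output uses the same script type as their input. Counting the permutations that survive the rule shows the mapping space shrinking from 3! = 6 toward a single, fully reversed join:

```python
from itertools import permutations


def consistent_mappings(input_types, output_types):
    """Count equal-amount input->output assignments in which every input's
    script type matches its output's type (the assumed attacker rule)."""
    n = len(input_types)
    count = 0
    for perm in permutations(range(n)):
        if all(input_types[i] == output_types[perm[i]] for i in range(n)):
            count += 1
    return count


print(consistent_mappings(["p2pkh"] * 3, ["p2pkh"] * 3))              # 6
print(consistent_mappings(["p2pkh", "p2pkh", "multisig"],
                          ["multisig", "p2pkh", "p2pkh"]))            # 2
print(consistent_mappings(["p2pkh", "multisig", "stealth"],
                          ["stealth", "p2pkh", "multisig"]))          # 1
```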
staff
Activity: 4284
Merit: 8808
The "entropy" will depend upon the model of attacker. Start by enumerating those.
The attacker knows everything in the blockchain. The attacker knows the identity of the payer or payee of some small number of transactions. The attacker wants to follow these identified funds forwards or backwards and expand their knowledge recursively. The CJ users want the attacker's analysis to fail, for themselves (most importantly) and for third parties.

I think of two main attack objectives. One where the attacker is trying to identify a single user, and success or failure depends on how persuasive the evidence the attacker can extract for that single user is. And one where the attacker is trying to broadly deanonymize everyone in order to feed larger-scale analysis. For this latter attack the defender is successful if they're able to increase the noise level of the analysis by a non-trivial amount at low cost to themselves, i.e. success in this latter case is completely continuous.

I outlined some more specific attack objectives in the original post— things like people you do business with being able to determine your income, net worth, supplies costs, or prices.
legendary
Activity: 1120
Merit: 1164
Academic discussions aside, what has actually been implemented in Dark Wallet is that you have two classes of users: people who want to send money now, and people who have coins that they want to mix. The latter don't have any particular requirements on the exact amounts they want mixed, so they copy the amounts the former are sending, guaranteeing that at least two outputs are identical and have two possible senders. Blockchain.info's CoinJoin implementation is similar, with blockchain.info maintaining a pool of funds that is used to copy exact output amounts.

Future implementations can and will improve on these concepts, but again, what's implemented now takes output indistinguishability into account and already provides reasonably good privacy.
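A toy model of the amount-copying scheme described above, assuming a two-party join with fees ignored; the labels (`payee`, `maker_mixed`, ...) and function names are hypothetical. The "maker" mirrors the "taker's" payment amount, so the payment-sized outputs become ambiguous:

```python
def build_join(taker_input, payment, maker_input):
    """Two-party join where the maker copies the taker's payment amount.
    Amounts in satoshis; fees ignored for simplicity."""
    if taker_input < payment or maker_input < payment:
        raise ValueError("both sides need at least `payment` satoshis")
    outputs = [
        ("payee", payment),
        ("maker_mixed", payment),
        ("taker_change", taker_input - payment),
        ("maker_change", maker_input - payment),
    ]
    return [taker_input, maker_input], outputs


def ambiguous_outputs(outputs):
    """Group output labels by amount; amounts shared by >1 output are ambiguous."""
    by_amount = {}
    for label, amount in outputs:
        by_amount.setdefault(amount, []).append(label)
    return {a: labels for a, labels in by_amount.items() if len(labels) > 1}


ins, outs = build_join(150_000, 100_000, 230_000)
print(ambiguous_outputs(outs))  # {100000: ['payee', 'maker_mixed']}
```

In this toy the two change outputs still have unique amounts, so only the payment-sized pair gains the two-sender ambiguity.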
member
Activity: 114
Merit: 12
Andytoshi and I spent some time trying to formalize a notion of "coinjoin entropy", i.e. how many mappings of inputs to outputs are possible given the values. One result of that discussion was the realization that if you allow for the possibility that coinjoin participants might also be paying each other, then basically all coinjoins have perfect entropy, because there is some payment matrix that permits any of the output parties to be any of the input parties.

We didn't actually solve the entropy question for the non-concurrent payment case, it's an interesting question.

The "entropy" will depend upon the model of attacker. Start by enumerating those.

legendary
Activity: 1400
Merit: 1013
since there is no such thing as "a Bitcoin", it is impossible to make a deterministic linkage between inputs and outputs.
That is a true, but useless, statement.

As I mentioned before, mass surveillance doesn't require 100% accuracy.
sr. member
Activity: 469
Merit: 253
Andytoshi and I spent some time trying to formalize a notion of "coinjoin entropy", i.e. how many mappings of inputs to outputs are possible given the values. One result of that discussion was the realization that if you allow for the possibility that coinjoin participants might also be paying each other, then basically all coinjoins have perfect entropy, because there is some payment matrix that permits any of the output parties to be any of the input parties.

Sound logic, but it's not even necessary - since there is no such thing as "a Bitcoin", it is impossible to make a deterministic linkage between inputs and outputs. If the system was designed to actually transfer distinguishable units ('things') from one account to another, then the many-many mapping of txs wouldn't actually make sense (because somewhere in the guts of the code it would be deciding which bill with which serial number goes to which output, which would mean that de facto it was a set of one-one txs in disguise).
sr. member
Activity: 476
Merit: 251
Incorrect. People have speculated that an attacker powered by non-public mathematical breakthroughs could potentially select parameters so as to make a system weaker against publicly unknown attacks that require specific, improbable curve characteristics. This is pure conjecture, however: though it's something prudent to be cautious about, it is not a known backdoor vector by itself. The process for selecting curve parameters already excludes known classes of bad curves; if we knew about any other classes we'd exclude those too. In the case of Bitcoin, our parameters were selected in a way where performance considerations removed their degrees of freedom (as the ed25519 parameters were), and they are all explicable from first principles. In fact, they have an additional property: even if you drop some of the performance characteristics and increment from the smallest possible parameter, the first curve of prime order you find is the one Bitcoin uses. Other cryptosystems also use nothing-up-my-sleeve numbers, where an abundance of caution demands that the designers pick them in a way that limits their degrees of freedom, even though no one knows of a way such control could be used secretly to do something bad. It's just good practice, not a closed backdoor.

In the case of the GGPR'12-based SNARKs there is no comparable way to generate the parameters: creating the prover/verification keys requires computation using secret values which, if known, completely compromise the soundness of the proofs. Here the backdoor is very concrete, not theoretical: if you know just a couple of the secrets you can do a few multiplications and produce a false proof. Worse, there is no known way to use a nothing-up-my-sleeve number to pick the parameters so as to convince people that no one could know the secrets. The best you can do is use a process, like the CA system does (but potentially far better), to convince people of security. This isn't insurmountable for _many_ applications, but it is not at all comparable to EC curve parameters: there is nothing in curve parameter selection that looks like a magic number where, if the attacker knows it, all is lost. As far as I know there is nothing like this in widespread use. The nearest parallel I can think of is the backdoor in DUAL_EC DRBG, though in that case the backdoor was a "surprise" and there was no process to prove that the potential backdoor wasn't weaponized (because it was), which certainly makes it more concerning. There are a fair number of proposed _theoretical_ cryptosystems with similar assumptions (e.g. any of the neat uses of obfuscation involve the obfuscation being established by a trusted party), but I'm not aware of these systems being put into production. One reason may be that theoreticians find trusted initialization more acceptable than practitioners do: the theoreticians just posit "A spherically honest cow in random-oracle derived motion faithfully creates the parameters", while the practitioners are the ones who have to figure out how to approximate the spherical cow using three chickens, a reed-solomon code, and a priest.
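The quoted argument says Bitcoin's curve parameters are explicable from first principles. A minimal sketch that only sanity-checks the published secp256k1 constants (SEC 2): the field prime has the special form 2^256 - 2^32 - 977 (which is what allows fast modular reduction), the curve is y^2 = x^3 + 7, the generator lies on it, and the group order is (probably) prime. It does not reproduce the "first prime-order curve" search, which would need elliptic-curve point counting:

```python
import random

# Published secp256k1 domain parameters: y^2 = x^3 + 7 over F_P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
A, B = 0, 7


def is_probable_prime(n, rounds=32):
    """Miller-Rabin with random bases (never rejects a true prime)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: n is composite
    return True


def on_curve(x, y):
    return (y * y - (x * x * x + A * x + B)) % P == 0


print(hex(P))
print(is_probable_prime(P), is_probable_prime(N), on_curve(GX, GY))  # True True True
```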

Thank you for correcting my misleading statement. There definitely are significant trust differences between the two processes. Your explanation will come in handy as more discussions about Zerocash surface.

Quote from: gmaxwell
I don't think there was anything hostile there, I've spent time with the developers of it and I think they're great guys, ... and I don't know any other developers who have been hostile about it either, so I'd really like to know what you're talking about.

I don't think that you in particular are hostile to anonymity nor am I suggesting that any of the Bitcoin developers have personal issues with any of the Zerocash developers. But I also know that there are many agendas at play in the Bitcoin world and not all of them include the type of anonymity that Zerocash provides.

I still think that there will have to be a public conversation about Zerocash, particularly after its developers have revealed how they plan on instilling confidence in their parameters, but I will save it for another thread.
sr. member
Activity: 384
Merit: 258
well, you could plan to sell a closed product to some agencies like Fincen, banks... but it sounds like a small market
I'm sure there's plenty of money available to fund that.

The question is whether or not you can get funding to produce the tools that let the general public protect themselves from it.

Agree with that.

Unfortunately, I fear there's an asymmetry which isn't in favor of this kind of project for now.

Projects are funded by entities which have some interest in them, and projects "tracking" users are more appealing to corporations and VCs since there's a big data trend in the corporate world with an expected financial ROI. I guess it's not a political or ideological choice. They just do what they're supposed to do: business.

On the other hand, privacy-friendly projects don't have a clear financial ROI. For now.
Funding this kind of project is more a "militant" choice than a financial one. For now.
Moreover, Bitcoin is still presented by a majority of mainstream media as an anonymous and shadowy financial system, which is far from reality and doesn't help people understand the challenges at stake. I fear it's not specific to Bitcoin and that very few people are really aware of the challenges posed by technologies like the internet or cryptocurrencies in terms of privacy.

Dark activities have nothing to do with that (sorry, Eric Schmidt). We need to think about a new model of society, an interconnected one, in which a resource (data) has massively increased in value without any market mechanism setting this value. For now.

Sorry for this "philosophical" off-topic (barely checked with google translate - a "free" product provided by Google Inc. -)
legendary
Activity: 1400
Merit: 1013
One result of that discussion was the realization that if you allow for the possibility that coinjoin participants might also be paying each other, then basically all coinjoins have perfect entropy, because there is some payment matrix that permits any of the output parties to be any of the input parties.
So the Payment Protocol should be CoinJoin.
legendary
Activity: 1400
Merit: 1013
well, you could plan to sell a closed product to some agencies like Fincen, banks... but it sounds like a small market
I'm sure there's plenty of money available to fund that.

The question is whether or not you can get funding to produce the tools that let the general public protect themselves from it.
sr. member
Activity: 384
Merit: 258
The problem is that those tools are all being developed behind closed doors, which gives the attackers an advantage. Too many people think a low taint score displayed on blockchain.info provides them with meaningful protection.

I'll predict though that for all this investment money flowing into Bitcoin startups, not one VC is going to fund investment into open source graph analysis tools that could help individuals measure their privacy and warn them before they do something that will compromise it.
100% agree.
I'm interested in this subject and I thought about the opportunity to build such a tool for the community since data and technology are available.

But I came to the following conclusions:
- it seems very difficult to fund a company like this one (well, you could plan to sell a closed product to some agencies like Fincen, banks... but it sounds like a small market)
- human analysts are still required and I'm not sure that many people in the community are ready to invest a lot of time as unpaid volunteers. So it seems difficult to build a sustainable community around an open platform like this one.

BTW, I would be glad to be proven wrong and be funded to build a tool like this one.


staff
Activity: 4284
Merit: 8808
Andytoshi and I spent some time trying to formalize a notion of "coinjoin entropy", i.e. how many mappings of inputs to outputs are possible given the values. One result of that discussion was the realization that if you allow for the possibility that coinjoin participants might also be paying each other, then basically all coinjoins have perfect entropy, because there is some payment matrix that permits any of the output parties to be any of the input parties.

We didn't actually solve the entropy question for the non-concurrent payment case, it's an interesting question.
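One naive way to make "how many mappings of inputs to outputs are possible given the values" concrete is brute force: count the ways to split the inputs and outputs into matched groups with equal sums, and take log2 of the count as "entropy". A sketch under loud assumptions: zero fees (real joins need a tolerance), toy sizes only (the search is exponential), and the trivial "everyone is one group" mapping always counts. This is an illustration, not the formalization discussed above:

```python
import math
from itertools import combinations


def count_mappings(inputs, outputs):
    """Count partitions of inputs/outputs into matched groups with equal sums."""
    def rec(ins, outs):
        if not ins:
            return 1 if not outs else 0
        first, rest = ins[0], ins[1:]
        total = 0
        # Choose the group of inputs containing `first`...
        for r in range(len(rest) + 1):
            for extra in combinations(rest, r):
                group_sum = inputs[first] + sum(inputs[i] for i in extra)
                remaining_ins = tuple(i for i in rest if i not in extra)
                # ...and a non-empty set of outputs it could have paid.
                for k in range(1, len(outs) + 1):
                    for outgroup in combinations(outs, k):
                        if sum(outputs[o] for o in outgroup) == group_sum:
                            remaining_outs = tuple(o for o in outs if o not in outgroup)
                            total += rec(remaining_ins, remaining_outs)
        return total
    return rec(tuple(range(len(inputs))), tuple(range(len(outputs))))


def entropy_bits(inputs, outputs):
    return math.log2(count_mappings(inputs, outputs))


print(count_mappings([5], [3, 2]))              # 1: an ordinary payment
print(count_mappings([5, 7], [3, 2, 4, 3]))     # 3
print(count_mappings([1, 1, 1], [1, 1, 1]))     # 16
```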
legendary
Activity: 1400
Merit: 1013
Moreover, so far the main heuristics used for analysis of the transaction graph are "quite simple" (tainting, "address reuse" and "change address" patterns), and I guess this is just the first generation (at least of those publicized). I'm quite sure that more elaborate tools (temporal patterns, behavioral patterns) combined with side-channel attacks (data gathered from peripheral systems like exchanges, merchants, ...) could provide stunning results.
The problem is that those tools are all being developed behind closed doors, which gives the attackers an advantage. Too many people think a low taint score displayed on blockchain.info provides them with meaningful protection.

I'll predict though that for all this investment money flowing into Bitcoin startups, not one VC is going to fund investment into open source graph analysis tools that could help individuals measure their privacy and warn them before they do something that will compromise it.
sr. member
Activity: 384
Merit: 258
Follow-up to the discussion from this thread

A careless join is one in which the correct solution is obviously the most plausible. Note that evaluating different solutions to the matrix lends itself well to parallel processing, there is no time limit since everything is permanently recorded in the blockchain, and mass surveillance doesn't require 100% accuracy.

Probably the only way to make CoinJoin useful in real-world situations is to build into the clients the exact same type of analysis tools attackers will use to reverse the joins so that clients can evaluate any proposed join prior to agreeing to it. (That's a good subject for the other thread).
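A minimal sketch of the client-side check suggested above, assuming zero fees and a hypothetical acceptance rule: before signing, the client asks how many distinct sets of outputs sum exactly to the value of its own inputs. If only one does, the "correct solution" is obvious and the join is careless. Function names and `min_candidates` are illustrative:

```python
from itertools import combinations


def matching_output_sets(outputs, target):
    """All non-empty index subsets of `outputs` whose values sum to `target`."""
    return [
        combo
        for r in range(1, len(outputs) + 1)
        for combo in combinations(range(len(outputs)), r)
        if sum(outputs[i] for i in combo) == target
    ]


def acceptable_join(inputs, outputs, my_inputs, min_candidates=2):
    """Accept only if my inputs' value could have gone to several output sets."""
    target = sum(inputs[i] for i in my_inputs)
    candidates = matching_output_sets(outputs, target)
    return len(candidates) >= min_candidates, candidates


print(acceptable_join([40, 60], [30, 10, 30, 30], [0]))  # (True, [(0, 1), (1, 2), (1, 3)])
print(acceptable_join([40, 60], [25, 15, 30, 30], [0]))  # (False, [(0, 1)])
```

Equal-sized outputs are what create the alternatives in the first case; in the second, 25 + 15 is the only way to reach 40, so the join would reveal the link.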

I like your remark about the time limit and 100% accuracy. It reminds me of an interview with a computer scientist working for a US intel agency. He was explaining how they were building a different paradigm for search: sometimes you ask a question and you don't get an answer because it's just too early. If you don't ask again later, you'll never know the answer. But if you leave the question pending until the "right" data is gathered and aggregated, then you start to get something very efficient.

You can apply the same principle to transaction graph analysis. Even if you're very careful about the privacy of your transactions, a one-time coinjoin is not enough if other participants are less careful and can be deanonymized later.
For now, systematic coinjoin (as proposed by DarkWallet) seems the most serious option to enhance privacy, since it would create a combinatorial explosion and significantly raise the cost of deanonymisation. But I guess it also comes with a lot of other issues to solve (required delays, constraints on amounts, ...).

Moreover, so far the main heuristics used for analysis of the transaction graph are "quite simple" (tainting, "address reuse" and "change address" patterns), and I guess this is just the first generation (at least of those publicized). I'm quite sure that more elaborate tools (temporal patterns, behavioral patterns) combined with side-channel attacks (data gathered from peripheral systems like exchanges, merchants, ...) could provide stunning results.
newbie
Activity: 13
Merit: 0
Interesting read. Perhaps someone can make a test QT soon.