
Topic: [ANN] Scalable Bitcoin Mixing on Unequal Inputs (Read 8001 times)

legendary
Activity: 1456
Merit: 1000
A viable trust still persists in the use of Zerocash, so I suggest you check it out.

I don't really understand this post.

Are you saying that the initial release requires an element of trust at launch?

Or that it's looking likely to be closed source for a long while, even after they have finished development?

Or that the project is funded by government / military money?

Anyway, back on the main OP topic. Nice paper and thinking. I'll be keeping up with your progress.
legendary
Activity: 1176
Merit: 1015
Also have a look at ring signatures if you're interested in anonymity.
sr. member
Activity: 434
Merit: 250
A viable trust still persists in the use of Zerocash, so I suggest you check it out.
sr. member
Activity: 384
Merit: 258
The intent was that mixes have a one-to-one correspondence to simple cycles. So, examples 1a and 1b induce two mixes, while 1c induces three.
After thinking over the threat model, I'm tending to agree more with laurentmt's point, and my answer to your question is this:
Combining node-disjoint cycles of the same amount is not only a viable variation, but also has the advantage of a larger anonymity set for that mix. Supposing the mix amount is equal, if I were to weigh the pros of a large single trxn versus those of two with a significant time difference, I'd favor the larger trxn.
Anyway, I think the BCM cycling/splitting phase remains highly valuable (even for coinjoin), considering your objective to come up with a scalable solution.

Indeed, the "problem" of coinjoin is that it's subject to two antagonistic constraints:
1 - More players involved in the tx (more txins & txouts) is desirable for better privacy.
2 - Txs are subject to technical constraints (max number of txins & txouts) and organizational constraints (the signing of a coinjoin tx is more likely to fail with many participants; one rogue participant is enough). These points make fewer participants quite desirable.

The BCM cycling phase (or a variant) may be an adequate solution to "locally" maximize entropy and cope with (2) at the same time.
Thus, in the case of coinjoin, maybe this phase can be tweaked to produce subgraphs instead of simple cycles. These subgraphs should be as close as possible to a given pattern (with a defined # of txins & txouts); see the sketch after these notes.

Note: I don't know the best target values for # of txins & txouts, but I'm quite confident that people from the community are able to define them.
Open question: Is it desirable to introduce specific behaviours in BCM depending on the underlying primitive?
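For what it's worth, a minimal sketch of how that tweak might look (Python; the cycle representation, one txin per player, and the txin cap are my own assumptions, not something from the paper):

Code:
from collections import defaultdict

MAX_TXINS = 20   # assumed target value; see the note above

def batch_cycles(cycles):
    # cycles: list of (mix_amount, [players]) produced by the cycling
    # phase, with node-disjoint player sets (one txin per player assumed).
    # Merge cycles of the same amount into one coinjoin until the txin
    # cap would be exceeded.
    by_amount = defaultdict(list)
    for amount, players in cycles:
        by_amount[amount].append(players)
    batches = []
    for amount, groups in by_amount.items():
        batch, size = [], 0
        for players in groups:
            if batch and size + len(players) > MAX_TXINS:
                batches.append((amount, batch))
                batch, size = [], 0
            batch.append(players)
            size += len(players)
        batches.append((amount, batch))
    return batches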


WRT (1), chained coinjoins seem a straightforward solution (and already implemented in Darkcoin).
IMHO, it shouldn't be managed by BCM but by wallets which call multiple iterations of BCM according to rules defined by the wallet.
I think that's the solution implemented by Darkcoin wallets, allowing users to select the number of iterations.

Main reasons I see for this:
- I don't think anybody has come up with an objective criterion defining the right number of iterations / desirable entropy (surely depending on the threat model, as you pointed out).
- I may be wrong, but this "chained txs" scheme seems quite specific to coinjoin. Not sure other primitives have the same requirement.


laurentmt, this will help in the entropy analysis somewhat. Since the # of cycles of the same flow amount is also a random variable, the size of the combined cycle is itself an RV.
It makes me think that it would be nice to have some stats from the blockchain about distributions of these kinds of things. Added to my todo list.

EDIT: Removed assertion about how the cycling phase works. Seems wrong.
newbie
Activity: 22
Merit: 29
Quote
[Do] you intend that all mixes of simple cycles are actually executed independently?

The intent was that mixes have a one-to-one correspondence to simple cycles. So, examples 1a and 1b induce two mixes, while 1c induces three.

After thinking over the threat model, I'm tending to agree more with laurentmt's point, and my answer to your question is this:

Combining node-disjoint cycles of the same amount is not only a viable variation, but also has the advantage of a larger anonymity set for that mix. Supposing the mix amount is equal, if I were to weigh the pros of a large single trxn versus those of two with a significant time difference, I'd favor the larger trxn.

Threat model: the adversary
(1) has access to the blockchain,
(2) can differentiate, with non-negligible advantage, normal transactions from those that are part of a mix,
(3) can know the identity of a person in control of an input address of the mix. 

Temporal distance doesn't increase anonymity, because the adversary can still tell which trxns are part of a mix. The larger anonymity set is meaningful, because the adversary can choose to examine only the mix that contains the input address of the person he's investigating. Since that trxn is larger in the combined case, the person is better protected. In the separated case, the adversary just ignores the trxn that doesn't have the address as an input.

laurentmt, this will help in the entropy analysis somewhat. Since the # of cycles of the same flow amount is also a random variable, the size of the combined cycle is itself an RV.

member
Activity: 111
Merit: 10
Hi,
Really interesting paper.
You state that the separate mixes of the simple cycles can be executed independently of each other.
I'm not clear whether you intend that all mixes of simple cycles are actually executed independently?
Or do you intend to combine simple-cycle mixes that have the same mix amount but completely different players? The mixes would still seem to be decomposed to allow independent execution.
Are there any disadvantages to doing this?
legendary
Activity: 1358
Merit: 1002
Chained coinjoin is already being used in Darkcoin. I don't know why my bringing this information to the thread was reason for deleting it; when talking about tech that is already in use, I see no reason for the deletion.

All I'm hinting is that there is a version doing exactly what you're talking about; I'm just bringing to light that it is a workable solution.
newbie
Activity: 22
Merit: 29

LCG, I'm supportive of zerocoin/cash as well, of course. It's after all the first and, for now, possibly the only demonstrably anonymous approach. Best regards on that exciting project.

I, like the other devs who inspired this work, want to explore how to make bitcoin credibly anonymous using existing bitcoin capability.
legendary
Activity: 1148
Merit: 1014
In Satoshi I Trust
sr. member
Activity: 384
Merit: 258
Chained CoinJoins... now that's exciting! I hope this opens up possibilities for chaining.
Yep. It's the idea of an n-stage switching network proposed by gmaxwell in his first post about CoinJoin. It aims to mix inputs for a high number of players (keeping the entropy provided by the number of participants) while coping with transaction size limits.
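For the curious, a toy illustration of the n-stage idea (Python; the butterfly-style pairing is my own choice, assuming 2^k players and 2-player joins per stage; gmaxwell's post doesn't prescribe a specific topology):

Code:
def switching_stages(n_players, n_stages):
    # Stage s pairs player i with player i XOR 2^s, so outputs of
    # different first-stage joins meet in later stages; after log2(n)
    # stages any output can plausibly trace back to any input.
    for s in range(n_stages):
        stride = 2 ** s
        yield [(i, i ^ stride) for i in range(n_players) if i < (i ^ stride)]

for s, joins in enumerate(switching_stages(8, 3)):
    print(f"stage {s}: {joins}")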

newbie
Activity: 22
Merit: 29
Quote
[Note: In my previous post, I suggested to chain coinjoin txs to increase entropy. It can help but it doesn't work for all cases (like a coinjoin tx with only 2 inputs and 2 outputs)]

Chained CoinJoins... now that's exciting! I hope this opens up possibilities for chaining.
sr. member
Activity: 384
Merit: 258
Sure, I see where you're going with that, laurentmt. And agreed that entropy is less for BCM/Join than for native Joins, # of players being equal.
Yup. I suck at poker!

That said, BCM is intended to make it easier to mix against larger, more diverse sets of players, and the number of mix participants is a random variable
100% agree about the value of BCM as a decentralized solution to address scalability and heterogeneity of inputs.

so one could argue that the entropy comparison should be a probabilistic calculation where the expected # of players for a Join in BCM is equal to the # of players in the native join. Some txns in BCM will have fewer, some will have more.
To be honest, I don't feel comfortable with a probabilistic calculation. I tend to think that all measures of anonymity should be done on specific instances of mix(es) and not on averaged or probabilistic mix(es). This is the only way for users to be sure of their level of anonymity.

Note that this remark also holds for my use of entropy, which is a measure at tx level. For perfect coinjoins (all inputs with the same value, all outputs with the same value) this measure is OK. But for coinjoins with different input/output values, the degree of anonymity is not the same for all players, and we should have a more local measure (at txout level). Work in progress...

Entropy analysis does cover the combinatorial dimension of semantic security, assuming the adversary knows all information known to any participant until the Join. Still, there are a few other dimensions that I think should be taken into account.
100% agree

If we're willing to allow consideration for the common case (1) separately from the worst case (3), then with (1) BCM has something that native doesn't: the times in which the transactions occur can vary, so much so that other mix transactions can be interspersed in between transactions of this mix. There can be other non-mix transactions that look like mix transactions as well.
Definitely one of the strengths of BCM. Interspersed txs could be a solution to increase entropy of coinjoin mixes.
[Note: In my previous post, I suggested chaining coinjoin txs to increase entropy. It can help, but it doesn't work in all cases (like a coinjoin tx with only 2 inputs and 2 outputs).]
newbie
Activity: 22
Merit: 29
Sure, I see where you're going with that, laurentmt. And agreed that entropy is less for BCM/Join than for native Joins, # of players being equal. That said, BCM is intended to make it easier to mix against larger, more diverse sets of players, and the number of mix participants is a random variable, so one could argue that the entropy comparison should be a probabilistic calculation where the expected # of players for a Join in BCM is equal to the # of players in the native join. Some txns in BCM will have fewer, some will have more.

Entropy analysis does cover the combinatorial dimension of semantic security, assuming the adversary knows all information known to any participant until the Join. Still, there are a few other dimensions that I think should be taken into account.

Consider three types of adversaries for BCM:

(1) Passive - has access to the blockchain record of the mix(es), may identify whether a general BC txn is part of a mix, and may know the identity of the person controlling an input address that's part of the overall mix.

(2) Active - participated in BCM but was not part of the Join of interest (thus knows everything in (1), plus the # of coins being mixed, the mix matrix, and the # of participants in the Join).

(3) Active - participated in both BCM and the Join (thus knows everything in (2), plus the input addresses of all participants, and the output addresses).

For a native Join, these are one and the same type of adversary.

If we're willing to allow consideration for the common case (1) separately from the worst case (3), then with (1) BCM has something that native doesn't: the times in which the transactions occur can vary, so much so that other mix transactions can be interspersed in between transactions of this mix. There can be other non-mix transactions that look like mix transactions as well.
sr. member
Activity: 384
Merit: 258
Hi Sundance,

Very interesting paper!
I'm quite impressed by the effort made to formalize the idea in a clear and understandable manner. I'll need to read the paper again to be sure I get it right, but I've started to play with the model, to check how sensitive the final result is to the underlying primitive.
Note: I didn't sleep a lot last night, so my maths and my logic could be a bit random today. I apologize in advance if there are mistakes in what follows.

I tend to think that the notion of entropy already used by gmaxwell and andytoshi is quite handy to quantify the quality of a given tx in terms of privacy. So I've used that measure.

Notations

C(tx) = number of possible combinations linking the txins and txouts of a tx

H(tx) = entropy of the tx. We consider that all combinations have the same probability (the attacker has no additional information), thus H(tx) = log2 C(tx)

({vin_1,...,vin_n}, {vout_1,...,vout_m}) = transaction with n txins and m txouts, where vin_i is the value of the i-th txin and vout_j the value of the j-th txout

Case studied: the first example from your paper (inputs = [1, 2, 3]), with CoinJoin used as the underlying primitive.

Hypothesis: the values of the inputs are the amounts proposed by the players but could correspond to several txos per player (which helps to increase entropy).

Scenario 1: BCM + 2 coinjoins
tx1 = ( {2,2}, {2,2} )
tx2 = ( {1,1}, {1,1} )

C(tx1) = 2   
=> H(tx1) = 1 bit
C(tx2) = 2   
=> H(tx2) = 1 bit
H(tx1 + tx2) = 2 bits

Scenario 2: BCM + 2 coinjoins
tx3 = ( {2,2}, {1,1,1,1} )
tx4 = ( {1,1}, {1,1} )

C(tx3) = 6 * 1 = 6   
=> H(tx3) = 2.585 bits
C(tx4) = 2
=> H(tx4) = 1 bit
H(tx3 + tx4) = 3.585 bits

Scenario 3: Naive coinjoin => a unique coinjoin with the same vins and vouts as Scenario 2
tx5 = ( {1,1,2,2}, {1,1,1,1,1,1} )
C(tx5) = 6 * 5 * (4! / (2! * 2!)) = 180      
=> H(tx5) = 7.491 bits

We have a ratio H(tx3 + tx4) / H(tx5) = 47.85%
Let's try another scenario and check if we can get better results.

Scenario 4: BCM + 2 coinjoins
tx6 = ( {1,1,1,1}, {1,1,1,1} )
tx7 = ( {1,1}, {1,1} )

C(tx6) = 4 * 3 * 2 = 24
=> H(tx6) = 4.584 bits
C(tx7) = 2
=> H(tx7) = 1 bit
H(tx6 + tx7) = 5.584 bits

Scenario 5: Naive coinjoin => a unique coinjoin with the same vins and vouts as Scenario 4
tx8 = ( {1,1,1,1,1,1}, {1,1,1,1,1,1} )
C(tx8) = 6 * 5 * 4 * 3 * 2 = 720   
=> H(tx8) = 9.491 bits

We have a ratio H(tx6 + tx7) / H(tx8) = 58.83%

I think scenario 4 is the best we can get with 2 txs. An easy solution to increase entropy would be to chain another round of coinjoin txs after the first ones. I think that would be enough to get an entropy higher than the naive coinjoin.

Now, the interesting point would be to check how it compares in terms of efficiency with TierNolan's atomic cross-chain solution, which would produce very different patterns in the blockchain.

Don't know if it makes sense. Sorry if it's not the case. Anyway, keep up the good work!
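As a sanity check, here's a small script reproducing the combination counts above (Python; the closed form is my own observation: when every txout has the same denomination u and each txin is a multiple of u, distributing the txouts among the txins gives a multinomial coefficient):

Code:
from math import factorial, log2

def combinations(txins, txouts):
    # All txouts share one denomination u; a txin of value v claims
    # v/u of the txouts, so C = m! / prod((v_i/u)!) over the txins.
    u = txouts[0]
    assert all(v == u for v in txouts) and sum(txins) == sum(txouts)
    c = factorial(len(txouts))
    for v in txins:
        c //= factorial(v // u)
    return c

def entropy(txins, txouts):
    return log2(combinations(txins, txouts))

# Scenario 2 vs 3: H(tx3) + H(tx4) = 3.585 bits vs H(tx5) = 7.49 bits
h_bcm = entropy([2, 2], [1, 1, 1, 1]) + entropy([1, 1], [1, 1])
h_naive = entropy([1, 1, 2, 2], [1] * 6)
print(round(h_bcm, 3), round(h_naive, 3), f"{h_bcm / h_naive:.2%}")  # 47.85%

# Chaining a second round of Scenario 4 (entropies add if the rounds are
# independent) beats the naive Scenario 5 coinjoin: 2 * 5.585 > 9.49
h_round = entropy([1] * 4, [1] * 4) + entropy([1, 1], [1, 1])
print(2 * h_round > entropy([1] * 6, [1] * 6))  # True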


EDIT:
The previous computations of combinations/entropy only try to match 1 input with 1 or n outputs.
But I guess it would be more correct to also consider input aggregates as possible combinations.
This is required to make sense of txs like ( {1,1}, {1.2, 0.8} ) => C(tx) = 1 and H(tx) = 0.

Under this form, we should have something like:
Scenario 1
C(tx1) = 3 and H(tx1) = 1.585 bits
C(tx2) = 3 and H(tx2) = 1.585 bits
H(tx1 + tx2) = 3.17 bits

Scenario 2
C(tx3) = 7 and H(tx3) = 2.807 bits
C(tx4) = 3 and H(tx4) = 1.585 bits
H(tx3 + tx4) = 4.392 bits

Scenario 3
C(tx5) = 687 => H(tx5) = 9.424 bits (if my computations are ok)

H(tx3 + tx4) / H(tx5) = 46.6%

I'm too lazy to compute the last scenarios by hand, but this form seems to confirm the first results.
newbie
Activity: 22
Merit: 29

Greg, thanks! Yes, I did find that recently and reference it in the paper. In fact, I consider CoinShift an N-way extension of the same techniques.

My guess is that CoinSwap inspired the work on atomic cross-chain transactions (on which I based CoinShift)?

With CoinShift, I'm aiming for shifting ownership of bitcoins (atomic, trustless, decentralized) one step along the cycle found in BCM. (For 1 <= i < N, player i transfers to player i+1; player N transfers to player 1.)

I'll read the CoinSwap post again and explore how it lays out for N players.
newbie
Activity: 22
Merit: 29
Andy,

Excellent feedback, thank you for that!

You're on point in suggesting that the prioritization of non-self matches over self matches be made explicit, which can be done in the algorithm or by requiring that v_ij > 0 when i != j. The latter was my original intention.

Even with that, there is always the possibility of a self-match, which in my code I call a 'fault'. It turns out that, with the above change to ensure prioritization, there can be at most one fault: otherwise, there would be two players who could have matched earlier (due to the above) but didn't, a contradiction. A toy sketch is below.
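To illustrate (Python; the random pair priorities and the min-transfer rule are simplified stand-ins of mine, not the paper's Algorithm 2): with self-pairs deferred to the end of the priority list, at most one v_ii ends up positive.

Code:
import random

def greedy_mix_matrix(amounts, seed=None):
    # Each player i must send and receive exactly amounts[i]. Non-self
    # pairs are visited first (in a random priority order); self pairs
    # are always deferred to last, as andytoshi suggests.
    rng = random.Random(seed)
    n = len(amounts)
    send = list(amounts)   # coins player i still needs to send
    recv = list(amounts)   # coins player j still needs to receive
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    rng.shuffle(pairs)
    pairs += [(i, i) for i in range(n)]   # self-matches as last resort
    v = [[0] * n for _ in range(n)]       # v[i][j]: amount i sends to j
    for i, j in pairs:
        amt = min(send[i], recv[j])
        v[i][j] = amt
        send[i] -= amt
        recv[j] -= amt
    return v

v = greedy_mix_matrix([1, 2, 3], seed=1)
print(sum(1 for i in range(3) if v[i][i] > 0))  # faults: always 0 or 1

Once all non-self pairs have been visited, min(send[i], recv[j]) = 0 holds for every i != j, and since the total left to send equals the total left to receive, at most one player can have a remainder: the single possible fault.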

A fault happens depending on the randomization and the conditioning of the instance, as you pointed out. There's nothing to avoid; it's just a fact of life that one person may not be able to mix all of his coins in that instance.

Still, you reminded me of a topic I wanted to include in the paper on well- and ill-conditioned instances. E.g., if one player has an input that is larger than the sum of all the others, then chances are that he will mix with a large number of the other players, and he will certainly be the one with a self-match for the remainder of his coins.

I want to avoid this, or allow the players to avoid mixing in this circumstance, so another open question I will add is: can conditions be defined that characterize an ill-conditioned instance? If so, the approach will be that either each player checks for the condition during the process to decide whether to mix with a certain player, or the condition is used during peer discovery to filter which players are grouped together.

On DoS:
That's exactly the line of thinking that's helpful, because I think the algorithm is such that DoS is an attacker's best option.

My thought is that this particular attack can be mitigated by (1) adding a step where players confirm the inputs they received against each other's, and (2) allowing the players to drop a peer if the result of that confirmation shows that the peer sent different inputs to a sufficiently high number of players. This seems similar to the blame idea from Tim's CoinShuffle paper. A rough sketch follows.
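Here's what (1) and (2) might look like (Python; the digest scheme, report format, and threshold are hypothetical placeholders of mine, not from the paper or from CoinShuffle):

Code:
import hashlib
from collections import Counter

def input_digest(inputs):
    # Commitment to the multiset of inputs a sender claims to have sent.
    return hashlib.sha256(repr(sorted(inputs)).encode()).hexdigest()

def equivocating_peers(reports, threshold=2):
    # reports[receiver][sender] = digest of the inputs 'sender' sent to
    # 'receiver'. Flag a sender when at least `threshold` receivers
    # report a digest differing from the most common one, i.e. the peer
    # sent different inputs to a sufficiently high number of players.
    flagged = []
    senders = {s for per_recv in reports.values() for s in per_recv}
    for s in senders:
        versions = Counter(r[s] for r in reports.values() if s in r)
        dissenters = sum(versions.values()) - max(versions.values())
        if dissenters >= threshold:
            flagged.append(s)
    return flagged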

What's clear is that honest players need a way to identify and boot bad players. That is a work in progress, and I've been considering whether to use such a step for input confirmation for a while now. We need to be careful about opening more attack vectors with the inclusion of confirmation steps.
full member
Activity: 179
Merit: 151
Very cool stuff! I really appreciate the clear expositions and diagrams.

I'll have to think a bit about DoS potential (e.g. can you join a session, commit to different values to each peer, and have them all abort when they're unable to agree on what's going on?) but the mixing mechanism is really slick. Looking forward to your upcoming CoinSwap-like proposal.

Edit: One interesting observation is that if one party tries to mix N coins, while everyone else combined has only M, where M < N, then it appears that this party will wind up mixing N - M coins with himself. If M ≥ N then nobody mixes with themselves. Is it generally true that this algorithm always leads to the least required self-mixing? Maybe this is a straightforward consequence of condition (4)?

Edit2: Ah, not quite. If there are a lot of zeroes in the priority matrix it is possible that self-matches will be taken before non-self-matches. If you were to enforce that self-matches were always lowest in the priority list, then self-matches would only be taken as a "last resort" and (unless I'm confused) this would always lead to the least possible amount of self-matching.

So I propose adapting Algorithm 2 to always sort the (n, n) pairs last, regardless of priority.

Andrew
staff
Activity: 4284
Merit: 8808
Neat, I will be sure to read this.

The second paper, forthcoming, is on a new mixing primitive, CoinShift, based on TierNolan's atomic cross-chain solution.
Have you seen https://bitcointalksearch.org/topic/coinswap-transaction-graph-disjoint-trustless-trading-321228 ?
newbie
Activity: 22
Merit: 29

As part of a new open-source implementation project, I've written a research paper presenting a new, complementary protocol to add anonymity to Bitcoin.

The preliminary paper is the first of two papers describing the algorithmic underpinnings of the project, building on ideas from this forum (CoinJoin, CoinSwap, CoinShuffle, and atomic transfers).

The new algorithm, called Byzantine Cycle Mode, extends the application of Bitcoin mixing primitives (CoinJoin, etc.) to large numbers of players mixing unequal inputs. I use an analogy to how encryption modes such as CTR and CBC extend the application of block ciphers (AES, 3DES) to large inputs whose sizes are not whole multiples of the block size.

Abstract:
Quote
We present a new distributed algorithm, called Byzantine Cycle Mode (BCM), that mixes bitcoins of different sizes. Most known decentralized, riskless Bitcoin mixing algorithms (CoinShuffle, CoinShift, DarkWallet’s CoinJoin) either require the amounts of bitcoin being mixed to be equal, or their anonymity strongly depends on it. Some also do not scale easily to large numbers of players. BCM relaxes these constraints by transforming large instances with unequal bitcoin amounts into smaller sub-instances on equal amounts — allowing players to mix using the known algorithms while preserving their degree of semantic security.

Appreciate feedback.

The second paper, forthcoming, is on a new mixing primitive, CoinShift, based on TierNolan's atomic cross-chain solution.