Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?
How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?
I spent
a lot of time implementing coin selection in Armory. A few of your assumptions will be incorrect if the tx was created with Armory. Here's how Armory creates transactions:
(1) Coin-selection-evaluation method. This is a method that scores a coin-selection based on various factors: How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below). All factors are scored between 0 (unfavorable) and 1 (favorable).
(2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures. Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation).
(3) Score all the solutions in part (2) with the eval method in part (1). Default weights are hard-coded into armoryengine.py, but the user could modify them. It defines how important each factor is: most important is not using zero-conf. Next is whether the transaction can be free (not unrelated to zero-conf). Then number of input addresses linked. Then tx priority, tx size, and output anonymity. However, the user is free to make output anonymity most important (I do! but only for fun, not because I actually care
).
(4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list,
then the recipient list is shuffled. Transaction is signed and broadcast.
The parts you care about:
(1) If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude. It may help clean up dust, it may help anonymity (but no guarantee for either). But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order. This violates your assumption that the change address is always smaller. Even if Armory didn't do it, a wallet with one very large TxOut,
must have a large change output if you want to send only a small amount to someone. So even without Armory's technique, you'll see that assumption violated with any client.
(2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this
)
(3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient. Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output. As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83). It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip: (12.4, 10.0). i.e. deceptively-sized outputs is favored.
How often does (3) trigger? I don't know. I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included. But if it ever happens, then your assumption about sig figs would be violated, as well.