Pages:
Author

Topic: How to detect the change output? - page 2. (Read 7282 times)

hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
December 14, 2012, 03:45:00 AM
#13
The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.
Even with many outputs it would always not be the last one. Would it be useful as well to randomly split change into more than one address when above some value? To further complicate predictability.
hero member
Activity: 900
Merit: 1014
advocate of a cryptographic attack on the globe
December 14, 2012, 02:26:04 AM
#12
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?

I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh).



Oh wow, so the change was always the first output if you send bitcoins to 1 address? Awesome find. But yea.. how did we miss it.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
December 13, 2012, 09:30:16 PM
#11
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?

I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh).

legendary
Activity: 1708
Merit: 1010
December 13, 2012, 09:00:06 PM
#10
I wonder if having a choice in the automatic selection methods would be beneficial as well.  Consider this scenario.

Say I have a full desktop client, as well as a thin hardware client (think bitcoincard) that syncs with the desktop client on a regular basis.  So I want my desktop client to automaticly create simple transactions that the thin hardware client can use easily, and preferablely in combinations that don't have a change address at all.  (Thin hardware clients most likely would have only one, or only a few, addresses to work with, change addresses would be trivial to identify if they go back to the sending address)

So what I would want my desktop client to do is to regularly produce transactions wherein I have two change transactions; one for the thin client, in the useful denominations, and one with the true change back to the desktop client.  One time, the thin client might get a .01 transaction, another time it might get .02, and another .05; up through .1|.2|.5|1|2|5 so that it would have a selction of transactions that would permit it to combine a spending transaction to any exact amount, or almost any exact amount.  A full client could also use this same technique to, occasionally, produce a changeless transaction as well.  I'm not sure if this would be useful or not, but I would guess that it might complicate 'taint' tracking systems.
Hal
vip
Activity: 314
Merit: 4276
December 13, 2012, 08:43:57 PM
#9
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:

// Insert change txn at random position:
vector::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size());
wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));

The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
December 02, 2012, 12:14:33 AM
#8
For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance:  the integer part goes back to me, the non-integer part gets donated to another project.

Funny I'm not the only one who does this.  I normally work mostly with whole bitcoins (since my lowest physical coin is 1 BTC anyway), and then I use coin crumbles to make paper banknote wallets worth 0.1 BTC and carry them around to leave in tip jars etc.

Anyway, one useful (but less useful in a deterministic sense) heuristic when following the blockchain is to try and correlate behavior as well.  If I notice consistent behavior following a chain one way, it's probably the same person.  This is what led to me providing what was later determined to be a critical clue as to Goat's lost 400 BTC a while back: I notice that if I followed the chain a certain way I could see a pattern of a sender who frequently sent bitcoins in amounts like 3.995 1.995 8.995 which led me to suggest that it was a service that charged a 0.005 BTC transaction fee and took it out of the transaction amount.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
December 02, 2012, 12:05:56 AM
#7
By the way, I just implemented coin-control in Armory (not released yet, but soon).  And I discovered a good use for it:  cleaning up BTC crumbles in my wallet.  For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance:  the integer part goes back to me, the non-integer part gets donated to another project.   Then my wallet is a ton cleaner (its balance is 28.15 instead of 28.37302811).  I just started doing this with each of my wallets and it made me think of this thread:

(1) The recipient of my transaction is being sent an output with 8 decimal places, the change has zero decimal places -- the exact opposite of what you would expect of a normal transaction
(2) Future transactions will be much cleaner in terms of decimal places, because when I pay someone 10.1 BTC, my change is 18.05 BTC, which is fairly indistinguishable.  
(3) Not that it matters to determining change, but I think it's good for the network (albeit, very mildly) to be sweeping up the crumbs like that.

I suspect, when I do release coin control, there will be other users doing the same thing, further complicating anyone's efforts to determine what is the correct change address. 
hero member
Activity: 900
Merit: 1014
advocate of a cryptographic attack on the globe
November 28, 2012, 11:51:50 PM
#6
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?


I spent a lot of time implementing coin selection in Armory.  A few of your assumptions will be incorrect if the tx was created with Armory.  Here's how Armory creates transactions:

(1) Coin-selection-evaluation method.   This is a method that scores a coin-selection based on various factors:  How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below).  All factors are scored between 0 (unfavorable) and 1 (favorable).

(2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures.  Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation).  

(3) Score all the solutions in part (2) with the eval method in part (1).  Default weights are hard-coded into armoryengine.py, but the user could modify them.  It defines how important each factor is:  most important is not using zero-conf.  Next is whether the transaction can be free (not unrelated to zero-conf).  Then number of input addresses linked.  Then tx priority, tx size, and output anonymity.  However, the user is free to make output anonymity most important (I do!  but only for fun, not because I actually care Smiley).

(4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled.  Transaction is signed and broadcast.

The parts you care about:  

(1)  If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude.  It may help clean up dust, it may help anonymity (but no guarantee for either).  But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order.  This violates your assumption that the change address is always smaller.  Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone.  So even without Armory's technique, you'll see that assumption violated with any client.

(2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this Smiley)

(3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient.  Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output.  As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83).  It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip:  (12.4, 10.0).  i.e. deceptively-sized outputs is favored.  

How often does (3) trigger?  I don't know.  I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included.  But if it ever happens, then your assumption about sig figs would be violated, as well.


Awesome.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
November 27, 2012, 05:35:32 PM
#5
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?


I spent a lot of time implementing coin selection in Armory.  A few of your assumptions will be incorrect if the tx was created with Armory.  Here's how Armory creates transactions:

(1) Coin-selection-evaluation method.   This is a method that scores a coin-selection based on various factors:  How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below).  All factors are scored between 0 (unfavorable) and 1 (favorable).

(2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures.  Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation).  

(3) Score all the solutions in part (2) with the eval method in part (1).  Default weights are hard-coded into armoryengine.py, but the user could modify them.  It defines how important each factor is:  most important is not using zero-conf.  Next is whether the transaction can be free (not unrelated to zero-conf).  Then number of input addresses linked.  Then tx priority, tx size, and output anonymity.  However, the user is free to make output anonymity most important (I do!  but only for fun, not because I actually care Smiley).

(4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled.  Transaction is signed and broadcast.

The parts you care about:  

(1)  If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude.  It may help clean up dust, it may help anonymity (but no guarantee for either).  But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order.  This violates your assumption that the change address is always smaller.  Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone.  So even without Armory's technique, you'll see that assumption violated with any client.

(2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this Smiley)

(3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient.  Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output.  As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83).  It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip:  (12.4, 10.0).  i.e. deceptively-sized outputs is favored.  

How often does (3) trigger?  I don't know.  I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included.  But if it ever happens, then your assumption about sig figs would be violated, as well.
kjj
legendary
Activity: 1302
Merit: 1026
November 27, 2012, 04:18:34 PM
#4
3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined.

This is a decent assumption most of the time, but once every few weeks, someone pops on IRC complaining that the coin selector overselected.  Most recent was, if I recall, something like 0.5 + 0.2 + 0.3 + 0.01 -> 1.0 + 0.01

On the topic of tracking in general, I would just say, "Welcome to the arms race".  The raw transaction API lets crazy people like me do silly things like create all transactions with 2 to 5 outputs of nearly equal size.  The next step will be a public mix-sender.
legendary
Activity: 1596
Merit: 1100
November 27, 2012, 03:51:40 PM
#3
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

It is intentionally randomized to make it harder to guess Smiley

legendary
Activity: 2506
Merit: 1010
November 27, 2012, 03:49:21 PM
#2
Any other suggestions?

The Bitcoin-Qt client will pull a new address from the key pool for change.  Thus any amount sent to an address that already has been used in a transaction is unlikely to be change.   Of course, with other clients, particularly Blockchain.info, this isn't necessarily the case.

There are some clues from anecdotes as well.  For over-the-counter trades, quite often upon receipt of funds the recipient will turn around and spend those funds that were received as soon as there has been a confirmation.

This alone doesn't help much but because the coins chosen by Bitcoin-Qt client will first try to use coins for a transaction where all coins selected already have at least six confirmations, then if a transaction spends a coin with fewer than six confirmations it was only done so because there were insufficient funds in the wallet of coins with six or more confirmations.

So take a transaction A with outputs X and Y.
Then another transaction, (transaction B), spends X after 1 confirmation.   More often than not, it would seem, the argument could be made that looking at both transactions together, that Y was the change in transaction A.   The odds are much less likely to be that X was the change but that change was then used in another transaction from the same wallet shortly thereafter.   It might even be likely that in transaction B, if there is more than one output that the smaller output is the change.  A person that spends funds shortly after receiving them will generally be spending all of the funds received or at least a large chunk of it.
donator
Activity: 2058
Merit: 1054
November 27, 2012, 01:04:55 PM
#1
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?

We can assume that no custom anonymization techniques are used. Among other things it means that there will be at most 1 change output.

Looking at some transactions I've made, it looks like Bitcoin-qt always puts the change first, and blockchain.info MyWallet always puts the change last. Do they consistently do so? What about other clients?

Some suggested heuristics:

1. In sendmany, if the outputs have simple integer ratios between them, the transaction is likely to be some proportional distribution, and the one output which isn't an integer multiple of the GCD is the change. Example: In this transaction, all outputs are a multiple of 0.01648442, except for the change.

2. Change outputs are likely to have more decimal places (or be otherwise more complicated). Example: In this transaction, 114 is the real transfer, while 7.527 is the difference which has 3 decimal places since that's what the input had.

3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined.

4. Transactions involving addresses known to belong to some service can have their own rules to be deduced on a per-service basis.

Any other suggestions?
Pages:
Jump to: