Topic: How does a site like Blockchain.info know which outputs are change? (Read 1569 times)

legendary
Activity: 1120
Merit: 1152
Blockchain.info is very misleading.

The estimated transaction volume is tripe, pure and utter guesswork.

Someone could buy a coffee and it could show up as a $100,000,000 transaction.

Also the IP address stuff is crap too, so misleading. The IP is that of the node that relayed the transaction to Blockchain.info's node, and it in no way represents where the transaction actually originated.
Really? I actually tracked down this guy who mined on top of the genesis block (orphans, duh) using the IP address on the site, and he confirmed it was him.

That's a special case because no other node would have relayed those blocks; in the general case the IP addresses are bullshit.
legendary
Activity: 1862
Merit: 1011
Reverse engineer from time to time
Blockchain.info is very misleading.

The estimated transaction volume is tripe, pure and utter guesswork.

Someone could buy a coffee and it could show up as a $100,000,000 transaction.

Also the IP address stuff is crap too, so misleading. The IP is that of the node that relayed the transaction to Blockchain.info's node, and it in no way represents where the transaction actually originated.
Really? I actually tracked down this guy who mined on top of the genesis block (orphans, duh) using the IP address on the site, and he confirmed it was him.
legendary
Activity: 2618
Merit: 1007
Quote
Also if there are e.g. a 3 BTC input and a 3 BTC input to a 4 BTC output and a 1 BTC output, the change is likely the 1 BTC, since there would have been no real need to combine the inputs otherwise.


That's some big fees, or I'm toodumbforbitcoin.
Yeah, I meant 2+3 BTC inputs, not 3+3... ;)
sr. member
Activity: 448
Merit: 254
Bitcoin-Qt is not the only wallet application...

Sure, I meant the probability skews a bit.  In practice maybe it doesn't help much.
legendary
Activity: 905
Merit: 1012
If it were me, I'd do prime decomposition on the amounts, calculate their relative magnitude, a boolean value indicating whether they'd been seen before, etc., label a number of training examples, and have a support vector machine generate a classifier.

I'm so glad it is not you; that kind of thing is exactly what someone fascinated with machine learning would go for. There are thousands and thousands of crap papers where people blindly go after machine learning (and it is almost always an SVM) without even considering other methods, reporting results close to 100% accuracy and other metrics, only for it to turn out that they don't know how to set up training/testing sets and don't have a clue about the features they are using.

Yes, because when faced with a classic machine learning problem, the tried and true techniques of machine learning are not what you'd want to use.
hero member
Activity: 910
Merit: 1005
The logic is pretty simple:

- Remove all outputs matching any input addresses.
- If the transaction has one input take the smallest output.
- If a transaction has more than two inputs and exactly two outputs take the output with a value closest to the total input value.
- If a transaction has more than two outputs return the value of the smallest output.
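
For anyone who wants to play with it, the four rules above can be sketched in Python roughly as follows. This is only a sketch under assumptions, not Blockchain.info's actual code: the `TxIO` type and the `pick_output` name are made up for illustration, the rules are applied in the order listed, "exactly two outputs" is read as two outputs left after the first rule, and cases that no rule covers (e.g. two inputs and two outputs) return `None`. The post does not say whether the output that is "taken" is the spend or the change, so the function just returns whichever output the rules select.

```python
from typing import NamedTuple, Optional, Sequence


class TxIO(NamedTuple):
    """Hypothetical minimal view of a transaction input or output."""
    address: str
    value: int  # amount in satoshis


def pick_output(inputs: Sequence[TxIO], outputs: Sequence[TxIO]) -> Optional[TxIO]:
    """Apply the four listed rules in order; return None if none applies."""
    # Rule 1: drop outputs that pay back to an address used as an input.
    input_addresses = {i.address for i in inputs}
    candidates = [o for o in outputs if o.address not in input_addresses]
    if not candidates:
        return None

    # Rule 2: one input -> smallest remaining output.
    if len(inputs) == 1:
        return min(candidates, key=lambda o: o.value)

    # Rule 3: more than two inputs and exactly two outputs
    # (assumption: counted after rule 1) -> output closest to the input total.
    if len(inputs) > 2 and len(candidates) == 2:
        total_in = sum(i.value for i in inputs)
        return min(candidates, key=lambda o: abs(total_in - o.value))

    # Rule 4: more than two outputs -> smallest remaining output.
    if len(candidates) > 2:
        return min(candidates, key=lambda o: o.value)

    # Not covered by the listed rules (e.g. two inputs, two outputs).
    return None
```

With three inputs totalling 600 and remaining outputs of 550 and 40, the third rule selects the 550 output; with two inputs and two outputs none of the rules fire.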

Anyone is welcome to suggest improvements.

If you were really determined the accuracy could be improved by analysing the taint of the inputs used in the next transaction.
member
Activity: 98
Merit: 10
nearly dead
If it were me, I'd do prime decomposition on the amounts, calculate their relative magnitude, a boolean value indicating whether they'd been seen before, etc., label a number of training examples, and have a support vector machine generate a classifier.

I'm so glad it is not you; that kind of thing is exactly what someone fascinated with machine learning would go for. There are thousands and thousands of crap papers where people blindly go after machine learning (and it is almost always an SVM) without even considering other methods, reporting results close to 100% accuracy and other metrics, only for it to turn out that they don't know how to set up training/testing sets and don't have a clue about the features they are using.
legendary
Activity: 1638
Merit: 1001
Not to mention their 650 W per GH/s electricity consumption nonsense. Every now and then the MSM picks that up.
legendary
Activity: 1176
Merit: 1015
Blockchain.info is very misleading.

The estimated transaction volume is tripe, pure and utter guesswork.

Someone could buy a coffee and it could show up as a $100,000,000 transaction.

Also the IP address stuff is crap too, so misleading. The IP is that of the node that relayed the transaction to Blockchain.info's node, and it in no way represents where the transaction actually originated.
legendary
Activity: 1638
Merit: 1001
Quote
Also if there are e.g. a 3 BTC input and a 3 BTC input to a 4 BTC output and a 1 BTC output, the change is likely the 1 BTC, since there would have been no real need to combine the inputs otherwise.


That's some big fees, or I'm toodumbforbitcoin.
legendary
Activity: 1974
Merit: 1029
Plus, from what I've seen, bc.info doesn't bother with transactions that have more than two outputs; the estimated amount is always the whole amount in the tx.
legendary
Activity: 2618
Merit: 1007
Also if there are e.g. a 3 BTC input and a 3 BTC input to a 4 BTC output and a 1 BTC output, the change is likely the 1 BTC, since there would have been no real need to combine the inputs otherwise.

Still, it often guesses wrong; maybe there is some research potential in there somehow?
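
The "no real need to combine the inputs" argument can be sketched as a small check. This is a rough illustration only: the function name is made up, amounts are in satoshis, and fees are ignored. An output is flagged as possible change if it could still have been paid after leaving out the smallest input, because a wallet paying that amount would not have needed every input.

```python
from typing import List, Sequence


def outputs_not_needing_all_inputs(
    input_values: Sequence[int], output_values: Sequence[int]
) -> List[int]:
    """Return indexes of outputs that could have been paid without the smallest input.

    Assumption: fees are ignored, so this is only a rough guess.
    """
    if len(input_values) < 2:
        # With a single input nothing was combined, so there is no signal.
        return []
    # The most that could be spent if the smallest input had been left out.
    without_smallest = sum(input_values) - min(input_values)
    return [i for i, v in enumerate(output_values) if v <= without_smallest]


BTC = 100_000_000  # satoshis per bitcoin

# 2 BTC + 3 BTC inputs, 4 BTC + 1 BTC outputs: only the 1 BTC output
# could have been paid from a single input, so it is the change candidate.
candidates = outputs_not_needing_all_inputs([2 * BTC, 3 * BTC], [4 * BTC, 1 * BTC])
```

If the result contains exactly one index, that output is the guess; if it contains none or several, this check says nothing either way.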
legendary
Activity: 905
Merit: 1012
Bitcoin-Qt is not the only wallet application...
sr. member
Activity: 448
Merit: 254
If it were me, I'd do prime decomposition on the amounts, calculate their relative magnitude, a boolean value indicating whether they'd been seen before, etc., label a number of training examples, and have a support vector machine generate a classifier.

There's a million other ways you can do it and get decent results. Doesn't stop it from being a WAG though.

In a typical two-output tx created before ~2013-01-30, there's a good chance the first output is the change address.  Maybe even longer, depending on how long until the fix was widely deployed.
legendary
Activity: 905
Merit: 1012
If it were me, I'd do prime decomposition on the amounts, calculate their relative magnitude, a boolean value indicating whether they'd been seen before, etc., label a number of training examples, and have a support vector machine generate a classifier.

There's a million other ways you can do it and get decent results. Doesn't stop it from being a WAG though.
staff
Activity: 4284
Merit: 8808
Like a lot of other things on BC.i that are just guesses, it's right often enough to confuse people.
member
Activity: 89
Merit: 14
That's just guesstimation

It makes a wild-ass guess.

Yup, +1 to these 2. It doesn't know; it's a guess.

I imagine there's some guessing logic like, have any of the output addresses been seen on the blockchain before? Yes? It might be the actual "spend"... there might be more logic depending on the ratio of amounts to each address.

But it definitely does not get it correct every time; I've seen plenty of transactions (of my own) where it has estimated incorrectly.
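
The "seen before" idea above could look something like this. It is purely a guess at the kind of logic involved, not what the site actually does: the function name and the `seen_before` set (addresses that already appeared in earlier transactions) are hypothetical. An output to an address with history is treated as the likely spend, and the guess is only made when exactly one output address is brand new.

```python
from typing import Optional, Sequence, Set


def guess_change_by_freshness(
    output_addresses: Sequence[str], seen_before: Set[str]
) -> Optional[str]:
    """Guess the change address as the single never-seen output address.

    Returns None when there is no unique fresh address, or when no output
    goes to a known address, because the guess would have nothing to go on.
    """
    fresh = [a for a in output_addresses if a not in seen_before]
    known = [a for a in output_addresses if a in seen_before]
    if len(fresh) == 1 and known:
        return fresh[0]
    return None
```

A wallet that reuses addresses for change, or a payee who hands out a new address per payment, defeats this, which fits with the wrong estimates mentioned above.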
legendary
Activity: 905
Merit: 1012
It makes a wild-ass guess.
legendary
Activity: 1792
Merit: 1111
I can't find anything on the wiki about transaction change.

I know what it is but how does it appear any different than other outputs?

That's just guesstimation
sr. member
Activity: 370
Merit: 250
I can't find anything on the wiki about transaction change.

I know what it is but how does it appear any different than other outputs?