Topic: New heuristic to group addresses based on its ownership (Read 1741 times)

LKC
newbie
Activity: 6
Merit: 0
Any more detail about those patterns? How can you be sure about the ownership behind those patterns? I think there is more information in the transaction. Many people specify the purpose of the transaction in "op_return".
Here is a website that collects op_return messages: http://coinsecrets.org/

Thanks for the information. I checked the website. Most of the op_return messages are gibberish. Perhaps data mining techniques could be applied to decode the messages and figure out the purpose of the transactions. For now I don't intend to go that deep in this direction. It would be nice to know if other people have done a related study.

As for the transaction patterns:

Relay: a transaction that has only one input and one output. It is usually used to move Bitcoin from one party to another without leaving any change behind. According to my study of the blockchain, this kind of transaction is mostly mixed with other patterns to hide the money flow.

Sweep: this kind of transaction happens when whoever controls the funds wants to combine multiple separate unspent transaction outputs into a single output that is easier to process and control. Hence, it is very likely that the inputs and the output have the same owner.

Peeling Chain: a peeling chain consists mostly of peel transactions, where the change output of one peel transaction is used as the input to the next. In a peel transaction, any number of inputs are combined and two outputs are created. However, most of the peeling chains that have been studied start with one input holding a huge amount. It is possible that mining pools use peeling chains to pay miners. Hence, we need to cluster peeling chains carefully to reduce the false positive rate.
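To make the three shapes concrete, here is a rough Python sketch that labels a transaction purely by its input and output counts. The function name and the labels are my own for illustration, and a "peel" label only means the transaction has the right shape to be one hop of a peeling chain. It says nothing certain about ownership.

```python
def classify_pattern(n_inputs: int, n_outputs: int) -> str:
    """Label a transaction by shape only (illustrative names, not a standard)."""
    if n_inputs == 1 and n_outputs == 1:
        return "relay"   # one input, one output, no change left behind
    if n_inputs > 1 and n_outputs == 1:
        return "sweep"   # several UTXOs merged into a single output
    if n_inputs >= 1 and n_outputs == 2:
        return "peel"    # candidate peel hop: a payment plus a change output
    return "other"


if __name__ == "__main__":
    for shape in [(1, 1), (5, 1), (1, 2), (2, 3)]:
        print(shape, classify_pattern(*shape))
```

An ordinary payment with change also has two outputs, so the "peel" label needs the chain context before it means anything.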

That information is quoted from the document in the previous comment.

I think the first two kinds of transactions are easier to detect. I am still trying to figure out how to detect a peeling chain precisely.
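One possible starting point for the peeling chain, as a sketch only: follow the presumed change output from hop to hop while each transaction has one input and two outputs. The data layout (`txs`, `spent_by`), the function name, the `min_hops` cutoff and the guess that the larger output is the change are all assumptions of mine, not something taken from the document.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model (assumption, not a real API):
#   txs[txid] = {"n_inputs": int, "outputs": [(address, value_in_satoshi), ...]}
#   spent_by[(txid, output_index)] = txid of the transaction spending that output


def follow_peeling_chain(
    start: str,
    txs: Dict[str, dict],
    spent_by: Dict[Tuple[str, int], str],
    min_hops: int = 3,
) -> List[str]:
    """Return the txids of a candidate peeling chain, or [] if it is too short."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = start
    while current in txs and current not in seen:
        tx = txs[current]
        # A peel hop here means one input and two outputs.
        if tx["n_inputs"] != 1 or len(tx["outputs"]) != 2:
            break
        seen.add(current)
        chain.append(current)
        outputs = tx["outputs"]
        # Assumption: the larger output is the change that keeps the chain going.
        change_idx = 0 if outputs[0][1] >= outputs[1][1] else 1
        current = spent_by.get((current, change_idx))
    return chain if len(chain) >= min_hops else []
```

A short `min_hops` will pick up plenty of ordinary payments, so the cutoff probably has to be tuned against known cases such as mining pool payouts.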
newbie
Activity: 2
Merit: 0
What do you mean by this?
After reading this document, I think the transaction patterns we can leverage are:

Relay transactions (one input, one output)
Peeling chain (consecutive transactions with one input, two outputs)
Sweep transactions (multiple inputs, one output)
Any more detail about those patterns? How can you be sure about the ownership behind those patterns? I think there is more information in the transaction. Many people specify the purpose of the transaction in "op_return".
Here is a website that collects op_return messages: http://coinsecrets.org/
LKC
newbie
Activity: 6
Merit: 0
Regarding the previous discussion about Heuristic 2:

I think there are many cases where the payment, not only the change, goes to a new address. That's why the developers put many restrictions on Heuristic 2 to lower the false positive rate. For details, please refer to the two studies below:

Bitcoin and Beyond: Exclusively Informational Money
 (Section 5.1 entity graph)
http://arxiv.org/pdf/1304.4758.pdf

A Fistful of Bitcoins: Characterizing Payments Among Men with No Names
(Section 4.3 Heuristic 2)
http://www0.cs.ucl.ac.uk/staff/s.meiklejohn/files/imc13.pdf

Perhaps more conditions can be added to the heuristic to identify change addresses more accurately, such as the transaction amount mentioned in the previous discussion.

Conditions:
1. There are only two outputs in the transaction.
2. One of the outputs goes to a new address, the other to an old address.
3*. The new address receives an ugly amount of Bitcoin (e.g. 0.1876573 BTC), while the old address receives an amount that is round to two(?) digits after the decimal point (e.g. 0.10 BTC).


In this way, we will be able to avoid the exception below. Even if the converted payment amount has many digits in BTC, it is very unlikely that the change amount will be a nice number and be sent to an old address.
Maybe once the BTC value is stable, but more often than not a payment is a round amount in $ or € or whatever, converted to an amount with many digits in BTC...
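The three conditions above could be sketched roughly like this in Python. Amounts are in satoshi to avoid floating point trouble. The function names, the `seen_addresses` set (addresses that already appeared earlier in the chain) and the default of two decimal places are my assumptions for illustration.

```python
from typing import List, Optional, Set, Tuple

SATOSHI_PER_BTC = 100_000_000


def is_round(value_sat: int, decimals: int = 2) -> bool:
    """True if the amount has no digits beyond `decimals` places in BTC."""
    step = SATOSHI_PER_BTC // 10 ** decimals
    return value_sat % step == 0


def guess_change(
    outputs: List[Tuple[str, int]],   # (address, value in satoshi)
    seen_addresses: Set[str],         # addresses used before this transaction
    decimals: int = 2,
) -> Optional[str]:
    """Return the presumed change address, or None when the conditions fail."""
    if len(outputs) != 2:                      # condition 1
        return None
    new = [o for o in outputs if o[0] not in seen_addresses]
    old = [o for o in outputs if o[0] in seen_addresses]
    if len(new) != 1 or len(old) != 1:         # condition 2
        return None
    new_addr, new_val = new[0]
    _, old_val = old[0]
    # Condition 3*: ugly amount to the new address, round amount to the old one.
    if is_round(old_val, decimals) and not is_round(new_val, decimals):
        return new_addr
    return None
```

For the example amounts, 0.10 BTC is 10,000,000 satoshi and 0.1876573 BTC is 18,765,730 satoshi, so `guess_change([("old", 10_000_000), ("new", 18_765_730)], {"old"})` gives `"new"`. A payment priced in USD fails condition 3* and is simply left unlabeled, which is the point of the extra restriction.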

-------------------------------------------------------------------------------------------------------------------------------------------------

Regarding the discussion about the new heuristic based on transaction patterns:

This is very helpful!! Thanks!!

After reading this document, I think the transaction patterns we can leverage are:

Relay transactions (one input, one output)
Peeling chain (consecutive transactions with one input, two outputs)
Sweep transactions (multiple inputs, one output)

I personally also often send the exact input to a casino as the exact amount I want to gamble with is not that important and I want to avoid a change address. This might be very specific to people that have full control over the inputs they spend though.

This is an exception related to the Sweep transaction pattern. Perhaps we can set a bar for the number of inputs: a transaction needs more inputs than the bar to be considered a valid sweep transaction. Otherwise we treat it as an exception.
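As a small sketch of the bar idea in Python: the function name and the default bar of 2 are placeholders I picked for illustration, and the right value would have to come from looking at real data.

```python
def is_valid_sweep(n_inputs: int, n_outputs: int, bar: int = 2) -> bool:
    """Accept a sweep only if it has one output and more inputs than the bar."""
    return n_outputs == 1 and n_inputs > bar


if __name__ == "__main__":
    print(is_valid_sweep(5, 1))   # many inputs merged into one output
    print(is_valid_sweep(1, 1))   # exact-input payment, treated as an exception
```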


Those are the thoughts I have so far. Comments and new ideas are welcome.

I really appreciate all the ideas and resources.
copper member
Activity: 1498
Merit: 1562
No I dont escrow anymore.
-snip-

How do you distinguish change addresses from normal outputs?

I found the refined version of Heuristic 2.

The simplest way is to identify the transactions that have only two output addresses.

And only one of the outputs appears for the first time in the blockchain (a new address).

The new address will be identified as the change address.

 -snip-

It's not correct. I have seen people asking to have Bitcoins sent to new addresses due to privacy concerns. I don't think "heuristic 2" will work. I might be wrong though!
I see that a lot too with businesses, since it also makes it easier to track payments. I think a better way to think of heuristic 2 is that the change amounts are usually weird numbers. You typically send some nice even number of Bitcoin to someone, like 0.10 BTC, and not something like 0.1876573 BTC. I think in transactions where one output is clean and the other is not, the assumption can be made that the weird output is the change.

Unless the price was given in USD and calculated to BTC, as is often the case with payment processors. They also use a new address every time they request a payment from you. I personally also often send the exact input to a casino as the exact amount I want to gamble with is not that important and I want to avoid a change address. This might be very specific to people that have full control over the inputs they spend though.

Many of the assumptions here are just that, assumptions, and as long as you keep that in mind when analyzing the blockchain you should be fine. The problem I see, however, is that this is often used as concrete evidence of something, while it's nothing more than a possibility.
legendary
Activity: 1100
Merit: 1032
You typically send some nice even number of Bitcoin to someone, like 0.10 BTC, and not something like 0.1876573 BTC. I think in transactions where one output is clean and the other is not, the assumption can be made that the weird output is the change.
Maybe once the BTC value is stable, but more often than not a payment is a round amount in $ or € or whatever, converted to an amount with many digits in BTC...
staff
Activity: 3458
Merit: 6793
Just writing some code
-snip-

How do you distinguish change addresses from normal outputs?

I found the refined version of Heuristic 2.

The simplest way is to identify the transactions that have only two output addresses.

And only one of the outputs appears for the first time in the blockchain (a new address).

The new address will be identified as the change address.

 -snip-

It's not correct. I have seen people asking to have Bitcoins sent to new addresses due to privacy concerns. I don't think "heuristic 2" will work. I might be wrong though!
I see that a lot too with businesses, since it also makes it easier to track payments. I think a better way to think of heuristic 2 is that the change amounts are usually weird numbers. You typically send some nice even number of Bitcoin to someone, like 0.10 BTC, and not something like 0.1876573 BTC. I think in transactions where one output is clean and the other is not, the assumption can be made that the weird output is the change.
sr. member
Activity: 384
Merit: 270
Are there any other transaction patterns whose ownership we can identify?
Here it is!
hero member
Activity: 560
Merit: 509
I prefer Zakir over Muhammed when mentioning me!
How does walletexplorer.com do it? Has there been any discussions in the past?
I guess it tries to combine chains of txs and associate addresses with each other. Like address A sending money to address B, and the change to address C. Now it may guess that address A and C are part of the same wallet (probably HD) or belong to the same person. Which is highly uncertain, as there is no proper way to distinguish change from other outputs.

I've tested walletexplorer.com with a few addresses of mine, but it was able to group none of them correctly. So I wouldn't have too much hope this actually works.

AFAIK, it puts all the input addresses of a transaction into one wallet. Everybody knows there are exceptions, but this is the generally accepted way.

-snip-

How do you distinguish change addresses from normal outputs?

I found the refined version of Heuristic 2.

The simplest way is to identify the transactions that have only two output addresses.

And only one of the outputs appears for the first time in the blockchain (a new address).

The new address will be identified as the change address.

 -snip-

It's not correct. I have seen people asking to have Bitcoins sent to new addresses due to privacy concerns. I don't think "heuristic 2" will work. I might be wrong though!
LKC
newbie
Activity: 6
Merit: 0
No, because CoinJoin.

I read some research about it. It seems they just accept the fact that there are some exceptions.

In most cases Heuristic 1 is correct. The exceptions remain unless we can identify which transactions are made with CoinJoin.

How do you distinguish change addresses from normal outputs?

I found the refined version of Heuristic 2.

The simplest way is to identify the transactions that have only two output addresses.

And only one of the outputs appears for the first time in the blockchain (a new address).

The new address will be identified as the change address.
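A minimal Python sketch of that rule, assuming the transactions are already sorted in blockchain order and each one is given as a `(txid, [output addresses])` pair. That input format and the function name are made up for the example.

```python
from typing import Dict, Iterable, List, Tuple


def find_change_addresses(
    transactions: Iterable[Tuple[str, List[str]]],
) -> Dict[str, str]:
    """Map txid -> presumed change address using the first-appearance rule."""
    seen = set()                 # every address that has appeared so far
    change: Dict[str, str] = {}
    for txid, outputs in transactions:
        if len(outputs) == 2:
            fresh = [addr for addr in outputs if addr not in seen]
            # Exactly one never-seen output address: treat it as the change.
            if len(fresh) == 1:
                change[txid] = fresh[0]
        seen.update(outputs)
    return change
```

When both outputs are new, which is common with wallets that never reuse addresses, the rule gives no answer for that transaction rather than a wrong one.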

Problem is, most addresses are used only once. Especially now that pretty much every wallet is HD.

Yeah, that's why I want to do this! Many users use disposable addresses to hide their tracks and enhance their privacy.

Maybe I didn't write it clearly enough.

For example, take the transaction pattern that has only one input and one output.

Most likely the input and output addresses belong to the same user.

This kind of transaction pattern is mostly used to move Bitcoin in transit.

Moreover, we can also consider a whole peeling chain as belonging to the same user who tries to hide his/her tracks.

Or the distributing and converging transaction patterns.

I am still trying to figure out how to identify those more complicated transaction patterns.


I've tested walletexplorer.com with a few addresses of mine, but it was able to group none of them correctly. So I wouldn't have too much hope this actually works.

Try this! https://bitiodine.net/

The author implemented heuristic 1 & 2 and built this website.



I couldn't find any previous discussion regarding this topic.

Any information or ideas are welcome : ))))
legendary
Activity: 1176
Merit: 1011
How does walletexplorer.com do it? Has there been any discussions in the past?
I guess it tries to combine chains of txs and associate addresses with each other. Like address A sending money to address B, and the change to address C. Now it may guess that address A and C are part of the same wallet (probably HD) or belong to the same person. Which is highly uncertain, as there is no proper way to distinguish change from other outputs.

I've tested walletexplorer.com with a few addresses of mine, but it was able to group none of them correctly. So I wouldn't have too much hope this actually works.
full member
Activity: 233
Merit: 102
Heuristic 1: Group all the input addresses of a transaction into the same cluster.
No, because CoinJoin.

Quote
Heuristic 2: Group the change addresses (shadow addresses) into the same cluster as the input addresses.
How do you distinguish change addresses from normal outputs?

Quote
What I want to do is develop another heuristic. So far I am thinking of grouping addresses based on their transaction patterns.

Such as a peeling chain, or addresses that distribute bitcoin to many other addresses that converge in the end.

Are there any other transaction patterns whose ownership we can identify?

It is just an initial idea. Any ideas, advice or related work would be appreciated :)
Problem is, most addresses are used only once. Especially now that pretty much every wallet is HD.

How does walletexplorer.com do it? Has there been any discussions in the past?
legendary
Activity: 1176
Merit: 1011
Heuristic 1: Group all the input addresses of a transaction into the same cluster.
No, because CoinJoin.

Quote
Heuristic 2: Group the change addresses (shadow addresses) into the same cluster as the input addresses.
How do you distinguish change addresses from normal outputs?

Quote
What I want to do is develop another heuristic. So far I am thinking of grouping addresses based on their transaction patterns.

Such as a peeling chain, or addresses that distribute bitcoin to many other addresses that converge in the end.

Are there any other transaction patterns whose ownership we can identify?

It is just an initial idea. Any ideas, advice or related work would be appreciated :)
Problem is, most addresses are used only once. Especially now that pretty much every wallet is HD.
LKC
newbie
Activity: 6
Merit: 0
Hi, all

As we all know, Bitcoin users can have more than one Bitcoin address.

I am looking for every possible way to group addresses based on their ownership.

After doing some research on this topic, I found that there are currently two popular heuristics.

--------------------------------------------------------------------------------------------------------------------

Heuristic 1: Group all the input addresses of a transaction into the same cluster.

Heuristic 2: Group the change addresses (shadow addresses) into the same cluster as the input addresses.

--------------------------------------------------------------------------------------------------------------------
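For reference, Heuristic 1 is usually implemented with a union-find (disjoint set) structure. Here is a minimal Python sketch, assuming each transaction is given simply as the list of its input addresses. That input format and the function name are made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def cluster_by_common_inputs(transactions: Iterable[List[str]]) -> List[Set[str]]:
    """Heuristic 1: addresses spent together in one transaction share a cluster."""
    parent: Dict[str, str] = {}

    def find(addr: str) -> str:
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]   # path halving
            addr = parent[addr]
        return addr

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])                # register single-input transactions too
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters: Dict[str, Set[str]] = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

For example, the inputs `[["A", "B"], ["B", "C"], ["D"]]` end up as two clusters, `{A, B, C}` and `{D}`. A change address found by Heuristic 2 could be merged in by adding it to the same list as the inputs of its transaction.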

What I want to do is develop another heuristic. So far I am thinking of grouping addresses based on their transaction patterns.

Such as a peeling chain, or addresses that distribute bitcoin to many other addresses that converge in the end.

Are there any other transaction patterns whose ownership we can identify?

It is just an initial idea. Any ideas, advice or related work would be appreciated :)