In Quantitative Analysis of the Full Bitcoin Transaction Graph, Dorit Ron and Adi Shamir have analyzed statistics and interesting structures in the chains of ownership of bitcoins. To this end, they needed a way to determine when two addresses are owned by the same person.
The method they used is as follows: If there is any transaction which has inputs from both addresses, they have the same owner; this is extended transitively. Address pairs which are not in the transitive closure are assumed to have distinct owners.
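To make the heuristic concrete, here is a minimal sketch of how such a transitive closure could be computed with a union-find structure. It is not the authors' code: the function name `cluster_addresses` and the input format (each transaction given as a plain list of its input addresses) are assumptions made purely for illustration.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses using the multi-input heuristic.

    `transactions` is an iterable of lists, each holding the input
    addresses of one transaction (a hypothetical, simplified format).
    Returns a list of sets; each set is one assumed owner.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        root = addr
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[addr] != root:
            parent[addr], addr = root, parent[addr]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # make sure single-input addresses are recorded too
        for addr in inputs[1:]:
            # Co-spent inputs are assumed to share an owner.
            union(first, addr)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return list(clusters.values())


# Toy data: A and B are co-spent, then B and C, so A, B, C merge transitively.
example = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cluster_addresses(example)
```

In this toy input, A and C never appear in the same transaction but still end up in one cluster through B, while D stays alone because it is never co-spent with anything.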
This method has several weaknesses:
1. If two addresses by distinct owners are linked in an advanced transaction, they will be assumed to have the same owner.
2. If two addresses are owned by the same person but are never visibly linked (e.g. separate wallets), they will be assumed to have distinct owners.
3. If people are using a shared eWallet, they will all be assumed to be the same person - while technically the keys are indeed all held by the same entity, each person has a separate legal ownership of the coins.
They also had this to say:
1. While the notions of a bitcoin and of an address are completely clear, the notion of an owner is quite fuzzy since it can not be derived in a precise way from the available data (this may be a feature of the scheme, rather than a bug!). There are several ways how to deal with this issue:
1.1 Ignore it completely, and derive from the graph only statistical information about the behavior of addresses. However, we believe that this will completely distort many types of statistical information we try to extract from the graph, e.g., what is the distribution of the number of bitcoins that users keep, how many bitcoins they receive and spend, how big are their typical transactions, etc. Since the scheme enables (and even encourages) users to keep multiple addresses and to constantly shuffle bitcoins internally among their accounts, we believe that it is essential to find a way to distinguish between "internal" and "external" transactions, and thus to determine in some way what is the common ownership of different addresses.
1.2 Use our methodology, which is to assume that in most of the transactions, sending bitcoins from multiple addresses indicates that all these accounts are owned by a single entity. This classification is technically easy to apply, but it creates two types of errors: We underestimate the common ownership of accounts just because we never saw it in the given transactions, and we overestimate it since occasionally there may be multiple owners who send bitcoins in a single transaction. All the anecdotal evidence we saw so far indicates that overall we tend to underestimate the number of addresses which are associated with a single entity (as was demonstrated in the case of Instawallet, in which we found only about 1/3 of the actual addresses associated with it), and that the errors in the other direction, while they exist, are not likely to distort our statistical conclusions in a major way.
1.3 Use a different methodology, which will be closer to the ground truth, and which can be derived either from the available data, or from reliable alternative sources. Here we need your help in suggesting such a methodology, which will be discussed by and accepted by most of the bitcoin community as better representing the issue of common ownership. It is always easier to complain about the shortcomings of one methodology than to suggest a better one!
For the purpose of statistical analysis of the data, what would be the best way to determine common ownership of addresses?