Dear sirs,
I've read your paper "Quantitative Analysis of the Full Bitcoin Transaction Graph" with great interest.
I am, however, quite surprised by the assumptions you make in order to analyze the collected Bitcoin network data, and by the conclusions that you draw from them.
Transactions can be trivially constructed with inputs from different senders, without any of them sharing private keys. This invalidates your assumption that all the sending addresses of a transaction belong to the same owner.
Secondly, you completely disregard the impact of shared wallets on the way ownership can be followed through the blockchain. Even though you might, at the macroscopic level, follow some large chunks of coins, you do not account for the fact that shared wallets completely break the chain of ownership by sending out different coins than the ones that came in.
I have access to the production database of Instawallet, so I took it upon myself to check some of your conclusions on page 10. As of the date of your extraction, Instawallet had received Bitcoins on 103,513 different addresses (your paper mentions approximately 23,000).
You may also be interested in this document put online by Gregory Maxwell: https://en.bitcoin.it/wiki/Real_peer_review#Linking_transactions_to_identify_ownership
I think I can speak for the whole Bitcoin community when I thank you for your work.
Don't hesitate to contact me if you wish to discuss this further. You may also want to connect to the #bitcoin-dev IRC channel on Freenode.
Best regards,
David FRANCOIS
And their response:
We would like to thank you for your comments.
The main point you raised is that one cannot claim that different addresses participating as senders in the same transactions belong to the same owner. Our response consists of several observations:
1. We quoted from an official policy statement that this should be the case when transactions have multiple sending addresses.
2. We noted that knowledge of multiple private keys is required in this case, and while it is always possible that different owners will share their private keys, this is not likely to happen very often.
3. All the previous papers on issues of privacy in the bitcoin system which we quote in the bibliography make the same assumption, so this is not something that we invented.
4. The fact that some C++ code does not enforce this requirement is not proof that it does not hold in the vast majority of cases.
5. Most of our results are statistical in nature, and are not affected by a small number of exceptions. We are much more likely to underestimate the number of addresses which should be merged together (because we never saw those addresses in the same transactions) than to overestimate them because a few transactions had multiple owners as senders. You just demonstrated that our analysis indeed underestimated the number of addresses on which Instawallet received Bitcoins. We simply saw no evidence in the data suggesting that we should link the 103,513 different addresses you mentioned, so we gave the number of about 23,000 as a lower bound, not the real number, which we had no way of knowing.
6. In particular, it is not clear why our analysis of dormant coins would be affected by this issue. For example, we are counting how many old coins were sent to an owner who did not initiate any outgoing transactions for three months. If we mistakenly add more addresses to that owner, we make it harder (and not easier) to satisfy this constraint, so we are underestimating the number of dormant coins.
7. Similarly, we do not understand why it would matter to mistakenly combine addresses in all our graphs. Its effect would be to make the graph look more reasonable, since it is easier to explain why someone would send bitcoins to themselves rather than send bitcoins to many unrelated addresses only in order to receive them back at the end.
8. Finally, while one can be overcautious and never try to combine any addresses under any circumstances, this will give a greatly distorted picture of how many coins are kept and spent by owners. We believe that our methodology, which is clearly explained in the paper, gives a much better statistical picture even if a tiny number of decisions to unify addresses turn out to be incorrect.
We hope that this answer will clarify the situation.
Yours,
Adi Shamir and Dorit Ron
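
For readers who want to see what is being argued about, the address-merging assumption discussed in both letters can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the paper: the `UnionFind` class, the `cluster_addresses` function and the toy address names are all hypothetical, and each transaction is reduced to just the list of its sending addresses.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over address strings (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own cluster root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """Merge all sending addresses that appear together in one transaction.

    `transactions` is a list of lists of sending addresses (hypothetical
    simplified format). Returns the clusters as a sorted list of sorted lists.
    """
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # make sure single-input addresses are recorded too
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: A1..A3 belong to one owner, C1 and C2 to another.
transactions = [
    ["A1", "A2"],
    ["A2", "A3"],
    ["B1"],
    ["C1", "C2"],
]
clusters = cluster_addresses(transactions)

# One transaction built jointly by two different owners (the objection raised
# in the first letter) is enough to merge their two clusters into one.
clusters_with_joint_tx = cluster_addresses(transactions + [["A3", "C1"]])
```

On the toy data the first call yields three clusters, and adding the single jointly built transaction collapses the A and C clusters into one. Conversely, addresses that never appear together as senders stay in separate clusters, which is the underestimation described in point 5 of the response.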