A csv with timestamps + balance would be neat to make a cool chart. Damn, now I really need to set up Ubuntu to get your thingie running...
Your wish has come true.
Latest git has csv output for transactions:
./parser closure 1PSf86KnLuzM7Ris5kDhTEZwooR3p2iyfV > PIRATE-CLOSURE
./parser transactions --csv file:PIRATE-CLOSURE > PIRATE-CLOSURE-TX.csv
Here's what it looks like when you graph it:
Well, at least he still seems to be able to get rid of huge blocks of BTC at once...
The next step, imho, should be the following:
Create clusters out of all addresses that sent money to or received money from pirate, then write the complete sums as incoming/outgoing for each of them.
Example:
Cluster 1: Sent 1500 to pirate, received 170 from pirate (addresses: 1bitcoinaddress1234, ...)
Cluster 2: Sent 10000 to pirate, received 1337 from pirate (addresses: 1otheraddress1234, ...)
Then it would be apparent (maybe) if there's a cluster (or several clusters) where a lot of funds from pirate are going to and/or coming from.
Easier said than done.
I'd need "seed addresses" for each cluster.
So far, the only address I have that is known to belong to pirate is this one, because he himself confirmed that he controls it.
Wouldn't a brute-forcey way to do this be to take every output from all transactions out of the known pirate cluster and try to build clusters of those, too? Each output from pirate's cluster can get the znort treatment to build pseudo-wallets, and then you could make another pass to tally up total movements between all the clusters you care about. Feed into graphviz, make directed arrows between clusters with thickness dependent on how much total movement went in that direction, and make the cluster (visual) size proportional to total balance in the cluster at the end of the period. For extra points, animate this over time with edge thickness showing an exponential weighted average, and cluster size being instantaneous total coins.
Unfortunately, the computational complexity of running your algorithm once for each output sounds prohibitive, so we probably need a smarter approach.
P.S.: I still haven't had a chance to play with the union-find approach to building clusters. If we wanted to apply the idea to this multi-cluster problem, we'd probably want to store the data on disk somewhere, both for easier access and to avoid eating someone's RAM. Luckily, the union-find structure is pretty trivial to implement in a database, if you don't mind an additional logarithmic factor in the lookup (which some might argue you're getting anyway when dereferencing pointers in memory).