[side note: What did you use to make your graphs?]
I use R and the ggplot2 package for the plot, and mclust for the analysis.
It turns out that investor amounts are not Pareto distributed (the continuous analog of Zipf's law), but rather follow a mixture of two log-normally distributed random variables.
That's interesting.
The first thing I thought of when I saw that was that when JD first launched, I gave some 250 or so separate gifts of 0.01 BTC to forum users. Many of them were probably invested and forgotten. Could those account for the "small investor" group, or isn't it in the right place for that?
I set out to keep small, low-bandwidth groups from having any significant effect on the results. As you can see in the density plots and histograms below on the right, there are significant spikes at round-numbered amounts at approximately 0.01, 0.1, 1 and 10 BTC; there's also a bulge in the shoulder of the (right side) density plot at 100 BTC.
These are the same sorts of spikes that were present in the SatoshiDice bet data, and they seem to be due to human preferences for round numbers (give or take a few percent for wins or losses at the time the data was recorded).
The spikes were not what I wanted to model, and in fact they just add noise. It's true that I could have added them in, but then where do you stop? In the end you could create a mixture of Gaussians consisting of one per data point.
That's why I used the Bayesian information criterion (BIC) to decide on the number of Gaussians used to model the data, which means that the "power of ten" spikes don't obscure the more general trends.
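To make the BIC idea concrete, here is a minimal sketch in plain Python. It is not the mclust code used for the analysis above. It assumes the model is fitted to the base-10 logarithm of the amounts (so a log-normal group shows up as a Gaussian), it uses a made-up synthetic sample in place of the real investor data, and the helper names (fit_gmm_1d, bic) are invented for illustration. Note the sign convention: here lower BIC is better, whereas mclust reports BIC with the opposite sign, so higher is better there.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100, var_floor=1e-2):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood).
    The variance floor stops a component from collapsing onto a narrow spike.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, equal weights.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    total_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(total_var / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)

    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in xs
    )
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC with the 'lower is better' convention."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return n_params * math.log(n) - 2.0 * loglik


# Synthetic stand-in for log10(BTC amount): two made-up groups, NOT the real data.
rng = random.Random(42)
log_amounts = ([rng.gauss(-1.5, 0.5) for _ in range(300)]
               + [rng.gauss(1.0, 0.5) for _ in range(200)])

fits = {k: fit_gmm_1d(log_amounts, k) for k in range(1, 5)}
bics = {k: bic(fits[k][3], k, len(log_amounts)) for k in fits}
best_k = min(bics, key=bics.get)

for k in sorted(bics):
    print(f"k={k}  BIC={bics[k]:.1f}")
print("selected k =", best_k)
```

Each extra Gaussian costs three more parameters, and the penalty term grows with log(n), so a component that only soaks up a narrow round-number spike has to improve the likelihood by a lot before BIC will accept it. That is the sense in which the criterion answers "where do you stop?".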