Internally, we also have some small-scale R&D on trading algorithms, intended to let us move more funds through the Bitcoin network itself by buying BTC at the best possible time given all available information.
Hi Gareth!
I'm sorry I wasn't clear. I'd like to see a reference to a peer-reviewed publication (in the quantitative finance or artificial intelligence fields) that describes the research upon which your R&D team is working.
Jared Kenna's words that I quoted are almost exactly from Pixelon's prospectus, except they had "fractal geometry" thrown in for good measure.
Would you be so kind as to ask your R&D team for the actual bibliographical references and post them tomorrow or later this week?
Thank you again.
Again, in the interest of transparency, there is no "R&D team" - I'm doing the R&D solo, and the algorithmic trading work is low priority for now.
There are no published academic papers on the precise application of the methods we're using, but the methods themselves are pretty standard.
K-means is a clustering algorithm - it takes a dataset and identifies groups of data points that share similarities. The standard example is a 2D plane:
[Figure: example of K-means clusters on a 2D plane (source: mathworks.com)]
In the example above, we have a 2D plane with an X and Y axis, and various data points plotted on it.
Let's imagine (it's more complex than this, but just to explain) that the X axis represents the time of day and the Y axis represents the transaction size. Plot every single transaction and you'll get some natural clusters where there are larger transactions at certain times of day. Now, if scammers like to move very large amounts of money during quiet hours when nobody is around to monitor, we'd get a large cluster during the quiet hours, and that's something that can be fed into a model.
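To make that concrete, here's a rough Python sketch of the idea (this is not our actual code - the two features, the fake data, and the cluster count are all made up purely for illustration):

```python
# Purely illustrative - not BitInstant's code. Toy K-means on two features:
# hour of day (X axis) and transaction size (Y axis).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Fake history: ordinary daytime transactions plus a blob of large
# late-night transfers.
daytime = np.column_stack([rng.uniform(8, 20, 500),     # hour of day
                           rng.lognormal(3, 1, 500)])   # transaction size
late_night = np.column_stack([rng.uniform(1, 4, 50),
                              rng.lognormal(7, 0.5, 50)])
X = np.vstack([daytime, late_night])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # one centre ends up in the quiet hours
```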
Now, clustering does nothing in itself to help you identify fraud, and clustering in only 2D is often pretty useless. So you'd identify clusters, then look at the transactions within them and analyse how many resulted in later fraud or showed other warning signs. From that you'd build a probabilistic model, looking at the variables in your n-dimensional space and at how the fraud probability rises or falls as particular variables increase or decrease.
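In the same spirit, here's a tiny self-contained sketch of turning historical fraud outcomes into per-cluster probabilities (the cluster labels and outcomes below are invented, not real figures):

```python
# Purely illustrative. Given cluster assignments and historical fraud
# outcomes, estimate P(fraud | cluster) to feed into a model.
import numpy as np

labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])   # made-up cluster per past txn
fraud  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 1])   # 1 = later turned out fraudulent

fraud_rate = {int(c): float(fraud[labels == c].mean()) for c in np.unique(labels)}
print(fraud_rate)   # e.g. the "large, late-night" cluster shows an elevated rate
```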
Combine this with everyone's favourite algorithm - a backprop neural network (not as amazing as most people think) - plus a naive Bayesian classifier and other models. Have each classifier output a fraud probability, weight each classifier's output by its past accuracy, and take the average across all classifiers. You then refuse transactions that score a fraud probability above your tolerable limit, based on expected losses (look up "expected utility").
See http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.1595 for an academic paper dealing with this approach to classification problems.
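Here's a rough sketch of that weighted-ensemble idea using scikit-learn (again, purely illustrative - the data is synthetic, and the fee and accept/refuse rule are made up for the example, not what we actually run):

```python
# Purely illustrative - synthetic data, made-up fee, not our real model.
# A backprop neural net and a naive Bayes classifier each output a fraud
# probability; the outputs are weighted by past (validation) accuracy and
# averaged, then an expected-loss rule decides whether to accept.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95],
                           random_state=0)          # label 1 = fraud
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

classifiers = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
               GaussianNB()]
weights = []
for clf in classifiers:
    clf.fit(X_train, y_train)
    weights.append(clf.score(X_val, y_val))          # past accuracy -> weight
weights = np.array(weights) / np.sum(weights)

def fraud_probability(txn):
    probs = [clf.predict_proba(txn.reshape(1, -1))[0, 1] for clf in classifiers]
    return float(np.dot(weights, probs))             # accuracy-weighted average

FEE = 5.0   # hypothetical revenue earned per accepted transaction

def accept(txn, amount):
    p = fraud_probability(txn)
    return (1 - p) * FEE > p * amount                # expected gain vs expected loss

print(accept(X_val[0], amount=250.0))
```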
With BitInstant, the internal model does not yet have a large amount of data, but over time the intention is to refine it more and more. Right now, we've identified a few factors in Dwolla transactions that make them high-risk, using simple common sense and natural intelligence (i.e. our brains, not our computers). All Dwolla transactions are now screened for these variables as a result.
I hope that explains it sufficiently. I'm sorry I can't go into more detail - beyond the above, the actual variables being analysed and some of the other tricks we use are trade-secret territory.