Below is the original short range forecast post. Long range forecast posts
start here.
- Model 0: log(H) ~ 1.74 + 0.94 lag1(log(H)) + 0.21 lag1(log(p)) - 0.14 lag4(log(p)). This forecast will be within +/- 15.1% of the actual network hashrate with 95% confidence.
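For readers who want to see how Model 0 turns lagged values into a forecast, here is a minimal sketch. It assumes natural logarithms and that lag1 and lag4 mean one and four weeks earlier; neither is stated in the post, and the function and argument names are made up for illustration.

```python
import math

def model0_forecast(h_lag1, p_lag1, p_lag4):
    """Point forecast of network hashrate H from Model 0.

    h_lag1: hashrate one period earlier (assumed: one week)
    p_lag1: price one period earlier
    p_lag4: price four periods earlier
    Assumption: log() in the post is the natural logarithm.
    """
    log_h = (1.74
             + 0.94 * math.log(h_lag1)
             + 0.21 * math.log(p_lag1)
             - 0.14 * math.log(p_lag4))
    # Convert the forecast from log scale back to the original units.
    return math.exp(log_h)
```

The inputs must be in the same units as the data the coefficients were fitted on, because the intercept 1.74 depends on those units.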
You are modeling/training your forecast function using the same data that you use to verify its forecast accuracy. This is not a proper way to create a data model: all you get is a function that accurately models past price movements, and you have no information on its accuracy on data it has not seen.
The proper way is to split the historical pricing data into at least two sets:
- learning set, from which you deduce your function parameters. You can use Jan-Jun 2012 price data for example.
- test set, which you can use to verify function accuracy, that is, how successfully it models price data. You can use Jul-Dec 2012 for example.
- possibly you could have more test sets, 2011 data for example.
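The split described above can be sketched roughly as follows. This is only an illustration: it fits a one-lag model y_t = a + b * y_(t-1) on logged values as a simplified stand-in, not the Model 0 from the forecast post, and all function names are hypothetical.

```python
import math

def chronological_split(series, train_fraction=0.5):
    """Split a time-ordered series into (learning set, test set), no shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def fit_ar1(log_values):
    """Least-squares fit of y_t = a + b * y_(t-1); returns (a, b)."""
    x = log_values[:-1]
    y = log_values[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def out_of_sample_errors(test_log_values, a, b):
    """One-step-ahead percentage errors on the test set, in original units."""
    errors = []
    for prev, actual in zip(test_log_values[:-1], test_log_values[1:]):
        forecast = a + b * prev
        errors.append(100 * (math.exp(forecast) - math.exp(actual)) / math.exp(actual))
    return errors
```

The parameters come only from the learning set, and the errors are computed only on the test set, so the error figures say something about data the fit has never seen.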
Thanks for your feedback.
I've always been aware that I might have been overfitting. I couldn't split the data - there's just so little data to be had. Also, various points in time have had slightly differing auto- and cross-correlations, and I wanted to be able to produce a simple linear model that could account for all data, especially since I'm only using two variables out of all the variables that can affect the network hashrate from time to time.
Instead I decided to use all the data I had, using a linear model with a minimum of coefficients and lagged variables. Since then I have applied the model weekly. Imagine the initial post as the "training" phase and my current posts as the "test set". It seems so far that the model has been predicting future network hashrate - and future network mining difficulty - far better than I expected.
Also, it seems you're assuming I'm modelling price. I'm not modelling price data - I'm using lagged historical price data (and lagged historical network hashrate data) to provide a forecast of the network hashrate. There is no modelling of price at all, just a 1 to 4 week forecast of the network hashrate, with confidence intervals for the error (which I'll be the first to admit are quite large at the 3 and 4 week forecast).
If you were to do what I suggested, you would find that your model is not very good at modeling price data and hence a very bad predictor/forecaster of future data.
This is not your fault, it's just that you have taken on a very very difficult problem to solve...
However, as I mentioned, I'm not modelling price data. I've used genetic algorithms to do that before, using the ADF test, with some success, but over time it was not enough to beat the ask/bid spread. So I'll leave that to the finance geeks.
If you look at my weekly update posts, you'll see I'm assessing the models as I go - not changing them, just assessing them. So far they have performed as expected - within the 95% CI for error, except when the reward halving occurred (a variable I cannot account for in a simple linear/lag function).
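This kind of ongoing assessment, with the model left unchanged, can be sketched as a simple coverage check: count how many past forecasts landed within the stated error band of the hashrate that was later observed. The function name and the numbers in the usage note are made up for illustration and are not taken from the update posts.

```python
def coverage(forecasts, actuals, tolerance_pct):
    """Fraction of forecasts within +/- tolerance_pct of the actual value."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1
        for forecast, actual in zip(forecasts, actuals)
        if abs(forecast - actual) / actual * 100 <= tolerance_pct
    )
    return hits / len(forecasts)
```

For example, `coverage([100, 100, 100, 100], [100, 110, 90, 200], 15.1)` uses invented values in which the last actual is far outside the band, as it might be after a structural break. If the band really is a 95% interval, the fraction should sit near 0.95 over a long enough run of weekly forecasts.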
If I haven't explained this well, take a look at the long range forecast post:
http://organofcorti.blogspot.com/2012/12/104-long-range-forecasts-of-network.html
and the most recent update post:
http://organofcorti.blogspot.com/2013/01/weekly-network-forecast-7th-january-2013.html
I would be interested to read what you think.