
Topic: Long range forecast of the network hashrate and Difficulty (Read 7467 times)

hero member
Activity: 742
Merit: 500
I'm late to the party, but this is fantastic work.
donator
Activity: 2058
Merit: 1007
Poor impulse control.
Fantastic work organofcorti. Thanks for the update.

Tip inbound.

Thank you very much, creativex. And as a special "thank you" treat, you get:

sr. member
Activity: 434
Merit: 250
Fantastic work organofcorti. Thanks for the update.

Tip inbound.
legendary
Activity: 1330
Merit: 1026
Mining since 2010 & Hosting since 2012
Were they right?

Were who right? If you mean Dalkore, then no, he wasn't right. My estimates were much closer.

Yes, please take his estimates. I was just giving commentary. I follow his work and he is really working on refining his numbers.
donator
Activity: 2058
Merit: 1007
Poor impulse control.
I haven't updated this for a while, so here are the most recent charts:

http://organofcorti.blogspot.com/2013/03/weekly-network-forecast-25th-march-2013.html

legendary
Activity: 1484
Merit: 1005
I'm guessing that the 20 TH/s brought by Avalon will have the network hash rate up to 40 TH/s in a week or two, since anyone who has one will be mining with it.
donator
Activity: 2058
Merit: 1007
Poor impulse control.
Note: The charts are now taking up so much room I'm no longer posting them here; you can see them in all their glory at the blog.

Weekly network forecast 14th January 2013

0.Introduction
The Canary model error for this week turns out to be larger than the 95% confidence interval for error, and well outside the 95% confidence interval for the network hashrate estimate. This possibly implies an external influence, but is more likely due to the 3.2% increase in price coupled with an unexpected 7.7% decrease in the weekly average network hashrate. A rough sketch of the check itself follows the list below.
  • If there is some sort of external influence which has caused some proportion of miners to switch off suddenly, then the model recovery will likely be slow, as it was after the block reward halving.
  • If the drop in hashrate with an increase in price is a random event, models should recover to within the expected range next week.
  • If the last two weeks were anomalous and the block reward halving is having a continued effect on the network, then model error will continue to be outside the 95% confidence interval for error for an unknown period of time.
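
For anyone who wants to follow along, the check is nothing more exotic than comparing the log-scale residual of the Canary estimate to the interval half-width. A rough Python sketch - the 8% half-width and the hashrate numbers are placeholders, not the fitted values:

Code:
import math

def canary_check(actual_hashrate, canary_estimate, ci_halfwidth=0.08):
    # Flag a possible external influence when the Canary model's log-scale
    # residual falls outside the 95% confidence interval.
    # ci_halfwidth is the 95% CI half-width on log(hashrate); 0.08 is a
    # placeholder, not the value fitted from the data.
    residual = math.log(actual_hashrate) - math.log(canary_estimate)
    return residual, abs(residual) > ci_halfwidth

# Illustrative numbers only (GH/s): a week where the estimate overshoots
residual, flagged = canary_check(actual_hashrate=23100, canary_estimate=25000)
print("log residual %+.3f, outside CI: %s" % (residual, flagged))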

donator
Activity: 2058
Merit: 1007
Poor impulse control.
ad 1. start here: http://en.wikipedia.org/wiki/Cross-validation_(statistics). Father Google will provide further references + book recommendations
There's no mention there that it's the only way to test the validity of a model with confidence, just that it is one method.

ad 2. how far into the future are you forecasting?
Model.f1 forecasts 1 week ahead
Model.f2 forecasts 2 weeks ahead
Model.f3 forecasts 3 weeks ahead
Model.f4 forecasts 4 weeks ahead
The Canary model attempts to detect effects of unknown variables.

ad 3. sure! I'll bet you 1 BTC Smiley

OK - I'll bet 1 BTC that for the next 5 weeks the actual weekly average network hashrate will be within 13% of the Model.f1 prediction, unless the Canary model indicates an external (non-hashrate, non-price) influence.

Now, who will be third party escrow? Smiley
full member
Activity: 219
Merit: 100

Can you provide a source for this assertion? I've found no books on ARIMA modelling that make such a claim - that training and testing sets are necessary for forecasts of auto- and cross-correlative models. These sets might be necessary when using symbolic regression and (sometimes) when using genetic algorithms to model time series data, but it's not mentioned in the ARIMA texts I've read.

In the meantime, I provide 95% confidence intervals for the hashrate forecasts based on historical data, and so far the forecasts have exceeded the confidence interval only when an unknown variable (the reward halving) has had an effect on the network. I'm surprised by the model's rapid recovery after the block halving.

We'll just have to see how it goes. Care to make a wager? Wink


ad 1. start here: http://en.wikipedia.org/wiki/Cross-validation_(statistics). Father Google will provide further references + book recommendations

ad 2. how far into the future are you forecasting?

ad 3. sure! I'll bet you 1 BTC Smiley
donator
Activity: 2058
Merit: 1007
Poor impulse control.
First let me correct what I said in the earlier post when I spoke of price history data: I meant difficulty history data. Indeed you are modeling difficulty and not price; I must have confused the two Smiley

I read your report and found this: http://2.bp.blogspot.com/-YgYQ-OX3S5E/UOvLIj9-6SI/AAAAAAAAEoQ/drCPQ70jf7U/s1600/DifficultyForecast.2013-01-08.png

As soon as you try to forecast any significant distance into the future, your model is way off the real data.


As mentioned in the post, I expected the model to fail when the block reward halving occurred. In the chart to which you link, I assume you refer to the inability of the network hashrate models to predict difficulty retargets correctly? I think it's done a rather good job. The only prediction that has been significantly different from the actual difficulty retarget is the one immediately after the reward halving. The other difficulty predictions (after making the model in November) have been better than I expected, since I'm not modelling it directly and can't provide a confidence interval.

Anyway, splitting data into learning and test sets is a must. That's the only way to test the validity of the model with any confidence.

Can you provide a source for this assertion? I've found no books on ARIMA modelling that make such a claim - that training and testing sets are necessary for forecasts of auto- and cross-correlative models. These sets might be necessary when using symbolic regression and (sometimes) when using genetic algorithms to model time series data, but it's not mentioned in the ARIMA texts I've read.

Do keep in mind that in (for example) a univariate series of one hundred weekly average hashrate data points, we have an unknown number of known independent variables (historical price and hashrate data) and an unknown number of unknown independent variables. Some of those unknown variables have been / will be rare, but have had / will have a significant effect on the network hashrate which cannot be accounted for by the model. Splitting the data into two smaller sets risks the unknown variables having a significant effect on the accuracy of the model.
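
To make the comparison concrete: for a short weekly series, the usual alternative to a single learning/test split is a rolling-origin (walk-forward) evaluation, where the model is refitted on everything up to week t and scored on its out-of-sample forecast of week t+1 - which is effectively what the weekly updates do. A minimal Python sketch, using a lag-1 regression as a stand-in for the real model:

Code:
import numpy as np

def walk_forward_errors(log_hashrate, min_train=52):
    # Rolling-origin evaluation: refit a simple lag-1 regression on all data
    # up to week t, forecast week t+1, and record the out-of-sample error.
    # The lag-1 regression is only a stand-in for the actual Model.f1.
    errors = []
    for t in range(min_train, len(log_hashrate) - 1):
        y = log_hashrate[1:t + 1]   # weeks 2..t+1 (response)
        x = log_hashrate[0:t]       # weeks 1..t   (previous week's value)
        slope, intercept = np.polyfit(x, y, 1)
        forecast = intercept + slope * log_hashrate[t]
        errors.append(log_hashrate[t + 1] - forecast)
    return np.array(errors)

# e.g. errs = walk_forward_errors(np.log(weekly_avg_hashrate))
#      print(np.percentile(np.abs(errs), 95))  # empirical 95% error bound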

Another suggestion would be to try other modelling techniques such as neural networks. You can find information about all these in books covering Machine Learning...

That could be fun, but I'm not trying to provide the most accurate forecast possible - I'm using the simplest possible method to achieve an aim, explain how it's done, and hopefully interest some readers in trying it for themselves - or going one step further. Anyone who has a basic level of math and coding skills should be able to replicate my work and at the same time know what they're doing. If my work encourages someone to develop a truly accurate forecast, I'll be very happy.

In the meantime, I provide 95% confidence intervals for the hashrate forecasts based on historical data, and so far the forecasts have exceeded the confidence interval only when an unknown variable (the reward halving) has had an effect on the network. I'm surprised by the model's rapid recovery after the block halving.

We'll just have to see how it goes. Care to make a wager? Wink
full member
Activity: 219
Merit: 100
First let me correct what I said in the earlier post when I spoke of price history data: I meant difficulty history data. Indeed you are modeling difficulty and not price; I must have confused the two Smiley

I read your report and found this: http://2.bp.blogspot.com/-YgYQ-OX3S5E/UOvLIj9-6SI/AAAAAAAAEoQ/drCPQ70jf7U/s1600/DifficultyForecast.2013-01-08.png

As soon as you try to forecast any significant distance into the future, your model is way off the real data.

Anyway, splitting data into learning and test sets is a must. That's the only way to test the validity of the model with any confidence.

Another suggestion would be to try other modelling techniques such as neural networks. You can find information about all these in books covering Machine Learning...





donator
Activity: 2058
Merit: 1007
Poor impulse control.
Below is the original short range forecast post. Long range forecast posts start here.

  • Model 0: log(H) ~ 1.74 + 0.94lag1(log(H)) + 0.21lag1(log(p)) - 0.14lag4(log(p)) . This will be within +/- 15.1% of the actual network hashrate with 95% confidence.

You are modeling/training your forecast function using the same data that you use to verify its forecast accuracy. This is not a proper way to create a data model, as all you get is a function that is able to accurately model past price movements, and you have no information on its accuracy.

The proper way is to split the historical pricing data into at least two sets:
- a learning set, from which you deduce your function parameters. You could use Jan-Jun 2012 price data, for example.
- a test set, which you use to verify the function's accuracy, that is, how successfully it models the price data. You could use Jul-Dec 2012, for example.
- possibly you could have more test sets, the 2011 data for example.

Thanks for your feedback.

I've always been aware that I might have been overfitting. I couldn't split the data - there's just so little data to be had. Also, various points in time have had slightly differing auto- and cross-correlations, and I wanted to be able to produce a simple linear model that could account for all data, especially since I'm only using two variables out of all the variables that can affect the network hashrate from time to time.

Instead I decided to use all the data I had, using a linear model with a minimum of coefficients and lagged variables. Since then I have applied the model weekly. Imagine the initial post as the "training" phase and my current posts as the "test set". It seems so far that the model has been predicting future network hashrate - and future network mining difficulty - far better than I expected.

Also, it seems you're assuming I'm modelling price. I'm not modelling price data - I'm using lagged historical price data (and lagged historical network hashrate data) to provide a forecast of the network hashrate. There is no modelling of price at all, just a 1 to 4 week forecast of the network hashrate, with confidence intervals for the error (which I'll be the first to admit are quite large for the 3 and 4 week forecasts).
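
To make the quoted Model 0 concrete: the one week forecast is just a linear function of the lagged logs. A minimal Python sketch - the hashrate and price units have to match whatever the model was fitted on, so treat anything you plug in as illustrative:

Code:
import math

def model0_forecast(h_lag1, p_lag1, p_lag4):
    # One week ahead hashrate forecast from the quoted Model 0:
    #   log(H) ~ 1.74 + 0.94*lag1(log(H)) + 0.21*lag1(log(p)) - 0.14*lag4(log(p))
    # Units must match the series the model was fitted on (not stated here).
    log_h = (1.74
             + 0.94 * math.log(h_lag1)
             + 0.21 * math.log(p_lag1)
             - 0.14 * math.log(p_lag4))
    point = math.exp(log_h)
    # The quoted +/- 15.1% is the 95% interval for the one week forecast.
    return point, (point * (1 - 0.151), point * (1 + 0.151))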

If you were to do what I suggested, you would find that your model is not very good at modeling price data and hence a very bad predictor/forecaster of future data.

This is not your fault, it's just that you have taken on a very very difficult problem to solve...

However, as I mentioned, I'm not modelling price data. I've used genetic algorithms with the ADF test to do that before, with some success - but over time it was not enough to beat the ask/bid spread. So I'll leave that to the finance geeks Wink

If you look at my weekly update posts, you'll see I'm assessing the models as I go - not changing them, just assessing them. So far they have performed as expected - within the 95% CI for error, except when the reward halving occurred (a variable I cannot account for in a simple linear/lag function).

If I haven't explained this well, take a look at the long range forecast post:
http://organofcorti.blogspot.com/2012/12/104-long-range-forecasts-of-network.html

and the most recent update post:
http://organofcorti.blogspot.com/2013/01/weekly-network-forecast-7th-january-2013.html

I would be interested to read what you think.
full member
Activity: 219
Merit: 100
Below is the original short range forecast post. Long range forecast posts start here.

  • Model 0: log(H) ~ 1.74 + 0.94lag1(log(H)) + 0.21lag1(log(p)) - 0.14lag4(log(p)) . This will be within +/- 15.1% of the actual network hashrate with 95% confidence.

You are modeling/training your forecast function using the same data that you use to verify its forecast accuracy. This is not a proper way to create a data model, as all you get is a function that is able to accurately model past price movements, and you have no information on its accuracy.

The proper way is to split the historical pricing data into at least two sets:
- a learning set, from which you deduce your function parameters. You could use Jan-Jun 2012 price data, for example.
- a test set, which you use to verify the function's accuracy, that is, how successfully it models the price data. You could use Jul-Dec 2012, for example.
- possibly you could have more test sets, the 2011 data for example.

If you were to do what I suggested, you would find that your model is not very good at modeling price data and hence a very bad predictor/forecaster of future data.

This is not your fault, it's just that you have taken on a very very difficult problem to solve...

donator
Activity: 2058
Merit: 1007
Poor impulse control.
This week's update, report from the blog:

http://organofcorti.blogspot.com/2013/01/weekly-network-forecast-7th-january-2013.html

Note: The charts are now taking up so much room I'm no longer posting them here; you can see them in all their glory at the blog.

Weekly network forecast 7th January 2013

0.Introduction
If you're happy with < 10% error in your forecasts, this week's errors were acceptable. Since accuracy has been generally good (barring the changes due to the reward halving), it's possible to use the weekly hashrate forecasts to estimate the date of the next retarget and the difficulty it will change to.

For example, based on this week's Model f1, f2, f3 and f4 weekly hashrate forecasts, we can forecast that the retarget at block 217728 will be on 21st January and that difficulty will change to ~3657934.
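
The arithmetic behind that kind of estimate is just "blocks remaining at the forecast hashrate" plus the usual retarget ratio. A rough Python sketch, assuming the forecast hashrate holds over the whole retarget window (the inputs you supply are illustrative):

Code:
def retarget_estimate(current_height, current_difficulty,
                      forecast_hashrate_ghs, now_unix):
    # Expected seconds per block at the forecast hashrate:
    #   difficulty * 2**32 / (hashrate in hashes per second)
    blocks_left = 2016 - (current_height % 2016)
    hashrate_hs = forecast_hashrate_ghs * 1e9
    secs_per_block = current_difficulty * 2 ** 32 / hashrate_hs
    retarget_time_unix = now_unix + blocks_left * secs_per_block
    # New difficulty if the forecast hashrate held for the whole 2016-block
    # window: scale the current difficulty by (600 s target / block time).
    new_difficulty = current_difficulty * 600.0 / secs_per_block
    return retarget_time_unix, new_difficulty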

So for this week I've included:
  • A table of current retarget date and difficulty estimates
  • A table of previous retarget date and difficulty estimates, and the actual retarget dates and difficulties as a comparison
  • A chart of the last twenty six weeks of retarget date and difficulty estimates, actual retarget dates and actual difficulties.

It should be noted that each retarget will often be forecast by two consecutive weekly forecasts, hence the multiple points per retarget on the new chart. Please post a comment if it's not clear.


1. Models and datasets:
The model datasets have been collected into one paste to save time. Model estimates have been likewise aggregated.

Forecast and canary model analysis
All datasets
All estimates
Difficulty data and estimates

2. Results
  • Canary model (current hashrate estimate based only on current price and previous network hashrates): This model's error has recovered to 4% of the actual weekly average network hashrate this week. The Canary model was outside the expected range for only 3 weeks after the reward halving, so it seems to remain a good indicator of the onset of changes other than the MTGOX BTCUSD price and previous network hashrate averages.
  • Model.f1 (one week forecast): Model.f1's error recovered to 5% of the weekly average network hashrate, almost as low an average error as in the weeks leading up to the reward halving. Hopefully the model will remain useful until the ASIC hashrates are added.
  • Models f2, f3 and f4 errors are high and, although within the estimated 95% confidence interval for the error, they are not yet useful for long range forecasts.
  • The large negative difficulty change after the block reward halving was (of course) not predicted and stands out as a clear error. However the new estimates look on target, and I think an estimate of Difficulty ~ 3.6 million after the next retarget is reasonable.

donator
Activity: 2058
Merit: 1007
Poor impulse control.
Regardless of whether or not ASICs drop and change the way modelling estimates are done, someone needs to be doing this so those of us both trading and mining can have some sort of foresight. This is impressive work, organofcorti. Thanks for doing this.

I'm glad you find it useful - I was thinking the simplicity and accuracy of the models were interesting enough to follow up with a weekly update, but I wasn't sure anyone would be able to use the data given the way the errors can increase as the length of the forecast increases.
 

donator
Activity: 1419
Merit: 1015
Regardless of whether or not ASICs drop and change the way modelling estimates are done, someone needs to be doing this so those of us both trading and mining can have some sort of foresight. This is impressive work, organofcorti. Thanks for doing this.
donator
Activity: 2058
Merit: 1007
Poor impulse control.
estimate without ASIC introduction...? right Smiley

Yes. As mentioned previously, the method breaks down when changes to the network are effected by causative factors other than price or the previous network hashrate. That's why the forecasts were so inaccurate after the reward halving. I'm surprised the forecasts are already becoming accurate again so soon.

This could help pinpoint when the first ASICs start hashing.

More details:

https://bitcointalksearch.org/topic/detecting-asics-and-other-network-hashrate-variables-127379
http://organofcorti.blogspot.com/2012/11/103-canaries-coal-mines-and-black-swans.html
hero member
Activity: 756
Merit: 500
estimate without ASIC introduction...? right Smiley
donator
Activity: 2058
Merit: 1007
Poor impulse control.
This week's update, report from the blog:

http://organofcorti.blogspot.com.au/2013/01/weekly-network-forecast-31st-december.html

Weekly network forecast 31st December 2012

0.Introduction
I apologise for the delay in posting this update. The real world intervened in a most pleasant manner.

I've added some more error information this week to enable an easier comparison between forecast and actual network hashrate. The forecast models are recovering already, with all predictions within the estimated error range. It seems the reward halving has only had a temporary effect on the forecast model accuracy so far.

1. Models and datasets:
The model datasets have been collected into one paste to save time. Model estimates have been likewise aggregated.

Forecast and canary model analysis
All datasets
All estimates

2. Results
  • Canary model (current hashrate estimate based only on current price and previous network hashrates): This model's error has recovered to 4% of the actual weekly average network hashrate this week. The Canary model was outside the expected range for only 3 weeks after the reward halving, so it seems to remain a good indicator of the onset of changes other than the MTGOX BTCUSD price and previous network hashrate averages.
  • Model.f1 (one week forecast): Model.f1's error recovered to 5% of the weekly average network hashrate, almost as low an average error as in the weeks leading up to the reward halving. Hopefully the model will remain useful until the ASIC hashrates are added.
  • Models f2, f3 and f4 errors are high and, although within the estimated 95% confidence interval for the error, they are not yet useful for long range forecasts.

Donations help give me the time to analyse bitcoin mining related issues and write these posts. If you enjoy or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r

Thanks to the following for use of their data:
blockexplorer.com:  1Cvvr8AsCfbbVQ2xoWiFD1Gb2VRbGsEf28
blockchain.info
molecular: 1MoLECHau3nb9zV78uKraNdcZsJ2REwxuL
donator
Activity: 2058
Merit: 1007
Poor impulse control.
0.Introduction
It could be suggested that the reward halving continues to produce a significant effect outside of the direct USDBTC comparison, and certainly that's what I intuit. I suspect only a doubling of the MTGOX US$BTC price or a halving of the network hashrate will bring the model on track, but time will tell.


1. Models and datasets:
The model datasets have been collected into one paste to save time. Model estimates have been likewise aggregated.

Forecast and canary model analysis
All datasets
All estimates

2. Results
Model.f1, the one week forecast, recovered to within the 95% confidence interval for last week's forecast of this week. I think this shows the importance of including the network hashrate lag variables. As somewhat expected, the Canary model is still modelling a hashrate much higher than the current network hashrate, which is outside the 95% confidence interval for that model for the third week in a row.

http://organofcorti.blogspot.com/2012/12/weekly-network-forecast-24th-december.html
