All the Bitcoin protocol cares about when re-targeting the difficulty is how close we were to 10 minutes per block since the last difficulty change. Anything before that is really irrelevant.
Irrelevant at the most technical level, sure. But irrelevant on a fundamental level? Hardly.
I still disagree. If you include anything prior to the last difficulty change, your estimate is going to be off because it includes data that the Bitcoin protocol doesn't care about.
Your statement that hashrate determines average block time is mostly correct. Luck is a factor as well, which is why all pools can suddenly stop finding blocks for hours on end. Granted, that shouldn't mean too much by the end of the difficulty period, but it does influence it. And, as I believe you stated as well, at the end of the day what's used to determine the next difficulty is how long it took to solve X number of blocks since the last change compared to one per 10 minutes.
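Just to make that rule concrete, here's a rough Python sketch of it (simplified; the names are mine, and the real client clamps the adjustment to a factor of 4 based on block timestamps rather than a precomputed ratio):

TARGET_SECONDS_PER_BLOCK = 600
BLOCKS_PER_RETARGET = 2016

def next_difficulty(current_difficulty, actual_seconds_for_period):
    """Scale difficulty by how fast the last period was solved vs. the 10-minute target."""
    expected = TARGET_SECONDS_PER_BLOCK * BLOCKS_PER_RETARGET  # 14 days
    ratio = expected / actual_seconds_for_period
    # Bitcoin limits each retarget to at most a 4x change in either direction.
    ratio = max(0.25, min(4.0, ratio))
    return current_difficulty * ratio

# Example: the last 2016 blocks took 12 days instead of 14 -> difficulty rises ~16.7%.
print(next_difficulty(40_000_000_000, 12 * 24 * 3600))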
Let's say the difficulty changed just 10 blocks ago, less than 2 hours back. Your app can rely on either those last 9 blocks, or on a larger dataset telling you that the network hashrate has averaged 280,000-290,000 TH/s over the last month. Which do you think is going to give you a better estimate? If your answer is just "there will be wild swings during the first few blocks after a difficulty re-target," then your app should simply not provide an estimate during that time rather than look like a fool giving an answer based on a tiny sample size. What if the first block is found 10 seconds after the difficulty adjustment, and your app shows a 5000% projected increase? You've got to at least have some sanity checks in there.
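A sanity check like that could be as simple as something along these lines (purely illustrative; the threshold and function name are made up, not from any particular app):

TARGET_SECONDS_PER_BLOCK = 600
MIN_BLOCKS_FOR_ESTIMATE = 144  # roughly one day's worth; pick whatever seems sane

def projected_change(blocks_since_retarget, seconds_since_retarget):
    """Return the projected % difficulty change, or None if the sample is too small."""
    if blocks_since_retarget < MIN_BLOCKS_FOR_ESTIMATE:
        return None  # too noisy -- show nothing rather than a wild number
    actual_rate = seconds_since_retarget / blocks_since_retarget
    return (TARGET_SECONDS_PER_BLOCK / actual_rate - 1.0) * 100.0

# 10 blocks in under 2 hours returns None instead of a silly projection.
print(projected_change(10, 100 * 60))
print(projected_change(300, 300 * 540))  # blocks ~9 min apart -> about +11.1%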
I will agree with you there about sample size. I should put an indicator in there noting that the difficulty changed within the last day or so, and that the sample size therefore isn't large enough to produce a good indication of the next change.
Regards,
M
mdude77, you're correct in that data from before the last retarget isn't used by the protocol for the current retarget. However, rates of change in network hashrate can be fairly constant for weeks or even months, and weekly average hashrates can be quite predictable.
When you're predicting a new difficulty at the next retarget, you are in effect predicting the average hashrate between retargets. Since the rate of change in the network hashrate shifts very slowly, hashrates from weeks before the last retarget are still useful for that prediction. The trick is in using a forecasting methodology that will not overfit and won't create massive forecast confidence intervals.
Personally, I use exponential smoothing to forecast the change in hashrate up to the 2016th block since the last retarget, based on the last month or so of data. It has been fairly reliable, although I'm sure there are better methods.
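One way to set that up looks roughly like this (the smoothing of day-over-day changes, the alpha value, the sample data, and the names below are illustrative assumptions, not necessarily the exact method described):

def forecast_hashrate(daily_hashrates, days_ahead, alpha=0.3):
    """Exponentially smooth recent day-over-day hashrate changes (TH/s)
    and extrapolate the trend days_ahead days forward."""
    trend = daily_hashrates[1] - daily_hashrates[0]
    for prev, cur in zip(daily_hashrates[1:], daily_hashrates[2:]):
        # Recent daily changes get the most weight.
        trend = alpha * (cur - prev) + (1 - alpha) * trend
    return daily_hashrates[-1] + trend * days_ahead

# Example: a month of daily averages climbing from ~280,000 toward ~290,000 TH/s,
# projected out to the expected end of the current retarget period (~10 days away).
history = [280_000 + 330 * d for d in range(30)]
print(forecast_hashrate(history, days_ahead=10))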