What is the probability of a 40 min 6 block streak?

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: spin on September 03, 2015, 06:55:47 AM

Quote from: organofcorti on August 27, 2015, 08:31:13 PM

As is noted in the thread, the timestamps are inaccurate. Trying to make timestamps accurate requires the assumption that blocks are generated a particular way - but this is what you're testing, so you can't do that.

I have a source for more accurate "timestamps" (actually the first time a block has been recorded by a well connected monitor), but this doesn't fix the problem.

The problem is that blocks appear with respect to time as a non-homogeneous Poisson process rather than a homogeneous (the usual type) Poisson process. They are only a homogeneous Poisson process with respect to hashes.

This is not usually an issue unless considered over many days, but if there are sudden changes in hashrate the block rate will be affected in a significantly non-homogeneous way. For example, I've noticed that block durations aren't actually exponentially distributed even if you try to normalise the data to account for the non-homogeneous nature of the process. I *think* this has something to do with miner hashrate changes at the start of a block, but it's hard to prove.

I don't doubt that some of what you see is the effect of the generation process being non-homogeneous. However, the relatively small sample you took might look non-Poisson and still actually be OK. You could use the R package dgof to do some discrete goodness-of-fit tests, or you could find the confidence intervals for the histogram bins and see whether the bins are under- or overfilled, or within the expected range (if the bins are the same size, the bin counts should have a binomial distribution where p = 1 / number of bins).

Great post. I found that quite interesting.

What is your thinking on the hash rate at the "start of a block"? Do you mean the "orphaned hash rate" due to miners working on the old headers before learning of a new block?

Is it homogeneous w.r.t. the hashes? I'm assuming the hash rate is continuously changing?

I think yes to both questions, but that's opinion based on old data. It seemed to be the case before stratum; I'm not sure if it's a significant effect now. I hope to get time to look at that again soon.

Quote from: spin on September 03, 2015, 06:55:47 AM

I also had a look at your site. Great work, by the way. If you don't mind answering here (or is there a thread on your site?): your CI for the forecast appears narrower than the CI of the hash rate estimate.

Thanks for the kind words.

They're about the same, but one is offset with respect to the other. It's annoying, and I think it's because the forecast method makes assumptions about the forecast residuals that do not hold. I'm not sure how to fix that.

spin

sr. member

Activity: 362

Merit: 264

Quote from: organofcorti on August 27, 2015, 08:31:13 PM

As is noted in the thread, the timestamps are inaccurate. Trying to make timestamps accurate requires the assumption that blocks are generated a particular way - but this is what you're testing, so you can't do that.

I have a source for more accurate "timestamps" (actually the first time a block has been recorded by a well connected monitor), but this doesn't fix the problem.

The problem is that blocks appear with respect to time as a non-homogeneous Poisson process rather than a homogeneous (the usual type) Poisson process. They are only a homogeneous Poisson process with respect to hashes.

This is not usually an issue unless considered over many days, but if there are sudden changes in hashrate the block rate will be affected in a significantly non-homogeneous way. For example, I've noticed that block durations aren't actually exponentially distributed even if you try to normalise the data to account for the non-homogeneous nature of the process. I *think* this has something to do with miner hashrate changes at the start of a block, but it's hard to prove.

I don't doubt that some of what you see is the effect of the generation process being non-homogeneous. However, the relatively small sample you took might look non-Poisson and still actually be OK. You could use the R package dgof to do some discrete goodness-of-fit tests, or you could find the confidence intervals for the histogram bins and see whether the bins are under- or overfilled, or within the expected range (if the bins are the same size, the bin counts should have a binomial distribution where p = 1 / number of bins).

Great post. I found that quite interesting.

What is your thinking on the hash rate at the "start of a block"? Do you mean the "orphaned hash rate" due to miners working on the old headers before learning of a new block?

Is it homogeneous w.r.t. the hashes? I'm assuming the hash rate is continuously changing?

I also had a look at your site. Great work, by the way. If you don't mind answering here (or is there a thread on your site?): your CI for the forecast appears narrower than the CI of the hash rate estimate.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

As is noted in the thread, the timestamps are inaccurate. Trying to make timestamps accurate requires the assumption that blocks are generated a particular way - but this is what you're testing, so you can't do that.

I have a source for more accurate "timestamps" (actually the first time a block has been recorded by a well connected monitor), but this doesn't fix the problem.

The problem is that blocks appear with respect to time as a non-homogeneous Poisson process rather than a homogeneous (the usual type) Poisson process. They are only a homogeneous Poisson process with respect to hashes.

This is not usually an issue unless considered over many days, but if there are sudden changes in hashrate the block rate will be affected in a significantly non-homogeneous way. For example, I've noticed that block durations aren't actually exponentially distributed even if you try to normalise the data to account for the non-homogeneous nature of the process. I *think* this has something to do with miner hashrate changes at the start of a block, but it's hard to prove.

I don't doubt that some of what you see is the effect of the generation process being non-homogeneous. However, the relatively small sample you took might look non-Poisson and still actually be OK. You could use the R package dgof to do some discrete goodness-of-fit tests, or you could find the confidence intervals for the histogram bins and see whether the bins are under- or overfilled, or within the expected range (if the bins are the same size, the bin counts should have a binomial distribution where p = 1 / number of bins).
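
The histogram-bin check can be sketched roughly as follows. This is only an illustration in Python, not the dgof approach, and it rests on assumptions that are not in the post: the bins are chosen to be equally probable under an exponential distribution with a 600-second mean (so that p = 1 / number of bins applies to each bin), and the durations are simulated rather than taken from real blocks. The names `binom_interval` and `bin_counts` are made up for this sketch.

```python
import math
import random

def binom_interval(n, p, level=0.95):
    """Central range [lo, hi] of Binomial(n, p) counts covering at least `level`."""
    tail = (1 - level) / 2
    cdf, lo, hi = 0.0, None, n
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        if lo is None and cdf > tail:
            lo = k
        if cdf >= 1 - tail:
            hi = k
            break
    return lo, hi

def bin_counts(durations, mean, n_bins):
    """Count durations in n_bins bins that are equiprobable under Exp(mean)."""
    counts = [0] * n_bins
    for d in durations:
        u = 1 - math.exp(-d / mean)          # exponential CDF at d
        idx = int(u * n_bins)
        idx = max(0, min(idx, n_bins - 1))   # negative durations land in bin 0
        counts[idx] += 1
    return counts

# Assumption: simulated stand-in data, 143 durations with a 600 s mean.
rng = random.Random(1)
durations = [rng.expovariate(1 / 600) for _ in range(143)]

N_BINS = 10
counts = bin_counts(durations, mean=600, n_bins=N_BINS)
lo, hi = binom_interval(len(durations), 1 / N_BINS)
outside = [i for i, c in enumerate(counts) if not lo <= c <= hi]
print("counts:", counts)
print("expected range per bin:", (lo, hi), "bins outside:", outside)
```

With real data the interesting output is `outside`: a bin or two outside the range is unremarkable with ten bins, while a consistent pattern of under- or overfilled bins would be worth a closer look.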

DannyHamilton

legendary

Activity: 3528

Merit: 4945

Quote from: ProfMac on August 26, 2015, 03:19:45 PM

I think that everyone will agree that two consecutive timestamps that show a negative interval have an incorrect timestamp somewhere.

The timestamps in the blocks are not intended to be completely accurate. I believe they can vary by plus or minus a few hours. I think I've read that some miners (and/or mining pools) will use the block timestamp as an extra nonce so that they don't need to rebuild the merkle root as often. The timestamp is only intended to be used for calculating the new difficulty every 2016 blocks. A variation of 7200 seconds (2 hours) over the course of 2016 blocks works out to only about 3.6 seconds per block. That's relatively insignificant when compared to the natural variations that will occur due to the random nature of the proof-of-work process.

I'm not sure what you are investigating, or what you are trying to determine, but modifying unreliable data to make it fit some preconceived expectation is typically a bad idea.

ProfMac

legendary

Activity: 1246

Merit: 1002

I'm still looking at some of the blockchain timestamp data.

I typed in the block numbers, 370,944 to 371,087, and the block times from blockchain.info and saved it as a .csv file. This is the start of the current difficulty epoch continuing for approximately 24 hours. I can post the whole file somewhere if someone has a suggestion.

Code:

> temp[ c(1:5,140:144), ]
     block mon day year hr min sec
1   370944   8  22 2015  0  49  43
2   370945   8  22 2015  1   4  59
3   370946   8  22 2015  1  10  29
4   370947   8  22 2015  2   5   5
5   370948   8  22 2015  2  10  31
140 371083   8  22 2015 22   7  41
141 371084   8  22 2015 22  21  10
142 371085   8  22 2015 22  24  50
143 371086   8  22 2015 22  33   5
144 371087   8  22 2015 22  35   2

The blocks following these blocks show a negative time increment.
It might be interesting to see if these pairs of blocks are over-represented by any particular miner. I don't know how to find who mined a particular block.

Code:

> blocktimes[ delta[] < 0, ]
     block mon day year hr min sec  time
7   370950   8  22 2015  2  38  33  9513
21  370964   8  22 2015  6  11  29 22289
34  370977   8  22 2015  7  18  53 26333
50  370993   8  22 2015  9  22  24 33744
114 371057   8  22 2015 19   7  28 68848
131 371074   8  22 2015 20  51  29 75089

I manipulated this data in R with commands similar to these. They are from my notes, not copied exactly from the log file.
I don't know yet if it is algorithmically possible to "fit" a Poisson distribution to the data.

Code:

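# Read the hand-typed block times (columns: block, mon, day, year, hr, min, sec)
temp <- read.csv("Documents/blockchain calctimes.csv", header=TRUE)
# Add a "time" column: seconds since midnight
blocktimes <- cbind(temp, time = 60*(60*temp[,"hr"]+temp[,"min"])+temp[,"sec"])
# Time increment between consecutive blocks, in seconds
delta <- blocktimes[ 2:144, "time" ] - blocktimes[ 1:143, "time" ]
# note: min(delta) is -711, so shift by 711 before binning
png(filename="blockchain-poisson.png")
# Histogram of the shifted increments in 300-second bins
plot( tb1 <- table( cut( delta+711, seq(0, 3276, 300), right=FALSE)), ylim=c(0, 50))
# Overlay the Poisson(2) probabilities, scaled by 136 and shifted right by 2 bins
n <- 9; x <- c( 0:n ); y <- dpois( x, 2.0 ); points( 2+x, 136*y, col="red")
dev.off()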

I wasn't able to link to the image. I put it on Google+ as https://plus.google.com/u/0/photos/115426745065196075335/albums/6187408748966855121/6187408753859123554

I think that everyone will agree that two consecutive timestamps that show a negative interval have an incorrect timestamp somewhere. I am pretty sure I can repair the data by modifying one of those two timestamps to give data that is much closer to a realistic Poisson distribution. I try to be very conservative when I repair data. I haven't explored that process yet.

deepceleron

legendary

Activity: 1512

Merit: 1036

Quote from: ProfMac on August 25, 2015, 11:44:02 AM

Is there a tool that can take a block number and a block count, and return the number of minutes between successive blocks? While I can build this by hand from blockchain.info, there is a certain tediousness to it.

For example, I would like to start at block 370944, the beginning of the current epoch, and continue for some small number, perhaps 18 or 24.

Each bitcoin block has a timestamp, but it is added by the miner (by the local time on the bitcoind machine or on the pool server) when the block was generated to be hashed. There are many blocks that have a negative timestamp offset compared with the previous block due to differences in computer clocks.

It may be more reliable to have a listening node monitor the time that new blocks are published on the network (which propagate everywhere within seconds) if you are not doing historical analysis.

In Bitcoin Core, you can get the timestamps out of your local blockchain, but it requires chaining two RPC commands: one to get the block hash, and one to dump that block using the hash.

Here's a post I wrote describing a script to do this. Replace "bitcoind" with "bitcoin-cli" when using the latest Bitcoin software.

Quote from: deepceleron on September 30, 2014, 09:17:13 PM

Here's a PM I wrote someone else with the details of what to do.

Quote from: deepceleron

I have a CSV of block times: https://bitcointalksearch.org/topic/m.1453722

I dumped them on Windows with this "dumptime.cmd" in the bitcoind directory (and then added some more spreadsheet columns to convert epoch time to readable time). Here it dumps times from blocks 50000-99999:

Code:

@echo off 
setlocal enableextensions 
set /a height=50000
rem echo --start > timeout.txt
:beg
for /f "tokens=* delims=:" %%a in ( 
'bitcoind getblockhash %height%' 
) do ( 
set hash=%%a 
) 

for /f "tokens=*" %%a in ( 
'bitcoind getblock %hash% ^| find "time"'
) do ( 
set blktim=%%a 
) 
echo %height%: %blktim%
echo %height%: %blktim% >> timeout.txt

set /a height = height + 1
IF %height% LEQ 99999 goto beg

endlocal

Code:

blocknum,epochtime,blocksec,datetime
0,1231006505,0,2009-01-03T18:15:05Z
1,1231469665,0,2009-01-09T02:54:25Z
2,1231469744,79,2009-01-09T02:55:44Z
3,1231470173,429,2009-01-09T03:02:53Z
4,1231470988,815,2009-01-09T03:16:28Z
5,1231471428,440,2009-01-09T03:23:48Z
6,1231471789,361,2009-01-09T03:29:49Z
7,1231472369,580,2009-01-09T03:39:29Z
8,1231472743,374,2009-01-09T03:45:43Z
9,1231473279,536,2009-01-09T03:54:39Z
10,1231473952,673,2009-01-09T04:05:52Z
11,1231474360,408,2009-01-09T04:12:40Z
12,1231474888,528,2009-01-09T04:21:28Z
13,1231475020,132,2009-01-09T04:23:40Z
14,1231475589,569,2009-01-09T04:33:09Z
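
For anyone who would rather not do the subtraction in a spreadsheet, here is a minimal Python sketch that takes rows in the CSV layout above and returns the seconds between successive blocks, flagging any negative increments. The function name `block_intervals` and the inline sample are illustrative only; the sample rows are copied from the listing above.

```python
import csv
import io

# A few rows copied from the CSV above (blocknum, epochtime columns only).
SAMPLE = """blocknum,epochtime
1,1231469665
2,1231469744
3,1231470173
4,1231470988
5,1231471428
"""

def block_intervals(csv_text):
    """Return (blocknum, seconds since previous block) for each row after the first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    heights = [int(r["blocknum"]) for r in rows]
    times = [int(r["epochtime"]) for r in rows]
    return [(h, t1 - t0) for h, t0, t1 in zip(heights[1:], times, times[1:])]

intervals = block_intervals(SAMPLE)
# Blocks whose timestamp is earlier than the previous block's timestamp.
negative = [(h, dt) for h, dt in intervals if dt < 0]

for height, seconds in intervals:
    print(f"{height}: {seconds} s ({seconds / 60:.1f} min)")
print("negative increments:", negative)
```

To run it on a full dump, read the file contents into a string and pass that to `block_intervals` instead of `SAMPLE`.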

ProfMac

legendary

Activity: 1246

Merit: 1002

Is there a tool that can take a block number and a block count, and return the number of minutes between successive blocks? While I can build this by hand from blockchain.info, there is a certain tediousness to it.

For example, I would like to start at block 370944, the beginning of the current epoch, and continue for some small number, perhaps 18 or 24.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

I think a more interesting question is: what is the probability that a mining pool with x% market share finds no block within a given time period?

Knowing this enables you to audit pools.

Example:

Slush did not mine a single block for more than 2 days between 19 and 21 June 2015, see
https://mining.bitcoin.cz/stats/blocks/?page=23

Slush's market share was 2.2% around that time, see http://organofcorti.blogspot.hu/2015/06/june-14th-2015-block-maker-statistics.html

Such bad luck has a probability of only 0.17%.
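
The arithmetic behind that figure can be sketched like this, under assumptions of my own choosing: the network averages one block per 10 minutes, the pool's share stays constant over the window, and the drought is taken as exactly 48 hours (the real one was somewhat longer, which pushes the probability slightly lower). The helper name `prob_no_block` is hypothetical.

```python
import math

def prob_no_block(share, hours, block_interval_min=10.0):
    """P(pool finds zero blocks) = exp(-lambda), lambda = expected pool blocks."""
    expected_network_blocks = hours * 60 / block_interval_min
    lam = share * expected_network_blocks
    return math.exp(-lam)

# Assumed inputs: 2.2% market share, a 48-hour window.
p = prob_no_block(share=0.022, hours=48)
print(f"expected pool blocks: {0.022 * 288:.2f}")
print(f"probability of no block: {p:.4%}")
```

The same function can be used to audit any pool: plug in its share and the length of the dry spell.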

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Quote from: ?? on ??

Quote from: grau on August 24, 2015, 04:08:38 PM

- snip -
the probability of 5 blocks within next 40 mins is 15%.

Do I understand it correctly then if I say that the probability of 5 or more blocks within the next 40 minutes (and therefore within the 40 minutes immediately following the broadcast of a block) is 26%?

The probability of 5 or more blocks within the next 40 minutes is 1 - N[CDF[PoissonDistribution[4], 4]] = 37%. I made a correction in my first reply; see the strikethrough.

Since events are independent, it does not matter if you are just after a block or not.

Probability of 5 or more blocks includes probability of 5, probability of 6 ....

The way you deal with this is to use the cumulative distribution function, as I enumerated in the previous post. Using that table you can compute the probability of any block number range within 3 hours.

Examples:

The probability of 15 or fewer blocks is 28.7%
The probability of more than 20 blocks in 3 hours is 1-0.73072 = 26.9%
The probability of 16-20 blocks is 0.73072 - 0.286653 = 44.4%
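
The same range calculations can be reproduced without the table. The sketch below is a plain-Python illustration; it assumes lambda = 18 (3 hours at one block per 10 minutes), and `poisson_cdf` and `prob_between` are names made up here.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return math.fsum(
        math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1)
    )

def prob_between(lo, hi, lam):
    """P(lo <= X <= hi), as a difference of two CDF values."""
    lower = poisson_cdf(lo - 1, lam) if lo > 0 else 0.0
    return poisson_cdf(hi, lam) - lower

LAM_3H = 18  # assumed: expected blocks in 3 hours

at_most_15 = poisson_cdf(15, LAM_3H)
more_than_20 = 1 - poisson_cdf(20, LAM_3H)
from_16_to_20 = prob_between(16, 20, LAM_3H)

print(f"P(X <= 15)       = {at_most_15:.1%}")
print(f"P(X > 20)        = {more_than_20:.1%}")
print(f"P(16 <= X <= 20) = {from_16_to_20:.1%}")
```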

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Quote from: ProfMac on August 24, 2015, 07:00:49 PM

Based on what I have seen, however, I wish I had another plot, this time the Poisson distribution as you have presented it, only for lambda = 18 (3 hour expected network block production), and running out to k = 36.

While I am wishing, I also want a numeric table of the cumulative distribution function for the same distribution.

Here you are:

1. Probability of exactly n blocks within 3 hours:

2. Cumulative numeric table, that is, the probability of having <= n blocks within 3 hours:

Code:

{
 {0, 1.523E-8},
 {1, 2.8937E-7},
 {2, 2.75663E-6},
 {3, 0.0000175602},
 {4, 0.0000841761},
 {5, 0.000323993},
 {6, 0.00104345},
 {7, 0.00289347},
 {8, 0.00705601},
 {9, 0.0153811},
 {10, 0.0303663},
 {11, 0.0548874},
 {12, 0.0916692},
 {13, 0.142598},
 {14, 0.208077},
 {15, 0.286653},
 {16, 0.37505},
 {17, 0.468648},
 {18, 0.562245},
 {19, 0.650916},
 {20, 0.73072},
 {21, 0.799124},
 {22, 0.85509},
 {23, 0.89889},
 {24, 0.93174},
 {25, 0.955392},
 {26, 0.971766},
 {27, 0.982682},
 {28, 0.9897},
 {29, 0.994056},
 {30, 0.996669},
 {31, 0.998187},
 {32, 0.99904},
 {33, 0.999506},
 {34, 0.999752},
 {35, 0.999879},
 {36, 0.999942},
 {37, 0.999973},
 {38, 0.999988}
}
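
For readers without Mathematica, a table like the one above can be regenerated with a few lines of Python. This is a sketch rather than the code used to produce the numbers in this post; it assumes lambda = 18 and uses the recurrence P(X = k+1) = P(X = k) * lambda / (k + 1) so that no large factorials are needed. The function name `poisson_table` is illustrative.

```python
import math

def poisson_table(lam, k_max):
    """Return rows (k, P(X = k), P(X <= k)) for k = 0..k_max."""
    rows = []
    pmf = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for k in range(k_max + 1):
        cdf += pmf
        rows.append((k, pmf, cdf))
        pmf *= lam / (k + 1)   # step from P(X = k) to P(X = k + 1)
    return rows

rows = poisson_table(lam=18, k_max=38)
for k, pmf, cdf in rows:
    print(f"{k:2d}  {pmf:.6g}  {cdf:.6g}")
```

The second column gives the probability of exactly n blocks (the missing plot's data), and the third column is the cumulative value.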

johnnewbery

newbie

Activity: 4

Merit: 1

Small pedantic point to add here, in that the question in the OP and the thread title are slightly different:

OP: What is the probability that 6 blocks will be found in 40 minutes? - Poisson distribution is appropriate because block discoveries are independent events.

Thread title: Re: What is the probability of a 40 min 6 block streak? - I assume the word streak here means chain? 6 independently discovered blocks do not necessarily form a block-chain of height 6 due to orphaned blocks.

So to answer the thread title: the probability of a 6 block chain in 40 minutes is slightly less than (1 - N[CDF[PoissonDistribution[4], 5]]) because one of those discovered blocks may become orphaned (due to block propagation speeds or other reasons).

ProfMac

legendary

Activity: 1246

Merit: 1002

Thanks! I have read some of the wiki page until my head got full. I'll go read more later.

I don't have access to Mathematica, and the documentation in R is a bit more than I want to tackle in the next short while.

Based on what I have seen, however, I wish I had another plot, this time the Poisson distribution as you have presented it, only for lambda = 18 (3 hour expected network block production), and running out to k = 36.

While I am wishing, I also want a numeric table of the cumulative distribution function for the same distribution.

If anyone has the R to do this, I would be really appreciative to see it. While I myself can code this correctly in R from the wiki definitions, it will take some time.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Yes, as you see on the plot I added in parallel to my first reply, the probability of exactly 5 blocks within the next 40 mins is roughly 15% (15.6%).

DannyHamilton

legendary

Activity: 3528

Merit: 4945

Of course, it is important to note that the numbers reported by grau assume that an arbitrary 40 minute period is chosen at random without selection bias.

If on the other hand you start with an already solved block and ask what the probability is that another 5 blocks will be solved within the 40 minutes immediately after seeing the first solved block, I believe the probability is higher. In that case you are essentially asking what the odds are that 5 (or more) blocks will be solved in the next 39 minutes and 59.999... seconds.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

In probability theory and statistics, the Poisson distribution (French pronunciation [pwasɔ̃]; in English usually /ˈpwɑːsɒn/), named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

From: https://en.wikipedia.org/wiki/Poisson_distribution

Mathematica says that N[PDF[PoissonDistribution[4], 6]] is 10.4%; that is the probability of exactly 6 blocks per 40 min.
The probability of 6 or more blocks in 40 minutes is: ~~1 - N[CDF[PoissonDistribution[4], 6]] or 11%~~ 1 - N[CDF[PoissonDistribution[4], 5]] or 21%

Below is the plot of the probability of n blocks per 40 min:
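
As a cross-check that does not need Mathematica, the two numbers can be recomputed directly from the Poisson formula. This is a minimal Python sketch assuming lambda = 4 (40 minutes at one block per 10 minutes); the helper names are made up for the example.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

LAM = 40 / 10  # assumed: expected blocks in 40 minutes

p_exactly_6 = poisson_pmf(6, LAM)
# "6 or more" is the complement of "5 or fewer", hence the CDF at 5, not 6.
p_at_least_6 = 1 - poisson_cdf(5, LAM)

print(f"P(exactly 6 blocks)   = {p_exactly_6:.1%}")
print(f"P(6 or more blocks)   = {p_at_least_6:.1%}")
```

Using the CDF at 6 instead would give the probability of 7 or more blocks, which is the struck-through 11% figure.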

ProfMac

legendary

Activity: 1246

Merit: 1002

When the difficulty and network hash rate are in sync, what is the probability that 6 blocks will be found in 40 minutes?
