1. Take a point in time over some very long time period.
2. For that point, take the time difference between the last and the next blocks created (unless the randomly selected point in time is exactly a block creation time, in which case pick again)
3. If you repeat 1 and 2 some large number of times and take the average time between blocks, it will be about 2 minutes.
This has to be wrong, so have I just totally misunderstood you?
You've understood correctly, but it isn't wrong.
For (2), if you happen to pick exactly a block creation time, use the time between that block and the one before it. No need to disregard your selection.
Let's make it even simpler:
1. Pick 10k random numbers between 0 and 10k. Sort them in order.
2. Pick random numbers in the same range. Find where they fit in the sorted list above. Find the size of the gap they fit into.
3. Average those gap sizes.
4. Get 2.00
Astonishing, isn't it? 10k points in a 10k range are an average of 1 unit apart from each other. But randomly pick points in that range and find that the average size of the gaps you land in is actually 2!
Some code:
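A quick R version of the steps above (just a sketch, not necessarily the exact script):

pts <- sort(runif(10000, 0, 10000))            # step 1: 10k random points in a 10k range, sorted
edges <- c(0, pts, 10000)                      # treat 0 and 10000 as the ends of the range
gaps <- diff(edges)                            # sizes of the gaps between consecutive points
probes <- runif(10000, 0, 10000)               # step 2: random probe points in the same range
hit_gaps <- gaps[findInterval(probes, edges)]  # size of the gap each probe lands in
mean(gaps)      # about 1: the plain average gap size
mean(hit_gaps)  # steps 3 and 4: about 2, the average gap a random probe lands in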
When you pick a random point in time you are more likely to end up in the middle of a long block than a short one. (Think about it: there are blocks that last 1 second or less. How likely is a random point in time to end up in one of those?)
It turns out that the average block time you end up picking with this method is 2 minutes, even though the average length of all CLAM blocks is 1 minute.
I've given it some thought and I think that explanation is correct, but it's not the entire answer.
If we assume you're picking one interblock duration from an infinite sequence, you're essentially picking a block weighted by its duration.
The probability density of the interblock durations is exponential, ie
P(x) = lambda * exp(-lambda*x)
where x is the interblock duration and lambda is the block rate.
So the expectation is:
E(x) = integral_0_to_inf (x*P(x)) dx = 1/lambda
The duration-weighted probability density is the above multiplied by x*lambda (so it still integrates to 1), so the weighted expectation is:
integral_0_to_inf x * (x*lambda*P(x)) dx
= lambda * integral_0_to_inf (x^2 * lambda*exp(-lambda*x)) dx = lambda * 2/lambda^2 = 2/lambda
which is what we see. Yay! Incidentally, if you take the samples generated by your script, you can check that their histogram matches the above weighted probability density.
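For example, something along these lines (just a sketch, using a block rate of lambda = 1/60, ie one block a minute on average) shows both the 2/lambda mean and the shape of the weighted density:

lambda <- 1/60                                           # block rate: one block per 60 seconds on average
durations <- rexp(1e5, lambda)                           # interblock durations
times <- cumsum(durations)                               # block creation times
probes <- runif(1e5, 0, max(times))                      # random points in time
picked <- durations[findInterval(probes, c(0, times))]   # duration of the block each probe lands in
mean(durations)  # about 60 seconds, ie 1/lambda
mean(picked)     # about 120 seconds, ie 2/lambda
hist(picked, breaks = 100, freq = FALSE)                         # histogram of the sampled durations
curve(lambda^2 * x * exp(-lambda*x), add = TRUE, col = "red")    # the weighted density derived above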
But what if we instead want to sample lots of blocks, maybe take a sample every n/lambda seconds? I don't have a derivation for it, but I wrote a nice simple simulator in R:
library(data.table)
library(ggplot2)
library(pbapply)   # for pbsapply
# n is the length of the sampling window, measured in mean block times (so ~n blocks fit in it)
# lambda is the mean block time in seconds (eg 60 for clams, 600 for bitcoin), so the block rate is 1/lambda
# a single random sample is taken in the window [0, n*lambda] seconds
test_fun1 <- function(n, lambda){
  ### sum(rexp(n*10, 1/lambda)) will usually be > n*lambda
  testdata <- data.table(duration = rexp(n*10, 1/lambda), block_height = 1:(n*10))
  testdata[, time := cumsum(duration)]
  ### pick a random point in the window and return the duration of the block containing it
  rndtime <- runif(1, 0, n*lambda)
  duration <- testdata[which(time > rndtime)[1], duration]
  ### NA in the rare case the generated blocks don't cover the whole window
  if(!is.na(duration)) return(duration)
  NA_real_
}
n <- seq(0.5, 50, 0.5)
lambda <- 60
plotdata <- data.table(minutes = n*lambda/60,
                       mean_duration = pbsapply(n, function(x) mean(replicate(10000, test_fun1(x, lambda = lambda)), na.rm = TRUE)))
ggplot(plotdata, aes(minutes, mean_duration)) + geom_point(alpha=0.25) + geom_smooth(formula= y ~ log(x)) + theme_bw()
For clams, this gives us the following plot:
So if you're sampling at a high rate, the expected sampled duration tends to the expected value 1/lambda (60 seconds in this case). As you reduce the sample rate it approaches 2/lambda, as illustrated in the plot above.
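If you just want the two endpoints rather than the whole curve, a rough check with the same test_fun1 (the numbers will only be approximate) tells the same story:

mean(replicate(10000, test_fun1(0.5, lambda = 60)), na.rm = TRUE)  # short window: much closer to 1/lambda (60s)
mean(replicate(10000, test_fun1(50, lambda = 60)), na.rm = TRUE)   # long window: close to 2/lambda (120s)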
Thanks for the puzzle, doog