Neighbourhood Pool Watch - page 10. | Bitcointalksearch.org

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Neighbourhood Pool Watch 4.1 Slush's pool is posted.

Quote

6. Conclusions

Slush has a good reputation for honesty, openness, and responsiveness to his miners enquiries and preferences.
Slush's pool is generally trusted by the community.
The current payout method, although better than proportional payouts, is flawed and increases variance for miners and doesn't eliminate the possibility of strategic miners reducing the payout of full time miners.
Once the payout method is changed to Meni Rosenfeld's DGM, full time miners will not lose earnings to strategic miners. Variance in earnings can be controlled by this method, and can be minimised.

Also, the update to Neighbourhood Pool Watch 3.2 The Risks of SMPPS is also posted. Thanks Meni for your help and advice on correcting it.

Quote

5. Conclusions
......

Average maturity time is approximately sqrt(n/(2*pi)). Arsbitcoin.com's maturity time of approximately 1000 btc at 1700 rounds is not much more than the average at 1700 rounds, 822 btc.
A reduction in pool hashrate is never of benefit to miners.

.....[/list]

Please post any questions or comments you have either here or on the blog.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: organofcorti on April 13, 2012, 09:29:06 AM

I'd not thought of looking at the expectation of the average maturity time over N. Does it make sense to graph it as a function of N? I want to use whichever measure will be the most intuitive to understand.

Yes. I guess "average maturity time at round N" will be simpler.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: Meni Rosenfeld on April 12, 2012, 03:55:48 AM

Finding the expected maturity time at round N is easy enough - as you say with a normal approximation (equivalent to a Brownian motion model) it's simply sqrt (N/(2Pi)), and the graphs match that. I thought it would be interesting to look at the expectation of the average maturity time in rounds 1 to N, but that's harder so the former would suffice.

(Edit: Actually, since expectation is linear, the expected average over 1-N is just the average of the expectation over 1-N, which is roughly 2/3 of the expectation at N. The pmf, though, is probably difficult.)

I'd not thought of looking at the expectation of the average maturity time over N. Does it make sense to graph it as a function of N? I want to use whichever measure will be the most intuitive to understand.

Quote

I think the explanation could use some more detail.

Yes, indeed.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Well, as Luke-jr (the inventor of the SMPPS payout method) would say:

Quote

That was never promised to become Bitcoins or retain any value.

But I'm sorry you had to lose coins to find that out. Even besides the SMPPS payout, the poor pool shutdown has affected many miners. Found a new place to mine yet?

finway

hero member

Activity: 714

Merit: 500

Yeah, i still have 2 bitcoins unpaid on arsbitcoin, it seems i'll never get it back.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Finding the expected maturity time at round N is easy enough - as you say with a normal approximation (equivalent to a Brownian motion model) it's simply sqrt (N/(2Pi)), and the graphs match that. I thought it would be interesting to look at the expectation of the average maturity time in rounds 1 to N, but that's harder so the former would suffice.

(Edit: Actually, since expectation is linear, the expected average over 1-N is just the average of the expectation over 1-N, which is roughly 2/3 of the expectation at N. The pmf, though, is probably difficult.)

I think the explanation could use some more detail.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: Meni Rosenfeld on April 11, 2012, 02:38:41 AM

I guess. Still doesn't seem like a very intuitive quantity to look at.

Well that's clear. It took a thread page to explain it, which you ended up doing anyway. No, I don't think it's a useful quantity to help explain anything to a miner who might use SMPPS. I do think mean maturity time will be much more useful.

Quote from: Meni Rosenfeld on April 11, 2012, 02:38:41 AM

The pmf of this average can fairly easily be simulated, and is more useful than the current graphs and IMO can replace them (as long as it is very clear what they represent). I can't think of a closed form for the pmf, but maybe it can be approximated by modeling it as Brownian motion.

I got as far as Donskers theorem before I realised that the mean negative round time could be calculated using a shifted negative binomial distribution with mean = 0, if I could find some way of calculating the mean of the negative buffer half of the distribution.

Since the negative binomial distribution approaches a normal distribution for large , I modelled it using a half normal distribution for which a method of caluculating the mean was already available. Since the positive buffer half of the normal distribution has weight zero when calculating the mean maturity time, the mean calculated using the half normal distribution method is halved.

The results from n = 6 to n= 10000 are within +/- 2% compared to several repeat simulations I ran, so I'm happy with the accuracy of the half normal distribution method and I'd like to use that rather than present a simulation since if readers want to they can follow the logic which is harder with a simulation written in code they won't know.

What I'd like to know is - did I explain that clearly enough? Is it correct? I'm not sure about the best way to explain the weighted means [Edit wrt probability distributions], and I 'll have to go into a bit more detail, but I did want your optinion before I went ahead and replaced that section. Also, does the graph (below) of mean maturity time up to 1000 rounds seem likely to you? If it is, then 1000 rounds into an SMPPS pools life you can expect a mean of about 12 rounds wait to be paid your average earnings.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: organofcorti on April 11, 2012, 02:04:20 AM

I thought number 2 would be of interest because knowing that for example one can expect 300 rounds out of the first 1000 to have a negative buffer of 10 rounds or worse, a miner might not decide to mine at an SMPPS pool, since for 30% of the time the average maturity time will be of the order of the amount of time it takes to solve ten blocks or longer.

I guess. Still doesn't seem like a very intuitive quantity to look at.

Quote from: organofcorti on April 11, 2012, 02:04:20 AM

If I haven't made a wrong assumption there in 2, it also occurs to me that there will be an expected and non-zero maturity time, since an increasing positive buffer doesn't reduce the maturity time as an increasing negative buffer increases it.

Right, the maturity time for future rounds is nonnegative and can be positive, so it has a positive expectation. And that - the average maturity time over the next N rounds - may be of interest to some people.

For an optimal miner it is inconsequential, because he can just leave when the buffer is low. But for a miner that wants to sign up for an extended period of time, it is a useful quantity that can be compared to the maturity time of other pools. The pmf of this average can fairly easily be simulated, and is more useful than the current graphs and IMO can replace them (as long as it is very clear what they represent). I can't think of a closed form for the pmf, but maybe it can be approximated by modeling it as Brownian motion.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: Meni Rosenfeld on April 10, 2012, 04:35:15 AM

Quote from: organofcorti on April 09, 2012, 11:01:40 PM

Maybe this makes clear what I'm recording in the simulation (if not the why to which I'll give some more thought).

The what has been clear for some time.

As for the why - you only need to provide a clear, verbal, unambiguous description of what you are trying to find. In the second part that is fairly clear and as far as I can tell you have a simulation that serves the purpose (you get the cmf of M, from which you can obtain the pmf if you want). In the first part it looks like you just took a bunch of numbers and mashed them together for no reason. The solution as far as I can see is to just remove the first part.

EDIT: It's actually easy to give a verbal description of the first graphs. The one with the cusp is a graph of "the average % of rounds that end with a balance of exactly X", and the accumulated area under it is "the average % of rounds that end with a balance of at most X". (eg, if there are 1000 rounds and 300 of them end with a balance less than -1000 BTC, the value of the variable will be 0.3 - and the graph lists the average of this over all instantiations). Note that as described these are not a pmf and cmf, though they can be looked at as the pmf of cmf of the mixture variable earlier discussed. And again, what you're looking for is not the average % of rounds that end with at most X, but rather the probability that at least one round ends with at most X.

Exactly! I was glad of that edit. I was about to post a description of the same using a random walk, but you've done a much simpler and clearer job. So, it just remains for me to try to explain why I thought "the average % of rounds that end with a balance of exactly X" or "the average % of rounds that end with a balance of at most X" is useful to know.

As you've shown, a negative buffer at an SMPPS pool will contribute to the a delay in the amount of time (in terms of blocks solved) that it takes for a miner to be paid. So the following two results will be of interest:
1. The maximum a negative buffer can be before n rounds.
2. The percentage of n rounds that one can expect the buffer to be any arbitrary amount (or worse/better).

Number 1 is of interest because a significantly negative buffer could delay payments long enough to kill a pool.

I thought number 2 would be of interest because knowing that for example one can expect 300 rounds out of the first 1000 to have a negative buffer of 10 rounds or worse, a miner might not decide to mine at an SMPPS pool, since for 30% of the time the average maturity time will be of the order of the amount of time it takes to solve ten blocks or longer.

If I haven't made a wrong assumption there in 2, it also occurs to me that there will be an expected and non-zero maturity time, since an increasing positive buffer doesn't reduce the maturity time as an increasing negative buffer increases it.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: organofcorti on April 09, 2012, 11:01:40 PM

Maybe this makes clear what I'm recording in the simulation (if not the why to which I'll give some more thought).

The what has been clear for some time.

As for the why - you only need to provide a clear, verbal, unambiguous description of what you are trying to find. In the second part that is fairly clear and as far as I can tell you have a simulation that serves the purpose (you get the cmf of M, from which you can obtain the pmf if you want). In the first part it looks like you just took a bunch of numbers and mashed them together for no reason. The solution as far as I can see is to just remove the first part.

EDIT: It's actually easy to give a verbal description of the first graphs. The one with the cusp is a graph of "the average % of rounds that end with a balance of exactly X", and the accumulated area under it is "the average % of rounds that end with a balance of at most X". (eg, if there are 1000 rounds and 300 of them end with a balance less than -1000 BTC, the value of the variable will be 0.3 - and the graph lists the average of this over all instantiations). Note that as described these are not a pmf and cmf, though they can be looked at as the pmf of cmf of the mixture variable earlier discussed. And again, what you're looking for is not the average % of rounds that end with at most X, but rather the probability that at least one round ends with at most X.

Quote from: organofcorti on April 09, 2012, 11:01:40 PM

I've noticed that for larger probabilities the simulation always diverges slightly from the limit theorem result. If you repeated the simulation with fewer rounds or for a less probable -50, the simulation results will agree with the limit theorem error function much more closely. Cool, huh?

If I had to take a wild guess, I'd say it's because the tail of the exponential distribution (corresponding to bad luck) is more similar to normal than the head. If that's true, your observation will be different if you take the negative of each shifted geometric variable (so effectively, instead of looking at hitting a low value, you're looking at the probability to hit a high value).

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: Meni Rosenfeld on April 09, 2012, 10:22:03 AM

You should probably think about it some more.

Yes, I'm going to try to explain what I mean in terms of a random walk, I think. But right now I've just got a flashing cursor in my head going "thinking.......thinking.....thinking....". I touch on it briefly below in terms of simulation though.

Quote from: Meni Rosenfeld on April 09, 2012, 10:22:03 AM

It's just the tallying of the empirical pmf. If I wanted to simulate the probabilities of the results 1-6 on a die roll, I'd have variables num1,... , num6, and I'd add +1 to num1 whenever I roll the die and get a 1. Likewise, you need to tally the number of times each value of M appears. Unless you have a different approach to this?

After looking at your simulation below, I'm doing much the same thing except in R. In the limit theorem part, the simulation does exactly the same thing, except instead of measuring a specific amount I plot the overall density.

However for the first part, with the cuspy sGED modelable density, I'm not taking a minimum of the cusum of 1000 rounds but collecting all the cusums. I end up with (in this case, and with the number of iterations you use below) a 1000*1000000 matrix of variables. I then plot the density of the variables.

So to emulate this on Mathematica, you would:
1. create a 1000 col *1000000 row array or matrix, name it MAT
2. put Accumulate[RandomReal[ExponentialDistribution[1], {1000}]] - 1] in the first row
3. Iterate step 2 for rows 2 to 1000000
4. not sure this is correct, but try SmoothHistogram(MAT, Automatic, "PDF") and SmoothHistogram(MAT, Automatic, "CDF"). You might need to convert the MAT to a vector first.
5. Repeat for 10, 50, or 100 rounds on a 10 or 50 or 100 column by 1000000 row array for the histogram at a varying number of rounds.

Maybe this makes clear what I'm recording in the simulation (if not the why to which I'll give some more thought).

Quote from: Meni Rosenfeld on April 09, 2012, 10:22:03 AM

Maybe a better estimation can be found with a simulation after all, with 100000 iterations of 1000 rounds I get about 50.8% chance of going below -1000 BTC. My code for this is:

Code:

M[] := Min[Accumulate[1 - RandomReal[ExponentialDistribution[1], {1000}]]];
N[Mean[Table[If[M[] <= -20, 1, 0], {1000000}]]]

Your results are very similar to the ones I get in simulation. I've noticed that for larger probabilities the simulation always diverges slightly from the limit theorem result. If you repeated the simulation with fewer rounds or for a less probable -50, the simulation results will agree with the limit theorem error function much more closely. Cool, huh?

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: organofcorti on April 09, 2012, 10:03:02 AM

Quote from: Meni Rosenfeld on April 09, 2012, 08:36:47 AM

@organ: I don't really speak R, but as far as I can see, your simulation matches the mean pmf because you simulated the mixture variable I talked about rather than your target quantity. That's also evident in your verbal description of the simulation. But I think I finally understand the source of your confusion.

The variable we are interested in is M="the minimum balance over the next n rounds". Its cmf at point X is the probability that the minimum is at most X, which means that the balance reaches X or lower at some point. So once we find the pmf of M, we can calculate the chance of hitting any X by the area under the curve.

"the minimum balance over the next n rounds" is what I derived in the second part of the section using Erdos and Kac's probability limit theorem I.

What I wanted to do as a first step was find the probability of any given balance or less before n rounds (after finding the area under the density curve), so that after the second derivation I could compare it to the probability of a given minimum occurring. Is this the mixture variable?

You should probably think about it some more. "Reaching a balance of X or less at any point" is equivalent to "the minimum balance is at most X", which is the cmf of M. If you find the pmf of M, you can use it to find the cmf. In any case I'm talking about balance rather than debt, but that's only a matter of flipping the sign.

Quote from: organofcorti on April 09, 2012, 10:03:02 AM

where I've made the share-debit a (non intuitively) positive number. I'm subtracting 1/p (or D) each round. What did you mean by the +1 or +1/n? I think there's a step I'm missing.

It's just the tallying of the empirical pmf. If I wanted to simulate the probabilities of the results 1-6 on a die roll, I'd have variables num1,... , num6, and I'd add +1 to num1 whenever I roll the die and get a 1. Likewise, you need to tally the number of times each value of M appears. Unless you have a different approach to this?

Quote from: organofcorti on April 09, 2012, 10:03:02 AM

Using the probability limits theorem, I find the probability of going below -1000 BTC in 1000 rounds as 52.71%, and not until 1000 BTC at 1050 rounds did it reach 50.66%. Do you think this an acceptable variation given the different methods used, or a problem for me?

It's within reason, the model solved with this recursion is not very accurate (and I'm guessing the application of the general theorem also has some error). Maybe a better estimation can be found with a simulation after all, with 100000 iterations of 1000 rounds I get about 50.8% chance of going below -1000 BTC. My code for this is:

Code:

M[] := Min[Accumulate[1 - RandomReal[ExponentialDistribution[1], {1000}]]];
N[Mean[Table[If[M[] <= -20, 1, 0], {1000000}]]]

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

I'm going to make this quick because I'm being called to bed but you got me thinking and I wanted to get a post out in reply, so sorry if my response seems terse and not well thought out.

Quote from: Meni Rosenfeld on April 09, 2012, 08:36:47 AM

@organ: I don't really speak R, but as far as I can see, your simulation matches the mean pmf because you simulated the mixture variable I talked about rather than your target quantity. That's also evident in your verbal description of the simulation. But I think I finally understand the source of your confusion.

The variable we are interested in is M="the minimum balance over the next n rounds". Its cmf at point X is the probability that the minimum is at most X, which means that the balance reaches X or lower at some point. So once we find the pmf of M, we can calculate the chance of hitting any X by the area under the curve.

"the minimum balance over the next n rounds" is what I derived in the second part of the section using Erdos and Kac's probability limit theorem I.

What I wanted to do as a first step was find the probability of any given balance or less before n rounds (after finding the area under the density curve), so that after the second derivation I could compare it to the probability of a given minimum occurring. Is this the mixture variable?

Quote

To simulate the pmf, we need of course to make many instantiations of the random process, and add +1 to the empirical weight of the resulting value of M. What is the value of M? The minimum among all the values of the (shifted) cumulative sum. But instead of adding +1 to the minimum, it seems you added +1 (or +1/n after normalization) to all the values in the cumulative sum. If this is fixed you'll get a proper simulation.

I'm not following this bit, even accounting for the aim of finding the minimum. I'm thinking in terms of the cumulative sums being:

[(Geom var1 - 1/p)], [(Geom var1 - 1/p) + (Geom var2 - 1/p)], [(Geom var1 - 1/p) + (Geom var2 - 1/p)+ (Geom var2 - 1/p)] , .......

where I've made the share-debit a (non intuitively) positive number. I'm subtracting 1/p (or D) each round. What did you mean by the +1 or +1/n? I think there's a step I'm missing.

Quote

Instead of a generative simulation, you can also find the cmf by approximating the process as a simple random walk, and solving (numerically or, if possible, symbolically) the following recursion (in Mathematica syntax):

Code:

f[a_, n_] := f[a, n] = If[a < 0, 1, If[a >= n, 0, 0.5 (f[a + 1, n - 1] + f[a - 1, n - 1])]]

Using this, I find that f[20, 1000], the probability of going below -1000 BTC (20 blocks worth) in 1000 rounds, is approximately 50.67%.

I'm going to have to spend a bit of time on this tomorrow since I only know enough mathematica to get by on wolframaplha when natural language doesn't work, but thanks for giving me something new to work with which looks to be much quicker than my current method.

Using the probability limits theorem, I find the probability of going below -1000 BTC in 1000 rounds as 52.71%, and not until 1000 BTC at 1050 rounds did it reach 50.66%. Do you think this an acceptable variation given the different methods used, or a problem for me?

Cheers again.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

@organ: I don't really speak R, but as far as I can see, your simulation matches the mean pmf because you simulated the mixture variable I talked about rather than your target quantity. That's also evident in your verbal description of the simulation. But I think I finally understand the source of your confusion.

The variable we are interested in is M="the minimum balance over the next n rounds". Its cmf at point X is the probability that the minimum is at most X, which means that the balance reaches X or lower at some point. So once we find the pmf of M, we can calculate the chance of hitting any X by the area under the curve.

To simulate the pmf, we need of course to make many instantiations of the random process, and add +1 to the empirical weight of the resulting value of M. What is the value of M? The minimum among all the values of the (shifted) cumulative sum. But instead of adding +1 to the minimum, it seems you added +1 (or +1/n after normalization) to all the values in the cumulative sum. If this is fixed you'll get a proper simulation.

Instead of a generative simulation, you can also find the cmf by approximating the process as a simple random walk, and solving (numerically or, if possible, symbolically) the following recursion (in Mathematica syntax):

Code:

f[a_, n_] := f[a, n] = If[a < 0, 1, If[a >= n, 0, 0.5 (f[a + 1, n - 1] + f[a - 1, n - 1])]]

Using this, I find that f[20, 1000], the probability of going below -1000 BTC (20 blocks worth) in 1000 rounds, is approximately 50.67%.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: Meni Rosenfeld on April 07, 2012, 04:14:56 PM

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

The set (S1, S2, S3, ...,Sn) is the cumulative sum of the geometrically distributed variables.The probability that X is to be found somewhere in that set is given by the sum of all probabilities that X = some S divided by n:

Pr(X∈(S1, S2, S3, … ,Sn))
= sum from 1 to n [Γ(X+n)/(Γ(n) * Γ(X+1)) * p^n * (1-p)^X]/n (1)

Here is where things start to get horribly, horribly wrong. Both in what you're trying to get, and how you get it.

First, I'll say again: If what you want is the balance after n rounds (which is what the text implies), it is simply (n - Sn/D)*B. That's it. Sn is the number of shares the pool had to pay. Any other derivation is nonsense. So I can only assume you really are trying to figure out the probability that a balance of X will be reached at some point within the first n rounds. Even if this is the case, the above quantity is misguided.

Let's begin with this formula. It is absolutely, and obviously, wrong.

The event X∈(S1, S2, S3, … ,Sn) contains the even X=Sj for any j. Therefore its probability must be at least the probability that X=Sj, which is the pmf of Sj at point X. So the probability must be greater than all the pmfs, so it can't be equal to their average.

The one place where the average of pmfs is meaningful is when you have a mixture variable. For example, if you have a variable Sr which is generated by first choosing a uniform random integer j between 1 and n and then setting it equal to Sj, its pmf will be given by this formula. But such a variable has no relevance whatsoever to the problem at hand.

If you do want to find the probability that X will be in the set, you need to treat this as a union (logic or) of the equalities X=Sj, which can be expanded with the inclusion-exclusion principle. I don't know if the expansion has a nice closed form in this case, I think it's extremely messy because of the dependencies between the Sj's.

Quote

I'm not following your train of thought because you're not explaining what, why and how you are trying to get.

Yes, my aim was to find the probability that a cumulative sum or balance of X or lower will be reached before n rounds. Firstly to define a pdf, compare with simulated results, and then sum the pdf to a cdf.

Quote

So now maybe you can understand why the quantity Pr(X∈(S1, S2, S3, … ,Sn)) is not meaningful. It is not a pmf of anything. The total area under this curve is more than 1, and you can't take areas under the curve for any insight. The events 0∈(S1, S2, S3, … ,Sn) and 1∈(S1, S2, S3, … ,Sn) are not mutually exclusive, so you can't disjunct them by simply adding the probabilities.

After more reading (and learning) I have to say I agree with you. So I really don't know how the simulation matches the 'mean negative binomial pdf' (which you show isn't meaningful). An error in concept maybe? I wrote a simpler simulation with no special libraries required, commented, doesn't make shortcuts and should be possible to follow if you have time.

The sim works as follows: Take n geometrically distributed random variables with the same p, subtracting 1/p from each variable so the expectation of each variable is 0. The take the cumulative sum of the variables. For example let p = 10^-6 and n = 10:

Code:

Geometrically distributed random variables:
[1] 1182417 3279310 482644 550187 404841 183692 1106328 1663533 1716093 5213483

Geometrically distributed random variables minus 10^6
[1] 182417 2279310 -517356 -449813 -595159 -816308 106328 663533 716093 4213483

Cumulative sum of (geometrically distributed random variables minus 10^6)
[1] 182417 2461727 1944371 1494558 899399 83091 189419 852952 1569045 5782528

Now make m iterations of the cumulative sums to n and plot the histogram or sample density for the entire m*n samples. For example, if m=10, you'd plot the density or histogram of:

Code:

m=1 m=2 m=3 m=4 m=5 m=6 m=7 m=8 m=9 m=10
n=1 182417 -567342 -844019 -992762 -965404 -621776 -163689 34795 -790737 219609
n=2 2461727 -1328229 -1116118 -1352713 -1693465 -574916 -744127 -769836 -966007 -259975
n=3 1944371 -370917 -1951890 -1044311 -2358626 -1014358 -1149823 755121 125528 -1139511
n=4 1494558 1608634 -1645397 16903 -1972138 -1336607 -1674819 318716 -393838 1008137
n=5 899399 1333825 1530411 -96895 -2960904 -1578059 -2312004 299624 -1354225 897368
n=6 83091 414391 4826197 9793 -3182523 -2087554 -2619065 -374199 -1590228 91992
n=7 189419 568167 4014970 -681510 -372781 -2077967 -1813568 -159466 -1914558 -357282
n=8 852952 295560 3170795 -1481067 -239713 -2468778 -226886 -342762 -2101092 -1334599
n=9 1569045 683593 3718213 -2315456 1281720 -1220656 -684308 56256 -2411408 -913902
n=10 5782528 169347 2806338 -3215373 1084351 -1431807 -1526684 -254378 -3073730 -1742821

where n = the number of "fails" before each success, and m = the iteration number (the 'mth' time a sample of n was taken)

This gives a density plot or histogram which corresponds to the probability density of the distribution from which it came. If I then multiply the cumulative sums by B/D, I can then create a cdf and find, as you state, "the probability that the balance will reach X or lower at some point within the first n rounds".

So, at some point I somehow I came up with the supposed "mean negative binomial density" and wrote such a simulation which appeared to 'confirm' it. The sim I linked to plots the simulated cumulative sum densities against the "mean negative binomial pdf" for comparison, as below.

If you click on the graph for a closer view, you'll see that the densities match very well.

I'm aware that random variable based simulations have limitations, but such a clear correlation is unlikely to be random. Since the "mean negative binomial pdf" is not just wrong but not meaningful, have I misunderstood what I was finding or looking for in the simulation?

Secondly, and possibly more interestingly, after your comment about the clt I looked for and found that for large n (larger than ~ 100) the simulated densities match skewed generalized error distributions quite well. Since I don't think I'd be able to use the inclusion-exclusion principle easily (tbh a the method to achieve a sum to a non-specific 'n' for this principle is beyond me), I'd like to be able to replace the error on the blog with a skewed GED instead as a model. However I'm unwilling to do more work on it until I can get a consensus (well, your advice

) as to whether I've simulated the cumulative sum (balance) at all times before n rounds.

I know I'm asking a lot for you to go through the sim and look at the results. I wish I could say "I'll owe you one" but I'll just settle for being grateful for either a confirmation that the simulation works or a reason why it doesn't. Right now, I'm just confused. Huh

Thanks for your help.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

Sorry, Meni, I wasn't ignoring you just wanted to make sure I had a clearer reply.

Didn't think so, just wanted to make sure my query wasn't lost in the fold.

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

Also looked for my notes again and failed. Further, realised I should be using the term "probability mass function' instead of 'probability density function' but I'll continue as I started for consistency.

You can use the continuous approximation, and then you're talking about the pdf of the Erlang distribution.

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

The probability that Sn, the sum of n geometrically and identically distributed random variables (x1 to xn) will be some amount X is a negative binomially distributed variable where the probability density is = Γ(X+n)/(Γ(n) * Γ(X+1)) * p^n * (1-p)^X

Similarly for Sn-1, the probability that this sum will equal some variable X:
= Γ(X+(n-1))/(Γ(n-1) * Γ(X+1)) * p^(n-1) * (1-p)^X

and so on for all S.

No arguments there. (There could be off-by-one errors but that's immaterial.)

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

The set (S1, S2, S3, ...,Sn) is the cumulative sum of the geometrically distributed variables.The probability that X is to be found somewhere in that set is given by the sum of all probabilities that X = some S divided by n:

Pr(X∈(S1, S2, S3, … ,Sn))
= sum from 1 to n [Γ(X+n)/(Γ(n) * Γ(X+1)) * p^n * (1-p)^X]/n (1)

Here is where things start to get horribly, horribly wrong. Both in what you're trying to get, and how you get it.

First, I'll say again: If what you want is the balance after n rounds (which is what the text implies), it is simply (n - Sn/D)*B. That's it. Sn is the number of shares the pool had to pay. Any other derivation is nonsense. So I can only assume you really are trying to figure out the probability that a balance of X will be reached at some point within the first n rounds. Even if this is the case, the above quantity is misguided.

Let's begin with this formula. It is absolutely, and obviously, wrong.

The event X∈(S1, S2, S3, … ,Sn) contains the even X=Sj for any j. Therefore its probability must be at least the probability that X=Sj, which is the pmf of Sj at point X. So the probability must be greater than all the pmfs, so it can't be equal to their average.

The one place where the average of pmfs is meaningful is when you have a mixture variable. For example, if you have a variable Sr which is generated by first choosing a uniform random integer j between 1 and n and then setting it equal to Sj, its pmf will be given by this formula. But such a variable has no relevance whatsoever to the problem at hand.

If you do want to find the probability that X will be in the set, you need to treat this as a union (logic or) of the equalities X=Sj, which can be expanded with the inclusion-exclusion principle. I don't know if the expansion has a nice closed form in this case, I think it's extremely messy because of the dependencies between the Sj's.

So now maybe you can understand why the quantity Pr(X∈(S1, S2, S3, … ,Sn)) is not meaningful. It is not a pmf of anything. The total area under this curve is more than 1, and you can't take areas under the curve for any insight. The events 0∈(S1, S2, S3, … ,Sn) and 1∈(S1, S2, S3, … ,Sn) are not mutually exclusive, so you can't disjunct them by simply adding the probabilities.

You need to define the problem as "what is the probability that the balance will reach X or lower at some point within the first n rounds" (taking into account, obviously, the shift of +B for every round). This is a problem that should be solvable with recursion, for either the exact problem or for an approximation for it.

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

This is where my lack of notes fails me. I can't for the life of me remember where I came across this way of summing probabilities, or how I applied it. It doesn't appear to be the result of a convolution. Maybe you know?

Intuitively I see that, since the sum from x = -inf to +inf of each separate pdf = 1, and that the sum from x = -inf to +inf of the individual pdfs for S1 to Sn will be n, then dividing by n is 'normalising' (not the right word, but I hope you know what I mean) the probability to give (1), or if you like, providing the mean probability.

The mean of each separate pdf increases with increasing n and the 'normalised' probability approaches a uniform probability density from 0 to the mean of Sn as n gets very large.

(2) Simply defines the mean = 0 for each separate pdf and then follows the same logic.

Does this make sense to you? I remember the solution making complete sense to me at the time.

Not in the least. I'm not following your train of thought because you're not explaining what, why and how you are trying to get.

Quote from: organofcorti on April 06, 2012, 06:57:10 AM

Interestingly, running simulations for uniformly randomly generated variables with a mean of 0 and negative binomially distributed variables with a mean of 0 and large n, give similar looking curves to each other, although on slightly different scales even with the same p.

Probably the CLT at work.

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Sorry, Meni, I wasn't ignoring you just wanted to make sure I had a clearer reply. Also looked for my notes again and failed. Further, realised I should be using the term "probability mass function' instead of 'probability density function' but I'll continue as I started for consistency.

It might be better if I just described what I'm proving, since I made some mistakes there which muddied the water.

The probability that Sn, the sum of n geometrically and identically distributed random variables (x1 to xn) will be some amount X is a negative binomially distributed variable where the probability density is = Γ(X+n)/(Γ(n) * Γ(X+1)) * p^n * (1-p)^X

Similarly for Sn-1, the probability that this sum will equal some variable X:
= Γ(X+(n-1))/(Γ(n-1) * Γ(X+1)) * p^(n-1) * (1-p)^X

and so on for all S.

The set (S1, S2, S3, ...,Sn) is the cumulative sum of the geometrically distributed variables. The probability that X is to be found somewhere in that set is given by the sum of all probabilities that X = some S divided by n:

Pr(X∈(S1, S2, S3, … ,Sn))
= sum from 1 to n [Γ(X+n)/(Γ(n) * Γ(X+1)) * p^n * (1-p)^X]/n (1)

(note I've fixed it by removing the erroneous j's and changing x's to X's)

This is where my lack of notes fails me. I can't for the life of me remember where I came across this way of summing probabilities, or how I applied it. It doesn't appear to be the result of a convolution. Maybe you know?

Intuitively I see that, since the sum from x = -inf to +inf of each separate pdf = 1, and that the sum from x = -inf to +inf of the individual pdfs for S1 to Sn will be n, then dividing by n is 'normalising' (not the right word, but I hope you know what I mean) the probability to give (1), or if you like, providing the mean probability.

The mean of each separate pdf increases with increasing n and the 'normalised' probability approaches a uniform probability density from 0 to the mean of Sn as n gets very large.

(2) Simply defines the mean = 0 for each separate pdf and then follows the same logic.

Does this make sense to you? I remember the solution making complete sense to me at the time.

Interestingly, running simulations for uniformly randomly generated variables with a mean of 0 and negative binomially distributed variables with a mean of 0 and large n, give similar looking curves to each other, although on slightly different scales even with the same p.

Meni Rosenfeld

donator

Activity: 2058

Merit: 1054

Organ, any news about the pdf issue?

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: organofcorti on April 04, 2012, 10:52:49 PM

Quote from: kano on April 04, 2012, 03:51:05 PM

Quote from: organofcorti on April 04, 2012, 10:29:37 AM

...

Quote

Which of course is true.
(though you didn't seem to bother to point out that it can be both positive and negative in the same way Tongue

)

True, but hardly useful when discussing the risks of losing shares. A positive buffer is not good for a pool in the way a negative buffer is bad for a pool. Maturity time won't get any less as the buffer becomes more positive, but maturity time will increase significantly as the buffer becomes more negative.

Quote

That is of course true of DGM also.

No, I don't think it is. I'll believe you if you can prove it using Meni's DGM functions though.

Quote

The DGM argument is that the pool will shutdown before that happens and the pool does not need to disclose the buffer.
I see that as a conflicting statement in itself.

As they say on wikipedia, "says who?".
...

Meni.

I'll find the posts he said this later if I really need to?

When making a statement that relies on your interpretation of someone's else statements or even quoting them, a citation is vital to allow a reader to come to the same decision (or not) for themselves.

Top of the previous page here Tongue

https://bitcointalk.org/index.php?topic=66026.180

Then of course also our earlier discussion that he linked:
https://bitcointalksearch.org/topic/m.807765 ... etc. after that post

organofcorti

donator

Activity: 2058

Merit: 1007

Poor impulse control.

Quote from: kano on April 04, 2012, 03:51:05 PM

Quote from: organofcorti on April 04, 2012, 10:29:37 AM

...

Quote

Which of course is true.
(though you didn't seem to bother to point out that it can be both positive and negative in the same way Tongue

)

True, but hardly useful when discussing the risks of losing shares. A positive buffer is not good for a pool in the way a negative buffer is bad for a pool. Maturity time won't get any less as the buffer becomes more positive, but maturity time will increase significantly as the buffer becomes more negative.

Quote

That is of course true of DGM also.

No, I don't think it is. I'll believe you if you can prove it using Meni's DGM functions though.

Quote

The DGM argument is that the pool will shutdown before that happens and the pool does not need to disclose the buffer.
I see that as a conflicting statement in itself.

As they say on wikipedia, "says who?".
...

Meni.

I'll find the posts he said this later if I really need to?

When making a statement that relies on your interpretation of someone's else statements or even quoting them, a citation is vital to allow a reader to come to the same decision (or not) for themselves.

Topic: Neighbourhood Pool Watch - page 10. (Read 49940 times)