I wanted to bring up the gambler's fallacy several pages ago, but I didn't have an account and so had to wait a while.
Your summary of the gambler's fallacy is spot on. However, it doesn't apply in this case.
It is relatively easy to calculate a mathematically exact figure, in the form of "AVERAGE time to solve a share", based on the difficulty of the coin and the difficulty of the pool. It is likewise possible to calculate the average time for the pool to solve a block, given the pool's hashrate.
Since we are dealing with averages, this has nothing to do with the gambler's fallacy. Instead we have a statistically provable probability.
So, say this process gives an average time of 1 minute for the pool to solve a block, and an average time of 45 seconds for a given miner to solve a share. What are we really saying?
We are saying that the pool solves a block in 30 seconds about as often as it solves one in 1 minute 30 seconds.
We are also saying that we solve a share in 30 seconds about as often as we do in 60 seconds.
Block [0=====15=====30=====45=====60=====75=====90=====105=====120]
Share [0=====15=====30=====45=====60=====75=====90]
So here's a way to think of it - the block solve will fall somewhere along that range, and the share will solve somewhere along its range, at random.
On any given go-around, I could easily solve a share first; in fact, in this case that will happen more often than not. But MANY times the block will be solved first, and I will get nothing. This is the loss we are talking about. The work I did means nothing: the pool neither counts it nor credits it. This loss is present in every cryptocurrency, but it doesn't become a serious issue until you have coins so easy that we find blocks in seconds or minutes.
But we can combat this with a smaller diff. Imagine the same scenario, but I lower my diff so that I solve a share every 30 seconds on average.
Block [0=====15=====30=====45=====60=====75=====90=====105=====120]
Share [0=====15=====[30]=====45=====60]
Now, you can imagine again - numbers falling randomly on these scales representing the time it takes to solve a block vs a share.
This time, you can see that it's much more likely that the share time will be less than the block solve time.
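If you want rough numbers for the two pictures, here is a quick Python sketch. It assumes, as the diagrams do, that each solve time lands uniformly at random between 0 and twice its average. That is a simplification for illustration only, not a claim about how real solve times are distributed. The function name `p_share_before_block`, the trial count, and the seed are made up for this example.

```python
import random

def p_share_before_block(share_mean, block_mean, trials=200_000, seed=1):
    """Estimate how often the share is solved before the block.

    Assumption (taken from the diagrams, not from real mining data):
    each solve time is uniform between 0 and twice its average.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        share_time = rng.uniform(0, 2 * share_mean)
        block_time = rng.uniform(0, 2 * block_mean)
        if share_time < block_time:
            wins += 1
    return wins / trials

first = p_share_before_block(45, 60)   # first diagram: 45 s share average
second = p_share_before_block(30, 60)  # second diagram: 30 s share average
print(f"45 s share average: share first {first:.1%} of the time")
print(f"30 s share average: share first {second:.1%} of the time")
```

Under that assumption the share beats the block roughly 62.5% of the time at a 45 second average and roughly 75% of the time at a 30 second average.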
So why is this a big deal? Because it's biased towards fast miners. With everyone at 512 diff, the fast miners end up with something closer to the second depiction than the first, by virtue of their higher hash rate.
So just to illustrate, say we had two miners only, one had 90% of the hash rate and the other had 10%. You would expect those to also be the shares of the profit, but that would not be the case. It might end up looking something more like 92% to 8% because of this effect.
Lowering the diff evens the playing field.
Edit: Another example of the advantage of fast miners.
Say we are working on a block of WhateverCoin. We have been solving WhateverCoin blocks at 60 seconds on average.
John, a 1 MH/s miner, has been submitting a share every 50 seconds on average.
Tim, a 10 MH/s miner, has been submitting a share every 5 seconds on average.
Let's focus on one particular block:
We solve a block of WhateverCoin in 50 seconds, a little ahead of average. Again, it's random.
This time John wouldn't have solved a share until 55 seconds. Uh oh, John gets nothing. In practice John stops at 50 seconds and starts on the next block, but he has wasted 50 seconds of hashing power.
Tim got 9 shares in the 50 seconds. He was 5 seconds into the 10th share when the block was found. Tim wasted time too - he wasted 5 seconds. Oh well.
If Tim is 10 times faster than John, you'd expect him to earn 10 times as much. But John got nothing this time!
I can hear you now - "over time, that loss will be made up for statistically".
Nope. You see, John is much, much more likely to waste significantly more of his hashing power, on average. If they both lost 30 seconds of their respective hashing power, fine, the loss would be even.
My earlier research led me to the following conclusion:
X = Average time for a miner to find a share
Y = Average block find time
X / (2 * Y) = Fraction of LOST profit/hashpower
So in this example:
John:
X = 50
Y = 60
50 / (2 * 60) = 41.7% Wasted Hash Power! (this is an extreme example but possible with some coins)
Tim:
X = 5
Y = 60
5 / (2 * 60) = 4.2% Wasted Hash Power.
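The same arithmetic as a small Python sketch, in case you want to plug in your own numbers. The function name `wasted_fraction` is made up here, and the last step (scaling Tim by 10 for his hash rate and comparing what each miner keeps) is just one assumed way of turning the two percentages into an earnings ratio.

```python
def wasted_fraction(share_time, block_time):
    """Rule of thumb from this post: X / (2 * Y), with both times in seconds."""
    return share_time / (2 * block_time)

john = wasted_fraction(50, 60)  # John: a share every 50 s on average
tim = wasted_fraction(5, 60)    # Tim: a share every 5 s on average
print(f"John wastes {john:.1%} of his hash power")
print(f"Tim wastes {tim:.1%} of his hash power")

# Assumption for illustration: Tim has 10x John's hash rate, and each miner
# is paid in proportion to the hash power that is NOT wasted.
ratio = 10 * (1 - tim) / (1 - john)
print(f"Tim ends up earning about {ratio:.1f}x what John earns")
```

So instead of the 10x you would expect from hash rate alone, this rule of thumb puts Tim at a bit over 16x.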
If I did this again with higher and higher BLOCK solve times, the effect would keep shrinking until the difference between the two was negligible.
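Here is a short Python sketch of that shrinking effect, reusing the X / (2 * Y) rule of thumb for John (50 seconds per share) and Tim (5 seconds per share). The 150 and 600 second block times are hypothetical values picked only to show the trend; only the 60 second case comes from the example.

```python
def wasted_fraction(share_time, block_time):
    """Rule of thumb from this post: X / (2 * Y), with both times in seconds."""
    return share_time / (2 * block_time)

# Hypothetical average block times in seconds; only 60 is from the example.
block_times = (60, 150, 600)

gaps = []
for block_time in block_times:
    john = wasted_fraction(50, block_time)  # John: 50 s per share
    tim = wasted_fraction(5, block_time)    # Tim: 5 s per share
    gaps.append(john - tim)
    print(f"Y={block_time:>3}s  John {john:.1%}  Tim {tim:.1%}  gap {john - tim:.1%}")
```

The gap between the two miners drops from 37.5 percentage points at a 60 second block to under 4 points at a 600 second block.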
When the share solve rate starts approaching the block solve rate, it's going to get bad without lowering diff.