
Topic: Pool shutdown attack (Read 5105 times)

kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 06:22:22 PM
#35
I like how you move the goalposts.  But really, you've already agreed with me.

I never said that we know nothing of the hash rate, over a long period of time.  I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.

Kaji, you've evaded the clear points I raised about the nature of measurement only to continue asserting that what we have isn't one.  Why should I even continue to discuss this with you when you won't explain why you say this isn't a measurement, when e.g. using a scale (which is also not infinitely precise, and is subject to complicated biases and noise), taking a survey, or taking a census are all considered to be measurements?

If you will say that there is no such thing as a measurement of a physical quantity then I will agree with you, given this definition of measurement, that we haven't made a measurement here.  Otherwise, I fail to see what you're arguing— that the pool operators are liars when they said they went down— that the pool users were hallucinating when they saw their miners idle— that the rate didn't really go down and that the system was just really unlucky at the time of these lies and hallucinations?

 Huh

There is an actual real number of hashes performed, and there is an actual real rate of them being performed over a given interval.  These numbers are only known at the miner itself.  They are not collected by the pools, they are not collated globally.  If they were, we would have actual measurements.

Instead, we have an interval, and the number of hashes that we expect it would take, on average, to perform the work demonstrated.  Divide one by the other, and we have an estimate of the hash rate.  This is only an estimate because, and this is key, the number of hashes it takes to make any given block is non-deterministic.  You can't measure a single non-deterministic event, or even a small number of them, and call it a measurement.
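
To make that concrete, here is a rough Python sketch of the estimator and how noisy it is over short windows (the difficulty value and the one-block-per-600-seconds rate are assumed purely for illustration, not real network figures):

Code:
import random

DIFFICULTY = 434_877                   # assumed value, for illustration only
HASHES_PER_BLOCK = DIFFICULTY * 2**32  # expected hashes needed per block

def estimated_hashrate(n_blocks, elapsed_seconds):
    """The expected work demonstrated, divided by the time it took."""
    return n_blocks * HASHES_PER_BLOCK / elapsed_seconds

# Simulate a network whose *true* rate is exactly one block per 600 seconds
# and see how noisy the estimate is for short vs. long observation windows.
random.seed(1)
true_hashrate = HASHES_PER_BLOCK / 600.0
for n in (1, 10, 1000):
    elapsed = sum(random.expovariate(1 / 600.0) for _ in range(n))
    print(n, "blocks: estimate / truth =",
          round(estimated_hashrate(n, elapsed) / true_hashrate, 3))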

I guess if we can't agree on that, it is pointless to continue, but you should totally write a paper and claim your Nobel Prize, because a way to predict the outcomes of non-deterministic systems would probably be the biggest discovery since relativity.  Maybe since ever.  Or maybe you can only go so far as to say that SHA256 is predictable, in which case the world's cryptographers would love to hear what you have to say.

By the way, I've never said that the pool operators were liars, nor that the pool users were hallucinating.  I merely said, and I'll quote myself:

No one has any idea where the error bars on the hash rate graphs should be.  On the 1 day line, they will be astronomically huge (not to mention the 8 hour window, lol).  Like several times the width of the plotted channel huge.

DO NOT MAKE ASSUMPTIONS BASED ON THOSE GRAPHS.  They are estimates, not measurements.
full member
Activity: 154
Merit: 100
May 31, 2011, 06:07:56 PM
#34
Better example.  You have a billion sided die, and you throw it and it comes up showing "1".  Should I infer from that event that you had actually thrown the die 500 million times?

Nope, Kaji, that isn't a better example.  A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up.  You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.

But if it comes up more or less often than you'd predict, you don't then pretend that you've "measured" my rolling speed.

Not necessarily.  But the null hypothesis is "Events are occurring at X speed."  If I consistently get experimental data with extremely low probability values at X speed, say they are outside a 99% confidence interval, then I can reject the null hypothesis and posit that the actual speed is different.  To calculate the actual speed, you would divide the number of events by the length of the time window, but of course this is very noisy for a small time window.
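
A minimal sketch of that test in Python, assuming Poisson arrivals at the nominal one-block-per-10-minutes rate (the "3 blocks in 2.5 hours" observation is a made-up example, not real data):

Code:
import math

def p_k_blocks_take_at_least(t_seconds, k, expected_gap=600.0):
    """Under the null hypothesis (Poisson arrivals at the nominal rate),
    the probability that finding k blocks takes at least t seconds,
    i.e. P(fewer than k arrivals in time t)."""
    lam = t_seconds / expected_gap  # expected number of blocks in t
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

# Made-up observation: 3 blocks in 2.5 hours instead of the ~15 expected.
p = p_k_blocks_take_at_least(2.5 * 3600, 3)
print("p-value: %.2e" % p)
if p < 0.01:
    print("reject 'events are occurring at the nominal speed' at the 99% level")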
kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 05:58:38 PM
#33
Better example.  You have a billion sided die, and you throw it and it comes up showing "1".  Should I infer from that event that you had actually thrown the die 500 million times?

Nope, Kaji, that isn't a better example.  A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up.  You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.

But if it comes up more or less often than you'd predict, you don't then pretend that you've "measured" my rolling speed.
kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 05:52:38 PM
#32
Who the fuck is Kaji?
full member
Activity: 154
Merit: 100
May 31, 2011, 04:00:37 PM
#31
Better example.  You have a billion sided die, and you throw it and it comes up showing "1".  Should I infer from that event that you had actually thrown the die 500 million times?

Nope, Kaji, that isn't a better example.  A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up.  You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.
staff
Activity: 4284
Merit: 8808
May 31, 2011, 03:58:09 PM
#30
I like how you move the goalposts.  But really, you've already agreed with me.

I never said that we know nothing of the hash rate, over a long period of time.  I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.

Kaji, you've evaded the clear points I raised about the nature of measurement only to continue asserting that what we have isn't one.  Why should I even continue to discuss this with you when you won't explain why you say this isn't a measurement, when e.g. using a scale (which is also not infinitely precise, and is subject to complicated biases and noise), taking a survey, or taking a census are all considered to be measurements?

If you will say that there is no such thing as a measurement of a physical quantity then I will agree with you, given this definition of measurement, that we haven't made a measurement here.  Otherwise, I fail to see what you're arguing— that the pool operators are liars when they said they went down— that the pool users were hallucinating when they saw their miners idle— that the rate didn't really go down and that the system was just really unlucky at the time of these lies and hallucinations?
kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 03:51:53 PM
#29
I like how you move the goalposts.  But really, you've already agreed with me.

I never said that we know nothing of the hash rate, over a long period of time.  I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.

Oh, and there were gaps with one in a thousand chances in the other direction (quick solutions) during that same timeframe too.

http://blockexplorer.com/b/127594

http://blockexplorer.com/b/127588

And more.
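
As a rough check on what "one in a thousand" means for quick solutions, a Python sketch under the nominal 10-minute expectation (the 20-second gaps are hypothetical numbers, not taken from the blocks above):

Code:
import math

EXPECTED_GAP = 600.0  # seconds at the nominal one-block-per-10-minutes rate

def p_gap_at_most(x_seconds):
    """P(a single inter-block gap is this short or shorter) at the nominal rate."""
    return 1 - math.exp(-x_seconds / EXPECTED_GAP)

# Hypothetical example: two consecutive gaps of 20 seconds each.
# Gaps in a Poisson process are independent, so the probabilities multiply.
p_single = p_gap_at_most(20)
print("one 20 s gap:  p = %.4f" % p_single)     # ~0.033
print("two in a row:  p = %.4f" % p_single**2)  # ~0.0011, about 1 in 1000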
staff
Activity: 4284
Merit: 8808
May 31, 2011, 03:34:54 PM
#28
The only thing that you can tell from a single sample is how unlikely it would be for you to repeat it in the same amount of time.  You can tell absolutely nothing at all about the amount of work actually done.

Say I get a block, and returned the result to you in a nanosecond.  What is more likely?

Option A) I got really lucky.
Option B) I have more hashing power at my disposal than the theoretical hashing power of the entire solar system, should it ever be converted into a computer.

Obviously the former. Really, really, really, really, really lucky. In fact, I might instead think it's more likely that you've found a weakness in SHA-256 and are able to generate preimages fast.

And by the same token, if the network goes a day without a result, what's more likely ... that an astronomically unlikely run of bad luck happened— p=2.8946403116483e-63  a one in a wtf-do-they-have-a-word-for-that event—  or that it lost hashrate or had some other kind of outage?
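
That figure is just the exponential tail at the nominal rate; a one-line check in Python:

Code:
import math

expected_gap_minutes = 10.0
p_no_block_for_a_day = math.exp(-24 * 60 / expected_gap_minutes)  # e^-144
print(p_no_block_for_a_day)  # ~2.89e-63, matching the figure above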

We had block gaps with combined one in a thousand chances given the expected nominal rate and at the same time big pools were down and their operators were saying they were suffering an outage.  But you say we know nothing about the hash rate?

*ploink*

kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 01:08:17 PM
#27
Because bitcoin solutions arise from such a fantastically large number of fantastically unlikely events— giving a fairly high overall rate, the confidence bounds are fairly small even for a single example.

Rubbish.

The only thing that you can tell from a single sample is how unlikely it would be for you to repeat it in the same amount of time.  You can tell absolutely nothing at all about the amount of work actually done.

Say I get a block, and returned the result to you in a nanosecond.  What is more likely?

Option A) I got really lucky.
Option B) I have more hashing power at my disposal than the theoretical hashing power of the entire solar system, should it ever be converted into a computer.
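
For a rough sense of scale, the hashrate a one-nanosecond solution would imply (a Python sketch; the difficulty is an assumed illustrative value):

Code:
DIFFICULTY = 434_877                   # assumed, for illustration
expected_hashes = DIFFICULTY * 2**32   # expected work for one block
implied_rate = expected_hashes / 1e-9  # hashes per second if done in 1 ns
print("%.2e hashes per second" % implied_rate)  # ~1.9e24 H/s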
staff
Activity: 4284
Merit: 8808
May 31, 2011, 12:55:26 PM
#26
Better example.  You have a billion sided die, and you throw it and it comes up showing "1".  Should I infer from that event that you had actually thrown the die 500 million times?

We're not measuring any outcome.  We're measuring a small set of outcomes.

If I send you a block header and you come back with a solution at the current difficulty how many hashes did you do before you found it?   The most likely answer is 1,867,835,568,419,371  (I changed this number after my initial post because I'd accidentally done a nice precise calculation with the wrong value for the current difficulty Wink )

Maybe you did one hash, maybe you did 1000 times that.  But those cases are VERY unlikely.  Far more unlikely than the error that would be found in virtually any measurement performed of any physical quantity.   So we can easily specify whatever error bounds we like, and say with high confidence that the work was almost certainly within that range and that the chance of the result coming from chance is small— just as if we were comparing the weights of things on a scale (though the difference in process gives a different error distribution, of course).  Because bitcoin solutions arise from such a fantastically large number of fantastically unlikely events— giving a fairly high overall rate, the confidence bounds are fairly small even for a single example.
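
A small sketch of those bounds, treating the hashes-per-block count as approximately exponential (the difficulty is an assumed illustrative value):

Code:
import math

DIFFICULTY = 434_877              # assumed, for illustration
MEAN_HASHES = DIFFICULTY * 2**32  # expected hashes per block

def hashes_quantile(q):
    """q-th quantile of the (approximately exponential) hashes-per-block
    distribution: the amount of work exceeded with probability 1 - q."""
    return -MEAN_HASHES * math.log(1 - q)

print("central 99%% range: %.3e .. %.3e hashes"
      % (hashes_quantile(0.005), hashes_quantile(0.995)))
print("P(>= 1000x the expected work) ~ 10^%.0f" % (-1000 / math.log(10)))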

E.g. at the normal expectation of 10 minutes, we'd expect 99% of all gaps to be under 46.05 minutes and our average rate has been faster than that.  So you're asking us to accept that there were multiple p<1% events but not a loss of hashrate when at the same time the biggest pool operator is obviously taking an outage which is visible to all.  OOooookkkaaay...
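
The 46.05-minute figure is just the 99th percentile of the same exponential gap distribution:

Code:
import math

expected_gap = 10.0                      # minutes
gap_99 = -expected_gap * math.log(0.01)  # solve e^(-x/10) = 0.01 for x
print("99%% of gaps should be under %.2f minutes" % gap_99)  # 46.05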

kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 10:25:51 AM
#25
Kuji, let us examine a hypothetical scenario.  Let's say I am flipping a fair coin several thousand times.  Do you believe it is impossible for me to calculate the odds of getting ten heads in a row because "You can speak with statistical certainty about it in bulk, but not at small scales"?  Because it is, in fact, possible to calculate exactly the odds of that happening, as any introductory Statistics student could tell you.

This is no different than Bitcoin.  In fact, the distribution of measurements is exactly the same between the two scenarios (results of coin-flipping and results of hashing to find improbable hashes); both fit a Poisson distribution.

I will refer you to the earlier chapters of a typical college-level Statistics book, especially the parts on calculating probabilities.

Who the hell is Kuji?

And your example is nothing at all like bitcoin.

You can calculate the odds of getting 10 in a row out of X thousands of flips.  And then if you do X thousands of flips repeatedly, say Y times, the number of 10-in-a-row events in reality will approach your calculation as Y grows larger.

Better example.  You have a billion sided die, and you throw it and it comes up showing "1".  Should I infer from that event that you had actually thrown the die 500 million times?

If you are looking for a good book on statistics, I suggest Savage's Foundations of Statistics.  Slog your way through that and you'll have a MUCH better understanding of what statistics can, and cannot, do.  Oh, and a drinking problem.
full member
Activity: 154
Merit: 100
May 31, 2011, 10:07:15 AM
#24
Kuji, let us examine a hypothetical scenario.  Let's say I am flipping a fair coin several thousand times.  Do you believe it is impossible for me to calculate the odds of getting ten heads in a row because "You can speak with statistical certainty about it in bulk, but not at small scales"?  Because it is, in fact, possible to calculate exactly the odds of that happening, as any introductory Statistics student could tell you.

This is no different than Bitcoin.  In fact, the distribution of measurements is exactly the same between the two scenarios (results of coin-flipping and results of hashing to find improbable hashes); both fit a Poisson distribution.

I will refer you to the earlier chapters of a typical college-level Statistics book, especially the parts on calculating probabilities.
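
A minimal sketch of that introductory calculation: the exact probability of at least one run of ten heads, computed by tracking the current streak of heads (the 5,000-flip count is just an illustrative choice):

Code:
def p_run_of_heads(n_flips, run_len=10):
    """Exact probability of at least one run of `run_len` consecutive heads
    in n_flips fair coin flips, via a simple streak-tracking recursion."""
    # state[i] = P(no qualifying run so far, current streak of heads == i)
    state = [0.0] * run_len
    state[0] = 1.0
    hit = 0.0                          # P(a run has already occurred)
    for _ in range(n_flips):
        new = [0.0] * run_len
        new[0] = 0.5 * sum(state)      # tails resets the streak
        for i in range(run_len - 1):
            new[i + 1] = 0.5 * state[i]   # heads extends the streak
        hit += 0.5 * state[run_len - 1]   # heads completes the run
        state = new
    return hit

print(round(p_run_of_heads(5000), 4))  # ~0.91: a run of 10 heads is likely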
hero member
Activity: 575
Merit: 500
The North Remembers
May 31, 2011, 10:01:37 AM
#23
Is there a good alternative to deepbit?

I switched to eligius a while back when I saw deepbit growing too quickly. If eligius gets too large someday I will switch again. Eventually I would like to be strictly solo but I need to earn more coins to buy more hardware so I can earn more coins and buy more hardware so I can earn more coins until the entire Midwest of the United States is covered under a mountain of GPUs and spontaneously combusts into a second sun. Anyone want to invest in some wind turbines/solar panels I can put up on my farm to make a totally off-grid, self-sufficient mining facility? Tongue
kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 10:00:38 AM
#22
No, they are not measurements.  There is no way to measure how much work went into finding any given hash, unless you are actually monitoring each and every miner involved.

I think you don't actually know what a measurement is.   If you ask everyone coming into the emergency room complaining of stomach pains "Did you eat cucumber in the last 24 hours?"  is that not a measurement?  Even though the results will be contaminated by noise and biases?   If the census randomly selects 100,000 houses out of a million to get demographics from, is this not a measurement?

In this case we know the hashrate by virtue of the network reporting when it finds a block.

No.  The only thing you know at this point is the time (roughly) that this block was found.

This is a measurement of a _well_ defined process with explicable behavior and we can speak with certainty about it.

Yes, well defined, but non-deterministic.  You can speak with statistical certainty about it in bulk, but not at small scales.  Think radioactive decay.  I can give you a very accurate estimate of how long it will take for a mole of U-235 to decay halfway, but I can't tell you anything at all about how long it will be until the next hit on your geiger counter.

Unlike my silly examples it's not subject to much in the way of surprising biases or unknown sources of noise (because, e.g., if broken clients diminish our hash rate— that's not a random effect we'd wish to exclude). About all there is to worry about is how you time the blocks: if you time from the blocktimes you're subject to node timestamp stupidity; if you measure locally you will be subject to a small amount of network propagation noise. But bitcoin propagation time is minuscule compared to the average time between blocks.

Regardless, if the expected solution rate is r then the proportion of blocks that will take longer than x to find is e^(-r*x).

Moreover, the particular process at play here has its maximum likelihood value at the mean. So you don't have to do any fancier math than taking an average to reach the most accurate measurement possible given the available data.

So, e.g. if we see long gaps then we can say with very high confidence that the _measurement_ being performed by the network is telling us that the hashrate is almost certainly low.

As far as the timestamps go— yes, bitcoin network time can be wonky. So don't pay any attention to it. If you have a node with good visibility you'll observe the blocks directly.

Gah.  You understand the statistics, but you can't seem to accept the implications.

I'll say it again.  The time to find one block tells you nothing about the amount of work that went into finding it.  When you start talking about large numbers of blocks, you can start saying things like "the probability is very high that the network hashing rate was X over this long interval".

This is not the language of measurements.  It is the language of estimation.
staff
Activity: 4284
Merit: 8808
May 31, 2011, 09:27:05 AM
#21
No, they are not measurements.  There is no way to measure how much work went into finding any given hash, unless you are actually monitoring each and every miner involved.

I think you don't actually know what a measurement is.   If you ask everyone coming into the emergency room complaining of stomach pains "Did you eat cucumber in the last 24 hours?"  is that not a measurement?  Even though the results will be contaminated by noise and biases?   If the census randomly selects 100,000 houses out of a million to get demographics from, is this not a measurement?

In this case we know the hashrate by virtue of the network reporting when it finds a block.  This is a measurement of a _well_ defined process with explicable behavior and we can speak with certainty about it.  Unlike my silly examples it's not subject to much in the way of surprising biases or unknown sources of noise (because, e.g., if broken clients diminish our hash rate— that's not a random effect we'd wish to exclude). About all there is to worry about is how you time the blocks: if you time from the blocktimes you're subject to node timestamp stupidity; if you measure locally you will be subject to a small amount of network propagation noise. But bitcoin propagation time is minuscule compared to the average time between blocks.

Regardless, if the expected solution rate is r then the proportion of blocks that will take longer than x to find is e^(-r*x).

Moreover, the particular process at play here has its maximum likelihood value at the mean. So you don't have to do any fancier math than taking an average to reach the most accurate measurement possible given the available data.
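
A quick numerical check of both points, using a simulated process at an assumed rate of one block per 600 seconds:

Code:
import math, random

random.seed(7)
r = 1 / 600.0  # assumed true rate: one block per 600 seconds
gaps = [random.expovariate(r) for _ in range(10_000)]

# Tail check: the fraction of gaps longer than x should match e^(-r*x).
x = 3000.0
observed = sum(g > x for g in gaps) / len(gaps)
print("observed tail: %.4f   formula e^(-r*x): %.4f" % (observed, math.exp(-r * x)))

# Average check: for exponential data the likelihood is maximised at the
# sample mean, so the best rate estimate is simply 1 / mean(gaps).
rate_hat = len(gaps) / sum(gaps)
print("estimated rate: %.6f   true rate: %.6f" % (rate_hat, r))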

So, e.g. if we see long gaps then we can say with very high confidence that the _measurement_ being performed by the network is telling us that the hashrate is almost certainly low.

As far as the timestamps go— yes, bitcoin network time can be wonky. So don't pay any attention to it. If you have a node with good visibility you'll observe the blocks directly.


member
Activity: 107
Merit: 10
May 31, 2011, 09:24:28 AM
#20
Is there a good alternative to deepbit?
kjj
legendary
Activity: 1302
Merit: 1026
May 31, 2011, 08:48:26 AM
#19
No one has any idea where the error bars on the hash rate graphs should be.  On the 1 day line, they will be astronomically huge (not to mention the 8 hour window, lol).  Like several times the width of the plotted channel huge.

DO NOT MAKE ASSUMPTIONS BASED ON THOSE GRAPHS.  They are estimates, not measurements.
I'm not making assumptions based on "those graphs". I'm looking at the highly improbable gaps between blocks that can only really be explained by a large loss of hashrate. E.g. a gap of 50 minutes or longer has a p-value of .0067 if the expectation is 10 minutes— lower considering that we'd been running at a rate higher than one block per ten minutes.

Unrelated, it's not that "no one has any idea"— those graphs _are_ a measurement, albeit of a noisy process. The process by which coins are found is well understood, and you can easily draw confidence intervals based on the known distribution and the number of points in the average. I'll nag sipa to do this. It would be reasonable.

No, they are not measurements.  There is no way to measure how much work went into finding any given hash, unless you are actually monitoring each and every miner involved.

What these graphs do is divide the average amount of work to find a block by the actual time to find a given block.  Note that it is the "average" amount of work, not the actual amount of work.

Oh, and there is other nonsense around too.  Did anyone else notice the two consecutive blocks over the weekend, where the second block had a timestamp before the first block?  I'm pretty sure that block didn't really require a negative amount of hashing to get created.
staff
Activity: 4284
Merit: 8808
May 31, 2011, 08:28:08 AM
#18
No one has any idea where the error bars on the hash rate graphs should be.  On the 1 day line, they will be astronomically huge (not to mention the 8 hour window, lol).  Like several times the width of the plotted channel huge.

DO NOT MAKE ASSUMPTIONS BASED ON THOSE GRAPHS.  They are estimates, not measurements.

I'm not making assumptions based on "those graphs". I'm looking at the highly improbable gaps between blocks that can only really be explained by a large loss of hashrate. E.g. a gap of 50 minutes or longer has a p-value of .0067 if the expectation is 10 minutes— lower considering that we'd been running at a rate higher than one block per ten minutes.

Unrelated, it's not that "no one has any idea"— those graphs _are_ a measurement, albeit of a noisy process. The process by which coins are found is well understood, and you can easily draw confidence intervals based on the known distribution and the number of points in the average. I'll nag sipa to do this. It would be reasonable.
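
For example, a crude Monte Carlo sketch of what such an interval looks like for a short averaging window (the 48-blocks-in-8-hours figure and the difficulty are assumed examples, not taken from the real graphs):

Code:
import random

def hashrate_interval(n_blocks, hashes_per_block, window_seconds, sims=20_000):
    """Rough 95% interval for a hashrate estimate made by counting blocks in
    a fixed window: simulate windows at the implied rate and look at the
    spread of the resulting estimates (Monte Carlo, not an analytic CI)."""
    random.seed(0)
    rate = n_blocks / window_seconds  # implied block arrival rate
    estimates = []
    for _ in range(sims):
        t, count = 0.0, 0
        while True:
            t += random.expovariate(rate)
            if t > window_seconds:
                break
            count += 1
        estimates.append(count * hashes_per_block / window_seconds)
    estimates.sort()
    return estimates[int(0.025 * sims)], estimates[int(0.975 * sims)]

# Assumed example: an 8-hour window that happened to contain 48 blocks.
lo, hi = hashrate_interval(48, 434_877 * 2**32, 8 * 3600)
print("95%% interval: %.2e .. %.2e hashes/s" % (lo, hi))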

ene
newbie
Activity: 42
Merit: 0
May 31, 2011, 06:16:16 AM
#17
It hasn't been healing when these big pools keep going down— we lose hashrate, and most doesn't come back until the pool does.

We lose hashrate, but do we lose too much hashrate?  Roll Eyes
kjj
legendary
Activity: 1302
Merit: 1026
May 30, 2011, 08:55:25 PM
#16
No one has any idea where the error bars on the hash rate graphs should be.  On the 1 day line, they will be astronomically huge (not to mention the 8 hour window, lol).  Like several times the width of the plotted channel huge.

DO NOT MAKE ASSUMPTIONS BASED ON THOSE GRAPHS.  They are estimates, not measurements.