Topic: Fascinating what Michael Deletes [BTCguild 90% luck and dropping] (Read 4194 times)

legendary
Activity: 1540
Merit: 1001
Kinda sucky that they're using live pools as test beds, but I doubt there's much you can do about it.  Their error has cost us all.  C'est la vie.

I fail to see how it cost us all?

If A == how many blocks would be solved without them
If B == how many blocks could have been solved with them

You ended up with a value > A, but less than A + B.  In other words, you did better with them, than without them.  But not as good as you could have if their software was working properly.

Right?

M
Hmmmm.  Is this a trick question?  The way I look at it is that if they solo-mined with those miners they would have got nothing.  As it is they got value for their shares.

I think actually A=B in this case.  But A is now divided between the pool and them, so my share of A is less than it would have been.  At least that's my simplified view of things.  I'm happy to be shown how I might be wrong though.

Or put another way, they added nothing to the pool as it was impossible for them to solve a block, but they took payouts that would have been paid to other miners.

I think I misunderstood.  If they weren't solving *any* blocks, then you are right, their shares made yours less valuable.

I read it as only certain types of blocks were failing.  I see now I read it wrong.  Never mind! :)

M
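To put rough numbers on the exchange above, here is a minimal sketch. It assumes a simple proportional payout and made-up hashrates; the function name and every figure except the 1.8 PH/s are illustrative, not BTC Guild's actual payout scheme.

```python
def honest_income(pool_hashrate, broken_hashrate, my_hashrate, blocks_per_hash=1.0):
    """Expected block income for one honest miner under a proportional payout.

    Shares are paid on all submitted hashrate, but only working hardware
    can actually find blocks. All inputs are hypothetical.
    """
    working_hashrate = pool_hashrate - broken_hashrate
    blocks_found = working_hashrate * blocks_per_hash
    my_share = my_hashrate / pool_hashrate
    return blocks_found * my_share

# Pool of 10 PH/s, all of it working; I contribute 1 PH/s.
without_them = honest_income(10.0, 0.0, 1.0)

# Same pool plus 1.8 PH/s that submits shares but can never solve a block.
with_them = honest_income(11.8, 1.8, 1.0)

print(without_them, with_them)
```

Under these assumptions the pool finds the same number of blocks either way, but the honest miner's slice shrinks from 1/10 to 1/11.8 of them.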
sr. member
Activity: 672
Merit: 250
Buy, sell and store real cryptocurrencies
But not as good as you could have if their software was working properly.
And coincidentally I think this is a very good way of saying that it cost us.  The difference between doing as well as we did and as well as we could have done is a direct cost.
sr. member
Activity: 672
Merit: 250
Buy, sell and store real cryptocurrencies
Kinda sucky that they're using live pools as test beds, but I doubt there's much you can do about it.  Their error has cost us all.  C'est la vie.

I fail to see how it cost us all?

If A == how many blocks would be solved without them
If B == how many blocks could have been solved with them

You ended up with a value > A, but less than A + B.  In other words, you did better with them, than without them.  But not as good as you could have if their software was working properly.

Right?

M
Hmmmm.  Is this a trick question?  The way I look at it is that if they solo-mined with those miners they would have got nothing.  As it is they got value for their shares.

I think actually A=B in this case.  But A is now divided between the pool and them, so my share of A is less than it would have been.  At least that's my simplified view of things.  I'm happy to be shown how I might be wrong though.

Or put another way, they added nothing to the pool as it was impossible for them to solve a block, but they took payouts that would have been paid to other miners.
legendary
Activity: 1540
Merit: 1001
Kinda sucky that they're using live pools as test beds, but I doubt there's much you can do about it.  Their error has cost us all.  C'est la vie.

I fail to see how it cost us all?

If A == how many blocks would be solved without them
If B == how many blocks could have been solved with them

You ended up with a value > A, but less than A + B.  In other words, you did better with them, than without them.  But not as good as you could have if their software was working properly.

Right?

M
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
So what's the confidence level that this has been isolated and we're back to normal variance?  Are these miners going to be delivered to paying customers?  Or have they been already?  (i.e. is there a significant number of broken miners out in the wild.)  A 1.8 PH/s user is a lot easier to spot than 1.8PH/s sent out in 1TH/s units.
This is why customers should buy only miners where the software is free and they can audit and modify it. Already there are miners out there using binary forks of cgminer without distributing their code, and there are so many of them that I can't go chasing them (mostly out of China so far). But as per always, people do not buy on principle, just on cost, availability, potential profit etc. (which is understandable but a shame), and just look at how badly they even made those decisions (which still never ceases to amaze me).
sr. member
Activity: 672
Merit: 250
Buy, sell and store real cryptocurrencies
So what's the confidence level that this has been isolated and we're back to normal variance?  Are these miners going to be delivered to paying customers?  Or have they been already?  (i.e. is there a significant number of broken miners out in the wild.)  A 1.8 PH/s user is a lot easier to spot than 1.8PH/s sent out in 1TH/s units.
sr. member
Activity: 672
Merit: 250
Buy, sell and store real cryptocurrencies
Kinda sucky that they're using live pools as test beds, but I doubt there's much you can do about it.  Their error has cost us all.  C'est la vie.
legendary
Activity: 1750
Merit: 1007
I think the options are:

- withholding attack
- malicious op
- some other attack we don't know about
- ridiculous bad luck streak

Keep in mind Eligius is at 93% luck for 90 days as well.  But no one is complaining there.

M

At least a significant portion of the recent bad luck streak was already traced to a group of users that are developing their own ASICs, currently at ~1.8 PH/s.  They had previously mined on Eligius back in March and had solved blocks back then.  They moved to BTC Guild in early April, after difficulty crossed 4.2b.  They were not 1.8 PH/s at the start, they slowly grew over time.

We've since identified that their software was having problems at diff > uint32 maximum value (~4.2b).  They patched their software and went to Eligius to test the fix.  They solved a few blocks.  They had their accounts unfrozen on BTC Guild, and have since found a number of blocks in the last 24 hours for BTC Guild as well.
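The post does not say how the software actually failed, so the following is only a sketch of two ways a value above the uint32 ceiling can go wrong when it is squeezed into a 32-bit unsigned field. The difficulty figures and function names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295, the "~4.2b" ceiling

def wrap_uint32(value):
    """Keep only the low 32 bits, as a plain unsigned 32-bit store would."""
    return value & UINT32_MAX

def saturate_uint32(value):
    """Clamp to the largest representable value instead of wrapping."""
    return min(value, UINT32_MAX)

network_difficulty = 6_000_000_000   # hypothetical, above the ceiling
winning_share = 7_000_000_000        # hypothetical share that really solves a block

# Wrapping silently turns the network difficulty into a much smaller number.
wrapped_difficulty = wrap_uint32(network_difficulty)

# Saturating the share difficulty means it can never reach the real target,
# so a genuine block solve would never be recognised as one.
recognised = saturate_uint32(winning_share) >= network_difficulty

print(wrapped_difficulty, recognised)
```

Either failure mode would be invisible below 2^32 and only show up once difficulty crossed it, which is consistent with the timeline described above.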

This being the Bitcoin community, most people are going to assume malice forever, even though these users got back to me immediately after their accounts were frozen when they could've easily proxied their miners and split them up more to make it impossible to trace.  They fixed the issue, and have since been reporting block solves as expected.
legendary
Activity: 1540
Merit: 1001
The reality is you can not conclusively say if the miner is or is not cheating in any of those three scenarios.

So are we back to square one?

It seems everyone is assuming there's a withholding attack going on?

M

Not really, but are we back assuming it's just bad luck? No other option? Have we ruled out everything?

I think the options are:

- withholding attack
- malicious op
- some other attack we don't know about
- ridiculous bad luck streak

Keep in mind Eligius is at 93% luck for 90 days as well.  But no one is complaining there.

M
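For the "ridiculous bad luck streak" option, a rough way to ask how ridiculous is to treat block finds as a Poisson process. The expected block count below is a made-up placeholder, since the thread does not give the real figure for either pool.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid overflow."""
    log_lam = math.log(lam)
    return sum(math.exp(i * log_lam - lam - math.lgamma(i + 1))
               for i in range(k + 1))

expected_blocks = 1000   # hypothetical: blocks the pool "should" have found
found_blocks = 930       # 93% luck on that hypothetical total

p = poisson_cdf(found_blocks, expected_blocks)
print(f"P(luck <= 93%) = {p:.4f}")
```

The answer depends heavily on the expected count: 93% over a few hundred blocks is unremarkable, while the same percentage over several thousand would be very hard to blame on variance.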
legendary
Activity: 1904
Merit: 1007
The reality is you can not conclusively say if the miner is or is not cheating in any of those three scenarios.

So are we back to square one?

It seems everyone is assuming there's a withholding attack going on?

M

Not really, but are we back assuming it's just bad luck? No other option? Have we ruled out everything?
legendary
Activity: 1540
Merit: 1001
The reality is you can not conclusively say if the miner is or is not cheating in any of those three scenarios.

So are we back to square one?

It seems everyone is assuming there's a withholding attack going on?

M
legendary
Activity: 1904
Merit: 1007
The reality is you can not conclusively say if the miner is or is not cheating in any of those three scenarios.

So are we back to square one?
legendary
Activity: 2128
Merit: 1065
It's more than "additional configuration" if one's router doesn't support IPSEC.  Plus, and this is more to the point, if the pool doesn't support securing the other end this won't even be an option.

Of course this will protect only up to the pool boundary, but I'm not going to go there...  :)
In my experience these days even the cheapest "ADSL modem & WiFi gateway router" (the kind practically given away by ISPs) supports IPsec. And even if it doesn't support IPsec directly, it supports passthrough of IPsec tunneled in UDP. It really is a matter of expending some effort to configure the endpoints, be they routers or hosts. All the well-known OSes have supported IPsec for about a decade now.

Also in my experience the reluctance to enable IPsec on somebody's behalf typically points to other problems:

1) generally lousy network infrastructure with high BER
2) inside jobs at vendors/clients who then try to blame MITM or bitsquatting or whatever else is esoteric enough
3) generally incompetent operations/sysop/sysadmin staff
4) some other "layer 8" political/religious problems in the organizations

In my experience cost was never a problem. In addition to the above, doing a "show crypto ipsec" (in Cisco IOS, or the equivalent on other routers) was the surest way to get the problem escalated to the appropriate personnel at the ISP, NOC, DC, etc. in the rare cases where there was an actual problem somewhere in the infrastructure.
sr. member
Activity: 278
Merit: 251
Quote
Nowadays the cost of the defense against MITM is also extremely low: IPsec. AH is sufficient, more complex ESP is not required. No software changes are required either, only some additional configuration on the routers. I've discussed this already with Slush in the original "Stratum" thread, not the later "Stratum Mining" thread.

It's more than "additional configuration" if one's router doesn't support IPSEC.  Plus, and this is more to the point, if the pool doesn't support securing the other end this won't even be an option.

Of course this will protect only up to the pool boundary, but I'm not going to go there...  :)
donator
Activity: 1218
Merit: 1079
Gerald Davis
Mining ASICs are nearly 100% deterministic, more precisely (100% - allowable fault rate).

On the off chance you aren't trolling, let's imagine the "easiest" possible scenario for detecting a cheater.

Let's assume all miners use "perfect" hardware.  Once they start a unit of work, they always check all nonce values (0% unchecked nonces), they always increment ntime rolling by one second and one second only (0% unchecked seconds), and they never have any hardware errors (either reported or not reported).

Your pool sends test work to 10% of your users, waits 6 seconds (roughly a 1% increase in orphan rate) for a response, and then broadcasts the block.

The miner in question:
a) returns the test work prior to broadcasting the block.
b) returns the test work but after you had broadcast the block.
c) does not return the test work.

In each scenario is the miner cheating or not?

I think you would agree that this scenario is about as simplistic as possible.   Right?  The real world is far more chaotic but this simplistic scenario takes ASIC complexity out of the picture.  The reality is you can not conclusively say if the miner is or is not cheating in any of those three scenarios.
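The "6 seconds is roughly 1%" figure in the scenario above can be checked with a back-of-the-envelope model. This sketch assumes blocks arrive as a Poisson process with a 600 second mean interval and ignores network propagation; the function name is made up for illustration.

```python
import math

def extra_orphan_risk(delay_seconds, mean_block_interval=600.0):
    """Chance that someone else finds a block while ours is being held back."""
    return 1.0 - math.exp(-delay_seconds / mean_block_interval)

risk = extra_orphan_risk(6)
print(f"holding a block for 6 s adds about {risk:.2%} orphan risk")
```

Under those assumptions the added risk grows almost linearly with the hold time for short delays, so every extra second the pool waits for test work has a direct cost.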
legendary
Activity: 2128
Merit: 1065
A man in the middle attack can accomplish the same effect as bad hardware.  This might be more than a spiteful attack, it could even be profitable if the cost of being a MITM can be made low enough. The attacker would mine solo or on other pools that he is not attacking.  The payoff would come in a lower difficulty.  There are various ways that this attack can be detected if miners are looking for it.
Nowadays the cost of the defense against MITM is also extremely low: IPsec. AH is sufficient, more complex ESP is not required. No software changes are required either, only some additional configuration on the routers. I've discussed this already with Slush in the original "Stratum" thread, not the later "Stratum Mining" thread.
legendary
Activity: 2128
Merit: 1065
OK, D&T again went into the mode where he posts some strawman arguments then deletes them. I'll have to paraphrase:

Mining ASICs are nearly 100% deterministic, more precisely (100% - allowable fault rate). The problem we are facing is that thus far only "false positive" faults were measured by the mining software. Maybe the threat of the block withholding attacks will force everyone to cooperate on the measurement of the "false negative" fault rate.

Hopefully vendors will cease to obfuscate like this:
It won't work. Our miners for example don't check all nonce range, and a lot of cgminer generated jobs get dropped. Also we use ntime-rolling differently from other pools. It's impossible to know what the miner will generate from stratum template.
or like eldentyrell did with the timebomb in his Tricone Mining bitstream for Xilinx Spartans.
sr. member
Activity: 278
Merit: 251
A man in the middle attack can accomplish the same effect as bad hardware.  This might be more than a spiteful attack, it could even be profitable if the cost of being a MITM can be made low enough. The attacker would mine solo or on other pools that he is not attacking.  The payoff would come in a lower difficulty.  There are various ways that this attack can be detected if miners are looking for it.




legendary
Activity: 2128
Merit: 1065
I assume you are confused because you believe that hardware normally checks all nonces all the time.
Dude! I have to assume that you've again gone into the mode where you can't read with comprehension and started arguing with the voices in your head. I normally agree with over 90% of what you say, except when you start ranting like when you claimed that Apple Computer doesn't take money orders. Please chill out or maybe switch to decaf?
Obviously this test has to be hardware specific, because e.g. bitfuries only test 768/1024 of the available nonce space. This would have to be even more complex to account for failed sub-engines in the multi-engine chips.
Anyway, I think that the problem is solvable. Maybe a Bloom filter of "hashing blind spots". It certainly would require a cooperation from the ASIC vendors with the developers of the miner and the pool software. As most of the things in Bitcoin mining it will be some sort of probabilistic solution, not a mathematical proof of failure.
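As a sketch of what a Bloom filter of "hashing blind spots" might look like: the miner marks the nonce chunks it does not scan, and the pool can then ask "might this nonce be in a blind spot?". Nothing like this exists in Stratum as far as this thread shows; the chunk size, filter size, class and function names are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

CHUNK_BITS = 20  # hypothetical granularity: 2**20 nonces per chunk

def chunk_of(nonce):
    return nonce >> CHUNK_BITS

# Miner side: advertise the chunks this hardware is currently not scanning.
blind_spots = BloomFilter()
for chunk in (5, 695, 4000):
    blind_spots.add(chunk)

# Pool side: would a test job whose solution is this nonce be a fair test?
nonce = 728937289
print(chunk_of(nonce) in blind_spots)
```

Because a Bloom filter can only err towards "maybe blind", a pool using it would sometimes skip a fair test but would never blame a miner for a nonce the miner had declared unscanned, which fits the probabilistic rather than proof-based approach described above.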

The first person posting on this forum about false negatives in mining was hardcore-fs with his XUPV5 FPGA miner, but I don't think that he published his results nor code.

Likewise Spondoolies promised that their ASIC designer would start posting on the forum after Passover, but apparently they haven't had time to do this yet.

In general, I'm optimistic that problem will be solved. Nowadays, who remembers when the long-polls were the major problems for mining pools? From my experience in the electronic and software industry: one person's problem is other person's opportunity. For example: NTSC started as "Never Twice the Same Color" and ended up where even the cheapest TV set receiver chipsets have been precisely calibrating themselves 29.97 times per second (using Vertical Interval Reference and Ghost Canceling Reference). And that was done quite effectively over all-analog broadcast retransmission networks including satellite links, where the end-users were completely oblivious to the problems in the transmission channel. All it took was to build a reasonable model of the distortion and run the tests frequently enough.

Quoting below just for future reference:
You are correct that there isn't any observable difference between the 2 types of problems.  And in practice, I think it's extremely unlikely that someone with hardware that has catastrophic levels of false negative errors is unaware of the issue, so the intent is likely equivalent as well.

When you have people of very questionable competence designing silicon on crash schedules it's inevitable that serious hardware defects are going to be in play.  And with the money involved, if there is a way to externalize the cost of failure to pool participants people are going to choose that over eating the loss themselves.

I think it's an absolute must that stratum allow pools to send test jobs to validate hardware integrity in a blind way that allows detection of malicious withholding as well.  Without that, I expect the days of public pools are numbered, and with them all hope of even modest decentralization will go as well.

It has nothing to do with hardware errors (although they are another minor source of false positives). 

Simple version:
If nonce 728937289 solves the block for a given unit of work and a particular hardware for a variety of reasons doesn't check nonce 728937289 it is not going to find a solution.

If a user doesn't return a solution is it because he is cheating or is it because his hardware didn't check that nonce with that work?  The answer is there is absolutely no way to know.
It isn't a hardware fault or error.

If nonce 728937289 solves the block for a given unit of work and a particular hardware for a variety of reasons doesn't check nonce 728937289 it is not going to find a solution.  It is that simple.  I assume you are confused because you believe that hardware normally checks all nonces all the time.  Nothing could be further from the truth.  There are dozens of reasons why a nonce wouldn't be checked and no software is going to report that as an error.  HW errors as reported by cgminer and the like are the result of the hardware reporting that work A + nonce B produces a hash of particular difficulty and it doesn't.

Quote
I think that Stratum Mining protocol can be extended to allow the client to send to the pool server the maps of the "blind spots" in the nonce space.
It isn't a static range.  Say you have an ASIC with 64 cores; the designer may decide to take the nonce range (2^32) and break it into 64 chunks.  It does this by assigning an offset to each core.  So all the cores get the same work, each one starts at a different nonce value, and they each increment through 2^26 values.  However, due to yields not all chips will have all 64 cores operating.  The seller may have designed it around 60 cores being "good enough" to achieve the hashrate.  So if 4 cores are "off" then millions of nonces will never even be attempted.  Most ASICs also work on dynamic load, so as the hardware error rate and/or temperature rises they shut off cores.  So the same nonces won't be covered all the time.
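The 64-core example can be written out directly. This is a minimal sketch assuming the even 2^26 split described above; which cores are disabled is made up, with core 10 chosen because it happens to own the nonce used in the earlier example.

```python
NONCE_SPACE = 2**32

def covered_ranges(total_cores, disabled_cores):
    """Half-open nonce ranges [start, end) scanned by the cores that still work."""
    chunk = NONCE_SPACE // total_cores
    return [(core * chunk, (core + 1) * chunk)
            for core in range(total_cores)
            if core not in disabled_cores]

def is_checked(nonce, ranges):
    return any(start <= nonce < end for start, end in ranges)

# Hypothetical chip: 64 cores designed in, 4 switched off.
ranges = covered_ranges(64, disabled_cores={3, 10, 17, 42})
unchecked = NONCE_SPACE - sum(end - start for start, end in ranges)

print(unchecked)                       # nonces this chip never tries
print(is_checked(728937289, ranges))   # the example nonce from the post
```

With 4 of 64 cores off, 4 * 2^26 nonces (about 268 million per unit of work) are never attempted, and no error is reported for any of them.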

Still it goes way beyond just which nonces are checked.  Both the ASICs and the mining software queue work.  So what happens when a miner has 30 seconds' worth of work queued up and you send him the "test work"?  Are you going to hold the block for 30 seconds, which would mean a >7% orphan rate?  What happens if due to latency the miner returns the proper solution but only after you have broadcast the block?  Is he cheating, or just slow, or did he just have a lot queued up?  How are you going to avoid the attacker just sharing info between workers?  Say you send the test work to 10% of workers and wait 30 seconds.  You probably will end up with hundreds, maybe thousands, of false positives AND a 7%+ orphan rate (on top of whatever you are losing to attacks).  If the attacker had, say, 30 accounts there is a less than 3% chance that one and only one would be given the test work and end up withholding it.  So 3% of the time you will catch the worker; due to false positives let's say you don't boot someone until they fail 3 times.  That means on average the attacker will pass 99 blocks before failing out.  However you are only testing him 10% of the time, so you MIGHT (I doubt it) catch him after he withholds or attempts to withhold 990 blocks.
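Taking the figures in the paragraph above at face value (a 3% chance of catching the attacker per test, three strikes before a ban, test work on 10% of blocks), the order of magnitude can be reproduced with a simple expectation. The function name is made up, and the 3% is used as given rather than derived.

```python
def expected_blocks_before_ban(catch_probability, strikes_needed, test_fraction):
    """Average number of blocks that pass before the attacker is booted.

    The expected number of tests needed to collect `strikes_needed` catches is
    strikes_needed / catch_probability; only `test_fraction` of blocks
    are tests at all.
    """
    tests_needed = strikes_needed / catch_probability
    return tests_needed / test_fraction

blocks = expected_blocks_before_ban(catch_probability=0.03,
                                    strikes_needed=3,
                                    test_fraction=0.10)
print(round(blocks))
```

That lands at roughly a thousand blocks, in line with the 990 quoted, and it is an average: an unlucky pool could wait far longer.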

Of course even your legit users are going to start working against you (or just flee to where they aren't attacked by the pool).  Those who want to stay will probably design a work relay system so they can identify your test work, if for no other reason than to avoid being falsely kicked out (especially if you confiscate the work completed).  The attacker could join those relay networks and all but guarantee he would pass all tests.

Also don't take this as exhaustive; I just don't feel like covering every possible scenario.  These issues are just the tip of the iceberg.  The only way to prevent these types of attacks is for the pool to know something the miner doesn't.  The current block hashing protocol doesn't make that possible.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Can someone explain to me how is "block withholding attack" different from the "false negative" type of hardware fault? I mean in the observable symptoms, not in the intent.

The last time I looked into the miners' source code (late FPGA era, early bitfuries) I only saw the "false positives" counted as "hardware errors": the nonce comes up as golden, but a software re-check shows that it isn't.

I haven't seen any published source code that does the opposite check for "false negatives": the chip should have found a golden nonce but didn't.

From skimming the available hardware documentation I see that only Spondoolies has a BIST operation that allows testing cheaply for false negatives. By "cheaply" I mean "cheaper than doing a regular, full scan".

Obviously this test has to be hardware specific, because e.g. bitfuries only test 768/1024 of the available nonce space. This would have to be even more complex to account for failed sub-engines in the multi-engine chips.

I think that Stratum Mining protocol can be extended to allow the client to send to the pool server the maps of the "blind spots" in the nonce space. Then the pool can sensibly proctor the tests meant to discover withholding attacks.
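The false positive / false negative distinction drawn above can be sketched in software. This is not taken from cgminer or any vendor code: it uses an all-zero dummy header, a deliberately easy target, a tiny nonce range, and a simulated chip that skips the top quarter of every 1024 nonces as a stand-in for the 768/1024 coverage mentioned (the real blind pattern is not described in the thread).

```python
import hashlib
import struct

def hash_value(header76, nonce):
    """Double SHA-256 of the 80-byte header, read as a little-endian integer."""
    header = header76 + struct.pack("<I", nonce)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")

def software_scan(header76, target, nonces):
    """Reference scan: every nonce whose hash meets the target."""
    return {n for n in nonces if hash_value(header76, n) <= target}

def audit(golden, reported):
    """Compare what the hardware reported with the reference scan."""
    false_positives = reported - golden   # reported, but the re-check fails
    false_negatives = golden - reported   # should have been found, never reported
    return false_positives, false_negatives

header76 = bytes(76)        # dummy header without the nonce field
easy_target = 2**248        # about 1 hash in 256 qualifies
nonces = range(4096)

golden = software_scan(header76, easy_target, nonces)

# Simulated hardware: blind to nonces with n % 1024 >= 768, plus one bogus hit.
bogus = next(n for n in nonces if n not in golden)
reported = {n for n in golden if n % 1024 < 768} | {bogus}

false_positives, false_negatives = audit(golden, reported)
print(sorted(false_positives), sorted(false_negatives))
```

The catch is visible in the sketch: finding false negatives requires the full reference scan, which costs as much as the mining itself. That is why a cheaper hardware self-test is the interesting part.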


You are correct that there isn't any observable difference between the 2 types of problems.  And in practice, I think it's extremely unlikely that someone with hardware that has catastrophic levels of false negative errors is unaware of the issue, so the intent is likely equivalent as well.

When you have people of very questionable competence designing silicon on crash schedules it's inevitable that serious hardware defects are going to be in play.  And with the money involved, if there is a way to externalize the cost of failure to pool participants people are going to choose that over eating the loss themselves.

I think it's an absolute must that stratum allow pools to send test jobs to validate hardware integrity in a blind way that allows detection of malicious withholding as well.  Without that, I expect the days of public pools are numbered, and with them all hope of even modest decentralization will go as well.

It has nothing to do with hardware errors (although they are another minor source of false positives). 

Simple version:
If nonce 728937289 solves the block for a given unit of work and a particular hardware for a variety of reasons doesn't check nonce 728937289 it is not going to find a solution.

If a user doesn't return a solution is it because he is cheating or is it because his hardware didn't check that nonce with that work?  The answer is there is absolutely no way to know.