Topic: 2013-03-12 IEEE Spectrum: Major Bug In The Bitcoin Software Tests The Community

legendary
Activity: 1596
Merit: 1100
Can you explain how locking is handled better by leveldb for the particular type of database transaction that can require unbounded locks on bdb?

It is simply a different design.

leveldb does not need per-page locks, and is not an ACID database.  leveldb has a concept called a "batch", and batches are committed atomically.
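
To make the "batch" idea concrete, here is a minimal toy sketch of all-or-nothing batch semantics. It is not LevelDB's implementation or API: `ToyStore`, `Batch` and the `fail_at` fault-injection parameter are hypothetical names invented for this illustration. The only point is that a batch is staged off to the side and becomes visible in one step, so no long-lived per-page locks are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy in-memory key/value store. NOT LevelDB; illustration only.
class ToyStore {
public:
    // A batch is an ordered list of pending operations.
    struct Batch {
        // An empty optional means "delete this key".
        std::vector<std::pair<std::string, std::optional<std::string>>> ops;
        void Put(const std::string& k, const std::string& v) { ops.emplace_back(k, v); }
        void Delete(const std::string& k) { ops.emplace_back(k, std::nullopt); }
    };

    // Apply every operation to a staged copy, then swap the copy in.
    // fail_at is a hypothetical fault-injection hook: if it names an
    // operation index, the write aborts and the store is left untouched.
    bool Write(const Batch& batch, int fail_at = -1) {
        std::map<std::string, std::string> staged = data_;
        for (std::size_t i = 0; i < batch.ops.size(); ++i) {
            if (static_cast<int>(i) == fail_at) return false;  // nothing committed
            const auto& [key, value] = batch.ops[i];
            if (value) staged[key] = *value;
            else staged.erase(key);
        }
        data_.swap(staged);  // the whole batch becomes visible at once
        return true;
    }

    std::optional<std::string> Get(const std::string& k) const {
        auto it = data_.find(k);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    std::size_t Size() const { return data_.size(); }

private:
    std::map<std::string, std::string> data_;
};
```

A caller fills a `Batch` with `Put`/`Delete` calls and hands it to `Write`; if the write fails partway, readers still see the old state. Copying the whole map is obviously not how a real database does it, it just keeps the sketch short.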

legendary
Activity: 1904
Merit: 1002
We knew there was a limit. What nobody knew is that the limit was low enough to be hit with blocks smaller than 1 megabyte. Remember that 1mb blocks had been tried before and worked.

The BDB manual doesn't give a formula for selecting the number of locks, presumably because not even the BDB developers know how to do it correctly, or perhaps because it seems to vary across versions/platforms. It just says "pick a number that seems to work and then double it". That kind of nonsense is fortunately not present in LevelDB.

Quote
But good to see that you have finally gotten to grips with BDB lock limits ... AFTER implementing the new levelDB that ignored these two lines of code in all pre-0.8 bitcoin reference code ... which recall by your own words ... to paraphrase "if you want a protocol read the source code"

Seems I was right about that, wasn't I? Hence the warnings against re-implementations. Understanding and documenting all these edge case behaviours is very difficult even for people who work on the existing code. So if you rewrite everything from a wiki page it's going to be even harder to match things precisely.

Quote
0.8 may be a superior solution but it has been rushed out. The fact you haven't fully understood the previous code and implemented the upgrade correctly may be more your deficiency than pre-0.7 code.

In what sense was 0.8 rushed out? The lock limits issue has been in the code for a very long time and went away entirely with 0.8, so spending more time testing it would not have helped reveal the problem. You'd have to have been testing old releases to find this issue, and moreover, testing it in non-obvious ways.

I'm tired of these sorts of threads and will not be posting on the issue further. The code that Satoshi left behind is certainly not ideal in many ways, and the only way to fix it is incremental upgrades. That means real work. If you are going to make yourself useful instead of just insulting people anonymously, then get to it and start finding bugs.

Can you explain how locking is handled better by leveldb for the particular type of database transaction that can require unbounded locks on bdb?
legendary
Activity: 3920
Merit: 2349
Eadem mutata resurgo
Quote
I'm tired of these sorts of threads and will not be posting on the issue further. The code that Satoshi left behind is certainly not ideal in many ways, and the only way to fix it is incremental upgrades. That means real work. If you are going to make yourself useful instead of just insulting people anonymously, then get to it and start finding bugs.

Probably best if we leave at that then. I'm happy the problem has been laid bare for all to see and the community is made aware of the issues in a transparent way so informed decisions can be made. Fear not, real work is happening.
legendary
Activity: 1526
Merit: 1134
We knew there was a limit. What nobody knew is that the limit was low enough to be hit with blocks smaller than 1 megabyte. Remember that 1mb blocks had been tried before and worked.

The BDB manual doesn't give a formula for selecting the number of locks, presumably because not even the BDB developers know how to do it correctly, or perhaps because it seems to vary across versions/platforms. It just says "pick a number that seems to work and then double it". That kind of nonsense is fortunately not present in LevelDB.

Quote
But good to see that you have finally gotten to grips with BDB lock limits ... AFTER implementing the new levelDB that ignored these two lines of code in all pre-0.8 bitcoin reference code ... which recall by your own words ... to paraphrase "if you want a protocol read the source code"

Seems I was right about that, wasn't I? Hence the warnings against re-implementations. Understanding and documenting all these edge case behaviours is very difficult even for people who work on the existing code. So if you rewrite everything from a wiki page it's going to be even harder to match things precisely.

Quote
0.8 may be a superior solution but it has been rushed out. The fact you haven't fully understood the previous code and implemented the upgrade correctly may be more your deficiency than pre-0.7 code.

In what sense was 0.8 rushed out? The lock limits issue has been in the code for a very long time and went away entirely with 0.8, so spending more time testing it would not have helped reveal the problem. You'd have to have been testing old releases to find this issue, and moreover, testing it in non-obvious ways.

I'm tired of these sorts of threads and will not be posting on the issue further. The code that Satoshi left behind is certainly not ideal in many ways, and the only way to fix it is incremental upgrades. That means real work. If you are going to make yourself useful instead of just insulting people anonymously, then get to it and start finding bugs.
legendary
Activity: 1988
Merit: 1012
Beyond Imagination
My rule of thumb after 20+ years in software engineering: never be the first to try the latest version until it has been used by others for months :) In fact, some of my machines still run 0.3.

Maybe the newer version is superior, but people can use the old version to do many kinds of fancy things that the developers could never imagine or test. A new version typically affects many infrastructures and relationships that were built upon the old version.

Luckily, bitcoin has not reached the level of maturity where hundreds of other services depend on the blockchain.
legendary
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
I'm pretty sure Mt Gox, blockchain.info, and BitPay were all on 0.7, and probably some other major sites as well. Sites like that are probably less capable of changing versions overnight compared to mining pools.

It baffles me why they would not have switched to 0.8. That version was out for weeks before the fork happened. Oh well, live and learn.

Then you don't really know a lot about large institutions, do you?

I would bet that many banks run software written in 1999 or even before. You know why? Because when huge risks and costs are involved, you don't change something that works properly unless there is a serious reason. Hell, some large companies still run COBOL code written 30-40 years ago!

Changing software versions is a significant risk each time it is done.
sr. member
Activity: 247
Merit: 250
I'm pretty sure Mt Gox, blockchain.info, and BitPay were all on 0.7, and probably some other major sites as well. Sites like that are probably less capable of changing versions overnight compared to mining pools.

It baffles me why they would not have switched to 0.8. That version was out for weeks before the fork happened. Oh well, live and learn.

Time, money, etc to test, implement, copy over any custom code, etc.  And the consequences if something is missed.
legendary
Activity: 3920
Merit: 2349
Eadem mutata resurgo
Mike H.

Quote
The lock limit in 0.7 is not a protocol rule - it serves no useful purpose, was not previously known about and doesn't even appear to be consistent across different versions of Berkeley DB, so 0.7 nodes are already inconsistent with each other. What's more, the lock limit also applies to re-orgs. What that means is that some 0.7 nodes are in an unstable state in which they may be unable to process a valid re-org and thus permanently hose themselves, even with a 250kb soft block size limit.

Incorrect that the lock limit rule was not known about. It is stated explicitly in the pre-0.7 source code, db.cpp lines 82 and 83

Code:
   
dbenv.set_lk_max_locks(10000);
dbenv.set_lk_max_objects(10000);
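
To illustrate why a fixed lock budget turns into an implicit limit on block contents, here is a rough toy model. It assumes one lock per distinct record touched while applying a block, which is a simplification made up for this sketch: real Berkeley DB locks database pages and its accounting differs across versions. `LocksNeeded` and `FitsLockBudget` are hypothetical helpers; only the value 10000 comes from the two lines above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Lock budget taken from the set_lk_max_locks(10000) call quoted above.
constexpr std::size_t kMaxLocks = 10000;

// Hypothetical model: each inner vector lists the record ids one
// transaction touches, and every distinct id costs one lock.
std::size_t LocksNeeded(const std::vector<std::vector<int>>& txs_touched) {
    std::set<int> distinct;
    for (const auto& tx : txs_touched) {
        distinct.insert(tx.begin(), tx.end());
    }
    return distinct.size();
}

// True if the whole block can be applied without exhausting the budget.
bool FitsLockBudget(const std::vector<std::vector<int>>& txs_touched) {
    return LocksNeeded(txs_touched) <= kMaxLocks;
}
```

Under this model a block is rejected because of how many records it touches, not because of its size in bytes, so a block well under 1 MB can still blow the budget.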

But good to see that you have finally gotten to grips with BDB lock limits ... AFTER implementing the new levelDB that ignored these two lines of code in all pre-0.8 bitcoin reference code ... which recall by your own words ... to paraphrase "if you want a protocol read the source code"

0.8 may be a superior solution but it has been rushed out. The fact you haven't fully understood the previous code and implemented the upgrade correctly may be more your deficiency than pre-0.7 code.
legendary
Activity: 1106
Merit: 1001
I'm pretty sure Mt Gox, blockchain.info, and BitPay were all on 0.7, and probably some other major sites as well. Sites like that are probably less capable of changing versions overnight compared to mining pools.

It baffles me why they would not have switched to 0.8. That version was out for weeks before the fork happened. Oh well, live and learn.
legendary
Activity: 1400
Merit: 1013
I'm pretty sure Mt Gox, blockchain.info, and BitPay were all on 0.7, and probably some other major sites as well. Sites like that are probably less capable of changing versions overnight compared to mining pools.
hero member
Activity: 520
Merit: 500
Yeah, I don't get it, why didn't everyone just get forced to switch to 0.8 since it didn't have the BDB limit on number of transactions that 0.7 has? Why was it the other way around?
Based on what I've read my guess is that many significant non-mining businesses are still on 0.7 and couldn't upgrade quickly enough.

+1

Doesn't matter which chain is longer if a majority of the people aren't on it.  Breaking changes need to be given lots of warning to be effective.  Trying to force everyone to use 0.8 would have only made the situation worse.  From the chat discussion, I don't think mtgox was using 0.8.  So trading at the largest exchange would be halted until it could be upgraded.  If that doesn't sound disastrous, I'm not sure what does.

Thanks. It makes more sense to me now. Basically, 0.8 had created an unplanned hard fork, and many users (not necessarily miners) hadn't upgraded, so transactions would have been dubious until they did so. Functionally, it sounds to me like 0.8 was a better version, but it wasn't backwards compatible and no one had planned for a hard fork.
sr. member
Activity: 247
Merit: 250
Yeah, I don't get it, why didn't everyone just get forced to switch to 0.8 since it didn't have the BDB limit on number of transactions that 0.7 has? Why was it the other way around?
Based on what I've read my guess is that many significant non-mining businesses are still on 0.7 and couldn't upgrade quickly enough.

+1

Doesn't matter which chain is longer if a majority of the people aren't on it.  Breaking changes need to be given lots of warning to be effective.  Trying to force everyone to use 0.8 would have only made the situation worse.  From the chat discussion, I don't think mtgox was using 0.8.  So trading at the largest exchange would be halted until it could be upgraded.  If that doesn't sound disastrous, I'm not sure what does.
legendary
Activity: 1526
Merit: 1134
Isn't this kind of different? In the previous forks, clients running old versions were still able to accept 100% of the blocks the new versions accepted, but the new versions "filtered out" some blocks that were created with old versions. So old versions were still able to get onto the new chain as soon as it was the longer chain.

No. OK, a bit of history. I was referring to two previous hard-forking changes that I'm aware of. There were probably more.

When Bitcoin was first released, it contained two completely fatal bugs that made the entire system worthless. Fortunately, they were found and fixed before Bitcoin actually had any serious value.

The first bug was that scripts were concatenated before being run instead of just using a shared stack. This meant that anyone could write a scriptSig that always evaluated to true and claim anyone else's coins. Fixed here in v0.3.2:

https://github.com/bitcoin/bitcoin/commit/73aa262647ff9948eaf95e83236ec323347e95d0

https://en.bitcoin.it/wiki/Incidents#CVE-2010-5141
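
For anyone who wants to see the difference between concatenating scripts and sharing a stack, here is a toy sketch. It is not Bitcoin's script engine: the three opcodes, the `Run` interpreter and both `Verify...` functions are made up for illustration, and it assumes an early-exit opcode that simply stops whichever script is currently running.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes; a hypothetical subset, not the real script language.
enum class Op { Push, Equal, Return };
struct Instr { Op op; int value; };
using Script = std::vector<Instr>;

// Runs one script against the given stack. Return stops this script only.
void Run(const Script& script, std::vector<int>& stack) {
    for (const Instr& in : script) {
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.value);
            break;
        case Op::Equal: {
            if (stack.size() < 2) { stack.assign(1, 0); return; }  // treat as failure
            int a = stack.back(); stack.pop_back();
            int b = stack.back(); stack.pop_back();
            stack.push_back(a == b ? 1 : 0);
            break;
        }
        case Op::Return:
            return;  // early exit from the script being run
        }
    }
}

// A spend succeeds if the top of the stack is non-zero.
bool Succeeded(const std::vector<int>& stack) {
    return !stack.empty() && stack.back() != 0;
}

// Flawed approach: glue both scripts together and run them as one.
bool VerifyConcatenated(const Script& sig, const Script& pubkey) {
    Script joined = sig;
    joined.insert(joined.end(), pubkey.begin(), pubkey.end());
    std::vector<int> stack;
    Run(joined, stack);
    return Succeeded(stack);
}

// Safer approach: run each script separately, sharing only the stack.
bool VerifySharedStack(const Script& sig, const Script& pubkey) {
    std::vector<int> stack;
    Run(sig, stack);
    Run(pubkey, stack);
    return Succeeded(stack);
}
```

With a locking script of `Push 42, Equal`, a spender's script of `Push 1, Return` passes the concatenated check, because the early exit skips the locking script entirely and leaves a true value on the stack. The shared-stack check still runs the locking script and rejects it.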

Needless to say, if somebody had actually written such a scriptSig and stolen some coins when this version was first released, that would have caused a chain split between old and new versions. Nobody did because, why bother? I'm not even sure Mt Gox existed back then, iirc that came some months later. Many script opcodes were disabled around this time (which is also a hard-forking change).

The second bug was found some months later, and this time it was exploited:

https://bitcointalksearch.org/topic/strange-block-74638-822

A fixed version of the software was quickly released. There was a race between the old and new chains. After some time (was it 12 hours?) the new chain beat out the old one. Even then, miners who didn't upgrade kept trying to extend the broken chain for a long time.

Back then there was no talk about clearly broken behaviour being some kind of unbreakable "protocol rule". You may see some people say that bugs in the old software become a part of the protocol and yes, this is true within limits. For instance, OP_CHECKMULTISIG contains a dumb bug, but it's not dangerous and can be easily worked around. So it's better for new versions to just not fix the bug, the cost of a hard fork isn't worth fixing it. Old versions being unable to digest even quite tiny blocks is clearly far more serious and the cost of a hard fork is worth it in that case.
legendary
Activity: 1458
Merit: 1006

I wish Mike Hearn wasn't out there saying stuff like this in the press. This is purely his opinion.

Some people have massive and expensive infrastructure built around pre-0.8 bitcoind, scripts, supporting code, etc.

0.8 levelDB bitcoin needs to be backwardly compatible for better or worse, for now and the foreseeable future.

Code:
We definitely need a hardfork.  Version 0.3 and version 0.7 are incompatible, we just didn't know it

Yes, he is free to say whatever he likes ... but since he is one of the main devs, people listen to him ... even if he is a loose cannon, imho ;)

Code:
We _never_ would have release 0.8 with this behavior if we knew about it.

Everyone can't go on being bug compatible with 0.7 forever just because some people may have painted themselves into a corner. And why 0.7? I'm sure the same argument applied to 0.3 too.
Pre-0.8 does NOT have a bug, it was 0.8 that was not backwardly compatible ... do some reading.

Maybe, but it's not that simple:

Code:
[2013-03-12 17:37:15] zephirum: this was NOT 0.7 vs 0.8
[2013-03-12 17:37:19] websrfr: well the longest is the one that's accepted and you could let it show which chain your transaction is in or if it's in both..
[2013-03-12 17:37:32] zephirum: this was 0.8 vs EVERYTHING ELSE FOR THE ENTIRE HISTORY OF BITCOIN
[2013-03-12 17:37:44] the latter category INCLUDING 0.8
[2013-03-12 17:40:25] zephirum: yes, but that's only due to a bug in 0.8
[2013-03-12 17:40:40] Luke-Jr: technically, it was 0.5.x-0.7.x vs 0.8/0.5.0.1/0.4.1/everything_else
[2013-03-12 17:41:07] Luke-Jr: Is this a bug-bug in 0.8?  Or a pseudo bug because it doesn't adapt to a limitation in 0.7?
[2013-03-12 17:41:13] GMP: no, because *every* client accepted the "everything" chain
[2013-03-12 17:41:20] zephirum: the latter
[2013-03-12 17:41:38] zephirum: the only bug in 0.8 is "not mimicking the behaviour of older nodes on the network"
[2013-03-12 17:41:57] Luke-Jr: i agree that current chain better
[2013-03-12 17:42:11] sipa: a behaviour that was not even known before experienced...
[2013-03-12 17:42:18] grau_: indeed

[2013-03-12 17:42:23] Luke-Jr: okay, so a hard-fork is required to move forward, and it's preferable to do that in a planned manner.  Understood.
[2013-03-12 17:42:23] zephirum: 0.8 wouldn't have made the blocks in the way that they are in the 0.7 chain, but they are at least still valid blocks

[2013-03-12 17:42:30] but from the network-consensus view, 0.8 has the bug
[2013-03-12 17:42:45] as it implicitly widened the rules for block acceptance
[2013-03-12 17:42:46] sipa: but not from miner consensus point of view
[2013-03-12 17:43:06] sipa: and this was addressed by _asking_ (i.e. human intervention) miners to move to 0.7
[2013-03-12 17:43:19] zephirum: bitcoin is ultimately a consensus of its users
[2013-03-12 17:43:48] and we chose for the 'larger' consensus
[2013-03-12 17:43:56] namely the largest portion of users
[2013-03-12 17:44:55] sipa: the final outcome was for devs to recommend a downgrade, which I'll generalize as meaning a recommendation to stay in sync with the majority nodes
[2013-03-12 17:45:10] zephirum: sure, just have a big button in your program to suspend stuff untill stuff becomes clear

[2013-03-12 17:45:22] there ought to be a different term. "Bug" means different things depending on context. 0.7 had a garden-variety bug where it did unexpected things. But 0.8 had a protocol-bug (a fiat-bug, anyone?) where it made blocks old clients wouldn't accept
[2013-03-12 17:45:31] and if a hardfork is going to be required in the future, that would be a good thing to make sure to include in the new protocol
[2013-03-12 17:45:35] since it's a consensus system, everyone would have to know what the consensus would be to know how to proceed

[2013-03-12 17:45:39] num1: i'd say 0.7 has the bug, but 0.8 was incorrect :)
[2013-03-12 17:45:48] incorrect because it failed to mimick the bug
[2013-03-12 17:45:51] sipa: thats a nice way of putting it.

tl;dr:

Code:
num1: i'd say 0.7 has the bug, but 0.8 was incorrect :)
incorrect because it failed to mimick the bug
sipa: thats a nice way of putting it.
hero member
Activity: 731
Merit: 503
Libertas a calumnia
Satoshi made hard-forking rule changes multiple times to fix bugs before 99% of you were even around, so the fact that we now need to do another should not shock or anger anyone. If you can't upgrade past 0.7 or reconfigure it with a larger lock size then your involvement with Bitcoin will end there, but I'd hope nobody has been stupid enough to get themselves into such a situation. Especially as 0.8 is API compatible with 0.7!
+100

Thanks Mike for putting some sense on some of the crazy rants I've read in this thread.
legendary
Activity: 1232
Merit: 1001
Just one question Mike,

Isn't this kind of different? In the previous forks, clients running old versions were still able to accept 100% of the blocks the new versions accepted, but the new versions "filtered out" some blocks that were created with old versions. So old versions were still able to get onto the new chain as soon as it was the longer chain.

With the fork that will soon be necessary it's the other way around: old clients will reject blocks from new clients, so they will stay on their own fork if they don't upgrade.

So it's kind of a new problem.
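
The distinction can be put as a tiny sketch, assuming block validity boils down to a single numeric limit. That is a made-up simplification: `Rules`, `max_cost`, `Accepts` and `OldNodesCanFollow` are hypothetical names, not anything from the Bitcoin code.

```cpp
#include <cassert>

// Hypothetical rule set: a block is valid if its "cost" is within the limit.
struct Rules { int max_cost; };

bool Accepts(const Rules& rules, int block_cost) {
    return block_cost <= rules.max_cost;
}

// True if every block valid under the newer rules is also valid under the
// older ones, i.e. the new rules only tighten what is accepted. Old clients
// can then follow the new chain once it is longer. If the new rules widen
// what is accepted, old clients reject some new blocks and stay on their
// own fork until they upgrade.
bool OldNodesCanFollow(const Rules& older, const Rules& newer) {
    return newer.max_cost <= older.max_cost;
}
```

In these terms the earlier fixes tightened the rules, while dropping the lock limit widens them, which is why un-upgraded clients would get left behind this time.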

Correct me if I'm wrong.


But I totally agree with the rest +1 from me too.

We can't expect that bitcoin will never have upgrades that make it backwards incompatible. This will happen again, and on the bright side this is an opportunity to test how it can be done smoothly, on an issue where (almost) everyone agrees that it needs to be addressed.
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
legendary
Activity: 1526
Merit: 1134
Yes, unexpected crash upgrades are bad news, but if this event had happened at a later time when 0.7 was not widely used, it might have made more sense to just abandon it.
legendary
Activity: 1400
Merit: 1013
Yeah, I don't get it, why didn't everyone just get forced to switch to 0.8 since it didn't have the BDB limit on number of transactions that 0.7 has? Why was it the other way around?
Based on what I've read my guess is that many significant non-mining businesses are still on 0.7 and couldn't upgrade quickly enough.
hero member
Activity: 520
Merit: 500
Yeah, I don't get it, why didn't everyone just get forced to switch to 0.8 since it didn't have the BDB limit on number of transactions that 0.7 has? Why was it the other way around? Technically, 0.8 was winning in chain length also.