Topic: [FIXED] MacOS X LevelDB Corruption Bounty (10.00 BTC + 200.2 LTC) - page 5. (Read 83883 times)

hero member
Activity: 697
Merit: 501
I run the latest versions of both Bitcoin-Qt and Litecoin-Qt on my Mac running OS X Mavericks.  I don't use Time Machine and I've never had a problem with either program.

I won't be using Time Machine now either.

sr. member
Activity: 263
Merit: 250
Since 0.8.x I get a corrupt db (and thus a forced rescan) each time my computer powers off due to battery/electrical failure.

Wtogami: try to power off your computer without shutting down Bitcoin-Qt; the db will probably be damaged.

0.8.5 OMG3 has three patches that prevent other corruption, although not the particular issue of this bounty.  Also we have never seen corruption on 10.6.8.
legendary
Activity: 1148
Merit: 1018
Since 0.8.x I get a corrupt db (and thus a forced rescan) each time my computer powers off due to battery/electrical failure.

Wtogami: try to power off your computer without shutting down Bitcoin-Qt; the db will probably be damaged.
sr. member
Activity: 263
Merit: 250
This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.

Does not corrupt, or you cannot get it corrupted? If the latter, then you solved the first step in this bounty but didn't announce it?

MacOS X 10.6.8 does not seem to corrupt with native Bitcoin-Qt as far as I can tell, so this test doesn't tell us anything.  I am only pointing out that it is possible.
member
Activity: 98
Merit: 10
nearly dead
This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.

Does not corrupt, or you cannot get it corrupted? If the latter, then you solved the first step in this bounty but didn't announce it?
hero member
Activity: 772
Merit: 500
What filesystems are in use on Mac? And did anyone try my std::stream branch? ;)

Dia
newbie
Activity: 59
Merit: 0
+1 on that... give me some kind of log to start.

I have a possible repro (and potentially a solution) on OS X for a db corruption... not sure if it's the same issue, however.
staff
Activity: 4284
Merit: 8808
Can we get a couple of useful bits of data for someone to work on this:

* Earliest confirmed version of 10.8 with the problem
* A sample of a corrupted DB
* console logs from *during time of corruption* including dmesg and system.log
* Information on how bitcoin was built/installed: clang? gcc42? macports/brew for deps?
* whether the people experiencing the problem have FileVault (FDE) turned on or not, whether it was turned on during the install or after, and if it's ever been cycled on/off
* also whether people who have hit this are using stock fs settings or have case-sensitivity/etc turned on
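
Some of the items in the list above can be gathered mechanically. The following is a minimal sketch, not an official diagnostic tool: `is_case_sensitive` and `collect_report` are hypothetical helper names, and the data directory to probe is whatever path you pass in. It reports the OS version and checks whether the filesystem holding the data directory is case-sensitive. It does not collect dmesg/system.log or FileVault state.

```python
import os
import platform
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` is on a case-sensitive filesystem."""
    # The prefix is lowercase, so the upper-cased name always differs.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as handle:
        name = os.path.basename(handle.name)
        upper_path = os.path.join(directory, name.upper())
        # On a case-insensitive filesystem the upper-cased path resolves
        # to the file we just created.
        return not os.path.exists(upper_path)


def collect_report(datadir):
    """Collect a few facts worth attaching to a corruption report."""
    return {
        "platform": platform.platform(),
        "mac_ver": platform.mac_ver()[0],  # empty string when not on OS X
        "datadir": datadir,
        "case_sensitive_fs": is_case_sensitive(datadir),
    }


if __name__ == "__main__":
    # Assumption: the temp directory is only a stand-in; pass your real
    # data directory instead.
    for key, value in collect_report(tempfile.gettempdir()).items():
        print(key, "=", value)
```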
legendary
Activity: 1526
Merit: 1134
You can blame me for LevelDB. We switched to it because it was a large (>2x) speedup over BDB and performance is critical for Bitcoin, for obvious reasons. Also BDB sucks in lots of different ways and LevelDB is very well written.

We already know Apple have made some .... questionable ... decisions in their kernel, with regard to fsync (hint: fsync doesn't). That was at least one source of corruptions, which we already fixed.

Given that rather astonishing approach to data integrity there may well be other equally questionable decisions lurking under the covers. The fact that this only happens on MacOS and not any other platform is strongly indicative that Apple have done more than one bad thing.
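
To illustrate the fsync remark: on OS X, fsync() hands the data to the drive but does not ask the drive to empty its own write cache; the fcntl command F_FULLFSYNC does. The sketch below shows that difference in a small helper. `durable_write` is a hypothetical name, the code is POSIX-only, and it is not claimed to be the fix that went into Bitcoin or LevelDB.

```python
import fcntl
import os
import tempfile


def durable_write(path, data):
    """Write `data` to `path`, then flush as hard as the platform allows.

    Returns the name of the flush method that was used.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may be a partial write
            view = view[written:]
        # F_FULLFSYNC is only defined where the OS provides it (OS X).
        if hasattr(fcntl, "F_FULLFSYNC"):
            try:
                fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
                return "F_FULLFSYNC"
            except OSError:
                pass  # not supported on this filesystem, fall back
        os.fsync(fd)
        return "fsync"
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dat")
        method = durable_write(path, b"hello")
        with open(path, "rb") as f:
            return method, f.read()
```

The full flush is considerably slower than plain fsync(), which is the usual reason it is not the default.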

I am wondering if there is something going wrong with mmap.

https://code.google.com/p/leveldb/issues/detail?id=196

The behaviour of mmap seems like it can sometimes be broken by kernel developers in subtle ways. I got a bug report for the Android app a few months ago which strongly implies mmap on Motorola devices is broken in ways that can cause data corruption. I wonder if POSIX specifies its behaviour tightly enough.
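
One way to probe the mmap suspicion is to write through a mapping, flush it, and then read the same bytes back through ordinary read() calls. This is only a sketch of such a cross-check under the assumption that a mismatch would point at the mapping; it does not reproduce the corruption, and `patch_via_mmap` and `read_back` are hypothetical names.

```python
import mmap
import os
import tempfile


def patch_via_mmap(path, offset, payload):
    """Overwrite bytes in an existing file through a shared mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            mm[offset:offset + len(payload)] = payload
            mm.flush()  # on Unix this calls msync() for the mapping
        os.fsync(f.fileno())


def read_back(path, offset, length):
    """Read the same region through the normal file API."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * 16)
        path = f.name
    try:
        patch_via_mmap(path, 4, b"abcd")
        return read_back(path, 4, 4)
    finally:
        os.remove(path)
```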
full member
Activity: 121
Merit: 103
If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

the motivation for using leveldb vs other dbs is due to the fact that with large numbers of records, e.g. over roughly 10 mln records, most "normal" dbs start to get really sluggish on inserts and selects. you can see the behavior for yourself by stuffing a ton of records in sqlite, mysql, psql, etc.

leveldb is not so much a db as a key-value store, which means that insert speed can be maintained even when there are a massive number of records, e.g. 250 mln. this is where the "level" in leveldb comes from - it load levels on inserts. the only price you pay for the load leveling is episodic compaction by leveldb. however, when doing selects/lookups on data that is already in leveldb, you must do several seeks, similar to more common databases.
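
The trade-off described above (cheap inserts, occasional compaction, lookups that may touch several places) can be shown with a toy log-structured store. This is a deliberately simplified sketch and not how LevelDB is implemented: `TinyLSM` is a hypothetical class, everything stays in memory, and there is no write-ahead log or on-disk table format.

```python
import bisect


class TinyLSM:
    """Toy key-value store: a small write buffer plus sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}  # recent writes, unsorted
        self.runs = []      # sorted lists of (key, value), newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # An insert only touches the in-memory buffer, so it stays cheap
        # no matter how many records already exist.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # A lookup may have to check the buffer and then every run.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, letting newer values win.
        merged = {}
        for run in reversed(self.runs):  # oldest first
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
```

After the four puts there are two runs, and `db.get("a")` has to pick the newer value; `db.compact()` folds them into a single run.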

the likely reason leveldb was chosen is that there aren't a ton of great choices for key-value stores. many of the key-value stores besides leveldb have only a few devs and may not be actively maintained. there are also many key-value stores that have questionable data integrity. using a dependency that goes unmaintained means having to change that dep out later, a giant PITA.

the reason the issue that is cited in this thread is so nasty is that not only does bitcoind use leveldb, it uses it in conjunction with flat file storage for the blocks. the act of storing data in flat files and referencing them in the db substantially increases the number and severity of error and failure paths in the combined structure (leveldb + flat file storage). as we can now see, hunting these bugs is very difficult.
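
The extra failure path is easy to see in miniature. In the sketch below a plain dict stands in for the database and a single file stands in for the block files; all names (`append_block`, `find_broken_entries`, `blk.dat`) are hypothetical and the layout is not bitcoind's. If the index is saved but the matching bytes never reach the flat file, the index points past the end of the data.

```python
import hashlib
import os
import tempfile


def append_block(blocks_path, index, payload):
    """Append payload to the flat file and record (offset, length) in the index."""
    with open(blocks_path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(payload)
    key = hashlib.sha256(payload).hexdigest()
    index[key] = (offset, len(payload))
    return key


def find_broken_entries(blocks_path, index):
    """Return index keys whose referenced bytes are missing or altered."""
    broken = []
    size = os.path.getsize(blocks_path)
    with open(blocks_path, "rb") as f:
        for key, (offset, length) in index.items():
            if offset + length > size:
                broken.append(key)  # index points past the end of the file
                continue
            f.seek(offset)
            if hashlib.sha256(f.read(length)).hexdigest() != key:
                broken.append(key)  # bytes on disk do not match the index
    return broken


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blk.dat")
        index = {}
        append_block(path, index, b"block-one")
        second = append_block(path, index, b"block-two")
        clean = find_broken_entries(path, index)
        # Simulate a crash: the index survived, the last block did not.
        os.truncate(path, len(b"block-one"))
        return clean, find_broken_entries(path, index) == [second]
```

A check like this only detects the mismatch; recovering from it still means rebuilding the index from the flat file.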

perhaps something can be inferred from the way in which leveldb + blocks are corrupted. this would require a dev looking at the db and blocks after they have been hosed.
sr. member
Activity: 263
Merit: 250
If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

I'm just mocking here, obviously. Good luck finding and fixing the issues.

It's working quite well on Linux and Windows.  Also the old BDB corrupted on all platforms, although less often than Mac users experience this current issue.
member
Activity: 98
Merit: 10
nearly dead
If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

I'm just mocking here, obviously. Good luck finding and fixing the issues.
newbie
Activity: 16
Merit: 0
Some observations. My setup uses two drives, one with the OS and a lower-speed one for general storage. I don't use Time Machine like the poster above, and there's nothing else non-standard about my software.

  • only the blockchain stored on the internal SSD boot disk gets corrupted, a blockchain stored on the second SATA HDD is never corrupted
  • corruption seems to happen most often after a system sleep (deep or not), though not always
  • corruption can happen during the initial sync if it is stopped and then restarted
  • corruption can happen with FileVault 2 turned on and off
  • has happened less often since updating to 10.9: only twice so far instead of every few days, though it could just be chance

That's it really. No other behaviour is specific to corruptions for me. Sometimes they happen twice in a day, sometimes not for weeks.
sr. member
Activity: 263
Merit: 250
wtogami,

I can confirm that it has nothing to do with Time Machine; I do not have Time Machine.

What version exactly are you running?  There have been multiple fixes.  Please verify specifically with Bitcoin 0.8.5 OMG3.
newbie
Activity: 23
Merit: 0
wtogami,

I can confirm that it has nothing to do with Time Machine; I do not have Time Machine.
newbie
Activity: 14
Merit: 0
I'd like to draw everyone's attention to this thread on the Litecoin forums:

https://forum.litecoin.net/index.php/topic,7147.msg55666.html#msg55666

I have an LTC wallet that doesn't play well with others.  I have no problem being someone's guinea pig, as I'd really like to get it working again on my laptop.

Regarding the new post: I DO have Time Machine enabled.

Just for consistency, here is the error that Litecoin-Qt keeps throwing:

Code:
Last login: Mon Nov 18 18:27:48 on ttys000
Bismarcks-MacBook-Pro-2:~ Bismarcks$ /Applications/Litecoin-Qt.app/Contents/MacOS/Litecoin-Qt ; exit;
2013-11-18 18:32:21.744 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.
2013-11-18 18:32:21.745 Litecoin-Qt[12289:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.
2013-11-18 18:32:21.748 Litecoin-Qt[12289:507] *** WARNING: Method userSpaceScaleFactor in class NSView is deprecated on 10.7 and later. It should not be used in new applications. Use convertRectToBacking: instead.
2013-11-18 18:32:27.518 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
Assertion failed: (pindexFirst), function GetNextWorkRequired, file ../litecoin/src/main.cpp, line 1149.
Abort trap: 6
logout

[Process completed]

full member
Activity: 121
Merit: 103
it is funny to see this considering that marco just penned a blog entry

https://blog.conformal.com/deslugging-in-go-with-pprof-btcd/

about how bitcoind uses leveldb vs what we do in btcd. to quote

"Dealing with corrupt journals/flat-file/database is not only complex it has the potential of a very negative user experience. If corruption of any sort is detected then the database components must be validated, this is inherent to the its size a very long operation."

apparently when using flat file storage for blocks and referencing by offset versus storing the entire block in leveldb, there are lots of unsavory ways for leveldb to fail.

leveldb is a harsh mistress.
newbie
Activity: 23
Merit: 0
Litecoin wallet was crashing for me, reporting DB corruption. If I open Terminal and enter

cd /Applications/Litecoin-Qt.app/Contents/MacOS

./Litecoin-Qt -reindex

it works.

These messages are then displayed in Terminal:

2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.

2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.

2013-11-18 19:57:37.657 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
sr. member
Activity: 437
Merit: 255
  • Document how anyone can consistently reproduce the data corruption.
  • Explain why it happens.
  • Write a code fix that is acceptable to the Bitcoin core developers and merged into Bitcoin git master.

Please refer to my posting: https://bitcointalksearch.org/topic/m.3622968

Since I use Windows, not OS X, the situation may differ slightly. But at least it may be a hint.

If you want to donate to me:  1METhkrvz2r9d3zkFPQrHnpFC1BjCs64Zf
hero member
Activity: 772
Merit: 500
Is it not possible that LevelDB or something else related to the data files is failing silently?

I would say that's at least not impossible...

Dia