Topic: Frequent database corruption? (Read 1059 times)

newbie
Activity: 55
Merit: 0
April 13, 2013, 08:22:10 AM
#14
Again - for the record, FWIW. When arranging to replace the problem RAM stick, I discovered that it is the wrong type for the mboard, though presumably only marginally so. It looks as if the (Asus) mboard was originally made for DDR2, and later versions could use DDR3, but only up to a speed of 1066 MHz. Some DDR3 1333 MHz modules will down-clock to 1066, but many will not. The RAM which works well is labelled as 1066. The problem RAM is labelled only as 1333. I think this is the problem. The original purchase of the new kit had a RAM 'upgrade' fitted at my request, obviously with incorrect RAM.
newbie
Activity: 55
Merit: 0
April 08, 2013, 03:05:20 PM
#13
For the record: the new machine has continued to behave without problems, helping to confirm the belief that the old machine has some sort of hardware problem. Various work on the old machine: xsensors did not seem to indicate anything unusual, including the PSU voltages, so, working on a first assumption that the PSU is OK, I thought the next easiest test would be the RAM. I had recently done a 6 hour RAM test, which indicated no errors at all. However, the RAM consists of two sticks of 2GB each, so by removing one stick at a time, and also trying each RAM slot, I eventually found that one RAM stick was giving intermittent trouble. (Happy relief).
However hard I worked the remaining single good RAM stick, it never gave trouble: I got no database corruption when nearing the end of a complete block chain download :-)
I think this has been the problem. It was a difficult fault to troubleshoot because over the few years I have had this machine, it has worked well. Mostly. With only very occasional apparent crashes, not repeatable. So this RAM fault survives a 6 hour multi data test, but falls down, more or less repeatably, during the more recent block chain calculations in a way which caused database corruption.
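A fault that passes a dedicated RAM test but shows up under sustained load can sometimes be provoked by repeating the same calculation many times and checking that the answer never changes. The following is only a minimal sketch of that idea in Python, not what bitcoin-qt itself does; the function name `consistency_check` and its parameters are made up for illustration.

```python
import hashlib
import os


def consistency_check(rounds=50, size_bytes=16 * 1024 * 1024):
    """Hash the same random buffer repeatedly.

    Returns the list of round numbers whose digest differed from the
    first one. On healthy hardware the list should be empty.
    """
    data = os.urandom(size_bytes)
    reference = hashlib.sha256(data).hexdigest()
    bad_rounds = []
    for round_number in range(1, rounds + 1):
        # Copy the buffer so each round exercises memory copying as
        # well as hashing, then compare against the first digest.
        copy = bytes(bytearray(data))
        if hashlib.sha256(copy).hexdigest() != reference:
            bad_rounds.append(round_number)
    return bad_rounds


if __name__ == "__main__":
    failures = consistency_check()
    print("rounds with mismatching digests:", failures)
```

An empty list proves nothing about the hardware, but a non-empty one is a strong hint that RAM, CPU or power is misbehaving, since the same input can never legitimately hash to two different values.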
newbie
Activity: 55
Merit: 0
March 24, 2013, 06:17:23 AM
#12
Just to say I tried new hardware, a faster machine anyway, and it has been stable. So the conclusion has to be that the original machine has a problem which appears more or less reliably when the demands increase. What a pain. Plan to try a new PSU first.....
newbie
Activity: 55
Merit: 0
March 23, 2013, 08:06:54 AM
#11
For the record, following post #9, I repeated the download from a clean start, and this looks pretty similar fwiw:
It happens very near to the end of the downloads, when most blocks have already been downloaded.

Code:
bitcoin1@novatech1:~/Downloads/bitcoin-0.8.1-linux/bin/32$ ./bitcoin-qt
*** glibc detected *** ./bitcoin-qt: double free or corruption (!prev): 0xa5bb8bb0 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x75ee2)[0xb675eee2]
/usr/lib/i386-linux-gnu/libstdc++.so.6(_ZdlPv+0x1f)[0xb698651f]
./bitcoin-qt[0x80fc1b5]
./bitcoin-qt[0x80fc1a7]
(etc etc)
newbie
Activity: 55
Merit: 0
March 23, 2013, 03:08:35 AM
#10
.... The client is very buggy and sluggish so I'm not surprised.
Thanks for your comment. But which client are you thinking of: bitcoin-qt or System Monitor?
newbie
Activity: 55
Merit: 0
March 23, 2013, 02:10:28 AM
#9
Whoops. I spoke too soon.
I left bitcoin-qt running last night, and it looks like it did not get very far. Something crashed out, and the terminal I used to start bitcoin-qt reported a lot of information, starting with the lines:
Code:
bitcoin1@novatech1:~/Downloads/bitcoin-0.8.1-linux/bin/32$ ./bitcoin-qt
*** glibc detected *** ./bitcoin-qt: free(): invalid next size (normal): 0x12a11f30 ***
*** glibc detected *** ./bitcoin-qt: corrupted double-linked list: 0x129ec9d0 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x75ee2)[0xb6736ee2]
/usr/lib/i386-linux-gnu/libstdc++.so.6(_ZdlPv+0x1f)[0xb695e51f]
./bitcoin-qt[0x80c59cd]
(etc)

Should I still be suspecting my hardware?
newbie
Activity: 42
Merit: 0
March 22, 2013, 05:45:17 PM
#8
If you have a very bad connection or a slow computer that may be the issue. The client is very buggy and sluggish so I'm not surprised.
newbie
Activity: 55
Merit: 0
March 22, 2013, 05:39:38 PM
#7
Mmm. I have a slightly different direction on this just this evening. I repeated the recent test conditions, but I did not run the 'System Monitor' app. I have liked System Monitor, it has been a favourite of mine (used it a lot.....), because I like to view the CPU, RAM and bandwidth.... Anyway, pretty similar tests (without System Monitor running) have so far not led to any database corruption etc. So my current thinking is that it may be System Monitor which is associated with my crashes and corruptions, and I will see how things go. System Monitor does have some ongoing bugs listed, and I am aware it is pretty heavy on resources, but I would still like to use it - if it were proven clean here. Meanwhile I can do without it. Fingers crossed. ;-)
newbie
Activity: 55
Merit: 0
March 22, 2013, 02:43:02 PM
#6
Mmm. RAM check for 7 hours, no errors.
Difficult to find specific clues. I can use different hardware soon, but not for a while yet.
1) What can I glean from the debug log? I am not certain about this, but I have restored from backup OK a couple of times after database corruption, and in each case the most recent backup contained corruption, but the previous one worked OK. Which is strange, because I only back up when things are running well and with no problems. It looks a bit as if some corruption occurs or is not challenged, only to be discovered later. Is this possible somehow?
This is possible and even quite common: most of the time, writing to disk involves copying data and/or transforming it before writing. Corruption can happen during this process without modifying the original data. As long as the program runs, it holds a good copy of the data and doesn't exhibit any strange behavior, even if corrupted data has been written to disk. But when the program is started again and reads the data from disk, it fails.
The debug.log won't help you; in fact, not much can when you can't trust your hardware.

I am working on troubleshooting tests. The next easiest thing to change would be the PSU; however, it would be nice to get some more systematic evidence. Using a monitoring tool (xsensors) I see that all (static) voltages and temperatures etc look OK, stable, and well in range.
I started with an empty database and watched. Over 35 minutes, no problem. I was also running System Monitor, which I have sometimes wondered might cause some instability (?), and I also had two accounts logged in at once, a user account and a bitcoin-user account. Even so, sensors seemed to indicate all OK. In the user account I have an rsync script to sync a couple of hard drives. I ran this early on, when presumably the early (smaller??) blocks were downloading. No problem. After 30 minutes I did the same thing, and while the script was briefly running I switched user to the bitcoin user. Almost immediately I saw a 'database error' window appear, and the wallet stopped. This is pretty similar to the situations and errors I have been experiencing with bitcoin since I began not long ago. I could guess that with the later, longer block calculations(?) or longer writing to disk(?), any problems in hardware would be more likely to show up later rather than earlier in the initial downloading process (?)
The reported voltages, temperatures etc from xsensors all still seemed OK, so if the PSU is a problem it would more likely be something such as ripple(?) or smoothing of transients. I aim to change the PSU anyway, and then see.
I do not think I have seen any errors in the past when actually writing to disk, but as previously mentioned, I do very occasionally see crashes, and frequent problems with bitcoin - in rather similar circumstances, using two logged-in accounts. The hard drives all check out OK.
It seems quite useful in a way that I have perhaps prompted a database error in circumstances I have begun to suspect. Hopefully I can use it as a test action.

Any further comments re CPU, mainboard, PSU etc would be appreciated....
:-)
newbie
Activity: 55
Merit: 0
March 20, 2013, 09:42:05 AM
#5
Thanks - very useful comments.
(ouch)
I did a simple 2 minute long stress test (cpuburn) on each of 3 of the 4 processors indicated. For some reason the 4th cpu did not get called. No apparent prob. PSU seems next easiest (not) .....
hero member
Activity: 896
Merit: 1000
March 20, 2013, 06:36:31 AM
#4
Mmm. RAM check for 7 hours, no errors.
Difficult to find specific clues. I can use different hardware soon, but not for a while yet.

1) What can I glean from the debug log? I am not certain about this, but I have restored from backup OK a couple of times after database corruption, and in each case the most recent backup contained corruption, but the previous one worked OK. Which is strange, because I only back up when things are running well and with no problems. It looks a bit as if some corruption occurs or is not challenged, only to be discovered later. Is this possible somehow?

This is possible and even quite common: most of the time, writing to disk involves copying data and/or transforming it before writing. Corruption can happen during this process without modifying the original data. As long as the program runs, it holds a good copy of the data and doesn't exhibit any strange behavior, even if corrupted data has been written to disk. But when the program is started again and reads the data from disk, it fails.
The debug.log won't help you; in fact, not much can when you can't trust your hardware.
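One general way to notice this kind of silent corruption earlier is to checksum the data while it is still in memory and compare that against what comes back from the file. The sketch below is a generic Python illustration of the idea, not how bitcoin-qt or its database handles writes; `write_and_verify` and `sha256_of_file` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file's contents, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_and_verify(path, data):
    """Write data, flush it to disk, then re-read and compare digests.

    Returns True when the bytes read back match the bytes in memory.
    """
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        print(write_and_verify(target, os.urandom(4 * 1024 * 1024)))
```

Two caveats: the re-read may be served from the operating system's cache rather than from the disk itself, and on a machine with faulty RAM the checksum calculation can be affected too. The same comparison could be run on a backup before trusting it, which would fit the observation that the most recent backup was already corrupt.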

2) I am running bitcoin-qt from a terminal with ./bitcoin-qt. A bitcoin window then appears (wallet?), and I usually also start the debug window to see other stuff, such as the number of peers. My question: how should I be closing bitcoin with this method? So far I am using the bitcoin window 'close', which appears to work - the debug window closes, the bitcoin window closes, and the prompt in the terminal reverts to normal. Does this all sound OK? I am trying to verify that my actions are not causing problems, corruption etc.

Your way of running bitcoin-qt is fine.
newbie
Activity: 55
Merit: 0
March 20, 2013, 06:06:08 AM
#3
Mmm. RAM check for 7 hours, no errors.
Difficult to find specific clues. I can use different hardware soon, but not for a while yet.

1) What can I glean from the debug log? I am not certain about this, but I have restored from backup OK a couple of times after database corruption, and in each case the most recent backup contained corruption, but the previous one worked OK. Which is strange, because I only back up when things are running well and with no problems. It looks a bit as if some corruption occurs or is not challenged, only to be discovered later. Is this possible somehow?
2) I am running bitcoin-qt from a terminal with ./bitcoin-qt. A bitcoin window then appears (wallet?), and I usually also start the debug window to see other stuff, such as the number of peers. My question: how should I be closing bitcoin with this method? So far I am using the bitcoin window 'close', which appears to work - the debug window closes, the bitcoin window closes, and the prompt in the terminal reverts to normal. Does this all sound OK? I am trying to verify that my actions are not causing problems, corruption etc.
hero member
Activity: 896
Merit: 1000
March 19, 2013, 02:32:46 PM
#2
I do not see many similar complaints around like this, so I begin to suspect my hardware. I do (very occasionally) see a freeze, when using say, rsync, which presumably works the hardware intensively (?).
Freezes caused by rsync? You bet you have a hardware problem.

The initial Bitcoin sync is intensive... Run memtest to see if your RAM is OK (it's one of the most likely suspects). Use smartctl -a to check that the drives are OK. After that it can be CPU/PSU/MB, which is harder to check without swapping them with known good ones...
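To give a rough idea of what a memory test does, here is a toy pattern test in Python: it fills a buffer with fixed bit patterns and counts the bytes that read back differently. It is only a sketch under stated assumptions. The name `pattern_test` is invented for this example, it can only touch memory the operating system hands to the process, and CPU caches may hide faults, so it is no substitute for booting a dedicated tester such as memtest.

```python
def pattern_test(size_bytes=64 * 1024 * 1024,
                 patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each byte pattern and count bad read-backs.

    Returns the total number of bytes that did not hold the value
    written to them. On healthy hardware this should be 0.
    """
    buffer = bytearray(size_bytes)
    mismatches = 0
    for pattern in patterns:
        # Overwrite the whole buffer with the current pattern.
        buffer[:] = bytes([pattern]) * size_bytes
        # Every byte should now equal the pattern; count the rest.
        mismatches += size_bytes - buffer.count(bytes([pattern]))
    return mismatches


if __name__ == "__main__":
    print("bytes that read back wrong:", pattern_test())
```

A result of 0 only means this one buffer behaved on this one run; intermittent faults can easily slip past a short test like this.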

Keep posting in the jail and you'll get out after a few messages.
newbie
Activity: 55
Merit: 0
March 19, 2013, 10:26:46 AM
#1
For several days I have been trying to get bitcoin up and running, with only small success. On one occasion I achieved a completed, verified situation. :-) However - I am a very newbie - if anything prompts a re-download (or re-index?), all goes well until near the end (many hours :-( ) and then I usually see an error about database corruption, or the wallet just disappears (crash?) and a restart then shows database corruption. I started with bitcoin very recently, just before version 0.8.1, so I actually began by using 0.8.0 (on Ubuntu 12.04 32 bit), and I thought at first that the problems I saw might be stopped by this maintenance release. But I am now using 0.8.1, and the same thing is happening again. I do not see many similar complaints around like this, so I begin to suspect my hardware. I do (very occasionally) see a freeze, when using say, rsync, which presumably works the hardware intensively (?).

I would be grateful for comments, tia

[also - how do I get out of newbie jail??]