
Topic: Why use Berkeley DB for the wallet? (Read 3877 times)

member
Activity: 89
Merit: 10
January 03, 2012, 12:42:52 PM
#14

I re-tested while at work today. Screenshot after about 9 hours, when I came home: [screenshot not preserved]

legendary
Activity: 1512
Merit: 1049
Death to enemies!
January 02, 2012, 05:59:41 PM
#13
The blockchain is verified and scanned for coins that might belong to you according to your wallet.dat file. This is WORK, and your computer is doing it. You must do the work yourself (or rather, the Bitcoin client on your computer must do it) instead of relying on a central banking authority.
full member
Activity: 210
Merit: 100
January 02, 2012, 04:39:07 AM
#12
Torrents were not running.

Just kidding about torrents :)

I'll do a test on XP later today, but I can already tell you it's gonna look far worse than my previous test.
Firstly, XP does far less caching, using an older algorithm; and secondly, this time the testing machine will only have 2 GB of RAM.

In case it wasn't obvious from my previous post, the Bitcoin client WANTED to transfer 22.7 GB on/off the drive, and it was only thanks to Windows 7 that this snafu didn't actually happen.
member
Activity: 89
Merit: 10
January 02, 2012, 02:42:38 AM
#11

Torrents were not running; the only mistake I may have made is a decimal place when reading the number of gigs.
However, the continuous thrashing sounds the HDD made for an entire day, together with your numbers, indicate that I did read them correctly.
There was no excessive use of CPU or RAM for me either.

Since I'm now back home I can test it again to make sure.
full member
Activity: 210
Merit: 100
January 01, 2012, 09:16:06 PM
#10
Task Manager shows the client had written 128 GB and read about 30 GB of data, but the blockchain itself is about 1 GB?

Uhmmm... sure you didn't confuse Bitcoin with uTorrent there, partner? :)

Holy heck, almost 160 GB; that sounds absurd.
In fact, it sounded just crazy enough to make me wanna test what Mr. Bitcoin-Qt 0.5.1 does while downloading the whole chain:

    It took exactly 4 hours and 17 minutes for the whole blockchain to be downloaded.
    No excessive use of CPU or memory was noticed.
    891 MB of data were downloaded. Combined hard disk reads and writes: 22.7 gigabytes. (1)
    Holy sh*t, that's a lot of disk activity... or not:
    the hard drive actually only transferred 15 gigabytes SINCE REBOOT, 25 hours ago.
    Thanks to read/write caching, most of bitcoin.exe's activity never actually touched the drive. (2)

The machine I ran the test on is a lowly HP nx7300 laptop with upgraded memory and hard drive (Core2Duo T5500, 3 GB RAM, Kingston SNV-425S2 SSD drive) running Windows 7 x64.


Gavin, in light of the recent wallet.dat failures (3), I'd like to ask whether the Bitcoin client is using Windows' transactional NTFS file operations in any way? (4) That would seem to be desirable... Transactional file operations have been supported since Windows Vista and boil down to merely calling the transactional counterparts of the old API calls.
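
For reference, the transactional calls really are near drop-in replacements. A minimal sketch follows (this is not code from the Satoshi client; the function name, wallet path handling, and error handling are invented for illustration):

Code:
// Sketch only: wrap a wallet write in a kernel (KTM) transaction so an
// interrupted write rolls back instead of leaving a half-written file.
// Requires Vista or later; link against KtmW32.lib.
#include <windows.h>
#include <ktmw32.h>

bool WriteWalletTransacted(const wchar_t* path, const void* data, DWORD size)
{
    // Create the kernel transaction object.
    HANDLE hTx = CreateTransaction(NULL, NULL, 0, 0, 0, 0, NULL);
    if (hTx == INVALID_HANDLE_VALUE)
        return false;

    // Transactional counterpart of the old CreateFile() call.
    HANDLE hFile = CreateFileTransactedW(path, GENERIC_WRITE, 0, NULL,
                                         CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                         NULL, hTx, NULL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        CloseHandle(hTx);
        return false;
    }

    DWORD written = 0;
    BOOL ok = WriteFile(hFile, data, size, &written, NULL) && written == size;
    CloseHandle(hFile);

    // Either the whole write becomes visible, or none of it does.
    if (ok && CommitTransaction(hTx)) {
        CloseHandle(hTx);
        return true;
    }
    RollbackTransaction(hTx);
    CloseHandle(hTx);
    return false;
}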


References:
(1) [screenshot not preserved]
(2) [screenshot not preserved]
(3) https://bitcointalksearch.org/topic/epic-fail-56023
    https://bitcointalksearch.org/topic/help-recovering-from-walletdat-55975
(4) http://msdn.microsoft.com/en-us/library/aa365738%28VS.85%29.aspx
member
Activity: 89
Merit: 10
January 01, 2012, 02:10:28 PM
#9

I'm not a database expert or a high-level programmer; however, there seems to be much room for improvement.

Here are some observations:

- After installing the Bitcoin client, it took a day of disk thrashing to download the blockchain to a laptop with a mechanical drive.
- Task Manager shows the client had written 128 GB and read about 30 GB of data, but the blockchain itself is about 1 GB?
- After 2 days offline, the client thrashes for about 3 minutes before it is up and running.

I did not notice the thrashing before since my main computer uses an SSD.
The client in question was 0.5.1 on XP 32-bit; 8 Mbit ADSL, T7200 2.0 GHz dual core, 2 GB RAM, Hitachi Travelstar 7K100.

hero member
Activity: 991
Merit: 1011
December 31, 2011, 08:05:08 PM
#8
The Satoshi bitcoin code could certainly do a better job of helping users recover from any/all of the above, although I personally think that development time would be better spent on the "what if my computer catches fire" scenario: can we make it really easy for users to securely back up and restore their wallets off-site?

I am totally with you.
But maybe first make it easy to back up at all?
legendary
Activity: 1652
Merit: 2311
Chief Scientist
December 31, 2011, 05:39:34 PM
#7
Wallet.dat files don't get corrupted very often; blkindex.dat or addr.dat corruption is much more common (which makes sense, as they are much larger and change all the time as new blocks are added/indexed).

A lot of reported "database corruption" has been Berkeley DB log file incompatibility (the .dat files are compatible between 4.* releases and across operating systems; I know the log files are NOT compatible from 4.7 to 4.8, and I think they're cross-OS compatible too, but I could be wrong about that).
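
(As an illustration of one way around the log-file issue, assuming the Berkeley DB 4.x C++ API: a client can checkpoint and prune the environment's log files at clean shutdown, so a newer release never has to read old-format logs. This is a sketch, not necessarily what the Satoshi client actually does.)

Code:
// Sketch using the Berkeley DB C++ API (db_cxx.h): flush everything into
// the .dat files and delete log files that are no longer needed for
// recovery, so an upgraded release never has to parse old-format logs.
#include <db_cxx.h>

void FlushAndPruneLogs(DbEnv& dbenv)
{
    // Write all dirty pages to the database files and record a checkpoint.
    dbenv.txn_checkpoint(0, 0, 0);

    // Ask Berkeley DB to remove log files no longer required for recovery.
    char** unused = NULL;
    dbenv.log_archive(&unused, DB_ARCH_REMOVE);
}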

The Satoshi bitcoin code could certainly do a better job of helping users recover from any/all of the above, although I personally think that development time would be better spent on the "what if my computer catches fire" scenario: can we make it really easy for users to securely back up and restore their wallets off-site?
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
December 31, 2011, 10:32:12 AM
#6
I don't see why this scheme is problematic.  It's guaranteed to be at least as safe as the regular filesystem operation of modifying a file, which is already extraordinarily reliable.  Perhaps the argument worth making is that I should avoid any "automatic" recovery based on flag files.  Luckily, I do have error detection/correction built in, but it's not used to determine which file is potentially corrupt... maybe I should...

So is there a scheme that is guaranteed to work?   I mean, I'm not sure how you could guarantee ACID-based DB operations if you can never count on in-order filesystem ops.
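
The usual partial answer is to impose the ordering yourself with explicit flushes between each step, so a flag can never reach the disk before the data it describes. A rough Win32 sketch follows (helper names and file names are invented for illustration; even this isn't an absolute guarantee if the drive's own write cache reorders things):

Code:
// Sketch of forcing write ordering on Windows: make each step durable
// before starting the next one, so the flag files and the wallet data
// can never be reordered by the OS cache.
#include <windows.h>

static bool WriteAndFlush(const wchar_t* path, const void* data, DWORD size)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_FLAG_WRITE_THROUGH, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return false;
    DWORD written = 0;
    BOOL ok = WriteFile(h, data, size, &written, NULL) && written == size;
    ok = ok && FlushFileBuffers(h);   // don't return until the OS has flushed it
    CloseHandle(h);
    return ok != FALSE;
}

static bool UpdateWithOrderedFlag(const void* data, DWORD size)
{
    // 1. Flag first: "the primary copy is being modified".
    if (!WriteAndFlush(L"updating_primary.flag", "", 0)) return false;
    // 2. Only after the flag is durable, modify the primary copy.
    if (!WriteAndFlush(L"wallet_primary.bin", data, size)) return false;
    // 3. Only after the data is durable, retire the flag.
    return DeleteFileW(L"updating_primary.flag") != 0;
}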

I don't know if wallet corruptibility problems still exist, but 6 months ago, when I first got into Bitcoin, I was constantly getting corrupted wallet files, probably due to the client or OS crashing and somehow getting interrupted during an update.  Gavin even wrote a tool to help recover corrupted wallets... instead, I pulled all the keys out of my wallet and decided I was going to make my own client :)


member
Activity: 438
Merit: 10
December 31, 2011, 10:20:40 AM
#5
Specifically, I have file A and backup file A'.  Every time I modify A, I first make sure A and A' are the same, then I touch a flag file to indicate I'm about to modify A.  Once A is done, I write a flag specifying I'm about to modify A' (identically), remove the flag for A, and then start the modifications of A'.  Once that is done, I delete the last flag.  If the computer crashes during the update of either one, I will see the files are different and see which update flag is there, to let me know which one is corrupted.  Then I just restore from the other one.  (And I never give the user any data until the whole operation is complete.)

It took me more than a few seconds to remember where I had seen something resembling this before. Btrieve.

Following your example, in Btrieve, A and A' were each a map of the used database pages within a file, with a single flag indicating which one was current. An internal commit operation would toggle which map was the current consistent database image. An update to A involved making a changed copy of a page that would be part of A' until the commit point occurred. No page-by-page recovery comparison was required in the event of a crash or halt, as A' would effectively appear to have never existed after a crash.
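
A toy in-memory sketch of that shadow-paging idea (the names are invented; this illustrates the mechanism, not Btrieve's actual implementation):

Code:
// Toy sketch of Btrieve-style shadow paging: page writes go to fresh
// slots referenced only by the shadow map; commit atomically flips which
// map is the consistent image, so a crash mid-update simply leaves the
// old image current and needs no recovery scan.
#include <cstdint>
#include <map>
#include <vector>

typedef std::vector<uint8_t> Page;
typedef std::map<uint32_t, size_t> PageMap;   // logical page no -> storage slot

struct ShadowPagedStore {
    std::vector<Page> storage;   // every page version ever written
    PageMap maps[2];             // the two page maps (A and A')
    int current;                 // the single flag: which map is committed

    ShadowPagedStore() : current(0) {}

    // Start an update: the shadow map begins as a copy of the committed image.
    void begin() { maps[1 - current] = maps[current]; }

    // Copy-on-write: the new page version is reachable only via the shadow map.
    void writePage(uint32_t pageNo, const Page& data) {
        storage.push_back(data);
        maps[1 - current][pageNo] = storage.size() - 1;
    }

    // Commit: flip the flag. A crash before this point leaves the old,
    // fully consistent image current.
    void commit() { current = 1 - current; }

    // Readers always see the committed image.
    const Page& readPage(uint32_t pageNo) const {
        return storage.at(maps[current].at(pageNo));
    }
};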

File systems that reorder write operations to optimize disk speed could hose this scheme. Even using a "raw" disk isn't foolproof, as reordering of operations could occur at the disk driver level if, for example, elevator seeking is implemented there. This only gets worse as storage gets bigger and more complex. Once one gets into things like EMC storage units, where thousands of disks and hundreds of controllers are active, the order of operations is pretty much not assured to be deterministic. Not too long ago, uninterrupted power was a hard requirement for EMC storage; the vendor wouldn't even support configurations that didn't have it.

Is corruption of wallet databases a frequent issue? I've never had it happen; that's why I ask.

legendary
Activity: 1428
Merit: 1093
Core Armory Developer
December 31, 2011, 09:14:14 AM
#4
I completely agree with this.  I think it's much more important to have a "transparent" wallet than one that is easily corruptible in a DB format.  Of course, the benefit of the DB is the fact that I should get ACID operations (guaranteeing the integrity of changes even if the computer loses power at the wrong time).  But other problems come with it, like the fact that the DB doesn't necessarily overwrite data you wanted to remove: I was actually the one who discovered the bug of unencrypted private keys being left in your wallet after encrypting it.

I will be releasing my client soon (Armory), and I decided to go the polar opposite route: a flat, transparent, binary file, with in-place operations for encryption, and a synchronous backup that is updated with the main file in such a way that I can always tell if one is corrupted and which one it is.  It was a bit of work getting it together, but I think it's critical that any wallet-file implementation have such atomic operations; I would agree with Berkeley DB for this reason, except that it seems to get corrupted all the time anyway.

Specifically, I have file A and backup file A'.  Every time I modify A, I first make sure A and A' are the same, then I touch a flag file to indicate I'm about to modify A.  Once A is done, I write a flag specifying I'm about to modify A' (identically), remove the flag for A, and then start the modifications of A'.  Once that is done, I delete the last flag.  If the computer crashes during the update of either one, I will see the files are different and see which update flag is there, to let me know which one is corrupted.  Then I just restore from the other one.  (And I never give the user any data until the whole operation is complete.)
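
Roughly, the sequence looks like the following sketch (the file names and the appendEntry helper are invented for illustration; this is not Armory's actual code):

Code:
// Sketch of the A / A' update protocol described above.
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

static const fs::path fileA = "wallet.bin";         // primary (A)
static const fs::path fileB = "wallet_backup.bin";  // synchronous backup (A')
static const fs::path flagA = "modifying_A.flag";
static const fs::path flagB = "modifying_B.flag";

static void touch(const fs::path& p) { std::ofstream(p, std::ios::app); }

static void appendEntry(const fs::path& p, const std::string& entry)
{
    std::ofstream out(p, std::ios::binary | std::ios::app);
    out << entry;                 // a real wallet would also flush/fsync here
}

void updateWallet(const std::string& newEntry)
{
    // (Step 0, omitted: verify that A and A' are currently identical.)
    touch(flagA);                 // "I'm about to modify A"
    appendEntry(fileA, newEntry); // modify A

    touch(flagB);                 // "I'm about to modify A'"
    fs::remove(flagA);
    appendEntry(fileB, newEntry); // modify A' identically

    fs::remove(flagB);            // no flags left: both copies consistent
}

// Crash recovery: if modifying_A.flag exists, A is suspect and is restored
// from A'; if modifying_B.flag exists, A' is suspect and is restored from A.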

As for the topic of too many keys... I don't know how many bytes are stored by the Satoshi client for each address, but Armory uses 237 bytes... which means that if I get a million customers (which is ludicrous), I will be at 237 MB of storage used.  And anyone processing that many transactions will have dozens of GB of RAM on their system, so they will have no problem keeping it in RAM, much less keeping it on their HDD.  I imagine the Satoshi client is on the same order-of-magnitude...

legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
December 31, 2011, 02:10:52 AM
#3
Theoretically it could become big if you are a business with many customers.

Well, I guess if you are going to use separate addresses per customer (or maybe per sale item), then this could indeed become a problem.

I was thinking more of the "home user", but in order to work well with either type of user (if a simpler wallet persistence mechanism were to be entertained), a configuration option would need to be added.

As for Amazon accepting Bitcoin, I wouldn't hold my breath. :)


Cheers,

Ian.
sr. member
Activity: 306
Merit: 257
December 31, 2011, 01:57:12 AM
#2
Theoretically it could become big if you are a business with many customers.
BTW, when do you think Amazon will switch over to bitcoins?
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
December 31, 2011, 01:33:14 AM
#1
Hi all,

Following a couple of recent problems on this forum, where people are trying to recover corrupted encrypted wallets that seemingly got that way because Bitcoin was shut down unexpectedly (although not whilst actually generating keys), I have been wondering: why is the wallet being stored in the DB?

Is it because the wallet size is expected to potentially be too big to buffer in RAM?

Is it because it needs to be written to at some time other than when issuing a new transaction?

Why not just store it in a text file (plain or encoded) that is read/written in its entirety when the need arises and kept closed at all other times?
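
For illustration, a minimal sketch of that whole-file approach (write a temporary file, then rename it over the original, so a crash leaves either the old or the new wallet on disk, never a half-written one; the names here are invented):

Code:
// Sketch of a whole-file wallet: read it all at once, keep it closed
// otherwise, and rewrite it by writing a temp file and renaming it over
// the original.
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

std::string loadWallet(const fs::path& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();            // slurp the whole file, then it's closed again
    return buf.str();
}

void saveWallet(const fs::path& path, const std::string& contents)
{
    const fs::path tmp = path.string() + ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;          // a real client would also flush/fsync here
    }
    // On POSIX, rename() over an existing file is atomic; on Windows one
    // would use ReplaceFile()/MoveFileEx() for the same effect.
    fs::rename(tmp, path);
}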


Cheers,

Ian.