
Topic: Berkeley Database version 4.8 requirement unsustainable (Read 7283 times)

legendary
Activity: 1456
Merit: 1081
I may write code in exchange for bitcoins.
Last time I built bitcoin you could pass a parameter to ./configure "--with-incompatible-bdb" and then use whatever bdb was available on your system.  This was really helpful for not having to dig up ancient software.  I tend to agree with the thrust of this thread that it would be nice to figure out a way to further mitigate this issue, but it's already been explained upthread how different versions of bdb are going to result in incompatible wallets.  Anyway, in the use case where you're not trying to recover an old wallet from years ago, it really is pretty simple to build bitcoin using whatever bdb is available in your distro's repository.
legendary
Activity: 2128
Merit: 1073
@2112:
The issue isn't really one of distribution as much as it's an issue of usage. Reliance upon a library that is antiquated (even by the library's own standards) is pure nonsense. It's akin to a major car company using the seats of the 1923 model of someone else's truck in their 2018 model sports car.
What a dumb comparison!

Bitcoin is, for better and for worse, effectively married to BerkeleyDB. It is used to operate on the keys stored in the wallet.dat. It will have to be supported probably forever, at least in the read-only mode, to safely import the content of legacy wallet.dat into some more modern storage layer that is both better designed and easier to integrate with.

Dumb fanboism and upgrade treadmill is a job safety trick that the open source people learned from the best ones in Redmond and Cupertino. Not that BerkeleyDB is perfect, but it is stable, and will have to be supported practically forever, even if only in a separate legacy import module.
hero member
Activity: 1092
Merit: 552
Retired IRCX God
@2112:
The issue isn't really one of distribution as much as it's an issue of usage. Reliance upon a library that is antiquated (even by the library's own standards) is pure nonsense. It's akin to a major car company using the seats of the 1923 model of someone else's truck in their 2018 model sports car.
legendary
Activity: 2128
Merit: 1073
Secondly, I'd like to point out that: 3 years and 5 major versions of Core after that quote, we're still stuck with trying to make a version of BDB from 2010 work in a 2017 operating system (despite the fact that both Core and BDB have had dozens of releases since that time); I'd say there's definitely no one "rushing into it".
BerkeleyDB is a special case when it comes to compatibility of its binary files. Having the correct version of the source code doesn't make the binary files fully compatible. IIRC BerkeleyDB intentionally stores somewhere a hash of the concatenation of sizeof's of certain structures. That hash value is very dependent on the exact compilation flags and other details of the compilation environment. Since the official Bitcoin Core client is not built in the default environment but in Gitian, the default BerkeleyDB (utilities and shared libraries) available in the Linux distribution will not be fully compatible with the one built under Gitian.
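To make the "hash of sizeof's" idea concrete, here is a minimal Python sketch. It is not BerkeleyDB's actual mechanism (the post itself only recalls it from memory); the function name `layout_fingerprint` and the two example layouts are made up for illustration. It only shows how the same fields can produce a different fingerprint when the build environment changes padding.

```python
import hashlib
import struct


def layout_fingerprint(formats):
    """Hash the concatenated sizes of a list of struct layouts (hypothetical helper)."""
    sizes = b"".join(struct.calcsize(f).to_bytes(4, "little") for f in formats)
    return hashlib.sha256(sizes).hexdigest()[:16]


# The same fields described for two imaginary "builds":
# "@" uses the platform's native alignment (padding may be inserted),
# "=" uses standard sizes with no padding at all.
padded_build = ["@bq", "@hi"]
packed_build = ["=bq", "=hi"]

print("padded:", layout_fingerprint(padded_build))
print("packed:", layout_fingerprint(packed_build))
print("compatible:", layout_fingerprint(padded_build) == layout_fingerprint(packed_build))
```

On typical platforms the two fingerprints differ even though the field lists are identical, which is the kind of mismatch described above.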

To achieve full compatibility with the user's code the Bitcoin Core distribution should probably distribute their own internal Gitian build of BerkeleyDB (utilities and archive libraries) in a separate, optionally downloadable file. A single zip file would do for all platforms built with Gitian; it is of somewhat limited interest, relevant only to a subset of developers.

Trying to operate on BerkeleyDB binary files with the default shared library and default utilities creates some hard to understand and debug special cases/problems in the 3rd party software. The number of hidden ways to shoot yourself in the foot is bewildering. Even before Oracle acquired them SleepyCat made some unusual decisions in their design. My best guess is that instead of binary compatibility they optimized for minimum size of the executable code.

Trying to undo those SleepyCat decisions would probably require maintaining a separate fork of BerkeleyDB somewhere in the Bitcoin Core's source tree. This would somewhat parallel the situation with LevelDB, where a separate fork is being maintained to improve its ACID resilience from hardware/OS crashes.

In summary, I agree with the decision to "proceed with caution". I definitely think that using the distribution's default build of BerkeleyDB is not the "best" way. The Linux distributions (or MacOSX builds) are made that way to optimize/minimize total disk and memory footprint of the entire distribution. Bitcoin Core is such a special case that for the sake of safety it is better to optimize for another goal and not rely on the pre-distributed code (source and binaries).
hero member
Activity: 1092
Merit: 552
Retired IRCX God
Changing to our own append only format has been the tentative plan for a long time.  BDB is not acceptable going forward for many reasons, and we've already eliminated it for non-wallet needs.  Breaking wallet compatibility is awkward though so no need to rush into it...
Firstly, I would like to thank achow101 for being the arbiter of truth and validity. Without his guidance, we wouldn't know the proper place, or time, to express our concern that it is a pain in the ass to get Core to work in any modern version of Linux (especially since this concern is raised again with the new release of Debian 9).

Secondly, I'd like to point out that: 3 years and 5 major versions of Core after that quote, we're still stuck with trying to make a version of BDB from 2010 work in a 2017 operating system (despite the fact that both Core and BDB have had dozens of releases since that time); I'd say there's definitely no one "rushing into it".

achow101, please, feel free to let us/me know the exact rule on cross-thread quoting, whereby an old thread that is re-raised with current OS concerns (for an OS released 9 days ago), should stay open (but not posted to) and a new thread should be started and quotes used out of context....

legendary
Activity: 2128
Merit: 1073
To be honest I don't get the point you are trying to convey. How does this relate to benchmarking or distributed transaction processing?
It relates to your previous claim of a need for distributed transactions, which does not exist IMHO, as blockchain & wallet can be completely separated in terms of transactions ('You want to be able to produce a distributed transaction that updates both Bitcoin wallet and some external database as a single transaction that either goes to completion in both databases or gets rolled back in both databases').
I think I understand our miscommunication. You are thinking of "transaction" like defined in Bitcoin class CTransaction. I'm thinking of "transaction" like defined in the usual financial terms:

https://en.wikipedia.org/wiki/Transaction_processing
https://en.wikipedia.org/wiki/Transaction_processing_system

Writing financial database applications without transactional consistency is essentially a form of fraud, as committed by GLBSE, ASICMINER (and others that I don't recall at this moment). I remember those two as the first two large examples of sending outgoing Bitcoin transactions without recording them properly in their auxiliary databases. They performed the Bitcoin "sendmany" twice and then publicly appealed to the recipients to return the overpayment.

Append-only is indeed most researched, but as you also stated, there is nothing 100% predictable or standardized in terms of behavior, hence you cannot assume that simple append-only code will behave correctly or consistently across platforms (in case of power failures or crashes).
Properly implemented append-{only,mostly} data storages provide the same guarantees as the general-purpose ones. I don't see any reason why you are making those caveats, beyond that probably you haven't seen the "properly implemented" ones.

Actually mempools (and non-confirmed or non-confirmable transactions) contain a lot of business-critical data that is very valuable long-term:

It could, but should it? Don't those fall into the realm of custom needs?

What I'm arguing about here is to have bitcoin core as a foundation, a bitcoin protocol toolbox, a reference implementation, that can be built upon and expanded, rather than the alpha and omega of bitcoin.

So the more open, standard and modular it is, the easier it is to build and expand upon.
Every custom format or storage reduces openness (practical openness, the one that matters), and does not necessarily improve performance or usability (like the current wallet format).
You are just contradicting yourself. You want the code to be a toolbox for possible extensions yet consider CTxMemPool an appropriate abstraction. CTxMemPool should be an abstract base class that interfaces to one of many https://en.wikipedia.org/wiki/In-memory_database implementations.

Basically I sense that you have exactly zero previous experience with financial software and the relevant accounting and auditing practices.

legendary
Activity: 1100
Merit: 1032
To be honest I don't get the point you are trying to convey. How does this relate to benchmarking or distributed transaction processing?
It relates to your previous claim of a need for distributed transactions, which does not exist IMHO, as blockchain & wallet can be completely separated in terms of transactions ('You want to be able to produce a distributed transaction that updates both Bitcoin wallet and some external database as a single transaction that either goes to completion in both databases or gets rolled back in both databases').

Append-only is indeed most researched, but as you also stated, there is nothing 100% predictable or standardized in terms of behavior, hence you cannot assume that simple append-only code will behave correctly or consistently across platforms (in case of power failures or crashes).

Actually mempools (and non-confirmed or non-confirmable transactions) contain a lot of business-critical data that is very valuable long-term:

It could, but should it? Don't those fall into the realm of custom needs?

What I'm arguing about here is to have bitcoin core as a foundation, a bitcoin protocol toolbox, a reference implementation, that can be built upon and expanded, rather than the alpha and omega of bitcoin.

So the more open, standard and modular it is, the easier it is to build and expand upon.
Every custom format or storage reduces openness (practical openness, the one that matters), and does not necessarily improve performance or usability (like the current wallet format).
legendary
Activity: 2128
Merit: 1073
1) Block explorer is not a valid benchmark. You want to be able to produce a distributed transaction that updates both Bitcoin wallet and some external database as a single transaction that either goes to completion in both databases or gets rolled back in both databases.
That's incorrect, and is actually a problematic design if you force that dependency.

What you need are actually 3 databases, which can be completely independent from each other:
- a blockchain DB (same as explorer DB), that can be restricted/filtered to only tx that concern you for size considerations
- a key DB (which will hold your key pool, xpriv, etc.)
- a wallet helper DB to hold metadata (address labels, payments extra info, accounts...)

The first two DB are only related by a filtering optimization, so no cross-DB transactions are ever necessary, and all the heavy lifting happens in the blockchain DB. When creating a new key, you always write it to the Key DB first, and it can only end up in the blockchain DB after having been confirmed. New unconfirmed tx are persisted in the wallet helper (with metadata and for eventual re-broadcasting), then go straight to the mempool for broadcasting. This could also help clean up the annoyances with new unconfirmed tx in the current bitcoin core code.

It would also isolate wallet management from key and tx creation/signing, the blockchain & key DB would be part of the core, but wallet helper could/should be considered as not being "core", but something alternative wallets could replace or do differently.

Key DB is really critical in terms of security, wallet helper much less so (if it leaks, you would leak some private info, but no funds), having it separate also means it will be simpler to move it to dedicated, standardized hardware.
To be honest I don't get the point you are trying to convey. How does this relate to benchmarking or distributed transaction processing?

2) Append-only files don't mean having to read and parse the whole file every time. For a popular example check out the PDF format that is designed to be read from the end, including appending new page indexes to the end.
But then you need some validation and recovery as what you appended could be partial or corrupted if a crash or power issue happened while you were writing it. Basically you would be re-inventing the ACID properties of a DB. Plenty of pitfalls there, especially with a cross-platform target, plenty of testing required.

You want to avoid NIHS and Yet-Another-Attempt-At-A-Crash-Proof-File-Format.
I disagree on that. Append only or append mostly data storage is a new and fertile area of research. It is mostly motivated by the constraints of the flash storage, where 1->0 bit transitions could be written (or rewritten) with small blocks (e.g. 4kB) but the 0->1 transitions require erasing of big blocks (e.g. 256kB).
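A toy Python model of that flash constraint, just to show why append-style writes are attractive. The class `FlashBlock` and its sizes are illustrative assumptions, not any vendor's interface: programming can only clear bits (1->0), and only a whole-block erase restores them.

```python
class FlashBlock:
    """Toy model: programming can only clear bits; erasing resets the whole block."""

    def __init__(self, size):
        self.cells = bytearray(b"\xff" * size)  # erased flash reads as all ones

    def program(self, offset, data):
        # Programming ANDs the new data into the cells: 1->0 works, 0->1 does not.
        for i, byte in enumerate(data):
            self.cells[offset + i] &= byte

    def erase(self):
        # Only erasing the entire block brings bits back to 1.
        self.cells = bytearray(b"\xff" * len(self.cells))


block = FlashBlock(8)
block.program(0, b"\x0f")  # clears the high nibble of the first byte
block.program(0, b"\xf0")  # tries to set it back; the cleared bits stay cleared
print(hex(block.cells[0]))
block.erase()
print(hex(block.cells[0]))
```

Overwriting in place therefore means erasing a large block first, while appending into still-erased space does not.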

Probably not much is published or open-sourced in this area. Flash storage vendors are very secretive about their new research in flash-optimized storage formats.

The 3 layers are: (a) blockchain storage including confirmed transactions (b) wallet keys&addresses storage (c) mempool a.k.a. unconfirmed transactions storage.
Yes, and currently (a) & (b) range from pretty bad to awful in terms of performance and accessibility of data.
(c) is a technical internal layer, I would not consider it as something alongside a & b, as it is transient and can be discarded.
Actually mempools (and non-confirmed or non-confirmable transactions) contain a lot of business-critical data that is very valuable long-term:

1) validly signed double-spend attempts
2) history of payment flows and fees at the granularity better than decaminutes.
legendary
Activity: 1100
Merit: 1032
1) Block explorer is not a valid benchmark. You want to be able to produce a distributed transaction that updates both Bitcoin wallet and some external database as a single transaction that either goes to completion in both databases or gets rolled back in both databases.
That's incorrect, and is actually a problematic design if you force that dependency.

What you need are actually 3 databases, which can be completely independent from each other:
- a blockchain DB (same as explorer DB), that can be restricted/filtered to only tx that concern you for size considerations
- a key DB (which will hold your key pool, xpriv, etc.)
- a wallet helper DB to hold metadata (address labels, payments extra info, accounts...)

The first two DB are only related by a filtering optimization, so no cross-DB transactions are ever necessary, and all the heavy lifting happens in the blockchain DB. When creating a new key, you always write it to the Key DB first, and it can only end up in the blockchain DB after having been confirmed. New unconfirmed tx are persisted in the wallet helper (with metadata and for eventual re-broadcasting), then go straight to the mempool for broadcasting. This could also help clean up the annoyances with new unconfirmed tx in the current bitcoin core code.

It would also isolate wallet management from key and tx creation/signing, the blockchain & key DB would be part of the core, but wallet helper could/should be considered as not being "core", but something alternative wallets could replace or do differently.

Key DB is really critical in terms of security, wallet helper much less so (if it leaks, you would leak some private info, but no funds), having it separate also means it will be simpler to move it to dedicated, standardized hardware.
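A rough sketch of the three-database split described above, using Python's built-in sqlite3 with in-memory databases. All table names, column names, and helper functions here are assumptions made for illustration; nothing like this exists in Bitcoin Core. The point it tries to show is that each write commits in exactly one database, and the only link between the key DB and the blockchain DB is an address filter.

```python
import sqlite3

# Three independent databases (in-memory here; separate files in practice).
key_db = sqlite3.connect(":memory:")
chain_db = sqlite3.connect(":memory:")
helper_db = sqlite3.connect(":memory:")

key_db.execute("CREATE TABLE keys (address TEXT PRIMARY KEY, encrypted_key BLOB)")
chain_db.execute("CREATE TABLE outputs (txid TEXT, address TEXT, amount INTEGER)")
helper_db.execute("CREATE TABLE labels (address TEXT PRIMARY KEY, label TEXT)")


def new_key(address, encrypted_key, label):
    # The key DB is committed first; metadata is a separate, independent commit.
    with key_db:
        key_db.execute("INSERT INTO keys VALUES (?, ?)", (address, encrypted_key))
    with helper_db:
        helper_db.execute("INSERT INTO labels VALUES (?, ?)", (address, label))


def watched_addresses():
    # The only coupling to the blockchain DB: a filter built from the key DB.
    return {row[0] for row in key_db.execute("SELECT address FROM keys")}


def on_confirmed_output(txid, address, amount):
    # Confirmed outputs are stored only if they concern one of our addresses.
    if address in watched_addresses():
        with chain_db:
            chain_db.execute(
                "INSERT INTO outputs VALUES (?, ?, ?)", (txid, address, amount)
            )


new_key("addr1", b"opaque-encrypted-bytes", "savings")
on_confirmed_output("tx1", "addr1", 5000)
on_confirmed_output("tx2", "someone-else", 1)
print(chain_db.execute("SELECT COUNT(*) FROM outputs").fetchone()[0])
```

If the helper DB is lost or leaked in this layout, labels are affected but the key material is not, which matches the security argument made above.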

2) Append-only files don't mean having to read and parse the whole file every time. For a popular example check out the PDF format that is designed to be read from the end, including appending new page indexes to the end.
But then you need some validation and recovery as what you appended could be partial or corrupted if a crash or power issue happened while you were writing it. Basically you would be re-inventing the ACID properties of a DB. Plenty of pitfalls there, especially with a cross-platform target, plenty of testing required.
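For a sense of what that validation and recovery involves, here is a minimal sketch of a checksum-framed append-only log in Python. The record layout (length plus CRC32 header) and the function names are assumptions chosen for the example, not a proposed wallet format. It only covers detecting a torn tail; real crash safety also depends on fsync behavior and other details not modeled here.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(log, payload):
    """Append one framed record to the in-memory log (a bytearray)."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log):
    """Return the valid records and the offset where the valid prefix ends."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(log) or zlib.crc32(log[start:end]) != crc:
            break  # torn or corrupted tail: stop at the last good record
        records.append(bytes(log[start:end]))
        pos = end
    return records, pos


log = bytearray()
append_record(log, b"key-1")
append_record(log, b"key-2")

torn = log[:-3]  # simulate a crash in the middle of the second write
records, good_len = recover(torn)
print(records, good_len)
```

After recovery the file would be truncated to `good_len`, so only the last, incomplete record is lost.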

You want to avoid NIHS and Yet-Another-Attempt-At-A-Crash-Proof-File-Format.

The 3 layers are: (a) blockchain storage including confirmed transactions (b) wallet keys&addresses storage (c) mempool a.k.a. unconfirmed transactions storage.
Yes, and currently (a) & (b) range from pretty bad to awful in terms of performance and accessibility of data.
(c) is a technical internal layer, I would not consider it as something alongside a & b, as it is transient and can be discarded.
legendary
Activity: 2128
Merit: 1073
Not really, since the wallet holds critical information, you want it safe from the usual range of errors, and you want ACID properties on it, which means a database.

A standard database means the wallet information is directly accessible to the users through standard tools, without having to use custom tools.

Also having a proper database means you can do away with a lot of the bitcoin-side wallet bookkeeping in C++, and just query the DB.

If properly indexed, you will have no performance issues with SQLite (unlike the current wallet code). I run whole blockchain explorers on SQLite, and can compute balance for any address or wallet faster than bitcoin core can compute balance for a moderately busy wallet, and that's with a brute-force select sum(), and wallets with hundreds of thousands of addresses (like major exchanges and darknet marketplaces).

Quote
But does "Real databases" include SQLite here? Does SQLite give any assurances in the case of sudden reboots?

Yes, it was built for that (and other issues)

Quote
At least with an append-only file (and regular syncing) you can be fairly sure that only record of your last activity before the reboot will be lost, and that not some management structure in the middle broke and made it impossible to parse the entire file.

SQLite was in part designed to compete with fopen(), to provide robustness vs crashes which a simple file does not provide, and a lot of people use it just because of that robustness.

Also simple append-only file means you would have to parse and load everything every time, which will be slow and inefficient.
My comments to the above:

1) Block explorer is not a valid benchmark. You want to be able to produce a distributed transaction that updates both Bitcoin wallet and some external database as a single transaction that either goes to completion in both databases or gets rolled back in both databases.

BerkeleyDB (with bitcoind -privdb=0) actually has X/Open distributed transaction monitor interface (not out of the box in Bitcoin Core, but not too much work either). I have never seen SQLite participate in a distributed transaction, but I'm not saying that it is impossible or even very hard to modify it to be able to participate in some form of transactional processing.
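For readers unfamiliar with the term, a distributed transaction of this kind is usually coordinated with a two-phase commit. The Python sketch below is a bare-bones illustration of that protocol only; the `Participant` class and `run_distributed_transaction` function are invented for the example and are not the X/Open XA interface or anything in BerkeleyDB.

```python
class Participant:
    """Stand-in for one database taking part in the transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit, or refuse.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def run_distributed_transaction(participants):
    # Phase 2 commits only if every participant voted yes in phase 1.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


wallet = Participant("wallet")
ledger = Participant("external ledger", can_commit=False)
print(run_distributed_transaction([wallet, ledger]), wallet.state, ledger.state)
```

Either both sides end up committed or both end up rolled back, which is the property asked for in point 1.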

2) Append-only files don't mean having to read and parse the whole file every time. For a popular example check out the PDF format that is designed to be read from the end, including appending new page indexes to the end.
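A small Python sketch of the "read from the end" idea, loosely modeled on how PDF incremental updates append a new cross-reference section and a trailer pointer. The layout here (JSON index plus an 8-byte trailer) and the function names are assumptions for illustration, not the PDF format itself.

```python
import json
import struct

TRAILER = struct.Struct("<Q")  # byte offset of the most recent index


def append_update(buf, index, key, value):
    """Append a value, then a fresh index, then a trailer pointing at that index."""
    index = dict(index)
    index[key] = (len(buf), len(value))
    buf.extend(value)
    index_offset = len(buf)
    buf.extend(json.dumps(index).encode())
    buf.extend(TRAILER.pack(index_offset))
    return index


def load_index(buf):
    # Only the trailer and the latest index are read; older data is never scanned.
    (index_offset,) = TRAILER.unpack_from(buf, len(buf) - TRAILER.size)
    return json.loads(bytes(buf[index_offset:len(buf) - TRAILER.size]))


def read_value(buf, key):
    offset, length = load_index(buf)[key]
    return bytes(buf[offset:offset + length])


buf, index = bytearray(), {}
index = append_update(buf, index, "a", b"one")
index = append_update(buf, index, "b", b"two")
index = append_update(buf, index, "a", b"uno")  # supersedes the first "a"
print(read_value(buf, "a"), read_value(buf, "b"))
```

Nothing already written is modified; a lookup costs one read of the trailer, one of the index, and one of the value.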

3) You guys are all going to lose if you keep up your popularity contest for the favorite embedded database engine. The 3 database layers in the Bitcoin Core all need to be abstracted to be engine-independent. Then it would not matter if an example implementation uses not the best choice of engine for the particular application needs. The 3 layers are: (a) blockchain storage including confirmed transactions (b) wallet keys&addresses storage (c) mempool a.k.a. unconfirmed transactions storage.
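What "engine-independent" could look like in practice, as a Python sketch. The `KeyValueStore` interface, the two backends, and the `Wallet` class are all hypothetical names made up for this example; Bitcoin Core itself is C++ and has no such classes.

```python
import sqlite3
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical storage interface that the wallet layer codes against."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...


class MemoryStore(KeyValueStore):
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class SqliteStore(KeyValueStore):
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)")

    def put(self, key, value):
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else row[0]


class Wallet:
    """Knows nothing about the engine; it only sees the interface."""

    def __init__(self, store):
        self.store = store

    def add_key(self, address, encrypted_key):
        self.store.put(address, encrypted_key)

    def key_for(self, address):
        return self.store.get(address)


for backend in (MemoryStore(), SqliteStore()):
    w = Wallet(backend)
    w.add_key(b"addr1", b"opaque-encrypted-bytes")
    print(type(backend).__name__, w.key_for(b"addr1"))
```

Swapping the engine then means writing one more subclass, with no change to the wallet logic.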

legendary
Activity: 2576
Merit: 2267
1RichyTrEwPYjZSeAYxeiFBNnKC9UjC5k
I have looked at it enough to know that there is nothing in the DB that would allow someone to spend your coins without knowing your passphrase.  So the "security" angle is not all that bad. 

But yes, the wallet needs an import/export format that's not database specific.

Well, I'll admit that security is not too likely to be an issue. It was more illustrative of the point about tying to old software. Or really, any particular piece of software at all. Internal data structures as documents were not OK when Word was doing it with the doc format. We should aim to do better than that.
legendary
Activity: 1100
Merit: 1032
Don't build your own database. You will get it wrong. It is really, really hard to build a reliable database system.

I agree with you, it's really hard.  I know that it is.  But, I think a database is overkill here. 

Not really, since the wallet holds critical information, you want it safe from the usual range of errors, and you want ACID properties on it, which means a database.

A standard database means the wallet information is directly accessible to the users through standard tools, without having to use custom tools.

Also having a proper database means you can do away with a lot of the bitcoin-side wallet bookkeeping in C++, and just query the DB.

If properly indexed, you will have no performance issues with SQLite (unlike the current wallet code). I run whole blockchain explorers on SQLite, and can compute balance for any address or wallet faster than bitcoin core can compute balance for a moderately busy wallet, and that's with a brute-force select sum(), and wallets with hundreds of thousands of addresses (like major exchanges and darknet marketplaces).
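As an illustration of the kind of query meant here, a minimal sqlite3 sketch in Python. The schema (an `outputs` table with `address`, `amount`, `spent`) is an assumption for the example, not the schema of any actual explorer or of Bitcoin Core.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outputs (address TEXT, amount INTEGER, spent INTEGER DEFAULT 0)"
)
# The index is what keeps per-address lookups cheap as the table grows.
db.execute("CREATE INDEX outputs_by_address ON outputs (address)")
db.executemany(
    "INSERT INTO outputs (address, amount, spent) VALUES (?, ?, ?)",
    [
        ("addrA", 100000, 0),
        ("addrA", 50000, 0),
        ("addrA", 70000, 1),  # already spent, must not count
        ("addrB", 25000, 0),
    ],
)


def balance(address):
    """Brute-force balance: sum of unspent amounts for one address."""
    row = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM outputs WHERE address = ? AND spent = 0",
        (address,),
    ).fetchone()
    return row[0]


print(balance("addrA"), balance("addrB"), balance("unknown"))
```

A wallet balance would be the same query with `address IN (...)` or a join against the wallet's address table.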

Quote
But does "Real databases" include SQLite here? Does SQLite give any assurances in the case of sudden reboots?

Yes, it was built for that (and other issues)

Quote
At least with an append-only file (and regular syncing) you can be fairly sure that only record of your last activity before the reboot will be lost, and that not some management structure in the middle broke and made it impossible to parse the entire file.

SQLite was in part designed to compete with fopen(), to provide robustness vs crashes which a simple file does not provide, and a lot of people use it just because of that robustness.

Also simple append-only file means you would have to parse and load everything every time, which will be slow and inefficient.
staff
Activity: 4284
Merit: 8808
But yes, the wallet needs an import/export format that's not database specific.
It has one (importwallet / dumpwallet), and has for two years now.
legendary
Activity: 924
Merit: 1132
I have looked at it enough to know that there is nothing in the DB that would allow someone to spend your coins without knowing your passphrase.  So the "security" angle is not all that bad. 

But yes, the wallet needs an import/export format that's not database specific.
legendary
Activity: 2576
Merit: 2267
1RichyTrEwPYjZSeAYxeiFBNnKC9UjC5k
BDB is _only_ used for the wallet. It is not used for peers.dat, for fee_estimates, for indexes, etc.

Good to know.

Different BDB major versions are effectively different systems; very few projects are using the 6.x or likely to begin using it because Oracle changed the licensing. 4.x also still receives bug fixes, and it's quite difficult for there to be security bugs in software which is not exposed to the outside world. (Bitcoin can, in fact, also be used with later versions of BDB, but it's somewhat untested, and the file formats are not backwards compatible, so we don't recommend it).

True. But the major reason put forward for sticking with 4.8 is portable wallets which they really are not, any more than HTML best viewed with IE3.0 is.

I can understand the objection to 6.x based on the licensing (I am not that well versed on the changes but I did notice a few complaints while searching). I just think being wedded to an old version of software is not ideal and I suggest that a move away from it be started. The dumpwallet and importwallet may well (haven't looked into it yet) provide what should, going forward, be considered a portable wallet, and wallet.dat should come to be regarded as an internal instance of a wallet only.
staff
Activity: 4284
Merit: 8808
BDB is _only_ used for the wallet. It is not used for peers.dat, for fee_estimates, for indexes, etc.

Different BDB major versions are effectively different systems; very few projects are using the 6.x or likely to begin using it because Oracle changed the licensing. 4.x also still receives bug fixes, and it's quite difficult for there to be security bugs in software which is not exposed to the outside world. (Bitcoin can, in fact, also be used with later versions of BDB, but it's somewhat untested, and the file formats are not backwards compatible, so we don't recommend it).



legendary
Activity: 2576
Merit: 2267
1RichyTrEwPYjZSeAYxeiFBNnKC9UjC5k
Why not define a specific database for bitcoin to decrease the size of the blockchain?
I believe BerkeleyDB saves a lot of metadata for its own work/features that maybe we don't need at all.
BDB is _only_ used for the wallet. The blockchain itself is stored as raw blocks.

Is it not used also for indexing the blockchain and other things? I also see a peers.dat and fee_estimates.dat (granted that the latter is probably fairly recent)
legendary
Activity: 2576
Merit: 2267
1RichyTrEwPYjZSeAYxeiFBNnKC9UjC5k
FYI I've created an issue for this. Please help implement it.

https://github.com/bitcoin/bitcoin/issues/3971


I think it would be better to get away from requiring BDB4.8. The dumpwallet/importwallet appear to be the correct way forward. The database used should not be important (beyond minimum requirements).

I'd also note that in attempting to find information to feed into this argument, I went digging for information about 4.8 and end of life/support information and was unable to find anything much. Given that we're two versions on now (6.something), is it reasonable to be expecting much in the way of attention to 4.8? What if there are security bugs lurking?

I think it would be healthy to decouple core from any particular database in any case but surely this tying to a particular version is "not good".

hero member
Activity: 812
Merit: 1022
No Maps for These Territories
FYI I've created an issue for this. Please help implement it.

https://github.com/bitcoin/bitcoin/issues/3971
hero member
Activity: 812
Merit: 1022
No Maps for These Territories
Even if we believe that we need a full-on database for live wallets, we should at minimum save a serialized file (a CSV table with encrypted individual values, for example) that can be written by any version of bitcoin and read by any other.  Using that as the wallet file, we could upgrade or change the database back-end without breaking wallet compatibility. 
This already exists. See dumpwallet/importwallet RPCs in 0.9.0.