LevelDB reliability? | Bitcointalksearch.org

2112

legendary

Activity: 2128

Merit: 1074

Quote from: gmaxwell on March 17, 2016, 12:38:38 AM

Quote from: fluffypony on March 16, 2016, 07:19:42 PM

That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,

That's exactly how Core has been done for years.

Though we don't consider it acceptable to have 32-bit and 64-bit hosts fork with respect to each other, and so prefer to not take risks there!

This cannot be right. The Satoshi Bitcoin client always stored the blockchain as plain flat files. The database engines were used only for indexing into those files. The recent LevelDB-based backend worsened the situation by explicitly storing (also in plain files) some precomputed data needed to undo transaction confirmations in case of a reorganization.

The proper way to modularize the storage layers is to completely encapsulate all data and functions that access the blockchain, without crossing the abstraction layers inside the upper-level code.

I stress that "storage layers" need to be plural. The mempool is also a storage layer and also needs to be properly abstracted. In a proper, modular implementation the migration of transactions between those storage layers (for unconfirmed and confirmed transactions) will ideally require setting one field in the transaction attributes, e.g. bool confirmed;.
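To make the "one field" idea concrete, here is a minimal sketch, assuming a single store that holds both unconfirmed and confirmed transactions. The names StoredTx and TxStore are hypothetical and do not come from any existing client; the point is only that moving a transaction between the two logical layers becomes a flag update rather than a copy between separate subsystems.

```python
from dataclasses import dataclass


@dataclass
class StoredTx:
    txid: str
    raw: bytes
    confirmed: bool = False  # False: mempool view, True: included in a block


class TxStore:
    """Hypothetical single store for unconfirmed and confirmed transactions."""

    def __init__(self):
        self._txs = {}

    def add_unconfirmed(self, txid, raw):
        self._txs[txid] = StoredTx(txid, raw)

    def confirm(self, txid):
        # Migrating between the two logical layers is one field update.
        self._txs[txid].confirmed = True

    def unconfirm(self, txid):
        # A reorganization moves the transaction back to the mempool view.
        self._txs[txid].confirmed = False

    def mempool(self):
        return [t.txid for t in self._txs.values() if not t.confirmed]


store = TxStore()
store.add_unconfirmed("aa", b"\x01")
store.add_unconfirmed("bb", b"\x02")
store.confirm("aa")
print(store.mempool())  # ['bb']
```

A real implementation would also have to persist the flag atomically together with the block it belongs to, which this sketch ignores.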

hyc

member

Activity: 88

Merit: 16

Quote from: gmaxwell on March 17, 2016, 12:38:38 AM

Quote from: fluffypony on March 16, 2016, 07:19:42 PM

That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,

That's exactly how Core has been done for years.

Though we don't consider it acceptable to have 32-bit and 64-bit hosts fork with respect to each other, and so prefer to not take risks there!

Monero is using LMDB's VL32 support on 32bit systems, which gives it identical data format and capacity of 64bit builds. There's no difference between a 32bit or 64bit host in the blockchain.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: fluffypony on March 16, 2016, 07:19:42 PM

That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,

That's exactly how Core has been done for years.

Though we don't consider it acceptable to have 32-bit and 64-bit hosts fork with respect to each other, and so prefer to not take risks there!

smaxz

sr. member

Activity: 430

Merit: 253

VeganAcademy

so is there no way to get my backed up blockchains/wallets on 4.8.3 to work with current core?

i've managed to compile and link the archived datasets from oracle but core wont build past leveldb.. figured it was something i did wrong.

fluffypony

donator

Activity: 1274

Merit: 1060

GetMonero.org / MyMonero.com

Quote from: 2112 on March 13, 2016, 12:16:10 AM

Quote from: misterbigg on March 12, 2016, 11:52:47 PM

Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb

It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class, and then built out implementations for LMDB (default, preferred) and for BerkeleyDB (failover).

Of course, there exists a risk that a portion of the network may be forked off in an edge case where the implementations behave differently, but that's no different to a subset being forked off the network because they're running 32-bit whatever in a 64-bit world.
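As a rough illustration of that kind of abstraction, here is a minimal sketch, assuming a two-method interface. BlockchainDB, DictBackend and LogBackend are hypothetical names chosen for this example; they are not Monero's actual class or method names, and the two backends are in-memory stand-ins rather than LMDB or BerkeleyDB bindings.

```python
from abc import ABC, abstractmethod


class BlockchainDB(ABC):
    """Hypothetical storage interface; upper layers see only these methods."""

    @abstractmethod
    def put_block(self, block_hash: str, raw: bytes) -> None: ...

    @abstractmethod
    def get_block(self, block_hash: str) -> bytes: ...


class DictBackend(BlockchainDB):
    """Stand-in for one engine: keeps blocks in a plain dict."""

    def __init__(self):
        self._blocks = {}

    def put_block(self, block_hash, raw):
        self._blocks[block_hash] = raw

    def get_block(self, block_hash):
        return self._blocks[block_hash]


class LogBackend(BlockchainDB):
    """Stand-in for a second engine: append-only log plus an index."""

    def __init__(self):
        self._log = []
        self._index = {}

    def put_block(self, block_hash, raw):
        self._index[block_hash] = len(self._log)
        self._log.append(raw)

    def get_block(self, block_hash):
        return self._log[self._index[block_hash]]


def store_and_reload(db: BlockchainDB) -> bytes:
    # Caller code is identical regardless of which backend is plugged in.
    db.put_block("h1", b"block-one")
    return db.get_block("h1")


for backend in (DictBackend(), LogBackend()):
    assert store_and_reload(backend) == b"block-one"
```

The fork risk mentioned above corresponds to the two backends disagreeing on some input; running the same test suite against every backend, as in the final loop, is one way to narrow that gap.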

2112

legendary

Activity: 2128

Merit: 1074

Quote from: misterbigg on March 16, 2016, 10:42:00 AM

Quote from: 2112 on March 13, 2016, 12:16:10 AM

Quote from: misterbigg on March 12, 2016, 11:52:47 PM

Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb

It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

Of course. I did that for my first task at Ripple. You're telling me that Bitcoin Core still has not abstracted the storage layer?

How could I know what they really did? I feel I need to copy-paste something I wrote nearly 4 years ago under "[POLL] Multi-sig or scalability--which is more pressing?".

Quote from: 2112 on July 22, 2012, 02:34:52 PM

Quote from: Gavin Andresen on July 22, 2012, 10:22:10 AM

Gentle reminder to the other bitcoin developers: it is generally best not to feed trolls. Use the ignore button.

“If you sit in on a poker game and don’t see a sucker, get up. You’re the sucker.”

http://quoteinvestigator.com/2011/07/09/poker-patsy/

Anyway, here's the example of how open source poker is being played.

A whale of a player spends big bucks developing a secret database engine. After 5 years this player takes one of the earliest branches and open sources it. Let's call that branch LevelDB. Suckers jump on it, spend money and energy to develop some basic tools like statistics gathering and query optimization. Then the whale brings the cold deck to the table, which gives him an instant 5 years of lead time. It looks like this:

Quote from: etotheipi on July 10, 2012, 10:11:27 PM

Then, a month later, the main devs decided to switch to compressed public keys which requires a whole new wallet format for Armory. I was crushed.

I'm no Google insider or anything like that. But I used to know the people who played with the current Googlers, and I'm broadly familiar with the level of skill involved.

Before you spend too much time at the keyboard and mouse, please do see David Mamet's old movie "House of Games". Remember the line:

“it was only business … nothing personal.”

misterbigg

legendary

Activity: 1064

Merit: 1001

Quote from: 2112 on March 13, 2016, 12:16:10 AM

Quote from: misterbigg on March 12, 2016, 11:52:47 PM

Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb

It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

Of course. I did that for my first task at Ripple. You're telling me that Bitcoin Core still has not abstracted the storage layer?

watashi-kokoto

sr. member

Activity: 690

Merit: 269

Quote from: 2112 on March 12, 2016, 11:29:33 PM

People like Gregory Maxwell, Mike Hearn or Peter Todd are getting paid to pretend to not understand.

Let's not discuss what they pretend, or criticize some hobby project.
Could you please elaborate on what major problems for Bitcoin could be caused by devs failing to understand the utxo db, blockchain spool storage or other aspects?

2112

legendary

Activity: 2128

Merit: 1074

Quote from: misterbigg on March 12, 2016, 11:52:47 PM

Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb

It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

misterbigg

legendary

Activity: 1064

Merit: 1001

Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb

2112

legendary

Activity: 2128

Merit: 1074

Quote from: jl777 on March 10, 2016, 08:33:30 PM

However, memory mapped files share a lot of the same advantages you list:

1) independence of logical data from physical storage - yes
2) sharing of the dataset between tasks and machines - yes (you do need both endian forms)
3) rapid integrity verification - yes, even faster as once verified, no need to verify again
4) optimization of storage method to match the access patterns - yes that is exactly what has been done
5) maintenance of transactional integrity with related datasets and processes - yes
6) fractional/incremental backup/restore while accessing software is online - yes

7) support for ad-hoc queries without the need to write software - no
8) ease of integration with new or unrelated software packages - no
9) compliance with accounting and auditing standards - not sure
10) easier gathering of statistics about access patterns - not without custom code

So it depends on if 7 to 10 trump the benefits and if the resources are available to get it working

James

I really don't want to be discouraging to you. You do very creative and innovative work, but you know less than zero about databases, and it is going to hamper you. You write like an autodidact or maybe you went to some really bad school.

1) no, mmap()-ed files as you use them don't provide independence from the physical storage layout. The layout is fixed in your code and in your proposed filesystem usage. You can't even do simple tasks like adding disk volumes with more free space while online.

2) it doesn't seem like your code does any locking, so currently you can't even do sharing on a single host. Shared mmap() over NFS (or MapViewOfFile() over SMB) is only for desperadoes.

3) here you slip from zero knowledge to willful, aggressive ignorance. Marking files read-only and using SquashFS is not storage integrity verification. Even amateur non-programmers, like musicians or video producers are on average more aware of the need to verify integrity.

4) unfortunately you've got stuck in a rut of exclusively testing the initial sync. This is a very unrepresentative access pattern. Even non-programmers in another thread are aware of this issue. This is the reason why professional database systems have separate tools for initial loading (like SQL*Loader for Oracle or BULK INSERT in Microsoft SQL Server).

5) I don't believe you know what you're talking about. I'm pretty sure that you've never heard of https://en.wikipedia.org/wiki/Two-phase_commit_protocol and couldn't name any https://en.wikipedia.org/wiki/Transaction_processing_system to save your life.

6) I couldn't find any trace of support for locking or incremental backup/restore in your code. Personally, you look like somebody who rarely backs up, even less often restores and verifies. Even amateur, but experienced non-programmers seem to be more aware of the live-backup issues.

So not 7/10 or even 6/10. It is 0/10 with big minus for getting item 3) so badly wrong.

Again, I don't want to be discouraging to your programming project, although I don't fully comprehend it. Just please don't write about something you don't understand.

People like Gregory Maxwell, Mike Hearn or Peter Todd are getting paid to pretend to not understand. Old quote:

Quote from: Upton Sinclair

It is difficult to get a man to understand something, when his salary depends upon his not understanding it!

watashi-kokoto

sr. member

Activity: 690

Merit: 269

Quote from: TierNolan on March 12, 2016, 12:40:29 PM

All blocks are stored ....

Really interesting info, thanks a lot

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: jl777 on March 10, 2016, 08:28:26 PM

OK, so what is the DB used for? Will everything still work without the DB?

All blocks are stored in the same format in which they are received, as append-only files. There are undo (reverse) files which are stored separately (and they are append-only files too).

You can take the UTXO set as it is for block 400,000 and then apply the undo file for block 400,000 and you get the state for block 399,999.

This means that the UTXO set can be stepped back during a chain reorg. You use the undo (reverse) files to step back until you hit the fork and then move forward along the new fork.

The database stores the UTXO set that matches the chaintip at any given time. It is only ever changed to add/remove a block atomically.

Each new block atomically updates the UTXO set.
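The step-back mechanism described above can be sketched as follows. This is a simplified model, not the format of Core's rev*.dat files: the dict-based undo record, the function names, and the restriction that a block never spends an output it created itself are all assumptions made for the example.

```python
def connect_block(utxos, block):
    """Apply a block to the UTXO set and return the undo record for it."""
    undo = {"spent": {}, "created": []}
    for outpoint in block["spends"]:
        # Remember what was removed so it can be restored later.
        undo["spent"][outpoint] = utxos.pop(outpoint)
    for outpoint, value in block["creates"].items():
        utxos[outpoint] = value
        undo["created"].append(outpoint)
    return undo


def disconnect_block(utxos, undo):
    """Reverse connect_block using only the undo record."""
    for outpoint in undo["created"]:
        del utxos[outpoint]
    utxos.update(undo["spent"])


# State at height N-1: one unspent output worth 50.
utxos = {("tx0", 0): 50}
before = dict(utxos)

block_n = {
    "spends": [("tx0", 0)],
    "creates": {("tx1", 0): 30, ("tx1", 1): 20},
}
undo_n = connect_block(utxos, block_n)      # state for height N
disconnect_block(utxos, undo_n)             # back to height N-1
assert utxos == before
```

During a reorg the same disconnect step is repeated block by block until the fork point is reached, after which the blocks of the new branch are connected in order.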

Block-hashes are mapped to (type 'b' record):

previous hash
block height
which file it is stored in
file offset for the block in block*****.dat
file offset for undo data in rev*****.dat
block version
merkle root
timestamp
target (bits)
nonce
status (Headers validated, block received, block validated etc)
transaction count

Each key is prefixed with a code that indicates the record type, and all records are stored in the same database.

DB_COINS = 'c';
DB_BLOCK_FILES = 'f';
DB_TXINDEX = 't';
DB_BLOCK_INDEX = 'b';

c: Maps txid to unspent outputs for that transaction
f: Maps file index to info about that file
t: Maps txid to the location of the transaction (file number + offset)
b: Maps block hash to the block header info (see above)

The 't' field doesn't have all the transactions unless txindex is enabled.

These are single record only fields (I think):

DB_BEST_BLOCK = 'B';
DB_FLAG = 'F';
DB_REINDEX_FLAG = 'R';
DB_LAST_BLOCK = 'l';
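A minimal sketch of how one key-value database can hold several record types by prefixing keys. The prefix letters are the ones listed above; make_key, scan, and the dict standing in for the database are hypothetical, and the real key serialization in Core is more involved than simple concatenation.

```python
DB_COINS = b"c"
DB_BLOCK_INDEX = b"b"
DB_BEST_BLOCK = b"B"


def make_key(prefix: bytes, ident: bytes) -> bytes:
    """Build a database key: one-byte record type followed by the identifier."""
    return prefix + ident


def scan(db: dict, prefix: bytes):
    """Return (ident, value) pairs for one record type, in key order."""
    return [
        (key[len(prefix):], value)
        for key, value in sorted(db.items())
        if key.startswith(prefix)
    ]


db = {}  # a plain dict stands in for the key-value store
txid = bytes.fromhex("aa" * 32)
block_hash = bytes.fromhex("bb" * 32)

db[make_key(DB_COINS, txid)] = b"unspent outputs of this tx"
db[make_key(DB_BLOCK_INDEX, block_hash)] = b"header info and file offsets"
db[DB_BEST_BLOCK] = block_hash  # single-record field: the key is just the prefix

assert scan(db, DB_COINS) == [(txid, b"unspent outputs of this tx")]
```

Because the prefix is the first byte of the key, an ordered store keeps all records of one type adjacent, so iterating over one type is a range scan.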

hyc

member

Activity: 88

Merit: 16

Quote from: gmaxwell on March 10, 2016, 08:17:40 PM

LevelDB needs a "filesystem interface layer". It doesn't come with one for windows; when leveldb is used inside Chrome it uses special chrome specific APIs to talk to the file system. A contributor provided a windows layer for Bitcoin which is what allowed Bitcoin to use leveldb in the first place.

This windows filesystem interface layer was incorrect: it failed to flush to disk at all the points which it should. It was fixed rapidly as soon as someone brought reproduction instructions to Wladimir and he reproduced it. There was much faffing about replacing it, mostly by people who don't contribute often to core-- in my view this was an example of bad cargo-cult "engineering" where instead of actual engineering people pattern-match buzzwords and glue black boxes together: "I HURD YOU NEED A DATABASE. SOMEONE ONCE TOLD ME THAT MYCROSAFT SEQUAL IS A GREAT DATABASE. IT HAS WEBSCALE". When the actual system engineers got engaged, the problem was promptly fixed.

This seems to ignore the large number of non-Windows-related corruption occurrences.

Quote

This is especially irritating because leveldb is not a generic relational database, it is a highly specialized transactional key/value store. Leveldb is much more like an efficient disk-backed MAP implementation than it is like anything you would normally call a database. Most other "database" systems people suggest are not within three orders of magnitude in performance for our specific very narrow use case. The obvious alternatives-- like LMDB have other limitations (in particular LMDB must mmap the files, which basically precludes using it on 32 bit systems-- a shame because I like LMDB a lot for the same niche leveldb covers; leveldb also has extensive corruption detection, important for us because we do not want to incorrectly reject the chain due to filesystem corruption).

I think it's more likely that Bitcoin Core would eventually move to a custom data structure than to another "database" (maybe a swap to LMDB if they ever support non-mmap operations... maybe); as doing so would basically be a requirement for performance utxo set commitments.

LevelDB is not a transactional data store, it doesn't support full ACID semantics. It lacks Isolation, primarily, and its Atomicity features aren't actually reliable. No storage system that relies on multiple files for storage can offer true Atomicity - the same applied to BerkeleyDB too.

Meanwhile, despite relying on mmap, LMDB works perfectly well on 32 bit systems. And unlike LevelDB, LMDB is fully supported on Windows. (Also unlike LevelDB, LMDB is *fully supported* - LevelDB isn't actively maintained any more.)

Quote

A large number of these corruption reports were also being caused by anti-virus software randomly _deleting_ files out from under Bitcoin Core. It turns out that there are virus "signatures" that are as short as 16 bytes long... and AV programs avoid deleting random files all over the user's system through a set of crazy heuristics like extension matching which failed to preclude the Bitcoin information (though I'm sure actual viruses have no problem abusing these heuristics to escape detection). Core implemented a whitening scheme that obfuscates the stored state in order to avoid these problems or any other potential for hostile blockchain data to interact with weird filesystem or storage bugs.
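A whitening scheme of the kind described can be sketched as XORing every stored value with a random key kept alongside the database, so that attacker-chosen blockchain bytes never appear verbatim on disk. This is a sketch under stated assumptions: the function name, the 8-byte key length, and the sample value are illustrative and are not taken from Core's source.

```python
import os
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key; applying it twice restores data."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


obfuscation_key = os.urandom(8)  # generated once per database (length is an assumption)
value = b"serialized chainstate record"

on_disk = xor_with_key(value, obfuscation_key)            # what gets written
assert xor_with_key(on_disk, obfuscation_key) == value    # what gets read back
```

This is not encryption, since the key sits next to the data. The only goal is that byte patterns chosen by someone else do not land unmodified in the files a scanner inspects.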

Right now it's very hard to corrupt the chainstate on Windows in Bitcoin Core 0.12+. There still may be some corner case bugs but they're now rare enough that they're hard to distinguish from broken hardware/bad drivers that inappropriately write cache or otherwise corrupt data-- issues which no sane key value store could really deal with. If you're able to reproduce corruption like that, I'd very much like to hear from you.

We've suffered a bit, as many other Open Source projects do -- in that comparatively few skilled open source developers use Windows (and, importantly, few _continue_ to use windows once they're hanging out with Linux/BSD users; if nothing else they end up moving to Mac)-- so we're extra dependent on _good_ trouble reports from Windows users whenever there is a problem which is Windows specific...

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: hhanh00 on March 11, 2016, 09:30:27 PM

Quote from: jl777 on March 11, 2016, 08:37:26 AM

Quote from: hhanh00 on March 11, 2016, 05:58:31 AM

Quote from: jl777 on March 10, 2016, 04:10:42 PM

Quote from: achow101 on March 10, 2016, 03:15:13 PM

LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

The UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structures used by a block explorer, which aren't necessarily optimal for a bitcoin node.
In particular, you use a 6 byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?

iguana is a bitcoin node that happens to update a block-explorer-level dataset.
The data structures are optimized for parallel access, so multicore searches can be used.
However, even with a single core searching linearly (backwards), it is quite fast to find any txid.

Each bundle of 2000 files has a hardcoded hash table for all the txids in it, so it is a matter of doing a hash lookup in each bundle until it is found. I don't have timings for fully processing a full block yet, but I don't expect it would take more than a few seconds to update all vins and vouts.

Since txids are already high entropy, there is no need to do an additional hash, so I XOR all four 64-bit long ints of the txid together to create an index into an open hash table, which is created to be never more than half full, so it will find any match in very few iterations. Since everything is memory mapped, after the initial access to swap it in, each search will take less than a microsecond.
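The lookup described above can be sketched like this: fold the 32-byte txid into one 64-bit word by XOR, reduce it to a slot, and probe linearly in a table sized so it never gets more than half full. This is not iguana's code. The class and function names, the little-endian word order, and the use of a Python list instead of a memory-mapped file are assumptions for the example.

```python
import hashlib
import struct


def txid_slot(txid: bytes, table_size: int) -> int:
    """XOR the four 64-bit words of a 32-byte txid and map the result to a slot."""
    a, b, c, d = struct.unpack("<4Q", txid)
    return (a ^ b ^ c ^ d) % table_size


class TxidTable:
    """Open-addressing table with linear probing, kept at most half full."""

    def __init__(self, expected_items: int):
        self.size = max(2, 2 * expected_items)
        self.slots = [None] * self.size
        self.count = 0

    def insert(self, txid: bytes, value):
        i = txid_slot(txid, self.size)
        while self.slots[i] is not None and self.slots[i][0] != txid:
            i = (i + 1) % self.size
        if self.slots[i] is None:
            if 2 * (self.count + 1) > self.size:
                raise OverflowError("table would become more than half full")
            self.count += 1
        self.slots[i] = (txid, value)

    def lookup(self, txid: bytes):
        i = txid_slot(txid, self.size)
        while self.slots[i] is not None:  # an empty slot ends the probe
            if self.slots[i][0] == txid:
                return self.slots[i][1]
            i = (i + 1) % self.size
        return None


# Stand-in txids: any 32 high-entropy bytes will do for the demonstration.
txids = [hashlib.sha256(bytes([n])).digest() for n in range(100)]
table = TxidTable(expected_items=len(txids))
for position, txid in enumerate(txids):
    table.insert(txid, position)

assert table.lookup(txids[42]) == 42
assert table.lookup(hashlib.sha256(b"unknown").digest()) is None
```

Keeping the load factor at or below one half guarantees that a probe always reaches an empty slot, which is why lookups for missing txids terminate and probe sequences stay short on average.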

hhanh00

sr. member

Activity: 467

Merit: 267

Quote from: jl777 on March 11, 2016, 08:37:26 AM

Quote from: hhanh00 on March 11, 2016, 05:58:31 AM

Quote from: jl777 on March 10, 2016, 04:10:42 PM

Quote from: achow101 on March 10, 2016, 03:15:13 PM

LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

The UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structures used by a block explorer, which aren't necessarily optimal for a bitcoin node.
In particular, you use a 6 byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?

kushti

full member

Activity: 317

Merit: 103

Quote from: kushti on March 11, 2016, 06:13:52 AM

Blockchain systems probably need versioned immutable databases with the possibility of rollback and efficient background cleanup of old versions. There are no known implementations around, though.

Well, in fact we have a half-done solution for that, used in our open modular blockchain framework Scorex ( https://github.com/ScorexProject/Scorex ). We can externalize it and make a pull request to Bitcoinj if some Java dev would like to help with the Java part.

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: hhanh00 on March 11, 2016, 05:58:31 AM

Quote from: jl777 on March 10, 2016, 04:10:42 PM

Quote from: achow101 on March 10, 2016, 03:15:13 PM

LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

The UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.
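The bitmap idea can be sketched as follows, assuming every output ever created has a global index and each bundle contributes a read-only bitmap of the outputs it spends. The function names and the use of Python integers as bitmaps are choices made for this example, not a description of iguana's file format.

```python
def bundle_spend_bitmap(num_outputs: int, spent_indexes) -> int:
    """Bitmap (as an int) with bit i set if this bundle spends output i."""
    bitmap = 0
    for i in spent_indexes:
        if not 0 <= i < num_outputs:
            raise IndexError(f"output index {i} out of range")
        bitmap |= 1 << i
    return bitmap


def combine(bitmaps) -> int:
    """OR the per-bundle overlays together to get the current spent state."""
    total = 0
    for bitmap in bitmaps:
        total |= bitmap
    return total


def is_unspent(spent_bitmap: int, index: int) -> bool:
    """A UTXO check is a single bit test once the vout index is known."""
    return (spent_bitmap >> index) & 1 == 0


# Eight outputs exist in total; two bundles spend some of them.
overlay_a = bundle_spend_bitmap(8, [0, 3])
overlay_b = bundle_spend_bitmap(8, [5])
spent = combine([overlay_a, overlay_b])

assert is_unspent(spent, 1)
assert not is_unspent(spent, 3)
```

Since OR is commutative and associative, the overlays can be computed in parallel and merged in any order, which is what makes the per-bundle files read only while only the combined bitmap stays volatile.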

James

kushti

full member

Activity: 317

Merit: 103

LevelDB is surely about weak consistency for performance's sake, as are most NoSQL databases.

Blockchain systems probably need versioned immutable databases with the possibility of rollback and efficient background cleanup of old versions. There are no known implementations around, though.

hhanh00

sr. member

Activity: 467

Merit: 267

Quote from: jl777 on March 10, 2016, 04:10:42 PM

Quote from: achow101 on March 10, 2016, 03:15:13 PM

LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?
