
Topic: LevelDB reliability? (Read 4420 times)

legendary
Activity: 2128
Merit: 1073
April 03, 2016, 09:21:08 PM
#30
That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,
That's exactly how core has been done for years.

Though we don't consider it acceptable to have 32bit and 64 bit hosts fork with respect to each other, and so prefer to not take risks there!
This cannot be right. The Satoshi Bitcoin client always stored the blockchain as plain flat files. The database engines were used only for indexing into those files. The recent LevelDB-based backend worsened the situation by also explicitly storing (again in plain files) some precomputed data to undo transaction confirmations in case of a reorganization.

The proper way to modularize the storage layers is to completely encapsulate all data and functions used to access the blockchain, without crossing the abstraction layers inside the upper-level code.

I stress that "storage layers" need to be plural. The mempool is also a storage layer and also needs to be properly abstracted. In a proper, modular implementation the migration of transactions between those storage layers (for unconfirmed and confirmed transactions) will ideally require setting one field in the transaction attributes, e.g. bool confirmed;.
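As a minimal sketch of that kind of encapsulation (hypothetical names throughout; an illustration, not Monero's or Core's actual code), the upper-level code would only ever see an abstract interface, and confirmation status would be a single attribute on the transaction record:

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical transaction record; "confirmed" is the single flag that
// marks migration between the mempool layer and the confirmed-chain layer.
struct TxRecord {
    std::array<std::uint8_t, 32> txid{};
    std::vector<std::uint8_t> raw;
    bool confirmed = false;
};

// Abstract storage layer: upper-level code talks only to this interface,
// never to LevelDB/LMDB/flat-file specifics.
class TxStorage {
public:
    virtual ~TxStorage() = default;
    virtual void put(const TxRecord& tx) = 0;
    virtual const TxRecord* get(const std::array<std::uint8_t, 32>& txid) const = 0;
};

// Toy in-memory backend standing in for a real engine.
class MemStorage : public TxStorage {
public:
    void put(const TxRecord& tx) override { store_[tx.txid] = tx; }
    const TxRecord* get(const std::array<std::uint8_t, 32>& txid) const override {
        auto it = store_.find(txid);
        return it == store_.end() ? nullptr : &it->second;
    }
private:
    std::map<std::array<std::uint8_t, 32>, TxRecord> store_;
};
```

"Confirming" a transaction then means flipping one field and re-putting it, regardless of which engine backs the layer.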
hyc
member
Activity: 88
Merit: 16
March 17, 2016, 11:57:22 AM
#29
That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,
That's exactly how core has been done for years.

Though we don't consider it acceptable to have 32bit and 64 bit hosts fork with respect to each other, and so prefer to not take risks there!

Monero is using LMDB's VL32 support on 32bit systems, which gives it identical data format and capacity of 64bit builds. There's no difference between a 32bit or 64bit host in the blockchain.
staff
Activity: 4284
Merit: 8808
March 17, 2016, 12:38:38 AM
#28
That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class,
That's exactly how core has been done for years.

Though we don't consider it acceptable to have 32bit and 64 bit hosts fork with respect to each other, and so prefer to not take risks there!
sr. member
Activity: 430
Merit: 253
VeganAcademy
March 16, 2016, 10:52:57 PM
#27
so is there no way to get my backed up blockchains/wallets on 4.8.3 to work with current core?

i've managed to compile and link the archived datasets from oracle but core won't build past leveldb... figured it was something i did wrong.
donator
Activity: 1274
Merit: 1060
GetMonero.org / MyMonero.com
March 16, 2016, 07:19:42 PM
#26
Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb
It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.


That's precisely what we did with Monero. We abstracted our blockchain access subsystem out into a generic blockchainDB class, and then build out implementations for LMDB (default, preferred) and for BerkeleyDB (failover).

Of course, there exists a risk that a portion of the network may be forked off in an edge case between implementations, but that's no different from a subset being forked off the network because they're running 32-bit whatever in a 64-bit world.
legendary
Activity: 2128
Merit: 1073
March 16, 2016, 11:35:51 AM
#25
Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb
It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

Of course. I did that for my first task at Ripple. You're telling me that Bitcoin Core still has not abstracted the storage layer?

How could I know what they really did? I feel I need to copy-paste something I wrote nearly 4 years ago under "[POLL] Multi-sig or scalability--which is more pressing?".

Gentle reminder to the other bitcoin developers: it is generally best not to feed trolls.  Use the ignore button.
“If you sit in on a poker game and don’t see a sucker, get up. You’re the sucker.”

http://quoteinvestigator.com/2011/07/09/poker-patsy/

Anyway, here's an example of how open-source poker is played.

A whale of a player spends big bucks developing a secret database engine. After 5 years this player takes one of the earliest branches and open-sources it. Let's call that branch LevelDB. Suckers jump on it, spending money and energy to develop some basic tools like statistics gathering and query optimization. Then the whale brings the cold deck to the table, which gives him an instant 5 years of lead time. It looks like this:

Then, a month later, the main devs decided to switch to compressed public keys which requires a whole new wallet format for Armory.  I was crushed.

I'm no Google insider or anything like that. But I used to know the people who played with the current Googlers, and I'm broadly familiar with the level of skill involved.

Before you spend too much time at the keyboard and mouse, please do see David Mamet's old movie "House of Games". Remember the line:

“it was only business … nothing personal.”

legendary
Activity: 1064
Merit: 1001
March 16, 2016, 10:42:00 AM
#24
Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb
It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.

Of course. I did that for my first task at Ripple. You're telling me that Bitcoin Core still has not abstracted the storage layer?
sr. member
Activity: 687
Merit: 269
March 14, 2016, 03:29:55 PM
#23
People like Gregory Maxwell, Mike Hearn or Peter Todd are getting paid to pretend to not understand.

Let's not discuss what they pretend, or criticize some hobby project.
Could you please elaborate: what major problems for Bitcoin could be caused by devs failing to understand the utxo db, blockchain spool storage, or other aspects?
legendary
Activity: 2128
Merit: 1073
March 13, 2016, 12:16:10 AM
#22
Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb
It doesn't matter. The real answer is simple: the storage layers (plural!) need to be abstracted. No single engine could meet all needs.
legendary
Activity: 1064
Merit: 1001
March 12, 2016, 11:52:47 PM
#21
Could this be of use? It is designed for high performance and reliability on SSDs
https://github.com/ripple/rippled/tree/develop/src/beast/beast/nudb
legendary
Activity: 2128
Merit: 1073
March 12, 2016, 11:29:33 PM
#20
However, memory mapped files share a lot of the same advantages you list:

1) independence of logical data from physical storage - yes
2) sharing of the dataset between tasks and machines - yes (you do need both endian forms)
3) rapid integrity verification - yes, even faster as once verified, no need to verify again
4) optimization of storage method to match the access patterns - yes that is exactly what has been done
5) maintenance of transactional integrity with related datasets and processes - yes
6) fractional/incremental backup/restore while accessing software is online  - yes

7) support for ad-hoc queries without the need to write software - no
8) ease of integration with new or unrelated software packages - no
9) compliance with accounting and auditing standards - not sure
10) easier gathering of statistics about access patterns - not without custom code

So it depends on if 7 to 10 trump the benefits and if the resources are available to get it working

James
I really don't want to be discouraging to you. You do very creative and innovative work, but you know less than zero about databases, and it is going to hamper you. You write like an autodidact or maybe you went to some really bad school.

1) No. mmap()-ed files as you use them don't provide independence from the physical storage layout. The layout is fixed in your code and in your proposed filesystem usage. It can't even handle simple tasks like adding additional disk volumes with more free space while online.

2) it doesn't seem like your code does any locking, so currently you can't even do sharing on a single host. Shared mmap() over NFS (or MapViewOfFile() over SMB) is only for desperadoes.

3) Here you slip from zero knowledge into willful, aggressive ignorance. Marking files read-only and using SquashFS is not storage integrity verification. Even amateur non-programmers, like musicians or video producers, are on average more aware of the need to verify integrity.

4) Unfortunately you've gotten stuck in a rut of exclusively testing the initial sync. This is a very non-representative access pattern. Even non-programmers in the other thread are aware of this issue. This is the reason why professional database systems have separate tools for initial loading (like SQL*Loader for Oracle or BULK INSERT in standard SQL).

5) I don't believe you know what you're talking about. I'm pretty sure that you've never heard of https://en.wikipedia.org/wiki/Two-phase_commit_protocol and couldn't name any https://en.wikipedia.org/wiki/Transaction_processing_system to save your life.

6) I couldn't find any trace of support for locking or incremental backup/restore in your code. Personally, you look like somebody who rarely backs up, even less often restores and verifies. Even amateur, but experienced non-programmers seem to be more aware of the live-backup issues.

So not 7/10 or even 6/10. It is 0/10 with big minus for getting item 3) so badly wrong.

Again, I don't want to be discouraging about your programming project, although I don't fully comprehend it. Just please don't write about something you have no understanding of.

People like Gregory Maxwell, Mike Hearn or Peter Todd are getting paid to pretend to not understand. Old quote:

Quote from: Upton Sinclair
It is difficult to get a man to understand something, when his salary depends upon his not understanding it!
sr. member
Activity: 687
Merit: 269
March 12, 2016, 01:19:52 PM
#19
All blocks are stored ....

Really interesting info, thanks a lot
legendary
Activity: 1232
Merit: 1094
March 12, 2016, 12:40:29 PM
#18
OK, so what is the DB used for? Will everything still work without the DB?

All blocks are stored in the same format that they are received, as append-only files.  There are undo (reverse) files which are stored separately (and they are append-only files too).

You can take the UTXO set as it is for block 400,000 and then apply the undo file for block 400,000 and you get the state for block 399,999.

This means that the UTXO set can be stepped back during a chain reorg.  You use the undo (reverse) files to step back until you hit the fork and then move forward along the new fork.

The database stores the UTXO set that matches the chaintip at any given time.  It is only ever changed to add/remove a block atomically.

Each new block atomically updates the UTXO set.
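The connect/undo stepping described above can be sketched with a toy in-memory UTXO map (hypothetical helper names; in Core the undo data is serialized into the rev*****.dat files rather than held in memory):

```cpp
#include <map>
#include <set>
#include <string>

// Toy UTXO set keyed by "txid:vout" strings. A block's undo record holds
// enough data to reverse the block: which outputs it created, and the
// full entries of the outputs it spent.
using Utxo = std::map<std::string, long>;  // outpoint -> value

struct UndoRecord {
    std::set<std::string> created;  // outpoints the block added
    Utxo spent;                     // outpoints the block removed, with values
};

// Connect: remove spent outputs, add created ones, remember how to undo.
UndoRecord connect_block(Utxo& utxo,
                         const Utxo& new_outputs,
                         const std::set<std::string>& spends) {
    UndoRecord undo;
    for (const auto& op : spends) {
        undo.spent[op] = utxo.at(op);
        utxo.erase(op);
    }
    for (const auto& [op, v] : new_outputs) {
        utxo[op] = v;
        undo.created.insert(op);
    }
    return undo;
}

// Disconnect: exactly reverse the connect, restoring the previous state.
void disconnect_block(Utxo& utxo, const UndoRecord& undo) {
    for (const auto& op : undo.created) utxo.erase(op);
    for (const auto& [op, v] : undo.spent) utxo[op] = v;
}
```

Stepping from block 400,000 back to 399,999 is one `disconnect_block` call; a reorg is a series of disconnects back to the fork point, then connects along the new branch.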

Block-hashes are mapped to (type 'b' record):
  • previous hash
  • block height
  • which file it is stored in
  • file offset for the block in block*****.dat
  • file offset for undo data in rev*****.dat
  • block version
  • merkle root
  • timestamp
  • target (bits)
  • nonce
  • status (Headers validated, block received, block validated etc)
  • transaction count

Each key is prefixed with a code indicating the record type, and all records are stored in the same database.

DB_COINS = 'c';
DB_BLOCK_FILES = 'f';
DB_TXINDEX = 't';
DB_BLOCK_INDEX = 'b';

c: Maps txid to unspent outputs for that transaction
f: Maps file index to info about that file
t: Maps txid to the location of the transaction (file number + offset)
b: Maps block hash to the block header info (see above)

The 't' field doesn't have all the transactions unless txindex is enabled.

These are single record only fields (I think):

DB_BEST_BLOCK = 'B';
DB_FLAG = 'F';
DB_REINDEX_FLAG = 'R';
DB_LAST_BLOCK = 'l';
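The type-code prefixing can be illustrated with an ordinary ordered map standing in for LevelDB (`make_key` is a hypothetical helper, not Core's actual code):

```cpp
#include <map>
#include <string>

// Record-type codes from the post. A composite key is the one-byte type
// code followed by the record's own key, so heterogeneous records share a
// single ordered key/value store while staying grouped by type.
constexpr char DB_COINS = 'c';
constexpr char DB_BLOCK_FILES = 'f';
constexpr char DB_TXINDEX = 't';
constexpr char DB_BLOCK_INDEX = 'b';

std::string make_key(char type, const std::string& inner) {
    return std::string(1, type) + inner;
}
```

Because the store is ordered, all records of one type sort together, so a range scan over one prefix walks exactly one record type.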
hyc
member
Activity: 88
Merit: 16
March 12, 2016, 04:22:36 AM
#17
LevelDB needs a "filesystem interface layer". It doesn't come with one for windows; when leveldb is used inside Chrome it uses special chrome specific APIs to talk to the file system.  A contributor provided a windows layer for Bitcoin which is what allowed Bitcoin to use leveldb in the first place.

This windows filesystem interface layer was incorrect: it failed to flush to disk at all the points which it should. It was fixed rapidly as soon as someone brought reproduction instructions to Wladimir and he reproduced it.   There was much faffing about replacing it, mostly by people who don't contribute often to core-- in my view this was an example of bad cargo-cult "engineering" where instead of actual engineering people pattern-match buzzwords and glue black boxes together: "I HURD YOU NEED A DATABASE. SOMEONE ONCE TOLD ME THAT MYCROSAFT SEQUAL IS A GREAT DATABASE. IT HAS WEBSCALE". When the actual system engineers got engaged, the problem was promptly fixed.

This seems to ignore the large number of non-Windows-related corruption occurrences.

Quote
This is especially irritating because leveldb is not a generic relational database, it is a highly specialized transactional key/value store. Leveldb is much more like an efficient disk-backed MAP implementation than it is like anything you would normally call a database. Most other "database" systems people suggest are not within three orders of magnitude in performance for our specific very narrow use case. The obvious alternatives-- like LMDB have other limitations (in particular LMDB must mmap the files, which basically precludes using it on 32 bit systems-- a shame because I like LMDB a lot for the same niche leveldb covers; leveldb also has extensive corruption detection, important for us because we do not want to incorrectly reject the chain due to filesystem corruption).

I think it's more likely that Bitcoin Core would eventually move to a custom data structure than to another "database" (maybe a swap to LMDB if they ever support non-mmap operations... maybe); as doing so would basically be a requirement for performance utxo set commitments.

LevelDB is not a transactional data store; it doesn't support full ACID semantics. It lacks Isolation, primarily, and its Atomicity features aren't actually reliable. No storage system that relies on multiple files for storage can offer true Atomicity - the same applies to BerkeleyDB too.

Meanwhile, despite relying on mmap, LMDB works perfectly well on 32 bit systems. And unlike LevelDB, LMDB is fully supported on Windows. (Also unlike LevelDB, LMDB is *fully supported* - LevelDB isn't actively maintained any more.)

Quote
A large number of these corruption reports were also being caused by anti-virus software randomly _deleting_ files out from under Bitcoin Core. It turns out that there are virus "signatures" that are as short as 16 bytes long... and AV programs avoid deleting random files all over the users system through a set of crazy heuristics like extension matching which failed to preclude the Bitcoin information (though I'm sure actual viruses have no problem abusing these heuristics to escape detection). Core implemented a whitening scheme that obfuscate the stored state in order to avoid these problems or any other potential for hostile blockchain data to interact with weird filesystem or storage bugs.

Right now it's very hard to corrupt the chainstate on Windows in Bitcoin Core 0.12+. There still may be some corner case bugs but they're now rare enough that they're hard to distinguish from broken hardware/bad drivers that inappropriately write cache or otherwise corrupt data-- issues which no sane key value store could really deal with. If you're able to reproduce corruption like that, I'd very much like to hear from you.

We've suffered a bit, as many other Open Source projects do -- in that comparatively few skilled open source developers use Windows (and, importantly, few _continue_ to use windows once they're hanging out with Linux/BSD users; if nothing else they end up moving to Mac)-- so we're extra dependent on _good_ trouble reports from Windows users whenever there is a problem which is Windows specific...
legendary
Activity: 1176
Merit: 1134
March 11, 2016, 10:21:32 PM
#16
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter to find the index of the vout and check a bit.

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structure used by a block explorer which aren't necessarily optimal for a bitcoin node.
In particular, you use a 6 byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?


iguana is a bitcoin node that happens to update a block-explorer-level dataset.
The data structures are optimized for parallel access, so multicore searches can be used.
However, even with a single core searching linearly (backwards), it is quite fast to find any txid.

Each bundle of 2000 files has a hardcoded hash table for all the txid's in it, so it is a matter of doing a hash lookup until it is found. I don't have timings for fully processing a full block yet, but I don't expect it would take more than a few seconds to update all vins and vouts.

Since txid's are already high entropy, there is no need to do an additional hash, so I XOR all 4 64-bit long ints of the txid together to create an index into an open hash table, which is created to be never more than half full, so it will find any match in very few iterations. Since everything is memory mapped, after the initial access to swap it in, each search will take less than a microsecond.
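The XOR-fold lookup described above can be sketched roughly as follows (hypothetical `TxidTable`; iguana's real tables are memory-mapped and fixed at creation, and the never-more-than-half-full invariant is what keeps the probe chains short):

```cpp
#include <array>
#include <cstdint>
#include <vector>

// A txid is 32 bytes = four 64-bit words. Since txids are high-entropy,
// XOR-folding the words gives a usable slot index with no extra hashing.
using Txid = std::array<std::uint64_t, 4>;

struct Slot { Txid id{}; bool used = false; };

struct TxidTable {
    explicit TxidTable(std::size_t capacity) : slots(capacity) {}

    static std::uint64_t fold(const Txid& t) {
        return t[0] ^ t[1] ^ t[2] ^ t[3];
    }

    // Open addressing with linear probing; the table is kept at most
    // half full, so probing terminates quickly and never wraps forever.
    void insert(const Txid& t) {
        std::size_t i = fold(t) % slots.size();
        while (slots[i].used) i = (i + 1) % slots.size();
        slots[i] = {t, true};
    }

    bool contains(const Txid& t) const {
        std::size_t i = fold(t) % slots.size();
        while (slots[i].used) {
            if (slots[i].id == t) return true;
            i = (i + 1) % slots.size();
        }
        return false;  // hit an empty slot: not present
    }

    std::vector<Slot> slots;
};
```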
sr. member
Activity: 467
Merit: 267
March 11, 2016, 09:30:27 PM
#15
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter to find the index of the vout and check a bit.

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structure used by a block explorer which aren't necessarily optimal for a bitcoin node.
In particular, you use a 6 byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?

full member
Activity: 317
Merit: 103
March 11, 2016, 08:51:06 AM
#14
Blockchain systems probably need versioned immutable databases with rollback capability and efficient cleanup of old versions in the background. There are no known implementations around, though.

Well, in fact we have a half-done solution for that, used in our open modular blockchain framework Scorex ( https://github.com/ScorexProject/Scorex ). We can externalize it and make a pull request to Bitcoinj if some Java dev would like to help with the Java part.
legendary
Activity: 1176
Merit: 1134
March 11, 2016, 08:37:26 AM
#13
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?

UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter to find the index of the vout and check a bit.

James
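The bundle-bitmap scheme above can be sketched as (hypothetical helper names; 64-bit word-packed bitmaps assumed):

```cpp
#include <cstdint>
#include <vector>

// Each bundle contributes a read-only "spent" bitmap over output indices.
// OR-ing all bundle bitmaps yields the current spent set, and a UTXO
// check is then a single bit test.
using Bitmap = std::vector<std::uint64_t>;

// Combine per-bundle bitmaps; each bundle's bitmap can be built in
// parallel beforehand, since this step only reads them.
Bitmap or_bitmaps(const std::vector<Bitmap>& bundles, std::size_t words) {
    Bitmap out(words, 0);
    for (const auto& b : bundles)
        for (std::size_t i = 0; i < words && i < b.size(); ++i)
            out[i] |= b[i];
    return out;
}

// UTXO check: find the output's index and test one bit.
bool is_spent(const Bitmap& bm, std::size_t index) {
    return (bm[index / 64] >> (index % 64)) & 1u;
}
```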
full member
Activity: 317
Merit: 103
March 11, 2016, 06:13:52 AM
#12
LevelDB surely favors weak consistency for performance's sake, as do most NoSQL databases.

Blockchain systems probably need versioned immutable databases with rollback capability and efficient cleanup of old versions in the background. There are no known implementations around, though.
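A versioned store with rollback and background pruning could look roughly like this toy in-memory sketch (hypothetical `VersionedStore`; not any project's actual implementation):

```cpp
#include <map>
#include <string>
#include <vector>

// Every commit records the prior values of the keys it touched, so recent
// versions can be rolled back, and old undo logs can be pruned in the
// background once they are deep enough to never be reorged away.
class VersionedStore {
public:
    void put(const std::string& k, const std::string& v) {
        auto it = current_.find(k);
        log_.push_back({k, it == current_.end() ? "" : it->second,
                        it != current_.end()});
        current_[k] = v;
    }
    // Seal the pending batch as a new version; returns the version count.
    std::size_t commit() {
        versions_.push_back(std::move(log_));
        log_.clear();
        return versions_.size();
    }
    // Roll back the newest committed version (must exist).
    void rollback() {
        auto& log = versions_.back();
        for (auto it = log.rbegin(); it != log.rend(); ++it) {
            if (it->existed) current_[it->key] = it->old_value;
            else current_.erase(it->key);
        }
        versions_.pop_back();
    }
    // Background cleanup: drop undo logs older than the last keep_last
    // versions (those versions become permanently un-rollback-able).
    void prune(std::size_t keep_last) {
        if (versions_.size() > keep_last)
            versions_.erase(versions_.begin(), versions_.end() - keep_last);
    }
    const std::map<std::string, std::string>& data() const { return current_; }
private:
    struct Entry { std::string key, old_value; bool existed; };
    std::map<std::string, std::string> current_;
    std::vector<Entry> log_;
    std::vector<std::vector<Entry>> versions_;
};
```

One version per block gives exactly the reorg behavior a node needs: roll back to the fork point, then re-apply the new branch.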
sr. member
Activity: 467
Merit: 267
March 11, 2016, 05:58:31 AM
#11
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalksearch.org/topic/an-optimal-engine-for-utxo-db-1387119
https://bitcointalksearch.org/topic/using-compact-indexes-instead-of-hashes-as-identifiers-1377459
https://bitco.in/forum/forums/iguana.23/

James

LevelDb is used to store the UTXO set. How is that read only?