Author

Topic: New database for blockchain (Read 460 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
December 08, 2021, 06:03:33 AM
#24
I think you should give GIT a try, it's basically a key=value storage systems, does something similiar as to what you want, store content with a content hash.

--snip--

While you could git as key-value database, conclusion on this blog (https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/) telling us git isn't suitable for OP use case. (mainly because lots of small data).
full member
Activity: 385
Merit: 110
December 07, 2021, 10:46:21 PM
#23
I want to create a new database for the blockchain.
I would like to create a file for each block.
eg: 000000000019D6689C085AE165831E934F763AE46A2A6C172B3F1B60A8CE26F.Block
That would be more than 700,000 files.

NTFS can be 4.294.967.295 files in a folder.
So theoretically it should go.
Question:
Has anyone ever tested such a thing?
Would the file system crashed or something similar?

My idea is through the file name of the blockhash is to find and open the blocks very quickly.
Ideas and suggestions are welcome :-)

I think you should give GIT a try, it's basically a key=value storage systems, does something similiar as to what you want, store content with a content hash.

Git does have some decent performance thanks to big linux kernel development approx 2 GB in size, ofcourse bitcoin much larger 150x larger.

I am not sure how git works internally, have not gotten that far yet with it's analysis... it does use a database stored in a .git folder.

GIT is open source I believe, so you could analyze the source code to see how it works.

GIT is also very similiar to BITCOIN... GIT also uses "blocks" which are called "commits" and it also calculates hashes, sha-1 I believe.

And these blocks/commits are also linked to each other in a similiar way.

Perhaps somebody involved with GIT was also involved in the creation of BITCOIN ! Wink
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
November 08, 2021, 12:20:36 AM
#22
You should deserialize each transaction (this will take a long initial time), and place transaction data in one NoSQL table (minus vin and vout arrays), harvest the vin and vout data to make a utxo table, and take the addresses and values out of each vout to construct an addresses table.

And of course you can fetch each block, deserialize the data and create a blocks table while you're at it.

SQL databases are horribly inefficient. NoSQLs like MongoDB are better.
You are describing a relational database structure (such as SQL) in a convoluted way.

A NoSQL collection of blocks might break down as follows:
record for block x --> transactions --> list of vIns, each of which references a previously confirmed transaction and index to that transaction and list of vOuts, each of which reference an address and amount.

If a particular vOut is spent, it would be very difficult to lookup that transaction in a NoSQL database.

If you are using a relational database (such as SQL), you can create an index on any column you are using a WHERE clause on (or are aggregating data on). So if you ever will filter by txid, you can create an index on the txid column in your transactions table. Querying a SQL database when filtering by data in a column that is indexed will be very fast, while querying when filtering by data in a column that is not indexed will be slow (or maybe very slow if your database has many records).

If you need to frequently access data, you can also create a temporary table. Bitcoin core already does this (or something very similar) by keeping the UTXO set in RAM.
full member
Activity: 161
Merit: 168
November 07, 2021, 03:04:05 PM
#21
My previous experience shows that the deserialization of transactions is the smallest problem and goes very fast.
Loading the transaction from the BLK files takes much longer, because min. 2 hard disk access is needed.
I'm trying to reduce this to a hard disk access.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
November 07, 2021, 01:49:23 PM
#20
You should deserialize each transaction (this will take a long initial time), and place transaction data in one NoSQL table (minus vin and vout arrays), harvest the vin and vout data to make a utxo table, and take the addresses and values out of each vout to construct an addresses table.

And of course you can fetch each block, deserialize the data and create a blocks table while you're at it.

SQL databases are horribly inefficient. NoSQLs like MongoDB are better.
sr. member
Activity: 310
Merit: 727
---------> 1231006505
November 02, 2021, 06:58:05 AM
#19
You can bring in the raw transactions and just compress that one table: https://mariadb.com/kb/en/innodb-page-compression/
I would *think* that it would still be much faster then using a rpc call to get the data.
For single person using it, it probably would not matter. If it's exposed for the world to use then you need to have it in a db.

-Dave
It all depends on the use-case you have. If for instance I would create a webservice and expose it to the world to return the raw transaction based on a txid in the request it would make very much sense to store all the transactions in a DB.

But if I would like to be able to analyze/query the blockchain I wouldn't get very far only storing the rawtransactions in a DB. Raw transactions are optimized to contain as much data as needed in a minimum storage space. However good luck creating a query like "when was the last time adress X received over 0.1 BTC" based on raw transaction data only. There is a reason the UTXO's are stored in a DB by core itself.

So what I meant (and what I did) I extracted all the info needed from the raw transactions and stored them in a several RDBMS tables. Added some indexes and this a perfect solution for my use-case. So in the end I had no need for the raw transactions themselves anymore.

TLDR: Just storing raw transactions in a RDBMS table doesn't mean all the data within is easy accessible.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
October 31, 2021, 06:18:43 AM
#18
Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?
Hi, I have done this in a RDBMS (MariaDB). Although after a while I got rid of storing the RAW transactions myself, just made a couple of database tables that store the information I need (like block, transaction, address, etc). In the rare occasions I do need a raw transaction I just provide the txid to the rpc api from core (getrawtransaction). Just keep in mind that adding indexes will make it more efficient but it will also add to the storage space required. I got it setup on a 2TB sdd which is now approx. used for 50%. If I would have kept all the RAW transactions in the DB it would mean I would probably run into storage problems by now.


You can bring in the raw transactions and just compress that one table: https://mariadb.com/kb/en/innodb-page-compression/
I would *think* that it would still be much faster then using a rpc call to get the data.
For single person using it, it probably would not matter. If it's exposed for the world to use then you need to have it in a db.

-Dave
sr. member
Activity: 310
Merit: 727
---------> 1231006505
October 31, 2021, 03:41:04 AM
#17
Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?
Hi, I have done this in a RDBMS (MariaDB). Although after a while I got rid of storing the RAW transactions myself, just made a couple of database tables that store the information I need (like block, transaction, address, etc). In the rare occasions I do need a raw transaction I just provide the txid to the rpc api from core (getrawtransaction). Just keep in mind that adding indexes will make it more efficient but it will also add to the storage space required. I got it setup on a 2TB sdd which is now approx. used for 50%. If I would have kept all the RAW transactions in the DB it would mean I would probably run into storage problems by now.
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
October 30, 2021, 05:01:44 PM
#16
Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?

Not that leveldb is bad, but why not mariadb?
Leveldb is bad for the job, once it comes to explorer class of applications, because it lacks the querying power of an RDBMS such as  MariaDB which you have already recommended thoughtfully, the latter is open source and guaranteed to remain open source inheriting the rich history and highly crafted source code of the legendary MySQL.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
October 30, 2021, 10:30:08 AM
#15
Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?

Not that leveldb is bad, but why not mariadb?
There are major corporations running multi terabyte databases on it. A few hundred GB of blockchain data is not even going to make it stress.
A 1 TB or 2 TB SSD and a server with 32 GB of ram and you should have no issues at all.

You keep trying to do things the hard way when people have done them other better ways many times before.

-Dave


copper member
Activity: 2996
Merit: 2374
October 30, 2021, 10:12:10 AM
#14
Quote
The current way of storing multiple blocks in a file and storing an index for their location is fast enough. Why do you need more speed?

I need a faster database for my block Explorer.

All TX to get an address takes too long.
This address, e.g. does not work anymore: 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
See here: https://bitcoinexplorer.org/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX

Currently, it is so that there are minimal 2 hard disk access necessary to get a TX.
The first for the TX index in the Leveldb and the second then in the BLK.DAT

I am looking for a way with just a hard disk access.
Maybe I can manage to load the indexing in the RAM or to work with bloom filter.
Most likely you need to update your config file to reflect txindex=1

This is going to increase the amount of disk space required to run your node, and it will take longer to sync.

When you are running any kind of database, you need to create an index on any column that you will potentially search by. Putting information in RAM, increasing CPU capacity, increasing RAM, changing the file structure etc will not fix your problem.

Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?
This will make your database inefficient. It will require multiple queries any time you want to look something up. Lebeldb is not a relational database, and IMO you need a relational database in order to run a block explorer.
full member
Activity: 161
Merit: 168
October 30, 2021, 09:57:23 AM
#13
Ok, I think I will try to pack all RAW transactions directly in a Leveldb database.
Has anyone tried that already?
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
October 30, 2021, 07:23:38 AM
#12
Using an alternative approach to persistency of blocks in a blockchain is not a bad idea per se, but OP's proposal, using FS for this purpose is the worst idea ever as most folks have mentioned correctly above, still it remains a good topic to discuss as long as we forget about FS thing and look for better alternatives, IMO.

I'd suggest a full-fledged RDBMS, obviously.
legendary
Activity: 3472
Merit: 10611
October 30, 2021, 06:55:56 AM
#11
Did you noticed that the balance on which you have linked block explorers is different?
Just because some dumb block explorers have bugs doesn't change anything about what HCP said.
The bug in this case is that because these explorers were too lazy to code "search by script" and there is no address associated by P2PK, they decided that it is fine to convert a P2PK script to an entirely different script called P2PKH and show that address. Which is why they are counting 50BTC more for this address whereas the 50BTC output is actually associated with the P2PK script not the P2PKH (the address you posted) script.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
October 30, 2021, 06:43:15 AM
#10
That's extremely inefficient. You will end up with a lot of files that are so tiny (176-300 bytes) and opening each file and reading it is still consuming the biggest percentage of the total time.

Don't forget the bloat due to filesystem block size (these days most filesystem have 4KB as default block size).

I want to program the database itself, which makes it possible, more or less. It's worth a try.

Consider taking course on Database. Or just run Electrum server locally and disable the rate limit.
full member
Activity: 161
Merit: 168
October 30, 2021, 06:08:16 AM
#9
That address is working "fine" on other block explorers:
"Our monetary system works "fine" for other banks."

Did you noticed that the balance on which you have linked block explorers is different?   ... "working fine" ;-)

Of course it works somehow, that was not the question here.
I can, if I have the resources, also load the whole blockchain into the RAM, then it works great fast.
....
--txid-limit it does not work.
HCP
legendary
Activity: 2086
Merit: 4363
October 30, 2021, 04:47:01 AM
#8
I need a faster database for my block Explorer.

All TX to get an address takes too long.
This address, e.g. does not work anymore: 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
See here: https://bitcoinexplorer.org/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
That address is working "fine" on other block explorers:
https://btc.com/btc/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
https://blockchair.com/bitcoin/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
https://live.blockcypher.com/btc/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX/
https://www.blockchain.com/btc/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
https://www.blockstream.info/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX


What you're coming up against with that particular address is a limitation with the Electrum server that the Block Explorer is using to retrieve data. It's not really an issue with the way the data is being stored on disk. Did you read the error message the the block explorer puts out.
Failed to retrieve transaction history from Electrum Rust Server (electrs). See Issue #67 for more information.
As a workaround, consider starting electrs with a custom --txid-limit argument, to support longer transaction histories.

Have you tried running electrs with a custom --txid-limit as it suggests? Huh (Assuming you are running electrs and not ElectrumX... as apparently the workaround is specific to electrs)
full member
Activity: 161
Merit: 168
October 30, 2021, 02:48:44 AM
#7
Choosing a DB is always a threshold choice, depending on what you want to achieve in terms of, for example, storage occupation, query speed (which by itself depends on query type you have in mind), resilience to adverse events (e.g. power leakage causing DB shutting down abruptly in the the middle of a transaction (=set of operation to be executed or not atomically, aka as just a unique comprehensive one), and so on...
Given that, it's uncommon to begin thinking about it from beneath file system, or chunks size.

There are two requirements for the database that are very hard.
1. The blockchain is> 400GB
2. I have to read more than 2000 transactions from the blockchain for displaying an address in the block explorer.

I am looking for an efficient approach to this problem.
I do not know any database that focuses this specific problem.
As they have already described correctly.
Most databases are made for applications that are often occurring and not specialized.
I want to program the database itself, which makes it possible, more or less. It's worth a try.
member
Activity: 90
Merit: 91
October 30, 2021, 02:26:05 AM
#6

Hi, some unleashed thoughts

Choosing a DB is always a threshold choice, depending on what you want to achieve in terms of, for example, storage occupation, query speed (which by itself depends on query type you have in mind), resilience to adverse events (e.g. power leakage causing DB shutting down abruptly in the the middle of a transaction (=set of operation to be executed or not atomically, aka as just a unique comprehensive one), and so on...
Given that, it's uncommon to begin thinking about it from beneath file system, or chunks size.

A common way of thinking is that a DB is good if in some way it intrinsically maps the "shape" of the data is going to hold, without needing brand new "structures" (whichever it could mean) between the way the DB operates and the contained data. The heuristic at the base of this reasoning is that the shape of the data determines which operations we could or could not do effectively and efficiently on it, so choosing a DB mirroring that shape is the best you can do without further constraints on the interesting operations.

That said an interesting world to look at is the Graph Databases ones. As far as I know the reference here is NEO4J; it also has a free community edition: https://neo4j.com/download-center/#community and Greg Walker made a PoC for Bitcoin blockchain: https://neo4j.com/blog/import-bitcoin-blockchain-neo4j/ and https://learnmeabitcoin.com/neo4j/

My 2 cents Smiley
baro
full member
Activity: 161
Merit: 168
October 30, 2021, 02:14:17 AM
#5
Quote
The current way of storing multiple blocks in a file and storing an index for their location is fast enough. Why do you need more speed?

I need a faster database for my block Explorer.

All TX to get an address takes too long.
This address, e.g. does not work anymore: 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
See here: https://bitcoinexplorer.org/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX

Currently, it is so that there are minimal 2 hard disk access necessary to get a TX.
The first for the TX index in the Leveldb and the second then in the BLK.DAT

I am looking for a way with just a hard disk access.
Maybe I can manage to load the indexing in the RAM or to work with bloom filter.
HCP
legendary
Activity: 2086
Merit: 4363
October 29, 2021, 10:59:16 PM
#4
Or to ask it in a slightly different way... what do you believe you will gain from this setup as opposed to the current setup?
legendary
Activity: 3472
Merit: 10611
October 29, 2021, 10:46:38 PM
#3
That's extremely inefficient. You will end up with a lot of files that are so tiny (176-300 bytes) and opening each file and reading it is still consuming the biggest percentage of the total time.

My idea is through the file name of the blockhash is to find and open the blocks very quickly.
The current way of storing multiple blocks in a file and storing an index for their location is fast enough. Why do you need more speed?
legendary
Activity: 4522
Merit: 3426
October 29, 2021, 10:12:16 PM
#2
Do you really think that your database is better? There is more to a database than how to organize the files.
full member
Activity: 161
Merit: 168
October 29, 2021, 06:07:37 PM
#1
I want to create a new database for the blockchain.
I would like to create a file for each block.
eg: 000000000019D6689C085AE165831E934F763AE46A2A6C172B3F1B60A8CE26F.Block
That would be more than 700,000 files.

NTFS can be 4.294.967.295 files in a folder.
So theoretically it should go.
Question:
Has anyone ever tested such a thing?
Would the file system crashed or something similar?

My idea is through the file name of the blockhash is to find and open the blocks very quickly.
Ideas and suggestions are welcome :-)
Jump to: