Author

Topic: Blockchain explorer reads from a chain not from external database (Read 547 times)

member
Activity: 148
Merit: 45
https://bitaps.com/
I'm developing now https://github.com/bitaps-com/btcapiserver

This is core for explorers or wallet backends. Python + Postgres.
Performance about 12 - 24 hours to initial parse bitcoin blockchain (depends of configurated modules and hardware)

Take look maybe my code will be useful for you.
member
Activity: 144
Merit: 38
I am relatively new to programming but now have more time to spend on this after 20 years of inactivity in programming.

What I wanted is a database that contains all readable BTC addresses with non-zero coins on it.
So I made a program in Java, using a H2 database that does the following:
- starting from block 1
- read block (with RPC call readblock to node)
- extract coinbase transaction and check to add to database (if address exists: add, if not add new record), RPC call rawtransaction
- iterate though every other transaction in the block:
  - starting with vout(): iterate vout() and add to existing address in database or add new record (read with RPC call rawtransaction)
  - then all vin(): iterate vin() and substract from existing database or erase record if sum == 0 (read with RPC call rawtransaction)
Some translation is necessary because not all transactions have the readable BTC address in the block.

Doing block 0 to block 100.000 (that has total of 500k of transactions) takes about 30 minutes.
Probably 95% of the time is waiting for the RPC response from my local node (running in a VM).
Database size for this 500k transactions is about 90Mb.
After the 500k transactions there are 65321 records in the database (so 65321 unique BTC addresses with non-zero value).
Only 3 fields in the database (blocknr, readable btc address, value). Only 1 index: on btc address.

Even if this thing takes 1 week to synchronise it does not really matter. Once synchronised every block will take about 10s to add.

Searching the H2 database for a BTC address takes << 1sec.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
Many desktop/mobile application usually uses same practice, sacrifice performance/complexity for faster development to continuously satisfy customer with new features.
But it's horrible practice in this case since the application won't scale with complex hardware (multi-core CPU, multiple CPU, big RAM & multiple server).

I see that way to much. But, it's amazing when looking at the bloat how nobody cares about it anymore. The internet really broke programming. Now, somethings wrong you can download an update. 25+ years ago, you had to mail your customer a disc and that took time and money.

It's also the possible there's bottleneck with other hardware (such as CPU, GPU or RAM), bottleneck with slow internet connection or the task performs is either CPU, GPU or RAM intensive tasks.

True, but you have not had a good laugh as a hardware guy until after telling everyone it's the shitty programmers fault that it's running slow; you update the hardware to the extreme, and it's still just as slow because they have no idea how to optimize their tables or fix the poorly written query.

-Dave
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
How about use another Bitcoin full node client which claim offer better performance in this case, such as https://bitcore.io/?
Take note client besides Bitcoin Core might be outdated & not thoughtfully tested unlike Bitcoin Core.

You can hide a lot of programming things with better hardware. NOT an ideal solution but it can help you with your coding.

Many desktop/mobile application usually uses same practice, sacrifice performance/complexity for faster development to continuously satisfy customer with new features.
But it's horrible practice in this case since the application won't scale with complex hardware (multi-core CPU, multiple CPU, big RAM & multiple server).

Just my thoughts. I am not a programmer I'm the hardware guy. But I have seen some things that were SOOOOOO badly written that 5 x the hardware did not make a difference in the application speed.

It's also the possible there's bottleneck with other hardware (such as CPU, GPU or RAM), bottleneck with slow internet connection or the task performs is either CPU, GPU or RAM intensive tasks.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
Another point to be made is how far do you want to "cheat".
I just did a test with an old scrypt coin with an Iquidus explorer that I am running.

The explorer and node are on a Hyper-V VM that is running off an average SSD
I wiped the DB and restarted, 1 GB of blockchain took about an hour or so to sync.

I created a 2GB ram drive on the Hyper-V server and copied the VM to it and did the same test. So instead of a "slow" SSD it came completely from RAM.
Under 10 minutes and done. I don't have an exact time, I was not staring at it. I figured I would check in 15 minutes to see how it was going but at the 10 minute mark I looked in and done. so ~ 6x faster.

You can hide a lot of programming things with better hardware. NOT an ideal solution but it can help you with your coding.
If you have a function that takes "X" time to run, then you do what I did you know it now takes 1/6 of "X"
If the next function you do takes "Y" time to run but running it from the ram drive still takes "Y" time, then there is another issue.

Just my thoughts. I am not a programmer I'm the hardware guy. But I have seen some things that were SOOOOOO badly written that 5 x the hardware did not make a difference in the application speed.

-Dave
staff
Activity: 3458
Merit: 6793
Just writing some code
Why don't you want to use an external database? In terms of databases, the blockchain is very inefficient data structure - it's a linked list. You are limited to a linear search just to find anything. It is far faster to use an external database to at least index the blockchain and store the index using an efficient to search data structure (like a B+ tree) so that you can search for things in the index and use that to find the actual data in the blockchain itself. And this appears to be what you have done; this is also what Bitcoin Core does.

Using an external database as an index and just reading the information from the blockchain files itself is fine for small projects or one time things, but as soon as you get a large number of users that can be accessing blockchain data concurrently, you will get a problem with data consistency and scalability of your application. The node is constantly reading and writing data to and from the files storing the blockchain. You will inevitably catch it in the middle of writing a block to a file which may result in corrupt information reaching your application. Furthermore, when you have lots of users, you will need to have multiple servers that can serve the users. By using modern database servers, it is easier to scale up and down the backend quickly and largely avoids data consistency issues.
legendary
Activity: 1456
Merit: 1175
Always remember the cause!
op,
Let's have this topic discussed in a more "scientific" way:

Why don't we invent a special wheel for our special machine? Firstly because it will eventually end to re-inventing the wheel or to make a bad wheel, secondly because it is just a distraction from our project with lots of risks and costs involved.

From a software engineering point of view, you need to use known saving and retrieval techniques whenever you don't have an obligation to do otherwise and it is the point: your question is about the reasons for following best practices while you need to ask why not?

In bitcoin, it has been discussed years ago and people have argued about the special situation with consensus and the risk of unintentional forks which escalates with employing a third party database system, because they believed that different versions of the system may answer differently to same queries.

That being said, I'm not satisfied with this argument. Actually as a practitioner in database systems, I denounce it completely because the queries we typically use in consensus are not complicated that much to trigger such a risk or such complications can be totally avoided in a clean architectural design.

On the other hand we have a lot of established and reliable database features critical for data management tasks, like what @DaveF has correctly mentioned, lock isolation, that make it an unwise decision, pretty much, not to use database systems for non-consensus related tasks.
legendary
Activity: 1124
Merit: 1013
ParalleCoin's ruler from the shadow
Thank you Dave, your answer is exactly hitting what I wanted to discuss. Those technical things, possibilities, and limitations.

I will do checks and testings to get right numbers regarding the topic you wrote.

Anyhow I made this online so it can be tested.

It is from my home ADSL on the domain so it is a little bit slow, but you all can check. When I finish I will explain all I have done all.

https://explorer.parallelcoin.io/
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
If you area using RPC calls from the wallet itself you will hit a wall. The bitcoin core wallet uses locks a lot of times to keep data structures clean. The same RPC calls will use the same locks. So if you are doing a lot of RPCs, it's going to lock and force the next call to have to wait for whatever to finish before moving on to the next call.

If you are reading the blocks from the disk and bypassing the wallet entirely, it could cause an issue if the wallet wants to do something and the file is locked because you are doing something.

If I misread what you are doing ignore this post :-) Only had 1 cup of coffee so far today.

-Dave

legendary
Activity: 1124
Merit: 1013
ParalleCoin's ruler from the shadow
Rust is faster than GoLang and in some ways much better (firefox is rust), but again it does things different than Go, although they are very similar. I believe rust will be a widespread substitution for C++. From my perspective I know I will write something in rust for sure, but never again in PHP Tongue ....although today I was struggling with some of the mine old PHP code which collects and show nodes statuses.
Point what I wanted to write is that I already know for most of them and I made my choice for development based on huge research.
The thing that I am waiting is to see VLang get developed well as it looks like it is the right combination Go and Rust.
Please check and write what you think about that concept. https://vlang.io/
full member
Activity: 615
Merit: 154
CEO of Metaisland.gg and W.O.K Corp
I think that the language used by Facebook to program Libra their coin will have a boost. It seems to be a good C++ alternative that combine the speed of C++, but the simplicity of a short learning curve language.
https://www.rust-lang.org/

You should have a look, it might be a good choice.



Don't be thrown to trash be recycled  Tongue

Well, I am looking too much forward maybe, but this tech, because we are on this forum, is the forwarding thing. I was working with PHP for a long time and I agree that 80% is PHP, from which 79% is Wordpress probably, but that does not prove it is not obsolete and that same percentage will be next year. I can bet it will go half in next 3-5 years. Because of all that is "the old www" which also becomes obsolete. I see future www as something for which PHP is not capable to do.

For most "regular" people Google is "the internet", from one hand and from another you need to be seen if you want to sell something, product or story, so by my opinion if you want something to work in that manner than do exactly what "the master" said.

Corporations are greedy, but if you look the wider picture of what is happening now in this field of IT is that many corporations made things open and even hired some important people or start to fund some independent organization/community.

When I was introduced to Go first thing which I found important was exactly that, the largest company put together in team Rob Pike, Ken Thompson and Robert Griesemer (who are the true legends, each of them deserve many credits for this what we have today, from UTF to Go) to design language for Google's needs, which means language for today's large network systems, etc.

Thing is I am not a programmer, I am an artist who is dealing with technology and my goal is not to do what someone wants just to be paid.

As soon as I make accessible online I will be happy to show you my work, for now, you can take a look at the code https://git.parallelcoin.io/marcetin/explorer it's easy to read. End thing will be making that standalone blockchain explorer application in which user will be able to put addresses of nodes to be read.

Thank you Smiley
legendary
Activity: 1124
Merit: 1013
ParalleCoin's ruler from the shadow
Don't be thrown to trash be recycled  Tongue

Well, I am looking too much forward maybe, but this tech, because we are on this forum, is the forwarding thing. I was working with PHP for a long time and I agree that 80% is PHP, from which 79% is Wordpress probably, but that does not prove it is not obsolete and that same percentage will be next year. I can bet it will go half in next 3-5 years. Because of all that is "the old www" which also becomes obsolete. I see future www as something for which PHP is not capable to do.

For most "regular" people Google is "the internet", from one hand and from another you need to be seen if you want to sell something, product or story, so by my opinion if you want something to work in that manner than do exactly what "the master" said.

Corporations are greedy, but if you look the wider picture of what is happening now in this field of IT is that many corporations made things open and even hired some important people or start to fund some independent organization/community.

When I was introduced to Go first thing which I found important was exactly that, the largest company put together in team Rob Pike, Ken Thompson and Robert Griesemer (who are the true legends, each of them deserve many credits for this what we have today, from UTF to Go) to design language for Google's needs, which means language for today's large network systems, etc.

Thing is I am not a programmer, I am an artist who is dealing with technology and my goal is not to do what someone wants just to be paid.

As soon as I make accessible online I will be happy to show you my work, for now, you can take a look at the code https://git.parallelcoin.io/marcetin/explorer it's easy to read. End thing will be making that standalone blockchain explorer application in which user will be able to put addresses of nodes to be read.

Thank you Smiley
full member
Activity: 615
Merit: 154
CEO of Metaisland.gg and W.O.K Corp
I am an old veteran programmer, so maybe you will throw me in the trash were I deserve to be, but I have programmed in a lot of languages, I will not list here, you won't believe me  Grin
Saying that PHP/Mysql is obsolete is wrong. 80% of the web is powered by PHP/Mysql https://w3techs.com/technologies/details/pl-php/all/all

NodeJS, Electron and all the Javascript stuff is a dying trend. They are good for doing smalls apps for a restaurant or an hotel, that staff can track on their phone. Small programs.
But it is so damn slow. I have developed with these technologies, and if you want to fill a datatable with 1000 fields, you put a Ryzen 1700 on its knee, in 2019, that's ridiculous.
I have worked on Atari and non compiled basic was running faster than Javascript on today's computers. There is a reason why Epic Games UE4 3d Engine is programmed in C/C++. You won't run a Pacman smoothly programmed in Javascript.

In the old time, people were laughing at you when you were working with an interpreter, or a byte code compiler, everything had to be coded in Assembly or in C++ and well optimized. Now everyone is using C# thinking it is quick, when it is only byte coded. On Mobile it was so slow that they had to use JIT to compile in ASM on the fly or the app would not even start.

I see young guys praising Javascript.  Huh Huh Huh it make me laugh.

Trust me, if you want to learn something for a long time, reuse your experience and your code in 20 years, stick with C++ and PHP Mysql. It is not the latest fashion, but if you are a good PHP/Mysql programmer, you can go anywhere in the world and find a job.

I understand that there is a fashion in all these REST based developments, due to some mobile specifications and the fact that you can cross compile easily. But it is t temporary.

For me, I stick on what is stable, not controlled by a big greedy corporation. What I learned 30 years ago is still valid to this day.

Anyway if you need a beta tester for your tool, I will give it a shot.



Thank you for your answer, appreciated.
...which again does not mean I am satisfied with your answer.
But let us start the discussion here Smiley

First of all PHP and MySQL are out of date for today's needs.

Luke Williams made Iquidus Explorer which is nodejs/mongodb based and much faster than any PHP/SQL solution I tried. I was using that explorer for needs of BitNodes.net and host over 30 explorers of different coins

Nodejs is good but... not enough.

So now there is GoLang, which is something like nodejs but with the power of C++. My strong recommendation if you do anything with networks (web, p2p, etc)

I am not sure which info is there useless if I want to provide full 3rd party RPC for other users.

A wallet needs to be on the same machine of course and all other counts the same as for any other database. Another thing is cache which makes things even faster.

As I wrote already, I have done a lot of research, which means I am researching the last 5 years, from pythons bitcoin-abe and Jake's PHP block crawler till these days. These days cannot be PHP and MySQL as that is obsolete.

I am using a combination of GoLang for backend and Google AMP for fronted, which is again generated by GoLang. The whole thing is very fast and there is no difference was it RPC or DB. BadgerDB was used in the first place until I got this idea to parse directly from RPC. Now for the database, I use simple Scribble, which just saves .json files in folders, which if I want to give something out I just use that folder as root address for static files on that prefix. So again, I think none of the databases can be fast as is fast reading a single json file.

In next couple of days, I will set this to look nicer and put it for the test online so if you are interested you can check how fast it is even thro ADSL and even reading RPC from the not same machine. One is the router which connects to ADSL with OPNSense running and other with FreeBSD 12 which runs daemons, which are binaries compiled on Ubuntu 16. That is another thing, Ubuntu binaries run faster on FreeBSD. It's not a piece of cake to set all working but results are fascinating.

Again, thank you for your answer and move of PHP/MySQL Cheesy
Write in Go

PS: Sorry if I made this little bit confusing Smiley
legendary
Activity: 1124
Merit: 1013
ParalleCoin's ruler from the shadow
Thank you for your answer, appreciated.
...which again does not mean I am satisfied with your answer.
But let us start the discussion here Smiley

First of all PHP and MySQL are out of date for today's needs.

Luke Williams made Iquidus Explorer which is nodejs/mongodb based and much faster than any PHP/SQL solution I tried. I was using that explorer for needs of BitNodes.net and host over 30 explorers of different coins

Nodejs is good but... not enough.

So now there is GoLang, which is something like nodejs but with the power of C++. My strong recommendation if you do anything with networks (web, p2p, etc)

I am not sure which info is there useless if I want to provide full 3rd party RPC for other users.

A wallet needs to be on the same machine of course and all other counts the same as for any other database. Another thing is cache which makes things even faster.

As I wrote already, I have done a lot of research, which means I am researching the last 5 years, from pythons bitcoin-abe and Jake's PHP block crawler till these days. These days cannot be PHP and MySQL as that is obsolete.

I am using a combination of GoLang for backend and Google AMP for fronted, which is again generated by GoLang. The whole thing is very fast and there is no difference was it RPC or DB. BadgerDB was used in the first place until I got this idea to parse directly from RPC. Now for the database, I use simple Scribble, which just saves .json files in folders, which if I want to give something out I just use that folder as root address for static files on that prefix. So again, I think none of the databases can be fast as is fast reading a single json file.

In next couple of days, I will set this to look nicer and put it for the test online so if you are interested you can check how fast it is even thro ADSL and even reading RPC from the not same machine. One is the router which connects to ADSL with OPNSense running and other with FreeBSD 12 which runs daemons, which are binaries compiled on Ubuntu 16. That is another thing, Ubuntu binaries run faster on FreeBSD. It's not a piece of cake to set all working but results are fascinating.

Again, thank you for your answer and move of PHP/MySQL Cheesy
Write in Go

PS: Sorry if I made this little bit confusing Smiley
full member
Activity: 615
Merit: 154
CEO of Metaisland.gg and W.O.K Corp
Reading the RPC wallet is not efficient and fast. It is better to read the blocks, add them to a database, and then do a query of the Database.

This is what Yiimp pool is doing, here is an example: http://pool.kryptofranc.net/explorer/KYF

Now if you want to have something in PHP that read the RPC wallet, there is a function in Yiimp where you can update the blocks in the MYSQL database.
Basically what it does is to erase all the blocks, and scan all the blockchain with RPC to refill the MYSQL database.

It is fast on a small block chain.

The source code is located here: https://github.com/tpruvot/yiimp/blob/next/web/yaamp/core/backend/blocks.php

Check this function: function BackendBlocksUpdate($coinid = NULL)

This function crawl the blockchain and you could use this code to crawl directly with PHP.

The reason why a database is better is just because MYSQL is faster, just that, RPC is calling the wallet that sends back a JSON with a lot of useless info, you need to parse the JSON and extract the information. Generally, it is faster and more convenient to use a DB, specially if your Wallet is not connected all the time.
But
In some case, crawling directly is more efficient if you need a small excerpt of the blockchain. For instance here: http://pool.kryptofranc.net/site/mining
The last 20 blocks at the right are not coming from the DB but directly from the RPC wallet, but if I restart the wallet, the page is showing nothing.
legendary
Activity: 1124
Merit: 1013
ParalleCoin's ruler from the shadow
I have done a lot of research but 100% of open source blockexplorers I found use external database to store blockchain information. There are few so-called "crawlers" but they are all too old and just for a few simple calls.
What I made at the end is Bitcoin RPC compatible blockchain explorer written in GoLang which reads blocks and tx data directly from the chain and use external database just for some collected or calculated data which can not be pulled directly from a chain, as are addresses data, rich list, etc..
My point was, why should I use an external database when there is one already, a chain. Please just don't tell me MySQL or MongoDB is so much faster that is mandatory to use them. After I did a lot of tests I found my way was enough fast and with the usage of some cache I got really good results, even it runs from a local FreeBSD machine connected with ADSL 10/1Mbits  I was getting so fast results from distant locations that I was amazed.
Starting point whit which I was confused was the number of confirmations of a single block which rises when a new block appears on the network. So, or it is needed to be read again from a chain or needs to have some script to update that number in a database. That is how I end up with this "direct from blockchain" technic.
Please share some thoughts on this if you know some particular reasons why should be external database based or do you find my way better to do.
Thank you all in advance.
Jump to: