Author

Topic: Querying the blockchain with SQL databases (Read 193 times)

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
August 31, 2023, 05:22:40 AM
#8
What exactly do you mean by time consuming?

I meant writing a block parser to database, by hand. But it seems that someone has already done that as your link to Github shows.

But it has a problem: It depends on Python 2.7, which has been unsupported for a while, and most distros don't even carry Python2 anymore. It should be straightforward to port it to Python 3 if it's not using Python2 dependencies as well.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
August 29, 2023, 01:49:33 AM
#7
Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

You can't do that because Bitcoin Core is using LevelDB which is not a relational database, but a NoSQL-type. You could still use a script to conver tthe SQL into whatever format is being dumped into the LevelDB but that is way too time-consuming.

This is correct, I've seen multiple complaints of missing data on the old dataset "bigquery-public-data.bitcoin_blockchain.blocks", a case like missing transaction output not correct or shown, but the new dataset `bigquery-public-data.crypto_bitcoin.transactions`seem to have all the blockchain data and updated every 10 minutes or so, not sure, most times people who complain for missing data didn't send out the right query. Google developers also provided a form https://issuetracker.google.com/issues/new?component=187149&template=0 where people who have missing results or any underlying difficulty can send their complaints in details and their dispute may get settled. I'm not trusting them, only that they're trying to solve a problem on their own end, also putting a bitcoin dataset on their bigquery is a good move, despite the certainty people or users will time to time face issues unlike those who download bitcoin core and query directly by themselves. I understand your points, things can get fishy, since it's managed by some developers, aside it, we have explorers to crosscheck our results if unsatisfied.

Datasets are seldom updated, only a few times per year. It is unlikely that the data contained inside will ever be up-to-date and suitable for real-time use.
legendary
Activity: 3472
Merit: 10611
August 28, 2023, 11:34:55 PM
#6
found that the digits doesn't have a decimal point that separates the number
Decimals are only used in the User Interface to show the value to the user in different ways. Otherwise as far as the protocol and consensus rules are concerned there is no decimal values used anywhere. Every number in Bitcoin is an integer (a whole number) of different size, most are 32 bit (like version and locktime), some are variable size (like script lengths), we have a couple of 256 bit integers (used in difficulty) and the only 64 bit integers we have are the "amounts" field in the outputs of each transaction.
hero member
Activity: 1274
Merit: 561
Leading Crypto Sports Betting & Casino Platform
August 28, 2023, 01:20:58 PM
#5
Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

Weather deliberate or by accident if you're relying on somebody else's data for financial transactions you are just asking for trouble. Did Google want to censor something? Oops you got the wrong answer to your query.
Did some engineer at Google take a shortcut and copy a set of data to save time and forgot to bring in the real data? oops you've got problems.

Just picking on Google since it's who you are talking about but any place that you're relying on their data you are relying on them to be trustworthy. Might not even have to be the organization itself. Could just be a rogue engineer who's trying to scam something.

-Dave

This is correct, I've seen multiple complaints of missing data on the old dataset "bigquery-public-data.bitcoin_blockchain.blocks", a case like missing transaction output not correct or shown, but the new dataset `bigquery-public-data.crypto_bitcoin.transactions`seem to have all the blockchain data and updated every 10 minutes or so, not sure, most times people who complain for missing data didn't send out the right query. Google developers also provided a form https://issuetracker.google.com/issues/new?component=187149&template=0 where people who have missing results or any underlying difficulty can send their complaints in details and their dispute may get settled. I'm not trusting them, only that they're trying to solve a problem on their own end, also putting a bitcoin dataset on their bigquery is a good move, despite the certainty people or users will time to time face issues unlike those who download bitcoin core and query directly by themselves. I understand your points, things can get fishy, since it's managed by some developers, aside it, we have explorers to crosscheck our results if unsatisfied.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
August 27, 2023, 08:48:18 AM
#4
Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

Weather deliberate or by accident if you're relying on somebody else's data for financial transactions you are just asking for trouble. Did Google want to censor something? Oops you got the wrong answer to your query.
Did some engineer at Google take a shortcut and copy a set of data to save time and forgot to bring in the real data? oops you've got problems.

Just picking on Google since it's who you are talking about but any place that you're relying on their data you are relying on them to be trustworthy. Might not even have to be the organization itself. Could just be a rogue engineer who's trying to scam something.

-Dave
newbie
Activity: 11
Merit: 0
August 27, 2023, 07:22:23 AM
#3
Just yesterday I was wondering if this was possible, OP I would drop you several merit if I had any, thank you.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
August 27, 2023, 07:13:56 AM
#2
That's interesting, i didn't know Google had public dataset including Bitcoin blockchain.

Carried out a search on some SQL programs that can query the blockchain, Saw Abe, but it required that I download bitcoin core, which wasn't what I needed. Knew of some RPC commands that people who run full nodes can use to retrieve data on the blockchain. So I checked out bigquery, read it on a PDF, a google product that has some bitcoin datasets, with two tables blocks and transaction, including interval structures stored in columns; TXin & txout of a transaction, stored in the inputs and outputs columns of the transactions respectively.

There are other platform to do that (such as https://www.dolthub.com/repositories/web3/bitcoin/data/main), but they no longer update their data.

This retrieves different addresses with the amount of bitcoin contained in them, but the numbers didn't look realistic at first, like this 5502219754649, then I copied one address: bc1qjasf9z3h7w3jspkhtgatgpyvvzgpa2wwd2lr0eh5tx44reyn2k7sfc27a4 searched on blockchain explorer and found that the digits doesn't have a decimal point that separates the number, though correct. It looked this way on block explorer 55022.19754649 Tried out different queries, which I may not post because I didn't understand the results, didn't stop loading or too long to share.

FYI, Bitcoin full node software which was initially written by Satoshi (now called Bitcoin Core) use int64_t to store amount of satoshi. So it's realistic/not weird to see other platform also store amount of satoshi under the hood.
hero member
Activity: 1274
Merit: 561
Leading Crypto Sports Betting & Casino Platform
August 27, 2023, 05:34:09 AM
#1
Carried out a search on some SQL programs that can query the blockchain, Saw Abe, but it required that I download bitcoin core, which wasn't what I needed. Knew of some RPC commands that people who run full nodes can use to retrieve data on the blockchain. So I checked out bigquery, read it on a PDF, a google product that has some bitcoin datasets, with two tables blocks and transaction, including interval structures stored in columns; TXin & txout of a transaction, stored in the inputs and outputs columns of the transactions respectively.

It all looked complex to understand at first glance, but read further to discover that it's mainly to help analysis for various kinds of analytic problems, and not for exploring individual transactions.

I queried for the transactions with zero fees, which I also read on the introductory aspect of the program, which explains that miners add their transactions to a block without paying fees. The query retrieved the number of transactions with zero fees

Code:
SELECT 
  ROUND((input_value - output_value)/ size, 0) AS fees_per_byte,
  COUNT(*) AS txn_cnt
FROM
  `bigquery-public-data.crypto_bitcoin.transactions`
WHERE TRUE
  AND block_timestamp >= '2023-08-24'
  AND is_coinbase IS FALSE
GROUP BY 1


Did another test run on another query I found on the article written about the bitcoin dataset, the query is to retrieve balances on different addresses.

Code:
WITH double_entry_book AS (
   -- debits
   SELECT
    array_to_string(inputs.addresses, ",") as address
   , inputs.type
   , -inputs.value as value
   FROM `bigquery-public-data.crypto_bitcoin.inputs` as inputs
   UNION ALL
   -- credits
   SELECT
    array_to_string(outputs.addresses, ",") as address
   , outputs.type
   , outputs.value as value
   FROM `bigquery-public-data.crypto_bitcoin.outputs` as outputs
)
SELECT
   address
,   type   
,   sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
ORDER BY balance DESC
LIMIT 1000

This retrieves different addresses with the amount of bitcoin contained in them, but the numbers didn't look realistic at first, like this 5502219754649, then I copied one address: bc1qjasf9z3h7w3jspkhtgatgpyvvzgpa2wwd2lr0eh5tx44reyn2k7sfc27a4 searched on blockchain explorer and found that the digits doesn't have a decimal point that separates the number, though correct. It looked this way on block explorer 55022.19754649 Tried out different queries, which I may not post because I didn't understand the results, didn't stop loading or too long to share.

The bitcoin dataset is 'bigquery-public-data.crypto_bitcoin' other cryptocurrency datasets also exists too, you can read this thread

Google is still building the model at the moment I notice some limitations, you can also use Legacy or Google SQL to execute query on bigquery.
Jump to: