
Topic: Extracting data from a blk file. (Read 277 times)

brand new
Activity: 0
Merit: 0
December 22, 2020, 08:39:54 AM
#10
It's a great question, brother.
jr. member
Activity: 35
Merit: 10
December 18, 2020, 05:00:57 PM
#8
I understand that the question is about reading blk files, but why do you need it? You can get all the data stored in these files from Bitcoin Core's RPC.
Secondly, I was thinking of creating a different block explorer. Right now, block explorers issue RPC commands and return the results to the client. I was thinking of inserting the blocks' information into a MySQL database. This way, RPC is not needed.

I'm pretty sure block explorers already use some database internally; I think they are NoSQL databases. I did this myself some time ago: I loaded the BTC blockchain into Postgres, and it was a bad idea. For example, 438,600 blocks contain 481,744,165 transactions, and those contain 1,285,285,104 outputs, so you end up with huge tables and problems with indexing. Then I switched to Elasticsearch, and that was much better. So you should think about which database is best suited to this problem.
As for RPC, I used Bitcoin Core RPC only to extract block data and write it to Elasticsearch, and I found bitcoin-etl to be the best way to extract it.

 
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
December 18, 2020, 03:02:39 PM
#7
I understand that the question is about reading blk files, but why do you need it? You can get all the data stored in these files from Bitcoin Core's RPC.
Well, to answer that: first of all, as ETFBitcoin wrote, to understand how it works. Secondly, I was thinking of creating a different block explorer. Right now, block explorers issue RPC commands and return the results to the client. I was thinking of inserting the blocks' information into a MySQL database. This way, RPC is not needed.
jr. member
Activity: 35
Merit: 10
December 17, 2020, 12:15:20 PM
#6
I understand that the question is about reading blk files, but why do you need it? You can get all the data stored in these files from Bitcoin Core's RPC.
I also tried to read blk files last summer and also found the tool https://github.com/ragestack/blockchain-parser/. But after that I found another tool, https://github.com/blockchain-etl/bitcoin-etl. It can connect to a node and get data about blocks, transactions, inputs and outputs, and for me it works great.
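For comparison with the RPC route mentioned above, here is a minimal sketch of calling Bitcoin Core's JSON-RPC interface with only the Python standard library. It assumes a local mainnet node started with server=1 on the default RPC port 8332; the URL, user name and password below are hypothetical placeholders you would replace with your own rpcuser/rpcpassword (or cookie) settings.

```python
import base64
import json
import urllib.request

# Hypothetical connection settings: local mainnet node, default RPC port 8332.
RPC_URL = "http://127.0.0.1:8332/"
RPC_USER = "user"
RPC_PASSWORD = "password"


def build_request(method, params, url=RPC_URL, user=RPC_USER, password=RPC_PASSWORD):
    """Build (but do not send) a JSON-RPC 1.0 request for Bitcoin Core."""
    body = json.dumps({"jsonrpc": "1.0", "id": "blk-demo",
                       "method": method, "params": params}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


def call(method, *params):
    """Send the request to a running node and return the 'result' field.

    Bitcoin Core normally reports RPC errors with a non-200 HTTP status,
    which urlopen raises as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(build_request(method, list(params))) as resp:
        reply = json.load(resp)
    return reply["result"]
```

With a node running, call("getblockhash", 0) returns the genesis block hash, and call("getblock", that_hash, 1) returns the decoded block as JSON (verbosity 1 lists transaction ids; 2 includes full transaction details).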
full member
Activity: 173
Merit: 120
December 16, 2020, 07:04:02 PM
#5
A block file (blk.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example converting it into text with a readable block header (version number, previous block hash, merkle root, time, target) and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.
I did something just like that last summer in Python, using parser code from GitHub that got me 95% of the way there. I was parsing the earliest blocks to analyze coinbase transactions, so my code likely doesn't handle the latest blocks as is. I parsed the entire block file in a matter of minutes (granted, I was parsing the early, smaller ones!) and harvested the fields I was interested in into an SQLite database for post-processing.
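A rough sketch of the "harvest fields into SQLite" step, using Python's built-in sqlite3 module. The table layout and column names are assumptions for illustration, not the poster's actual schema; the sample row uses the genesis block's header values.

```python
import sqlite3

# Hypothetical schema for parsed header fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS headers (
    hash        TEXT PRIMARY KEY,  -- block hash as displayed (byte-reversed hex)
    prev_hash   TEXT NOT NULL,
    merkle_root TEXT NOT NULL,
    version     INTEGER NOT NULL,
    time        INTEGER NOT NULL,  -- Unix timestamp
    bits        INTEGER NOT NULL,  -- compact target
    nonce       INTEGER NOT NULL,
    tx_count    INTEGER NOT NULL
)
"""


def store_headers(conn, headers):
    """Insert an iterable of header dicts in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR IGNORE INTO headers VALUES "
            "(:hash, :prev_hash, :merkle_root, :version, :time, :bits, :nonce, :tx_count)",
            headers,
        )


conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
conn.execute(SCHEMA)
store_headers(conn, [{
    "hash": "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
    "prev_hash": "00" * 32,
    "merkle_root": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    "version": 1, "time": 1231006505, "bits": 0x1D00FFFF,
    "nonce": 2083236893, "tx_count": 1,
}])
print(conn.execute("SELECT COUNT(*), MIN(time) FROM headers").fetchone())  # (1, 1231006505)
```

Batching many rows per executemany call inside one transaction is much faster than committing each block separately, which matters once you go beyond the early, small files.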

The following reference was useful to understand the fields and how they were encoded.

https://developer.bitcoin.org/reference/block_chain.html

I tried a number of parsers and code examples until I found one that worked for what I was doing. I think this one was among the more useful ones because it was standalone, so I could review and edit any of the code if I needed to. I didn't want to trust a 'black box' running on my PC ;)

https://github.com/ragestack/blockchain-parser/
https://github.com/ragestack/blockchain-parser/blob/master/README.md
Quote
Blockchain parser
Author: Denis Leonov

Simple script for parsing blkXXXXX.dat files of Bitcoin blockchain database.

This script is also compatible with most altcoins, after a few minor tweaks.

An implementation of a blockchain parser that lets you explore the main database as closely as possible.
Feel free to email me your questions or suggestions about this parser.

No dependencies, no third-party modules or libs needed. Just install the standard Python release and run.

Make sure you change the paths for the blkXXXXX.dat files and for the parsing results to your own. The script works only with fully downloaded blockchain dat files (~134 MB each).

This script converts the full raw blockchain database stored in blkXXXXX.dat files to a simple txt view.

If this was helpful for you, don't hesitate to make a donation!
Bitcoin (BTC): 1FvssyzXNnmgHbJg2DYwb7rkzTrtT8adcL
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 16, 2020, 10:42:52 AM
#4
According to https://learnmeabitcoin.com/technical/blkdat, blk*.dat files are simply a sequence of records, each made up of the magic bytes, a size field giving the length of the block that immediately follows, and the serialized block itself: a block header, the number of transactions, and then the transactions. It is very easy to build a parser for it in your language of choice, because the contents after each magic-and-size pair are just a serialized block, in the same format as the payload of a P2P block message.

A blk.dat file stores multiple blocks, each preceded by the magic bytes and its size, and Bitcoin Core keeps appending to the same file as long as it stays under 128 MiB. It's faster to read from a handful of large files than to put each block in its own file and read every one of them. For example, you can read a whole 128 MiB file in one shot.
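The record layout described above can be walked with a few lines of Python after reading the whole file into memory. This is a sketch assuming mainnet files (magic bytes f9 be b4 d9 on disk) and no file-level obfuscation; it jumps from record to record using the size field and stops at the first position without a magic, which covers a zero-filled tail such as the one in the file Bitcoin Core is still writing to.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # magic as it appears on disk


def iter_blocks(data):
    """Yield (offset, raw_block) for each record in a blk*.dat buffer.

    Record layout: 4 magic bytes, a 4-byte little-endian size, then
    `size` bytes of serialized block.
    """
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAINNET_MAGIC:
            break  # zero padding or unexpected bytes: stop scanning
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        if start + size > len(data):
            break  # truncated record at the end of the buffer
        yield start, data[start:start + size]
        pos = start + size
```

Typical use would be reading the file with open(path, "rb").read() (path being your own blkXXXXX.dat) and looping over iter_blocks(data). Keep in mind that blocks are stored in the order they were received, which is not necessarily height order.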

Judging from the example block in the link above, the (mainnet) magic bytes are f9 be b4 d9, which is 0xd9b4bef9 when read as a little-endian 32-bit integer. If you encounter them in the middle of a file, they mark the beginning of a new block. They are followed by the 4-byte size of the block (for example 0x0000011d, i.e. 285 bytes) and then by the block header, which starts with a version field.
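A small sketch of decoding that 8-byte record prefix, using the example values from the post (magic f9 be b4 d9, size 1d 01 00 00). The helper names are hypothetical. The search helper shows how you might resynchronize on the next magic; since block contents can contain those four bytes by coincidence, a hit should be sanity-checked before trusting it.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # magic as stored on disk


def decode_prefix(data, pos=0):
    """Return (magic, size) from the 8-byte prefix at pos (two LE uint32s)."""
    return struct.unpack_from("<II", data, pos)


def find_next_magic(data, start=0):
    """Offset of the next magic at or after start, or -1 if there is none."""
    return data.find(MAINNET_MAGIC, start)


prefix = bytes.fromhex("f9beb4d91d010000")
magic, size = decode_prefix(prefix)
print(hex(magic), size)  # 0xd9b4bef9 285
```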

This link https://en.bitcoin.it/wiki/Protocol_documentation will be very helpful for you to understand the fields in each structure.
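Following the header layout in that protocol documentation (80 bytes: version, previous block hash, merkle root, time, bits, nonce) and its variable-length integer encoding, a block header and transaction count could be decoded like this. It's a sketch: the output keys loosely mirror the names in Bitcoin Core's getblock JSON, and transaction parsing is left out.

```python
import hashlib
import struct


def read_varint(buf, pos):
    """Decode a Bitcoin variable-length integer (CompactSize); return (value, new_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + width


def parse_header(raw):
    """Parse the 80-byte header at the start of a serialized block."""
    version, prev, merkle, time, bits, nonce = struct.unpack_from("<i32s32sIII", raw, 0)
    # Block hash = double SHA-256 of the 80 header bytes.
    digest = hashlib.sha256(hashlib.sha256(raw[:80]).digest()).digest()
    tx_count = read_varint(raw, 80)[0] if len(raw) > 80 else None
    return {
        "hash": digest[::-1].hex(),           # hashes are displayed byte-reversed
        "version": version,
        "previousblockhash": prev[::-1].hex(),
        "merkleroot": merkle[::-1].hex(),
        "time": time,                         # Unix timestamp
        "bits": f"{bits:08x}",                # compact encoding of the target
        "nonce": nonce,
        "nTx": tx_count,
    }
```

Feeding it the genesis block (header plus the one-byte transaction count 0x01) yields time 1231006505, bits 1d00ffff and hash 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f; json.dumps on the result gives a text dump similar in spirit to what the original question asked for.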

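Keep in mind that the integer fields in the file (size, version, time, bits, nonce, etc.) are little-endian, so if you write a parser yourself, you have to reverse the byte order of 32-bit fields, 16-bit fields, etc. For example, the bytes 78 56 34 12 on disk represent 0x12345678. Hashes are stored in the same reversed order, which is why explorers display them byte-reversed relative to the file.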
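In Python you rarely need to reverse bytes by hand: struct with the "<" prefix, or int.from_bytes with "little", reads little-endian fields directly. A short sketch of both, plus turning a stored hash into its displayed form (the helper names are just for illustration):

```python
import struct


def le_uint(raw):
    """Interpret raw bytes as an unsigned little-endian integer."""
    return int.from_bytes(raw, "little")


def display_hash(stored):
    """Convert a 32-byte hash as stored in the file to the usual display hex."""
    return stored[::-1].hex()


raw = bytes.fromhex("78563412")            # four bytes as they appear on disk
print(hex(struct.unpack("<I", raw)[0]))    # 0x12345678
print(hex(le_uint(raw)))                   # 0x12345678
print(hex(int.from_bytes(raw, "big")))     # 0x78563412 (wrong for these fields)
```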
legendary
Activity: 952
Merit: 1386
December 16, 2020, 09:11:03 AM
#3
There is an interesting tool if you want to dump data to a CSV file for analysis:
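That tool is written in Rust; if you already have parsed header fields in Python, the standard csv module can produce a similar dump. This is not how rusty-blockparser works internally, just a sketch of the same idea with hypothetical column choices. When writing to a real file, open it with newline="" as the csv documentation recommends.

```python
import csv
import io

FIELDS = ["hash", "time", "nTx"]  # hypothetical column selection


def write_csv(rows, out):
    """Write dict rows to a file-like object, keeping only FIELDS."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_csv([{
    "hash": "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
    "time": 1231006505,
    "nTx": 1,
    "nonce": 2083236893,  # ignored because it is not in FIELDS
}], buf)
print(buf.getvalue())
```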
https://github.com/gcarq/rusty-blockparser
member
Activity: 73
Merit: 19
December 16, 2020, 04:20:08 AM
#2
A block file (blk.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example converting it into text with a readable block header (version number, previous block hash, merkle root, time, target) and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.

Hi, you can use this for Python: https://github.com/ragestack/blockchain-parser

legendary
Activity: 1512
Merit: 7340
Farewell, Leo
December 16, 2020, 02:24:33 AM
#1
A block file (blk.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example converting it into text with a readable block header (version number, previous block hash, merkle root, time, target) and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.