Topic: specification of blockchain format (Read 16494 times)

full member
Activity: 166
Merit: 101
August 19, 2012, 11:01:30 AM
#6
More specifically, new blocks are appended to the blkXXXX.dat files as they are received.  The format is pretty simple:


Quote
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
...

That's what I was after, thanks.  Based on https://en.bitcoin.it/wiki/Protocol_specification#block I could see that the file was almost a concatenation of block data structures.  But the first 8 bytes before each block in the file were a mystery to me, and I had been discarding them.  The magic bytes plus the block size now account for them.
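
For anyone following along, the 80-byte raw header can be split into the six fields listed on that wiki page. A minimal sketch in Python, assuming the little-endian field layout from the protocol specification; `parse_header` and `block_hash` are just illustrative names, not taken from any client:

```python
import hashlib
import struct

def parse_header(raw):
    """Split an 80-byte raw block header into its six fields."""
    if len(raw) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    # Layout per the wiki: version, previous block hash, merkle root,
    # timestamp, difficulty bits, nonce (integers are little-endian).
    version, prev_block, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<i32s32sIII", raw
    )
    return {
        "version": version,
        # Hashes are conventionally displayed byte-reversed, as hex.
        "prev_block": prev_block[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
    }

def block_hash(raw):
    """Double SHA-256 of the raw header, byte-reversed for display."""
    return hashlib.sha256(hashlib.sha256(raw).digest()).digest()[::-1].hex()
```

The hash of one header should equal the `prev_block` field of the next block in the chain, which is a handy sanity check while parsing.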
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
August 18, 2012, 04:52:11 PM
#5
More specifically, new blocks are appended to the blkXXXX.dat files as they are received.  The format is pretty simple:


Quote
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
...
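
The layout above can be walked with a short loop. Here is a rough sketch in Python, assuming the main-network magic bytes f9 be b4 d9 and a little-endian size field that counts the header, the tx count and the transactions (but not the 8-byte prefix). `read_varint` and `iter_blocks` are illustrative names, not part of any client:

```python
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # assumption: main network

def read_varint(stream):
    """Decode a VAR_INT (1, 3, 5 or 9 bytes) from a binary stream."""
    prefix = stream.read(1)[0]
    if prefix < 0xFD:
        return prefix  # small values are stored in the single byte itself
    width, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[prefix]
    return struct.unpack(fmt, stream.read(width))[0]

def iter_blocks(stream, magic=MAINNET_MAGIC):
    """Yield (raw_header, tx_count, raw_txs) for each record in the stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file
        if prefix[:4] != magic:
            raise ValueError("unexpected magic bytes: " + prefix[:4].hex())
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)
        body = io.BytesIO(block[80:])   # everything after the 80-byte header
        tx_count = read_varint(body)
        yield block[:80], tx_count, body.read()
```

Usage would be something like `for header, n, txs in iter_blocks(open(path, "rb"))`, where `path` points at one of the blkXXXX.dat files. Splitting `txs` into individual transactions needs a transaction parser, which this sketch leaves out.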
legendary
Activity: 2128
Merit: 1065
August 18, 2012, 09:32:19 AM
#3
0) There's no "blockchain format" for the on-the-disk file.

1) blkNNNN.dat files are a simple concatenation of the blocks as seen on the network wire.

2) Because of the above, and because the Satoshi bitcoin client can crash mid-append, those files may contain partially written blocks. There will be a header and at least a portion of the transaction data, but not all the way to the end.

3) blkindex.dat is just an index, nothing more. Currently it is in BerkeleyDB, but there's a planned switch to LevelDB. None of this matters for your parser, because the actual block chain will continue to be stored in the simple format described above.

4) If you are trying to write your parser in C++ I suggest first looking into the parser written by the user znort987.

https://bitcointalksearch.org/topic/--88584
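
Point 2 matters for a parser: a record whose declared size runs past the end of the file is best treated as incomplete rather than parsed. A rough sketch in Python, assuming each block in the file is preceded by 4 magic bytes and a 4-byte little-endian size (as described elsewhere in this thread) and assuming the main-network magic value; `complete_records` is a hypothetical helper name:

```python
import struct

def complete_records(data, magic=bytes.fromhex("f9beb4d9")):
    """Return (blocks, offset): every fully written block in `data`,
    plus the offset at which parsing stopped."""
    blocks, pos = [], 0
    while pos + 8 <= len(data) and data[pos:pos + 4] == magic:
        (size,) = struct.unpack_from("<I", data, pos + 4)
        end = pos + 8 + size
        if end > len(data):
            break  # partially written block: stop before it
        blocks.append(data[pos + 8:end])
        pos = end
    return blocks, pos
```

If the returned offset is smaller than `len(data)`, the tail of the file holds either a truncated block or something that does not start with the expected magic bytes, and the caller can decide whether to skip it or retry later.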
donator
Activity: 2058
Merit: 1007
Poor impulse control.
August 18, 2012, 05:59:57 AM
#2
If you mean the db, it's Berkeley DB.
full member
Activity: 166
Merit: 101
August 18, 2012, 05:58:20 AM
#1
I'm trying to write a parser for the blockchain, but I can't find a written specification of the format.  Does such a thing exist?  I don't really fancy reverse engineering it from the data or another parser's source code, but I guess I'll have to if it's not specified.