
Topic: Bitcoin block validator from scratch - bitcoin learning plan (Read 109 times)

newbie
Activity: 7
Merit: 24
First of all, you can't verify blocks like this (at a random position in the chain). It is a blockchain and has to be verified like a chain, meaning you verify block 0, then 1, then 2, and so on; you can't verify block n without having already verified the previous n blocks.
....
As for SegWit you should know a couple of things:

....
P.S. Also try to think of transactions, scripts, hashes, etc. as streams of bytes instead of JSON, hex, etc. That would simplify a lot of things for you, such as the confusion about where the zeros are in a block hash.

Yes, of course: it's a linked list that can be traversed from the tail and validated from the head! I am thinking about how to keep an updated list of transactions in memory to reduce the number of passes through the entire chain. Creating a map of the entire chain first, by reading only the block headers on the first pass, and then starting validation from the beginning seems good enough, and this is what I plan to do (I'll need 300GB+ of disk space, but that's OK for now). I think I'll just make it work first and then optimize performance. And thanks for the SegWit steps.
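For what it's worth, the header-only first pass could look roughly like the sketch below. It assumes the headers have already been parsed into a made-up `HeaderInfo` struct (hashes kept as strings for simplicity) and that there are no forks in the downloaded data; `order_from_genesis` is a hypothetical name, not something from any library.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical summary of one parsed 80-byte header; names are illustrative only.
struct HeaderInfo {
    std::string hash;       // this block's hash
    std::string prev_hash;  // hash of the block it builds on
};

// Returns block hashes in validation order (genesis first).
// Assumes a single chain with no forks, so each block has at most one child.
std::vector<std::string> order_from_genesis(const std::vector<HeaderInfo>& headers,
                                            const std::string& genesis_hash) {
    std::unordered_map<std::string, std::string> child_of;  // prev_hash -> hash
    for (const auto& h : headers) {
        child_of[h.prev_hash] = h.hash;
    }
    std::vector<std::string> ordered{genesis_hash};
    auto it = child_of.find(genesis_hash);
    // The size bound stops the walk if corrupt data ever forms a cycle.
    while (it != child_of.end() && ordered.size() <= headers.size()) {
        ordered.push_back(it->second);
        it = child_of.find(it->second);
    }
    return ordered;
}
```

The first pass only needs the 80-byte headers, so the map stays small compared to the chain itself; the second pass can then load and validate full blocks in the returned order.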

I'm not an expert, but I know there's a bug in multi-signature verification (it predates SegWit): OP_CHECKMULTISIG pops one extra item off the stack. It's worked around by adding a dummy OP_0, so you might want to take note when you implement script validation for multi-signature.
See https://github.com/bitcoin/bips/blob/master/bip-0147.mediawiki.

That's very good information. I have no option but to take the existing chain as the source of truth and organize the validator so that it does not flag any existing block as invalid, while still making sure no rule is skipped, so that it will be able to catch genuinely bad blocks.
legendary
Activity: 2856
Merit: 7410
I plan to start by validating the block difficulty, then signatures, and then the rest of the rules (script validation is left for the last step, as I will need a small VM for that).

I'm not an expert, but I know there's a bug in multi-signature verification (it predates SegWit): OP_CHECKMULTISIG pops one extra item off the stack. It's worked around by adding a dummy OP_0, so you might want to take note when you implement script validation for multi-signature.

See https://github.com/bitcoin/bips/blob/master/bip-0147.mediawiki.

I am not sure how up to date those rules are.

The bottom of the webpage shows this, although it doesn't guarantee that the editor made sure the rules were up to date while editing.

Code:
This page was last edited on 23 June 2020, at 10:29.
legendary
Activity: 3430
Merit: 10505
First of all, you can't verify blocks like this (at a random position in the chain). It is a blockchain and has to be verified like a chain, meaning you verify block 0, then 1, then 2, and so on; you can't verify block n without having already verified the previous n blocks.
This holds even for the minimal initial verification of block headers, because one step is, for example, verifying the target (nBits), which is not possible unless you can compute the expected target for the header at that particular height.
Obviously, the next part of verification (transactions, scripts, signatures) also demands verifying in chain order and having a chainstate.
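To make the target step a bit more concrete, here is a rough sketch of expanding the compact nBits field into a 256-bit target and comparing a block hash against it. The names `U256`, `target_from_nbits` and `hash_meets_target` are made up for illustration, and the sketch only decodes nBits; working out which nBits value is expected at a given height (the retargeting rule) is not shown.

```cpp
#include <array>
#include <cstdint>

// Big-endian 256-bit value: index 0 is the most significant byte.
using U256 = std::array<std::uint8_t, 32>;

// Expands compact nBits into a target: mantissa * 256^(exponent - 3).
// Simplification: Bitcoin Core accepts a few encodings with exponent 33 or 34
// when the mantissa is small; this sketch rejects every exponent above 32.
// Rejected or negative encodings yield an all-zero target.
U256 target_from_nbits(std::uint32_t nbits) {
    U256 target{};
    const std::uint32_t exponent = nbits >> 24;
    const std::uint32_t mantissa = nbits & 0x007fffffu;
    if ((nbits & 0x00800000u) != 0 || exponent > 32) {
        return target;
    }
    for (int i = 0; i < 3; ++i) {
        // Byte position counted from the least significant end.
        const int pos = static_cast<int>(exponent) - 3 + i;
        if (pos >= 0 && pos < 32) {
            target[31 - pos] = static_cast<std::uint8_t>(mantissa >> (8 * i));
        }
    }
    return target;
}

// `hash_be` must be in big-endian (display) order, i.e. the raw double-SHA-256
// output with its bytes reversed.
bool hash_meets_target(const U256& hash_be, const U256& target) {
    // std::array compares lexicographically, which matches numeric order here.
    return hash_be <= target;
}
```

For the genesis block, nBits is 0x1d00ffff, which expands to 00000000ffff0000...0000; any header hash at or below that value passes the proof-of-work check.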

As for SegWit you should know a couple of things:
- The witness is not a separate thing that is stored in some other location (as some bad resources on the internet tell you). It is simply another field in the transaction class, just like version, txin, locktime, etc.
- The witness is similar to a signature script, but it is not a script anymore: it is a list of "stack items", so it is not evaluated like a script but used as an already existing stack.
- The witness structure is like this: [number of stack items as compact size][compact size length of the item][the item]. For example, a P2WPKH witness looks like [2][signature length][signature][pubkey length][pubkey].
- If the transaction has any witness, it has to have a marker and flag (the bytes 00 01) right after the version.
- If the transaction has any witness, there is one witness per input, so the witness count equals the input count; for inputs that don't have a witness we set 0 (an empty witness).
- For a block that has at least one transaction with a witness, you also have to compute the witness merkle root using the wtxids of all transactions, with the coinbase wtxid taken as 00...000. The resulting commitment should exist in an OP_RETURN output of the coinbase transaction (details are in BIP141).
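The witness layout from the list above can be sketched as follows. This is only an illustration working on raw bytes: `ByteReader` and `read_witness_stack` are hypothetical names, and the sketch does no validation beyond bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical cursor over a raw byte buffer; not part of any library.
struct ByteReader {
    const Bytes& data;
    std::size_t pos = 0;

    explicit ByteReader(const Bytes& d) : data(d) {}

    std::uint8_t read_u8() {
        if (pos >= data.size()) throw std::out_of_range("unexpected end of data");
        return data[pos++];
    }

    // CompactSize: one byte below 0xfd, otherwise 0xfd/0xfe/0xff followed by
    // 2/4/8 little-endian bytes.
    std::uint64_t read_compact_size() {
        const std::uint8_t first = read_u8();
        if (first < 0xfd) return first;
        const int width = (first == 0xfd) ? 2 : (first == 0xfe) ? 4 : 8;
        std::uint64_t value = 0;
        for (int i = 0; i < width; ++i) {
            value |= static_cast<std::uint64_t>(read_u8()) << (8 * i);
        }
        return value;
    }

    Bytes read_bytes(std::uint64_t n) {
        if (n > data.size() - pos) throw std::out_of_range("unexpected end of data");
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
        Bytes out(first, first + static_cast<std::ptrdiff_t>(n));
        pos += static_cast<std::size_t>(n);
        return out;
    }
};

// Reads one input's witness: [item count][len][item][len][item]...
// An input without a witness is encoded as a single 0x00 (zero items).
std::vector<Bytes> read_witness_stack(ByteReader& r) {
    std::vector<Bytes> items;
    const std::uint64_t count = r.read_compact_size();
    for (std::uint64_t i = 0; i < count; ++i) {
        items.push_back(r.read_bytes(r.read_compact_size()));
    }
    return items;
}
```

A transaction parser would call `read_witness_stack` once per input, after the outputs and before the locktime, and keep the resulting items next to the input they belong to.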

P.S. Also try to think of transactions, scripts, hashes, etc. as streams of bytes instead of JSON, hex, etc. That would simplify a lot of things for you, such as the confusion about where the zeros are in a block hash.
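As a small illustration of the byte-stream view: internally a hash is just 32 bytes, and the familiar string with leading zeros is those bytes printed in reverse order. The sketch below assumes a made-up `Hash256` alias and `to_display_hex` helper.

```cpp
#include <array>
#include <cstdint>
#include <string>

using Hash256 = std::array<std::uint8_t, 32>;

// Formats a hash the way block explorers show it: bytes reversed, then hex-encoded.
std::string to_display_hex(const Hash256& internal) {
    static const char* digits = "0123456789abcdef";
    std::string out;
    out.reserve(64);
    for (auto it = internal.rbegin(); it != internal.rend(); ++it) {
        out.push_back(digits[*it >> 4]);
        out.push_back(digits[*it & 0x0f]);
    }
    return out;
}
```

Keeping hashes as raw bytes everywhere and converting only when printing or comparing against an explorer avoids having to reverse hex strings in the parser.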
newbie
Activity: 7
Merit: 24
Keep in mind that re-implementations of the bitcoin protocol or block validator, other than Bitcoin Core, are not safe to use in production because they might have overlooked already resolved security vulnerabilities, and should only be used for learning purposes. This also applies to projects created from scratch.

Absolutely!

Segwit data is stored in the witness_len and witness_data fields of the bitcoin transaction and appears after the txout field. witness_data has a bunch of fields which you must validate according to https://bitcoincore.org/en/segwit_wallet_dev/ section "Transaction Serialization".

Also, a segwit-ready block verifier must be able to recognize two kinds of transaction ID hashing: the first kind is the legacy txid without the witness data, and the second is the hash that includes the witness data. This second kind of txid must not be recognized before the first block that signalled for segwit support. Also, the older transaction ID hashing must not be recognized after Segwit's activation timeout due to rules in the BIPs responsible for these kinds of activations.

This is one of the pitfalls of making your own validator: you have to take into account all the new transaction and block versions and systematically recognize or reject them at specific block heights.

Thanks. I think I have more or less got the structure by using a bit of a hack. I look at the first byte after the version: if it's zero, it's a SegWit transaction; otherwise it's a legacy one. It's ugly, but it is encapsulated in a function that returns a std::pair, which I tie to is_segwit and input_count.
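A rough sketch of that kind of check, working on raw bytes, is shown below. The function names are made up, and unlike the hack described above it also checks the flag byte (0x01) that follows the zero marker.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Reads a CompactSize integer at `pos` and advances `pos` past it.
std::uint64_t read_compact_size(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    auto next = [&]() -> std::uint8_t {
        if (pos >= buf.size()) throw std::out_of_range("unexpected end of data");
        return buf[pos++];
    };
    const std::uint8_t first = next();
    if (first < 0xfd) return first;
    const int width = (first == 0xfd) ? 2 : (first == 0xfe) ? 4 : 8;
    std::uint64_t value = 0;
    for (int i = 0; i < width; ++i) {
        value |= static_cast<std::uint64_t>(next()) << (8 * i);
    }
    return value;
}

// Returns {is_segwit, input_count} for a serialized transaction.
std::pair<bool, std::uint64_t> read_segwit_flag_and_input_count(
        const std::vector<std::uint8_t>& tx) {
    std::size_t pos = 4;  // skip the 4-byte version
    if (tx.size() < pos + 2) throw std::out_of_range("transaction too short");
    // A valid legacy transaction has at least one input, so a 0x00 byte here
    // can only be the SegWit marker; it must be followed by the flag 0x01.
    const bool is_segwit = tx[pos] == 0x00 && tx[pos + 1] == 0x01;
    if (is_segwit) pos += 2;
    return {is_segwit, read_compact_size(tx, pos)};
}
```

If the marker is zero but the flag is not 0x01, this sketch reports a legacy transaction with zero inputs, which the caller should reject as malformed.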

But the data inside the witness blob is still opaque to me. I have dumped it into a vector field and I put it back as is when serializing, and called it done. I'll have to look into the documentation when validating transaction inputs that have witness data/txid. From a first look at the document you shared, it seems that the witness blob for each input is composed of a compactSize integer followed by that many transaction IDs. Now I'll have to understand how to validate a transaction that contains a witness txid.
legendary
Activity: 1568
Merit: 6660
Keep in mind that re-implementations of the bitcoin protocol or block validator, other than Bitcoin Core, are not safe to use in production because they might have overlooked already resolved security vulnerabilities, and should only be used for learning purposes. This also applies to projects created from scratch.

SegWit was another trouble point. I had a hard time finding a beginner-level document that explains how it is stored. At the moment, here is my logic (a simplified version, as I have merged two functions here):

Segwit data is stored in the witness_len and witness_data fields of the bitcoin transaction and appears after the txout field. witness_data has a bunch of fields which you must validate according to https://bitcoincore.org/en/segwit_wallet_dev/ section "Transaction Serialization".

Also, a segwit-ready block verifier must be able to recognize two kinds of transaction ID hashing: the first kind is the legacy txid without the witness data, and the second is the hash that includes the witness data. This second kind of txid must not be recognized before the first block that signalled for segwit support. Also, the older transaction ID hashing must not be recognized after Segwit's activation timeout due to rules in the BIPs responsible for these kinds of activations.

This is one of the pitfalls of making your own validator: you have to take into account all the new transaction and block versions and systematically recognize or reject them at specific block heights.
newbie
Activity: 7
Merit: 24
I am trying to learn bitcoin by creating a small library to validate the existing blocks from scratch using C++. I have made some progress and am working to understand the rest of the system. This post is not a question, but the forum seems to have extremely experienced people on the topic, so I am sharing here to get some advice and corrections along the way as I move forward. This is my first post here; if it breaks any forum rule, please let me know so I can correct it.

So far, I am able to load and parse the 2048 blocks I have downloaded: the first 1024 blocks (ending at block 00000000edfa5bfffd21cc8ce76e46b79dc00196e61cdc62fd595316136f8a83) and another 1024 blocks from last week (ending at block 0000000000000000000d06cb8554f862f69825a7994dab6161ec0970e35f463e). Given the above two block IDs, I can traverse the 2048 blocks backwards, hitting the genesis block in the first iteration and a block 1024 blocks older in the second. I have verified that each number from the second block is being parsed correctly (compared against the JSON data from blockchain.info for the same block).

The MerkleRoot calculation was a bit tricky (I completely missed the double hash, was doing a single hash, and scratched my head for a few hours), but it seems to be working now. And by reversing the next_block string I can find the next block ID, load it, and repeat. This was easier: I just looked at the value dumped in hex and realized the zeroes are at the end :)
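For reference, the pairing logic of the Merkle root can be sketched like this. The hash itself is passed in as a parameter (`HashFn` is a made-up alias); for Bitcoin it would have to be SHA-256 applied twice, which is not implemented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// Hypothetical hash hook: for Bitcoin this would be SHA-256 applied twice.
using HashFn = std::function<Bytes(const Bytes&)>;

// Computes a Merkle root over txids given in internal byte order.
// With a single transaction the root is that txid itself.
Bytes merkle_root(std::vector<Bytes> level, const HashFn& hash) {
    if (level.empty()) return {};
    while (level.size() > 1) {
        if (level.size() % 2 != 0) {
            level.push_back(level.back());  // odd count: pair the last hash with itself
        }
        std::vector<Bytes> next;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            Bytes joined = level[i];
            joined.insert(joined.end(), level[i + 1].begin(), level[i + 1].end());
            next.push_back(hash(joined));  // parent = hash(left || right)
        }
        level = std::move(next);
    }
    return level.front();
}
```

Passing the hash in makes the pairing easy to test on its own with a toy function before plugging in the real double SHA-256.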

SegWit was another trouble point. I had hard time finding beginner level document that explains how it was stored. At the moment here is my logic (simplified version as I have merged two functions here):

Code:
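// Number of stack items in this input's witness (CompactSize).
auto witness_count = read_var_int_hex(block_stream);
for (gsl::index i = 0; i < witness_count; ++i)
{
    // Each item is length-prefixed; a zero length means an empty item.
    auto witness_len = read_var_int_hex(block_stream);
    std::vector<std::uint8_t> witness(witness_len);
    if (witness_len > 0)
    {
        read_hex(block_stream, witness_len, witness.data());
    }
    witness_list.push_back(witness);
}

return witness_list;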

This seems to be working for the 1024 blocks from last week. Please let me know whether it looks correct.

With this, I think syntactic validation is now complete. Given a block serialized in a hex file, I can tell whether a block header, a transaction, or an entire block has the expected values in the right places.

Now I want to move to the next phase and validate the logical rules. I have found some rules here: https://en.bitcoin.it/wiki/Protocol_rules

I plan to start by validating the block difficulty, then signatures, and then the rest of the rules (script validation is left for the last step, as I will need a small VM for that).

I am not sure how up to date those rules are. I can always read the source code of Bitcoin Core, but I want to do it my way first instead of looking into it; the code seems bigger than my attention span. One good thing about this process is that I will probably never forget the structure now, as I struggled through each data structure. Still, with the JSON file from blockchain.info, it is relatively straightforward to catch errors.

Once most of the logic is validated, I plan to create a small VM to execute the script code. It's a stack-based VM with limited types of operations and no jump instructions, so I'm hoping it won't be that difficult.
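To give an idea of the shape of such a VM, here is a very small sketch that handles direct data pushes and four opcodes. The opcode byte values are the real ones, but `run_script` is a made-up name and everything else about Script is left out, including signature checks, number handling, and the OP_IF/OP_ELSE/OP_ENDIF conditionals (Script has conditionals, though no loops).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// A few real opcode values; everything else is rejected by this sketch.
enum : std::uint8_t { OP_0 = 0x00, OP_DROP = 0x75, OP_DUP = 0x76, OP_EQUAL = 0x87 };

// Runs `script` against `stack`; returns false on any error or unsupported opcode.
bool run_script(const Bytes& script, std::vector<Bytes>& stack) {
    std::size_t pc = 0;
    while (pc < script.size()) {
        const std::uint8_t op = script[pc++];
        if (op >= 0x01 && op <= 0x4b) {
            // Direct push: the opcode itself is the number of bytes to push.
            if (script.size() - pc < op) return false;
            const auto first = script.begin() + static_cast<std::ptrdiff_t>(pc);
            stack.emplace_back(first, first + op);
            pc += op;
        } else if (op == OP_0) {
            stack.emplace_back();  // pushes an empty byte vector
        } else if (op == OP_DUP) {
            if (stack.empty()) return false;
            Bytes top = stack.back();
            stack.push_back(std::move(top));
        } else if (op == OP_DROP) {
            if (stack.empty()) return false;
            stack.pop_back();
        } else if (op == OP_EQUAL) {
            if (stack.size() < 2) return false;
            const Bytes a = stack.back();
            stack.pop_back();
            const Bytes b = stack.back();
            stack.pop_back();
            stack.push_back(a == b ? Bytes{1} : Bytes{});  // true is 0x01, false is empty
        } else {
            return false;
        }
    }
    return true;
}
```

Because a witness is already a list of stack items, the same function can be started with a pre-filled `stack` instead of an empty one.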

I do not plan to implement the networking protocol of bitcoin. I am just assuming the blocks are ready to be parsed and validated starting from the genesis block. On that note, I am thinking about how to order the blocks efficiently without a double pass (traversing it all to find the next blocks, and then coming back to validate the transactions).