I am trying to learn Bitcoin by writing a small C++ library that validates existing blocks from scratch. I have made some progress and am working to understand the rest of the system. This post is not a question, but the forum seems to have extremely experienced people on the topic, so I am sharing here to get advice and corrections as I move forward. This is my first post here; if it breaks any forum rule, please let me know so I can correct it.
So far, I am able to load and parse the 2048 blocks I have downloaded: the first 1024 blocks (ending at block 00000000edfa5bfffd21cc8ce76e46b79dc00196e61cdc62fd595316136f8a83) and another 1024 blocks from last week (ending at block 0000000000000000000d06cb8554f862f69825a7994dab6161ec0970e35f463e). Given these two block ids, I can traverse all 2048 blocks, reaching the genesis block in the first iteration and the block 1024 blocks older in the second. I have verified that each number from the second block is parsed correctly (compared with the JSON data from blockchain.info for the same block).
The Merkle root calculation was a bit tricky (I completely missed the double hash, was doing a single hash, and scratched my head for a few hours), but it seems to be working now. And by reversing the next_block string I can find the next block id, load it, and repeat. This was easier: I just looked at the value dumped in hex and realized the zeroes were at the end.
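For anyone following along, here is a minimal sketch of the pairing step of the Merkle root. It is not the code from my library: the names Bytes, HashFn and merkle_root are made up for illustration, and the hash is passed in as a parameter so the sketch stays self-contained. For Bitcoin, that parameter would be SHA-256 applied twice over the concatenated pair, with the hashes in internal byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Bytes are modelled as std::string to keep the sketch short (assumption).
using Bytes = std::string;
// Hypothetical hash hook: for Bitcoin this would be SHA-256 applied twice.
using HashFn = std::function<Bytes(const Bytes&)>;

// Folds a list of transaction hashes into a single Merkle root.
Bytes merkle_root(std::vector<Bytes> level, const HashFn& double_hash)
{
    assert(!level.empty());  // a block always contains at least the coinbase
    while (level.size() > 1)
    {
        if (level.size() % 2 != 0)
        {
            const Bytes last = level.back();
            level.push_back(last);  // odd count: the last hash is paired with itself
        }
        std::vector<Bytes> next;
        next.reserve(level.size() / 2);
        for (std::size_t i = 0; i < level.size(); i += 2)
        {
            // Parent = hash of the two children concatenated.
            next.push_back(double_hash(level[i] + level[i + 1]));
        }
        level = std::move(next);
    }
    return level.front();  // with a single transaction, the root is that hash
}
```

Passing the hash as a parameter also makes it easy to test the tree shape with a toy hash before plugging in the real one.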
SegWit was another trouble point. I had a hard time finding a beginner-level document that explains how it is stored. At the moment, here is my logic (a simplified version, as I have merged two functions here):
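// Witness data for one input: a count, then that many length-prefixed items.
std::vector<std::vector<std::uint8_t>> witness_list;
auto witness_count = read_var_int_hex(block_stream);
for (gsl::index i = 0; i < witness_count; ++i)
{
    auto witness_len = read_var_int_hex(block_stream);
    // Buffer for this item (the merged original read into script_ here).
    std::vector<std::uint8_t> witness(witness_len);
    if (witness_len > 0)
    {
        read_hex(block_stream, witness_len, witness.data());
    }
    witness_list.push_back(witness);  // zero-length items are kept as empty entries
}
return witness_list;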
This seems to be working for 1024 blocks from last week. Please let me know if it looks correct or not.
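Both the item count and each item length above are variable-length integers (CompactSize). For reference, here is a minimal sketch of how such a value can be decoded. It reads from raw bytes rather than from a hex stream, so it is not my read_var_int_hex; the name read_compact_size and its signature are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads one CompactSize value from buf starting at pos and advances pos
// past the bytes consumed. Hypothetical helper, not part of any library.
std::uint64_t read_compact_size(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    // Reads n bytes as a little-endian unsigned integer.
    auto read_le = [&](std::size_t n) {
        if (pos > buf.size() || n > buf.size() - pos)
        {
            throw std::out_of_range("truncated CompactSize");
        }
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < n; ++i)
        {
            value |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
        }
        pos += n;
        return value;
    };

    const std::uint64_t first = read_le(1);
    if (first < 0xfd) return first;        // value fits in the marker byte itself
    if (first == 0xfd) return read_le(2);  // next 2 bytes hold the value
    if (first == 0xfe) return read_le(4);  // next 4 bytes hold the value
    return read_le(8);                     // 0xff: next 8 bytes hold the value
}
```

The sketch does not reject non-canonical encodings (for example 0xfd followed by a value below 0xfd), which a strict validator may want to do.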
With this, I think syntactic validation is now complete. Given a block serialized in a hex file, I can tell whether a block header, a transaction, or the entire block has the expected values in the right places.
Now I want to move to the next phase and validate the logical rules. I have found some rules here:
https://en.bitcoin.it/wiki/Protocol_rules
I plan to start by validating the block difficulty, then signatures, and then the rest of the rules (script validation is left for the last step, as I will need a small VM for that).
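For the difficulty step, the core check is that the block hash, read as a 256-bit number, does not exceed the target encoded in the header's 4-byte "bits" field. Below is a rough sketch of that check only. The names U256, expand_compact_target and meets_target are made up for this example, and the sketch ignores the sign flag and overflow cases that a full validator has to reject. It also does not check that "bits" itself is the value the retargeting rules require.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 256-bit value stored big-endian: index 0 is the most significant byte.
using U256 = std::array<std::uint8_t, 32>;

// Expands the compact "bits" field: target = mantissa * 256^(exponent - 3).
U256 expand_compact_target(std::uint32_t bits)
{
    const int exponent = static_cast<int>(bits >> 24);
    const std::uint32_t mantissa = bits & 0x007fffffu;  // bit 23 is a sign flag, ignored here
    U256 target{};
    for (int k = 0; k < 3; ++k)
    {
        // Mantissa byte k (k = 0 is least significant) has weight 256^(exponent - 3 + k).
        const int power = exponent - 3 + k;
        const auto byte = static_cast<std::uint8_t>((mantissa >> (8 * k)) & 0xffu);
        if (power >= 0 && power < 32)
        {
            target[31 - power] = byte;
        }
    }
    return target;
}

// hash_be must be the block hash in display order (leading zeroes first).
bool meets_target(const U256& hash_be, const U256& target)
{
    return hash_be <= target;  // lexicographic compare equals numeric compare for big-endian
}
```

For example, bits = 0x1d00ffff (the value used by the earliest blocks) expands to 00000000ffff followed by zeroes.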
I am not sure how up to date those rules are. I can always read the Bitcoin Core source code, but I want to do it my own way first instead of looking into it; the code seems bigger than my attention span. One good thing about this process is that I will probably never forget the structure now, as I struggled through each data structure. And with the JSON file from blockchain.info, it is relatively straightforward to catch errors.
Once most of the logic is validated, I plan to create a small VM to execute the script code. It is a stack-based VM with a limited set of operations and no loops or jump instructions (only OP_IF-style conditionals), so I am hoping it won't be that difficult.
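To give an idea of the shape such a VM could take, here is a toy sketch that handles only OP_1 to OP_16, OP_DUP, OP_ADD and OP_EQUAL. It is an illustration, not a design: run_toy_script is a made-up name, and the stack holds plain integers, whereas real Script stack items are byte arrays with their own number encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Evaluates a tiny subset of Script and reports whether the script succeeds,
// i.e. finishes with a non-zero value on top of the stack.
bool run_toy_script(const std::vector<std::uint8_t>& script)
{
    constexpr std::uint8_t OP_1 = 0x51, OP_16 = 0x60, OP_DUP = 0x76,
                           OP_EQUAL = 0x87, OP_ADD = 0x93;
    std::vector<std::int64_t> stack;  // simplification: integers instead of byte arrays
    for (const std::uint8_t op : script)
    {
        if (op >= OP_1 && op <= OP_16)
        {
            stack.push_back(op - OP_1 + 1);  // OP_1..OP_16 push the numbers 1..16
        }
        else if (op == OP_DUP)
        {
            if (stack.empty()) return false;
            const std::int64_t top = stack.back();
            stack.push_back(top);
        }
        else if (op == OP_ADD || op == OP_EQUAL)
        {
            if (stack.size() < 2) return false;  // not enough operands
            const std::int64_t b = stack.back();
            stack.pop_back();
            const std::int64_t a = stack.back();
            stack.pop_back();
            stack.push_back(op == OP_ADD ? a + b : (a == b ? 1 : 0));
        }
        else
        {
            return false;  // opcode outside the toy subset
        }
    }
    return !stack.empty() && stack.back() != 0;
}
```

For instance, the bytes 52 53 93 55 87 (2 3 ADD 5 EQUAL) succeed, while 52 76 93 55 87 (2 DUP ADD 5 EQUAL) fail because 4 is not 5.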
I do not plan to implement Bitcoin's networking protocol. I am just assuming the blocks are ready to be parsed and validated, starting from the genesis block. On that note, I am thinking about how to order the blocks efficiently without a double pass, that is, without traversing everything to find the next blocks and then coming back to validate the transactions.
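One possible approach, sketched below under some assumptions, is to scan only the 80-byte headers first, index each block by its parent's id, and then walk forward from the genesis block. It does not remove the first scan, but that scan never touches the transactions. HeaderInfo and order_chain are hypothetical names, and the sketch assumes the block set contains no forks (with forks, several blocks would share a parent and the walk would need a rule for choosing a branch).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record collected while scanning the block headers once.
struct HeaderInfo
{
    std::string hash;       // id of this block
    std::string prev_hash;  // id of the block it builds on
};

// Returns block ids in chain order, starting at start_hash (e.g. the genesis id).
std::vector<std::string> order_chain(const std::vector<HeaderInfo>& headers,
                                     const std::string& start_hash)
{
    // Maps a parent id to the id of the block that builds on it.
    std::unordered_map<std::string, std::string> child_of;
    bool has_start = false;
    for (const auto& h : headers)
    {
        child_of.emplace(h.prev_hash, h.hash);
        has_start = has_start || h.hash == start_hash;
    }

    std::vector<std::string> ordered;
    if (!has_start) return ordered;  // the starting block is not in the set
    ordered.push_back(start_hash);
    // The size bound also guards against cycles in malformed input.
    while (ordered.size() < headers.size())
    {
        const auto it = child_of.find(ordered.back());
        if (it == child_of.end()) break;  // reached the tip of what was loaded
        ordered.push_back(it->second);
    }
    return ordered;
}
```

With the order known, the transactions can then be validated block by block in a single forward pass.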