NOTES:
Height and Version fields
The height and version fields are needed to identify an Output when it is the last unspent Output of its Transaction. In older revxx.dat files, the height is written only if the Output was the last unspent one in its Transaction, and it is stored together with that Transaction’s Version. For a regular Output the height field is 0 (00 in HEX, in VARINT format) and there is no Version field.
In newer revxx.dat files the height is written for every Output, and the Version field is 0 (00 in HEX) for backward compatibility. I have not checked the revxx.dat files myself to confirm this, so please test this note first.
Calculation of Height from the Field Value:
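As far as I can tell from Bitcoin Core's undo.h (this is my reading of that code, not something verified against real revxx.dat files), the VARINT stored in the height field holds twice the height plus a one-bit coinbase flag. Under that assumption, a minimal sketch of splitting the decoded value could look like this. `HeightField` and `SplitHeightField` are illustrative names, not Bitcoin Core identifiers.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not part of Bitcoin Core.
struct HeightField {
    uint32_t height;   // block height in which the Output was created
    bool is_coinbase;  // true if the Output came from a coinbase Transaction
};

// Assumption: code = height * 2 + (coinbase ? 1 : 0), already decoded from VARINT.
inline HeightField SplitHeightField(uint64_t code)
{
    HeightField out;
    out.height = static_cast<uint32_t>(code >> 1);  // upper bits: height
    out.is_coinbase = (code & 1) != 0;              // lowest bit: coinbase flag
    return out;
}
```

With this reading, a field value of 0 gives height 0 and no coinbase flag, which matches the "regular Output" case described above.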
nValue field (Amount)
To compress the amount of satoshis, this field uses the following logic:
- If the amount is 0, output 0
- First, divide the amount (in base units) by the largest power of 10 possible; call the exponent e (e is max 9)
- If e < 9, the last digit of the resulting number cannot be 0; store it as d, and drop it (divide by 10)
  - call the result n
  - output 1 + 10\*(9\*n + d - 1) + e
- If e == 9, we only know the resulting number is not zero, so output 1 + 10\*(n - 1) + 9
- (this is decodable, as d is in [1-9] and e is in [0-9])

Then store it as a VARINT
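The compression steps above can be sketched in C++ as follows. The function is named `CompressAmount` after the routine in Bitcoin Core, but treat this as an illustration of the listed steps rather than a copy of that source.

```cpp
#include <cstdint>

// Sketch of the compression steps listed above; n is the amount in satoshis.
uint64_t CompressAmount(uint64_t n)
{
    if (n == 0)
        return 0;
    int e = 0;
    // Strip up to nine trailing zeros; e counts how many were removed.
    while ((n % 10) == 0 && e < 9) {
        n /= 10;
        e++;
    }
    if (e < 9) {
        int d = static_cast<int>(n % 10);  // last digit, guaranteed to be 1..9
        n /= 10;                           // drop that digit
        return 1 + (n * 9 + d - 1) * 10 + e;
    }
    // e == 9: we only know that n is not zero
    return 1 + (n - 1) * 10 + 9;
}
```

A few values worked by hand: 1 satoshi compresses to 1, 100000000 satoshis (1 BTC) compresses to 9, and 5000000000 satoshis (50 BTC) compresses to 50, so round amounts fit in a single VARINT byte.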
Calculation of nValue field (Amount):
To decode this amount, go through the operations backward:
First, decode it from VARINT. Then use the following procedure (taken directly from the Bitcoin Core implementation):
uint64\_t DecompressAmount(uint64\_t x)
{
// x = 0 OR x = 1+10\*(9\*n + d - 1) + e OR x = 1+10\*(n - 1) + 9
if (x == 0)
return 0;
x--;
// x = 10\*(9\*n + d - 1) + e
int e = x % 10;
x /= 10;
uint64\_t n = 0;
if (e < 9) {
// x = 9\*n + d - 1
int d = (x % 9) + 1;
x /= 9;
// x = n
n = x\*10 + d;
} else {
n = x+1;
}
while (e) {
n \*= 10;
e--;
}
return n;
}
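DecompressAmount expects a value that has already been decoded from VARINT. Here is a minimal sketch of that step, assuming the VARINT defined in Bitcoin Core's serialize.h: a big-endian base-128 encoding where every byte except the last has its high bit set and adds an offset of one. This is not the same encoding as the CompactSize integers used for counts inside transactions. The `std::vector` buffer and `pos` cursor are my own choices for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Reads one VARINT from buf starting at pos and advances pos past it.
uint64_t ReadVarInt(const std::vector<uint8_t>& buf, size_t& pos)
{
    const uint64_t kMax = std::numeric_limits<uint64_t>::max();
    uint64_t n = 0;
    while (true) {
        if (pos >= buf.size())
            throw std::runtime_error("ReadVarInt: unexpected end of data");
        uint8_t ch = buf[pos++];
        if (n > (kMax >> 7))
            throw std::runtime_error("ReadVarInt: value too large");
        n = (n << 7) | (ch & 0x7F);  // append the low 7 bits
        if (ch & 0x80) {
            if (n == kMax)
                throw std::runtime_error("ReadVarInt: value too large");
            n++;                     // continuation byte: add the offset of one
        } else {
            return n;                // high bit clear: this was the last byte
        }
    }
}
```

For example, the single byte 0x32 decodes to 50, which DecompressAmount turns into 5000000000 satoshis, and the two bytes 0x80 0x00 decode to 128.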
ScriptPubKey size and ScriptPubKey fields
Current Bitcoin serialization uses a compressed format to store common types of scripts:
- Pay to Pubkey Hash (P2PKH) is encoded to 20 bytes
- Pay to Script Hash (P2SH) is encoded to 20 bytes
- Pay to Pubkey (P2PK) is encoded to 32 bytes
Other script types are serialized using the general rule shown in the table:
- scriptPubKey size
- scriptPubKey data
Here the script size is a VARINT field holding len(scriptPubKey data) + the number of special cases, where the number of special cases is 6 (as an int); the regular scriptPubKey data follows it.
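To make the size rule concrete, here is a small sketch of how a reader might work out how many bytes of script data follow the size field. It assumes, based on my reading of Bitcoin Core's compressor code, that size value 0 marks P2PKH, 1 marks P2SH, and 2 to 5 mark the P2PK cases. `ScriptDataLength` and `kSpecialScripts` are illustrative names.

```cpp
#include <cstdint>

const uint64_t kSpecialScripts = 6;  // number of special cases (size values 0..5)

// Returns how many bytes of scriptPubKey data follow a decoded size field.
uint64_t ScriptDataLength(uint64_t nSize)
{
    if (nSize == 0 || nSize == 1)
        return 20;  // P2PKH key hash or P2SH script hash
    if (nSize < kSpecialScripts)
        return 32;  // P2PK: 32 bytes of key data (cases 0x02..0x05)
    return nSize - kSpecialScripts;  // general rule: raw script of this length
}
```

So a size field of 31 would announce a 25-byte uncompressed script, while a size field of 0 announces a bare 20-byte hash.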
The compression algorithm removes all OP\_CODE values from the common types of scripts and leaves only the Key, KeyHash, or ScriptHash intact. Then it is using the
https://i.stack.imgur.com/NfiEs.png
P2PK cases: 0x02 and 0x03 identify a compressed Public Key and indicate which Y coordinate to use to reconstruct the full key. 0x04 and 0x05 identify an uncompressed Public Key, but the value in the
For other script types be sure to keep in mind that
A small addition to the script compression: while looking through the Bitcoin Core compress.cpp/.h files, you may find that during compression the byte values 20 and 33/65 are being manipulated without being identified as any opcode.
That’s because in Bitcoin scripting a one-byte value between 0x01 and 0x4b, that is, smaller than OP_PUSHDATA1 (0x4c = 76), is interpreted as an instruction to push that many of the following bytes onto the stack; this is called an immediate push.
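As an illustration of the immediate push, here is a sketch that rebuilds a full P2PKH scriptPubKey from the 20-byte key hash kept by the compressed format. The opcode byte values are the standard ones; `BuildP2PKH` is a hypothetical helper name, not a Bitcoin Core function.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Rebuilds a P2PKH scriptPubKey from its 20-byte key hash.
std::vector<uint8_t> BuildP2PKH(const std::vector<uint8_t>& hash)
{
    if (hash.size() != 20)
        throw std::invalid_argument("expected a 20-byte hash");
    std::vector<uint8_t> script;
    script.push_back(0x76);  // OP_DUP
    script.push_back(0xa9);  // OP_HASH160
    script.push_back(0x14);  // immediate push: 0x14 = 20 < 0x4c, push the next 20 bytes
    script.insert(script.end(), hash.begin(), hash.end());
    script.push_back(0x88);  // OP_EQUALVERIFY
    script.push_back(0xac);  // OP_CHECKSIG
    return script;
}
```

The resulting script is 25 bytes long, and the 0x14 at index 2 is the "20" that shows up in the compression code without an opcode name. In the same way, 33 (0x21) and 65 (0x41) are the immediate pushes for compressed and uncompressed public keys.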
As I’m not that good at modern C++, it was much easier for me to follow the compression algorithm in its Golang implementation:
https://github.com/btcsuite/btcd/blob/master/blockchain/compress.go
and the VarInt logic in Python:
https://github.com/christianb93/bitcoin/blob/1241778dc51bd39022611916f90f1bfaf07ee331/btc/serialize.py#L54
Greg wrote a really nice explanation of VarInt that I found very easy to follow. Thank you.
https://learnmeabitcoin.com/guide/varint
Yes, everything is in the code of the original Bitcoin Core implementation, and I think the maintainers and developers are doing an epic job of keeping it nice and clear. Because I was lacking some C++ skills, I could not understand it properly at first and had to spend some time on it. I just hope my notes will help someone speed up their learning process. Good luck.