Author

Topic: Did you know bitcoin uses 6 different ways to represent integers (Read 411 times)

brand new
Activity: 0
Merit: 0
How you store data is completely independent of the consensus rules (and behaviour of other nodes).

You can cut about 25% off the storage space by switching to a more efficient transaction serialization and no other design or layering or consensus changes. You don't need anyone else's permission or coordination, you can just do it.

People have implemented this previously (e.g. blockstream did so) but apparently haven't considered it worth the effort to make it production ready and submit it to the project.  At the end of the day it's only a small constant factor reduction and the blockchain keeps growing at a considerable rate.

Trying to foist those considerations onto consensus rules would both be bad engineering and politically foolish. Concerns which can be separated should be, work on consensus logic is hard enough without hyper-milling the storage and transmission formats.

Consensus only cares about how data is presented to hash functions and hash functions are so fast (like <167 picoseconds per byte for large inputs with sha-ni on a 3GHz quad core) that the serialization into hashes isn't a particularly important optimization target.
staff
Activity: 3458
Merit: 6793
Just writing some code
It's important to note that while you can choose to use different encodings for P2P and storage, consensus does require the transactions be serialized in a specific way for hashing and for weight calculations. So while you could serialize blocks and transactions using more compact integer encodings, you still need to be able to serialize everything correctly for txid computation and weight calculation.

2. Variable length big endian
This is only used for bigger values in signatures (R and S) using signed notation (most significant bit indicates sign) and public keys (X and Y coordinate). They are always preceded by their length using StackInt (scripts) or CompactInt (witnesses) format.

4. DerInt
(Not an official name) This method is used in DER encoding and can encode from 0 to 21008 length integers. This is only used for encoding signatures (only uses up to 33 bytes lengths). It indicates the length part in DER's Tag-Length-Value encoding scheme.
With Schnorr signatures, we're finally getting rid of these. The signatures will be the 64 byte "compact" signature. The R and s values will be a fixed 32 bytes and none of the DER stuff surrounding it. Also the sighash byte can be ommitted for SIGHASH_ALL (which is the sighash type used by almost every transaction).
legendary
Activity: 1135
Merit: 1166
Block size changes a very painful question as we know, I think that it is incorrect to compare such serious changes as the block size and a simple change in the encoding integers  that do not make any actual changes to consensus.

Yes exactly, that's what I wanted to express.  As gmaxwell stated, implementing an optimisation in the serialisation format of transactions is something that can be done purely locally (for disk space savings) or on the P2P level (for bandwidth).  It need not be done at all on the consensus level, so there's no need to wait for the next softfork with it.

Only if you want to also increase tx throughput would you need to affect consensus at all - but in that case, I think those two changes should be done independently.  I.e. first optimise storage/bandwidth without a fork, and then (if you can find agreement) argue for a pure block-size increase of the corresponding amount.
member
Activity: 148
Merit: 45
https://bitaps.com/

And if you want to also increase transaction throughput (which I think would certainly be a contentious issue, even if we did optimisations like you suggest at the same time), then I think the cleanest way to do it would be to simply suggest an increase in the official, consensus-level (i.e. uncompressed) block size - once the optimisations have been agreed upon and rolled out for disk storage and P2P networking already.

I do not think this change is important, but if in the case of any future soft fork comes in and changes the structure of the transaction, it can be used as a free bonus to important changes.

Block size changes a very painful question as we know, I think that it is incorrect to compare such serious changes as the block size and a simple change in the encoding integers  that do not make any actual changes to consensus.
legendary
Activity: 1135
Merit: 1166
How you store data is completely independent of the consensus rules (and behaviour of other nodes).

You can cut about 25% off the storage space by switching to a more efficient transaction serialization and no other design or layering or consensus changes. You don't need anyone else's permission or coordination, you can just do it.

People have implemented this previously (e.g. blockstream did so) but apparently haven't considered it worth the effort to make it production ready and submit it to the project.  At the end of the day it's only a small constant factor reduction and the blockchain keeps growing at a considerable rate.

Trying to foist those considerations onto consensus rules would both be bad engineering and politically foolish. Concerns which can be separated should be, work on consensus logic is hard enough without hyper-milling the storage and transmission formats.

Consensus only cares about how data is presented to hash functions and hash functions are so fast (like <167 picoseconds per byte for large inputs with sha-ni on a 3GHz quad core) that the serialization into hashes isn't a particularly important optimization target.

Yes, but compressing blockchain this not only about storing data on node also it increase capacity of block and reduce bandwidth consumption. 6% of block size will give about +150 transaction per block. On 2 segwit soft fork was changed transaction format and in case we apply simple replacement for var_int we will got +150 additional tx per block.

Saving bandwidth is also something that can be done without affecting the consensus rules - for that, you only need to update the P2P protocol version.  Obviously, then you can't just do it privately, but at least it can be done without any kind of fork.

And if you want to also increase transaction throughput (which I think would certainly be a contentious issue, even if we did optimisations like you suggest at the same time), then I think the cleanest way to do it would be to simply suggest an increase in the official, consensus-level (i.e. uncompressed) block size - once the optimisations have been agreed upon and rolled out for disk storage and P2P networking already.
member
Activity: 148
Merit: 45
https://bitaps.com/
How you store data is completely independent of the consensus rules (and behaviour of other nodes).

You can cut about 25% off the storage space by switching to a more efficient transaction serialization and no other design or layering or consensus changes. You don't need anyone else's permission or coordination, you can just do it.

People have implemented this previously (e.g. blockstream did so) but apparently haven't considered it worth the effort to make it production ready and submit it to the project.  At the end of the day it's only a small constant factor reduction and the blockchain keeps growing at a considerable rate.

Trying to foist those considerations onto consensus rules would both be bad engineering and politically foolish. Concerns which can be separated should be, work on consensus logic is hard enough without hyper-milling the storage and transmission formats.

Consensus only cares about how data is presented to hash functions and hash functions are so fast (like <167 picoseconds per byte for large inputs with sha-ni on a 3GHz quad core) that the serialization into hashes isn't a particularly important optimization target.

Yes, but compressing blockchain this not only about storing data on node also it increase capacity of block and reduce bandwidth consumption. 6% of block size will give about +150 transaction per block. On 2 segwit soft fork was changed transaction format and in case we apply simple replacement for var_int we will got +150 additional tx per block.
staff
Activity: 4326
Merit: 8951
How you store data is completely independent of the consensus rules (and behaviour of other nodes).

You can cut about 25% off the storage space by switching to a more efficient transaction serialization and no other design or layering or consensus changes. You don't need anyone else's permission or coordination, you can just do it.

People have implemented this previously (e.g. blockstream did so) but apparently haven't considered it worth the effort to make it production ready and submit it to the project.  At the end of the day it's only a small constant factor reduction and the blockchain keeps growing at a considerable rate.

Trying to foist those considerations onto consensus rules would both be bad engineering and politically foolish. Concerns which can be separated should be, work on consensus logic is hard enough without hyper-milling the storage and transmission formats.

Consensus only cares about how data is presented to hash functions and hash functions are so fast (like <167 picoseconds per byte for large inputs with sha-ni on a 3GHz quad core) that the serialization into hashes isn't a particularly important optimization target.
member
Activity: 148
Merit: 45
https://bitaps.com/
3 years ago!!
It could be implemented in segwit softfork new transaction format that was 2 years ago.
I think we should try to force this small improvements to bitcoin developers in case any future softfork.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
Replacement CompactInt to more efficient integer compression algorithm in combination with store output value as compressed integer will reduce blockchain size to 6% about (15 GB)

There are lots of things that could be done to compress the blockchain.
I had a similar idea with CompactInts: https://bitcointalksearch.org/topic/increase-block-size-5-by-using-compactsize-instead-of-uint-1700405
And My naive implementation 3 years ago: https://github.com/Coding-Enthusiast/BitcoinBlockSizeChange
member
Activity: 148
Merit: 45
https://bitaps.com/
Replacement CompactInt to more efficient integer compression algorithm in combination with store output value as compressed integer will reduce blockchain size to 6% about (15 GB)

This is example of compression algorithm with unary prefix means number of used bytes for value (more flexible and effective than bitcoin CompactInt(var_int)


Code:
def int_to_c_int(n, base_bytes=1):
    """
    Convert integer to compressed integer
    :param n: integer.
    :param base_bytes: len of bytes base from which start compression.
    :return: bytes.
    """
    if n == 0:
        return b'\x00' * base_bytes
    else:
        l = n.bit_length() + 1
    if l <= base_bytes * 8:
        return n.to_bytes(base_bytes, byteorder="big")
    prefix = 0
    payload_bytes = ceil((l)/8) - base_bytes
    a=payload_bytes
    while True:
        add_bytes = floor((a) / 8)
        a = add_bytes
        if add_bytes>=1:
            add_bytes+=floor((payload_bytes+add_bytes) / 8) - floor((payload_bytes) / 8)
            payload_bytes+=add_bytes
        if a==0: break
    extra_bytes = int(ceil((l+payload_bytes)/8) - base_bytes)
    for i in range(extra_bytes):
        prefix += 2 ** i
    if l < base_bytes * 8:
        l = base_bytes * 8
    prefix = prefix << l
    if prefix.bit_length() % 8:
        prefix = prefix << 8 - prefix.bit_length() % 8
    n ^= prefix
    return n.to_bytes(ceil(n.bit_length() / 8), byteorder="big")
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
As you may know there are many numeric values used in the bitcoin blockchain, each representing a different thing: version, script lengths, locktime,... and since a block is just a sequence of octets (bytes) that is transferred between nodes and stored on disk, we have to convert these integer values into octet strings (byte arrays) and back.
But what you may not have ever noticed is that in bitcoin, depending on what that integer represents a different approach is chosen for its conversion to bytes, resulting in 6 different ways of encoding integers!

The following is an example transaction from BIP-143 containing 5 out of 6 methods listed below with each one highlighted according their type:

01000000000102fff7f7881a8099afa6940d42d1e7f6362bec38171ea3edf433541db4e4ad969f
00000000494830450221008b9d1dc26ba6a9cb62127b02742fa9d754cd3bebf337f7a55d114c8e
5cdd30be
022040529b194ba3f9281a99f2b1c0a19c0489bc22ede944ccf4ecbab4cc618ef3ed01
eeffffffef51e1b804cc89d182d279655c3aa89e815b1b309fe287d9b2b55d57b90ec68a010000
00
00ffffffff02202cb206000000001976a9148280b37df378db99f66f85c95a783a76ac7a6d59
88ac9093510d000000001976a9143bde42dbee7e4dbe6a21b2d50ce2f0167faa815988ac000247
304402203609e17b84f6a7d30c80bfa610b5b4542f32a8a0d5447a12fb1366d7f01cc44a022057
3a954c4518331561406f90300e8f3358f51928d43c212a8caed02de67eebee
0121025476c2e831
88368da1ff3e292e7acafcdb3566bb0ad253f62fc70f07aeee6357
11000000





1. Fixed length little endian
This is the easiest and most common way. It is used for block/transaction version, block time, block target, block nonce, TxIn.Outpoint.Index, TxIn.Sequence and TxOut.Amount and locktime.
The integer will be converted to a little endian byte array of fixed 4 bytes length, with the exception of TxOut.Amount which is 8 bytes.

2. Variable length big endian
This is only used for bigger values in signatures (R and S) using signed notation (most significant bit indicates sign) and public keys (X and Y coordinate). They are always preceded by their length using StackInt (scripts) or CompactInt (witnesses) format.

3. CompactInt
This is a special format used in bitcoin only that can encode from 0 to 264-1 values. It is used for script lengths, input/output count, witness item count and witness item length.

4. DerInt
(Not an official name) This method is used in DER encoding and can encode from 0 to 21008 length integers. This is only used for encoding signatures (only uses up to 33 bytes lengths). It indicates the length part in DER's Tag-Length-Value encoding scheme.

5. StackInt
(Not an official name) This method is used in bitcoin only to indicate length of the data that is supposed to be pushed onto the stack. It can encode from 0 to 232-1 values.

6. Short form integers inside scripts
In bitcoin script language, the stack is an array of bytes. Sometimes these bytes could be interpreted as integers, or an integer could be pushed to the stack. To do that a special format is used:
if there is an OP code for the value (OP_0, OP_NegativeOne,...) that single byte is used.
if there is no OP code, integer is converted to byte array in little endian order in shortest form (no extra zeros) and sign is determined based on most significant bit.

Example: How does 254 (0b11111110) look like in each encoding?
Spoiler (select/highlight to see text):
1. 0xfe000000
3. 0xfdfe00
4. 0x81fe
5. 0x4cfe
6. 0xfe00
Jump to: