Where's the transaction Merkle Tree stored?

Shymaa-Arafat

full member

Activity: 228

Merit: 156

A similar Q & A recently appeared in Bitcoin stack-exchange
(to put it all together for the interested reader)

https://bitcoin.stackexchange.com/questions/110982/merkle-tree-recalculation-during-transaction-lookup

Quote from: achow101 on September 26, 2021, 01:41:44 PM

Merkle trees are not stored because there is no need to store them. BIP 37 is the only protocol that sends merkle branches, and that is in the process of being deprecated. At this point in time, it doesn't make sense to optimize for a protocol that is going out of use and is already disabled by default.

In order to serve a BIP 37 client, the full nodes needs to iterate all of the transactions in the block to see if they match the client's filter. While doing so, they can compute the merkle branch on the fly for each transaction. It's not particularly hard nor computationally expensive. In fact, it is possible that computing the merkle branch is faster than it would be to read data from disk. Disk I/O tends to be extraordinarily slow.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: Shymaa-Arafat on September 26, 2021, 02:45:07 PM

It's not that much of space to be stored in disk (won't exceed 128K like on top of block Txs).
However, if BIP37 and SPV asking full nodes for proves is a decaying process that's about to be replaced then maybe u r right no need for that.

But the space saving in the array storage is well justified in Utreexo or any UTXOS trees

Each of the leaf pairs of a merkle branch are combined/hashed together to get the hash for the parent merkle branch (if there is only one leaf then it is duplicated for the purposes of pair combining).

Where the leaves here are just transaction hashes.

The branch pairs get hashed again, reducing the number of hashes by half each time,!until you are left with just the merkle root.

In this sense, the merkle root is cryptographic proof thst represents the branch (as in practice, each merkle root will have only one tree that makes it).

Shymaa-Arafat

full member

Activity: 228

Merit: 156

It's not that much of space to be stored in disk (won't exceed 128K like on top of block Txs).
However, if BIP37 and SPV asking full nodes for proves is a decaying process that's about to be replaced then maybe u r right no need for that.

But the space saving in the array storage is well justified in Utreexo or any UTXOS trees

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Merkle trees are not stored because there is no need to store them. BIP 37 is the only protocol that sends merkle branches, and that is in the process of being deprecated. At this point in time, it doesn't make sense to optimize for a protocol that is going out of use and is already disabled by default.

In order to serve a BIP 37 client, the full nodes needs to iterate all of the transactions in the block to see if they match the client's filter. While doing so, they can compute the merkle branch on the fly for each transaction. It's not particularly hard nor computationally expensive. In fact, it is possible that computing the merkle branch is faster than it would be to read data from disk. Disk I/O tends to be extraordinarily slow.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

Quote from: garlonicon on September 26, 2021, 11:04:15 AM

Quote

But that's for miners when building the block, nothing assures it is saved afterwards, or that the same is done by full nodes

Because there is no need to store that data. If full node has a list of transaction hashes for some block, it is possible to reconstruct merkle tree in memory quite fast. All you need is a list of transaction hashes, based on that you can build a merkle tree when it is needed and verify any SPV proof. Typical PC can compute thousands of hashes per second, the minimum difficulty assumes 32 leading zeros in SHA256d per ten minutes, if you have a block with two, three or four thousand transactions, you can quickly compute all needed hashes.

But u don't compute it just once, u don't receive only 1 SPV request. Now I recall a student asking Tadje Dyraj in SPV lectures why should full nodes answer to these requests, it is true his answer was 3/2018 and many things have changed including speed, but he said answering consumes I think about 30% of his processing time.

Now why not save this time with just a minor extra storage overhead?If u saved pointers and used the already stored TX hashes, u r only adding a space overhead of N-1 hashes (I don't think there is a block with more than 4k TXs * 32byte = ~128k byte per block, u can keep them for just a margin window of 2-3 or max 6 blocks; never exceeds 1Mbyte)

In UTXOS Merkle for Stateless or lightning nodes I suggested saving space because it's about 600MB (and I'm still saving the pointers space here), but here when the space is just 128KB or 1MB at max why not trade it with time?

garlonicon

copper member

Activity: 944

Merit: 2257

Quote

But that's for miners when building the block, nothing assures it is saved afterwards, or that the same is done by full nodes

Because there is no need to store that data. If full node has a list of transaction hashes for some block, it is possible to reconstruct merkle tree in memory quite fast. All you need is a list of transaction hashes, based on that you can build a merkle tree when it is needed and verify any SPV proof. Typical PC can compute thousands of hashes per second, the minimum difficulty assumes 32 leading zeros in SHA256d per ten minutes, if you have a block with two, three or four thousand transactions, you can quickly compute all needed hashes.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

From here
https://en.bitcoin.it/wiki/Getblocktemplate
The Merkle hashes is a compact 1D array representation of the Merkle Tree

Quote

import hashlib
def dblsha(data):
return hashlib.sha256(hashlib.sha256(data).digest()).digest()

txnlist = [coinbase] + [binascii.a2b_hex(a['data']) for a in template['transactions']]
merklehashes = [dblsha(t) for t in txnlist]
while len(merklehashes) > 1:
if len(merklehashes) % 2:
merklehashes.append(merklehashes[-1])
merklehashes = [dblsha(merklehashes + merklehashes[i + 1]) for i in range(0, len(merklehashes), 2)]
merkleroot = merklehashes[0]

But that's for miners when building the block, nothing assures it is saved afterwards, or that the same is done by full nodes

Shymaa-Arafat

full member

Activity: 228

Merit: 156

I did read the BIP, what I understand is the proof ( not the tree) gets computed in demand from an existing tree.

Quote

Thus, a merkleblock message is a block header, plus a part of a merkle tree which can be used to extract identifying information for transactions that matched the filter and prove that the matching transaction data really did appear in the solved block. Clients can use this data to be sure that the remote node is not feeding them fake transactions that never appeared in a real block, although lying through omission is still possible.

Also it contains a description of how the proof is computed by traversing the tree

Quote

Constructing a partial merkle tree object
Traverse the merkle tree from the root down, and for each encountered node:
Check whether this node corresponds to a leaf node (transaction) that is to be included OR any parent thereof:
.
.
.

This talks about an already existing tree

I found it
from this reference
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch07.html
& tracing the push_back function in C++, it's used in Vector data structure.
I knew before that TXIDs are stored as a vector

Quote

std::vector vtx;

https://github.com/bitcoin/bitcoin/blob/7fcf53f7b4524572d1d0c9a5fdc388e87eb02416/src/primitives/block.h#L66

But I didn't knew it's the Merkle Tree; ie the compact array storage (with no pointers) is already used to store the Bitcoin Merkle Tree
It seems that is what is called serialization

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: Shymaa-Arafat on September 24, 2021, 01:09:55 AM

So there is a Merkle Tree stored in this case? Where it is stored?

No, it is still not stored. It is computed on demand based on what the other peer asked for with the filters it has set. Read the BIP for details.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

Quote from: pooya87 on September 23, 2021, 11:07:01 PM

Quote

There are no messages to get some part of a merkle tree at protocol level (between SPV server and full node).

BIP-37 is doing this. A full node that has NODE_BLOOM flag can supply merkleblock messages where it just sends the header plus part of the merkle tree.

So there is a Merkle Tree stored in this case? Where it is stored?

pooya87

legendary

Activity: 3472

Merit: 10611

Quote from: vjudeu on September 23, 2021, 05:44:28 AM

SPV clients are connected with SPV servers, not with full nodes.

It doesn't have to. It is just easier to have a full node add an additional index to their database (ie. blockchain) so that it can be searched quickly. Otherwise there can be a SPV client that connects to [regular] full nodes and gets its information that way.

Of course there are multiple different implementations, mostly like Electrum rely on an "indexed full node", some depend on a single centralized server, some rely on [regular] full nodes.

Quote

There are no messages to get some part of a merkle tree at protocol level (between SPV server and full node).

BIP-37 is doing this. A full node that has NODE_BLOOM flag can supply merkleblock messages where it just sends the header plus part of the merkle tree.

vjudeu

copper member

Activity: 909

Merit: 2301

Quote

Sorry this one refers to script witness not to a Merkle proof, but I think the first that replies to get data does refer to the Merkle proof

Quote

As you can see, there are no messages related to SPV proofs, they are available only between SPV clients and SPV servers, full nodes know nothing about them.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

In the link I found

tx_witness[] A list of witnesses, one for each

It's in tx that is sent in reply to get data

There is also a witness field in the object type of Inventory vectors
MSG_WITNESS_TX
that says in the table see BIP144 for more details
https://github.com/bitcoin/bips/blob/master/bip-0144.mediawiki
Sorry this one refers to script witness not to a Merkle proof, but I think the first that replies to get data does refer to the Merkle proof

vjudeu

copper member

Activity: 909

Merit: 2301

Quote

I mean when an SPV contacts full nodes

SPV clients are connected with SPV servers, not with full nodes.

Quote

It's not logic that they rebuild the Merkle Tree each time?

SPV servers need all data, so there is no case where they ask about some part of merkle tree. There are only messages to get block header, to get transaction hashes stored in a block, and to get the whole transactions. There are no messages to get some part of a merkle tree at protocol level (between SPV server and full node).

Edit: More information: https://en.bitcoin.it/wiki/Protocol_documentation#Message_types. As you can see, there are no messages related to SPV proofs, they are available only between SPV clients and SPV servers, full nodes know nothing about them.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

I mean when an SPV contacts full nodes to verify their transactions, how do full nodes respond to them with the proof? It's not logic that they rebuild the Merkle Tree each time?

vjudeu

copper member

Activity: 909

Merit: 2301

Quote

I was told that full nodes don't store such a Merkle Tree; then where it is stored?

Nowhere, it is computed when it is needed.

Quote

I mean it must be stored somewhere to answer say SPV & alike trying to check their coins?

You don't need to store merkle tree if you have all data used to construct it. The whole blocks are stored, there are just block headers and transactions, nothing else is needed. If you have all transactions in a block, then you can hash each transaction and build merkle tree for that block, then you can check that merkle hash is correct. Full nodes don't need SPV features, because they have all data. Also, if some node is pruned, then still, it has N last blocks with all transactions, there is never a case where full node (pruned or not) has some "partial" block: for each block there are only headers or headers and all transactions.

Edit: Also note that SPV clients are indirectly connected with full nodes: there are for example Electrum servers and they require some level of trust, because they just tell you "this address has this balance, these transaction outputs are connected with this address", and so on. SPV clients just ask SPV servers for data, not full nodes. Only SPV servers talk to full nodes, and that servers need all data, so there is no case where full node need some part of a merkle tree to be stored permanently.

Shymaa-Arafat

full member

Activity: 228

Merit: 156

Thru my post
https://bitcointalksearch.org/topic/merkle-tree-with-no-pointers-5360009

(& Although it was originally for Utreexo Pollard CSN)I suggested one could examine the possibility of applying this to the block Transaction Merkle Tree too.

-I was told that full nodes don't store such a Merkle Tree; then where it is stored?

I mean it must be stored somewhere to answer say SPV & alike trying to check their coins?

Also to be sure I got it right, CSN stands for Client Side Chain?

Topic: Where's the transaction Merkle Tree stored? (Read 212 times)