Author

Topic: Merkle Tree with no pointers (Read 208 times)

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 06, 2021, 09:10:12 AM
#9
In fact, from this reference
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch07.html
& tracing the push_back function in C++, it's used in Vector data structure. I knew before that TXIDs are stored as a vector
std::vector vtx;
https://github.com/bitcoin/bitcoin/blob/7fcf53f7b4524572d1d0c9a5fdc388e87eb02416/src/primitives/block.h#L66

But I didn't knew it's the Merkle Tree; ie the compact array storage is already used to store the Bitcoin Merkle Tree
It seems that is what is called serialization

On a related note, this isn't really serialization - that would be if you compact the data representation to a string or byte sequence that can be sent over the network or to a file. A std::vector data structure cannot be sent over the network without conversion to string, and even then that would depend on whether the element type - CTransactionRef - is serializable.
full member
Activity: 228
Merit: 156
October 06, 2021, 08:52:14 AM
#8
 I think I should post parts of the Utreexo team reply, especially Tadje Dyraj, and you can read the whole thing in here
https://github.com/mit-dci/utreexo/discussions/316
( The bottom line, their forest in the bridge node does use the compact format with no pointers; he was talking about Pollard nodes which stores only a portion ie not a complete tree. Also maps is used as dictionary bet 2 sorting orders of UTXOS hashes & birth)

Quote
@Shymaa-Arafat, you've mostly posted about storing complete trees. We already have a implementation for that, called forest in the code, which doesn't use any pointers.

The positionMap in the forest struct is there to find a position within the forest given an outpoint hash.

(One downside of inserting elements in the order they show up in the blocks is that we need an additional lookup table to find them, but only the bridge node needs that.)

I really don't know why didn't they just say that from the beginning. Anyways, I had to deliver the info since I discussed it here too
full member
Activity: 228
Merit: 156
September 24, 2021, 06:21:47 AM
#7
In fact, from this reference
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch07.html
& tracing the push_back function in C++, it's used in Vector data structure. I knew before that TXIDs are stored as a vector
std::vector vtx;
https://github.com/bitcoin/bitcoin/blob/7fcf53f7b4524572d1d0c9a5fdc388e87eb02416/src/primitives/block.h#L66

But I didn't knew it's the Merkle Tree; ie the compact array storage is already used to store the Bitcoin Merkle Tree
It seems that is what is called serialization
full member
Activity: 228
Merit: 156
September 19, 2021, 04:28:37 AM
#6
There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
           1
       2          3
    4    5     6     7
  8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.


You probably mean the 1D array presentation I mentioned and said it's known in the literature for so long, maybe I forgot to add the links to MIT Algorithms lecture 8 & the screenshots in this post but they're in the Utreexo GitHub discussion
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/lecture-notes/index.htm
min 15 I think in this video
https://youtu.be/Xnpo1atN-Iw


(I have a browser problem in adding images here)
https://user-images.githubusercontent.com/83260375/133132854-bb10362f-7c82-4b14-9341-b20b1e01a9ec.jpg
https://user-images.githubusercontent.com/83260375/133132783-1c2a8f66-4950-44f7-98f9-234759b6605b.jpg

Anyways, we kind of stand on the same ground; ie no need to store pointers we can save this storage space.
It's just that I think one row for each tree level would be easier to handle as we have kind of an extra un-ordinary tree traversal needed to fetch the proof thru sibling.
"For me", not necessarily for everyone else, it's easier this way to write the math indexing and the merge or split of full trees when insert or delete in the Utreexo forest design
legendary
Activity: 4522
Merit: 3426
September 19, 2021, 04:01:22 AM
#5
There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
            1
       2          3
    4    5     6     7
  8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.
full member
Activity: 228
Merit: 156
September 17, 2021, 05:47:41 AM
#4
I may add that 2D arrays are eventually stored in RAM as 1D array (contiguous physical locations) in row major order. So whether we should use this 2D representation or the classic 1D described in Algorithms & data structures literature depends on whether we prefer more locality of reference for breadth (my 2D) or depth (traditional 1D) traversal.
I think we use both, but we prefer to benefit from UTXOS (leaves) stored contiguous.
Anyways, that's a detail decision both don't need pointers.
.
Also, to be honest I got mixed up about where the TX Merkle Tree is stored. I know full nodes don't need it as they have the full blocks with TXs in it, but then who send the proofs to SPV? whomever does that must keep it & can benefit from the space reduction of moving the pointers
full member
Activity: 228
Merit: 156
September 14, 2021, 08:37:05 PM
#3
Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.

First, thank you Sir for replying:

1-I was not sure about Transactions Merkle Trees, that's why I asked at the last paragraph; if not stored for each block, then no use in TX Merkle Tree. As I said, I originally suggested it for the Utreexo design.
-but in any case, where is the transaction Merkle Tree is stored?since I think the number of transactions do not deviate too much from powers of 2. With the current average it could be 2¹¹ or 2¹¹ + 2⁹, from what I understand the numbers will increase after applying Schnorr signature but still near powers of2; and even easier to handle than UTXOS as it's not dynamic.

2-The Utreexo design works by keeping logN roots of only complete full binary trees (a forest) to make sure there's at max 2logN nodes in any proof: all the roots + the regular proof from the tree the UTXO in
(it's like a binary representation of a number, u can always partition a number to a summation of powers of2. So, whatever way he is storing the trees in his forest they are always full binary trees.

3-I don't understand what u meant about the example; if u mean it's not the typical way they compact a tree to 1D array in data structures literature, I made it this way to visualize it more easy; when u insert or delete u need to look the true parent (for 0&1 it's 8), while in the proof fetching which I wrote a Pseudocode u need 9 for 0&1 (not 8), and 8 for 2&3.
They did say that was one of their problems in addition to space saving (if no care for space saving, they would have stored 3pointers :2 to children & 1 to aunt or niece)
So this way they can reach parent/aunt/sibling/... whatever they want with the right math indexing.

4-I thought this 2D (like half pyramid side for the forest, where the 2D array looks like a horizontal right angle triangle layer for each power of2 tree) is easier and do the job of saving the pointers. If they can compact it even more into 1D and get all the math indexing, shift when insert/delete correct ( I'm not sure which makes less shifts with insert/delete the 2D or 1D)fine, but the way it seems they still use pointers.
staff
Activity: 4326
Merit: 8951
September 14, 2021, 07:22:25 PM
#2
Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.
full member
Activity: 228
Merit: 156
September 14, 2021, 03:46:40 PM
#1
I wasn't sure if this point is suitable to this group, as I think it involves data structures not just Bitcoin.
The idea was first suggested in response to a discussion in the Utreexo project, when storage space in the Pollard CSN node is not as free as in a bridge server.
The tree needs at least 2N pointers for the internal nodes (non leaves), if we add a flag to distinguish leaves with different struct & took care of the messy tree traversals we have to do in getting the proof in addition to deletions & insertions.
https://github.com/mit-dci/utreexo/discussions/316
....
-At the end they decided to use a pointer to aunt


type polNode struct {
   data     Hash
   aunt    *polNode
}



map[miniHash]*polNode



map[uint64]*polNode


.
I came up with this idea of storing the tree simply as a car size 2D array with no pointers at all, just use math indexing to traverse any direction u like

I draw a figure,
https://user-images.githubusercontent.com/83260375/132422816-0fc6605b-9986-4c13-93a2-769ad573bb5e.jpg

And from his 7 nodes example

 14
  |----------------------\
12                        13
|----------\               |--------\
08         09         10        11
|---\       |---\        |---\       |---\
00  01  02  03  04  05  06  07

R[0]=0,1,2,...,7
R[1]=8,9,10,11
R[2]=12,13
R[3]=14
.
the proof becomes
//direction to know + or -
If ((i mod 2)==0) drct=1;
else drct=-1;
// first, the sibling node
proof=R[0,i+drct]

//add the rest thru loop
For(j=1; j≤logN; j++)
{ index= i/(2^j)+drct;
proof=Add(R[j,index]);
}
.
Now, I'm asking what's wrong with my design?I mean if it's correct why didn't they save 2N =2*76m=152M*pointer size in bytes???
(for a pointer to address a space of 150m node it must be at least 28bit so it can't be less than4 bytes, ie 600M saving of ur 4G~15%)
In addition, it's more compact and neat coding
I mean there must be a reason
...
-Last but not least, would the transaction Merkle Tree benefit from this idea, or it's not stored this way?
Jump to: