Author

Topic: Merkle Trees - How many leaf nodes in a tree with N total nodes? (Read 2640 times)

member
Activity: 111
Merit: 10
Examples:

       6
  4       5
1   2   3  {3}

Total nodes = 6
Leaf nodes = 3



       7
  5       6
1   2   3   4

Total nodes = 7
Leaf nodes = 4


             11
      9                10
  6       7         8      {8}
1   2   3   4   5  {5}

Total nodes = 11
Leaf nodes = 5



              15
      13              14
  9       10      11      12
1   2   3   4   5   6   7   8

Total nodes = 15
Leaf nodes = 8

Would it be possible to optimise the way that these trees are built?
It doesn't seem quite right hashing the final leaf nodes with themselves when you encounter odd numbers.

How about this:
 For each 1 in the binary representation of the number of elements in a set (number of leaf nodes) you build a complete merkle tree. (if there is an odd number then the final tree will just be a single element). Then hash the root of the smallest tree with the root of the second smallest. Hash the result of this with the root of the third smallest etc. until you reach the root of the largest tree and this is your block hash.

 As an example, to represent a set of 5 leaf nodes represented in binary by 101 I would build two trees (in fact the final i isn't a tree but just the single element)

 So it would look like this.

                         H1234.5
           H12.34
      H1.2       H3.4
   1       2     3       4             5

Only 4 hashes compared to 6 above.

 So for the number 7  (111) we build three trees (one is a single element)

                                H1234.567
        H12.34                               H56.7
     H1.2     H3.4           H5.6     
    1     2    3     4        5     6                     7
 One hash less.

 I haven't really examined this idea but at first glance it seems more efficient in number of hashes and in some cases more efficient for validation of the elements. However it also seems too simple so what is wrong with it? Smiley




 


 
sr. member
Activity: 467
Merit: 267
Sure, no problem. Thanks
member
Activity: 85
Merit: 11
If the number of leaves is a power of 2, the total number of nodes is 2n - 1.
If it's not, there are extra nodes introduced to fill in the holes. Let m be the number of these extra nodes. The total number of nodes become 2n -1+m.
When n is written in binary form, a division by two is equivalent to a bit shift to the right. As long as the last digit is a 0, the number is even.
Therefore all the trailing zeros are levels that are even and don't introduce new nodes.
When we reach a 1, a extra node is added. It is carried to the next level. Ex: 101. Without extra nodes, we have 1 at the root. 10 on the second level and 101 on the third. but the extra node made the 3rd level 110. And the second level is now 11. It's odd again, so an extra node is once again added. And it becomes 100.
 We can see that the process of adding nodes continues for every 0 that appears after the first 1. Because the carry made the number odd.
When we reach a 1. The carry turns that into a 0 and carry a 1 further. Therefore the 1 has no effect of the number of extra nodes.
In conclusion, the number of extra nodes is 1 + the number of zeros that are to the left of the first 1 that is not the most significant bit.
The final result is 2n + number of non trailing zeros.



Credits goes to Meni for the formula.




Thanks that is a great explanation of how it works. If you don't mind I'd like to use it (or a possibly edited version) in my article with credit to you. The article is about binary merkle trees in general including a method to store/retrieve them in insertion order as well as a balanced merkle trie design (credit catia/cryptonite). It will be published openly (if I hopefully get around to finishing it).
sr. member
Activity: 467
Merit: 267
If the number of leaves is a power of 2, the total number of nodes is 2n - 1.
If it's not, there are extra nodes introduced to fill in the holes. Let m be the number of these extra nodes. The total number of nodes become 2n -1+m.
When n is written in binary form, a division by two is equivalent to a bit shift to the right. As long as the last digit is a 0, the number is even.
Therefore all the trailing zeros are levels that are even and don't introduce new nodes.
When we reach a 1, a extra node is added. It is carried to the next level. Ex: 101. Without extra nodes, we have 1 at the root. 10 on the second level and 101 on the third. but the extra node made the 3rd level 110. And the second level is now 11. It's odd again, so an extra node is once again added. And it becomes 100.
 We can see that the process of adding nodes continues for every 0 that appears after the first 1. Because the carry made the number odd.
When we reach a 1. The carry turns that into a 0 and carry a 1 further. Therefore the 1 has no effect of the number of extra nodes.
In conclusion, the number of extra nodes is 1 + the number of zeros that are to the left of the first 1 that is not the most significant bit.
The final result is 2n + number of non trailing zeros.



Credits goes to Meni for the formula.


member
Activity: 85
Merit: 11
It is fairly straight forward to calculate N from M so (bitcoin does this) so it shouldn't be a problem.

Bitcoin method (approx):

Code:
int GetNodeCount(int leafCount) {
    int nodeCount = 1;
    for (int i = leafCount; i > 1; i = (i + 1) >> 1)
        nodeCount += i;
    return nodeCount;
}
donator
Activity: 2058
Merit: 1054
But M is unknown so how can you tell that N is correct?

I'm struggling to think tonight Smiley

Edit: Oh I see. I get it now Smiley That is a big simplification. I will check which is faster as I don't have to decrement as many times with my method. Your's is infinitely more explainable though.

When (if) I finish my article I will credit you
Thanks Smiley But we still need to verify the NNTZ method works.
member
Activity: 85
Merit: 11
But M is unknown so how can you tell that N is correct?

I'm struggling to think tonight Smiley

Edit: Oh I see. I get it now Smiley That is a big simplification. I will check which is faster as I don't have to decrement as many times with my method. Your's is infinitely more explainable though (most likely faster too).

When (if) I finish my article I will credit you
donator
Activity: 2058
Merit: 1054
What I meant with counting downwards:

Let's say N=135. You start with 135/2 = 67. You first try M=67, using the method you get N=138 which is wrong. Then you try M=66, giving N=136. Then you try M=65, this gives N=135 so it is the correct result.

To clarify, I'm not sure my method works, if you post more values using your implementation, I can test out mine.
member
Activity: 85
Merit: 11
If I'm not mistaken, to find the total number of nodes N from the leaf nodes M, you have
N = 2M + number of non-trailing 0's in M's binary representation.

With the only exception is M being a power of 2, in which case N = 2M-1.

For example, if M = 66, the binary expansion is 1000010. It has 1 trailing zero and 4 non-trailing 0's, so you have N = 2M+4 = 136.

To do the reverse, you could just start with N/2 and count downwards (unless a power of 2).

This is very cool Thanks. I'm still thinking about it could you explain a bit more about the count downwards? I mean we know N, so starting with N/2 we keep subtracting 1 from N until what?
N = 2M + NNTZ (num non-trailing zeroes)
M = (N - NNTZ)/2
Just not getting it sorry.

I was working it out in a much more complicated way using what I call the waltz series (wolframalpha link) The graph basically represents a binary tree. If you squash the graph from the top downwards it is an efficient way of storing the nodes linearly for a read/write structure. And then to get the linear waltz index of any node:

C#
Code:
public static ulong ToWaltzIndex(int level, ulong offset)
{
    return (offset * (1u << (level + 1))) + (1u << level) - 1;
}

And to get the leaf count:

Code:
/// 
///     We know that:
///     * All positive integers are valid leafCounts
///     * Only certain values are valid nodeCounts.
///     * There is a 1:1 mapping between leafCount:nodeCount
///     * When listed in order, the differences between the valid nodeCount values follows the waltz series
///     * When the tree is complete (aka symetric) then: leafCount = (nodeCount + 1) / 2
///     Therefore:
///     * The difference between any nodeCount and the nodeCount of a complete tree with the same height
///     must be the sum of K elements of the waltz series.
///     * nodeDiff = sum(waltz[0],...,waltz[K-1])
///     * X = GetWaltzSumCount(nodeDiff)
///     * leafCount = maxLeafCount - X
///

public static ulong GetLeafCount(ulong nodeCount)
{
//max possible leaf count at the current height
ulong maxLeafCount = Pow2Prev(nodeCount);

//max possible total node count at the current height
ulong maxNodeCount = (maxLeafCount * 2) - 1;

//Get the difference between the current node count and the max node count
ulong nodeDiff = maxNodeCount - nodeCount;

//if numNodes equals maxNodes then leafCount must equal maxLeafCount
if (nodeDiff == 0)
return maxLeafCount;

//Subtract the number of summed waltz elements (steps) that are required to equal nodeDiff, from maxLeafCount
return maxLeafCount - GetWaltzSumCount(nodeDiff);
}

And for completeness. the supporting functions (GetWaltzSumCount):
Code:
///     Returns the number of elements that need to be added in order to reach the target value. 
///
/// The target result to emulate
/// The number of consecutive elements that need to be added together before reaching the target
public static ulong GetWaltzSumCount(ulong target)
{
ulong counter = 0;
ulong sum = 0;

while (sum < target)
sum += GetWaltzValue(++counter) + 1;

if (sum > target)
counter--;

return counter;
}

///
///     A waltz value is the amount of times a number can be halved without leaving a remainder. These
///     values (series) are useful for navigating and storing binary tree structures.
///

/// A positive int greater than 0
public static uint GetWaltzValue(ulong number)
{
//Covers all unsigned 64bit values
const uint maxDivisions = 64;

ulong num = number;

for (uint i = 0; i <= maxDivisions; i++)
{
if ((num % 2) == 1)
return i;

num = num / 2;
}

throw new IndexOutOfRangeException("Could not find Waltz value for number: " + number);
}

donator
Activity: 2058
Merit: 1054
If I'm not mistaken, to find the total number of nodes N from the leaf nodes M, you have
N = 2M + number of non-trailing 0's in M's binary representation.

With the only exception is M being a power of 2, in which case N = 2M-1.

For example, if M = 66, the binary expansion is 1000010. It has 1 trailing zero and 4 non-trailing 0's, so you have N = 2M+4 = 136.

To do the reverse, you could just start with N/2 and count downwards (unless a power of 2).
sr. member
Activity: 467
Merit: 267
Are you looking for an exact number? If an approximation is ok, it's simply half. We can prove it like this:

A merkle tree is a binary tree which is essentially a direct elimination tournament. When two hashes are combined, one hash is produced.
So if you start with N leaves and keep on combining nodes, every time you reduce the number by 1. To get to a single node, therefore we have
to do N-1 combinations and the total of nodes is then roughly 2N. This is true, regardless where you combine nodes in the tree.
However, the merke tree also adds nodes when there is an odd number of nodes in a given level. You have at most log_2(N) extra nodes added, i.e.
one per level of your tree. So the number is between 2N and 2N + log2(N). As N grows, the term is log is negligeable.
I'll try to find an exact formula.


   
member
Activity: 85
Merit: 11
AKAIK this is basically 1-to-1 (aka a bijection). I've solved all these problems and I'll be publishing an article soon on merkle trees including the binary merkle trie used in cryptonite.
sr. member
Activity: 406
Merit: 251
http://altoidnerd.com
I'm sure there is an elegant way to do this but I've wasted too much time and I can manage in code (messy). Just in case anyone is up for the challenge, here is some extra data:

Leaf Nodes,Total Nodes
1,1
2,3
3,6
4,7
5,11
6,12
7,14
8,15
9,20
10,21
11,23
12,24
13,27
14,28
15,30
16,31
17,37
18,38
19,40
20,41
21,44
22,45
23,47
24,48
25,52
26,53
27,55
28,56
29,59
30,60
31,62
32,63
33,70
...
63,126
64,127
65,135
66,136
67,138


Is this a bijection? Equivalently is there only one way to draw a tree with k total nodes?
member
Activity: 85
Merit: 11
Getting closer. This gets the total node count from the leaf count (opposite of what I'm trying to do)

Code:
int GetNodeCount(int leafCount) {
    int nodeCount = 1;
    for (int i = leafCount; i > 1; i = (i + 1) >> 1)
        nodeCount += i;
    return nodeCount;
}
member
Activity: 85
Merit: 11
I'm sure there is an elegant way to do this but I've wasted too much time and I can manage in code (messy). Just in case anyone is up for the challenge, here is some extra data:

Leaf Nodes,Total Nodes
1,1
2,3
3,6
4,7
5,11
6,12
7,14
8,15
9,20
10,21
11,23
12,24
13,27
14,28
15,30
16,31
17,37
18,38
19,40
20,41
21,44
22,45
23,47
24,48
25,52
26,53
27,55
28,56
29,59
30,60
31,62
32,63
33,70
...
63,126
64,127
65,135
66,136
67,138
member
Activity: 85
Merit: 11
If the tree is complete (IE. 1/2/4/8/16/... leaf nodes) then the formula is:

Leaf Count = (N + 1) / 2



member
Activity: 85
Merit: 11
Examples:

       6
  4       5
1   2   3  {3}

Total nodes = 6
Leaf nodes = 3



       7
  5       6
1   2   3   4

Total nodes = 7
Leaf nodes = 4


             11
      9                10
  6       7         8      {8}
1   2   3   4   5  {5}

Total nodes = 11
Leaf nodes = 5



              15
      13              14
  9       10      11      12
1   2   3   4   5   6   7   8

Total nodes = 15
Leaf nodes = 8
member
Activity: 85
Merit: 11
How many leaf nodes are there in a Merkle tree that has N total nodes?

Can anyone help me out with this? Assume the Merkle tree is implemented just as it is for Bitcoin transactions.

Any help greatly appreciated  Smiley
Jump to: