It'll probably not be worth the effort to integrate it into your perl library directly (being in Python and on top of SQLAlchemy, etc.), but you can port it, or maybe you'll just find it a useful reference as you write your own.
EDIT: I suppose the codebase might be intimidating to someone who not versed in Python or the SQLAlchemy framework. You'll find the insertion/update/deletion code in 'sqlalchemy_tree/orm.py', and the higher-level methods for manipulating the tree in 'sqlalchemy_tree/manager/{class_,instance}.py'.
I'll have a look at it someday. But meanwhile I've been working on neat stuff and I'd like to share it to everyone. It's not implemented in SQL right now, but I should do it soon.
Here it is.
I've spent quite some time reading the work of a guy named Vadim Tropashko on
this subject. It's quite fascinating, and I think his technique would be
perfect for a SQL model of the bitcoin block structure.
Tropashko uses rational values as borders for his intervals. One of the main
advantage is that you can work on a fix global interval (0 to 1 for instance):
you don't have to shift half of the block tree each time you insert a new node.
One cool aspect of this technique is the relation with continued fractions.
As you know any fraction can be written as a finite continued fraction:
p 1
- = ---------------
q 1
a + -----------
1
b + -------
c + ...
The idea is that the a, b, c, ... sequence provides a natural path for the
node's branch.
For instance, a sequence such as 1, 1, 2, 1 would mean: "take the second branch
at the third node from the root".
Now, nested intervals are all about intervals, so we can not be happy with just
one rational for each node. We need an upper limit, assuming the rational
described above is the lower limit (it is not necessarly the case though, but
we'll deal with this problem later).
The idea is to associate a Möbius function of a variable x ranging from 0 to 1
(same as our global interval):
1
M(x) = ---------------
1
a + -----------
1
b + -------
c + x
Then we have p/q = M(0). The parent of f/p is then M(infinity).
This is actually already enough to understand that M(x) can be written:
p + r x
M(x) = -------
q + s x
Where r/s is the parent node written as a rational.
I have no exact proof for that, but considerations on limits at infinity
and value at 0 clearly indicates it should be true.
Without the possibility to write M(x) like this, it would be pretty tough
to compute any M(x) value in SQL.
Anyway, what is important to remember is that to compute M(x) we only need
two rational numbers: the one from the actual lowest edge of the node, and
the one from its direct parent.
The only thing which is lacking is a way to get the first child. M(1) wouldn't
do as this is actually the first 'sibling'.
Tropashko then gets into quite a lot of complicated stuff and I had a lot of
difficulties grasping them, and when I managed to understand them I was
wondering if they would be useful anyway. Yet there is one remark that he
makes at the very end of his last article, a remark that can make, in my
humble opinion, the whole thing much easier.
Tropashko remarks that simple continued fractions are notoriously known
not to be monotonuous. Depending of the parity of the position of one of the
number a, b, c..., sometimes the whole value increases with the number,
sometimes it decreases.
He then proposes to use positive continued fractions
(mind the minus signs):
1
M(x) = ---------------------
1
a + 1 - -------------
1
b + 1 - ---------
c + 1 - x
Althouth this expression seems more complicated at first glance,
it is actually much easier to visualize and to work with.
The main reason for this is that it's easy to convince yourself that this
function of x is growing (I mean: x > y => M(x) > M(y) ).
This changes everything. We now can be sure that the lowest
edge for an interval will be M(0), and the highest edge will be M(1).
We'll usually identify a node with M(0).
Parent is still M(infinity), so we can still write the function as:
p - r x
M(x) = -------
q - s x
The only difference are the minus signs, but they are not a problem.
The first child of M(0) is easy to spot: it's M(1/2). The second child (first
sibling of M(1/2)) is M(1/3), the third is M(1/4), and so on.
The first sibling of M(0) is M(-1), second sibling is M(-2), and so on.
Here is a small part of a genealogy tree, to make things clearer:
M(infinity)-->M(0)-------->M(1/2)--> M(1/(2-1/2)) --> ...asymptotically ends at M(1)
| | | |
| \-->M(-1) | \->M(1/3)----> M(1/(3-1/2)) --> ...asymptotically ends at M(1/2)
| |
\---->M(-2) \->M(1/4)---> M(1/(4-1/2))--> ...asymptotically ends at M(1/3)
|
\-> M(1/(4-1/3))--> ...asymptotically ends at M(1/(4-1/2))
.
.
.
Then there is still a catch: how do you compute the parent from the lower edge
of the interval? Normally, as Tropashko explains it, you'd have to use the
extended euclidian algorithm, but we actually don't need to do that for a
bitcoin block tree, since each parent is present already in the database
anyway. We can therefore retrieve the parent rational value, and use it to
compute the Möbius function, and thus the rest of the tree as we need it.
So, to summarize, it's pretty easy. Each node tree will have two rational
numbers, defining an interval. If the lower edge of node A is inside the
interval of node B, then B is an ancestor of node A. To add a child to a node,
we just need to get the parent and compute the Möbius function for x = 1/(n+2),
where n is the number of already existing children. It shouldn't be difficult
to get n, we basically just need to make a loop and test for the existence of M(1/(k+2))
in the database.
Moving a whole branch should not be too difficult either. Imagine we have an
orphan block with children, and then suddenly the parent block of the orphan is
added to the database. To patch the new branch to the database, we need to
change intervals for each node of the previously orphan branch. It sure will
cost more than adding a single child to a leaf, but hopefully this situation
does not occur too often anyway.
One cool thing with the fact that positive continued fractions are increasing
functions, is that we will not have to store the ordinal value (height, depth,
however you name it) of each node. If we sort our queries by rational lower
edges, then we'll automatically get the sequencial order of our chain.