Topic: A procedure for operators to vote for custodians on Layer 2 networks

TL;DR - We can make a subset of Taproot leaves that carry MuSig aggregate keys by analyzing the network availability of the nodes that want to be global funds custodians.

This algorithm really has little to do with voting at all - for lack of a better word to describe what's going on at a high level, I chose to use "voting".

As the topic title says, I'm publishing an experimental procedure for operators to vote among themselves on which operators shall hold custody of keys on a Layer 2 network. The keys will be in MuSig format, so there are no issues with privacy or (provided N is large) rogue operators. In practice, there will be thousands of co-signers.

Important: Node operators != users. It is possible to design a layer where users can send and receive funds on the network without being an operator or running a node - similar to SPV wallets on Layer 1. I do not make any kind of assumption that users must run a node, or vice versa.

Abstract of the problem:

Assume a Layer 2 network that operates by operators holding collective custody of the funds. Think of it as a kind of "socialism" where the network of operators will collectively make an aggregated transaction to the mainnet at a particular interval - and of course anyone can become an operator by running some node software.

Let N be the number of operators, and let M be some number greater than half of N (to ensure fairness by majority).

Now, N-of-N multisignatures are impractical in any L2 network, since they are vulnerable to a single node failing. In practice, you'd use an M-of-N MuSig. But since I don't know whether that is supported natively, it can be emulated by using Taproot addresses with nCr(N,M) leaves - one leaf per possible set of M cosigners.

Keep in mind that the number of combinations is at its maximum when M is close to half of N (the ideal value of M for such an L2 network). There is no space issue resulting from this, because you don't need to store all leaves & branches in one place.

However, there is a time issue when using all those combinations, because generating them will quickly overwhelm any realistic processing budget. For example, with 1000 operators and a desired 500 of them as cosigners, you'd have nCr(1000,500) (roughly 2.7E+299) leaves to add to a transaction. For reference, a single 2GHz processor can execute at most about 2E+9 simple x86 instructions per second (things like MOV, ADD, SUB and so on), and when you factor in the hundreds of assembly instructions needed to generate a Merkle tree hash, this time capacity shrinks quickly.
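To get a feel for the numbers, here is a quick sanity check (Python; the 2E+9 instructions-per-second figure is only the rough single-core budget from the paragraph above):

Code:
from math import comb

N = 1000                     # operators
M = 500                      # desired cosigners per leaf (roughly N/2, the worst case)

leaves = comb(N, M)          # nCr(N, M) Taproot leaves if every combination were enumerated
ops_per_second = 2e9         # rough single-core instruction budget

print(f"nCr({N},{M}) = {leaves:.3e} leaves")
print(f"even at 1 leaf per instruction: {leaves / ops_per_second:.3e} seconds")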

When faced with these numbers, the solution is, instead of making all combinations of cosigners, to make as many diversified combinations of online cosigners as your processing power allows. Diversified combinations of cosigners minimize the number of times any single cosigner appears across all of them, which in turn minimizes single points of failure.

Parameters:

To calculate a node's availability, you sample whether the node is online at regular points over a time period T (say 10 minutes), and then take the average. The result will be between 0 and 1, because each sample is either 0 (offline) or 1 (online). Call this variable the availability coefficient, or just Ai.
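A minimal sketch of that averaging, assuming the node keeps a list of 0/1 samples collected over the window T:

Code:
def availability(samples):
    """Availability coefficient Ai: mean of the 0/1 online samples taken over period T."""
    if not samples:
        return 0.0            # never observed -> treat as fully offline
    return sum(samples) / len(samples)

print(availability([1, 1, 0, 1, 1, 1]))   # 5 of 6 samples online -> 0.8333...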

Next you need the ping time between that node and each other node, measured in milliseconds (because that is the prevailing unit for ping). By definition, there will be N^2 of these variables, so they will be identified as Pi,j. Ping time can be measured using traceroute(1) code. You will also need a constant Pmax that serves as a ping time threshold - a pair of nodes with a ping time above this value won't be considered when making a list of cosigners (see below).

Last, you'll need what I call a "singularity coefficient". It decreases with the number of times the node has already been used in a MuSig configuration. The idea is to limit the number of MuSig configurations a node can be in, to guard against centralization and single points of failure. It starts at 1, and each time the node is included in a configuration, the coefficient is reduced by 1/N and the new value is used in all subsequent calculations. Its lowest possible value is 0. We'll call this variable s.
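Under my reading of the paragraph above, the update rule is simply "start at 1, subtract 1/N per inclusion, never go below 0":

Code:
def updated_singularity(s, n_operators):
    """Reduce a node's singularity coefficient after it joins one more MuSig configuration."""
    return max(0.0, s - 1.0 / n_operators)

s = 1.0
for _ in range(3):            # node picked for three configurations in a row
    s = updated_singularity(s, 1000)
print(s)                      # approximately 0.997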

Algorithm:

First of all, a node has to calculate its ping time relative to all the other nodes on the network, to fill in the Pi,j. So, assuming node i is trying to find its own ping times, it will fill in its own particular row of P.
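For example (Python sketch; measure_ping is a hypothetical probe, e.g. something wrapping traceroute or an ICMP echo, returning a round-trip time in milliseconds):

Code:
def fill_ping_row(peers, measure_ping):
    """Node i's own row of the P matrix: Pi,j for every peer j, in milliseconds."""
    return [measure_ping(peer) for peer in peers]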

After all the nodes have their ping times, each one calculates a "vote" or "score" for every other node using the following formula (written in terms of j, because i represents the node doing the calculation for its peers):

Si,j = log10[(Aj * sj * Pmax)/Pi,j]

Where Si,j represents the score. The score is supposed to represent how reliable the connection between the two nodes is, and depends on the other node's uptime and on the ping time between them. The logarithm lets the node filter out peers that do not meet minimum network connection quality requirements, since it will disregard all scores <= 0. For example, if the other node has zero uptime, i.e. 100% downtime, it will be completely disregarded because the zero availability sends the logarithm to minus infinity. Now, if I set the max ping time at 200ms, the node will completely disregard peers that are 100% up with a ping time of 200ms or more, peers that are 50% up with a ping time >= 100ms, and so on. The system is designed to punish node operators who remain offline, because the availability of funds for all users depends on the cosigners signing the transaction quickly.
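Putting the formula and the filtering rule together (Python sketch; the names are mine, and a score <= 0 simply means "disregard this peer"):

Code:
from math import log10, inf

def score(a_j, s_j, p_ij, p_max=200.0):
    """Si,j = log10((Aj * sj * Pmax) / Pi,j); values <= 0 mean the peer is disregarded."""
    if a_j <= 0 or s_j <= 0 or p_ij <= 0:
        return -inf                     # zero availability/singularity (or bad ping data)
    return log10((a_j * s_j * p_max) / p_ij)

print(score(1.0, 1.0, 50.0))    # fully available, 50ms ping -> about 0.60 (kept)
print(score(1.0, 1.0, 200.0))   # ping exactly at the threshold -> 0.0 (dropped)
print(score(0.5, 1.0, 120.0))   # 50% uptime, 120ms ping -> below 0 (dropped)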

Note: N must be > 1 for any of this to work, because we are working in pairs. Think of it like edges on a shape: a line segment has two points and one edge, a triangle has 3 edges, a quadrilateral 4, etc. These shapes resemble the paths that a MuSig[2] payload has to take to reach all of the cosigners, except that we also take uptime and singularity into account. So having the highest score across M nodes is very important. It's basically a giant fully connected graph.

Each node i sends its Si,j to all the other operators, where j is all the numbers from 1 to N (I'm using 1-based indices here) - essentially a vector of scores for this operator relative to all the others.

At this point, the nodes have all the information they need to determine all the best sets of M cosigners:

1. First, the Si,j that are <= 0 are deleted, along with the corresponding node entries in the received dataset. This corresponds to deleting undesirable edges in the graph.
2. Then Dijkstra's algorithm (the shortest-path algorithm for graphs) or a variant of it is used to find the set of M nodes with the largest score - for this we temporarily take the reciprocal of the scores to convert it into a "smallest value" problem. We make this set of M nodes a MuSig[2] configuration. A bloom filter or the like could be used to eliminate duplicate configurations.
3. The M nodes that formed the most recent MuSig configuration have their singularity coefficients reduced by 1/N, and the scores are recalculated for the updated nodes.
4. Repeat steps 2 and 3 until the maximum number of Taproot leaves that is feasible to parse has been generated.

The result represents the most diversified set of valid combinations of cosigners which can sign a transaction, adjusted based on node availability.
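Here is a rough sketch of steps 1 to 4 from a single node's point of view (Python). It is not a faithful Dijkstra variant - as a stand-in it just greedily takes the M best-scoring peers each round - but it shows how the singularity penalty pushes later leaves toward different cosigners:

Code:
from math import log10

def build_configurations(availability, ping, n, m, p_max=200.0, max_leaves=10):
    """Greedy stand-in for steps 1-4: repeatedly pick the M best-scoring peers,
    penalise their singularity, and keep only combinations not seen before."""
    singularity = [1.0] * n
    seen = set()                        # duplicate filter (the post suggests a bloom filter)
    leaves = []
    while len(leaves) < max_leaves:
        scores = []
        for j in range(n):
            a, s, p = availability[j], singularity[j], ping[j]
            if a <= 0 or s <= 0 or p <= 0 or p >= a * s * p_max:
                continue                # step 1: score would be <= 0, drop this peer
            scores.append((log10((a * s * p_max) / p), j))
        if len(scores) < m:
            break                       # not enough eligible peers left to form a leaf
        chosen = tuple(sorted(j for _, j in sorted(scores, reverse=True)[:m]))
        if chosen not in seen:
            seen.add(chosen)
            leaves.append(chosen)       # step 2: one more MuSig configuration / Taproot leaf
        for j in chosen:                # step 3: penalise reuse of the same cosigners
            singularity[j] = max(0.0, singularity[j] - 1.0 / n)
    return leaves

# toy run: 6 operators, leaves of 3 cosigners each, seen from one node
avail = [1.0, 0.9, 1.0, 0.6, 1.0, 0.8]
pings = [20.0, 35.0, 50.0, 90.0, 150.0, 60.0]
print(build_configurations(avail, pings, n=6, m=3, max_leaves=5))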

Resource usage:

CPU time is variable - you can max it out if you want; it all depends on the number of combinations you want to generate. Keep in mind that as the number of nodes increases, the number of combinations explodes, so you will generally only want to deal with the most diversified and reliable combinations of nodes.

Memory usage is very small - storing this info by itself should not exceed a few megabytes regardless of programming language. The same goes for disk usage.

Network usage is variable and depends on the time period T and the number of nodes. Each time period, a node will send N messages, so assuming messages are 32 bytes each [this is subject to change], with 1000 nodes a node will send 32KB to the other nodes per time period, but it will receive about (N^2)*(message size) bytes from all the other nodes in total - 32MB per time period with these parameters. So it's very important that the time period is large enough to avoid excessive network traffic. In a practical network with N ~= 10000, these numbers would be 320KB sent and 3.2GB received per time period. Considering the size of the Bitcoin network itself, I doubt that any L2 network will grow larger than that unless something radical happens in Bitcoin and people flock to run a Bitcoin Core node.
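The figures above are just N and N^2 times the message size; a quick check with the same 32-byte placeholder message size:

Code:
def traffic_per_period(n_nodes, message_size=32):
    """Approximate bytes sent (own score vector) and received (every peer's vector) per period."""
    sent = n_nodes * message_size             # one 32-byte score per peer, broadcast once
    received = n_nodes ** 2 * message_size    # roughly N vectors of N scores arriving
    return sent, received

for n in (1_000, 10_000):
    sent, received = traffic_per_period(n)
    print(f"N={n}: {sent / 1e3:.0f} KB sent, {received / 1e6:.0f} MB received")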

Advantages:

- Trustless; does not rely on any centralized entity to bring about law and order
- Similarly, it does not depend on any time-sensitive timelocks or penalties
- Independent of the block generation time, so the set of cosigner combos can always be adjusted periodically to ensure cosigners are always available
- Does not bloat the main chain with extra fields and protocols, a big plus for Bitcoin purists - by contrast, most other scalability proposals, including brickchain which is being discussed on the mailing list right now, require additions to the blockchain.

Disadvantages:

- Operators must be online all the time, otherwise their score will plummet (is this even a disadvantage? No one should use offline operators anyway)
- Network usage can be excessive if you don't tune the parameters
- If max ping is configured too low, you could accidentally restrict nodes to voting only for nodes that are geographically very close to them, e.g. in the same country - making the whole formula vulnerable to government censorship.

Further work:

- Maybe make the time period vary based on the number of nodes?
- Research algorithm implementations to make this run as fast as possible (time to pull out Donald Knuth's stack of books).



Discussion welcome.

Keep in mind that this isn't a scalability protocol by itself; it is merely an algorithm for voting for the custodians of such a network. Think of it as Liquid, but without the centralized nodes. :)

Edited to correct formula