Topic: Ultimate blockchain compression w/ trust-free lite nodes - page 4.

legendary
Activity: 905
Merit: 1012
An index keyed by (txid:n) will have to be maintained for block validation anyway. My current plan is to have one index (hash(script), txid:n) -> balance for wallet operations, and another (txid:n) -> CCoins for validation.
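
Purely as an illustration of the two keyings described above - a rough Python sketch with hypothetical names, and sha256 standing in for whatever hash(script) ends up being:

Code:
import hashlib

def validation_key(txid: bytes, n: int) -> bytes:
    # (txid:n) -> CCoins-style record, consulted when validating spends
    return txid + n.to_bytes(4, "little")

def wallet_key(script: bytes, txid: bytes, n: int) -> bytes:
    # (hash(script), txid:n) -> balance, consulted for wallet/address queries
    return hashlib.sha256(script).digest() + txid + n.to_bytes(4, "little")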

Transactions within blocks are processed in-order, and so cannot depend on later transactions.
sr. member
Activity: 461
Merit: 251
@retep, here's roughly how I imagine sharding a node could be done without diluting hashing power across multiple separate chains (that sounds terrible!):

First I'll assume we include in the block headers the digest of a utxo tree keyed by (txid:n, script) instead of by (script, txid:n), as this will turn out to be much more natural, for this purpose at least.  Second, I'll assume the tx digest is created from the authenticated prefix tree of the txids, which will also turn out to be much more natural.  (Last-minute thought: doesn't the tx ordering matter in the usual tx Merkle tree, i.e. earlier txs can't spend TxOuts created by later txs?  Or can it just be assumed that the block is valid if there exists some valid ordering, which is up to the verifier to construct?)  The radix size turns out not to matter, but let's call it k.

Distributed block construction

Division of labor is as follows: we have a coordinator who directs the efforts of N well-mirrored branch curators who separately update each of the utxo tree branches below level log_k(N), and process subblocks of any number of transactions.

A branch curator downloads the incoming txs whose txids lie in his particular branch.  Notice that due to our convenient choice of keying, all of his newly created TxOuts will lie in his own branch.  For each TxIn in a given tx, he needs to download the corresponding TxOut from his relevant well-mirrored counterparts. Note that TxOuts will always be uniquely identifiable with only a few bytes, even for extremely large utxo sets. Also, having to download the TxOuts for the corresponding TxIns isn't typically that much extra data, relatively speaking - ~40 bytes/corresponding TxOut, compared to ~500 bytes for the average tx having 2-3 TxIns.  With just these TxOuts, he can verify that his txs are self-consistent, but cannot know whether any given TxOut has already been spent.
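
To make the routing concrete, here's a toy Python sketch (hypothetical, and assuming radix k = 16 so each trie level consumes one nibble of the txid) of how a tx is matched to its branch curator:

Code:
def branch_of(txid: bytes, levels: int = 2) -> int:
    # With radix k = 16, the first `levels` hex digits of the txid pick the
    # branch (0 .. 16**levels - 1); keying the utxo tree by (txid:n, script)
    # means every TxOut a tx creates lands in that same branch.
    return int(txid.hex()[:levels], 16)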

This is where the coordinator comes into play.  He cycles through the N branches, and for each branch, nominates one of the curator mirrors that wishes to submit a subblock.  This branch curator then gathers a bunch of self-consistent txs, and compresses the few-byte ids of their TxIns into a prefix tree.  He sends his respective counterparts - or rather, one of their mirrors who is up to date with the previous subblock - the appropriate branches, and they send back subbranches of those that are invalid with respect to the previous subblock.  Note that this communication is cheap - a few bytes per tx.  He then removes the invalid txs from his bunch, informs his counterparts of the TxIns that remain so they can delete the corresponding utxos from their respective utxo tree branches, deletes those relevant to him, inserts all of his newly created TxOuts into his utxo tree branch, and builds his tx tree.  He submits his tx and utxo tree root hashes to the coordinator, who also gathers the other branch curators' updated utxo tree root hashes.  This data is used to compute the full tx and utxo tree root hashes, which are then finally submitted to miners.
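
A minimal sketch of that screening step in Python, with hypothetical helpers (query_branch stands in for the round-trip to a counterpart's mirror, and plain sets stand in for the prefix-tree branches):

Code:
from collections import defaultdict

def screen_candidates(candidate_txs, query_branch):
    # candidate_txs: dicts like {"spends": [short (txid:n) ids, ...], ...}
    # query_branch(branch, ids): the counterpart owning `branch` returns the
    # subset of `ids` that are unknown or already spent.
    by_branch = defaultdict(set)
    for tx in candidate_txs:
        for short_id in tx["spends"]:
            by_branch[short_id[:1]].add(short_id)   # route by the id's first byte
    invalid = set()
    for branch, ids in by_branch.items():
        invalid |= query_branch(branch, ids)
    # keep only the txs none of whose inputs were rejected
    return [tx for tx in candidate_txs if not (set(tx["spends"]) & invalid)]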

When the coordinator has cycled through all N branches, he goes back to the first, who, we note, can perform very efficient updates to his existing tx tree.

Some notes:

  • Mutual trust between all parties was assumed in lots of ways, but this could be weakened using a fraud detection and punishments scheme - ingredients being e.g. authenticated sum trees, fidelity bonds, and lots of eyes to audit each step.  Trusted hardware or SCIP proofs at each step would be the ideal future tech for trust-free cooperation.
  • The job of the coordinator is cheap and easy.  The branch curators could all simultaneously replicate all of its functions, except nominating subblock submissions.  For that they'd need a consensus-forming scheme.  Perhaps having miners include in their coinbase a digest of their preferred next several subblock nominees, and broadcast sub-difficulty PoW, would be a good alternative.
  • Subblock nominees could be selected by largest estimated total fee, or estimated total fee / total size of txs, or some more complicated metric that takes into account changes to the utxo set size.
  • Revision requests for a chain of subblocks could be managed such that the whole chain will be valid when each of the subblocks comes back revised, thus speeding up the rate at which new blocks can be added to the chain.
  • Nearby branch curators will have no overlap in txs submitted, and very little overlap in utxos spent by them (only happens for double spends).

Distributed block verification

To catch up with other miners' blocks, branch curators would download the first few identifying bytes of the txids in their respective branches, to find which txs need to be included in the update.  The ones they don't have are downloaded.  Then in rounds, they would perform collective updates to the tx and utxo trees, so that txs that depend on previous txs will all eventually be covered.  If by the end the tx and utxo tree root hashes match those in the block header, the block is valid.
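
A small sketch of the first step (Python, illustrative only; the prefix length is an assumption):

Code:
PREFIX_LEN = 6  # bytes; assumed long enough to be unique in practice

def to_download(block_prefixes, known_txids):
    # block_prefixes: first PREFIX_LEN bytes of every txid in my branch of the new block
    # known_txids: full txids I already have from relay; anything unmatched must be fetched
    known = {txid[:PREFIX_LEN] for txid in known_txids}
    return [p for p in block_prefixes if p not in known]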

Future tech: branch curators would instead simply verify a small chain of SCIP proofs. :)

Additional note: branch curators can additionally maintain an index of (script, txid:n) for their branch, in order to aid lightweight clients doing lookups by script.
sr. member
Activity: 461
Merit: 251
I've pushed an in-memory hybrid PATRICIA-Briandais tree implementation to github:

https://github.com/maaku/utxo-index
Cool!

On second thought, I don't think the radix size really matters too much for sharding the node.  The choice of keying OTOH...
legendary
Activity: 905
Merit: 1012
I've pushed an in-memory hybrid PATRICIA-Briandais tree implementation to github:

https://github.com/maaku/utxo-index

I may experiment with the internal structure of this tree (for example: different radix sizes, script vs hash(script) as key, storing extra information per node). 2-way tries probably involve way too much overhead, but I think a convincing argument could be made for 16-way tries (two levels per byte). Once I get a benchmark runner written we can get some empirical evidence on this.
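
As a rough sketch of what a 16-way node might look like (toy Python, not the structure in the repo, and recomputing hashes naively where a real tree would cache them):

Code:
import hashlib

class Node16:
    # One node of a 16-way trie: a child slot per nibble, i.e. two levels per key byte.
    def __init__(self):
        self.children = [None] * 16
        self.value = None            # serialized coin data at a leaf, else None

    def digest(self) -> bytes:
        h = hashlib.sha256()
        for child in self.children:
            h.update(child.digest() if child else b"\x00" * 32)
        if self.value is not None:
            h.update(self.value)
        return h.digest()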

Having sub-trees isn't so much about address reuse as it is that two different keys are needed: the key is properly (script, txid:n). In terms of implementation difficulty I don't think it's actually that much more complicated. But again, we can empirically determine this.
sr. member
Activity: 461
Merit: 251
Regarding having nested subtries for coins with the same output scripts, I wonder if it's such a good idea to complicate the design like this in order to accommodate address reuse?  Address reuse is discouraged for privacy and security reasons, and will become increasingly unnecessary with the payment protocol and deterministic wallets.

Also, was there a verdict on the 2-way (bitwise) trie vs. 256-way + Merkle trees in each node?  I've been thinking lately about sharding block creation/verification, and am noticing the advantages of the bitwise trie since its updates require a much more localized/smaller set of data.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
For reference on the synchronization question, see one of my previous posts.  It was a thought-experiment to figure out how to download the Reiner-tree between nodes, given that the download will take a while and you'll get branch snapshots at different block heights:

https://bitcointalksearch.org/topic/m.1408410

I just wanted to make sure it wasn't something to be concerned about (like all sorts of hidden complexity).  It looks like it's workable.
legendary
Activity: 905
Merit: 1012
If you are talking about pairs of adjacent blocks all you've achieved is making validating the chain possibly a bit cheaper,

*A lot* cheaper. But anyway:

those creating the blocks still need to have the full UTXO set.
To create a transaction you only need access to your own inputs. Why would you need the full UTXO set?

Going back to the merkle tree thing it occurs to me that achieving synchronization is really difficult. For instance if the lowest level of the tree is indexed by tx hash, you've achieved nothing because there is no local UTXO set consensus.
Can you explain this?
legendary
Activity: 1120
Merit: 1152
Here's a rough sketch of another concept:

Suppose you have 2*k blockchains where each blockheader is actually the header of two blocks, that is, chain n mod 2*k and chain (n+1) mod 2*k.  In English: picture a ring of blockchains, where miners would "mine" pairs of adjacent chains.

The rule is that the height of any adjacent pair of chains can differ by no more than 1 block, and finding a valid PoW creates a pair of blocks with an equal reward in each chain. Because the miners get the equal reward they have an incentive to honestly mine both chains, or they'd produce an invalid block and lose that reward. To move coins between one chain and its neighbor, create a special transaction doing so, which will be validated fully because a miner will have full UTXO set knowledge for both chains. Of course, this means it might take k steps to actually get a coin moved from one side of the ring to the other, but the movement will be fully validated the whole way around.
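
A toy sketch of that height rule (Python, purely illustrative):

Code:
def can_mine_pair(heights, n):
    # heights[i] is the current height of chain i in the ring of 2*k chains.
    # A PoW on pair n extends chains n and n+1 (mod 2*k); it is only valid if
    # afterwards no adjacent pair of chains differs in height by more than one.
    ring = len(heights)
    new = list(heights)
    new[n % ring] += 1
    new[(n + 1) % ring] += 1
    return all(abs(new[i] - new[(i + 1) % ring]) <= 1 for i in range(ring))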

Again, what used to be a 51% attack can now become something much weaker. On the other hand, because the data to store the PoW's and block headers (but not full blocks) is small, PoW's for one pair of chains can include the hashes of every chain, and the system can simply treat that extra PoW as an additional hurdle for an attacker to rewrite any individual chain. What a 51% attack on a pair of chains involves is actually managing to get into a situation where you are the only person bothering to mine a particular pair of chains - hopefully a much higher barrier if people pick the pair of chains they validate randomly.

The ring is just a nice example; in reality I think it'd be good enough to just have the n chains and let miners pick pairs of chains to mine. The number of pairs that needs to be mined for a fully interconnected set is n(n-1) ~= n^2. The big advantage of a fully connected set is that the slicing can happen on a per-txout-hash basis, i.e. a transaction spending a txout starting with A and creating a txout starting with B can be mined by anyone mining both the A and B chains, though note how you'll wind up paying fees for both, and with more outputs you can wind up with a partially confirmed transaction. Also note how a miner with only the UTXO set for the A chain can safely mine that transaction by simply creating a 1-transaction block in the B chain... ugly. You probably need proof-of-UTXO-set-possession on top of proof-of-work to keep the incentives correct.

We've created weird incentives for hashers because, moment to moment, the reward (fees) for mining each pair will be determined by the transactions required to bridge that pair, so pools will pop up like crazy and your mining software will pool-hop automatically - another perverse result in a system designed to aid decentralization, although probably a manageable one with probabilistic auditing.

Maybe the idea works, but I'll have to think very carefully about it... there's probably a whole set of attacks and perverse incentives lurking in the shadows...
legendary
Activity: 1120
Merit: 1152
The talk left me with the impression that their non-recursive SCIP proofs are inexpensive, so I wonder if recursion could be avoided.  For example, if the full state were encoded locally in pairs of adjacent blocks  - as the proposal in this thread would achieve - then a SCIP proof validating the next block could simply assume validity of the two prior blocks, which is fine if the node verifying this proof has verified the SCIP proofs of all preceding blocks as well.  Once blocks become individually unwieldy, perhaps verifying each block would simply take a few extra SCIP proof validations - with SCIP proof authors tackling the transaction and UTXO patricia/radix tree updates by branches.  Could this approach properly remove the need to nest SCIP proofs inside of SCIP proofs, or is there something obvious I'm missing?

If you are talking about pairs of adjacent blocks all you've achieved is making validating the chain possibly a bit cheaper, those creating the blocks still need to have the full UTXO set.


Going back to the merkle tree thing it occurs to me that achieving synchronization is really difficult. For instance if the lowest level of the tree is indexed by tx hash, you've achieved nothing because there is no local UTXO set consensus.

If the lowest level of the tree is indexed by txout hash, H(txid:vout), you now have the problem that you basically really have a set of merge-mined alt-coins. Suppose I have a txout whose hash starts with A and I want to spend it in a transaction that would result in a txout with a hash starting with B.

So I create a transaction spending that txout in chain A, destroying the coin in that chain, and use the merkle path to "prove" to chain B that the transaction happened and chain B can create a coin out of thin air. (note how the transaction will have to contain a nothing-up-my-sleeve nonce, likely a blockhash from chain B, to ensure you can't re-use the txout)

This is all well and good, but a 51% attack on just chain A, which overall might be a 5% attack, is enough to create coins out of thin air because chain B isn't actually able to validate anything other than there was a valid merkle path leading back to the chain A blockheader. It's not a problem with recursive SCIP because there is proof the rules were followed, but without you're screwed - at best you can probabilistically try to audit things, which just means an attacker gets lucky periodically. You can try to reverse the transaction after the fact, but that has serious issues too - how far back do you go?

Achieving consensus without actually having a consensus isn't easy...

I do like this approach as well, and hadn't thought to use fidelity bonds for expensive punishment of misbehaving anonymous 'miner helpers'.  Though it is susceptible to attacks on the p2p network, unlike a SCIP approach, by surrounding groups of nodes and blocking the relay of fraud proofs to them.  Not sure how important this is in practice though.

Bitcoin in general assumes a jam-proof P2P network is available.

An important issue is that determining how to value the fidelity bonds would be difficult; at any time the value of the bond must be more than the return on committing fraud. That's easy to do in the case of a bank with deposits denominated in BTC, much harder to reason about when you're talking about keeping an accurate ledger.
sr. member
Activity: 461
Merit: 251
A non-SCIP approach that we can do now would be to use fraud detection with punishment. Peers assemble some part of the merkle tree and digitally sign that they have done so honestly with an identity. (a commutative accumulator is another possibility) The tree is probabilistically validated, and any detected fraud is punished somehow, perhaps by destroying a fidelity bond that the peer holds.  You still need some level of global consensus so the act of destroying a bond is meaningful of course, and there are a lot of tricky details to get right, but the rough idea is plausible with the cryptography available to us now.
I do like this approach as well, and hadn't thought to use fidelity bonds for expensive punishment of misbehaving anonymous 'miner helpers'.  Though it is susceptible to attacks on the p2p network, unlike a SCIP approach, by surrounding groups of nodes and blocking the relay of fraud proofs to them.  Not sure how important this is in practice though.
sr. member
Activity: 461
Merit: 251
However SCIP is probably years away from getting to the point where we could use it in the Bitcoin core. One big issue is that a SCIP proof for a validated merkle tree has to be recursive, so you need to create a SCIP proof that you ran a program that correctly validated a SCIP proof. Creating those recursive proofs is extremely expensive; gmaxwell can talk more, but his rough estimate is that we'd have to hire a big fraction of Amazon EC2 and assemble a cluster of machines with hundreds of terabytes of RAM. But math gets better over time so there is hope.
The talk left me with the impression that their non-recursive SCIP proofs are inexpensive, so I wonder if recursion could be avoided.  For example, if the full state were encoded locally in pairs of adjacent blocks  - as the proposal in this thread would achieve - then a SCIP proof validating the next block could simply assume validity of the two prior blocks, which is fine if the node verifying this proof has verified the SCIP proofs of all preceding blocks as well.  Once blocks become individually unwieldy, perhaps verifying each block would simply take a few extra SCIP proof validations - with SCIP proof authors tackling the transaction and UTXO patricia/radix tree updates by branches.  Could this approach properly remove the need to nest SCIP proofs inside of SCIP proofs, or is there something obvious I'm missing?

Edit: I suppose this would mean that Alice would be sending a slightly different program for Bob to run to produce each SCIP proof in each block?   I guess these programs would have to be a protocol standard, since 'Alice' is really everybody, and would differ only by the hash of the previous block?  All of this is very vague and magical to me still...
legendary
Activity: 1120
Merit: 1152
After watching that video I can't help but think, with my very limited understanding of it, that SCIP combined with appropriate Bitcoin protocol changes (perhaps like, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree) will be able to remove most of the reproduction of work necessary by the network that it currently must do in order to operate trust free, as well as make it possible to shard across untrusted peers the operation of combining new transactions into Merkle trees to produce new block headers for miners to work on.  These would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate, and N is the total number of network nodes).  This might even mean an always-on zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority).

Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?

You're actually quite correct. It solves the censorship problem too because the "upper levels" of this merkle tree of transactions are still cheap to validate so mining itself remains cheap. You do have issues where someone may create an imbalanced tree - the validation rules will need to have the levels of the merkle tree be sorted - but the work required to imbalance the tree increases exponentially. To be exact, it will be a patricia/radix tree rather than a merkle tree.

However SCIP is probably years away from getting to the point where we could use it in the Bitcoin core. One big issue is that a SCIP proof for a validated merkle tree has to be recursive, so you need to create a SCIP proof that you ran a program that correctly validated a SCIP proof. Creating those recursive proofs is extremely expensive; gmaxwell can talk more, but his rough estimate is that we'd have to hire a big fraction of Amazon EC2 and assemble a cluster of machines with hundreds of terabytes of RAM. But math gets better over time so there is hope.


A non-SCIP approach that we can do now would be to use fraud detection with punishment. Peers assemble some part of the merkle tree and digitally sign that they have done so honestly with an identity. (a commutative accumulator is another possibility) The tree is probabilistically validated, and any detected fraud is punished somehow, perhaps by destroying a fidelity bond that the peer holds.  You still need some level of global consensus so the act of destroying a bond is meaningful of course, and there are a lot of tricky details to get right, but the rough idea is plausible with the cryptography available to us now.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I don't have any real understanding of SCIP, but I did talk to the guys behind it, at the conference.  They are very excited about their research, and it clearly is quite powerful if it works.  However, they did say that it is extremely complicated,  and even if it does work, it may have a tough time getting confidence from any security-conscious community due to its complexity.   I imagine it will need years of fielding in order for it to actually become an option for any important application.

And of course, I have my doubts that it really works.  It sounds too good to be true, but admittedly, I haven't had time to try to understand it at the technical level yet.   One day I'll try to find some time to dig into it.  But until then, we certainly can't count on it being available for the Reiner-tree.

I'd be extremely interested to see someone with the correct background dig into it and provide a technical overview of how it works.
sr. member
Activity: 461
Merit: 251
I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!!

I wonder if block N's signature can be combined somehow with block N+1's transactions...
Does anybody know what Ben's nick is on the forum?

After watching that video I can't help but think, with my very limited understanding of it, that SCIP combined with appropriate Bitcoin protocol changes (perhaps like, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree) will be able to remove most of the reproduction of work necessary by the network that it currently must do in order to operate trust free, as well as make it possible to shard across untrusted peers the operation of combining new transactions into Merkle trees to produce new block headers for miners to work on.  These would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate, and N is the total number of network nodes).  This might even mean an always-on zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority).

Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?
legendary
Activity: 1372
Merit: 1002
I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!!

I wonder if block N's signature can be combined somehow with block N+1's transactions...
Does anybody know what Ben's nick is on the forum?
legendary
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
* ShadowOfHarbringer is watching this.
legendary
Activity: 1896
Merit: 1353
Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.

I use a PATRICIA tree for addresses, and a simple list for UTXOs that belong to the same address.
I remember discussing this question on IRC; we were not sure whether it was better to store UTXOs as database entries or to use addresses for the leaves of the tree (which is what I do).

Note that if we use a PATRICIA tree of UTXOs, we might end up doing more database queries for the hashes; what makes you think it would be less efficient?


legendary
Activity: 905
Merit: 1012
Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.

You are right that this impacts Electrum significantly. We should coordinate our efforts.
legendary
Activity: 1896
Merit: 1353
I believe this proposal is of primary importance for Electrum.
I started to work on it a few weeks ago, in order to add it to Electrum servers. I finalized two preliminary implementations this weekend.

I am pretty agnostic concerning the choice of keys; I guess hash160(rawscript) makes sense.

I would like to make the following point:
It is possible to compute node hashes much faster if you store the hash of a node at the key of its parent.
That way, it is not necessary to perform database requests to all children when only one child is updated.
In order to do that, it is necessary to keep a list of child pointers at each node; this list uses a bit more space (20 bytes/node).
Thus, each node stores a list of pointers (20 bytes), and a variable-length list of hash:sumvalue entries for its children.
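
A minimal sketch of the idea (Python, hypothetical serialization, not the actual electrum-server format): because each child's hash and value sum already sit in the parent's record, re-hashing the parent after one child changes needs no database reads of the siblings.

Code:
import hashlib

def node_hash(children):
    # children: the (pointer, child_hash, child_sum) entries stored in the
    # parent's own database record
    h = hashlib.sha256()
    for pointer, child_hash, child_sum in children:
        h.update(child_hash)
        h.update(child_sum.to_bytes(8, "little"))
    return h.digest()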

I made two separate implementations:
- a "plain vanilla" version without pointers, where a node's hash is stored at the node; this implementation was too slow to be practical.
- a faster version that stores node hashes at their parent, and keeps a list of pointers for each node.

both versions are available on github, in the Electrum server code: https://github.com/spesmilo/electrum-server
(look for branches "hashtree" and "hashtree2")

both branches were tested with randomly generated blockchain reorgs, and they produced the same root hash.

I could run the "hashtree2" version for 184k blocks on my VPS, and another user went over 200k using a faster machine, but it still took him more than 24h.
I am currently working on a third version that will use write batches when computing the hashes; I hope to further accelerate it that way.

legendary
Activity: 905
Merit: 1012
Oh, my conservative position is still that prefixing the index key is the wrong way to solve this problem, but I'm willing to explore the idea.