Topic: Can dynamic accumulators be used to store UTXOs

legendary
Activity: 1176
Merit: 1134
September 06, 2015, 01:20:55 AM
#8
With ramchains I have encoded all unspents (actually all addresses and txids too) as 32-bit integers (rawinds). 4 billion should be enough for a while, though the field will need to be widened in a few years for some of the tables.

Internally all operations are 10x (or more) faster due to just dealing with small integers.

As long as this is used internally, it doesn't actually matter that a reorg renumbers things differently on different nodes. However, since there is a strict ordering of blocks, of txids within a block, and of vins and vouts within a tx, any nodes that are in consensus on the chain will also be in consensus on all the 32-bit rawinds.

Since the exact value associated with each rawind is not needed as often as the rawinds themselves, the actual values can be stored in a memory-mapped file with no indexing overhead, as the records are fixed size: just multiply the rawind by the itemsize to get the offset in the file.
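For illustration, here is a minimal Python sketch of that lookup (the file name unspents.dat and the 28-byte itemsize are made up; the real tables are in the C files linked below):

Code:
import mmap

ITEMSIZE = 28   # hypothetical record size; the real tables use various sizes

def lookup(table, rawind):
    # No index needed: the rawind itself indexes the fixed-size records.
    offset = rawind * ITEMSIZE
    return table[offset:offset + ITEMSIZE]

with open("unspents.dat", "rb") as f:      # hypothetical table file
    table = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    record = lookup(table, 123456)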

The total size of all the rawinds, with enough information to construct multisig spends and update all address unspent balances in realtime, is just several gigabytes; even counting all of the data it is around 10GB. So it achieves a nice space savings, especially with about 70% of that in memory-mapped files not using up RAM. However, I don't have this as the native block format yet, so it is in addition to the standard blockchain.

On top of the ramchain data structures, I have made an Lchain, which is a snapshot every 1000 blocks of the state of all the tables, along with a cumulative SHA256 over all changes to the state of the tables. This allows different nodes to verify consensus on the exact state of all the internal tables at specific blocks via a combined ledgerhash, and if there is a discrepancy, the exact table that is mismatched is identified.

Internally, each block generates events that change the state of a dozen different tables. So the state at block N+1 is the state at block N as modified by all the events in that block. By having a snapshot every 1000 blocks, the state at any block can be recreated by replaying an average of 500 blocks of event updates.
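A toy Python sketch of the snapshot/replay and ledgerhash idea (the event encoding and table layout here are made up; the real structures are in the linked C files):

Code:
import hashlib

SNAPSHOT_INTERVAL = 1000

def apply_event(state, ev):
    # Hypothetical table update: here an event is just a (key, value) pair.
    key, value = ev
    state[key] = value

def ledgerhash(prev_hash, events):
    # Cumulative SHA256 over all state changes, so nodes can compare table
    # state at a snapshot without exchanging the tables themselves.
    h = hashlib.sha256(prev_hash)
    for ev in events:
        h.update(repr(ev).encode())
    return h.digest()

def state_at(block, snapshots, events_by_block):
    # Rebuild the state at any block from the nearest prior snapshot;
    # on average ~500 blocks of events get replayed.
    base = (block // SNAPSHOT_INTERVAL) * SNAPSHOT_INTERVAL
    state = dict(snapshots[base])
    for b in range(base + 1, block + 1):
        for ev in events_by_block.get(b, []):
            apply_event(state, ev)
    return state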

If all nodes are running Lchains, then a node can bootstrap by fetching the 370 Lchain snapshots in parallel and verifying them, and then also getting the events between each snapshot. I don't think it would need to do this redundantly (except maybe for the most recent ones), as each snapshot must match exactly. I don't have this bootstrapping coded yet, but once it is in place, the blockchain storage requirement for each node can be reduced by 90%+.

The way to do that is to have each node archive the Lchain events between 4 randomly chosen snapshots and their successors. [all nodes can easily store all 370 snapshots, as they are not very large]

Since each interval would then be archived by roughly TOTALNODES*4/370, i.e. about TOTALNODES/100 nodes, the archive is decentralized with what seems like adequate coverage, though some archival nodes would probably want to keep a complete history. This way, all nodes can have a complete distilled blockchain and the ability to reconstruct the total state as of any block quite efficiently.

I am quite busy with my projects and I don't think I will have a chance to improve the ramchains/Lchain codebase for a while, so I hope some other devs will be interested. With improvements in this area, a lot more transactions will fit into 1MB.

James

https://github.com/jl777/btcd/blob/master/libjl777/plugins/coins/coins777.c
https://github.com/jl777/btcd/blob/master/libjl777/plugins/ramchain/ramchain.c
legendary
Activity: 1232
Merit: 1094
Quote
So you can imagine a world with stateless miners and full nodes, and wallets either tracking their own coin proofs or outsourcing that work to archive nodes that help them form the proofs needed for their transactions.

Ideally, deletion proofs would be something the wallet only has to obtain once and can then use later.  Zeroing out a member means that the path for all nearby coins also changes.

The SPV protocol would have to change so that clients get path updates for any of their coins in the UTXO set whose paths change.  If 4000 coins change in a block, then the first 12 or so levels of every path will change (2^12 ≈ 4000).
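As a rough sanity check of that figure (a toy simulation, nothing more): with 4000 random leaf changes in a 2^32-leaf tree, nearly every node in the top ~log2(4000) ≈ 12 levels ends up above some changed leaf:

Code:
import random

DEPTH, CHANGED = 32, 4000
random.seed(1)
leaves = [random.getrandbits(DEPTH) for _ in range(CHANGED)]

for level in range(1, 15):
    # A node at `level` (counting from the root) covers all leaves that
    # share the top `level` bits of their index.
    touched = {leaf >> (DEPTH - level) for leaf in leaves}
    print(f"level {level:2}: {len(touched) / 2**level:5.0%} of nodes changed")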

If leaf deletion were allowed without re-balancing, the tree would naturally shrink from attrition: if a leaf's sibling is deleted, the path to the leaf drops by one level.  On average, all leaves would still have the same height.
full member
Activity: 217
Merit: 259
Quote
Quote
Are there any other groups besides RSA with this property?  AFAIK for elliptic curves it is expensive but not infeasible to compute the order.
But class groups of imaginary quadratic orders are believed strong in this respect, though there hasn't been that much work or analysis on cryptosystems based on ideals, so I'm unsure of how useful they'd be... https://www.cdc.informatik.tu-darmstadt.de/reports/TR/TI-03-11.nfc_survey_03.pdf

Another way to address the trusted setup with the RSA schemes is to use a UFO, but this means you end up with a field tens of KB in size just to get 80 bit security.


Thanks for the link.  But it looks like you need ~3000 bits for security in any of these schemes.  And I think the CPU overhead, especially for updating the witnesses, is too high in any of them; you need roughly one group operation per bit of the transaction outputs.

Quote
Quote
Yes, this would be using Merkle trees, right?  But if my math is correct, the witness proofs it requires are larger than the transactions.  I'm not sure whether ordering the tree is necessary (you would then need to rebalance it) or whether you just propose to put the elements in the order they are inserted.  Instead of zeroing out the spent outputs, one could also put the new outputs into the freed spaces to keep the trees from growing too deep or becoming unbalanced, but still the proofs are quite large.
A hashtree constructed out of 32 byte hashes with 4 billion entries would be 32 levels deep-- so the size compares very favorably with an RSA accumulator + witnesses with equivalent security.

A significant portion of the proofs are also the first couple levels of the tree, which could be cached to avoid retransmitting them-- and such caching could be negotiated on a peer to peer basis.   For example, if nodes cached the first 16 levels of the tree (2 MBs of data) a tree membership/update proof for a 2^32 tree would be 512 bytes.


I did not realize that caching 2MB of data would cut the space overhead of the witness proofs so much.

But can we take this to the extreme and cache all levels?  The witness would then just be the index in the UTXO tree.  This would be similar to what we currently have, but

  • The UTXOs don't need to be hashed; a UTXO can be addressed directly by an array index or a file offset.
  • If the Merkle root is included in the blockchain, one can give SPV clients a short proof of existence that also proves the UTXO was not yet spent.

The UTXO hash tree would be 64 bytes per UTXO: 32 bytes for the hash of the UTXO and 32 bytes for the levels above (there are roughly as many inner nodes as leaves in a binary tree).  That is currently about 1.2GB.  The witness would include the spending transaction, to check the signatures.
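For illustration, the arithmetic behind these figures (back-of-the-envelope only; the 20M UTXO count is a made-up round number):

Code:
def proof_bytes(depth, cached_levels, hash_len=32):
    # One sibling hash per uncached level of the membership path.
    return (depth - cached_levels) * hash_len

def cache_bytes(cached_levels, hash_len=32):
    # Caching the top k levels means storing about 2^k inner nodes.
    return (2 ** cached_levels) * hash_len

print(cache_bytes(16))        # 2097152    -> the ~2 MB cache quoted above
print(proof_bytes(32, 16))    # 512        -> 512-byte proof for a 2^32 tree
print(proof_bytes(32, 32))    # 0          -> cache all levels: only the index remains
print(20_000_000 * 64)        # 1280000000 -> ~1.2 GB at 64 bytes per UTXO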
staff
Activity: 4284
Merit: 8808
Quote
Deletion is possible in this scheme.  It requires running the extended gcd algorithm (should be cheap) and two exponentiations for every participant to update their witnesses.  Deleting from the accumulator means just replacing it with the witness of the deleted element.
I'll have to work through it then. I wasn't aware of how one could remove an n-th root without knowing the group order, which seemed to be what prior schemes required. Hm. Maybe the fact that the value being removed can be public is what makes it work.

Quote
Yes, this would be using Merkle trees, right?  But if my math is correct, the witness proofs it requires are larger than the transactions.  I'm not sure whether ordering the tree is necessary (you would then need to rebalance it) or whether you just propose to put the elements in the order they are inserted.  Instead of zeroing out the spent outputs, one could also put the new outputs into the freed spaces to keep the trees from growing too deep or becoming unbalanced, but still the proofs are quite large.
A hashtree constructed out of 32 byte hashes with 4 billion entries would be 32 levels deep-- so the size compares very favorably with an RSA accumulator + witnesses with equivalent security.

A significant portion of the proofs are also the first couple levels of the tree, which could be cached to avoid retransmitting them-- and such caching could be negotiated on a peer to peer basis.   For example, if nodes cached the first 16 levels of the tree (2 MBs of data) a tree membership/update proof for a 2^32 tree would be 512 bytes.

No balancing is required, which is why I mentioned insertion ordering.  You can incrementally build an insertion-ordered tree with log(n) memory (you just remember the leading edge of the tree, and as levels fill you ripple up to the root).  A search tree could be used instead, but it would require proofs for both outputs and inputs.
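A toy sketch of that incremental construction (the standard "leading edge"/frontier technique, not taken from any particular implementation):

Code:
import hashlib

def H(a, b):
    return hashlib.sha256(a + b).digest()

class FrontierTree:
    # Append-only, insertion-ordered hash tree using log(n) memory:
    # frontier[i] holds the root of a pending 2^i-leaf subtree, or None.
    def __init__(self):
        self.frontier = []

    def append(self, leaf_hash):
        node, level = leaf_hash, 0
        # As a level fills, merge equal-sized subtrees and ripple upward.
        while level < len(self.frontier) and self.frontier[level] is not None:
            node = H(self.frontier[level], node)
            self.frontier[level] = None
            level += 1
        if level == len(self.frontier):
            self.frontier.append(None)
        self.frontier[level] = node

t = FrontierTree()
for i in range(1000):
    t.append(hashlib.sha256(str(i).encode()).digest())
# Only ceil(log2(1000)) = 10 frontier slots are ever needed.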

If you don't think 2^32 coins is enough, you can pick numbers arbitrarily large... e.g. 2^51 entries, with a 500MB cache, requires 928 byte proofs.  Because of the log factor the cost is _essentially_ a constant in a physically limited world, so it's just a question of whether that constant is low enough.

Quote
I don't see why Merkle trees are smaller than the RSA accumulator, but probably you assumed that deletion is not possible and you need a list of spent TXOs.  I think you just need the accumulator (fixed size, about 400 bytes) plus, as a witness proof per block, another value with all of the block's inputs deleted.  Per transaction you also need the witness for all inputs (another 400 bytes), but this doesn't have to be stored.  With Merkle trees you need only a single hash for the accumulator, but the witness proofs of the transactions must also be included to check the updates, and merging them doesn't save much.  Even assuming a depth of only 24 (16 million UTXOs; we currently have more), you need 768 bytes for a single witness of a single transaction input, assuming 256-bit hashes.
See my figures above.  The zerocoin membership witnesses are about 40 kbytes in size for 80-bit security, so I may have been overestimating the size of the accumulator updates... but even if we assume they're 400 bytes, that puts them in the same ballpark as the TXO proofs with top-of-the-tree caching.  And, of course, verifying hundreds of thousands per second on a conventional CPU is not a big deal...

Quote
As you wrote, instead of RSA you just need a group of unknown order.  But "unknown order" means that it is infeasible to compute the order.  Are there any other groups besides RSA with this property?  AFAIK for elliptic curves it is expensive but not infeasible to compute the order.
For EC the order computation is quartic in the size of the field (via the SEA algorithm) unless the curve has special structure which allows it to be computed faster, so it's not really possible to get a reasonably efficient EC group where the order is cryptographically infeasible to compute.  But class groups of imaginary quadratic orders are believed strong in this respect, though there hasn't been that much work or analysis on cryptosystems based on ideals, so I'm unsure of how useful they'd be... https://www.cdc.informatik.tu-darmstadt.de/reports/TR/TI-03-11.nfc_survey_03.pdf

Another way to address the trusted setup with the RSA schemes is to use a UFO, but this means you end up with a field tens of KB in size just to get 80 bit security.
full member
Activity: 217
Merit: 259
Quote
AFAIK these accumulator schemes require a spent-coins list that grows forever: they require a trusted-party trapdoor to delete efficiently, so you would need to keep a linear database to prevent double spends. Or have you found a scheme with an efficient trustless delete?

Deletion is possible in this scheme.  It requires running the extended gcd algorithm (should be cheap) and two exponentiations for every participant to update their witnesses.  Deleting from the accumulator means just replacing it with the witness of the deleted element.
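For illustration, a toy sketch of this delete and witness update (Camenisch-Lysyanskaya style; the modulus is tiny and insecure, accumulated values must be distinct primes, and pow() with a negative exponent needs Python 3.8+):

Code:
def egcd(a, b):
    # Extended gcd: returns (g, s, t) with s*a + t*b == g.
    if b == 0:
        return a, 1, 0
    g, s, t = egcd(b, a % b)
    return g, t, s - (a // b) * t

n  = 10007 * 10009    # toy modulus; a real one would be ~3072 bits
g0 = 2
x, y = 101, 103       # two accumulated elements (distinct primes)

acc   = pow(g0, x * y, n)   # accumulator holding {x, y}
wit_x = pow(g0, y, n)       # witness for x: pow(wit_x, x, n) == acc
wit_y = pow(g0, x, n)       # witness for y

# Trustless delete of x: the new accumulator is simply x's witness.
acc2 = wit_x                # equals g0^y, i.e. the accumulator of {y}

# Witness update for y: find a, b with a*y + b*x == 1, then two modexps.
_, a, b = egcd(y, x)
wit_y2 = (pow(wit_y, b, n) * pow(acc2, a, n)) % n
assert pow(wit_y2, y, n) == acc2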

But yes, the main problem is the CPU overhead to keep track of the updates.

Quote
Though you actually do not need fancy number-theoretic cryptography: as has been proposed previously, if the utxos are stored in an insertion-ordered hash tree, appending to the end requires a log-sized proof (just the leading edge of the tree), and showing membership requires a log-sized proof. Membership proofs can be updated cheaply by observing other updates that intersect with your proofs. And spending requires just zeroing out a member, which can be done with the same proof path as was used to show membership.

So you can imagine a world with stateless miners and full nodes, and wallets either tracking their own coin proofs or outsourcing that work to archive nodes that help them form the proofs needed for their transactions.

Yes, this would be using Merkle trees, right?  But if my math is correct, the witness proofs it requires are larger than the transactions.  I'm not sure whether ordering the tree is necessary (you would then need to rebalance it) or whether you just propose to put the elements in the order they are inserted.  Instead of zeroing out the spent outputs, one could also put the new outputs into the freed spaces to keep the trees from growing too deep or becoming unbalanced, but still the proofs are quite large.

Quote
If you go out and compute the concrete sizes for the above scheme you'll find two things: the results, even for gigantic databases, are _much_ smaller than RSA accumulator approaches even though they scale with a log(), due to constant factors (and of course it's much less CPU intensive too); but the bandwidth required for processing transactions/blocks is increased massively.  So it isn't clear that it would actually be a win, as bandwidth tends to be the scarcer and slower-growing resource.

I don't see why Merkle trees are smaller than the RSA accumulator, but probably you assumed that deletion is not possible and you need a list of spent TXOs.  I think you just need the accumulator (fixed size, about 400 bytes) plus, as a witness proof per block, another value with all of the block's inputs deleted.  Per transaction you also need the witness for all inputs (another 400 bytes), but this doesn't have to be stored.  With Merkle trees you need only a single hash for the accumulator, but the witness proofs of the transactions must also be included to check the updates, and merging them doesn't save much.  Even assuming a depth of only 24 (16 million UTXOs; we currently have more), you need 768 bytes for a single witness of a single transaction input, assuming 256-bit hashes.

CPU is a problem, though, and I don't see how one can realistically go to 1000 transactions per second with an RSA accumulator.  Even with the current blockchain it would probably take several days to compute a witness for an output that comes early in the blockchain.

So the summary is: hash trees need too much bandwidth to be useful, while RSA-based accumulators need too much CPU and have the problem of the trusted setup.

As you wrote, instead of RSA you just need a group of unknown order.  But "unknown order" means that it is infeasible to compute the order.  Are there any other groups besides RSA with this property?  AFAIK for elliptic curves it is expensive but not infeasible to compute the order.
staff
Activity: 4284
Merit: 8808
AFAIK these accumulator schemes require a spent-coins list that grows forever: they require a trusted-party trapdoor to delete efficiently, so you would need to keep a linear database to prevent double spends. Or have you found a scheme with an efficient trustless delete?

RSA-based accumulators also generally require trusted setup; violation of the trusted setup lets you make false membership proofs.  (The same protocols could be applied to non-trapdoor groups of unknown order, but the performance and security are much more questionable.)

So I'm not seeing how this can help with UTXO.

Though you actually do not need fancy number-theoretic cryptography: as has been proposed previously, if the utxos are stored in an insertion-ordered hash tree, appending to the end requires a log-sized proof (just the leading edge of the tree), and showing membership requires a log-sized proof. Membership proofs can be updated cheaply by observing other updates that intersect with your proofs. And spending requires just zeroing out a member, which can be done with the same proof path as was used to show membership.
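A toy sketch of that cheap update (illustrative only; assumes an index-ordered binary tree, sibling paths stored leaf-level first, and that you observe the updating leaf's own path):

Code:
import hashlib

def H(a, b):
    return hashlib.sha256(a + b).digest()

def update_my_proof(i, my_sibs, j, j_new_leaf, j_sibs):
    # When leaf j changes, exactly one node on leaf i's path changes:
    # the node on j's path at the level where the two indices diverge.
    split = (i ^ j).bit_length() - 1     # highest differing bit, i != j
    node = j_new_leaf
    for level in range(split):           # walk j's path up to that level
        if (j >> level) & 1:
            node = H(j_sibs[level], node)
        else:
            node = H(node, j_sibs[level])
    my_sibs[split] = node                # new sibling at the divergence level
    return my_sibs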

So you can imagine a world with stateless miners and full nodes, and wallets either tracking their own coin proofs or outsourcing that work to archive nodes that help them form the proofs needed for their transactions.

If you go out and compute the concrete sizes for the above scheme you'll find two things: the results, even for gigantic databases, are _much_ smaller than RSA accumulator approaches even though they scale with a log(), due to constant factors (and of course it's much less CPU intensive too); but the bandwidth required for processing transactions/blocks is increased massively.  So it isn't clear that it would actually be a win, as bandwidth tends to be the scarcer and slower-growing resource.
newbie
Activity: 13
Merit: 0
I'm not aware of any possible solutions to this. I imagine it would become very convoluted and complex for the users participating in such a system, while leaving vulnerabilities open for attackers. An attacker with sufficient computing power could really do some damage to such a system, no? I remember a discussion (in 2013?) on the dev mailing list about a similar problem with UTXOs.
full member
Activity: 217
Merit: 259
I read the Zerocoin white paper again, and while trying to understand the dynamic accumulator [1] used there, I wondered if it could be used to tackle the UTXO problem.  I wonder if anyone has already proposed this idea (probably; it seems obvious to me) and what the problems are.  My idea has nothing to do with Zerocoin itself, so it isn't a way to remove the pseudonymity of Bitcoin.

A dynamic accumulator is a way to compactly store a nearly unlimited amount of data, e.g. all active UTXOs, in a single number, say a 3072-bit number.  In principle the whole blockchain can be stored in this number, and a miner can check that a transaction spends only existing UTXOs using just this single trusted number and some untrusted data that the sender of the transaction provides.  It sounds a bit like magic, but it actually works.
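For illustration, a toy version of such an accumulator (parameters far too small to be secure, and real UTXOs would first be hashed to distinct primes):

Code:
n  = 10007 * 10009   # toy RSA modulus; its factorization must remain unknown
g0 = 3               # public base

def add(acc, p):
    # Accumulating an element is one modular exponentiation.
    return pow(acc, p, n)

def verify(acc, p, witness):
    # The witness for p is the accumulator built from everything except p.
    return pow(witness, p, n) == acc

utxos = [101, 103, 107]    # stand-ins for UTXOs mapped to primes
acc = g0
for p in utxos:
    acc = add(acc, p)

wit = g0                   # witness for 103: accumulate everything else
for p in utxos:
    if p != 103:
        wit = add(wit, p)
assert verify(acc, 103, wit)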

The drawback is that everyone possessing a UTXO needs to keep track of a witness (another 3072-bit number) that proves that the accumulator contains their UTXO.  The witness can be computed from the list of all transactions that occurred between the UTXO and the transaction that spends it.  We probably still need nodes that store the whole blockchain, so that users can find their UTXO and compute the witness from the transactions that follow it.  The witness can be computed incrementally, so if you update your client once a week, it can update the witnesses for all its UTXOs by just reading the list of transactions that happened that week.
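Continuing the toy sketch above, the weekly catch-up is just one modular exponentiation per element accumulated since the last sync:

Code:
def update_witness(wit, new_primes, n):
    # Fold each newly accumulated element into the existing witness.
    for p in new_primes:
        wit = pow(wit, p, n)
    return wit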

Another drawback is the computation time.  Updating the accumulator is computationally involved, probably much more so than checking an ECDSA signature, and it has to be done for every input and every output in every transaction to check that a block is valid (in addition to the usual checks that the signatures in the transactions are correct).  Worse, this has to be repeated by everyone who wants to update their wallet using a full client.  In fact, updating the witness is a bit more complex than checking the accumulator.  Clients that rely on a server to retrieve their UTXOs and the corresponding witnesses are still possible, but the work is then just shifted to the service provider.  Basically the service has to update the witness for every user separately.

The security of the dynamic accumulator in [1] relies on the security of RSA (more precisely, the strong RSA assumption).  A key of 3072 bits should give us about the security of 256-bit ECDSA.  However, if someone breaks this key he can mint coins.  In principle this can be detected if someone still has the full blockchain since the genesis block, but the purpose of this scheme is to avoid checking the full blockchain.  Another problem is to generate the RSA key in such a way that the inventor is not suspected of keeping the private key to mint his own coins later.  It seems that all published accumulators are based on RSA (or on Merkle trees, but these require much larger witnesses).

Of course, something like this requires at least a hard fork, more probably an altcoin.

[1] J. Camenisch and A. Lysyanskaya, Dynamic Accumulators and Applications to Efficient Revocation of Anonymous Credentials, CRYPTO 2002