Pages:
Author

Topic: Ultimate blockchain compression w/ trust-free lite nodes - page 16. (Read 87932 times)

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.

By the way, since these servers are full clients they could charge thin clients some tiny amount like 0.00000001 BTC per challenge, giving people an incentive to run full-chain clients on machines with big disks.

Thin clients should use whichever server they are able to contact that has failed the fewest challenges (an honest server will only fail challenges if there is a hostile block at the top of the chain).  Bad guys could run a server without a blockchain that just spews back junk answers, but they'd only get one challenge per thin client and then be promptly ignored -- not enough money to be worth the trouble.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
How will the lightweight node know the path hashes are incorrect and to reject that part of a block?

All of these "links" are hashes -- they're self-validating just like the previous-block (prevblock) hash that Bitcoin currently uses.  These are just "extra prevblock pointers", but they're in the coinbase instead of in the headers and they point to an arbitrary ancestor instead of the immediate predecessor.  I'll call these new pointers non-prevblock pointers to distinguish them from the pointers we already have.


I can see how the thick clients can verify this, but an incorrect path would require all of the thick clients to reject the block if it contained a false tree.

I definitely do not propose adding any new block validity criteria.  You're right: hostile miners can add blocks with broken non-prevblock pointers.


Otherwise, we'd have ignorant thick clients forwarding malicously crafted "tree transactions" to the light clients.

If a hostile miner gets a block with broken non-prevblock pointers into the chain, it will amount to -- at worst -- a DOS attack on thin clients until the next block is found by friendly miners.  As long as the hostile block is at the top of the chain, all thin clients will believe that all servers are lying to them.  But they won't be compromised -- they'll simply twiddle their thumbs saying "Nobody has been able to convince me of what reality ought to look like".

As soon as the next {non-prevblock-pointer}-aware miner finds a block, they will add a block which refrains from creating any new paths through the malicious block.  Thin clients resume operation as normal.  In effect, blocks with broken non-prevblock pointers get excluded from the "DAG-within-the-blockchain".

Thin clients which were connected before the malicious block arrived won't be affected.

There are ways to make the cost of the DOS attack above very large (like >99% hashpower) but they add complexity.  


I think this means another protocol decision point like P2SH.

Definitely not!  I believe the trustless thin client problem can be solved without another hardfork (or even miners-only fork).


I also don't understand why you think these could be added without the "51%."

Right, so, suppose I control 1% of the network hashpower.  That means I create 1% of the blocks.  If the radix of the non-prevblock-pointer tree is, say, 200 (i.e. each block can have 200 non-prevblock pointers) that means that my block will be reducing the average path length in the tree more rapidly than the other 99% of the network is adding new blocks.  So the average path length will gradually converge to log(height), even if only 1% of the miners are adding the new pointers -- it's just that each block they add has to include more than 100 new pointers.  Of course, the more miners you have the more rapidly we get to the ideal log(height) situation…
sr. member
Activity: 966
Merit: 311

This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.

A truly trustless thin client needs to be able to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-difficulty adjustment rules) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like a tree giving you multiple hash-secured O(log n)-long paths from any block to the genesis block).  These "shortcut ancestor links" can be added without 51% agreement.

How will the lightweight node know the path hashes are incorrect and to reject that part of a block?

I can see how the thick clients can verify this, but an incorrect path would require all of the thick clients to reject the block if it contained a false path. I think this means another protocol decision point like P2SH.  Otherwise, we'd have ignorant thick clients forwarding malicously crafted "tree transactions" to the light clients.

This isn't meant to be a gotcha. I like your idea (a lot), but I'd like some clarification. I've only started looking into this part of Bitcoin and I'm not a very big data structures guy.

I also don't understand why you think these could be added without the "51%."
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Trustless lightweight-node support

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.  Depth is used only as a measure of the likelihood of another, longer chain branch emerging that omits the transaction.

A truly trustless thin client needs to be able to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-4x-difficulty-adjustment rule) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like an upside-down tree or DAG (most recent block is the root) giving you multiple hash-secured O(log n)-long paths from any block to the genesis block.  These "shortcut ancestor links" can be added without 51% agreement.  If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.  Once you have block height verification you can be truly trustless -- or, at least as trustless as the Satoshi client.
sr. member
Activity: 269
Merit: 250
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
Instead of saying your idea is wrong, maybe I can contribute to it.

One of the things gmaxwell pointed out was that mining pools may be around now but there is no guarantee they will be around later, in fact he thinks they probably won't. So depending on them as fundamental architecture is probably a bad idea all around.

But imagine miners (both solo and pools) included a IP:Port calling card in the coinbase of their block. The calling card would convey the message: I am a miner, you can contact me via udp directly at (ip:port), send me your transaction, and if it looks valid, I will give you a signed promise (signed by coinbase key) that I accept and plan to confirm this transaction.

One would know what percentage of the mining pool any given calling card represents just by the number of recent blocks containing it.

Someone wanting a miner commitment on a transaction would blast out that transaction via UDP to all of the miners whose calling cards appear in the last 1000 blocks.  That sounds extreme, but we're only talking a few hundred kilobytes total, with the total going to each miner being under 1-2KB.

By using UDP instead of TCP, one could blindly blast out a bunch of simultaneous requests into the internet on a "best effort" basis with low overhead, knowing most of them will arrive and many won't and that that's OK.  The responses would arrive the same way.

Either the sender or the recipient of a transaction could immediately contact thousands of miners with a blast of udp packets and rapidly get an accurate feel for how much mining power supporting the transaction has just by gauging the udp responses that come back within the following 10 seconds.

If it is a supermajority you have success.

Such could be the standard practice for accepting zero conf transactions.  It could be an excellent revenue generator for the mining community as a whole in the form of a for-pay service (example, all miners could stipulate that this UDP confirmation service is only available if transaction fee [in the transaction being zero-confirmed] is meets a far more generous criterion than the satoshi client minimum)

To address gmaxwell's rightfully placed fear that I could pay "you" and then use the premium service to pay the doublespend to myself... if getting zero-conf is a fee-based service paid by the payer, then "you" could demand, as a condition of giving me goods with zero conf, that I include a fee big enough to ensure that you can click a button in your client and enjoy the confirmation service yourself, prepaid.
sr. member
Activity: 269
Merit: 250
I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed so far, but there wasn't much of an interest either, so I am starting to believe that it's not something demanded.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I believe he was wrong, at least no one objected when I explained why it is not a problem, a basic double spend check from a pool completely solves it. And if you are agree with him could you elaborate your opinion here or in the thread? The good thing about the idea is that it has the same property as etotheipi's proposal of how non-disruptive it is. It doesn't require any changes in Bitcoin, just collaboration between 2 or 3 major pools.

I pay you normally without using this system, then use the pool-signed txn to pay myself in a doublespend.  It would make attacks very reliable.
A pool must check if there is any conflicting transaction already present and if true then refuse to sign the multi-signature transaction.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed so far, but there wasn't much of an interest either, so I am starting to believe that it's not something demanded.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I hate bringing up zero-conf tx in general, because there are so many issues with them that they are useful only in isolated cases, usually partial-trust situations.  I guess, this shouldn't be seen as a driving force for implementing this proposal, but more seen as an example of how much verifiable information can be obtained from the network with minimal download.  It's fast enough that a light node could obtain as much information about a zero-conf tx as a full-node, with just a few kB downloaded.  Regardless of how that information is used, it's a huge functional advantage from where we are currently.
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed so far, but there wasn't much of an interest either, so I am starting to believe that it's not something demanded.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.
sr. member
Activity: 269
Merit: 250
If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted, anyway... but at least you can verify whether it's even possible for the network to accept it, which what a full node does.
I posted an idea about how to make 0-conf tx safe. There was some discussion, and the idea wasn't killed so far, but there wasn't much of an interest either, so I am starting to believe that there is no need in 0-conf tx.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
The number one problem that this solves (besides blockchain pruning) is one of trust.  When a new node gets on the network right now, there are two options:

(1) Run a full node, with full blockchain, ultimate security -- verify everything in the blockchain yourself
(2) Run a lightweight node, with reduced level of security -- trust that someone else gave you your own correct balance, and have no way to check whether transactions are valid, especially zero-conf tx

With this meta-chain in place, you can run (1) for a lot less disk space, and (2; lightweight nodes) can achieve the ultimate security model without needing to hold the full chain.  I can verifiably import my own wallet knowing nothing but the headers and verify it directly against the blockchain.  If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted, anyway... but at least you can verify whether it's even possible for the network to accept it, which what a full node does.

legendary
Activity: 905
Merit: 1012
Maybe this will help: the trouble is when Satoshi released Bitcoin 0.1 and the whitepaper, he came up with this idea for a version of the client that only kept block headers and skimmed for transactions it cares about. It's become known as SPV--Simple Payment Verification--or a “lightweight client” and has a weaker trust model than a full verifying client.

We are not discussing that in this thread. What's going on here is the creation of a full-featured “thick client” which doesn't require the entire block chain history and could conceivably be run even on low-memory and embedded devices as bitcoin scales up. You can have your cake and eat it too.
newbie
Activity: 15
Merit: 0
I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure, if you aren't mining, like spam, and don't see any value in reliably knowing you have received funds from Fred in under an hour, since you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one.  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.


So the scenario in which this helps is where

(a) transaction volume is so high that even miners running fancy purpose-built mining rigs can't store the transaction history on a standard-issue 1TB hard drive
(b) every Tom, Dick and Harry runs a lightweight node which relays every single transaction on a P2P network.

Those two conditions contradict each other.  If transaction rate goes up that high (and I think it shouldn't, but that's an entirely different discussion), bandwidth becomes the limiting factor before storage space does. At that transaction rate, inevitably the bitcoin network evolves to a backbone of heavy nodes exchanging everything and lightweight clients which consume only data of interest to them. That's quite independent of how the history is handled.

As to unconfirmed transactions, are there really going to be that many people who will accept an unconfirmed transaction, but not be willing to trust anyone to validate it for them?
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure, if you aren't mining, like spam, and don't see any value in reliably knowing you have received funds from Fred in under an hour, since you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one.  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.
newbie
Activity: 15
Merit: 0
Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.


That works if, as a lightweight node, you plan only on receiving funds that have a very small number of confirmations, which eliminates your view of the majority of bitcoins that exist.  In USD terms, this would be like limiting yourself to only being able to accept crisp dollar bills that have never been handled more than once or twice.  More likely than not, you're going to need to be able to receive funds from anybody, which will have been confirmed anywhere on the block chain between the genesis block and now.  You either need the whole block chain to know whether a given incoming transaction is valid, or at least the digested tree of all unspent txouts for the entire block chain.

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.
legendary
Activity: 2128
Merit: 1073
why would you ever need to rebuild the tree?
Probably instead of "rebuilding the tree" the better phrase whould be "rehashing the nodes on the update path in the tree". The rehashing of many short strings would completely dominate the cost of the update (insertion/deletion). This is one of those things that the O() notation doesn't convey.

The benefits of a short-and-fat data structure over a tall-and-lean are four-fold:

1) less rehash overhead after update
2) less latency sensitivity when queried level-wise over the p2p network (at the expense of wasted bandwidth)
3) lower write amplification if this structure is stored locally in the block-storage device
4) ability to batch-update the nodes incurring single rehash overhead for multiple inserts/deletions.

Ultmately from the point of long-term code stability it would be better to choose a more generic structure instead of 2-3-4-tree. If we choose N-to-M-tree we could set the N and M values based on the experimentation now and possibly easily change them in the future if experience shows that our initial guess was not so good.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, ...

Any of these can be made into an authentication data structure.  Each node, including non-leaf nodes may represent data, so you just append this node's data, to the hashes of the non-null children and hash that to get the current nodes value.  Then it's parent nodes do the same to agregate their children.

I haven't defined it rigorously, but it can be done.  One issue with a 256-way Patricia/radix tree is that each level will need the values of the other 255 children in order to verify each level.  Granted, it only matters at the top levels where there's a lot of branching, beyond levels 4 and 5 basically all other children pointers will be null.  But it goes along with why I'm hesitant to endorse a patricia tree: there might be a lot of data to transfer to very a node.


vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, which is what you'd want as a node so you can serve lookups to lite clients with no block chain that they can determine with 100% certainty whether you're telling the truth given just the merkle root.  What you have instead with all these neat tree ideas is an index to speed up database ops - which would be great if we're building a database engine - but the title of the OP is "trust-free lite nodes".  I am not sure that by enumerating all of these other wonderful tree structures that we are remembering we need the cryptographic properties that a merkle tree offers to accomplish the stated goal.

Assuming you had a copy of such a tree... the circumstance one would have possession of it, is one as a node would have acquired it from a peer as a complete and total substitute for the block chain (other than the block headers and perhaps a few days of recent blocks).  The whole point of this exercise is to ditch the storage requirement of spent transactions from the block chain for the purpose of scaling Bitcoin and dealing with the fact that the block chain will soon be too heavy to lift - not so much to shave milliseconds off index lookups.
legendary
Activity: 905
Merit: 1012
Quote from: etotheipi
On the other hand, I was hoping for a structure that wasn't too complicated, and both RB trees and Patricia tries have complicated implementations (even though the concepts behind them are fairly simple).  But if we're going to have to go with something complicated, anyway (to limit worst-case speed and time performance), then I'd have to vote for Patricia trie or variant.  Not only is it O(1)... someone brought up the very good point that updates to the tree can mostly be parallelized.  That sounds like another good property of a tree that's going to have very high update rates...
The 2-3-4 tree (aka B-tree of order 4) is really the only one I would consider in this circumstance. RB-trees are actually a special representation of 2-3-4 trees, but the implementation choices in balancing an RB-tree don't exist for 2-3-4 trees (for a given 2-3-4 tree there can exist more than one RB-trees that “encodes” that structure, but not vice versa). A higher order B-tree would also work, but then you would be trading increase CPU time for decreased I/O, which doesn't fit this application.

That said, the ability to parallelize prefix/radix tries is a very good point. You might win me over to that side yet... but if self-balancing trees are to be used, the 2-3-4 tree has some clear benefits over others (KVL, RB, etc.).


To the other posts... why would you ever need to rebuild the tree? I don't understand the purpose. If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
I just wanted to confirm that we understand that when the network "retargets", or the block reward halves, it isn't actually doing anything computationally intensive.  All that happens is that the client expects to see a different number in a field in future blocks as compared to past blocks.

Rebuilding a tree isn't too computationally intensive to have happen once every 3.5 hours (which is what 21 blocks would suggest - and is also a relatively fixed interval).  All that matters is that we know it is too computationally intensive to happen per transaction (which is multiple times per minute and tends to infinity).  I can't imagine picking this number is premature, as whether the magic number is 1, 3, 7, 21, or 420 blocks, all choices accomplish the goal of not pegging users CPU's to 100% chugging through this tree.
Pages:
Jump to: