The High-Value-Hash Highway - page 3.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Quote from: kjj on August 07, 2012, 02:20:16 PM

Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.

Good question! That's exactly the concern that motivated me to work on this problem. If a chain is valid, you will eventually process it entirely. However if a chain is work-deficient, like you described, then you can reject it early. Think of it this way: it's a validation procedure that has zero false positives, but supports probabilistic false-negatives from early rejection.

kjj

legendary

Activity: 1302

Merit: 1026

Ok, I get it now. It lets you jump backwards through the chain, leaping over portions until you get back to the first block that had this feature enabled.

But what's the point?

You can't trust this chain unless you've verified all of the blocks the hard way. And if you've done that, why do you care about following these links back?

Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Quote from: TheBitcoinChemist on August 07, 2012, 02:10:34 PM

That's true, which I believe is only another 4 bytes added onto the existing 80 byte header; but as I said, it doesn't really solve the bootstrapping problem itself. The 'light' clients still have to download entire blocks with inputs matching their addresses, whenever the network is upgraded to allow specific block requests. And it's not like just downloading the 80 byte headers is a significant burden, as all the headers to date would be less than 16 Megs. So adding 4 bytes to every header wouldn't be a burden to any full clients, but nor would it be particularly useful for them, and it would increase the blockchain headers total size by 5% for qustionable gains even for light clients that keep only headers. And if these blockchain header type light clients have to trust full nodes to deliver them blocks matching their addresses anyway, and they do, why can't those same full nodes provide the data these light clients need to be certain they are using the longest chain outside of the blockchain? 4 bytes is still 4 more bytes, replicated onto every full or header-only blockchain in the world, forever & ever.

Light clients will not need to download blocks with inputs matching their address, let alone trust a full node to do it correctly. See etotheipi's proposal for zero-trust light clients using merkle trees. https://bitcointalksearch.org/topic/ultimate-blockchain-compression-w-trust-free-lite-nodes-88208 The only remaining challenge for a lite client is (correctly) selecting the largest valid chain, which is what I am discussing.

TheBitcoinChemist

member

Activity: 70

Merit: 10

Quote from: socrates1024 on August 07, 2012, 01:59:18 PM

Quote from: TheBitcoinChemist on August 07, 2012, 01:54:55 PM

I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.

Just to be clear, the additional data I am recommending is a single additional hash per block.

That's true, which I believe is only another 4 bytes added onto the existing 80 byte header; but as I said, it doesn't really solve the bootstrapping problem itself. The 'light' clients still have to download entire blocks with inputs matching their addresses, whenever the network is upgraded to allow specific block requests. And it's not like just downloading the 80 byte headers is a significant burden, as all the headers to date would be less than 16 Megs. So adding 4 bytes to every header wouldn't be a burden to any full clients, but nor would it be particularly useful for them, and it would increase the blockchain headers total size by 5% for qustionable gains even for light clients that keep only headers. And if these blockchain header type light clients have to trust full nodes to deliver them blocks matching their addresses anyway, and they do, why can't those same full nodes provide the data these light clients need to be certain they are using the longest chain outside of the blockchain? 4 bytes is still 4 more bytes, replicated onto every full or header-only blockchain in the world, forever & ever.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Quote from: TheBitcoinChemist on August 07, 2012, 01:54:55 PM

I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.

Just to be clear, the additional data I am recommending is a single additional hash per block.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

kjj:

Suppose the back-link from the current head looks like 000xxx (value 3). We also include an 'up'-link to the parent of the most recent block that looks like 0000xxxx (value 4). If you skip to that parent block, then it will have an 'up'-link to the parent of a block that looks like 00000xxxxxx (value 5). And so on, until you get to the parent of the largest block for which there's no up-link.

Suppose you start at that largest block, which looks like 00000xxxxx (value 5). It contains an up-link, either to a previous block with value 00000xxx (value 5), or to one with lower value. You can skip your way to the highest-value hash in every sub-interval of history.

Ben-abuya: right, this pretty much solves the same problem except with O(1) additional storage per block rather than O(log N).

Since you are busy in another thread about variable-difficulty blocks, I'd like to point out my approach to that here, although it will take a few steps to show how it relies on the hash-value-highway to be viable.

The "difficulty" of a block could be set by each individual miner who "bids" on the difficulty they want to get. So a winning block at difficulty 4 would be H(difficulty: 4 zeros || nonce || header) -> 0000xxxxxxxxx. Instead of taking the largest chain, we would take the chain that has the largest sum-difficulty (according to the difficulty bids, not the actual hash value).

There should be a disincentive for miners to spam the network with low-difficulty blocks. Each miner should have stake in the game - the miner himself could place down a bet, effectively a fee for the storage of his block header. If his block is accepted, then he gets the money back. If his fork gets swept up by a higher-difficulty fork, then he loses his bet and the fees he would have earned.

The hash-value-highway makes it efficient to evaluate the sum-difficulty of a fork, and also to merge forks by cherry picking valid transactions. Long chains of low-difficulty blocks will tend to get preempted by higher-difficulty forks. If each transaction includes a 'minimum difficulty', then you'll only be included in the difficult blocks which are less likely to be reordered.

I don't expect this part of the explanation to be totally clear, especially since I haven't elaborated on how a fork is 'swept up' by the main chain, but I will get to that later. But since you've been working on similar ideas, I think you might have the right intuition for how this works.

TheBitcoinChemist

member

Activity: 70

Merit: 10

I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.

kjj

legendary

Activity: 1302

Merit: 1026

I am completely missing the point of this.

Block X includes a hash of block (X-1) so that you can be sure that block X follows block (X-1). What is the point of including a hash of block (X-y) where y varies (but in practice will usually be either that one insanely lucky block from a while back or a block about halfway through the last difficulty bump)?

And how does this magically let you jump around the chain?

ben-abuya

sr. member

Activity: 323

Merit: 250

I love this stuff. I was contemplating something similar, where you'd store n=log(block_height) hashes in each header instead of just the last block hash. So in the current block, you'd store the hash for the previous block, the hash for the block before that, the hash 4 blocks back, 8 blocks back, 16 blocks back, all the way to the genesis block. That's about 18 hashes per block currently. This would allow some kind of SPV client to only store log n headers instead of n. I haven't really thought through how this works yet, but it sounds similar to what you proposed. Your idea is more elegant and has some advantages, though.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Sure, it would also work as an overlay. It's also something an alt-chain could experiment with.

The problem you're talking about where clients need to validate the entire chain in order to see their transactions - that is already solved (conceptually) using merkle trees. Read etotheipi's proposal for that here: https://bitcointalksearch.org/topic/ultimate-blockchain-compression-w-trust-free-lite-nodes-88208 Of course, this only works if the client is able to correctly choose the right 'head'. That's what my current proposal is about.

TheBitcoinChemist

member

Activity: 70

Merit: 10

It's an interesting proposal, but there are only a few types of client that would benefit from this, so to cut this into the main blockchain isn't ideal. Would it still work if an intergrated parrallel chain were produced, say, for a Stratum-like overlay network? I can see how this would be good for part-time/disconnected clients and fresh installs trying to get up into operation quickly, but I'm not sure that it really solves the bootstrapping problem that independent light clients have, which is mostly that they have to parse the blockchain at all, just to determine their own balance.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Here's the next thing cool thing you can do with a high-value-hash highway: quickly discover exactly where two blockchains meet at a fork, even if the chains are very large and have many difficulty changes.

Quickly skip up to the highest-value-hash in each chain. If the chains share the highest-value-hash, then the fork point is more recent than that block. If one chain contains a higher-value hash than the other, then the fork point is behind that block. You can use a bisection search to find the true fork point.

I suppose I'm not being too clear on what the overarching goal is with this, so let me just say what my motivating principles are:
- The blockchain should eventually converge to a single correct chain, regardless of the network behavior and changes in cpu power over time (except for a persistent 34% adversary, which is unavoidable)
- Global parameters (like 10 minutes) should not need to be hard coded, but instead should automatically adjust according to demand and availability.

I'm going to build the argument that this is all possible using just a single additional back-link per block.

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Lite clients need to process every header of the correct chain. However, if a lite client is presented with several head nodes, only one of which is truly the largest, it would be nice to quickly skip iterating through the rest.

Mike Hearn

legendary

Activity: 1526

Merit: 1136

Maybe I'm missing something obvious here, but lite/SPV clients need to process every block header anyway, it's fundamental to their operation. If a node presents you with a chain that is not the best chain, quickly calculating its difficulty is not particularly important:

1) The chain probably branches from your best chain not too far back anyway, so the practical effort saved is low

2) You need to filter and store any wallet-relevant transactions on the side chain even though it's not the best chain, so if/when it becomes the best chain you can update the wallets data structures

socrates1024

full member

Activity: 126

Merit: 110

Andrew Miller

Here's a simple proposal. Let me define the 'value' of a block as the number of leading zero bits in its hash. For every block, we already include the hash of the previous block. I propose to also include the hash of the most recent block with higher value than the previous.

This data structure could be considered a 'merkle skip list'. A skip list lets you traverse in either of two directions - 'backwards' or 'up'. This would let you quickly follow a chain of hashes up to the highest-value we've ever found.

This is a trivial modification to the current block chain, yet the significance of this data structure is absolutely staggering. I will just describe one application for now.

--- Proposal for "eventually zero-trust" clients ---

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick!

Using the high-value-hash highway, you can skip 'up' to the largest hash found in O(log N) effort. That immediately gives you a rough estimate of how much work was performed. You can now branch-and bound to gradually improve your estimate. From the largest block (i.e., the 'top'), skip backwards, and then up to the top again. You're now at the highest-value-hash from the era that preceded the current-highest. You can traverse the 'proof-of-work' content of a chain directly, with the most-informative hashes first and working your way down. You could quickly select the largest out of a pool of potential block chains without having to iterate through them all.

A lite client could easily use the high-value-hash highway to determine the largest chain very quickly and begin using it. The estimate improves as the process continues, and if the lite client eventually gets around to validating the whole thing, then that's great.

It's important to check the work first, and only then to validate. This should apply to the entire block chain as well as to individual blocks.

--- Experiment ---

I can show you a neat trick. What's the highest value for a hash in the current blockchain? Write that number down. Now, count how many blocks in the chain have that value or higher.

You probably counted '1', or at least very close to it.

This trick works regardless of how much time has passed, how fast our computers run, how connected our network is, how the difficulty has adjusted over time, etc. It is completely scale invariant. The overall shape of the hash highway would be roughly scale invariant as well. See also: https://en.wikipedia.org/wiki/Benford's_law

[EDIT] I finally got around to running this experiment, and it turned out I was correct. I also produced visualizations of the "overall shape of the hash highway," which I'm copying here because I think this illustration is very important. Read this post for an explanation.

Topic: The High-Value-Hash Highway - page 3. (Read 9037 times)