
Topic: The High-Value-Hash Highway

legendary
Activity: 924
Merit: 1132
March 20, 2014, 07:33:24 PM
#55
Heh. You're right that hashes are a uniform distribution, which makes it spectacularly simple to do this trick. 

Coin age destroyed follows a Poisson-like distribution scaled by the total money supply -- which is different, but if you normalize by dividing out the money supply (conveniently, roughly the same as dividing by block height), you can do similar statistical tricks with different formulas.  There are at least two perturbing factors though, which are both disturbances to the pattern and sources of interesting information. 

First, the money supply itself is actually less than proportional to block height, depending on the rate at which bitcoins get lost.   The "coin age destroyed" graph, normalized by the theoretical money supply, would have an overall downward slope that would reveal approximately the rate per year at which bitcoins vanish due to lost keys (or just really persistent hoarders...).

Also, the distributions would get narrower as the velocity of bitcoins increases, so the variance of the Poisson distribution would increase or decrease depending on the health of the bitcoin economy during a particular period. And that would be another interesting thing to study.
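
As a rough sketch of the normalization just described (not code from this thread): the per-block coin_age_destroyed input is hypothetical, and the supply schedule is the idealized subsidy curve with lost coins ignored.
Code:
# Rough sketch (not from this thread): normalize per-block "coin age destroyed"
# by the theoretical money supply at that height, then inspect the trend.
# A persistent downward drift in the normalized series would hint at lost coins.

def theoretical_supply(height, initial_subsidy=50 * 10**8, halving_interval=210_000):
    """Satoshis issued by `height` under the fixed subsidy schedule (lost coins ignored)."""
    total, subsidy, h = 0, initial_subsidy, 0
    while h < height and subsidy > 0:
        step = min(halving_interval, height - h)
        total += step * subsidy
        subsidy //= 2
        h += step
    return total

def normalized_coin_age_destroyed(coin_age_destroyed):
    """coin_age_destroyed: hypothetical list of per-block totals, indexed by height."""
    return [cad / theoretical_supply(h + 1)
            for h, cad in enumerate(coin_age_destroyed)]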

full member
Activity: 126
Merit: 110
Andrew Miller
March 20, 2014, 12:52:31 AM
#54
Okay, I think I get it.

It's also interesting that you could do the same kind of linking on any metric; for example the amount of coin age destroyed by a given block, if that were your measure of "best chain" instead of the number of hashes.
I don't agree with this part. The coin-age destroyed among a set of blocks mined by an adversary does not necessarily follow any particular distribution. However, you are right that there are many possible choices for determining "value" other than distance from 0. You could, for example, take the "value" (beyond the difficulty) to be the number of *least-significant* zero bits instead, since, conditioned on a mining attempt beating the difficulty target, the distribution of the value measured this way is still as it should be.
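
A small illustration of that alternative metric (my own sketch, with hypothetical names): count the trailing zero bits of a hash that already meets the target.
Code:
# Illustrative sketch: given a hash (as an integer) that already meets the target,
# take the count of *trailing* zero bits as its "value". Conditioned on
# hash < target, the low-order bits are still (essentially) uniform, so this
# count has the same geometric distribution as the excess leading zeros would.

def trailing_zero_value(hash_int, target):
    assert hash_int < target, "only defined for hashes that meet the difficulty"
    if hash_int == 0:
        return 256          # degenerate case: every bit is zero
    value = 0
    while hash_int & 1 == 0:
        value += 1
        hash_int >>= 1
    return value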

Anyway since that's a really minor point, here's some more useful technical content. One surprising thing the attacker can do is to create a really *large* proof. The attacker does this by throwing away every block *except* the value=1 blocks, which don't exceed the difficulty target at all. This only requires throwing away 50% of the work, but it results in a proof where the "up" links are always null, and you have to traverse every damn block to validate his proof.

This isn't a problem, for the following reason. You can make progress on validating the SPV proofs for multiple chains concurrently. So if one peer tries to send you a really slow proof, you should request different proofs from different peers and process them in parallel.
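
A minimal sketch of that interleaving, assuming a hypothetical proof object with an incremental step() method; this is illustrative, not part of the scheme in the writeup.
Code:
# Minimal sketch of interleaved validation: work on several peers' proofs a
# step at a time, so one deliberately long proof cannot stall the others.
# The proof objects and their incremental step() method are hypothetical.

def validate_concurrently(proofs):
    """proofs: dict of peer_id -> proof, where proof.step() returns 'ok', 'more', or 'bad'."""
    active = dict(proofs)
    while active:
        for peer, proof in list(active.items()):
            status = proof.step()       # do a bounded amount of verification work
            if status == 'ok':
                return peer             # the first fully verified proof wins
            if status == 'bad':
                del active[peer]        # drop peers whose proofs fail outright
    return None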

The possibility of this strategy is what motivates the performance goal I mentioned as a caveat at the end of the writeup. The "honest prover" should, with high probability, produce a proof that is reasonably compact. It's okay if a dishonest prover produces a long proof. Regardless of length, the prover should be unable (except with low probability) to produce a convincing proof claiming that the amount of work done to bury some transaction exceeds the actual amount of work (the attacker's work plus any honest work) by more than some fraction.

*** edit ***
Actually, now that I've written this response out, I realize there's another attack that's modeled in my security definition but isn't adequately addressed in my scheme. The problem is that an attacker can, by rejecting 50% of his work, cause an honest proof to contain fewer useful back links. It's important to set the verification parameters loosely enough that even if the attacker does this, he can't cause honest chains to fail verification. But widening the verification parameters must also not make it much easier to forge a proof. All of this relates to analysis I haven't finished in my writeup, so these are things I need to address. I don't think this should affect the choice of data structure though (only the guidelines for choosing parameters).
legendary
Activity: 924
Merit: 1132
March 19, 2014, 10:57:24 PM
#53
Okay, I think I get it.

A hash with N initial zero bits suggests that something close to 2^N total hashes have been performed, because on average that's how many hashes it takes to find one with N initial zero bits.

Within the same set of hashes, you "expect" to find around two hashes with N-1 initial zero bits.  Each of these corresponds to a probability that something close to 2^(N-1) hashes have been performed.  If you have two of them, that adds up to 2^N total hashes (same as the work presented by the parent node).  If you have only one, though, your estimate is lowered (by half) and if you have five your estimate is increased (multiply by 5/2). 

Each of these hashes with N-1 initial zero bits is then expected to dominate around two hashes with N-2 initial zero bits. Meaning, if you had two at the N-1 level, you expect around four, and if you had five at the N-1 level, you expect around ten.  And each of these N-2 hashes represents around 2^(N-2) total hashes.  If you expected ten and you find seven, you lower your estimate (multiply by 7/10).  If you expected four and you found seven, you'd raise your estimate (multiply by 7/4). 

And so on, so the more of the "high nodes" on this hash tree you process, the better your estimate gets.  The law of large numbers means each "generation" verified downward from the root of the tree will lead to smaller and smaller adjustments to the estimate, and you probably have a very good idea (within 1%) of the total work by the time you've found the ~50 best hashes. 

This procedure can be carried down to the difficulty level in force when each block was mined; below that there's no further information, so you just have to treat your estimate for the individual blocks as the "truth", or as close to it as you can get.
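
A rough sketch of that level-by-level refinement, assuming counts[k] holds the number of block hashes with at least k leading zero bits.
Code:
# Rough sketch of the level-by-level refinement described above. counts[k] is
# assumed to hold the number of block hashes with at least k leading zero bits;
# best_level is N (the single best hash) and floor_level is the current difficulty.

def estimate_total_work(counts, best_level, floor_level):
    estimate = 2.0 ** best_level          # one hash with N zero bits ~ 2^N work
    observed_above = counts[best_level]   # normally 1
    for k in range(best_level - 1, floor_level - 1, -1):
        expected = 2 * observed_above     # each level down should roughly double
        observed = counts[k]
        estimate *= observed / expected   # e.g. 5 observed vs 2 expected -> x 5/2
        observed_above = observed
    return estimate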

So this gives you a way of getting a rapid estimate of the total amount of hashing work that went into a particular blockchain, or into the portion of it discovered since the blockchain fork. 

Okay, I see the value there.

It's also interesting that you could do the same kind of linking on any metric; for example the amount of coin age destroyed by a given block, if that were your measure of "best chain" instead of the number of hashes.
full member
Activity: 126
Merit: 110
Andrew Miller
March 19, 2014, 12:03:05 AM
#52
I wrote down a rough outline of a security game definition, and a clearer description of the algorithms with pseudocode.
https://docs.google.com/document/d/12xl5eRtoE0uISUX188jnmGSb-bwG4yXByRBbdr2r97A

I also wrote out an implementation in OCaml of the key operations (using the Lambda-auth language/compiler I made especially for this sort of thing), for building the hash-value highway and creating/verifying the compact SPV proofs. It compiles but I haven't tested it; I plan to use it to create some illustrative graphs.
https://gist.github.com/amiller/9635632

Any of the above could be totally broken, but hopefully this gets the conversation moving along more concretely!
full member
Activity: 126
Merit: 110
Andrew Miller
March 18, 2014, 02:32:03 PM
#51
Thanks for the writeup, I'm glad there's some renewed interest in this.
I can't figure out specifically how it differs from what you've written up, though. The proposal in this thread already includes commitments to back links. The way I think of it is that each block contains a claim about how much cumulative work went into creating it. The proof process is basically a hypothesis test of whether the actual work matches the claim. In fact it's a recursive hypothesis test, because if you follow the "up" links you quickly (in O(log n) steps) arrive at a block that should divide the total cumulative work into two segments. Then you do the hypothesis test on each of these.
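
A conceptual sketch of that recursive test, where up, sample_test, and the claimed_work attribute are all hypothetical placeholders rather than the real scheme.
Code:
# Conceptual sketch of that recursive test (placeholders, not the real scheme):
# up(block) returns a dominant high-value ancestor or None, sample_test performs
# the base-case statistical check, and blocks are assumed to carry a
# claimed_work attribute.

def check_segment(tip, base, claimed_work, up, sample_test,
                  min_segment_work=2**40, depth=0, max_depth=30):
    """Test whether the segment (base, tip] plausibly represents claimed_work hashes."""
    if depth >= max_depth or claimed_work < min_segment_work:
        return sample_test(tip, base, claimed_work)
    mid = up(tip)                         # O(log n) hop toward a high-value block
    if mid is None or mid is base:
        return sample_test(tip, base, claimed_work)
    return (check_segment(mid, base, mid.claimed_work, up, sample_test,
                          min_segment_work, depth + 1, max_depth) and
            check_segment(tip, mid, claimed_work - mid.claimed_work, up, sample_test,
                          min_segment_work, depth + 1, max_depth))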

I'm currently trying to figure out how to phrase this as a formal security game, which should make it a little more clear how to evaluate whether either of our proposals actually achieves what it's supposed to.
legendary
Activity: 905
Merit: 1012
March 17, 2014, 02:53:16 PM
#50
As a slight reformulation of this basic idea, you can also build a skip list data structure by committing to back links in the current block, and then selecting links whose distance back is less than the apparent work of the block once you've solved it. This allows you to construct compact SPV proofs with an accurate summary of the cumulative work that went into the actual blocks, but in a non-interactive way and without having to reveal more than log N block headers (where N is the length of the chain being proven). This was recently written up in a little more detail on the mailing list:

http://sourceforge.net/p/bitcoin/mailman/message/32111357/
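
A rough sketch of the back-link selection idea, assuming power-of-two link distances and measuring apparent work in expected blocks (target divided by the hash value); the actual proposal in the linked message differs in its details.
Code:
# Rough sketch only; the proposal in the linked message differs in detail.
# A block commits to back links at power-of-two distances before mining; once
# solved, only links whose distance back is within the block's apparent work
# (target / hash, measured in expected blocks) are considered usable.

def committed_back_links(height):
    """Heights of back links at power-of-two distances, fixed before mining."""
    return [height - (1 << i) for i in range(height.bit_length())
            if height - (1 << i) >= 0]

def usable_back_links(height, hash_int, target):
    apparent_work_in_blocks = target / max(hash_int, 1)
    return [h for h in committed_back_links(height)
            if height - h <= apparent_work_in_blocks]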
legendary
Activity: 1102
Merit: 1014
August 18, 2012, 05:25:50 AM
#49
Cool graphs. Are you going to do a visual showing how these hashes would be arranged into the data structure you describe? It was a tree of sorts right?
full member
Activity: 126
Merit: 110
Andrew Miller
August 18, 2012, 12:32:51 AM
#48
In the OP, I suggested an experiment. What's the smallest hash (highest value hash) we've ever found? How many hashes have we found with that value? My prediction is that we've found exactly one hash with that highest value. I finally learned to parse blockchain data, and as it turns out, my prediction was spot on.

The smallest hash we've ever found (as of block #188529) is 00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d. This hash has 67 leading zero bits. The next smallest hash has only 66 zero bits.
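
A quick way to check that count, assuming the hash is interpreted as a 256-bit integer.
Code:
# Quick check of the leading-zero-bit count for the hash quoted above.

def leading_zero_bits(hex_hash, width=256):
    value = int(hex_hash, 16)
    return width if value == 0 else width - value.bit_length()

best = "00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d"
print(leading_zero_bits(best))   # prints 67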

I would have predicted there should be two hashes with 66 zero bits, but in fact, there have been five:
http://blockchain.info/block-height/137445
http://blockchain.info/block-height/140105
http://blockchain.info/block-height/148246
http://blockchain.info/block-height/156636
http://blockchain.info/block-height/156980

I mentioned that this way of looking at hash values leads to a scale-invariant way of measuring time. Let me illustrate this with a couple of graphs.

First, these are the hash values (256 - log2(hash)) as a function of time, the way we normally measure it - in blocks (each block corresponds to roughly 10 minutes). The red line at the bottom represents the difficulty, i.e., the minimum acceptable hash value. You can compare this red line with mndrix's (dated) chart of log-difficulty over time.


To visualize the scale-invariant properties I'm so excited about, look what happens when we plot the hash values against the cumulative difficulty (i.e., total work). This is also equal to the expected total number of hashes computed.


The key things to notice about this graph are:
 - The density of points decreases exponentially as you go higher. If you split the data into rows, then each row contains half as many points as the row below it.
 - The density of points is independent of the position along the x-axis.
 - The density of points is independent of the shape of the red difficulty line (except for the fact there's no data below it).
 - The x-coordinate of each point is uniform random within the entire interval.

Cumulative difficulty, not block-height, is the true way we measure time in Bitcoin. It's a lovely mechanism, and the fact it has these qualities gives me hope that a future incarnation won't need any hard-coded time parameters. A great way of partitioning/subdividing time is to sample according to the hash values. Right off the bat, there are two time periods - the time before the 67-bit block, and the time after the 67-bit block. Within each interval, you can subdivide according to the 66-bit blocks, and then the 65-bit blocks, and so on.
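
A rough sketch of that recursive subdivision, assuming hashes is a hypothetical list of block hashes as integers indexed by height.
Code:
# Rough sketch of the recursive subdivision just described. `hashes` is a
# hypothetical list of block hashes as integers, indexed by height; each range
# is split at its highest-value (numerically smallest) hash.

def partition_by_hash_value(hashes, lo, hi, depth=0, max_depth=3):
    """Yield (depth, height) landmarks subdividing the height range [lo, hi)."""
    if depth >= max_depth or hi - lo < 2:
        return
    best = min(range(lo, hi), key=lambda h: hashes[h])
    yield (depth, best)
    yield from partition_by_hash_value(hashes, lo, best, depth + 1, max_depth)
    yield from partition_by_hash_value(hashes, best + 1, hi, depth + 1, max_depth)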

(Here's the code I wrote to prepare these illustrations)
sr. member
Activity: 323
Merit: 250
August 09, 2012, 07:32:10 PM
#47
The "difficulty" of a block could be set by each individual miner who "bids" on the difficulty they want to get. So a winning block at difficulty 4 would be H(difficulty: 4 zeros || nonce || header) -> 0000xxxxxxxxx. Instead of taking the largest chain, we would take the chain that has the largest sum-difficulty (according to the difficulty bids, not the actual hash value).

There should be a disincentive for miners to spam the network with low-difficulty blocks. Each miner should have a stake in the game: the miner himself could put down a bet, effectively a fee for the storage of his block header. If his block is accepted, then he gets the money back. If his fork gets swept up by a higher-difficulty fork, then he loses his bet and the fees he would have earned.

Isn't wasted hashing power disincentive enough?

The hash-value-highway makes it efficient to evaluate the sum-difficulty of a fork, and also to merge forks by cherry-picking valid transactions. Long chains of low-difficulty blocks will tend to get preempted by higher-difficulty forks. If each transaction includes a 'minimum difficulty', then it will only be included in the more difficult blocks, which are less likely to be reordered.

I don't expect this part of the explanation to be totally clear, especially since I haven't elaborated on how a fork is 'swept up' by the main chain, but I will get to that later. But since you've been working on similar ideas, I think you might have the right intuition for how this works.

So this "sum-difficulty" is the total difficulty of a chain, including blocks that were merged in from other forks? If the block is merged into another chain, how would the original miner lose his transaction fees from that solved block?
sr. member
Activity: 323
Merit: 250
August 09, 2012, 12:35:20 AM
#46
All of those 'hacks' can be independently improved without affecting the bitcoin protocol, and the additional functions of an economy are intended to be separate from the bitcoin protocol.  It's simply not necessary to include those things in the protocol.  Sure, hard drive tech is always improving, but techies have been trying to get the Internet to upgrade to IP6 for well over a decade, and that is not going to happen until the majority of users have personal evidence that IP4 is broken.  The same is true with the bitcoin protocol: I don't want you screwing with a good thing & my opinion actually matters because you can't change anything without the consent of the majority of node owners.

Again, I don't want to mess with the current Bitcoin implementation at all, and I'm completely on-board with the careful and conservative changes the core devs are making. But there are alternative blockchains which don't jeopardize bitcoin in any way. If one of them scratches an important itch, many people will adopt it, and it will compete for some resources. I think the bitcoin ecosystem will eventually be much more robust if there are multiple competing blockchains around instead of just one.

The reason some of these economic functions will be based on bitcoin ideas is that they need distributed chronological ordering, and the bitcoin ideas are the best way we've currently got to do that.
member
Activity: 70
Merit: 10
August 09, 2012, 12:11:56 AM
#45
It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here.  The old adage "don't fix it if it ain't broke" applies here.

This is pretty antithetical to the high-tech world. Are hard drives broken now? They work amazingly well. Yet lots of people are out there making improvements to them. I just don't see how it makes sense to say that a piece of technology works well, so it shouldn't be improved. Moreover, bitcoin is broken when it comes to using it in a commercial setting. Ten- to thirty-minute waits just aren't acceptable. Sure there are ways to deal with that, but those are hacks.

Bitcoin takes care of money. What about the rest of the financial transactions that manage economies, like exchanges, stocks, bonds, bets, options, futures? You need higher resolution timing for those, and probably lots of other improvements as well. I don't think anybody is recommending baking this stuff into the Satoshi client. There will be alternative coins, and they'll compete on the market.

All of those 'hacks' can be independently improved without affecting the bitcoin protocol, and the additional functions of an economy are intended to be separate from the bitcoin protocol.  It's simply not necessary to include those things in the protocol.  Sure, hard drive tech is always improving, but techies have been trying to get the Internet to upgrade to IP6 for well over a decade, and that is not going to happen until the majority of users have personal evidence that IP4 is broken.  The same is true with the bitcoin protocol: I don't want you screwing with a good thing & my opinion actually matters because you can't change anything without the consent of the majority of node owners.
sr. member
Activity: 323
Merit: 250
August 09, 2012, 12:02:10 AM
#44
It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here.  The old adage "don't fix it if it ain't broke" applies here.

This is pretty antithetical to the high-tech world. Are hard drives broken now? They work amazingly well. Yet lots of people are out there making improvements to them. I just don't see how it makes sense to say that a piece of technology works well, so it shouldn't be improved. Moreover, bitcoin is broken when it comes to using it in a commercial setting. Ten- to thirty-minute waits just aren't acceptable. Sure there are ways to deal with that, but those are hacks.

Bitcoin takes care of money. What about the rest of the financial transactions that manage economies, like exchanges, stocks, bonds, bets, options, futures? You need higher resolution timing for those, and probably lots of other improvements as well. I don't think anybody is recommending baking this stuff into the Satoshi client. There will be alternative coins, and they'll compete on the market.
member
Activity: 70
Merit: 10
August 08, 2012, 05:43:39 PM
#43
Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

It's not impossible, just astronomically improbable.  Those are arbitrary parameters decided by Satoshi, and in all the time since there has not been any proposal for altering them that could offer more than a marginal improvement, if that.  For example, the 10 minute block interval could have been 6 minutes, 12 minutes, 15 minutes or any other arbitrary number of seconds; but it could not have been a small number of seconds, because the interval was chosen as a balance between transaction confirmation time and the expected future network propagation latency.  10 seconds, or even 120, is too fast and would result in a great many more network splits & orphaned blocks, potentially causing a congestion feedback loop.

Also, the block interval time cannot be variable, due to the interlocking structure of the blockchain's design & purpose.  If block intervals were variable: 1) either the block reward would have to be variable as well, which would make verifying that a block reward is a valid amount quite difficult for an individual node, or the distribution rate of newly issued bitcoins into circulation would no longer be predictable, and that would be bad; and 2) the relatively common blockchain splits would be significantly more difficult to resolve in any automatic fashion, which is the only way it could be done.

Of course your points make perfect sense, but isn't this circular reasoning? You're saying that the target time can't be reduced because of network splits and orphaned blocks, but those are exactly the kind of things these proposals are trying to address.



But those aren't actual problems; the protocol is self-healing as it is.  You're trying to debate a new solution to a non-issue.

Quote
Satoshi is a genius, and it's uncanny how well his somewhat arbitrary constants worked right off the bat. It would have been unwise of him to add in extra complexity to reduce those numbers. But time has passed, the system has proven itself stable, and it's time to consider even more amazing things that can be done with these core concepts. Of course, pressing issues shouldn't be neglected, but some people are going to want to work on more experimental stuff, and I think that's great.


It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here.  The old adage "don't fix it if it ain't broke" applies here.
legendary
Activity: 1102
Merit: 1014
August 08, 2012, 05:29:59 PM
#42
I don't think it's plausible that this change could be pulled off with Bitcoin but I'm supremely interested in fully understanding this proposal and its implications.
sr. member
Activity: 323
Merit: 250
August 08, 2012, 04:59:12 PM
#41
Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

It's not impossible, just astronomically improbable.  Those are arbitrary parameters decided by Satoshi, and in all the time since there has not been any proposal for altering them that could offer more than a marginal improvement, if that.  For example, the 10 minute block interval could have been 6 minutes, 12 minutes, 15 minutes or any other arbitrary number of seconds; but it could not have been a small number of seconds, because the interval was chosen as a balance between transaction confirmation time and the expected future network propagation latency.  10 seconds, or even 120, is too fast and would result in a great many more network splits & orphaned blocks, potentially causing a congestion feedback loop.

Also, the block interval time cannot be variable, due to the interlocking structure of the blockchain's design & purpose.  If block intervals were variable: 1) either the block reward would have to be variable as well, which would make verifying that a block reward is a valid amount quite difficult for an individual node, or the distribution rate of newly issued bitcoins into circulation would no longer be predictable, and that would be bad; and 2) the relatively common blockchain splits would be significantly more difficult to resolve in any automatic fashion, which is the only way it could be done.

Of course your points make perfect sense, but isn't this circular reasoning? You're saying that the target time can't be reduced because of network splits and orphaned blocks, but those are exactly the kind of things these proposals are trying to address. Satoshi is a genius, and it's uncanny how well his somewhat arbitrary constants worked right off the bat. It would have been unwise of him to add in extra complexity to reduce those numbers. But time has passed, the system has proven itself stable, and it's time to consider even more amazing things that can be done with these core concepts. Of course, pressing issues shouldn't be neglected, but some people are going to want to work on more experimental stuff, and I think that's great.
member
Activity: 70
Merit: 10
August 08, 2012, 02:49:14 PM
#40
Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

It's not impossible, just astronomically improbable.  Those are arbitrary parameters decided by Satoshi, and in all the time since there has not been any proposal for altering them that could offer more than a marginal improvement, if that.  For example, the 10 minute block interval could have been 6 minutes, 12 minutes, 15 minutes or any other arbitrary number of seconds; but it could not have been a small number of seconds, because the interval was chosen as a balance between transaction confirmation time and the expected future network propagation latency.  10 seconds, or even 120, is too fast and would result in a great many more network splits & orphaned blocks, potentially causing a congestion feedback loop.

Also, the block interval time cannot be variable, due to the interlocking structure of the blockchain's design & purpose.  If block intervals were variable: 1) either the block reward would have to be variable as well, which would make verifying that a block reward is a valid amount quite difficult for an individual node, or the distribution rate of newly issued bitcoins into circulation would no longer be predictable, and that would be bad; and 2) the relatively common blockchain splits would be significantly more difficult to resolve in any automatic fashion, which is the only way it could be done.
sr. member
Activity: 323
Merit: 250
August 08, 2012, 01:46:26 PM
#39
My mission is to eliminate every last hard-coded global parameter in Bitcoin, so that it grows into an indisputably scalable and universal protocol. On the chopping block are "10 minutes," and "difficulty adjustment every 2016 blocks."

Two of the things I'm going to propose next (absorbing orphaned forks, and self-adjusted difficulty without timestamps) are potentially going to create a much larger number of headers, so I wanted to explain my solution to that first, starting with efficient validation for lite clients. If it's not interesting, then no need to read ahead - but save your place.

I'm looking forward to reading this.
sr. member
Activity: 323
Merit: 250
August 08, 2012, 01:43:40 PM
#38
Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.
kjj
legendary
Activity: 1302
Merit: 1026
August 08, 2012, 01:32:57 PM
#37
My mission is to eliminate every last hard-coded global parameter in Bitcoin, so that it grows into an indisputably scalable and universal protocol. On the chopping block are "10 minutes," and "difficulty adjustment every 2016 blocks."

In that case, good luck, but I don't think you'll get much traction with it.  A few people are greatly concerned with the magic numbers, but most of us don't care.  I've personally read dozens of threads on that subject, and so far no one has ever come up with a good argument why removing them would be beneficial.

This isn't physics where we kept getting 1/137 but didn't know why.  We already know why we have 600 and 2016, and we are ok with it.
member
Activity: 70
Merit: 10
August 08, 2012, 12:56:56 PM
#36

Adding a block index to the block gives you the ability to move forward, but it also gives you full random access.  Why bother with a second link when you can instantly seek by index?

Are you talking about an index within a block?  No; that would be too big, multiplied across too many full clients.  A better idea would be what I mentioned earlier, a web-accessible block index on a regular webserver.  It's not like any particular client needs to trust the index, since it just references blocks that can be downloaded and verified over the p2p network.  Also, the external index could build the 'last harder hash' index after the block is produced, and this wouldn't require any changes in the codebase or protocol.  For that matter, every full client already indexes everything, and that is how blockexplorer.com works.  The trick is just setting up a machine-readable standard for light clients to query the indexing server, and then hunt those blocks down over the p2p network.
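
A purely hypothetical sketch of what such a query might look like; the endpoint and field names are invented for illustration, and no such standard exists.
Code:
# Purely hypothetical sketch of such a query; the endpoint and field names are
# invented for illustration, and no such standard currently exists.

import json
import urllib.request

def lookup_last_harder_hash(index_url, block_hash):
    """Ask an untrusted index server which earlier block had a harder (smaller) hash.
    The answer is then fetched and verified over the p2p network, so the index
    itself never needs to be trusted."""
    with urllib.request.urlopen(f"{index_url}/last-harder-hash/{block_hash}") as resp:
        return json.loads(resp.read())["block_hash"]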