
Topic: What is stopping pruning from being added to the qt client? (Read 2737 times)

sr. member
Activity: 287
Merit: 250
I am starting to think the only way to prune effectively is to create an archive node type, plus a rule for light nodes that they always keep a connection or route to an archive node.

Transaction inputs would only be pruned after the coinbase in their block became spendable; that rule would ensure all light nodes have the same pruned chain.

When a new node connected, it would either find an archive node and start the whole process, or query light nodes to determine a consensus on the legitimacy of a pruned block. Using an archival node would be preferred, since its data is verifiable. To ensure archival nodes exist in sufficient quantities, perhaps there could be a system to buy access to them, making them an avenue for profit.
legendary
Activity: 1232
Merit: 1084
(Re: Amiller's high hash highway)

No I hadn't, that's interesting.  

Ok, reading more.  The rule is that you have 2 links, one to the previous block and one to the best (lowest) hash before previous that beats previous.  He uses the number of leading zeros rather than the hash value for some reason.  He has a big discussion that there is probably only 1 "best" hash in that context.  Using the full value means that you are pretty much guaranteed that.

If 100 billion hashes are performed, then each hash has an equal chance of being the best.  The fact that there is a difficulty threshold means some never become block headers, but the best ones always will, so it doesn't affect the result.

This means that the best hash so far (ignoring the previous block for some reason) will on average be 50% of the way back, in terms of POW.  It splits the chain into 2 equally important halves.  Each step backwards would decrease total POW by on average 50%, so it takes log(total POW / POW(genesis)) steps to get back to the genesis block.

Blocks that don't have the 2nd link would have to be ignored (so assumed to have a worthless (infinitely high) hash).  So, blocks after 270k wouldn't be allowed to link to blocks before 270k and 270k would link to the genesis block.  Effectively, the blocks from 1 to 269999 would be considered to have an infinite hash.
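A minimal sketch of that walk (Python, purely illustrative; the block.up / block.prev / block.is_genesis attributes are assumed names for the two links described above):

Code:
def steps_to_genesis(tip):
    """Follow the 'up' link wherever it exists; each up-step jumps to the best
    (lowest) earlier hash, which on average sits about half the remaining proof
    of work back, so this takes roughly log2(total POW / genesis POW) steps."""
    steps = 0
    block = tip
    while not block.is_genesis:
        # Blocks without a second link count as having a worthless hash,
        # so they are only reached via the normal prev link.
        block = block.up if block.up is not None else block.prev
        steps += 1
    return steps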

[edit]

Reading even more.  The reason for taking zeros is so that collisions can occur.

If the highest hash has 50 zeros, then on average there will be 2 hashes with 49 zeros, 4 with 48, and so on.

The "up" link goes to the most recent block with more zeros than this block (or the genesis block, if none).

If the current block had 47 zeros, then you could find all of the 48-zero blocks by following the chain:

Code:
while (block != genesis) {
  if (block.zeros == 48) {
    addToList(block);       // found a 48-zero block
    block = block.prev;     // step back one block
  } else if (block.zeros < 48) {
    block = block.up;       // the up link only skips blocks with <= this block's zeros, so it never skips a 48
  } else {
    block = block.prev;     // more than 48: step back one, since the up link could skip 48s
  }
}

If the current block has less than 48, you go upwards.  This will skip blocks with zeros less than or equal to the current block, so will never skip a 48 block.

If the current block has 48, then add it to the list and go to the previous block.

If the current block has more than 48, you need to go to the previous block too, since using the up link could cause 48's to be skipped.

This can be used to find a chain for each zero count.  That gives more data points, so less variance when working out the total difficulty.

However, even just showing the 50 zero header would be strong evidence of the total hashing power.  The up link would point directly to the genesis block.  It would also show the minimum work against that genesis block.
legendary
Activity: 1232
Merit: 1084
How much flexibility do ASIC chips have for header changes?  Does the header have to be 80 bytes with the nonce in exactly the right location, or did they make them more general?

The current scheme is:

The 80-byte header is expanded (with padding) to 128 bytes (two 64-byte SHA-256 blocks).

The header is broken up as follows (indexes are in bytes):

Block 1: bytes 0 - 63 (version, previous block hash, and the first 28 bytes of the merkle root) = 64 bytes
Block 2: bytes 64 - 79 (last 4 bytes of the merkle root, timestamp, bits, nonce) plus padding = 64 bytes

The first block doesn't contain the nonce, so doesn't change as the counter runs.

This means that they need to perform the following.

First SHA256 pass

sha256(block one) (can be pre-calculated)
sha256(block two)

result: 32 bytes

Second SHA256 pass
sha256(32 bytes rounded up to 1 block)
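For illustration only, a small Python sketch of that scheme (this shows how the 80-byte header maps onto the two passes and why the first block's state can be pre-calculated, not how an ASIC is built; scan_nonces and its parameters are made-up names):

Code:
import hashlib
import struct

def scan_nonces(header76: bytes, start: int, count: int, target: int):
    """header76 is the 80-byte header minus the trailing 4-byte nonce.
    The first 64 bytes (block one) never change as the nonce counts up,
    so their SHA256 state is computed once and reused for every candidate."""
    assert len(header76) == 76
    midstate = hashlib.sha256()
    midstate.update(header76[:64])                 # block one: pre-calculated
    tail = header76[64:]                           # header bytes 64..75
    for nonce in range(start, start + count):
        h = midstate.copy()                        # reuse the pre-calculated state
        h.update(tail + struct.pack('<I', nonce))  # block two (hashlib adds the padding)
        digest = hashlib.sha256(h.digest()).digest()   # second SHA256 pass over 32 bytes
        if int.from_bytes(digest, 'little') < target:
            return nonce, digest
    return None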

They are unlikely to have made the first block flexible, since that would be unnecessary hardware for that SHA round.

For the 2nd block, there are a few ways to do it.

Allowing the counter to be moved means they would need to add multiplexors to slide the nonce.  This creates an incentive to not include that flexibility.  However, it might be worth it.

Changing the non-nonce inputs into the 2nd block would just require flip flops, so it is low cost.  There might be some logic optimisations that can't be performed if those bits aren't hardwired to zero.  This would seem to be well worth it, since it maintains forward compatibility.

The second stage isn't affected by the exact nature of the header, so could support a header change.

There are 40 bytes remaining in the header, before the first hash would require 3 rounds.  That puts a maximum limit of 120 bytes for the block header and a rule preventing the nonce from being moved.  However, adding a nonce2 would probably be ok.  It would only be a problem if the communication speed from controller to ASIC was so slow that 4 billion hashes could be completed before the next nonce2 was sent.
legendary
Activity: 1400
Merit: 1009
If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?
It could be a new node which has just entered the network by downloading the UTXO set, and wants to fully validate the chain back to some point in the past to increase its assurance that it really is operating on the correct branch.
legendary
Activity: 1232
Merit: 1084
Have you seen Amiller's high hash highway stuff?

No I hadn't, that's interesting.  

From what I can see, his proposal is (effectively) to store the lowest hash ever produced by the network.

In much the same way that having 10 leading zeros proves you have done roughly 1024 hashes, the lowest-ever hash for the network shows that the network has done that much work.
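As a rough numeric sketch of that idea (illustrative only): if hashes are uniform over the 256-bit space, the lowest hash seen after N attempts is around 2^256 / N, so the lowest hash by itself gives an order-of-magnitude estimate of the work.

Code:
def estimated_hashes(lowest_hash: int) -> float:
    """If each hash is uniform in [0, 2**256), the minimum of N draws is
    about 2**256 / N, so N is roughly 2**256 / lowest_hash."""
    return 2**256 / lowest_hash

# A hash with 10 leading zero bits (value just under 2**246) implies ~2**10 = 1024 hashes.
assert round(estimated_hashes(2**246)) == 1024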
staff
Activity: 4200
Merit: 8441
When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, might as well subdivide at the transaction level.
Without debating the efficiency of parallel fetch (which is, in fact, debatable)— it's unrelated to talking about archival information— if you want hunks of the latest blocks from your peers you can simply ask them for them. If you want to think of it as a hashtable it's a D1HT, but it seems unnecessary to think of it that way. Besides, prefetching the transaction list requires a synchronous delay.  Simply asking your peers for a blind orthogonal 1/Nth of block $LATEST is more efficient.

As a toy example, if you have 8 peers and hear about a block, you could ask each peer to give you transaction n where (n+x)%8==0, where x is their peer index. If not all 8 peers have received the block you can go back and re-request their fraction from the ones that do.  Additionally, peers can filter their responses based on transactions they've already sent you or seen you advertise. By grabbing blind hunks like this you avoid transmitting a transaction list (which is a substantial fraction of the total size of the transactions) and avoid round trips both to get the list and to negotiate who has/doesn't have what.  Though this is something of a toy example, it's fairly close to the functionality our bloom-filtered getblock already provides.
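A toy Python sketch of that blind split (function and parameter names are just for illustration):

Code:
def split_block_request(num_txs: int, num_peers: int):
    """Ask peer x for every transaction index n with (n + x) % num_peers == 0,
    so the peers' shares are disjoint and together cover the whole block."""
    return {x: [n for n in range(num_txs) if (n + x) % num_peers == 0]
            for x in range(num_peers)}

# With 8 peers, each is asked for a blind 1/8th of the block's transactions.
shares = split_block_request(num_txs=20, num_peers=8)
assert sorted(n for share in shares.values() for n in share) == list(range(20))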

Quote
In addition I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means if they wanted to download old blocks the only data they should need to fetch from the network is the pruned transactions from those blocks.
If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?
legendary
Activity: 1400
Merit: 1009
There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query that's not needed for anything in the system.
When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, might as well subdivide at the transaction level.

In addition I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means if they wanted to download old blocks the only data they should need to fetch from the network is the pruned transactions from those blocks.
staff
Activity: 4200
Merit: 8441
I was suggesting this as a way of storing prunable transactions, not blocks.
There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query that's not needed for anything in the system.

Quote
Why is a multihop probe unreasonable when it comes to retrieving archival data?
Because it lowers reliability and makes DOS attack resistance hard. It's part of why the sybil problem is not solved in DHTs on anonymous networks. (By anonymous there I mean that anyone can come or go or connect at any point; freenet opennet is actually insecure, at least in theory.)  Moreover, it's unreasonable because it doesn't actually improve anything over a simpler solution.
legendary
Activity: 1400
Merit: 1009
That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic it would also require multihop probing to find nodes that actually store a particular block.
I was suggesting this as a way of storing prunable transactions, not blocks. Why is a multihop probe unreasonable when it comes to retrieving archival data? Also, it's possible to improve the locality with path folding.

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.
Of course it's easier, because you don't need to include strong anonymity in the feature set. Still, their data store, which is deployed and working in practice with ~30 TB of capacity, solves a lot of the same problems.
staff
Activity: 4200
Merit: 8441
All nodes could become archives nodes which only store a fraction of the historical data.

1) Interpret the transaction hash key space as a being circular.
2) Pick a random hash value, call this the target hash.
[...]
That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic it would also require multihop probing to find nodes that actually store a particular block.  Please think about the actual access patterns which are needed to perform useful processing on the data. For historical data, stuff beyond a few weeks old, "Pick a random starting point at least $storage from the end, keep $storage worth of block data starting there" is ~optimal considering the access patterns and a lot simpler, and still maintains the relation that so long as the aggregate storage is great enough the probability of being unable to find some of the data is negligible.
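A minimal sketch of that rule (Python; the name and units, e.g. bytes of block data, are illustrative):

Code:
import random

def pick_archive_range(chain_size: int, storage: int):
    """Pick a random starting point at least `storage` from the end of the
    historical data and keep `storage` worth of block data from there."""
    assert chain_size >= storage
    start = random.randrange(0, chain_size - storage + 1)
    return start, start + storage   # keep data in [start, start + storage)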

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.
legendary
Activity: 2053
Merit: 1354
aka tonikt
@justusranvier, I think it's a good idea, but rather for bitcoin 3.0.
IMHO, there is no way you could introduce such a system in the foreseeable future.
It just seems too complex. And people won't like it.
legendary
Activity: 1400
Merit: 1009
All nodes could become archives nodes which only store a fraction of the historical data.

1) Interpret the transaction hash key space as being circular.
2) Pick a random hash value, call this the target hash.
3) Design a probability function with the shape of a bell curve which produces a value of 1 at the target hash and on average would match 1% (or any other desired fraction) of the key space.
4) Apply the probability function to the hash of any prunable transaction the node is aware of. Permanently store transactions for which the function is true and discard the rest (alternately, save all of them but use the probability function to selectively discard transactions when disk space is low).
5) As long as the total number of nodes is a sufficiently high multiple of 1/(desired fraction), and the nodes have sufficient disk space on average to hold their desired fraction of the key space, then all of the historical transactions will probably remain retrievable.
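A rough sketch of steps 2-4 (Python; the Gaussian shape, the scaling, and every name here are illustrative choices, not anything specified above):

Code:
import math
import random

KEY_SPACE = 2**256

def keep_probability(txid: bytes, target: int, fraction: float) -> float:
    """Bell-shaped curve peaking at 1 on the target hash, scaled so that the
    expected kept share of the (circular) key space is roughly `fraction`."""
    key = int.from_bytes(txid, 'big')
    d = abs(key - target) % KEY_SPACE
    d = min(d, KEY_SPACE - d) / KEY_SPACE        # circular distance in [0, 0.5]
    sigma = fraction / math.sqrt(2 * math.pi)    # makes the curve's integral ~ fraction
    return math.exp(-0.5 * (d / sigma) ** 2)

def should_store(txid: bytes, target: int, fraction: float) -> bool:
    return random.random() < keep_probability(txid, target, fraction)

# Each node picks its own random target once (step 2):
target = random.getrandbits(256)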
staff
Activity: 4200
Merit: 8441
Isn't that what happens with the main client at the moment?  Script is not verified until the client passes the last checkpoint?
No, not quite. The expensive scriptsig validations are skipped, but inflation prevention/fees/etc are all preserved. All of it is run if checkpoint=0 is set from the commandline.  My preference, after switching to headers-first sync, is to randomly validate signatures in old blocks in proportion to the computation burying them... and to launch rockets if there is a failure.
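One possible reading of that, as a heavily hedged sketch (the 1/(1 + depth/k) shape and the constant are invented here, not taken from the post): check a block's scripts with a probability that falls off as more work buries it.

Code:
import random

def check_probability(confirmations: int, k: float = 32.0) -> float:
    """Illustrative only: blocks buried under more work get checked less often."""
    return 1.0 / (1.0 + confirmations / k)

def should_check_scripts(confirmations: int) -> bool:
    return random.random() < check_probability(confirmations)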

Quote
Quote
(1)  Prevents DOS attacks against node synchronization.  Under the original synchronization behavior an attacker can send a node hundreds of gigabytes of difficulty 1 blocks (which thanks to FPGAs and asics can be very cheaply produced).  This difficulty 1 chain might— once synced far enough in _theory_ eventually have more sum diff than the real chain, so a checkpointless node can be forced to evaluate it.  Checkpoints kill this attack dead, though they aren't the only way to kill it, they're a simple and reliable one.
That is a 51% attack.  POW builds directly with hashing power.  Building a 1 difficulty chain with more POW than the main chain is just as hard as building a shorter chain that has more POW.
I suppose I was unclear; I'm pointing out that we must evaluate a competing chain with lots of diff 1 blocks in order to determine if it gets ahead. With the current way sync works, doing this opens up a DOS attack because the data is fetched even when the chain never comes close to getting ahead.

Quote
This could also be achieved by setting a lower bound on the total POW rather than adding a checkpoint.
As my message points out…

Quote
This allows much faster detecting of dishonest nodes.
I don't see how it does. It would make your sync fail to produce connectable results in the natural and nonmalicious case of having peers on different branches and would require a bunch of code to switch back to scalar fetching from a single peer once an unconnectable result was detected.  Might be useful, but I'm not sure it would be worth the client's complexity.

Quote
If/when there are more clients, then they should make sure that they only ever add checkpoints after they have been agreed, either by adding them to the reference client or through some other consensus system.
Certainly there should be none placed where there is a risk of an honest competing fork existing.

Quote
It seems that rigid checkpointing gives a performance boost as you can skip script checking.
Nah, you don't need checkpoints in order to do that. Though doing randomized checking completely requires proof of treachery.

Quote
However, it allows the node to estimate the difficulty of the chain and also detect hostile nodes that lie about their chain.
Have you seen Amiller's high hash highway stuff?
legendary
Activity: 2053
Merit: 1354
aka tonikt
We've got a rabid one here.

Sorry about that - piotr_n is a bot I wrote to [...]
well - then congratulations, man!
as the first person in the world, you have managed to write a bot that is smarter than its creator.
that's indeed an achievement one could be proud of - a breakthrough in the field of AI science, I would say.

but so this post doesn't get deleted for being off-topic (which it should be), let me add something on topic.
I've tried to analyze different approaches to calculating a reliable hash of any UTXO database, and it seems to be pretty straightforward.
obviously one would want to be able to calculate the checksum while having only the UTXO db (without needing to re-parse the entire chain).
obviously one would also want to quickly apply the differential changes to the checksum that come from applying each block.
both can be achieved simply by serializing each UTXO record, then hashing it - and XOR-ing all the hashes together to get the ID of a UTXO snapshot.
extending the hash size to e.g. 512 bits might be a reasonable option as well, though it would be worth having some numbers to support it.
when an output is being spent, its hash is XOR-ed with the checksum again - this way it gets "removed" from the UTXO db ID.
maybe you don't even need to serialize and hash the record - maybe it would be just as safe to use (TxID ^ Vout) as the value - EDIT: sorry, what was I thinking :)
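A minimal Python sketch of that XOR scheme (class and method names are just illustrative):

Code:
import hashlib

class UtxoSetId:
    """XOR of the hashes of all serialized unspent-output records.
    Because XOR is its own inverse, the same operation adds a new output
    and "removes" a spent one, so the ID updates incrementally per block."""
    def __init__(self):
        self.checksum = 0

    def toggle(self, serialized_record: bytes):
        self.checksum ^= int.from_bytes(
            hashlib.sha256(serialized_record).digest(), 'big')

    add = toggle     # adding a newly created unspent output
    spend = toggle   # spending it XORs the same hash back out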

as always, feel welcome to slap me with constructive criticism. just please make sure to double-check first what "constructive" means.
legendary
Activity: 1120
Merit: 1150
What about creating a new node type "block-server".  These nodes maintain historical data only.

That is an existing proposal known as "archive node."

I've got another one, "partial node": http://sourceforge.net/mailarchive/message.php?msg_id=31178377
legendary
Activity: 1120
Merit: 1150
We've got a rabid one here.

Sorry about that - piotr_n is a bot I wrote to get all the good material out of gmaxwell's brain to plagiarize for my upcoming book on crypto-coin theory, "Proof of Work Consensus" (yes I see the irony of the title)

I'll see if I can tweak the settings a bit.
legendary
Activity: 1596
Merit: 1091
What about creating a new node type "block-server".  These nodes maintain historical data only.

That is an existing proposal known as "archive node."

legendary
Activity: 1232
Merit: 1084
You were suggesting that historic data would be made unavailable and non-validated. That isn't what checkpoints do.

Isn't that what happens with the main client at the moment?  Script is not verified until the client passes the last checkpoint?

(3) Simplifies discussions about the risk of mass-chain-rewrites.  

It certainly does that.  There is a risk that a disagreement could cause a fork.

If/when there are more clients, then they should make sure that they only ever add checkpoints after they have been agreed, either by adding them to the reference client or through some other consensus system.

Maybe there could be an "advisory" BIP on the topic.


Quote
I'd personally like to remove the checkpoints: Difficulty checkpointing and headers-first synchronization are superior replacements, though they are slightly more complicated to implement.  They are superior because they provide protection even far away from the checkpoints and when they have not been recently updated. Eliminating the rigid checkpoints also would allow for some better performance optimizations.

It seems that rigid checkpointing gives a performance boost as you can skip script checking.

Quote
Sync headers first, this is only 80 bytes of data per header.

As I said in the other thread, this can be improved by allowing a header step.

Ask for every 20160th block header.

Pick one at random and ask for 10 headers in steps of 2016 starting from there.  In fact, maybe a count field could also be added.

Repeat with steps of 201, 20, and 1.

That is very little data.  However, it allows the node to estimate the difficulty of the chain and also detect hostile nodes that lie about their chain.
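A sketch of what those requests could look like, expressed as block heights only (Python; the exact steps and message format are not pinned down above, and the names are illustrative):

Code:
import random

def challenge_heights(tip_height: int, steps=(20160, 2016, 201, 20, 1), count=10):
    """First request every steps[0]-th header; then, for each finer step, pick one
    of the coarser points at random and request `count` headers spaced by the
    finer step starting from there."""
    requests = [list(range(0, tip_height + 1, steps[0]))]
    for coarse, fine in zip(steps, steps[1:]):
        candidates = range(0, tip_height - fine * (count - 1), coarse)
        start = random.choice(candidates)
        requests.append([start + i * fine for i in range(count)])
    return requests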

Quote
Even better robustness to header spamming could come from reverse header syncing, but it really wouldn't be important.

That works too.  However, I think my suggestion eliminates the spam problem.  Effectively, the other node sends a claim and then your node replies with a challenge.

Also, by having fixed steps from the start, all nodes will agree.  You aren't asking one node for 245000 and another for 245001.  This allows partitioning of the nodes into sets which agree with each other.
sr. member
Activity: 461
Merit: 251
We've got a rabid one here.
legendary
Activity: 2053
Merit: 1354
aka tonikt
Quote
Are you saying that I don't understand how the concept of checkpoints reduces bitcoin's security model?
You were suggesting that historic data would be made unavailable and non-validated. That isn't what checkpoints do.

Not quite.
First of all, "what checkpoints do", among all the bullshit that you've lectured us about, is making sure that a node will have the exact content of the UTXO db at a specific block height.
Second, all I suggested was extending the protocol with the possibility of automatically mining into a block a hash that points to moving snapshots of the UTXO db. I call such a hash a "checkpoint", but feel free to call it whatever you like.
There is no better chain compression (and bootstrapping time improvement) than this.

In my concept, it would be totally up to the user whether he wants to get a valid UTXO set by doing it the old-fashioned way (starting from the genesis block and verifying all the way through the 250k blocks), or whether he prefers to enter a hash of a block that he knows is trusted and, using the reference mined into that block, fetch the entire content of the UTXO set within a few minutes.

Your great achievement, the much-advanced SPV clients that nobody wants to use, isn't any more secure than the solution I proposed would have been.
The only difference is that the latter would at least be useful - very useful, since we see people complaining all the time about bootstrapping times and blockchain size (aka pruning).


Quote
Maybe you don't understand that if the only way for bitcoin software to work is by validating a chain up to a checkpoint that is fixed inside the very same code that validates the chain - then validating the chain up to that point is just a big waste of time, bandwidth, CPU and storage.
That simply isn't correct.  If the history is validated then it cannot be forged and replaced with a false history which invents additional coins out of thin air.
False history that ends up at exactly the same hash? Hmmm... Do you even have an idea of what a 256-bit number is?

Stop dreaming and get real. It would be easier to brute-force the private keys, since there are far more of them, than to find a matching tx history leading to the same hash and trick the entire network into distributing that copy instead of the original one.

If you think a 256-bit hash is not secure enough, I have no problem with you making it 512 bits, but bear in mind that if anyone manages to reverse SHA256^2 then you can start looking for another job, because the entire bitcoin system is totally broken at that point - compressed or not.