
Topic: What is stopping pruning from being added to the qt client? - page 2. (Read 2737 times)

staff
Activity: 4200
Merit: 8441
What don't I understand about writing bitcoin software that you do?
You have this patronizing attitude all the time,
I suggest you review this thread before you continue to complain about patronizing.

When someone says "new nodes don't need the data anyways." and you respond "Indeed.", it's a little hard not to get the impression that you actually might have no clue what the Bitcoin security model is,  even when you do follow it up by, "But you know, they are very busy now with "tackling the low lying fruit" - whatever useless feature it is"— suggesting that you're just agreeing with a confused complaint and trivializing the actual nuance and complexity of the issue for the sake of accusing people actually doing work on the reference client of working on useless things.

One of these explanations has lower entropy than the other, I suppose. But the trolls in the mining subforum seem to have dulled my Occam's razor.  Feel free to provide your own alternative psychoanalysis. In the meantime, I'm stuck providing the detailed technical discussion that you declined to provide, whatever your reasons.

It does raise the question of how to find the longest chain in a hostile environment.
Sync headers first, this is only 80 bytes of data per header. Set a minimum difficulty of 100k past the first point where that was achieved (which can easily be maintained by a dozen ASICs now, so there should be no prospect of a viable Bitcoin falling below that ever again) to eliminate cheap header flooding. Add a minimum sum difficulty hint to prevent isolation from being anything but a DOS unless it is also a high-hashpower attack.  Even better robustness to header spamming could come from reverse header syncing, but it really wouldn't be important.
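The acceptance rules sketched above could look roughly like this. The 100k floor is the number from the post; `MIN_SUM_DIFFICULTY` and all function names are hypothetical stand-ins for the "minimum sum difficulty hint", not anything from a real client:

```python
from dataclasses import dataclass

MIN_DIFFICULTY = 100_000   # floor applied once the chain first reached it (per the post)
MIN_SUM_DIFFICULTY = 10**9 # illustrative value for the sum-difficulty hint

@dataclass
class Header:
    height: int
    difficulty: int  # work claimed by this 80-byte header

def accept_header(h: Header, floor_active: bool) -> bool:
    """Discard relayed headers below the floor once it is active, making
    header flooding as expensive as real mining."""
    if floor_active and h.difficulty < MIN_DIFFICULTY:
        return False
    return True

def chain_is_plausible(headers: list) -> bool:
    """Below the sum-difficulty hint, an isolating attacker can only DoS
    the node; it cannot feed it a convincing low-work fake chain."""
    return sum(h.difficulty for h in headers) >= MIN_SUM_DIFFICULTY
```

Note the division of labour: the per-header floor kills cheap spam, while the sum check rejects whole candidate chains that lack enough cumulative work.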



legendary
Activity: 1232
Merit: 1084
But this is just a different way of doing checkpoints - moving one (current height minus 5K blocks), instead of values fixed in the code.

No, it isn't a checkpoint.  It is a filter to decide if you should store a block.  The idea is to prevent storing forks that are unlikely to be part of a valid chain.

Suppose I have stored blocks 0 to 50k of a chain and then find another chain that is 100k blocks long.  If I have checkpointed at 45k on the first chain, then I can't recover.

However, with a filter, I would immediately be able to store all the blocks on the new chain.

The disk space rule might actually kick in though.

However, the quota increases by 1MB every 2.5 mins.  This means that once I find the main chain (and stay locked to it), it will generate 1MB every 10 mins but the quota will increase by 4MB in that time, so the "window" will open.
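The quota arithmetic above can be checked directly. The genesis timestamp is the real one; the function names are made up for illustration:

```python
GENESIS_TIME = 1231006505  # unix timestamp of the Bitcoin genesis block
MB = 1_000_000

def storage_quota_bytes(now: int) -> int:
    """Total allowed block storage: 1 MB per 2.5 minutes since genesis."""
    return MB * (now - GENESIS_TIME) // 150  # 2.5 min = 150 s

def headroom_after(minutes: float, chain_mb_per_10min: float = 1.0) -> float:
    """Spare quota (in MB) accumulated once locked to the main chain:
    the quota grows by 4 MB per 10 minutes while the chain adds at most
    1 MB, so the "window" opens at roughly 3 MB per 10 minutes."""
    return minutes / 2.5 - (minutes / 10) * chain_mb_per_10min
```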

It does raise the question of how to find the longest chain in a hostile environment.
legendary
Activity: 2053
Merit: 1354
aka tonikt
The process is
- download headers first to get estimate of the longest chain
- only store forks that fork within 5k of the longest chain
- total storage limit of 1MB * (current time - genesis time) / 2.5 mins
But this is just a different way of doing checkpoints - moving one (current height minus 5K blocks), instead of values fixed in the code.
You basically define a rule that your node does not accept any fork older than 5k blocks.
Which is of course a much better solution than the current checkpoints scheme.
legendary
Activity: 1232
Merit: 1084
Bitcoin would work, but the node's storage could easily get stuffed with bogus blocks that hook very early in the chain and are thus extremely cheap to create.
And it's almost guaranteed that as soon as you'd switch them off, such attacks would appear.

Without checkpoints, it could still be reasonably safe.

The process is
- download headers first to get estimate of the longest chain
- only store forks that fork within 5k of the longest chain
- total storage limit of 1MB * (current time - genesis time) / 2.5 mins

Block deletion could happen but have a timeout of say 60 days.  If a block was stored more than 60 days ago and has not been part of the main chain at any point in that time, it can be deleted.

However, it is unlikely to be actually needed.
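The two rules from this post - the 5k fork-window filter and the 60-day deletion timeout - can be sketched as follows (all names are hypothetical):

```python
import time
from typing import Optional

FORK_WINDOW = 5_000            # store forks only within 5k of the best tip
DELETE_AFTER = 60 * 24 * 3600  # 60 days, in seconds

def should_store(fork_height: int, best_height: int) -> bool:
    """A filter, not a checkpoint: if a longer chain shows up later, the
    same rule is re-evaluated against the new tip, so - unlike a
    hard-coded checkpoint - the node can recover."""
    return best_height - fork_height <= FORK_WINDOW

def can_delete(stored_at: float, was_on_main_chain: bool,
               now: Optional[float] = None) -> bool:
    """Purge only blocks stored over 60 days ago that have never been
    part of the main chain in that time."""
    now = time.time() if now is None else now
    return not was_on_main_chain and now - stored_at > DELETE_AFTER
```

The key property is that `should_store` depends only on the current best header chain, so nothing about it is fixed in the source code.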
legendary
Activity: 2053
Merit: 1354
aka tonikt
Well, I'm not sure they're needed exactly. They're an additional safety measure on the grounds that defence in depth is a good idea. But Bitcoin would still work fine without them.
Bitcoin would work, but the node's storage could easily get stuffed with bogus blocks that hook very early in the chain and are thus extremely cheap to create.
And it's almost guaranteed that as soon as you'd switch them off, such attacks would appear.

Moreover, since we're talking about block purging:
Block purging implies that there must be some kind of judgement about whether an orphaned block buried deep inside a chain is old enough to purge. You cannot have any purging without some sort of checkpoint that tells you how old an orphan needs to be before it is considered too old.
legendary
Activity: 1232
Merit: 1084
Mistakes made in database structure changes or mistakes made in the p2p changes could all cause network wide failures, loss of funds, etc. So these actions must be undertaken carefully.

Actually that is a potential side/sub-project.

What about creating a new node type "block-server".  These nodes maintain historical data only.

You can ask them for blocks (and the block header chain), but not for transactions.

These nodes would download all new blocks and verify the headers and POW.  They wouldn't even have to check any of the internal data.  Any block that is part of the chain up to the most recent checkpoint would be stored, and any block (including orphans) that can be traced back to the checkpoint would be stored as long as it meets the POW requirements.  By keeping the checkpoint reasonably recent, overloading these servers is harder.

Writing a node like that from the ground up wouldn't be that difficult.

Once there was a reasonable number of those nodes on the network, then block pruning would be much less potentially disastrous.  Even if there is a bug and it causes a "prune" of historical data for the entire network, there would still be some nodes with everything.
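A minimal sketch of the proposed block-server: it checks only POW and reachability from the checkpoint, never transaction contents, and serves blocks but not transactions. Everything here - class and method names included - is hypothetical, mirroring the description above rather than any real implementation:

```python
from typing import Optional

class BlockServer:
    def __init__(self, checkpoint_hash: str):
        self.checkpoint = checkpoint_hash
        self.blocks: dict = {}   # block hash -> raw block bytes
        self.parents: dict = {}  # block hash -> parent block hash

    def offer(self, block_hash: str, parent: str, raw: bytes,
              pow_ok: bool) -> bool:
        """Store a block (orphans included) iff its POW checks out and
        it can be traced back to the checkpoint. Internal transaction
        data is never validated here."""
        if not pow_ok or not self._reaches_checkpoint(parent):
            return False
        self.blocks[block_hash] = raw
        self.parents[block_hash] = parent
        return True

    def _reaches_checkpoint(self, h: str) -> bool:
        # Walk parent links until we leave the stored set, then check
        # whether we landed on the checkpoint.
        while h in self.parents:
            h = self.parents[h]
        return h == self.checkpoint

    def get_block(self, block_hash: str) -> Optional[bytes]:
        """Serve blocks, not transactions."""
        return self.blocks.get(block_hash)
```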
legendary
Activity: 1526
Merit: 1129
Well, I'm not sure they're needed exactly. They're an additional safety measure on the grounds that defence in depth is a good idea. But Bitcoin would still work fine without them.
legendary
Activity: 2053
Merit: 1354
aka tonikt
Coming from this angle, once you start disabling things in the source code, everything becomes optional. Smiley

But seriously, I'm not criticizing checkpoints here.
Checkpoints are needed - for the very reasons that they were put in, in the first place.
I mean, they should probably be replaced with some other mechanism, something that would not be a value fixed in a source code, something which does not require upgrading the client in order to move a checkpoint, but eventually we still need the protection that they add - no argument here.
legendary
Activity: 1526
Merit: 1129
Checkpoints are optional. If you don't want to trust them or suspect they might be wrong in some way, delete them and the code will still work in a zero trust manner.

However if old data gets thrown away completely and all you have is a checkpoint, then they aren't optional any more. That's the key difference.
legendary
Activity: 2053
Merit: 1354
aka tonikt
piotr_n, It's really frightening to see someone claiming to write Bitcoin software for other people to use who doesn't understand this, and proposing changes which would weaken the security model - with the implication that the trade-off doesn't exist and that pruning hasn't been done simply because someone was lazy or stupid.  I'll take comfort in the view that you're probably making these claims not because you believe them but because they serve to further your inexplicable insulting conduct.
Excuse me?
What don't I understand about writing bitcoin software that you do?
You have this patronizing attitude all the time, but I don't think you are smarter than me, so I am not going to bend before you just because you have been here longer.
If I have an idea, I am always open to criticism - as long as it is technical and specific (not like in your quote above).

Are you saying that I don't understand how the concept of checkpoints reduces bitcoin's security model?
Oh, trust me that I do. But I am not the one who invented them and put them into the code in the first place - I am not the one who broke the security model already.
I am only the one who proposed getting further advantage from checkpoints, in order to implement an ultimate purging solution.

Maybe you just don't understand something here?
Maybe you don't understand that if the only way for bitcoin software to work is by validating a chain up to a checkpoint that is fixed inside the very same code that validates the chain, then validating the chain up to that point is just a big waste of time, bandwidth, CPU and storage.
staff
Activity: 4200
Merit: 8441
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.
They do, in fact. Otherwise miners could just "embed" in the blockchain a bunch of transactions stealing people's coin or creating additional coin out of thin air and your node would happily accept it.  Part of the check and balance of Bitcoin is that the ability of miners to cheat is reduced to very narrow cases, making it unprofitable for them to do so.

Bitcoin is based on the concept of autonomous validation— the Bitcoin algorithm is zero-trust as much as is possible. It would be great if the whole thing could be completely autonomous, but there appears to be no way to autonomously achieve consensus ordering, so Bitcoin uses a vote weighted by a scarce resource just to determine ordering. But this requires actually checking the historical data, not just trusting it blindly.  There could be a lot of flexibility in how nodes go about doing this— no need to delay the initial usage for it, for example, but it's not a pointless activity.

piotr_n, It's really frightening to see someone claiming to write Bitcoin software for other people to use who doesn't understand this, and proposing changes which would weaken the security model - with the implication that the trade-off doesn't exist and that pruning hasn't been done simply because someone was lazy or stupid.  I'll take comfort in the view that you're probably making these claims not because you believe them but because they serve to further your inexplicable insulting conduct.

An additional point of correction, which is obvious to anyone who is familiar with the development of the reference software but which might not be known to readers of the thread:  Bitcoin 0.8 featured a complete rewrite of the entire storage engine used in Bitcoin, which was the necessary first step to implementing pruning.  In addition to the massive short-term performance improvements, this risky and laborious rewrite also achieved the complete decoupling of the historical data from the data needed for validation... prior to 0.7 Bitcoin needed random access to the blocks even during normal operation, which would have made pruning pretty difficult to implement.  Right now all that's required is solving the P2P issues related to announcing and discovering historical blocks in a network where not every full node has them.  Mistakes made in database structure changes or mistakes made in the p2p changes could all cause network wide failures, loss of funds, etc. So these actions must be undertaken carefully.

legendary
Activity: 2053
Merit: 1354
aka tonikt
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.
Indeed. https://bitcointalksearch.org/topic/m.2683001
But you know, they are very busy now with "tackling the low lying fruit" - whatever useless feature it is Smiley
sr. member
Activity: 287
Merit: 250
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.
legendary
Activity: 2053
Merit: 1354
aka tonikt
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.
sr. member
Activity: 287
Merit: 250
Can't the client just download and parse the entire blockchain removing the unspent outputs as they get spent?

I don't understand why it hasn't already been implemented. I imagine it would be immensely useful for miners who don't want to store the entire 10GB blockchain, and just want the probably 0.5GB block headers + unspent outputs that would be the result of the parsing.
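The parse described here is a single pass over the chain that keeps a running unspent-output (UTXO) set. A toy sketch, with transactions simplified to (txid, spent outpoints, output count) tuples - a real implementation would of course validate signatures and amounts rather than blindly trusting the data:

```python
def build_utxo_set(blocks):
    """blocks: iterable of blocks, each a list of (txid, inputs, n_outputs)
    where inputs is a list of (txid, vout) outpoints being spent."""
    utxo = set()
    for block in blocks:
        for txid, inputs, n_outputs in block:
            for outpoint in inputs:        # spend: remove from the set
                utxo.discard(outpoint)
            for vout in range(n_outputs):  # create: add the new outputs
                utxo.add((txid, vout))
    return utxo
```

For example, a coinbase with one output followed by a transaction spending it into two outputs leaves exactly those two outputs in the set.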