
Topic: Blockchain Compression (Read 8657 times)

legendary
Activity: 2053
Merit: 1356
aka tonikt
July 09, 2013, 03:54:27 AM
#68
I'd actually be OK with including a database copy in the downloads, it's a much simpler and easier thing to do than complicated shared UTXO commitment schemes.
This would only improve the bootstrapping time, but it is in no way a "blockchain compression" solution.

Calculating a reliable checksum of a UTXO snapshot might be a bit time consuming, but the process itself is not rocket science and the algorithm should be very simple to implement.
Even if you hash all the records outright, the operation should take well under a minute on a modern PC.
But if you choose a smarter checksum calculation method, you can speed it up a lot by doing some kind of differential UTXO checksum adjustment while processing each block, which would only take fractions of a second.
A node would already have the expected checksum value, because a UTXO checksum in a block would actually refer to the UTXO state after the previous block.
Moreover, if you still find calculating the UTXO checksum too resource-intensive, you could allow such checksums to be mined (and thus verified) only every few thousand blocks.
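As a rough illustration of the differential adjustment described above (Python, with a data layout I made up for the example): each UTXO entry is hashed once and XOR-folded into a running accumulator, so processing a block only touches the outputs it creates and spends. A plain XOR set hash is not collision-resistant against a hostile miner, so a real deployment would need a secure incremental multiset hash, but the per-block cost would be in the same ballpark.
Code:
import hashlib

def entry_digest(txid: bytes, vout: int, value: int, script: bytes) -> int:
    # Hash one UTXO entry (field layout is illustrative only).
    h = hashlib.sha256(txid + vout.to_bytes(4, "little")
                       + value.to_bytes(8, "little") + script).digest()
    return int.from_bytes(h, "big")

class IncrementalUtxoChecksum:
    # Order-independent running checksum of the UTXO set. XOR-folding entry
    # hashes makes adding or removing an output O(1), so the checksum can be
    # adjusted while each block is processed instead of rescanning the set.
    def __init__(self):
        self.acc = 0

    def add_output(self, txid, vout, value, script):
        self.acc ^= entry_digest(txid, vout, value, script)

    def spend_output(self, txid, vout, value, script):
        # XOR is its own inverse, so spending an output undoes its addition.
        self.acc ^= entry_digest(txid, vout, value, script)

    def digest(self) -> bytes:
        return self.acc.to_bytes(32, "big")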
legendary
Activity: 1526
Merit: 1134
July 08, 2013, 03:03:49 PM
#67
I'd actually be OK with including a database copy in the downloads, it's a much simpler and easier thing to do than complicated shared UTXO commitment schemes. You can't verify the database by hand, but you can't verify the binaries either and we obviously provide binaries. The same techniques could work fine - a bunch of different people all collaborate to do a threshold-signing of the included database just like they sign the binaries. Then anyone can tell their node to audit the developers just by letting it recalculate the database like anyone can audit the binaries using gitian.

However it seems I'm in the minority on that one. Whenever I've suggested including a database copy in the downloadable binaries lots of other people jumped up to disagree with me. So - SPV it is.
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 08, 2013, 03:02:34 PM
#66
OK, agreed.
But since you already have checkpoints, why don't you take advantage of them?
legendary
Activity: 1400
Merit: 1013
July 08, 2013, 02:50:03 PM
#65
Though, again, who am I talking to + the idea of checkpoints is stupid and goes totally against the main purpose of the solution Smiley
I don't think there ultimately is a nice, clean, mathematically and cryptographically pure solution to the problem that will work in practice. We'll end up with a messy set of overlapping partial solutions that will provide sufficient security when balanced against usability.

The developers on GitHub could commit malicious checkpoints, but due to the nature of Git there's no way to hide what they've done. Anyone who's been running a full node can notice the discrepancy and spread the word via any number of communication platforms and bring enough scrutiny to the problem that everybody can manually figure out what's going on and apply a fix if necessary.

In the pure theoretical realm of solving the Byzantine Generals' Problem one doesn't normally include the existence of IRC, Twitter, Facebook, and Reddit as part of the solution, but in the real world relying on the ability of users to communicate out of band can sometimes be an acceptable way to resolve an attack. I'd like to see checkpoints distributed via as many mediums as possible, including print media, by as many different entities as possible to make it as difficult as possible for someone to distribute incorrect ones without being detected. We really want everything to work automatically, but it's good to have backup plans to fix things manually when something breaks.
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 08, 2013, 01:31:45 PM
#64
Maybe it should actually be simpler.
Each miner would have the option to mine in a hash of the current UTXO db - of course the set needs to be sorted, or otherwise canonically ordered, and a wrong hash would make the block invalid.
And then, by other means, not necessarily ones that need to be sponsored by the Bitcoin Foundation, people would find a way to fetch these snapshots of the UTXO set.
As long as you add to the protocol the possibility to mine in a hash of the UTXO set and make sure that all the miners adopt it...
two years at least.
ten since it's my idea Smiley
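A minimal sketch of the "hash of the sorted UTXO db" idea above (Python; the (txid, vout) -> (value, script) layout is only an assumption for the example): serialize every unspent output in one canonical order and feed it through a single SHA-256, so any two nodes holding the same UTXO set agree on the value a miner would commit to.
Code:
import hashlib

def utxo_snapshot_hash(utxo_db) -> bytes:
    # utxo_db is assumed to map (txid, vout) -> (value, script); the key
    # layout is illustrative, not Bitcoin Core's actual format.
    h = hashlib.sha256()
    for (txid, vout) in sorted(utxo_db):           # one canonical ordering
        value, script = utxo_db[(txid, vout)]
        h.update(txid)
        h.update(vout.to_bytes(4, "little"))
        h.update(value.to_bytes(8, "little"))
        h.update(script)
    return h.digest()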
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 08, 2013, 12:32:03 PM
#63
Maybe there should be something like a general solution, to compress it once and for all.
It's all about the most recent content of the UTXO set - which is not so big, compared to the much faster growing blockchain.
Maybe there should be an automatic, generic checkpoint every 2016 blocks or so (with a hash of the UTXO db content included inside the block) from which nodes would be able to start their existence, using the committed hash as a trusted seed to quickly fetch the entire UTXO set, instead of re-doing the chain up to that point.
If that works, there would be no more need to store, nor to share, the entire blockchain - it could be stored in a museum, though Smiley
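To show how a node might use such a committed hash as its trusted seed, a small hedged sketch (Python; function names and serialization are invented for the example): the snapshot can be fetched from any untrusted peer, because it is accepted only if it hashes to the value committed in the checkpoint block.
Code:
import hashlib

def bootstrap_from_snapshot(snapshot_bytes: bytes, committed_hash: bytes) -> bool:
    # snapshot_bytes is the serialized UTXO snapshot fetched from an
    # untrusted peer; committed_hash comes from the checkpoint block the
    # node already trusts.
    if hashlib.sha256(snapshot_bytes).digest() != committed_hash:
        return False   # corrupt or malicious snapshot: discard and retry elsewhere
    # Otherwise load the snapshot as the starting UTXO set and continue
    # normal block-by-block validation from the checkpoint height onward.
    return True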
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 08, 2013, 12:23:38 PM
#62
With checkpoints the client only skips validating transactions; it still builds the UTXO set. You've still shown that a vast amount of hashing power has been spent on the blocks leading to the checkpoint, and validating that work, as well as the work after the checkpoint, is pretty easy. More importantly, if the checkpoint is invalid and the majority of hashing power isn't following it, you'll soon notice because your client doesn't see as many blocks as it should.
But if you are taking a piece of software where some devs have put a checkpoint, why would you trust them to choose an honest block hash, but not trust that they honestly verified the chain up to that point, difficulty checks included?
It just doesn't make any sense to use this software to verify from scratch a chain that can only be valid if it reaches a block with a specific hash - a value fixed inside the same software that is (again) doing the block verification, only to arrive at the same hash - because only then can it work.
It's just a huge overhead.
legendary
Activity: 1120
Merit: 1152
July 08, 2013, 12:20:41 PM
#61
Including a UTXO hash in the checkpoint makes for a much less secure model than not doing so.

With checkpoints the client only skips validating transactions; it still builds the UTXO set. You've still shown that a vast amount of hashing power has been spent on the blocks leading to the checkpoint, and validating that work, as well as the work after the checkpoint, is pretty easy. More importantly, if the checkpoint is invalid and the majority of hashing power isn't following it, you'll soon notice because your client doesn't see as many blocks as it should.

On the other hand a UTXO hash is trusting the developers to compute the UTXO set properly for you. They could easily leave UTXOs out and you would never know until your node tried to spend one of them and wound up forked from the main network, or, if the bogus set had a majority of hashing power behind it, until someone found out their coins were now unspendable by developer fiat. The point being that fraud on the part of the developers isn't detected automatically or immediately.

UTXO hashes will be acceptable once UTXO proofs are a requirement for a block, and you'll be able to reason about how much work has been done for any given UTXO proof, but as you know we're a long way from finishing that.
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 08, 2013, 12:16:06 PM
#60
I agree.
If there are these checkpoints anyway, you might as well distribute a new version of the client with the content of the last checkpoint's UTXO set, instead of only a block hash.
This would make bootstrapping so much faster, and would also require no storage for all the previous blocks, since any node would be able to download the last checkpoint's db content from its peers - with a checksum it would be just as "safe" as the checkpoint values are now, but so much more useful.
It's a bold move, but in fact not bolder than the idea of checkpoints itself.
And it would at least optimize something.
Though, again, who am I talking to + the idea of checkpoints is stupid and goes totally against the main purpose of the solution Smiley
legendary
Activity: 905
Merit: 1012
July 08, 2013, 12:06:19 PM
#59
Mike, I will make one comment: if you are going to trust the checkpoint (which you do currently - validation in Bitcoin-Qt isn't performed until after the last checkpoint, right?) and if the checkpoint included an authenticated UTXO index hash under the soft-fork scenario, then you could just download that UTXO set and start verification from there, with *exactly* the same security as you otherwise would have had. This lets us keep the "one to two days" needed to perform IBD at a small value that does not continue to increase over time.
legendary
Activity: 1526
Merit: 1134
July 08, 2013, 07:25:26 AM
#58
Yes, exactly. Pruning is not a bandwidth saver. When the speed of chain download has been examined, it has mostly turned out that disk IO and CPU were the limits, not bandwidth - people could usually download the chain faster than Bitcoin can digest it. It might have changed a bit since 0.8.1+/ultraprune/leveldb/other optimisations, but you'd still rapidly get back to the point where disk and CPU are the limits rather than the network.

This is why it's so dangerous to throw around random assumptions about what changes should be made. You might end up not fixing the problem you wanted to fix.

If you ARE bandwidth limited, Jeff already runs a blockchain torrent. I don't remember if it's compressed but as it's just a file, it certainly could be. If you want to bootstrap a new node faster, downloading the torrent is one way to do it. You still have to build the indexes yourself of course, and that's going to take a long time no matter what.

For the vast majority of new users, the correct client to use is MultiBit, and since a few days ago that's now our recommendation on bitcoin.org. The Bitcoin-Qt client is now listed as an app you run if you want to help the network by donating resources. This is a scalable fix - MultiBit will always be fast no matter how popular Bitcoin gets, whereas anything you can do to Bitcoin-Qt either changes the security model to be more like MultiBit's (in which case, why bother?) or makes a percentage improvement that takes us from "slow" to "slow". Once you make an informed decision to donate your resources to the network over the long term, the exact amount of time it takes to catch up with the chain head becomes less relevant. You're going to run it for months, we'd hope, so one or two days doesn't make a big difference.
legendary
Activity: 905
Merit: 1012
July 06, 2013, 11:28:59 PM
#57
Implementing pruning would be a great leap forward in usability.

The problem isn't (yet) the size of the block chain on disk. 10GB isn't a lot of space. It's network bandwidth - downloading 10GB of data takes a while on any connection. Pruning would do nothing to fix that (you still have to download the blocks before selectively throwing portions away), and it may even make things worse since fewer peers will have the full blockchain to seed from.
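To put a number on "takes a while": a quick back-of-the-envelope calculation with the ~10GB figure above and a few assumed connection speeds (the speeds are purely illustrative).
Code:
# Rough download times for a ~10 GB chain at a few assumed link speeds.
size_bits = 10e9 * 8
for mbps in (2, 10, 50):
    hours = size_bits / (mbps * 1e6) / 3600
    print(f"{mbps:>3} Mbit/s: ~{hours:.1f} h")
# ~11.1 h at 2 Mbit/s, ~2.2 h at 10 Mbit/s, ~0.4 h at 50 Mbit/s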
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 06, 2013, 08:56:46 PM
#56
Implementing pruning would be a great leap forward in usability.

The problem with pruning is that if all the nodes remove spent transactions from their storage, who is going to serve the block chain? Charities? Smiley
full member
Activity: 210
Merit: 100
July 06, 2013, 06:52:49 PM
#55
It seems to me there is one quick solution, which would be to simply compress the blockchain with a known compression algorithm such as rar (as an exercise, I was able to shrink mine to 65% of its original size) and then the client can decompress it in memory on the fly as needed.  Hardly a very CPU-intensive task for most modern processors, and it can be made a feature that can be disabled at the user's option.  It would result in 35% faster downloads and 35% less disk space used, at the cost of using the CPU a bit more and some more memory.

This is not a replacement at all for pruning; however, it strikes me as fairly quick to implement and should be part of any solution.  Is there any flaw here I do not see?
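A quick way to test the claim on your own node (Python, using lzma from the standard library instead of rar, and a block-file path that is only an example):
Code:
import lzma
from pathlib import Path

raw = Path("blocks/blk00000.dat").read_bytes()   # example block file path
packed = lzma.compress(raw, preset=6)
print(f"original:   {len(raw):>12,} bytes")
print(f"compressed: {len(packed):>12,} bytes "
      f"({100 * len(packed) / len(raw):.0f}% of original)")

# Decompressing "in memory on the fly", as suggested above:
restored = lzma.decompress(packed)
assert restored == raw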
legendary
Activity: 1153
Merit: 1012
July 06, 2013, 01:32:37 PM
#54
You know I really don't care if that's true. Because what I do know is that it's wasting time on a problem that we might have, over spending vital time we don't have on a problem that we have RIGHT NOW, which frustrates me to no end.
I understand you very well, but all I can tell you is: get used to this - it's normal. They only fix/optimize things if people have reported (and proven) an issue.
And since by their definition uncompressed blockchain data is not an issue, they just keep following the usual road map, which is always about adding over-complicated things that:
1) nobody really needs, nor do they help anyone with anything
2) could just as well be implemented outside the bitcoin node

But try to explain to them that no sane bitcoin user wants a technology that requires him to submit all his personal details (full name, address, etc.) to a certificate-issuing authority in order to receive a bitcoin... and good luck with that!
They have already decided it is important and they are going to spend 12 months, while nothing else matters, implementing and testing it, repeating that this is a very important and widely requested feature - there is nothing any of us can do about it, except maybe laughing at how the suckers waste their time by burying complex code for unnecessary features inside such a crucial piece of software as the bitcoin core. So yeah, at least it's funny Smiley


You're so right.

Instead of improving core functionalities, Bitcoin gets bloated with features that are optional "nice-to-haves" which could be more efficiently solved by third party services.

It's false to claim that blockchain size isn't a problem. Every single bitcoin user I know complains about blockchain download times. Implementing pruning would be a great leap forward in usability.
legendary
Activity: 2053
Merit: 1356
aka tonikt
July 05, 2013, 02:09:14 PM
#53
You know I really don't care if that's true. Because what I do know is that it's wasting time on a problem that we might have, over spending vital time we don't have on a problem that we have RIGHT NOW, which frustrates me to no end.
I understand you very well, but all I can tell you is: get used to this - it's normal. They only fix/optimize things if people have reported (and proven) an issue.
And since by their definition uncompressed blockchain data is not an issue, they just keep following the usual road map, which is always about adding over-complicated things that:
1) nobody really needs, nor do they help anyone with anything
2) could just as well be implemented outside the bitcoin node

But try to explain to them that no sane bitcoin user wants a technology that requires him to submit all his personal details (full name, address, etc.) to a certificate-issuing authority in order to receive a bitcoin... and good luck with that!
They have already decided it is important and they are going to spend 12 months, while nothing else matters, implementing and testing it, repeating that this is a very important and widely requested feature - there is nothing any of us can do about it, except maybe laughing at how the suckers waste their time by burying complex code for unnecessary features inside such a crucial piece of software as the bitcoin core. So yeah, at least it's funny Smiley
legendary
Activity: 1232
Merit: 1094
July 05, 2013, 03:56:55 AM
#52
I also think there could be a difference between verification for mining and general verification.

The problem with mining is that you need verification to be extremely quick.  A distributed system (or at least a p2p distributed system) isn't likely to be able to keep up with a centralised system.

However, with bad block proofs, a distributed system may be able to detect a bad block within 5-10 blocks.  This wouldn't be sufficient to mine, but would be sufficient to keep miners honest.

Ofc, there is still the problem of proving that data is missing.
legendary
Activity: 1526
Merit: 1134
July 05, 2013, 03:38:31 AM
#51
It's not quite that simple - to combine provable computations with blocks you could "just" provide the proofs to the clients that verify them, which indeed is a huge upgrade over SPV security, assuming the verifications are sufficiently cheap. However, as all such schemes require encoding the entire input to the function and have complexity related to the size of the inputs, the right way to do it is to have the provable program actually work with a UTXO merkle tree and output deltas to that tree. At least, that's the scheme we came up with when I discussed this with Bryan Parno.

If someone wanted to start calculating UTXO merkle trees in order to experiment with provable verification, then I'd think that was cool, but it's a long way from "we should actually make this a part of Bitcoin". See it as Zerocoin-style research: cutting-edge stuff that has great potential, but not something for today's world.
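As a rough illustration of such a UTXO merkle tree (Python; the entry layout and helper names are made up for the example, not a specified format): hash each entry of the canonically sorted set as a leaf and fold the leaves into a binary Merkle root. A proof system could then be handed just the root plus the per-block changes, rather than the entire set.
Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    # Plain binary Merkle tree over already-hashed leaves (illustrative only).
    if not leaves:
        return h(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def utxo_merkle_root(utxo_db):
    # utxo_db is assumed to map (txid, vout) -> (value, script); a provable
    # program could take this root as a succinct stand-in for the whole set
    # and output the root of the updated tree after each block.
    leaves = [h(txid + vout.to_bytes(4, "little")
                + value.to_bytes(8, "little") + script)
              for (txid, vout), (value, script) in sorted(utxo_db.items())]
    return merkle_root(leaves)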
sr. member
Activity: 461
Merit: 251
July 04, 2013, 04:51:08 PM
#50
Partially verifying nodes is what I ended up being interested in utxo set commitments for, and they seem to be really useful if/when we get efficient verifiable computing.  Though there definitely isn't any near-term need for this.
Re: verifiable computing and utxo set commitments, I noticed that a commitment of the utxo set digest into the block header wouldn't actually be necessary since you'd already have a succinct proof of its validity which is much stronger than simple inclusion in a block header.  Though the authenticated utxo set is still necessary for being able to turn blockchain verification into verifying a chain of succinct verifiable computing proofs(?), and for knowing your copy of the utxo (sub)set is valid at any given time.
legendary
Activity: 1232
Merit: 1094
July 04, 2013, 06:43:11 AM
#49
That's not true unfortunately. You'll need to do some extra UTXO queries strategically to get the right information on the branches next to the transaction being changed, but with that information you have enough to calculate the next UTXO tree state without having any of the data.

Right. 

In fact, you could require all the extra data to be attached to the transaction.

Quote
You're missing a key point: transactions can touch multiple parts of the UTXO set so once the UTXO set is split into subsets participants validate (at minimum) pairs of those subsets.

I was thinking that you share out the transactions based on the transaction's own hash.  However, you are right, to actually verify, you need all the info.

In theory, transactions could be passed between the 16 sub-chains, and a transaction would need a verification from each of the 16 sub-chains to be actually verified.

You could have 16 blocks per actual "full" block.  A sub-block would contain

- all transactions from the previous block, except
-- Any transactions added 16 blocks previously (they count as confirmed)
-- Any transactions that have inputs from transactions in this sub-chain, but are invalid
- Any new transactions for this sub-set

This is similar to my suggestion to have odd/even transaction rules.  Basically, you guarantee that the next block won't alter certain aspects of the UTXO set.

This allows the next block to be prepared ahead of time.
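For concreteness, a small sketch of "sharing out the transactions based on the transaction's own hash" across 16 sub-chains (Python; the nibble-based rule is just one possible choice, not anything specified):
Code:
import hashlib

NUM_SUBCHAINS = 16   # the 16 sub-chains discussed above

def subchain_for_tx(raw_tx: bytes) -> int:
    # Bitcoin txids are the double-SHA256 of the raw transaction; here the
    # top four bits of the txid pick the sub-chain, which is just one
    # possible sharding rule.
    txid = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return txid[0] >> 4                    # value in 0..15

# Bucket a batch of (placeholder) transactions into their sub-chains.
txs = [bytes([i]) * 60 for i in range(5)]
buckets = {i: [] for i in range(NUM_SUBCHAINS)}
for tx in txs:
    buckets[subchain_for_tx(tx)].append(tx)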