Well said but it can make processing the data a little bit faster if I fully understand what they are doing
but it makes the code more complicated to understand and it's been over engineered as it is at every
To me, Segwit is another layer of cryptographic data abstraction. In bitcoin's original block chain, you have essentially 3 levels of cryptographic data abstraction: the header list, the Merkle tree, and finally, the real stuff, the transaction. Segwit adds another layer of abstraction: the node in the Merkle tree is an abstraction for whatever the transaction is, of which the details are now in the witness.
These different layers of abstraction allow for different levels of cryptographic verification of authenticity, and use different cryptographic mechanisms. The top layer, the header list, is a linked list of hashes, containing the proof of work (which is bitcoin's conventional signature of authenticity at the highest level). The second layer, the block, has a Merkle tree, which is a linked binary tree of hashes, of which the root is in the block header. You only need to descend into that tree if there is specific information you want to verify within a block: ideally, you'd only need the Merkle tree "path" down to the location where the node information resides. With a legacy node, that node is the transaction information you're looking for. As all of these are linked by hashes, from the single node, through the Merkle tree, up to the block header, to the header chain, the authenticity of all these elements is secured. Within the node information itself, however, the authentication is done by digital signature. In the legacy protocol, all the signed information was in the node itself; with segwit, the node can also refer to "witness data", which is outside. This is somewhat similar to locking up the hash of a big book in a bitcoin transaction: you can transmit the book data outside of the chain, and verify that it is the book with the right hash that was in the chain. So what Segwit essentially does is allow the node (the former transaction) to be itself an abstracted cryptographic data link to the rest of the system.
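To make the "linked list of hashes with proof of work" concrete, here is a minimal sketch. It is a toy model, not bitcoin's real header format: the `ToyHeader` fields, the serialization and the very easy `TOY_TARGET` are assumptions made for illustration. Only the double SHA-256 and the idea of "hash below a target" are taken from bitcoin itself.

```python
import hashlib
from dataclasses import dataclass


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash bitcoin applies to block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass(frozen=True)
class ToyHeader:
    """Simplified header (hypothetical layout, not the real 80-byte format)."""
    prev_hash: bytes    # hash of the previous header: the link in the list
    merkle_root: bytes  # root of this block's Merkle tree
    nonce: int          # value the miner varies until the target is met

    def hash(self) -> bytes:
        raw = self.prev_hash + self.merkle_root + self.nonce.to_bytes(8, "little")
        return dsha256(raw)


# Assumed toy difficulty: the hash, read as a big-endian integer, must have
# its top 12 bits equal to zero (about 4096 attempts per header on average).
TOY_TARGET = 1 << 244


def meets_target(header: ToyHeader, target: int = TOY_TARGET) -> bool:
    return int.from_bytes(header.hash(), "big") < target


def mine(prev_hash: bytes, merkle_root: bytes, target: int = TOY_TARGET) -> ToyHeader:
    """Try nonces until the header hash falls below the target."""
    nonce = 0
    while True:
        header = ToyHeader(prev_hash, merkle_root, nonce)
        if meets_target(header, target):
            return header
        nonce += 1


def verify_chain(headers, target: int = TOY_TARGET) -> bool:
    """Check that every header links to its predecessor and carries proof of work."""
    prev = bytes(32)  # the first header points at an all-zero hash
    for header in headers:
        if header.prev_hash != prev or not meets_target(header, target):
            return False
        prev = header.hash()
    return True


def build_toy_chain(roots):
    """Mine one toy header per Merkle root, each linked to the previous one."""
    chain, prev = [], bytes(32)
    for root in roots:
        header = mine(prev, root)
        chain.append(header)
        prev = header.hash()
    return chain
```

Changing anything in an earlier header changes its hash, so the next header no longer links to it and `verify_chain` fails. That is the sense in which the chain of headers authenticates everything below it.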
So in a way, bitcoin is built to be a growing "data base" of stuff of which each element can be verified to be indeed included, without, that's the whole point, having to download the entire database. I can prove to you that a certain node Y is part of the block chain at block X, by providing you only with:
- the full header list
- the part of the Merkle tree of block X that goes to node Y
- the full data of node Y
You can verify all that, and you can be sure that I'm not deceiving you, provided you can check that the current block header "on the net" is the block header at the end of the list I gave you. You don't need to download the full block chain for that.
This is exactly what a light wallet does, where the only nodes Y it is interested in, are the transactions to and from the addresses of the wallet.
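The Merkle path check in that list can be sketched as follows. This is a minimal illustration, assuming the leaves are already transaction hashes and using bitcoin's convention of double SHA-256 with the last hash duplicated on odd levels. The function names and the `(sibling, sibling_is_left)` path format are my own choices, not a wallet API.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for bitcoin's Merkle trees."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    """Hash adjacent pairs; the caller guarantees an even number of entries."""
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root of the tree over a non-empty list of leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = _next_level(level)
    return level[0]


def merkle_path(leaves, index):
    """Sibling hashes from leaf `index` up to the root.

    Each entry is (sibling_hash, sibling_is_left).
    """
    path = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the other member of the pair
        path.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return path


def verify_path(leaf, path, root):
    """Recompute the root from one leaf and its path; no other leaves needed."""
    h = leaf
    for sibling, sibling_is_left in path:
        h = dsha256(sibling + h) if sibling_is_left else dsha256(h + sibling)
    return h == root
```

For a block with n nodes the path holds about log2(n) hashes, which is why the verifier does not need the rest of the block.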
With Segwit, there is now, "next to the block chain", also data related to node Y, which is called its witness. With legacy transactions the witness is empty; with segwit transactions, the witness holds the signature data that authorizes the transaction.
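The principle of data that lives next to the chain but is pinned by a hash inside it is the "big book" example from above, and can be sketched in a few lines. This only illustrates the principle with hypothetical helper names. As far as I understand it, real segwit commits to the witness data through a separate witness Merkle root placed in the coinbase transaction, which this sketch does not model.

```python
import hashlib


def commitment(data: bytes) -> bytes:
    """Hash that would be stored in the on-chain node (hypothetical helper)."""
    return hashlib.sha256(data).digest()


def matches_commitment(data: bytes, committed_hash: bytes) -> bool:
    """Check data received outside the chain against the hash found in the chain."""
    return hashlib.sha256(data).digest() == committed_hash


# The "book" travels outside the chain; only its hash is inside.
book = b"a very long book, transmitted outside of the chain"
on_chain_hash = commitment(book)
```

Anyone holding the on-chain hash can check a copy of the book they receive from any source, without the book itself ever being in a block.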
So now, in order to be sure that I have all I need to know concerning Y, I need:
- the full header list
- the part of the Merkle tree of block X that goes to node Y
- the full data of node Y
- the witness of node Y
If there is a single, accepted, block chain out there, then these elements are all I need to be cryptographically certain that this information is authentic in that chain. So as a user, I don't need anything else, apart from the "actual head of the chain that is accepted". I don't need the full block chain. I can do with the minimum, but I can use somewhat more if I like:
- I could download the full block Merkle tree, instead of just the path to the node I'm interested in.
- I could also download the full transaction data of the block containing my node data Y.
- I could also download ALL blocks.
- And I could, or not, download all the witness data.
But all this doesn't change a thing for me, namely, to know whether my data Y are authentically accepted in the sole agreed-upon block chain out there.
This is why, for a normal user, there's no "data burden", and that's what Satoshi explained in his mail of November 2008.
So who could want to download all of the block chain information ? (non-mining full node)
Who is, by doing so, exposed to the full burden of data ?
Well, there's of course more to bitcoin than just having a correct set of nodes: the essence of bitcoin is that these nodes correspond to "correct transactions". In order to be able to verify whether a transaction is correct, one needs two things: one needs information about the previous transactions referred to in the to-be-verified transaction; but one also needs to verify that some information is NOT PRESENT in the WHOLE of previous data blocks, to avoid double spending. So, in order to verify the correctness of a transaction, one needs the whole chain.
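Why the "NOT PRESENT" check needs the whole history can be shown with a toy model. The sketch below is an assumption-laden simplification: `ToyTx` ignores amounts, scripts and signatures, and tracks only which outputs exist and which have already been spent. The point is that the set of unspent outputs can only be built by replaying every earlier transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTx:
    """Toy transaction (hypothetical): no amounts, no scripts, no signatures."""
    txid: str
    inputs: tuple   # tuple of (txid, output_index) pairs being spent
    n_outputs: int  # number of new outputs this transaction creates


def build_unspent_set(history):
    """Replay the FULL history, in order, to find which outputs are still unspent."""
    unspent = set()
    for tx in history:
        for outpoint in tx.inputs:
            unspent.remove(outpoint)  # raises KeyError if the history is invalid
        for i in range(tx.n_outputs):
            unspent.add((tx.txid, i))
    return unspent


def is_spendable(tx, unspent):
    """True if every input exists, is unspent, and is used only once in tx."""
    no_duplicates = len(set(tx.inputs)) == len(tx.inputs)
    return no_duplicates and all(outpoint in unspent for outpoint in tx.inputs)
```

If any part of the history is missing, an output spent in the missing part still looks unspent, so the check cannot be trusted. This is the data burden of whoever decides which transactions may enter a block.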
So if you want to be able to say whether a transaction IS ALLOWED TO BE included in a block or not, you do need the full chain (and with segwit, ALSO the witness data). However, who is to decide? The answer to this question is: the mining nodes, and ONLY the mining nodes. Indeed, bitcoin's solution to the Byzantine Generals' problem is based upon a vote by proof of work. Nothing stops a mining node, technically, from including erroneous node data (or even erroneous Merkle trees, or erroneous headers). But his block will be voted away by the other mining nodes, which will not build upon it. As mining is economically costly (proof of work), a mining node has no reason to make blocks his peer mining nodes will not vote for. He would be wasting resources for nothing. So the sole, consensual block chain that is going to be built is a consensus of mining nodes. As long as more than 50% or so of the hash power of mining nodes plays by the rules, the block chain will be constructed by the rules. But in order to do so, in order to be able to make correct blocks, mining nodes need the full data burden of the full block chain. There's no way they could verify the correctness of a new block if they didn't have the full data set.
On the other hand, if ever mining nodes came to the agreement to include non-conforming blocks, and continued building on them, as a user, there's nothing that you can do about it. There is no alternative chain. If node Y within block X contains a transaction that is not correct, but it is deeply buried inside the chain and the mining nodes continue to build upon that chain, there's nothing you, as a user, can do about it. Even if you download all that data. Even if you see that it is wrong. Because there's no other chain out there.
If you have big stakes in bitcoin, however, it might be a good idea to be aware of it, even if you cannot do anything about it. So if you have big stakes in bitcoin, it might be a good idea to download all of the data, with all of the burden it represents, just to know whether the miners did include funny things or not. But by far most users don't need the full data burden.
Distributed systems are not designed to work by filling up a node until the 200 GB file grows to a terabyte, and the development team knew this from day one, so it's problem-reaction-solution and the solution is the lightning network. Miners are begging for it because the hubs are banks which will need hardware and BTC to fund the private ledgers, which is just what the miners have by some strange stroke of luck.
The fundamental error is to think that every user needs the full data burden. That's not feasible. Not all users of Wikipedia download all of the Wikipedia data either. They only look at those pages they want to read. As a normal bitcoin user, you don't need the full data burden. You only need to do the cryptographic verification that the information you download corresponds to what's in the sole block chain out there.
The big mistake in this whole debate is to confuse two aspects of bitcoin's cryptography:
1) verify that the transaction data you have, is the correct data that is in the consensual, single block chain out there
2) verify that the mining nodes that have the consensus power in bitcoin, followed the rules they are supposed to follow in building the sole consensual chain out there.
The normal user only needs to be concerned with 1). Even if he found out in 2) that the sole chain out there is not built according to the rules he thought were in effect, there's nothing he can do about it.
However, big players may want to look at 2). Big exchanges for instance. But it is sufficient that some look. There will be a whistle-blower. If you think it is fun, you can do it. It comes with a data burden.
The "hard" consensus is the one of the mining nodes, with their vote by proof of work. If mining nodes decided collectively to come to consensus A, then that's what's out there, and nothing else. If mining nodes don't come to a consensus, we have a fork, but a fork without bidirectional incompatibility is very dangerous (can be overtaken). So usually, a fork with the same protocol will not last, which means, that for the normal user, there is only one chain out there.
But there is also "soft power" in this game. Suppose the mining pools come to the consensus that they increase the block reward. They can, of course. If they come to that consensus, and the sole block chain out there has now bigger block rewards, there's *technically* nothing a non-mining node can do about it, apart from seeing that that is what is happening. So why don't mining nodes decide to do so? Of course, for them to do so, they would have to collude. There's the "tragedy of the commons" effect (Nash equilibrium) that stops an individual mining node from taking the initiative to give himself more rewards: his block would not find consensus among his peers. But suppose that the Nash equilibrium of honest mining nodes breaks down and that all of them collude over larger mining rewards. The chain out there now prints more bitcoin per block.
As mining nodes are rewarded in coins, they are also sensitive to the market. They can print themselves as many bitcoins as they want, but if bitcoins are worth zilch in the market, they have been incurring a lot of economic cost for nothing. Printing more bitcoins is something the market would be highly negative about. So if there is one set of entities that is kept in check by soft power, it is the mining nodes. Even if they are not decentralized enough for the Nash equilibrium to be secure, they are simply kept in check by the "soft power" of the market.
This is why the total centralization of mining nodes didn't bring a disaster: they need the market to like bitcoin. If they do crazy things, the market will blow their investment to oblivion. Even if there was only one single mining node out there, it most probably would still play by the rules, because if it didn't, the market would crash and its entire investment would be dead.
So essentially, those suffering from the data burden, are the mining nodes themselves: they need it technically. It is also very probable that those with big stakes in bitcoin, have an incentive to be "nice to the bitcoin users" and allow them to use their datacenter as a proxy for the block chain (to allow users to connect their light wallets).
Normal users only need to verify cryptographically that the data they have is the real data of the real block chain, and a light wallet is enough for that. They rarely *need* to verify the work of the mining node consensus. They might be the whistleblower of a problem, but they cannot do anything about it. As mining nodes depend on market sentiment to finance their huge economic spending on proof of work, the very last ones to want to crash the market are the mining nodes. So even if they are highly centralized, they most probably will continue to play by the rules.
So the "data problem" is not real. And that's what Satoshi already explained in November 2008. Satoshi made many mistakes, but on this one, he was right.
There is still one potential issue: who is going to provide normal users with enough network infrastructure so that they can download the data for their light wallet verification? The answer to this one is actually very simple: the bitcoin industry, of course! The bitcoin industry is all those people making a profit from the existence of bitcoin: the mining nodes in the first place, and also the big exchanges, and if it ever happens, other entities in the ecosystem that benefit from users using bitcoin. These are the entities that have an incentive to put at the disposal of their bitcoin users/customers a "full node data centre" that light wallets can connect to. Mining nodes may be incentivised to build a geographically distributed net of data centres "serving the block chain and capturing transactions". These are the entities that are exposed to the "data burden", but for such entities, this data burden is ridiculously small as compared to the data burden of other data centres. Compared to, say, TV over internet, bitcoin's data burden is ridiculously SMALL.
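To put rough numbers on "ridiculously small", here is a back-of-the-envelope calculation. The 80-byte header and the 10-minute block interval are bitcoin's own parameters. The 1 MB per block (the legacy size limit, taken here as if every block were full) and the 5 Mbit/s video stream are assumptions I picked for the comparison, not measured values.

```python
BLOCK_INTERVAL_S = 600                             # one block per 10 minutes
BLOCKS_PER_DAY = 24 * 3600 // BLOCK_INTERVAL_S     # 144
HEADER_BYTES = 80                                  # size of a block header

ASSUMED_BLOCK_BYTES = 1_000_000        # assumption: every block is a full 1 MB
ASSUMED_VIDEO_BITRATE = 5_000_000      # assumption: one video stream, in bit/s

# What a light wallet must keep up with: headers only.
headers_per_year = HEADER_BYTES * BLOCKS_PER_DAY * 365

# What a full node must keep up with: all block data.
chain_bytes_per_year = ASSUMED_BLOCK_BYTES * BLOCKS_PER_DAY * 365
chain_bitrate = ASSUMED_BLOCK_BYTES * 8 / BLOCK_INTERVAL_S  # average bit/s

# How many full-chain feeds fit into one assumed video stream.
video_to_chain_ratio = ASSUMED_VIDEO_BITRATE / chain_bitrate

print(f"headers per year:     {headers_per_year / 1e6:.1f} MB")
print(f"full chain per year:  {chain_bytes_per_year / 1e9:.1f} GB")
print(f"full chain bandwidth: {chain_bitrate / 1e3:.1f} kbit/s")
print(f"video stream / chain: {video_to_chain_ratio:.0f}x")
```

Under these assumptions the headers come to about 4.2 MB per year and the full chain to about 53 GB per year, an average of roughly 13 kbit/s. A single viewer of the assumed video stream pulls several hundred times that.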