
Topic: Creating an "official" protocol specification for the Bitcoin internet currency (Read 4701 times)

legendary
Activity: 1232
Merit: 1094
You're totally right, but we've known this for a long time; fundamentally my objections to large blocks on the basis that they can be used to flood miners on low bandwidth connections is an example of consensus failure due to network rules. It's just the "rules" in that case happen to be laws of physics rather than software.

What are your views on having nodes validate only 1% of the block-chain?

The system basically needs a distributed hash table.  The hash becomes the key.  It could resolve to children in the merkle tree or to transactions.

If you ask for a merkle root, it will resolve to the 2 child hashes.  You can then resolve one of those hashes to follow the chain downwards.

The big problem is how to handle missing (or intentionally withheld) data.
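
To make that concrete, here is a minimal sketch of the idea, with entirely hypothetical names (nothing here comes from an existing implementation): interior keys resolve to their two child hashes, leaf keys resolve to raw transactions, and a failed lookup is exactly the missing-data case mentioned above.

Code:
import hashlib

def dhash(data: bytes) -> bytes:
    # Bitcoin-style double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Hypothetical key/value store: key = hash, value = (left, right) child hashes
# for an interior merkle node, or raw transaction bytes at a leaf.
dht = {}

def store_tx(tx: bytes) -> bytes:
    key = dhash(tx)
    dht[key] = tx
    return key

def store_node(left: bytes, right: bytes) -> bytes:
    key = dhash(left + right)
    dht[key] = (left, right)
    return key

def walk(key: bytes, depth: int = 0):
    """Resolve a key and recurse downwards towards the transactions."""
    value = dht.get(key)
    if value is None:
        print("  " * depth + "MISSING " + key.hex())      # withheld or lost data
    elif isinstance(value, tuple):
        left, right = value
        print("  " * depth + "node " + key.hex()[:16])
        walk(left, depth + 1)
        walk(right, depth + 1)
    else:
        print("  " * depth + "tx " + key.hex()[:16])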

Quote
My blockheaders over Twitter thing, while at one level an April Fools joke, is at another level not a joke at all. We do need alternate methods of block and transaction propagation on the network, and unlike validation having as much variety as possible is only a good thing. It's the fundamental security principle of Bitcoin: information is easy to spread and hard to stifle.

Cool, and yeah that is exactly the point.  The network is just a means to an end.

The big benefit of the exactness of a spec is validation of a block chain.

Having said that, the network spec should be relatively easy to define exactly too.
legendary
Activity: 1120
Merit: 1152
An alt client which implements 95% of the network rules would probably be functional.  Maybe it wouldn't protect against spam quite so well, etc.

You're totally right, but we've known this for a long time; fundamentally my objections to large blocks on the basis that they can be used to flood miners on low bandwidth connections is an example of consensus failure due to network rules. It's just the "rules" in that case happen to be laws of physics rather than software.

My blockheaders over Twitter thing, while at one level an April Fools joke, is at another level not a joke at all. We do need alternate methods of block and transaction propagation on the network, and unlike validation having as much variety as possible is only a good thing. It's the fundamental security principle of Bitcoin: information is easy to spread and hard to stifle.
legendary
Activity: 1232
Merit: 1094
What is the core problem that bitcoin solves?  The distributed consensus problem.

Forks are caused by 2 or more client implementations disagreeing about whether a block is valid or not.

This is the part that needs to be exactly right to prevent forking.

Slight differences in the protocol implementation are not as fatal.

Quote
The currency aspect of bitcoin is simply a layer on top of the distributed timestamping service.  namecoin is an example of a non-currency use of distributed timestamping.

The timestamping requires header verification.

The spec could have 2 parts, net protocol and validation rules.

The validation rules are more critical to get exactly right.  A full node requires 100% support for the validation system and all clients need to be the same.

An alt client which implements 95% of the network rules would probably be functional.  Maybe it wouldn't protect against spam quite so well, etc.
legendary
Activity: 1120
Merit: 1152
What is the core problem that bitcoin solves?  The distributed consensus problem.

There have been chains of hashes and chains of digital signatures before. What makes bitcoin different is that it is timestamping these digital messages, and protecting those timestamps against being reversed.  The currency aspect of bitcoin is simply a layer on top of the distributed timestamping service.  namecoin is an example of a non-currency use of distributed timestamping.

Actually I think you are oversimplifying a bit... just timestamping isn't enough. If all Bitcoin did was timestamp SHA256 hashes of transactions, it'd be useless because there would still be no consensus about double spends. The actual data itself must be public.

Better is to describe Bitcoin as a data store with an inherently difficult-to-change order. Now it happens that Bitcoin is implemented in such a way that it will never store conflicting data twice, but if it wasn't, it would still work just fine - the second transaction would be useless garbage and ignored. Similarly, actually having the time attached to the data is only an artifact of the PoW implementation. Again, the order is what matters, not the time. If the difficulty were fixed at the current value, rather than adjusting every two weeks, Bitcoin would strictly speaking still work just fine, though you would have to wait months to get a block.(1)

I'm being a bit pedantic... but it's an important distinction in the case of alt-chains that try to bootstrap off the Bitcoin PoW function. For instance, you could try to define an alt-chain where ordering was determined by getting the block header hash timestamped into Bitcoin - I've proposed the idea before on the bitcoin-dev email list - but solving the actual consensus problem of being sure everyone knows about all transactions is very tricky with such "pseudo-mining".

1) You know, Bitcoin could have been usefully implemented via telegraph and semaphores back in the late 1800s had the block interval been set to, say, 1 month or so. A small team of human calculators could probably calculate a SHA256 hash for a transaction in a few hours, especially with purpose-built mechanical aids, and back then they could have used a much weaker hash function and gotten away with it. Though they'd have eventually had a rousing debate about Sir Charles Babbage's "mechanical work engines", not to mention allowing more than a few dozen transactions per month...
legendary
Activity: 1596
Merit: 1100

There's only one thing you actually need to worry about - block validation, which is explained here.

I disagree.  The core validation is a description of the rules for validating transactions.  This would include the script language and the rules for the encryption.

What is the core problem that bitcoin solves?  The distributed consensus problem.

There have been chains of hashes and chains of digital signatures before. What makes bitcoin different is that it is timestamping these digital messages, and protecting those timestamps against being reversed.  The currency aspect of bitcoin is simply a layer on top of the distributed timestamping service.  namecoin is an example of a non-currency use of distributed timestamping.

Thus, validation of blocks and transactions is important, but in a way misses (and ignores in testing) the part about bitcoin that makes bitcoin work.

legendary
Activity: 1232
Merit: 1094
Indeed, it's full of such small but significant pitfalls, that can't just be ignored. I'm still not convinced that a human can write a complete spec (of the block chain validation, not so much the network protocol, that's trivial in comparison) that catches all of these cases and completely matches the client. But I'm happy to be proven wrong, and look forward to a well-written block chain spec.

The way to do it is a "soft" fork.  As long as all transactions defined by the spec are accepted by the default client, then all you need is for the miners to reject all transactions/blocks that fail to meet the spec.  All clients would still accept the blocks since the spec is a subset of what is allowed.  Also, if a small number of miners don't do the check, then they will just generate some forks that will be orphaned quickly.

Looking at the other thread about the DER thing, it looks like there are loads of examples.  It means that it isn't just a theoretical problem.  This means putting in exceptions won't work, as the list would be very long.

The spec could just define a checkpoint and depth, and have the stricter rules apply to all blocks after that point.  Effectively, the checkpoint says that all blocks leading up to the checkpoint are considered correct, and the rules then apply after it.  As long as the checkpoint hash is correct, this secures the chain.
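
A minimal sketch of that rule split, with placeholder checkpoint values and hypothetical rule functions (the real values and rules would come from the spec; checking that the chain actually passes through the checkpoint hash is omitted here):

Code:
# Hypothetical checkpoint: everything at or below this height is accepted as
# already validated; the stricter spec-level rules only apply above it.
CHECKPOINT_HEIGHT = 250000        # placeholder value
CHECKPOINT_HASH = "00" * 32       # placeholder, not a real block hash

def accept_block(block, height, parses_ok, meets_strict_spec):
    """parses_ok / meets_strict_spec are injected rule sets, not real APIs."""
    if height <= CHECKPOINT_HEIGHT:
        # Blocks up to the checkpoint are vouched for; they only need to parse.
        return parses_ok(block)
    # Blocks after the checkpoint must also satisfy the stricter spec.
    return parses_ok(block) and meets_strict_spec(block)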
hero member
Activity: 812
Merit: 1022
No Maps for These Territories
I would claim that not even Gavin knows this completely. For example, the DER-encoded signature is not always valid according to the DER specification, because OpenSSL does some things differently. Link. This means that an implementation that rejects these invalid signatures would reject an otherwise, according to bitcoin-qt, valid block, and it would then be on a different chain than the main (bitcoin-qt) chain.
Indeed, it's full of such small but significant pitfalls, that can't just be ignored. I'm still not convinced that a human can write a complete spec (of the block chain validation, not so much the network protocol, that's trivial in comparison) that catches all of these cases and completely matches the client. But I'm happy to be proven wrong, and look forward to a well-written block chain spec.

Anyway that's why I proposed creating the spec mechanically from the actual executable code. It could catch such subtleties, too, at least if dependent libraries are included in the scope. It would be an interesting (but nevertheless very difficult) project...

But yeah, there is no financial incentive in doing either, I have to agree with that. But that's not a technical issue and kind of beside the point.
legendary
Activity: 1232
Merit: 1094
This is a nice idea, but in practice you end up creating a DoS vulnerability: if I create an invalid transaction on purpose and send it to you, then you determine it is invalid and rush to announce the fact to the rest of the network, spamming everyone with an invalid transaction.

I was thinking that you would only "warn" if it was part of the main chain.

For example, the first thing the client should do is download the block headers.  From this, you can work out the depth for each block.

If, when you get the full block, you find an invalid transaction that is on the main chain and buried more than, say, 12 blocks deep, you would broadcast the proof that it is invalid.

Other nodes would check the proof, and if they also thought that it was on the main chain, then they would forward the proof.  Once they receive the proof, they would tag the block as invalid and exclude any chain built on it.

The big difficulty with the system is proving that information has gone missing.  A Merkle root is useless if you don't have all the hashes associated with it.
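
A hedged sketch of the relay policy described above; all the names (verify_proof, invalidate, relay) and the 12-block threshold are assumptions for illustration, not an existing protocol:

Code:
MIN_DEPTH = 12   # only act on proofs for blocks buried at least this deep

def handle_invalidity_proof(proof, header_heights, best_height,
                            verify_proof, invalidate, relay):
    """header_heights maps block hash -> height on our best header chain."""
    height = header_heights.get(proof.block_hash)
    if height is None:
        return                          # not on the main chain: ignore
    if best_height - height < MIN_DEPTH:
        return                          # too shallow, could be a stale fork
    if not verify_proof(proof):
        return                          # bad proof: drop it (and maybe ban the peer)
    invalidate(proof.block_hash)        # exclude this block and anything built on it
    relay(proof)                        # forward the compact proof, not the block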
full member
Activity: 154
Merit: 100
Quote
There's only one thing you actually need to worry about - block validation, which is explained here.

I disagree.  The core validation is a description of the rules for validating transactions.  This would include the script language and the rules for the encryption.

Much of what is in the above rules is, as the title says, protocol rules.  They are mostly to ensure efficient operation of the network and keep spam low.
Actually that page is fairly detailed on what constitutes valid and invalid transactions and blocks. For example:
Quote
There's also the question of why you actually want to implement a fully verifying node. The only reason I can think of is if you want to mine with it - so you don't waste your effort producing invalid blocks. Other than that, you can get all the functionality you need from SPV checks and a full blockchain (making sure to verify the merkle root) for significantly less development effort, less cpu+ram use, less likelihood of forking yourself into oblivion, for a tiny reduction in security.

It would be nice to be able to validate the system in a distributed way.  Also, the SPV system is vulnerable to double spends, since you have to trust the node you connect to.
Not really, your attacker still has to craft a whole fake block for you which must pass the current proof-of-work - this is exactly as hard as creating a legitimate block. Additionally, if you are connected to more than one node, then the other nodes are going to be telling you about new blocks much faster, and the fake one you got is going to be orphaned. So the result is that an attacker has to give up what would be a 25BTC reward for solving a block in exchange for a window of a few minutes where he may be able to take advantage of a double spend against me.


At the moment, there are 2 extremes: either you validate everything or you validate almost nothing.  A system for notifying invalid blocks with proof of invalidity attached would allow distributed validation.  If you have 10k nodes and each node validates a random 1% of the transactions, then the probability of an invalid tx being missed is (0.99)^10000 = 2 * 10^(-44).
This is a nice idea, but in practice you end up creating a DoS vulnerability: if I create an invalid transaction on purpose and send it to you, then you determine it is invalid and rush to announce the fact to the rest of the network, spamming everyone with an invalid transaction. The current way this is handled is that you determine my transaction is invalid and ban me for 24 hours. The rest of the network never has to hear about my transaction at all. The other problem is that you have to verify the attached proof in order to accept it, which results in the same amount of work as if you'd just performed the verification yourself, but with a lot more network traffic.
legendary
Activity: 1400
Merit: 1013
If I were to guess I'd say the process of arriving at a spec is going to be a trial by fire.

One alternate implementation (bits of proof) looks like it will be deployed shortly, and may come to represent a significant fraction of the network. Surely there will be others as well.

Undoubtedly these implementations are not fully compatible with each other, or what anyone thinks the spec is. Some of the incompatibilities will be discovered by code review and fixed preemptively, and others won't be found until the chain forks like it did on March 11th. When that happens, everybody involved will rapidly deploy a fix and correct the bugs in their respective implementations.

My prediction is that an attempt to turn the existing behavior into a spec will never be completed. Instead multiple implementations will evolve into a spec via a series of unexpected chain fork events.
legendary
Activity: 980
Merit: 1008
As it appears to be consensus that deriving the spec is too complex and tricky for a human being
The only important part is the binary format of transactions and blocks, so that you can verify proof-of-work and merkle roots, other than that it really doesn't matter how you obtain them - you could get all the data you need from blockexplorer.com and bypass having to connect to the network at all if you really wanted to.
No, the binary format is not the only important part. It's a necessary part, but the really tricky part is defining how one should handle the data - what constitutes valid and invalid data? What should be rejected and what should be accepted?

I would claim that not even Gavin knows this completely. For example, the DER-encoded signature is not always valid according to the DER specification, because OpenSSL does some things differently. Link. This means that an implementation that rejects these invalid signatures would reject an otherwise, according to bitcoin-qt, valid block, and it would then be on a different chain than the main (bitcoin-qt) chain.

This is what is really hard to capture in the spec, and - when the spec is done and published - we could uncover additional quirks that would cause an implementation of the spec to disagree with bitcoin-qt on the longest chain, and thus become non-standard.
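
To make the DER point above concrete, here is a rough, illustrative sketch of the kind of stricter signature-encoding check an alternative implementation might be tempted to apply. The exact rules shown are assumptions for illustration only; the thread's point is that a node enforcing something like this would reject encodings OpenSSL (and hence bitcoin-qt) accepts, and so fork itself onto a different chain.

Code:
def looks_like_strict_der(sig: bytes) -> bool:
    """Illustrative strict DER shape check.  Assumes the trailing sighash
    byte has already been stripped off the scriptSig signature, and only
    handles short-form DER lengths."""
    # Outer SEQUENCE header
    if len(sig) < 8 or sig[0] != 0x30 or sig[1] != len(sig) - 2:
        return False
    # R element header
    if sig[2] != 0x02:
        return False
    rlen = sig[3]
    if len(sig) < 6 + rlen:
        return False
    # S element header
    if sig[4 + rlen] != 0x02:
        return False
    slen = sig[5 + rlen]
    if 6 + rlen + slen != len(sig):
        return False
    r = sig[4:4 + rlen]
    s = sig[6 + rlen:6 + rlen + slen]
    for value in (r, s):
        if len(value) == 0:
            return False
        if value[0] & 0x80:
            return False              # would be read as a negative integer
        if len(value) > 1 and value[0] == 0x00 and not (value[1] & 0x80):
            return False              # unnecessary leading zero byte
    return True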
legendary
Activity: 1232
Merit: 1094
In fact, I think it speaks volumes about those complaining that every one of these threads talk about a 'protocol spec', when in fact the way that the network passes messages is largely irrelevant to the working of Bitcoin (aside from the fact that it's very well documented and simple).

I think it is more accurate to say that the spec should have 2 parts, network and validation rules.

You are fundamentally right, what prevents forking is the rules for accepting blocks and transactions.  If a client messes up the protocol, then little harm is done, as long as the blocks and tx info is still propagated.

Even if a client failed to forward a block or something, as long as a reasonable portion of the network does it right, the blocks and tx's will get to the miners.

Quote
There's only one thing you actually need to worry about - block validation, which is explained here.

I disagree.  The core validation is a description of the rules for validating transactions.  This would include the script language and the rules for the encryption.

Much of what is in the above rules is, as the title says, protocol rules.  They are mostly to ensure efficient operation of the network and keep spam low.

Quote
There's also the question of why you actually want to implement a fully verifying node. The only reason I can think of is if you want to mine with it - so you don't waste your effort producing invalid blocks. Other than that, you can get all the functionality you need from SPV checks and a full blockchain (making sure to verify the merkle root) for significantly less development effort, less cpu+ram use, less likelihood of forking yourself into oblivion, for a tiny reduction in security.

It would be nice to be able to validate the system in a distributed way.  Also, the SPV system is vulnerable to double spends, since you have to trust the node you connect to.

At the moment, there are 2 extremes: either you validate everything or you validate almost nothing.  A system for notifying invalid blocks with proof of invalidity attached would allow distributed validation.  If you have 10k nodes and each node validates a random 1% of the transactions, then the probability of an invalid tx being missed is (0.99)^10000 = 2 * 10^(-44).
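
A quick back-of-the-envelope check of that figure (this assumes each node samples transactions independently and uniformly at random):

Code:
# Probability that none of 10,000 nodes, each checking a random 1% of
# transactions, ever looks at a particular invalid transaction:
p_missed = 0.99 ** 10000
print(p_missed)   # ~2.25e-44, i.e. roughly 2 * 10^(-44) as stated above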
full member
Activity: 154
Merit: 100
As it appears to be consensus that deriving the spec is too complex and tricky for a human being
I really don't understand this sentiment, especially from one of the devs. I think that everyone who complains that it's too difficult has just not bothered to find out how anything works.
In fact, I think it speaks volumes about those complaining that every one of these threads talk about a 'protocol spec', when in fact the way that the network passes messages is largely irrelevant to the working of Bitcoin (aside from the fact that it's very well documented and simple). The only important part is the binary format of transactions and blocks, so that you can verify proof-of-work and merkle roots, other than that it really doesn't matter how you obtain them - you could get all the data you need from blockexplorer.com and bypass having to connect to the network at all if you really wanted to.
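
For instance, the proof-of-work part of that is small enough to sketch directly. This assumes the standard 80-byte header serialization (compact target at byte offset 72) and only checks the hash against the header's own claimed target; verifying that the target itself follows the difficulty rules is a separate step.

Code:
import hashlib, struct

def header_pow_ok(header: bytes) -> bool:
    """Check an 80-byte block header's hash against its own compact target."""
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    block_hash = int.from_bytes(digest, "little")    # hash as a 256-bit integer
    nbits, = struct.unpack_from("<I", header, 72)    # compact target ("bits")
    exponent = nbits >> 24
    mantissa = nbits & 0x007fffff
    if exponent <= 3:
        target = mantissa >> (8 * (3 - exponent))
    else:
        target = mantissa << (8 * (exponent - 3))
    return block_hash <= target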

There's only one thing you actually need to worry about - block validation, which is explained here. If it's too much to implement in one go, start with SPV style checks and work your way up; it's better to accept bad blocks than to reject good ones. Just don't mine or relay blocks until you're sure you've got verification completed, or isolate yourself from the rest of the network by connecting to a single trusted reference client node.

There's also the question of why you actually want to implement a fully verifying node. The only reason I can think of is if you want to mine with it - so you don't waste your effort producing invalid blocks. Other than that, you can get all the functionality you need from SPV checks and a full blockchain (making sure to verify the merkle root) for significantly less development effort, less cpu+ram use, less likelihood of forking yourself into oblivion, for a tiny reduction in security.
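
The "SPV checks plus merkle root" approach mentioned above boils down to something like this sketch (the branch format is the usual merkle-branch convention; the function names are illustrative):

Code:
import hashlib

def dhash(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_branch_ok(txid: bytes, branch: list, index: int, merkle_root: bytes) -> bool:
    """Hash the txid up through the supplied sibling hashes and compare the
    result with the merkle root taken from an already PoW-checked header."""
    node = txid
    for sibling in branch:
        if index & 1:
            node = dhash(sibling + node)   # this node is the right child
        else:
            node = dhash(node + sibling)   # this node is the left child
        index >>= 1
    return node == merkle_root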

Additionally, there is almost zero financial incentive for creating an alternative implementation - no-one will trust it if it's not open source, and no-one will buy it if it is open source.

The biggest problems in creating an alternative client are software engineering type problems, not implementing the network protocol. How do you guarantee database consistency or recover from corruption if the power goes out while you're processing a block? How do you deal with re-orgs? How do you store your blocks/headers for best performance? How do you minimize the expensive processing you perform on bad data before you realize it's bad? How do you properly deal with errors like failed verification or your DB crapping out so that you don't confuse one with the other and end up stuck on a fork? If you receive an invalid block a number of times from different peers, when do you decide that either you're under attack or maybe you're rejecting a block that's actually good, and what do you do? Etc.
I think people are just scared of having to deal with network sockets and processing binary data; if everything was dealt with as JSON there'd be no complaints (other than network usage).
jr. member
Activity: 42
Merit: 11
You can't send an email to an address in the format user@<IP address>?
Actually, you can, using an address literal. It's written like this: user@[111.111.111.111].
But this form IMO can't be considered "most massive" system.
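
For what it's worth, a minimal, untested sketch of mailing an RFC 5321 address literal directly, so no DNS name is resolved anywhere; the IP address is just a placeholder:

Code:
import smtplib

# Connect straight to the receiving MTA by IP; the recipient address uses
# an address literal instead of a DNS domain.
with smtplib.SMTP("111.111.111.111") as server:
    server.sendmail("sender@[111.111.111.111]",
                    "user@[111.111.111.111]",
                    "Subject: no DNS needed\r\n\r\nhello\r\n")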
hero member
Activity: 616
Merit: 500
Firstbits.com/1fg4i :)
Quote
the only other massive peer-to-peer decentralised comms system.. email
email isn't decentralized - it's relying on DNS, which is centralized.

...

You can't send an email to an address in the format user@<IP address>?
jr. member
Activity: 42
Merit: 11
Quote
the only other massive peer-to-peer decentralised comms system.. email
email isn't decentralized - it's relying on DNS, which is centralized.
Quote
As it appears to be consensus that deriving the spec is too complex and tricky for a human being
No, it is not.
1) Given enough resources, it's possible
2) We can make a saner spec and fix the client
hero member
Activity: 812
Merit: 1022
No Maps for These Territories
As it appears to be consensus that deriving the spec is too complex and tricky for a human being, it would be an interesting computer science project to mechanically, automatically extract a specification from the source code, or even the binary. The resulting spec would be very detailed (perhaps over-complete, but pruning unnecessary details is easier than adding missing details).
Seeing what the s2e guys have done (https://s2e.epfl.ch/) with, for example, automatic reverse engineering of network drivers, it should be possible.
hero member
Activity: 718
Merit: 545
There are plenty of smart people who understand Bitcoin enough to knock up a new version..

Sure - John Carmack invented Quake, but others took it to the levels 3D shooters are now..

I too feel a new chain, with all the benefits of hindsight, may be the answer, but then again, they said the same about the only other massive peer-to-peer decentralised comms system.. email.

And it just keeps on ticking.. :-)

You can't force people off Bitcoin. Just as you can't force people to stop using email. That's the beauty of decentralisation..

I think in the end, as always, the answer will lie somewhere in the middle. BitEmail anyone ?
legendary
Activity: 980
Merit: 1008
No need to enumerate "exceptions" from the spec. Apply the spec only from a certain block number, and use a checkpoint hash to mark all older blocks as valid. New spec-compliant clients do not need to verify these old blocks, only to parse them correctly.
Even the minimal sane behavior is surprisingly complex. But beyond that, "lock old _invalid_ blocks in with checkpoints" is really pretty hideous. You'd still have to distribute code to validate them, because part of the point of the system is that none of it depends on some magic values being trustworthy. Though perhaps it's preferable to relegate the legacy support to a validation tool.
I think the idea here is for third parties to develop their own Bitcoin-compatible clients. If this means that they can only verify the blocks of the previous two years, for example, I think that would be acceptable to most. If they want, they can always use legacy code (bitcoin-qt) to verify all the blocks. I think a clean specification that only verifies the blocks of the last n years will be very valuable to Bitcoin.

I made a post outlining the same concept put forward by r.willis in another thread. I'll quote myself for further clarification:

The problem is that the rules (as defined by Satoshi's implementation) simply pass data directly to OpenSSL, so the network rule effectively is "whatever cryptographic data OpenSSL accepts", which is bad. OpenSSL has every reason to try to accept as many encodings as possible, but we don't want every client to need to replicate the behaviour of OpenSSL. In particular, if they add another encoding in the future (which, again, is not necessarily bad from their point of view), we might get a block chain fork.
Considering cases like these, it occurs to me that it might be desirable to - at some point - split the Satoshi code into three parts: "legacy", "current" and "next".

The "legacy" part would handle the odd corner cases described in the above quote. It would basically pull in all the relevant OpenSSL code into the legacy module (including the bugs in question), where it would stay untouched. This module would only be used to verify already-existing blocks in the chain; no new blocks would be verified with this code, as pulling in OpenSSL code into Bitcoin and managing future patches is next to impossible. This is the part that should be possible for future clients to not implement. They will miss the part of the block chain that follows the rules defined by this module, but I reckon that we really don't need 5+ year old blocks for more than archival purposes.

The "current" module would handle verifying current blocks, and be compatible with the "legacy" module. It would depend on OpenSSL still, and if changes are made to OpenSSL that break compatibility with "legacy", patches would need to be maintained against OpenSSL to work around this. This module cannot code-freeze OpenSSL, as vulnerabilities can become uncovered in OpenSSL, and no one must be able to produce blocks that exploit the uncovered attack vectors. Newly uncovered attack vectors aren't a problem for the "legacy" module, as it only verifies already-existing blocks, produced before the vulnerability in question was uncovered.

The "next" module would be backwards incompatible with the "legacy" and "current" modules. This module changes verification rules to not accept, for example, the otherwise invalid signatures that OpenSSL accepts. The "next" module would have a block chain cut-off point into the future where, from that point on, a Bitcoin transaction would be considered invalid if, for example, it includes an invalid signature (was it a negative S-value?) even though it's accepted by the old OpenSSL code. It's sort of a staging module, where undesirable protocol behavior is weeded out. These protocol changes wouldn't take effect until some point in the future (a block number). The block number cut-off point would be advertised well in advance, and from this point on, the "next" module would become the "current" module, and the old "current" module would move into the "legacy" module (and no new blocks would be verified using this module). The new "next" module would then target fixing undesirable protocol behavior that was uncovered when running the previous "current" module, and would set a new cut-off time into the future, at which point new blocks would need to follow this improved protocol to get accepted.

Could this work? It would mean we could slowly (very slowly) clean up the protocol, while still maintaining backwards compatibility with clients not older than, say, 2 years, or however long into the future we choose the cut-off point for the "next" module's protocol to become mandatory.
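
A minimal sketch of how that three-way split might be dispatched by block height; the cut-off heights and rule functions here are purely hypothetical placeholders:

Code:
LEGACY_END = 200000   # hypothetical: last block verified by the frozen "legacy" rules
NEXT_START = 300000   # hypothetical: first block that must satisfy the "next" rules

def verify_block(block, height, legacy_rules, current_rules, next_rules):
    if height <= LEGACY_END:
        return legacy_rules(block)     # frozen OpenSSL-quirk-compatible rules
    if height < NEXT_START:
        return current_rules(block)    # today's rules
    return next_rules(block)           # stricter rules, e.g. reject malformed DER sigs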
hero member
Activity: 616
Merit: 500
Firstbits.com/1fg4i :)
If life-like tests are run on testnet, would there really still be a risk of unintentional hard-forking?
uh. Testnet didn't prevent the hardforking we created in Bitcoin 0.8 even though we believe that the relevant cases were tested already. (there were already super large blocks there).
But did it have a similar ratio of outdated clients and miners as the live net?