Formalised Bitcoin Protocol Standard - page 4.

CIYAM

legendary

Activity: 1890

Merit: 1086

Ian Knowles - CIYAM Lead Developer

Quote from: Mike Hearn on January 05, 2013, 09:00:54 AM

Quote

Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Such a document would end up being nearly as long as the source code, and not much easier to read. I'm all for adding more detailed comments to the source, though.

So you really think rather than using RFC 1939 one should read through someone's source code (and worse yet "comments" that the compiler ignores) to work out how to do POP3?

I am seriously now beginning to wonder whether anyone here has worked on large scale software projects at all (say > 100 devs and say > 100 million USD).

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote from: davout on January 05, 2013, 07:44:46 AM

For one that is true only if miners are to run alternative implementations.

No, there are cases where other people can suffer from chain splitting bugs too. Let's say you're a high volume merchant or payment processor that runs an alternative implementation. I can make a transaction which it believes is invalid but the rest of the network believes is valid, for whatever reason. Once that gets included into a block, your business will grind to a halt because it'll split off onto a chain that no longer gets extended, or only gets extended very slowly, meaning you can't process payments anymore until the problem is noticed and you find and fix the conformance bug.
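As a toy illustration of that scenario (a sketch only: the `Block` type, the `has_edge_case_tx` flag and both rule functions are hypothetical and model nothing in any real client), a node that rejects one block the rest of the network accepts stops advancing at that point:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    height: int
    has_edge_case_tx: bool  # hypothetical: block contains the disputed transaction


def network_rule(block):
    # Assumption: the majority implementation accepts the edge case.
    return True


def alt_rule(block):
    # Assumption: the alternative implementation wrongly rejects it.
    return not block.has_edge_case_tx


def follow_chain(blocks, is_valid):
    """Return the height a node reaches.

    The node stops at the first block it rejects, because every later
    block builds on one it considers invalid.
    """
    tip = 0
    for block in blocks:
        if not is_valid(block):
            break
        tip = block.height
    return tip


# Six blocks; the disputed transaction is mined into block 3.
chain = [Block(h, has_edge_case_tx=(h == 3)) for h in range(1, 7)]
network_tip = follow_chain(chain, network_rule)
merchant_tip = follow_chain(chain, alt_rule)
print(network_tip, merchant_tip)
```

In this toy run the network reaches height 6 while the merchant's node stays at height 2, so it never sees the confirmations that arrive from block 3 onward.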

If this causes you to lose X coins per hour of business, then I can try to anonymously extort you for a bit less than X by claiming I know of such a bug in your software. It's very hard to prove it's not there. You'd have to have a lot of confidence in the robustness of the testing of your implementation.

What about if you accept invalid transactions? If you're providing a good or service in return for unconfirmed transactions then this obviously can undermine your risk model because you'll receive a transaction that you believe is valid, not a double spend, you don't see any double spend alerts or conflicting transactions - but it'll never confirm and I can still spend the money. I don't have to wait and mine a block any more like I would if doing a Finney attack, so it's much cheaper.

Quote

Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Such a document would end up being nearly as long as the source code, and not much easier to read. I'm all for adding more detailed comments to the source, though.

Quote

What I want to see are competing implementations of a clearly defined protocol, not a centralized black-box maintained by a few who know exactly which bugs should be treated as features.

Unfortunately, what you've got is the latter and it's not really easy to fix. We keep discovering new odd edge cases where what the software does isn't what you'd actually expect given a description of how it's meant to work.

Quote

The English language, combined with diagrams, tables etc. are designed for humans to understand and are thus the ideal format, as opposed to C++. And the Satoshi client is not very friendly to human eyes, so indeed another implementation could make it much easier to understand, but why waste time on doing that if you could write a human-friendly specification?

I think the Satoshi client is quite straightforward to read, for the most part. A few parts are somewhat inscrutable because they're written very tightly, but unfortunately there is no substitute for just puzzling it out - if you write a description of what you think the code does it may not match reality. We have seen this demonstrated several times, like with the merkle tree calculation. What people thought it did wasn't quite what it actually did. If you simply duplicated Satoshi's algorithm, you would duplicate his bug too, so no chain-split attack would have been present. If you re-implemented it based on an English description, you'd have introduced an exploit.
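To make the merkle tree point concrete, here is a minimal sketch (not the reference client's code; the three input hashes are made-up stand-ins for transaction hashes). When a level has an odd number of entries the last hash is paired with itself, so a list with its last entry repeated produces the same root. That detail is easy to lose in a one-line English description such as "hash the transactions pairwise into a tree".

```python
import hashlib


def dsha256(data):
    """Double SHA-256, the hash used for transaction and block ids."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes):
    """Pairwise-hash a list of 32-byte hashes down to a single root.

    When a level has an odd number of entries, the last one is paired
    with itself.
    """
    if not hashes:
        raise ValueError("need at least one hash")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the odd one out
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical stand-ins for three transaction hashes.
a, b, c = (dsha256(bytes([n])) for n in range(3))

# Two different lists, one root.
same_root = merkle_root([a, b, c]) == merkle_root([a, b, c, c])
print(same_root)
```

An implementation has to agree with everyone else on whether the second list is an acceptable block body, and the root alone cannot tell the two lists apart.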

Quote

Sorry? People can re-implement bitcoin. You know that, obviously.

I've implemented the SPV mode (not re-implemented, as for a long time there was no other implementation of this). Matt Corallo went ahead and has extended my work to do full validation. He's done a TON of testing, very in-depth testing. Despite that, I would be very concerned if I heard that a big mining pool or BitPay or whoever was using it, at least in its current state. It's not clear to me how much work would be required until I felt comfortable with high value operations using bitcoinj in full mode, and the documentation when 0.7 is released will make that clear.

CIYAM

legendary

Activity: 1890

Merit: 1086

Ian Knowles - CIYAM Lead Developer

I really think that MatthewLM has a very valid point here - if the C++ standard was just "read GCC" then I think the language would never have even been used to write Bitcoin (or anything else of value just like languages such as D).

Although such documents are very hard to write there is a point in having such standards (or do you guys prefer Microsoft or Google to *make* standards in code rather than using ISO ones?).

Anyone thinking that "Bitcoin is an exception" is kidding themselves (or most likely wants to become the next Microsoft or Google themselves).

MatthewLM

legendary

Activity: 1190

Merit: 1004

Quote from: Steve on January 04, 2013, 09:44:19 PM

Unit Tests are the final and most complete form of behavior specification

The unit tests are just there to ensure things are working as expected. They aren't designed to provide a reference to how things are supposed to work.

Quote from: Steve on January 04, 2013, 09:44:19 PM

It's best when both are expressed in languages free of a lot of syntactic noise. C++ is far from ideal in that regard, but you live with compromises born out of practicality. In languages that are less encumbered by syntactic noise, this perspective is much more readily apparent. The tests and the implementation are so easily comprehensible that other documentation isn't worth the effort to maintain (and can even be a detriment).

The English language, combined with diagrams, tables etc. are designed for humans to understand and are thus the ideal format, as opposed to C++. And the Satoshi client is not very friendly to human eyes, so indeed another implementation could make it much easier to understand, but why waste time on doing that if you could write a human-friendly specification?

Quote

I'm not really surprised though, this has come up a few times already and the answer was already pretty much along these lines.

For some reason I couldn't find very much when I searched for it, except vague references.

Quote

The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe.

All the more reason to make the protocol as clear and easy to understand as possible.

Quote

The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

Things have gotten better when things have become more standardised.

Quote

When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

Once again, more reason to have clear documentation. Though you do not need to do everything the same way as the Satoshi client. You only need to conform to the protocol requirements.

Quote

In this one, it will simply mislead people into thinking they can reimplement Bitcoin

Sorry? People can re-implement bitcoin. You know that, obviously.

Quote

Note that SPV nodes are much less risky. But Matthew isn't implementing an SPV client.

I'm implementing code which can be used for full validation or headers-only validation. My plans for a client will include a mixture of the two, offering the best of both worlds. The block-chain will thus be checked by headers-only validation against full validation done by a server.
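A rough sketch of what a headers-only check involves, assuming the usual 80-byte header layout (version, previous hash, merkle root, time, bits, nonce). The function names are made up for illustration, and the compact-target decoding is simplified: it ignores the sign bit and leaves out the difficulty retargeting and timestamp rules.

```python
import hashlib
import struct


def bits_to_target(bits):
    """Expand the compact 'bits' field of a header into the full target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_hash(header):
    """Double SHA-256 of the header, read as a little-endian integer."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")


def check_header(header, prev_hash):
    """Headers-only check: length, link to the previous header, proof of work."""
    if len(header) != 80:
        return False
    if header[4:36] != prev_hash:  # bytes 4..35 hold the previous block hash
        return False
    bits = struct.unpack_from("<I", header, 72)[0]  # bytes 72..75 hold 'bits'
    return header_hash(header) <= bits_to_target(bits)
```

Everything a header cannot show, such as whether the transactions behind the merkle root are themselves valid, is the part that would be left to the fully validating server.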

davout

legendary

Activity: 1372

Merit: 1008

1davout

Quote from: Mike Hearn on January 05, 2013, 06:40:15 AM

The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe. The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

For one that is true only if miners are to run alternative implementations. Secondly I find your statements a little FUDdy because even in the case of a chain split, most transactions would make it to both chains until it's resolved.

Quote from: Mike Hearn on January 05, 2013, 06:40:15 AM

When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Quote from: Mike Hearn on January 05, 2013, 06:40:15 AM

At some point, if you realize you have to match the behaviour of another codebase exactly, down to the tiniest detail, you realize that the only precise enough specification for that is the source code. Which means if you can't read C++ fluently you can't reimplement Bitcoin, yes, but who cares? If you can't keep up, don't step up.

Maybe that's the sign that the specification-software is getting too convoluted, which will ultimately lead to unmaintainable poor quality software.

What I want to see are competing implementations of a clearly defined protocol, not a centralized black-box maintained by a few who know exactly which bugs should be treated as features.

Putting all your eggs in a single basket is never a good idea (especially when they're golden eggs). What happens the day a critical exploit is discovered in the reference implementation? Does everything collapse?

Oh, and there's a reason why Bitcoin is still not 1.0 ;)

Mike Hearn

legendary

Activity: 1526

Merit: 1134

It bothers me that this topic keeps coming up. The fact that Bitcoin is different to other technologies isn't intuitive but by the time you're writing an actual implementation, it should be obvious. Maybe it's worth reading the thread with grau about his reimplementation also?

The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe. The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

At some point, if you realize you have to match the behaviour of another codebase exactly, down to the tiniest detail, you realize that the only precise enough specification for that is the source code. Which means if you can't read C++ fluently you can't reimplement Bitcoin, yes, but who cares? If you can't keep up, don't step up.

Having detailed protocol documentation is something I'd agree with in any other project except this one. In this one, it will simply mislead people into thinking they can reimplement Bitcoin. Unless they are willing to make absolutely massive effort and take serious risks, they can't.

Note that SPV nodes are much less risky. But Matthew isn't implementing an SPV client.

davout

legendary

Activity: 1372

Merit: 1008

1davout

I disagree with the folks that find tons of reasons not to document. I'm not really surprised though, this has come up a few times already and the answer was already pretty much along these lines.

As much as I understand that the core contributors don't really feel like doing it for various reasons (they already write tests and contribute code after all), I'm really surprised that no one really seems to encourage MatthewLM to go forward with it.

Yes, tests are good, but add a complete spec and it gets even better. Yes, it's a fact that the main implementation is currently both the specification and the implementation, nobody can argue that. However, arguing that it's a good thing, that it shouldn't change, that a full protocol documentation is unnecessary isn't quite the same thing IMHO.

Steve

hero member

Activity: 868

Merit: 1008

I agree with Gavin's point of view. Unit Tests are the final and most complete form of behavior specification and the implementation is the final and most complete form of design. It's best when both are expressed in languages free of a lot of syntactic noise. C++ is far from ideal in that regard, but you live with compromises born out of practicality. In languages that are less encumbered by syntactic noise, this perspective is much more readily apparent. The tests and the implementation are so easily comprehensible that other documentation isn't worth the effort to maintain (and can even be a detriment).

Check out OMeta and some of the papers at vpri.org if you're really into this sort of thing...with OMeta, they managed to create a system that could almost directly execute TCP/IP from the RFCs. It was a complete TCP/IP implementation in under 200 lines of code (including the parser specification for the RFC ascii art). See this summary about it: http://www.moserware.com/2008/04/towards-moores-law-software-part-3-of-3.html ...to me, this is a proof point that code really should be regarded as self documenting (with little more than annotations to accompany it)...if it's too challenging for people to easily comprehend, it points to a shortcoming of the language, not of the concept that the code is the documentation.

stevep

jr. member

Activity: 30

Merit: 4

I'm also concerned with needing to refer to the reference client source code, but the reference client is called reference for a reason.

My concern is that, as the reference client struggles to stay relevant for end users, the core developers will focus on performance rather than on its use as a reference.
Performance and readability do not tend to go hand in hand.

Is the creation of better protocol documentation a better solution?
I personally think so. As the core aspects of the protocol are effectively set in stone, we should be able to document them in an accessible/understandable manner.

As Gavin identified, however, the creation and maintenance of specs is time consuming.
The reference client developers are free to spend their time however they feel is best. There are always issues to be fixed and new features to be implemented. We wouldn't want to stop the reference client from moving forward.

I'd like to offer my help in updating/maintaining the documentation. I've made a few minor edits to the Bitcoin wiki for some of the under specified or unclear areas that I've found.

Where do you feel the content of the Wiki currently falls short?

In my experience the status of some of the BIPs is out of date, and I've tracked a few of them down and updated their status.
Once a BIP is accepted I think we should aim to roll its implications into the base documentation.
This information is recoverable by comparing the reference implementation to the BIPs.

In what ways do you feel that the reference client falls short for use as a reference?

In my experience something the reference client does not capture well is the set of "gotchas" that have been solved over time and that are relevant to all Bitcoin peer implementations.
When reading the reference code you might not realize that a piece of code evolved to its current state to solve a serious issue and that the naive implementation wouldn't be sufficient.
Again, this information isn't lost: we can recover it from the history and issue tracker and present it in a more accessible way.

bullioner

full member

Activity: 166

Merit: 101

Quote from: Mike Hearn on January 02, 2013, 12:05:08 PM

The reality of how Bitcoin works means that Satoshi's code is the protocol definition.

That sounds like a statement that might apply at a particular moment in time. It isn't an argument against specifying the protocol separately from a particular implementation in future, though it is obviously something to take account of while writing the specification.

There was probably a time in 1990 when the reality of how the web worked meant that the CERN httpd implementation was the protocol definition for HTTP (I don't know for sure, but this is a pattern seen with many protocols and other interface elements: original PGP implementation -> OpenPGP protocol; ssh went more or less the same way, from an initial implementation to a decent public protocol definition). That doesn't mean it was a bad idea to create standardisation processes once the technology took off and there was interest in multiple compatible implementations, and in managing changes / extensions.

Quote from: Gavin Andresen on January 02, 2013, 04:37:44 PM

[...]
That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

That's good stuff too -- but is certainly not an argument against trying to get the protocol specified in some sort of, for example, IETF-RFC-like document. Specifications and test suites go together really well, but are not alternatives for one another. Test suites are sometimes good for clarifying intent where a spec's ambiguous, and as you say above they're also great for aiding implementors with completeness and correctness.

Bitcoin needs a protocol spec for the technology to mature. One doesn't want to do it while the design's in flux, but Bitcoin's past that stage now. Any incompatible design changes would be brought about as a new crypto currency rather than as changes to existing Bitcoin.

It is frankly pretty worrying to see Gavin and Mike be so dismissive of MatthewLM's suggestion. Hopefully some others involved have more wisdom and experience in protocol engineering at Internet scale.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Quote from: DannyHamilton on January 02, 2013, 01:19:48 PM

Isn't most of the protocol right here?

https://en.bitcoin.it/wiki/Protocol_specification

You have the format of the messages on there but nothing about the network operation, validation, scripts etc. There are other wiki articles that have more information, but the information is incomplete and scattered around.

Quote from: Gavin Andresen on January 02, 2013, 04:37:44 PM

In my experience, developers are really good at either ignoring documentation or interpreting it in a way different than the way the author intended.

Yes maybe, but surely it's better than developers trying to decipher source code and learn bits here and there?

Quote from: Gavin Andresen on January 02, 2013, 04:37:44 PM

And spec authors are really good at getting details wrong, no matter how careful they are. And they're really bad at keeping track of changes.

That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

I may just be cynical because I spent so much time in 1997 working on the ISO/IEC-14772-1 Official, Formal Standard.

Well since bitcoin is an open protocol, there can be any number of people contributing to a bitcoin protocol specification, and anyone could spot mistakes and suggest improvements. It doesn't have to be bureaucratic or closed in nature.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Flying 3D sharks from the past!

Gavin Andresen

legendary

Activity: 1652

Merit: 2311

Chief Scientist

In my experience, developers are really good at either ignoring documentation or interpreting it in a way different than the way the author intended.

And spec authors are really good at getting details wrong, no matter how careful they are. And they're really bad at keeping track of changes.

That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

I may just be cynical because I spent so much time in 1997 working on the ISO/IEC-14772-1 Official, Formal Standard.

DannyHamilton

legendary

Activity: 3472

Merit: 4801

Isn't most of the protocol right here?

https://en.bitcoin.it/wiki/Protocol_specification

Mike Hearn

legendary

Activity: 1526

Merit: 1134

The reality of how Bitcoin works means that Satoshi's code is the protocol definition.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Well, by formalised I meant put together professionally into a specification document (as opposed to now). I didn't mean much more than that. It doesn't necessarily have to go by any usual conventions, if that means the document can be both easy to follow and fully detailed.

2112

legendary

Activity: 2128

Merit: 1073

Quote from: MatthewLM on January 02, 2013, 08:53:52 AM

It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol to every detail, but in a way that is easy for anyone to follow.

Perhaps if you post an example of a specification that is both "formal" and "easy for anyone" we could make better comments. The common way of thinking leans toward saying that those are polar opposites.

Anyway, the major points against are:

0) extremely expensive
1) a lot of work with comparatively little benefit
2) hard to prove internal consistency
3) hard to verify consistency with non-formal, but actual implementations

When asked for pitfalls of "formal modeling" I nowadays point towards the ARM Architecture Manual and how a multi-million company with clearly clever and well motivated staff ended up with BE32 and BE8 (a.k.a. just plain BE): two largely incompatible ways to implement big-endianness.

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: grantbdev on January 02, 2013, 10:24:24 AM

What about Satoshi's paper on Bitcoin? Isn't that the official specification?

The paper is a design overview, not a specification. It presents the argument that something like bitcoin can work at all, but doesn't tell you the details of building something compatible with it.

grantbdev

sr. member

Activity: 292

Merit: 250

Quote from: MatthewLM on January 02, 2013, 08:53:52 AM

I've thought about this and I'm surprised I've not seen (and can't find) very much discussion alluding to this. At the moment, anyone who wants to understand the bitcoin protocol can use the bitcoin wiki somewhat, as well as forums and other websites, but ultimately has to look at the source code of bitcoin implementations, or rely on the knowledge of other people.

It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol to every detail, but in a way that is easy for anyone to follow. It would be a document that would be used as a reference for developers and would reflect all of the agreed (In majority use/Majority mining power) protocol features. The protocol standards document would then be amended as the protocol is modified. A separate set of documents could describe other features which are not core to the protocol such as wallet formats or whatever.

I had a hunch this would be something the Bitcoin Foundation was set up for, but it seems not. Do other people think this would be very useful to work upon? Otherwise the information will continue to be disorganised and a nightmare to piece together.

What about Satoshi's paper on Bitcoin? Isn't that the official specification?

MatthewLM

legendary

Activity: 1190

Merit: 1004

I've thought about this and I'm surprised I've not seen (and can't find) very much discussion alluding to this. At the moment, anyone who wants to understand the bitcoin protocol can use the bitcoin wiki somewhat, as well as forums and other websites, but ultimately has to look at the source code of bitcoin implementations, or rely on the knowledge of other people.

It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol to every detail, but in a way that is easy for anyone to follow. It would be a document that would be used as a reference for developers and would reflect all of the agreed (In majority use/Majority mining power) protocol features. The protocol standards document would then be amended as the protocol is modified. A separate set of documents could describe other features which are not core to the protocol such as wallet formats or whatever.

I had a hunch this would be something the Bitcoin Foundation was set up for, but it seems not. Do other people think this would be very useful to work upon? Otherwise the information will continue to be disorganised and a nightmare to piece together.

Topic: Formalised Bitcoin Protocol Standard - page 4. (Read 10544 times)