Request for Standardization | Bitcointalksearch.org

ThePiachu

sr. member

Activity: 444

Merit: 313

If you have a group of devoted people that choose to maintain the code in a given language, it doesn't really matter what the language is. The end user cares only whether the program is easy to install, does what it needs to, and is efficient at it (all are yes for the main client). Developers need a language that they are comfortable using and that is suitable for the job (has the right libraries, can be altered quickly in case of a bug, etc).

It is nice to see different versions of Bitcoin being written in different languages by different people, as it gives every programmer who wishes to understand the nitty-gritty details different reference material they can use, and some readily available code they can use to alter the behaviour of the client to their needs in the programming language they understand.

2112

legendary

Activity: 2128

Merit: 1073

Quote from: Vandroiy on November 11, 2011, 11:23:15 AM

But this is totally out of place for Bitcoin. Aspect-oriented programming is the way to go here.

Yeah, totally out of place. What Bitcoin needs is nothing less than silver-bullet-oriented programming. I mean what is the better proven stable store of value: the silver quarter minted in 1960 still buys you a gallon of gasoline. And silver is the best conductor of electricity thus assuring the success in the modern world of fast-paced e-commerce.

But seriously: CS curricula everywhere need to be extended to include mandatory satire-writing classes. It is just a requirement for a well-rounded education in computer science.

Vandroiy

legendary

Activity: 1036

Merit: 1002

I agree fully that C++ is not a suitable language to maintain the main BTC client in. Yes, I was doing C++ programming in the past, and I see there is some kind of "coolness" in being the "advanced" hacker who juggles the subtleties of template programming or the insanity of pointer magic, implicit type casts and the like.

But this is totally out of place for Bitcoin. Aspect-oriented programming is the way to go here. We need contracts, and anything else that improves security and makes outcomes crystal-clear. This is not some toy project where one can just fix a bug after it became apparent, and the "security" offered in C++ is beyond outdated. Most people have probably never seen a competition on harmless-looking backdoor programming. It's just marvelous what possibilities C alone offers in terms of "oops, this pointer is going somewhere else now". Attacks on Bitcoin will be a whole different level than what anyone has faced in the past.

That said, it is a very good thing to have multiple clients. If one gets wiped out, then maybe Bitcoin survives if enough others remain (or were offline at the point of the attack).

Gavin Andresen

legendary

Activity: 1652

Merit: 2314

Chief Scientist

Quote from: ThePiachu on November 10, 2011, 07:43:30 AM

@Gavin Andresen

Quote from: Gavin Andresen on July 20, 2011, 06:34:37 AM

I've been making slow but steady progress on my at-the-network-level testing tool. [...]

What's working: python-based code that serializes/deserializes messages in both bitcoin's binary format (to talk to the node being tested) and JSON (so it is easy for us humans to tweak/examine test data). Connecting and requesting all blocks.

Where can I get my hands on that? I was looking for something like this for a long while.

https://github.com/gavinandresen/Bitcoin-protocol-test-harness
Start with dumpblocks.py

ThePiachu

sr. member

Activity: 444

Merit: 313

A very interesting topic to follow.

@Forp:

I am looking forward to seeing the translated version.

@Gavin Andresen

Quote from: Gavin Andresen on July 20, 2011, 06:34:37 AM

I've been making slow but steady progress on my at-the-network-level testing tool. [...]

What's working: python-based code that serializes/deserializes messages in both bitcoin's binary format (to talk to the node being tested) and JSON (so it is easy for us humans to tweak/examine test data). Connecting and requesting all blocks.

Where can I get my hands on that? I was looking for something like this for a long while.

Forp

full member

Activity: 195

Merit: 100

As someone who is teaching advanced level computer science for a living (and is fairly fluent in C++) my thoughts on this are:

The Satoshi client contains a particularly advanced style of C++ programming which occasionally is hard to read but is very refined, highly thought through and very efficient.

Currently I am working on source code documentation, reading line by line, adding some assertions and test, peek and debug code, and writing down what each line really does. I have done so for some 60-70% of the code. In parallel I am working on documentation that describes the code in plain language; there are some 80 pages now, in draft version. This is a description, not an RFC-like spec.

Originally I also felt a strong desire to reimplement this, also in a different language. Right now, after some 400 hours spent reading the BTC C++ code, I have reversed my opinion: reimplementing BTC is probably a bad idea, since the client has a very large number of tricky details which one is likely not to see upon first or second reading (and which are not documented, neither in the code nor in the wiki nor in the rules). Moreover, these aspects are hard to describe in a specification of RFC-like style (which is the reason that I recently introduced an "advanced p2p concepts" chapter in my book on BTC).

However, some reworking of C++ would help the client. The problems I see are:

* Too many classes in one file, bad source code to file mapping
* Too much global state, bad encapsulation and isolation
* Many preprocessor constructions which make the code hard to read
* The class structure is smart and efficient but not quite well adapted to the conceptual elements of BTC. Obviously, the code was growing throughout the project and there was no clear object-oriented analysis at the beginning. This is probably the biggest problem, since fixing it would require some far-reaching code refactoring completely across the current object structure.

Moreover, I think that C++ is the exactly right language for this (although it leads to a much larger code base than in a Python implementation). Python is the "wrong way of thinking" for this type of code - and there is the efficiency argument as well.

I am happy to share thoughts and drafts of my book at a somewhat later stage, but be advised that the text is currently in German. There will be an English version later, but for now it is in German since I need it in German :-)

netrin

sr. member

Activity: 322

Merit: 251

FirstBits: 168Bc

Quote from: etotheipi on July 19, 2011, 02:09:33 PM

You forget the part where it's in very archaic C++. I work in C++ for a living, yet I still have a very tough time following it, much less developing for it, correctly.
...
I'd much rather spend the time upfront developing such a client, instead of spending 80% of my time in the C++ worrying about syntax, templates, polymorphism/inheritance, casting issues, linker errors, etc.

As someone who does not work in C++ for a living, but would nonetheless like to understand and contribute to the Satoshi client, I humbly confess that I've been unable to compile the latest code, much less add patches or implement my own. I look forward to separate components, standard dependencies, better comments and unit testing, and I applaud recent changes in this direction.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

The Bitcoin Consultancy implementation seemed very far from complete when I looked. As far as I know, BitCoinJ is the most complete alternative implementation.

I agree that a regression test suite will be a fantastic asset. I think some refactorings will be needed to make the code base more testable though, even if it's purely talking to an implementation over a network connection. In BitCoinJ I introduced a "unit test network" with difficulty set to the lowest possible (half of all block solutions are valid).

This means test chains can be constructed quickly. If bitcoin doesn't have such a mode, the test suite is going to end up with lots of big data files containing hand crafted binary chains, which isn't much fun.
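A minimal sketch of why such a mode makes test chains cheap to build. This is not BitCoinJ's actual code: the `EASY_TARGET` value and the `mine` helper are assumptions for illustration, and the "header" is just an arbitrary byte prefix plus a nonce rather than a real 80-byte block header. With a target of 2^255 - 1, about half of all double-SHA-256 hashes qualify, so a valid nonce turns up after a couple of tries on average.

```python
import hashlib
import struct

# Assumed test-network target: any hash below 2**255 passes, so roughly
# half of all candidate hashes are valid (hypothetical value).
EASY_TARGET = (1 << 255) - 1


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's block hash function: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def meets_target(header: bytes, target: int) -> bool:
    # The 32-byte hash is interpreted as a little-endian integer
    # and must not exceed the target.
    return int.from_bytes(double_sha256(header), "little") <= target


def mine(prefix: bytes, target: int, max_tries: int = 1_000_000) -> int:
    """Return the first nonce whose header hash meets the target."""
    for nonce in range(max_tries):
        if meets_target(prefix + struct.pack("<I", nonce), target):
            return nonce
    raise RuntimeError("no nonce found within max_tries")
```

Calling `mine(b"test-block", EASY_TARGET)` should return almost immediately, whereas the same loop against a production-level target would exhaust `max_tries`.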

Gavin Andresen

legendary

Activity: 1652

Merit: 2314

Chief Scientist

I've wanted a libbitcoin since... well, since I first started browsing the bitcoin source code.

The consensus of the core bitcoin development team is to move towards a libbitcoin in small-ish, incremental steps, NOT to move to a full-blown API in one fell swoop.

The Bitcoin Consultancy folks disagree with that approach, and are moving ahead with a libbitcoin of their own that is rewritten from scratch, and I suspect there will be at least two or three other alternative implementations rewritten from scratch popping up over the next year or so. Which is why I'm spending a lot of time thinking about and working on cross-implementation testing.

(I'm supposed to be on vacation here in Australia, but I'll try to find some time to upload what I've done so far to github).

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

So I've decided I was partially wrong, and also had a change of heart on this topic.

(1) I don't think I have the drive to work on a full-client implementation. I am still not too familiar with the networking part, but some of the nuances of the algorithms are biting me after I thought I had finished them (I gotta make some more elaborate unit-tests...). I can't imagine getting 100% of the client implemented perfectly, at least not without a big team.
(2) I am starting to question how "safe" it is for others to implement this... ever. As Gavin said, what does it take to avoid a new client forking the blockchain? Well, maybe we can't.

But, we can side-step both of these problems with a bit of extra work from the community. I believe that the best thing to do is to separate out the minimal amount of code from the reference client to create a "node" or "engine", and isolate that from the GUI. Then, spend the effort to make this reference BTC engine accessible in every language out there, so that the official implementation can be used as a drop-in backend for any budding project. It could be accessible as a background process via localhost sockets, or shared-library/dll which can be integrated into your target language (such as wrapping it with SWIG to get it into python).

This solution appeases everyone's concerns, here. The community can decide on an "interface" or set of interfaces that can be implemented for "correct" access to the official BTC engine, and then developers can work on getting this interface implemented in their language and document it. New developers won't have an excuse not to use it, if it's easily accessible in their favorite language (unless they really want to change the underlying mechanics). Then, this guarantees that there is a single, official engine that follows the community's standards, and most people will use it.

-Eto

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Gavin, I am very interested to help out with this. I think unit-tests are great, but I wasn't sure how to do it with networking. It sounds like you already have some ideas. I have started a test-suite in python (for python) for what you mentioned above, that involves endianness checks, serialization/unserialization of different object types, example transactions, blocks and addresses to verify consistency, and an ECDSA signature verification test. I'd like to add some scripting-engine tests... my understanding is that even if non-standard scripts aren't generally accepted, the client has to be able to evaluate them in case they show up in the blockchain.
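A minimal sketch of what an endianness check and a serialization round-trip test can look like. It is not taken from the test suite described above; the function names are assumptions. It uses the protocol's variable-length integer (CompactSize) encoding, which prefixes larger values with 0xfd, 0xfe or 0xff followed by a little-endian 2-, 4- or 8-byte integer.

```python
import struct
from typing import Tuple


def encode_compact_size(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin CompactSize."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    first = data[0]
    if first < 0xFD:
        return first, 1
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    return struct.unpack(fmt, data[1:1 + size])[0], 1 + size


def run_checks() -> None:
    # Endianness check: the least significant byte comes first.
    assert struct.pack("<I", 1) == b"\x01\x00\x00\x00"
    # Round trips on both sides of every size boundary.
    for n in (0, 0xFC, 0xFD, 0xFFFF, 0x10000, 0xFFFFFFFF, 0x100000000):
        encoded = encode_compact_size(n)
        assert decode_compact_size(encoded) == (n, len(encoded))
```

Testing the values on either side of each boundary is the point: off-by-one mistakes at 0xfc/0xfd are exactly the kind of detail two implementations can silently disagree on.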

As for people not following the standardization documents: I don't think it really matters. Some people are going to want to do their own thing, and not let "standards documents" get in their way. But at least when they try to release their client and there is a cold reception for it... we have somewhere to point them if they want to get serious. There won't be any "Why does my client have to do this?" or "I didn't know I needed to accommodate [some complicated thing requiring a lot of code rework]". And hopefully a little less of the community saying "well, just look at the C++ code, it's all in there." I think that C++ code is a terrible way to document something that really needs to be standardized...

Additionally, I think Mark's concerns for me are more easily communicated: "you do realize that you have to do all this stuff, right?" Instead of him just telling me there's a lot to do and I have no way to gauge before jumping in.

Gavin Andresen

legendary

Activity: 1652

Merit: 2314

Chief Scientist

I've been making slow but steady progress on my at-the-network-level testing tool. I don't put a lot of faith in standards documents-- it is too easy to misinterpret or ignore them. Good implementation-independent test suites seem like a better investment of time.

What's working: python-based code that serializes/deserializes messages in both bitcoin's binary format (to talk to the node being tested) and JSON (so it is easy for us humans to tweak/examine test data). Connecting and requesting all blocks.
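To illustrate the binary-plus-JSON idea, here is a minimal sketch for one small structure, a transaction outpoint (32-byte transaction hash followed by a little-endian 32-bit output index). This is not code from the test harness; the function names and the JSON field names `hash` and `n` are assumptions chosen for the example.

```python
import json
import struct


def outpoint_to_binary(txid_hex: str, index: int) -> bytes:
    # Wire format: 32-byte hash, then a little-endian uint32 index.
    # Hashes are conventionally displayed byte-reversed, so the hex
    # string is reversed when converting to wire order.
    return bytes.fromhex(txid_hex)[::-1] + struct.pack("<I", index)


def outpoint_from_binary(raw: bytes) -> dict:
    if len(raw) != 36:
        raise ValueError("an outpoint is exactly 36 bytes")
    (index,) = struct.unpack("<I", raw[32:])
    return {"hash": raw[:32][::-1].hex(), "n": index}


def outpoint_to_json(raw: bytes) -> str:
    """Human-editable form of the same data."""
    return json.dumps(outpoint_from_binary(raw), sort_keys=True)


def outpoint_from_json(text: str) -> bytes:
    obj = json.loads(text)
    return outpoint_to_binary(obj["hash"], obj["n"])
```

A test author can then dump a message to JSON, change one field by hand, and convert it back to bytes to send to the node under test.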

Still todo: actually start writing test cases, figure out what other tools I need to write to create good test cases, and start systematically going through the "rules of bitcoin" and devising tests to make sure the rules are being followed-- starting with the super-important "get this wrong and you split the blockchain" rules.

I hope to recruit some of you to help out with all that... I'll be creating a github project with my progress so far very soon.

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

Quote from: etotheipi on July 19, 2011, 04:41:16 PM

To get back on topic: I will look into the wiki page linked before, and see how I can contribute to it. I think, regardless of our discussion on the virtues of alternative clients, the information should be organized and available to those who want to try it.

Sure, don't interpret this as us trying to stop you

The last large open source project I was involved in was a game engine, a branch of development in which many people have a bad case of NIH syndrome, so I'm a bit wary.

But you seem to have this thought out well, so why not try, and improve the wiki on the way.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

@Mike: I guess we'll have to respectfully disagree, and not because of reasonable disagreement, but because my naivety on the matter won't be solved through debate/discussion. I have to try it and find out. The worst thing that can happen is I will become an "expert" on the topic and know better where to put my effort in the future (which has already started, now that I've got most of the scripting and ECDSA code implemented). My effort isn't well-spent on the C++ code, because there are too many uncertainties hitting my brain at once, and I end up neither understanding anything more nor doing anything productive. When I understand the algorithms, I'll consider battling the C++.

Btw, I have been working 20-25 hours per week on this, for less than 2 weeks. At that rate, 2 more months I think is reasonable. Even if you are right and it doesn't work, I find it fascinating, educational and rewarding to learn the finer details of the system, and I don't think there's a more efficient way to get a deep understanding of it.

To get back on topic: I will look into the wiki page linked before, and see how I can contribute to it. I think, regardless of our discussion on the virtues of alternative clients, the information should be organized and available to those who want to try it.

WakiMiko

newbie

Activity: 59

Merit: 0

Quote from: etotheipi on July 19, 2011, 12:58:41 PM

Some people want to just spend and receive coins without even seeing address strings or wallet details. Others will be power users, and want more fine-grained control over their wallet(s), keys, encryptions, and super-security techniques. And there's a million shades of gray in between. I don't think all this can be achieved with one client.

I think a nice future addition to the Satoshi client would be some kind of EXPERT MODE that can be turned on in the options (disabled by default) that enables all that (and possibly more).

Mike Hearn

legendary

Activity: 1526

Merit: 1134

There are quite a few pages on the wiki already, I think you can probably contribute to some of them. The protocol and network rules are quite well documented, if there are gaps please do fill them out.

I still think you are underestimating the amount of work involved. You say "two months" but that's nowhere near enough if you want to do it right (eg, with a solid test suite). I've been working on BitCoinJ for six months and it's not a full implementation. That's with around a half to full day each week, with occasional spurts of more.

It's possible I'm a stupid and very slow programmer, but I think it's more likely that two months is an underestimate unless you are working on it full time. BitCoinJ/# is by far the most complete alternative implementation. John is right, we have seen about a billion announcements of Python implementations that get as far as the network protocol and then stop.

Quote

Its lack of speed is the only downside, which is not really a concern for the BTC network.

I strongly disagree. Keeping pace with the Bitcoin network is a CPU and IOP intensive task that involves the manipulation of complex binary data structures. An interpreted language will hit the wall faster than the Satoshi client will, and as Python isn't thread safe you won't be able to buy time with multi-threading.

Bitcoin is a rather performance sensitive application, if you ask me.

Quote

Binary I/O, serialization/unserialization is as simple as it gets, networking/sockets are much more pleasant, exception handling is a breeze, and the ability to pull in external libraries quickly and painlessly are all fantastic features for exploring design ideas.

These aren't the things that make an implementation difficult. The difficulties lie in things like OP_CHECKSIG, though that is only one of the many barrels of laughs you'll encounter along the way. As you know, Bitcoin involves complicated and very sensitive algorithms.

Nobody is saying there should be only one client. Just that many have tried to reimplement the full thing and nobody has done so yet.

I suggest you start by implementing a lightweight node, ie one that does SPV like BitCoinJ. Once you reach that milestone, you can press forward and do the additional verifications and indexing needed for a full node.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

By the way, I've learned from working with other people's algorithm designs at work, the only way to actually, really understand the complexities of a complicated algorithm, is to try implementing it yourself. Without doing it, you just can't realize the subtleties/pitfalls/nuances of the system without actually battling them yourself. I would argue that having people dive right into an existing codebase (especially a deep C++ project) is dangerous for a security-sensitive piece of software.

Prime example: if I hadn't gone through the effort to actually implement scripting on my own, I never would've realized the complexities of OP_CHECKSIG and created this: http://forum.bitcoin.org/index.php?topic=29416.new;topicseen#new.

The point here is that you complain about wasted effort, re-doing work. But I argue that not only will new clients be good for BTC as a whole (if done right), they will also breed expert developers who can further the technology... which is exactly what I am trying to do on these forums! Perhaps one day I will take my expertise and contribute it to the Satoshi client. But battling algorithm details and complex C++ jargon simultaneously is a recipe for breaking things.

Perhaps my request is better made by starting a wiki page for what I want... seed it with the fields that I think should be there, even if they're wrong, and let the community hit equilibrium on what a "full node" means. Any recommendations for where to start this? Perhaps the page linked earlier would be a perfect starting point.

-Eto

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

You forget the part where it's in very archaic C++. I work in C++ for a living, yet I still have a very tough time following it, much less developing for it, correctly. My goal is to produce a python implementation, because python code can be extremely simple, functional, and readable all at the same time, with one-tenth the lines of code (part of the reason it's readable). Its lack of speed is the only downside, which is not really a concern for the BTC network. Binary I/O, serialization/unserialization is as simple as it gets, networking/sockets are much more pleasant, exception handling is a breeze, and the ability to pull in external libraries quickly and painlessly are all fantastic features for exploring design ideas.

I'd much rather spend the time upfront developing such a client, instead of spending 80% of my time in the C++ worrying about syntax, templates, polymorphism/inheritance, casting issues, linker errors, etc.

-Eto

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

All of those could be implemented in the current client and would be welcome patches...

Really, if all the people trying to rewrite the client would have spent that time improving the satoshi client, we'd be a lot further :P

Sometimes I don't get this open source thing... you have the source, it's as liberally licensed as possible, and still you want to re-do work.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Some examples:
--I want to be able to rewrite wallets to be more straightforward and easily recoverable than the current Berkeley DB format. I've battled corrupted wallets quite frequently and I'm disappointed I can't reconstruct them easily (and that the DB is easily corruptible). The database of keys is really add-only, so I don't see why we need a complex DB engine to track them.
--I want to redesign the file formats to better accommodate wallet recovery, memory pool organization, transaction filtering, reduced memory footprint, etc
--I want to have custom account types that support various CONOPs, such as tracking, locking, uploading, recovering android/iPhone accounts, and sync'ing between computers.
--I want to be able to separate out private keys from public keys/addresses to allow tracking of balances while being able to keep the private keys safe on external media
--I want to be able to sign certificates containing desired transactions from an offline computer, and transfer the certificate to my client online and have it broadcast (if you have $10,000 worth of BTC, you would probably be willing to buy a $200 Eee PC just to engage in this type of security measures)
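
To make the "add-only" wallet idea from the first item concrete, here is a minimal sketch of an append-only record log that can be recovered up to the first damaged record. The record layout (length, payload, truncated SHA-256 checksum) and the function names are hypothetical; this is not the format of any existing wallet.

```python
import hashlib
import io
import struct
from typing import BinaryIO, List


def append_record(log: BinaryIO, payload: bytes) -> None:
    """Append one record: 4-byte length, payload, 4-byte checksum."""
    checksum = hashlib.sha256(payload).digest()[:4]
    log.seek(0, io.SEEK_END)
    log.write(struct.pack("<I", len(payload)) + payload + checksum)


def recover_records(log: BinaryIO) -> List[bytes]:
    """Read every intact record, stopping at the first damaged one."""
    log.seek(0)
    records = []
    while True:
        header = log.read(4)
        if len(header) < 4:
            break  # clean end of file, or a truncated length field
        (length,) = struct.unpack("<I", header)
        payload = log.read(length)
        checksum = log.read(4)
        expected = hashlib.sha256(payload).digest()[:4]
        if len(payload) < length or checksum != expected:
            break  # truncated or corrupted tail; keep what we have
        records.append(payload)
    return records
```

Because existing bytes are never rewritten, a crash in the middle of a write can damage only the last record, and everything before it stays readable.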

I think there's a lot of good reasons someone would want to change the under-the-hood stuff, instead of just re-wrapping the reference client with a new GUI.

-Eto

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

I think Mike is sceptical because we saw a lot of alternative clients announced already, in Python and various other languages, but none was ever finished.

Well, the Java and .NET implementations work, though they don't function as full nodes yet (they don't do validation/mining).

Quote from: etotheipi on July 19, 2011, 12:58:41 PM

Some people want to just spend and receive coins without even seeing address strings or wallet details. Others will be power users, and want more fine-grained control over their wallet(s), keys, encryptions, and super-security techniques. And there's a million shades of gray in between. I don't think all this can be achieved with one client.

This is being covered by making it easier to interface with alternative GUIs by making the current client into a library and supporting an easier external interface... If you just want to slap another face on it I don't see why you'd want to re-implement the whole thing.

Then again, you might have a good reason, good luck with your work!

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

I don't think I'm underestimating at all -- I know it's a massive amount of work. That's why I think we need a community-maintained "standards" document (wiki) that will allow developers to see what it will take to get there. In return, such clients can conform to the "rules of the land" and make sure that the network stays "pleasant."

I don't think Bitcoin has much of a future without the ability for devoted developers to develop their own, full clients. I think having only a single "real" client will stifle the development of Bitcoin in general. The ability for individuals, groups and companies to develop their own implementations, tailored for a variety of different use-cases, is going to help BTC succeed. Some people want to just spend and receive coins without even seeing address strings or wallet details. Others will be power users, and want more fine-grained control over their wallet(s), keys, encryptions, and super-security techniques. And there's a million shades of gray in between. I don't think all this can be achieved with one client.

I have the intent to do one myself, even if it takes me 2 months to get the details right. And I think we should promote it. And I think the best way to promote it is to have a thoroughly-hashed-out description of these details, with as many unit-tests as possible for developers to apply to their code. The alternative is for developers to create clients but not know they weren't supposed to forward transactions of unknown validity, then open up the door for people to start flooding the network.

-Eto

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

Quote from: etotheipi on July 19, 2011, 11:20:11 AM

What I'm concerned about is someone making a really great GUI (like I plan to),

Hey, why not help me with the better GUI instead of starting yet another one?

Quote from: etotheipi on July 19, 2011, 11:20:11 AM

- Does a node have to be able to check validity of all blocks/transactions to forward them?

Yes

Quote from: etotheipi on July 19, 2011, 11:20:11 AM

- If a node is a reduced-memory-footprint node (not holding the entire blockchain), should it forward transaction even if it can't verify they are valid?

No, never forward potentially invalid transactions.

Quote

- How much flexibility will we allow for developers to implement their own fee schedule?

All flexibility you want, you can require any fee you want for putting the transaction in a block, and you can add any fee you want to a transaction (though the chances it'll be included under the minimum fee/kb of the standard client are slim)
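
A minimal sketch of the arithmetic behind that answer. A transaction's fee is whatever its outputs leave unclaimed from its inputs, and a client's policy then decides whether that fee is high enough for the transaction's size. The linear per-kilobyte rule and the `min_fee_per_kb` parameter are assumptions for illustration; they are not the standard client's actual fee schedule.

```python
from typing import Sequence


def transaction_fee(input_values: Sequence[int],
                    output_values: Sequence[int]) -> int:
    """Fee in base units: the input value not claimed by any output."""
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs; transaction is invalid")
    return fee


def meets_min_fee(fee: int, size_bytes: int, min_fee_per_kb: int) -> bool:
    """Check the fee rate against an assumed policy threshold."""
    # Integer cross-multiplication avoids floating-point rounding.
    return fee * 1000 >= min_fee_per_kb * size_bytes
```

Only the non-negative check is a validity rule. The threshold is policy, which is why a client can set it however it likes, at the risk that other nodes and miners apply a stricter one.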

Mike Hearn

legendary

Activity: 1526

Merit: 1134

See here for some:

https://en.bitcoin.it/wiki/Protocol_rules

However, I think you're underestimating how much work is involved in reimplementing the Satoshi client from scratch. So far no credible reimplementation has appeared.

etotheipi

legendary

Activity: 1428

Merit: 1093

Core Armory Developer

Hi all, I'm mostly active in the parent discussion group, but I am working on an alternative client in python, so I should be spending more time here. In recent days, I have realized that alternative clients could seriously disrupt the BTC network unless there are very high standards for what they do. What I'm concerned about is someone making a really great GUI (like I plan to) that has the subtle vulnerabilities that the base group of developers have tried so hard to avoid (which I don't plan to do). The last thing we want is to have 30% of users switch to a client that has a scripting bug in it that evaluates certain scripts to true that should be false, so that the block chain starts getting messy with hundreds of invalid blocks and "reversed" transactions.

So, of course, I would ask if there is an existing description like what I ask for. If not, or if it's not complete, I would like to start one here. I want to create a living, master checklist for developers to use, that they need to conform to before declaring their client "correct." This would include flood defense techniques, community-agreed transaction fee schedules, scripting unit-tests, network broadcast/forwarding rules, etc. I recognize a lot of this is in the C++ source code, but that's not the best way to document the standard...

I will start here with a short list of questions, and I'd like to get feedback concerning what is needed for a new client to be considered safe & "correct."

-- Node connection protocols: node-discovery techniques, node-cycling, suggested IP-rules, backwards-compatibility handling
-- Scripting:
    - Would like to see a list of OP_KEYWORDs to be supported by the scripting engine (are they all the non-"Currently Disabled" words on the scripting wiki?)
    - Would like to see/create a set of reasonably-complex unit-test scripts that can be used to check the scripting engine, including most required keywords
    - Tests would look like (scriptBinaryTest1, True), (scriptBinaryTest2, True), (scriptBinaryTest3, False), etc.
-- Reduced-memory nodes:
    - If a node is not intended to hold the entire blockchain, how should its network-protocols be adjusted?
    - What classes of nodes can we define, and what are the associated capabilities:
        -- A node that only holds its own transaction list should only connect for getting information, not forwarding transactions or blocks (or maybe forwarding blocks, but not transactions?)
        -- A reduced-footprint node should identify itself in some way on the network -- and other nodes would be aware and pick their peers diversely... how?
-- Flood-disconnect conditions:
    - What conditions should be applied before disconnecting a node (besides the bug currently disconnecting those downloading the block chain)
    - What conditions determine whether a broadcast transaction is forwarded to other nodes/peers? Blocks forwarded?
    - Does a node have to be able to check validity of all blocks/transactions to forward them?
    - If a node is a reduced-memory-footprint node (not holding the entire blockchain), should it forward transactions even if it can't verify they are valid?
-- Transaction Fees:
    - What is the current acceptable fee schedule?
    - How much flexibility will we allow for developers to implement their own fee schedule?
    - If a transaction fee is not included where we think it should be, should it be forwarded to other nodes?
-- Other:
    - What are the timeout rules for peer nodes?
    - What are the timestamping rules for creating new blocks (I know there's a median calculation based on a bunch of peers)
    - Would like to see a list of checks required before declaring a transaction valid. A developer could easily remember to check the merkle tree root, but not that sum(TxIn) >= sum(TxOut)
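
The (script, expected result) pairs proposed under "Scripting" could be driven by a harness like the following minimal sketch. `toy_eval` is a hypothetical stand-in for a client's real scripting engine: it supports only a handful of opcodes and treats stack items as plain integers, whereas real scripts operate on byte vectors. The opcode byte values are the ones listed on the wiki's Script page.

```python
from typing import Callable, List, Tuple

OP_1, OP_2, OP_3 = 0x51, 0x52, 0x53
OP_DUP, OP_EQUAL, OP_ADD = 0x76, 0x87, 0x93


def toy_eval(script: bytes) -> bool:
    """Evaluate a tiny opcode subset; True if the top item is nonzero."""
    stack: List[int] = []
    try:
        for op in script:
            if OP_1 <= op <= OP_3:
                stack.append(op - 0x50)  # push the small integer 1..3
            elif op == OP_DUP:
                stack.append(stack[-1])
            elif op == OP_ADD:
                stack.append(stack.pop() + stack.pop())
            elif op == OP_EQUAL:
                stack.append(int(stack.pop() == stack.pop()))
            else:
                return False  # opcode not supported by this sketch
    except IndexError:
        return False  # stack underflow makes the script fail
    return bool(stack) and stack[-1] != 0


# Shared test vectors: (serialized script, expected evaluation result).
SCRIPT_TESTS: List[Tuple[bytes, bool]] = [
    (bytes([OP_1, OP_2, OP_ADD, OP_3, OP_EQUAL]), True),
    (bytes([OP_1, OP_DUP, OP_EQUAL]), True),
    (bytes([OP_1, OP_2, OP_EQUAL]), False),
    (bytes([OP_ADD]), False),
]


def run_script_tests(evaluate: Callable[[bytes], bool]) -> List[int]:
    """Return the indexes of test cases the evaluator gets wrong."""
    return [i for i, (script, expected) in enumerate(SCRIPT_TESTS)
            if evaluate(script) != expected]
```

Any implementation, in any language, could load the same vectors and be expected to report no failures; the cases that must evaluate to False matter as much as the ones that must evaluate to True.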

I know I'm forgetting a ton of stuff, but that's why this is a discussion and not a monologue

-Eto
