We need to split up the Satoshi client

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Quote from: flipperfish on November 01, 2012, 05:14:49 AM

This is going in the right direction. Will this become a full-node client?

Yes, this will be a full server. I am targeting server installations that will deal with volumes orders of magnitude above today's.
It has a radically modular/extensible architecture.

I will release the code before end of this week.

flipperfish

sr. member

Activity: 350

Merit: 251

Dolphie Selfie

Quote from: 2112 on October 31, 2012, 07:02:47 PM

Quote from: flipperfish on October 31, 2012, 05:06:45 PM

.NET/WCF supports the publish/subscribe-model, even for multiple clients. But I don't know how connection errors are handled.

All right. So it looks like I've been successfully trolled by flipperfish, and in almost exactly the same way that ruski had trolled me over a year ago on the same forum.

Quote from: ruski on August 25, 2011, 02:36:36 AM

@ 2112, are you sure it would be so hard to modify it? My favoured language is VB/VB.NET, and while I can read C++ and it won't take long to pick up, I won't get anywhere trying to read the entire program for what I need. If you could find the initialization code ie. sub Main for the whole program, so I can work from there, it'd be a big help. May not be so difficult as you think.

The take-away for me is that I'm really vulnerable to Microsoft trolls who offer .NET as a viable, portable and open-source implementation of Bitcoin.

Props to flipperfish.

Maybe you should make up your mind about who's trolling here. You wanted to hear the name of an implementation that fulfills your (IMO oversized) requirements, and I named one. Just to make this clear: I never proposed using .NET/Microsoft as the core technology for Bitcoin. If you are referring to my mention of C# as a possible language, please take a look at Mono.
By the way: I'm perfectly fine with an alternative implementation of Bitcoin using Microsoft technology. As long as there's a free alternative Bitcoin client available, I see no problem with that.

@grau
This is going in the right direction. Will this become a full-node client?

grau

hero member

Activity: 836

Merit: 1030

bits of proof

This is an excerpt of my implementation's configuration. Is this the kind of modularization you are looking for?

Code:

10

2112

legendary

Activity: 2128

Merit: 1074

Quote from: flipperfish on October 31, 2012, 05:06:45 PM

.NET/WCF supports the publish/subscribe-model, even for multiple clients. But I don't know how connection errors are handled.

All right. So it looks like I've been successfully trolled by flipperfish, and in almost exactly the same way that ruski had trolled me over a year ago on the same forum.

Quote from: ruski on August 25, 2011, 02:36:36 AM

@ 2112, are you sure it would be so hard to modify it? My favoured language is VB/VB.NET, and while I can read C++ and it won't take long to pick up, I won't get anywhere trying to read the entire program for what I need. If you could find the initialization code ie. sub Main for the whole program, so I can work from there, it'd be a big help. May not be so difficult as you think.

The take-away for me is that I'm really vulnerable to Microsoft trolls who offer .NET as a viable, portable and open-source implementation of Bitcoin.

Props to flipperfish.

flipperfish

sr. member

Activity: 350

Merit: 251

Dolphie Selfie

Quote from: 2112 on October 31, 2012, 03:42:54 PM

Then please name those implementations! I'm aware of very few and they all range from moderately to very complex.

.NET/WCF supports the publish/subscribe-model, even for multiple clients. But I don't know how connection errors are handled.

Quote from: 2112 on October 31, 2012, 03:42:54 PM

By reliable I mean the following:

1) Can discover a temporary transport failure and provide a way to reconnect at the transport level and to replay the missing notifications from the server to the client.

2) Can garbage collect truly stale connections from the clients that really died.

3) Does 1) and 2) with reasonable overhead. I'm immediately disqualifying any implementation that keeps a per-client queue of outgoing messages on the server. They are just too easy to DDoS, although I'm aware of several vendors actively peddling such solutions (just not in the Bitcoin domain). Any implementation that can replay from a single shared transaction log is fine with me.

I can imagine that this would create some problems with models like Electrum/Stratum, where many clients are served and have to be notified about new events in the network. I can also see that DDoS may be a problem in that case. But I think these issues are of lower priority for the development of a modularized full-node client. The node-state module of a full-node client has, by definition, to run on the same machine. Eventually a solution will be found for thin clients (or rather their servers), which can then be adopted for one of the many possible implementations of the node-state module.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: flipperfish on October 31, 2012, 03:20:43 PM

I don't understand the problem you see here. Most RPC standards use TCP as underlying layer, which ensures reliability (and even order of arrival).
Two-way communication is needed anyways, else the server could not send its response back to the client.

By two-way communication I mean conceptual two-way, not transport two-way. In other words: the client makes one request and the server keeps providing multiple replies until canceled.

Quote from: flipperfish on October 31, 2012, 03:20:43 PM

On the conceptual level, 2-way communication can also be achieved by implementing both server and client on both sides. Many RPC-implementations also offer this as part of their protocol (like in the example you mention).

Then please name those implementations! I'm aware of very few and they all range from moderately to very complex.

By reliable I mean the following:

1) Can discover a temporary transport failure and provide a way to reconnect at the transport level and to replay the missing notifications from the server to the client.

2) Can garbage collect truly stale connections from the clients that really died.

3) Does 1) and 2) with reasonable overhead. I'm immediately disqualifying any implementation that keeps a per-client queue of outgoing messages on the server. They are just too easy to DDoS, although I'm aware of several vendors actively peddling such solutions (just not in the Bitcoin domain). Any implementation that can replay from a single shared transaction log is fine with me.
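A minimal sketch of what requirement 3) could look like, assuming an in-memory list stands in for the shared log. The names `SharedLog`, `Client`, `replay` and `catch_up` are hypothetical and not taken from any existing implementation. The server keeps no per-client queue; each client remembers the last sequence number it saw and asks for everything after it when it reconnects.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Single append-only notification log shared by all clients (hypothetical)."""
    entries: list = field(default_factory=list)

    def append(self, event: str) -> int:
        """Store an event and return its sequence number."""
        self.entries.append(event)
        return len(self.entries) - 1

    def replay(self, after_seq: int) -> list:
        """Return (seq, event) pairs with a sequence number above after_seq."""
        return [(seq, ev) for seq, ev in enumerate(self.entries) if seq > after_seq]


@dataclass
class Client:
    """The client, not the server, tracks how far it has read."""
    last_seq: int = -1
    seen: list = field(default_factory=list)

    def catch_up(self, log: SharedLog) -> None:
        for seq, event in log.replay(self.last_seq):
            self.seen.append(event)
            self.last_seq = seq


log = SharedLog()
client = Client()
log.append("tx a")
client.catch_up(log)

# Simulated transport failure: two events arrive while the client is away.
log.append("tx b")
log.append("block 1")
client.catch_up(log)  # reconnect and replay from last_seq
```

Because the only server-side state is the log itself, a dead client leaves nothing behind at this level; detecting and closing the stale transport connection (requirement 2) is not covered by the sketch.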

By the way: "official" JSON-RPC supports asynchronous notifications from the client to the server. This is not what is required here, and slush's contribution is that he found a productive way to "abuse" the existing implementations to do the inverse notifications.

flipperfish

sr. member

Activity: 350

Merit: 251

Dolphie Selfie

Quote from: 2112 on October 31, 2012, 02:10:04 PM

None of the commonly mentioned RPC standards offers reliable two-way communication between a client and a server. In the Stratum design, slush found a way to abuse the JSON-RPC "notification" mechanism to implement the inverse notification: from the server to the client.

I don't understand the problem you see here. Most RPC standards use TCP as underlying layer, which ensures reliability (and even order of arrival). Two-way communication is needed anyways, else the server could not send its response back to the client. On the conceptual level, 2-way communication can also be achieved by implementing both server and client on both sides. Many RPC-implementations also offer this as part of their protocol (like in the example you mention).

Additionally, where exactly do you see synchronization issues while using RPC? In any case where more than one entity accesses the "node-state-provider" in parallel, some mechanism will be needed to prevent deadlocks and race conditions. But I think it is reasonable to let this be decided by the concrete implementation of this service.

2112

legendary

Activity: 2128

Merit: 1074

OK, so this is the leading architectural misconception:

Quote from: ShadowOfHarbringer on October 31, 2012, 07:56:49 AM

interfaces can be made around it using XMLRPC API

Quote from: flipperfish on October 31, 2012, 01:02:08 PM

The "node-state-provider" could be implemented in various ways (e.g. as RPC-Stub, ...).

None of the commonly mentioned RPC standards offers reliable two-way communication between a client and a server. In the Stratum design, slush found a way to abuse the JSON-RPC "notification" mechanism to implement the inverse notification: from the server to the client.

Now I better understand the added value of trading engines like MetaTrader: they hide the essentially asynchronous nature of financial networking. The user isn't just making requests and receiving responses. The financial user needs to be made aware of changes occurring in the outside world, be it securities exchange trades or P2P bitcoin transactions. MetaTrader (and similar designs) just sandbox the user's program in such a way that the aggressive polling is not visible outside of the user's machine.

The financial industry has had various solutions to this problem available for years. One of the simplest, the FIX protocol, is about 20 years old now.

http://en.wikipedia.org/wiki/Financial_Information_eXchange

Unfortunately I see a long and painful road ahead for Bitcoin implementers and integrators. If the prevailing attitude is that "there is no deficiency that can't be resolved by simply polling more often" then progress will be excruciatingly slow. Intense trading activity will be indistinguishable from a DDoS attack. The battery life of any portable Bitcoin device will be very bad.

flipperfish

sr. member

Activity: 350

Merit: 251

Dolphie Selfie

I think it is a good idea to have a modularized full-node bitcoin client. AFAIK there is currently no other full-node client. In addition, an important advantage of a modularized approach is testability. The more modularized a system is, the easier it is to write (good) unit tests, which should also improve security. As the Satoshi client is already implemented in C/C++ (with all its messiness), I think it would be a good idea to implement an alternative full-node client in an alternative language which is also statically typed (like Java or C#) but hides some of the C/C++ quirks. This would improve code readability and thus documentation.

I think the OP has already identified the most needed modules, but I think that "Knowledge Center" is still some kind of "god module" that does everything that does not fit somewhere else. IMHO "block-chain-storage" should be generalized to "node-state-provider" and store things like floating transactions, too. The "verifier" can then use the primitive services provided by the "node-state-provider" to decide if incoming entities (transactions and blocks) are valid. It's the job of the "node-state-provider" to ensure the performance of the service primitives it offers. The "node-state-provider" could be implemented in various ways (e.g. as DBMS, as RPC-Stub, as Flatfile, ...).
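The split between "verifier" and "node-state-provider" could be sketched roughly as follows. This is only an illustration under stated assumptions: the method names `is_spendable` and `add_floating`, the `InMemoryProvider` backend and the string outpoints are all hypothetical, and the toy verifier checks nothing except that inputs are spendable (no scripts, no signatures).

```python
from typing import Iterable, Protocol


class NodeStateProvider(Protocol):
    """Primitive services the verifier relies on (names are hypothetical)."""

    def is_spendable(self, outpoint: str) -> bool: ...

    def add_floating(self, txid: str, inputs: Iterable[str]) -> None: ...


class InMemoryProvider:
    """One possible backend; a DBMS or RPC stub would expose the same methods."""

    def __init__(self, confirmed_unspent: Iterable[str]) -> None:
        self.confirmed_unspent = set(confirmed_unspent)
        self.floating = {}  # txid -> set of outpoints spent by that floating tx

    def is_spendable(self, outpoint: str) -> bool:
        spent_in_flight = any(outpoint in ins for ins in self.floating.values())
        return outpoint in self.confirmed_unspent and not spent_in_flight

    def add_floating(self, txid: str, inputs: Iterable[str]) -> None:
        self.floating[txid] = set(inputs)


def verify_and_accept(state: NodeStateProvider, txid: str, inputs: list) -> bool:
    """Toy verifier: accept a transaction only if every input is spendable."""
    if not inputs or not all(state.is_spendable(o) for o in inputs):
        return False
    state.add_floating(txid, inputs)
    return True


state = InMemoryProvider(["coin:0", "coin:1"])
first = verify_and_accept(state, "tx1", ["coin:0"])         # accepted
double_spend = verify_and_accept(state, "tx2", ["coin:0"])  # rejected
unknown = verify_and_accept(state, "tx3", ["coin:9"])       # rejected
```

The verifier only sees the interface, so the storage backend can be swapped without touching the validation logic, which is also what makes the verifier easy to unit-test.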

The split between wallet and pure transaction server should be done on a higher level than the other modules.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: Hawkix on October 31, 2012, 07:08:13 AM

I feel the minimal desirable separation is to split the current client into a hardened transaction server and to move the wallet, address book, GUI etc. into a thin, light client.

One could argue that the minimal separation is an already solved problem: Electrum client/server communicating with the Stratum protocol.

As far as I understand, the main argument against the Stratum-based implementation is that it will be susceptible to MITM attacks.

In my opinion, assuming a single P2P-network module in any design is an oversimplification. Perhaps it would be better to have two kinds of P2P modules:

1) P2P-participant: a module that stores the blockchain (either directly or in cooperation with another module) and is capable of both sending and receiving the Bitcoin transactions. The transactions in this module are always fully verified.

2) P2P-observer: a module that stores only block headers (in the SPV fashion) and can only receive the Bitcoin transactions including the ones that fail verification. The main distinction required for this module is that it doesn't make a direct connection to the P2P-participant module used by the same client. In other words it will report existence of "our" transactions, transactions that our client had sent, only when it had seen them propagating on the outside network.

Both versions of the P2P module would benefit from a persistent way to store what is currently the volatile mempool: the in-flight transactions that are not yet recorded in the blockchain. This would be of great benefit to people who are actively mining, either solo or as pool operators.

Obviously there is a lot of overlap in the functionality I described above. So maybe the better design would be to roll them together into a single module, but include an additional "participant/observer" flag in its API?

On the other hand the P2P-observer module is significantly less resource intensive. It could also be a quick and neat way to discover MITM attacks on the Stratum (or similar) protocol.

So then maybe make a "participant/observer" flag an instantiation choice for a unified P2P module? The P2P modules instantiated in the "participant" mode will be able to respond to API calls with both the "participant" and the "observer" flag. The P2P instances created in the observer mode will only respond to API calls specifying the "observer" mode.
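A small sketch of that instantiation rule, assuming a hypothetical `P2PModule` class with a `Mode` enum; the `call` method and the operation names are placeholders, not a proposed API.

```python
from enum import Enum


class Mode(Enum):
    PARTICIPANT = "participant"
    OBSERVER = "observer"


class P2PModule:
    """Unified P2P module whose mode is fixed at instantiation (hypothetical)."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode

    def supports(self, requested: Mode) -> bool:
        # A participant instance answers both kinds of call;
        # an observer instance answers only observer calls.
        return self.mode is Mode.PARTICIPANT or requested is Mode.OBSERVER

    def call(self, requested: Mode, operation: str) -> str:
        if not self.supports(requested):
            raise PermissionError(
                f"{self.mode.value} instance cannot serve {requested.value} calls"
            )
        return f"{operation} handled in {requested.value} mode"


full = P2PModule(Mode.PARTICIPANT)
light = P2PModule(Mode.OBSERVER)

full_result = full.call(Mode.OBSERVER, "list_seen_transactions")
light_result = light.call(Mode.OBSERVER, "list_seen_transactions")
```

Calling `light.call(Mode.PARTICIPANT, ...)` raises `PermissionError`, which mirrors the rule that observer instances respond only to observer calls.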

I'm quite certain that an architecture for a robust Bitcoin implementation is still an open problem and worth further research and discussion. All current implementations are quite fragile and are incapable of properly dealing with the known failure modes of the Bitcoin network.

ShadowOfHarbringer

legendary

Activity: 1470

Merit: 1006

Bringing Legendary Har® to you since 1952

Quote from: Hawkix on October 31, 2012, 07:08:13 AM

I feel the minimal desirable separation is to split the current client into a hardened transaction server and to move the wallet, address book, GUI etc. into a thin, light client.

For example, why would a small office of 10 people wanting to use Bitcoin have each of its computers download the blockchain and keep it up to date? The solution is to set up just a single instance of a local transaction server (always up) and connect the GUI clients (using RPC calls or Stratum) to this transaction server.

I think the developers are already aiming for something like this.
Bitcoind is already kind of separate from the GUI, and interfaces can be made around it using XMLRPC API or command line. And I support this.

But the OP meant a much deeper split of the Bitcoin client into 3-4 or even more elements. This is what I oppose, because it creates unnecessary complexity.

Hawkix

hero member

Activity: 531

Merit: 505

I feel the minimal desirable separation is to split the current client into a hardened transaction server and to move the wallet, address book, GUI etc. into a thin, light client.

For example, why would a small office of 10 people wanting to use Bitcoin have each of its computers download the blockchain and keep it up to date? The solution is to set up just a single instance of a local transaction server (always up) and connect the GUI clients (using RPC calls or Stratum) to this transaction server.

We need the equivalent of MTA and MUA for Bitcoin: BTA and BUA.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: cjp on October 27, 2012, 09:57:00 AM

I've read the 2PC Wikipedia page you mentioned, you want the "SQL server" and the "bitcoind daemon" to act as "cohorts" in the protocol. In that case, the "bitcoind daemon" should act as follows:

When a "query to commit" is received, select a sufficient amount of unlocked confirmed unspent transaction outputs for the transaction, and lock these (so they are not used by other transactions). If this fails (insufficient unlocked confirmed unspent tx outputs), reply "NOK", else reply "OK".
When a "commit" is received, spend the locked tx outputs.
When a "rollback" is received, unlock the locked tx outputs.

Now, as always, the devil is in the details. These are some I thought of:

The two-phase commit protocol has no time-outs. When one component fails to send "OK" or "NOK" in the first phase, all resources stay locked until the problem is resolved, e.g. by restarting a crashed service.
After resolving a failure, all in-progress transactions need to be finished. It may be needed to re-send information, but this resending must not result in a transaction happening twice. This can be avoided e.g. by giving each transaction a unique ID.
If it turns out in the second phase that the transaction fee (selected in the first phase) is too low, then extra bitcoins are needed to increase the fee, but these are not guaranteed to be available. This is a problem, because in the second phase, the transaction is already supposed to be committed. To avoid this, a sufficient amount must be locked in the first phase, to take into account the highest possible fee needed. If the required fee is lower than the maximum, the rest can always be sent back to self.

I think this can be built on top of any bitcoind (including the existing one, without changing anything to bitcoind itself), as long as the implementation of this protocol is the only process which uses that bitcoind. Although it sounds like useful not-yet-implemented functionality, I don't think it's necessary to involve this functionality in the module definitions at this moment.

Well, what can I say? Building a Chernobyl-style containment structure around the existing bitcoind is also a form of software architecture. For some Bitcoin users this would even be the preferred choice when compared with a clean-slate new design.

For those readers who are allergic to Microsoft Windows I've found another learning resource. On the Oracle Technology Network there is a compressed virtual machine image called "tuxweb.7z". It contains a simple Apache-based web-store with Oracle Express and Tuxedo as a backend.

I just want to make it clear that the above isn't a preferred solution for the web-store vendors on this site. It would be akin to using a combine harvester to mow the backyard. But for anyone contemplating scaling up a Bitcoin solution this is the way to go. Even if you are ultimately going to come up with something different, seeing the combine harvester at work will be a useful learning experience.

ShadowOfHarbringer

legendary

Activity: 1470

Merit: 1006

Bringing Legendary Har® to you since 1952

I don't think we need to split the Satoshi client. One should be very careful when it comes to designing advanced multi-layered, multi-component applications when there is no clear need.

Stallman tried something like that, called "microkernel architecture", with GNU HURD, and see how it ended up. Linus Torvalds went the other way and designed a monolithic kernel with attachable modules instead. And look how it worked out: Linux is the most popular, the most advanced and the most scalable operating system on the planet, and is now used almost everywhere, except on the desktop.

In THEORY, a microkernel architecture with separate independent layers/modules for every function of the system is a very neat concept, but in practice it produces an incredible amount of communication, latency and compatibility problems.

There is a history lesson to be learned from this: "Keep it simple, stupid!"

cjp

full member

Activity: 210

Merit: 124

Quote from: 2112 on October 25, 2012, 04:45:03 PM

Consider a following (incorrect) attempt to implement an http://en.wikipedia.org/wiki/Two-phase_commit_protocol
between an SQL server and a bitcoind daemon:

1) START TRANSACTION
2) SELECT amount and destination address
3) call "sendtoaddress" bitcoind via RPC
4) if OK then UPDATE account balances and COMMIT TRANSACTION
5) if not OK then ROLLBACK TRANSACTION

Now consider that the bitcoind had a really complex wallet containing multiple thousands of unspent coins and also a backup was running on the disk containing the .bitcoin directory. The RPC call timed out. What do you do now? How to fix this problem?

I've read the 2PC Wikipedia page you mentioned, you want the "SQL server" and the "bitcoind daemon" to act as "cohorts" in the protocol. In that case, the "bitcoind daemon" should act as follows:

When a "query to commit" is received, select a sufficient amount of unlocked confirmed unspent transaction outputs for the transaction, and lock these (so they are not used by other transactions). If this fails (insufficient unlocked confirmed unspent tx outputs), reply "NOK", else reply "OK".
When a "commit" is received, spend the locked tx outputs.
When a "rollback" is received, unlock the locked tx outputs.

Now, as always, the devil is in the details. These are some I thought of:

The two-phase commit protocol has no time-outs. When one component fails to send "OK" or "NOK" in the first phase, all resources stay locked until the problem is resolved, e.g. by restarting a crashed service.
After resolving a failure, all in-progress transactions need to be finished. It may be needed to re-send information, but this resending must not result in a transaction happening twice. This can be avoided e.g. by giving each transaction a unique ID.
If it turns out in the second phase that the transaction fee (selected in the first phase) is too low, then extra bitcoins are needed to increase the fee, but these are not guaranteed to be available. This is a problem, because in the second phase, the transaction is already supposed to be committed. To avoid this, a sufficient amount must be locked in the first phase, to take into account the highest possible fee needed. If the required fee is lower than the maximum, the rest can always be sent back to self.

I think this can be built on top of any bitcoind (including the existing one, without changing anything to bitcoind itself), as long as the implementation of this protocol is the only process which uses that bitcoind. Although it sounds like useful not-yet-implemented functionality, I don't think it's necessary to involve this functionality in the module definitions at this moment.
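The cohort behaviour described above could be sketched like this. It is a toy model under stated assumptions, not a real bitcoind interface: `BitcoindCohort` and its method names are hypothetical, outputs are plain dictionary entries, and "spending" only moves them to another dictionary. It locks enough outputs for the amount plus the maximum fee in the first phase and uses a unique transaction ID so that re-sent messages do not act twice.

```python
class BitcoindCohort:
    """Toy stand-in for the bitcoind side of two-phase commit (hypothetical)."""

    def __init__(self, unspent):
        self.unspent = dict(unspent)  # outpoint -> amount, confirmed and unlocked
        self.locked = {}              # transaction ID -> {outpoint: amount}
        self.spent = {}               # transaction ID -> {outpoint: amount}

    def query_to_commit(self, tx_id, amount, max_fee):
        """Phase 1: lock outputs covering amount plus the highest possible fee."""
        if tx_id in self.locked or tx_id in self.spent:
            return "OK"               # re-sent request: already prepared or committed
        needed = amount + max_fee
        chosen, total = {}, 0
        for outpoint, value in self.unspent.items():
            if total >= needed:
                break
            chosen[outpoint] = value
            total += value
        if total < needed:
            return "NOK"              # insufficient unlocked outputs, lock nothing
        for outpoint in chosen:
            del self.unspent[outpoint]
        self.locked[tx_id] = chosen
        return "OK"

    def commit(self, tx_id):
        """Phase 2: spend the locked outputs; a duplicate commit is a no-op."""
        if tx_id in self.locked:
            self.spent[tx_id] = self.locked.pop(tx_id)

    def rollback(self, tx_id):
        """Phase 2 alternative: unlock the outputs again."""
        self.unspent.update(self.locked.pop(tx_id, {}))


cohort = BitcoindCohort({"a:0": 5, "b:1": 3})
votes = [
    cohort.query_to_commit("order-1", 4, 1),  # locks a:0
    cohort.query_to_commit("order-2", 4, 1),  # only 3 left unlocked
    cohort.query_to_commit("order-1", 4, 1),  # re-sent, nothing new is locked
]
cohort.commit("order-1")
cohort.commit("order-1")                      # duplicate commit does nothing
```

The sketch leaves out the parts the post flags as hard: state is not persisted across a crash, there are no time-outs, and no change output is created for the unused part of the fee reserve.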

2112

legendary

Activity: 2128

Merit: 1074

Oh, and by the way: here's a very nice post from DeathAndTaxes in the same vein as the original post of this thread:

https://bitcointalksearch.org/topic/m.1113573

He comes from a large-enterprise background and has seen transaction processing done on Microsoft software. He's just not a software architect, but more of a data center operations manager.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: casascius on October 25, 2012, 05:03:23 PM

If I'm a wallet module that handles orders on behalf of a web storefront, the disposition of the payment isn't immediately useful until I'm ready to ship their order. If I make a query to my "knowledge center" to see if a payment made 3 hours ago is continuing to receive confirmations 5 minutes before I'm about to print the label and ship the product, is an asynchronous notification really necessary?

First, before concluding that the solution is unscalable, the problem needs to be defined. You and I and others might legitimately have a different idea of what the problem is, which is part of why a solution whose parts can be replaced would be so valuable.

I kinda understand your reservations now that I know that you come from a fiercely independent small-businessman background. To you, reinventing the wheel of online transaction processing may sound and look like a worthwhile endeavor.

I come from an academic background, and when I was a student my school tortured us on a nearly obsolete mainframe with CICS, ACP & PL/I to imbue us with the knowledge of how to avoid reinventing the wheel.

The "problem" you are writing about has been well defined since about 1970-1980 and it is called http://en.wikipedia.org/wiki/Online_transaction_processing or http://en.wikipedia.org/wiki/Transaction_processing .

I don't have an evangelical bent. Anyone is free to peddle their wares here. I'll just call it the way I see it: a gold-foil-wrapped turd. We could go back and forth like it went in the Stratum thread. But I now understand that people here need to lose their own money to learn something.

BoardGameCoin

sr. member

Activity: 283

Merit: 250

I heartily agree with the spirit of this proposal, and may start figuring out the build process to dive into the existing source code to understand things better.

-bgc

casascius

vip

Activity: 1386

Merit: 1140

The Casascius 1oz 10BTC Silver Round (w/ Gold B)

Quote from: 2112 on October 25, 2012, 04:45:03 PM

1) START TRANSACTION
2) SELECT amount and destination address
3) call "sendtoaddress" bitcoind via RPC
4) if OK then UPDATE account balances and COMMIT TRANSACTION
5) if not OK then ROLLBACK TRANSACTION

Now consider that the bitcoind had a really complex wallet containing multiple thousands of unspent coins and also a backup was running on the disk containing the .bitcoin directory. The RPC call timed out. What do you do now? How to fix this problem?

I think I agree here: my ideal on how the flow would work (in relation to the above) would be this:

1) start transaction
2) select unspent txids for some or all keys I have, reconcile with my local storage to make sure I haven't spent them
3) generate a transaction that spends txids (I'm a wallet module), and update my local storage to reflect that these txids are attempted-spend
4) save the transaction somewhere in case I later determine I have failed to get it to the network
5) attempt to send the transaction to the network (if I'm talking to a DBMS, maybe that takes the form of inserting my transaction into a work queue that will be forwarded by a bitcoind somewhere else)
6) upon failure, the saved record is available for me to try again upon my next startup
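The flow above can be sketched as a small wallet module that saves a transaction before trying to send it. Everything here is an assumption made for illustration: the `Wallet` class, the `send` callable and the in-memory `outbox` are hypothetical, and a real module would keep the outbox and the attempted-spend marks in durable storage rather than in memory.

```python
class Wallet:
    """Toy wallet module: record the spend locally first, then try the network."""

    def __init__(self, unspent, send):
        self.unspent = set(unspent)  # txids believed spendable (step 2)
        self.attempted = set()       # txids marked attempted-spend (step 3)
        self.outbox = {}             # saved transactions not yet handed over (step 4)
        self.send = send             # callable that forwards a transaction (step 5)

    def pay(self, tx_name, inputs):
        inputs = set(inputs)
        if not inputs <= self.unspent - self.attempted:
            raise ValueError("input already spent or unknown")
        self.attempted |= inputs
        self.outbox[tx_name] = sorted(inputs)  # saved before any network attempt
        self.flush()

    def flush(self):
        """Try to send everything saved; also meant to run at startup (step 6)."""
        for name in list(self.outbox):
            try:
                self.send(name, self.outbox[name])
            except ConnectionError:
                continue                       # keep the record for the next attempt
            del self.outbox[name]


delivered = []
failures = [ConnectionError("network down")]   # the first send attempt fails


def flaky_send(name, inputs):
    if failures:
        raise failures.pop()
    delivered.append(name)


wallet = Wallet({"txid1", "txid2"}, flaky_send)
wallet.pay("pay-alice", ["txid1"])             # send fails, record stays saved
pending_after_failure = list(wallet.outbox)
wallet.flush()                                 # "next startup": the retry succeeds
```

Because the inputs are marked attempted-spend before the send, a second payment cannot reuse `txid1` while the first one is still waiting in the outbox.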

casascius

vip

Activity: 1386

Merit: 1140

The Casascius 1oz 10BTC Silver Round (w/ Gold B)

Quote from: 2112 on October 25, 2012, 04:13:58 PM

You can attempt to fake asynchronicity by frequent polling (you called it "query for an update") or hacks like long-poll. But this isn't a scalable solution and ultimately it will also fail, just in a different way than the typical damn-the-ACID hacks.

In what use case do you imagine this being a problem? I actually suggested two ways to consume the data (let's call them A and B), and you have basically said (or I have understood), "No, B won't work, you actually have to use A". That may be true if the application is a GUI that a user might be staring at for an incoming payment. But B is a different tool for a different problem, not a lazy way to "pretend" to do A.

If I'm a wallet module that handles orders on behalf of a web storefront, the disposition of the payment isn't immediately useful until I'm ready to ship their order. If I make a query to my "knowledge center" to see if a payment made 3 hours ago is continuing to receive confirmations 5 minutes before I'm about to print the label and ship the product, is an asynchronous notification really necessary?

First, before concluding that the solution is unscalable, the problem needs to be defined. You and I and others might legitimately have a different idea of what the problem is, which is part of why a solution whose parts can be replaced would be so valuable.

Topic: We need to split up the Satoshi client (Read 3175 times)