Elegance and comprehensibility via holistic unification of design concepts...
Add more elegant syntax and a less noisy implementation of polymorphism, first-class functions, etc...
One of the big issues these days is asynchronous programming. The following excerpts delve into this issue:
[...]
I am absolutely astonished they undid making Rust a tasklet-based runtime, i.e. like Stackless Python, with a language-native coroutine M:N threading model, especially with the excellent actor-based concurrency model already built in there, which should have been perfect. Originally their i/o was based on libuv, probably the leading portable asynchronous i/o library written in C, so all their i/o was also async. A full-fat-coroutine, all-async-i/o new systems programming language ticking every box for a C++ replacement - Rust v0.12 sounds great, doesn’t it?
Unfortunately they ended up canning the green-thread support because theirs was slower than kernel threads, which in turn demonstrates someone didn’t understand how to get a language compiler to generate stackless coroutines effectively (not surprising, the number of engineers wired the right way is not many in this world, but see
http://www.reddit.com/r/rust/comments/2l0a4b/do_rust_web_servers_use_libuv_through_libgreen_or/ for more detail). And they canned the async i/o because libuv is “slow” (which it is, but only because it is single-threaded, forces a malloc + free per async operation as the buffers must last until completion occurs, and imposes a penalty over synchronous i/o; see
http://blog.kazuhooku.com/2014/09/the-reasons-why-i-stopped-using-libuv.html). That was a real shame - they should have taken the opportunity to replace libuv with something better (hint: ASIO + AFIO, and yes I know they are both C++, but Rust could do with much better C++ interop than the essentially nonexistent interop it currently has) instead of canning always-async-everything in what could have been an amazing step up from C++ with most of the benefits of Erlang without the disadvantages of Erlang.
A huge missed opportunity I think, and sadly it looks like the ship has sailed on those decisions, both of which untick pretty major boxes for some people, including me. As it is a personal goal of mine to see AFIO become the asynchronous filesystem and file i/o implementation entering the C++ standard library, complementing ASIO as the C++ networking library entering in C++17, I can’t complain too much I suppose; it’s just that it renders Rust less likely to make C++ obsolete or to encourage C++ to up its game.
[...]
+Steve Klabnik Oh for sure. As it happens, I'm currently in a week-long private discussion about how best to tweak C++/LLVM to ideally support stackless coroutines via a new "safe C++" modifier which has the compiler enforce safer and much more coroutine-efficient programming idioms on functions and namespaces so marked up (indeed such safe C++ would substantially close the gap with Rust and Go, I think). So I absolutely understand the difficulties in not just deciding on a design, but persuading people - most of whom are not wired for async - that implementing the design is a good idea. There is also a big problem that we need reference compiler implementations to demonstrate the design works, and that means raising at least $40k to fund the requisite developers.
No, my objection to dropping async i/o in Rust was more that there is nothing wrong with libuv; it's just that it's slow. Slow is 100% fine for a v1.0 language release; so long as you're faster than Python, it isn't important. I'd prefer that all the people who jump on board with 1.0 code against the right kind of i/o library API with the right kind of semantics, and we'll fix up performance over the coming years. Moreover, my day job's main code base is currently wasting a chunk of time dealing with Rust's incomplete i/o facilities, and for us at least a slow but fully async i/o library baked into the language would be far better than currently wrestling with pushing networking APIs into threads just to work around blocking issues and bugs. mio is a non-starter for us, as are most of the other async i/o frameworks for Rust, because we need Windows, and besides we don't want to get locked into an expensive-to-replace library which may get orphaned.
Anyway, I'm sure you all had the same discussions when you decided to drop the built-in libuv; I guess coming from an async background I like async. For many if not most programmers async just isn't important, and it's an easy drop given the average cost-benefit.
[...]
+Niall Douglas I think that 'slow' wasn't the full objection here, exactly. Let me lay out a slightly fuller history, slightly modified from an HN comment:
------------
In the beginning, Rust had only green threads. Eventually, it was decided that a systems language without systems threads is... strange. So we needed to add them. Why not add choice? Since the interfaces could be the same, why not abstract over them, and you could just choose which one you wanted?
At the same time, the problems with green threads by default were becoming issues. Segmented stacks cause slow C interop. You need a runtime to manage them, etc. Furthermore, the overall abstraction was imposing an unacceptable cost. The green threads weren't very green. Plus, with the need to actually release someday looming, decisions needed to be made regarding tradeoffs. And since Rust is supposed to be a systems language, having 1:1 threads and basically no runtime makes more sense than M:N threads and a runtime. So libgreen was removed, and the interface was redone to be 1:1-thread-centric.
[...]
+Steve Klabnik Regarding green threads - which aren't necessarily stackless coroutines - yes, a new userspace implementation is highly unlikely to beat the kernel, which has had years of tuning. This is one of the big objections to Microsoft's resumable functions proposal before WG21; some of us think it too heavy and insufficiently general outside its narrowly defined use case.
Stackless coroutines are a bit different though, because they require no C stack at all. The way we're thinking about them for C++ is that the compiler will emit, for each compiled function, its maximum possible stack frame size along with a stack frame constructor and destructor. It also prevents you from writing code which causes incalculable stack consumption, so that's the "safe C++" part (interestingly, via those same safeguards we can also guarantee that a function can never fail unexpectedly, which is great for high-reliability, ultra-low-latency scenarios, but that's an aside). To call such a function, one simply constructs its stack frame at whatever arbitrary location the callee asks for (could be static memory, could be malloc, could be the C stack), and calls the function with the stack frame register set to the new stack frame.
One can now pause and resume the execution of that function with complete knowledge of what context needs to be saved and restored. What you get looks like stackful coroutines, but context switching is optimal and no actual stack is needed except when you call into the C library, which cannot be a resumption point. The price is that the programmer can't do some things, and can only call inline-defined C++ functions, other safe C++ functions, or the C library; alloca with a dynamically calculated value is obviously verboten.
Anyway, my point is that you can modify the language to make stackless coroutines efficient, and I do wish Rust had done that for 1.0. But I entirely accept that stuff must be cut to reach a 1.0 release, else you'd be at it forever. Thanks for the extra info though.
[...]
+Niall Douglas yeah, absolutely: there are different ways to build them. That's just about the particular way they had been built at the time, which is why they had to go.
https://github.com/rustcc/coroutine-rs also popped up recently, though I haven't checked the implementation.
Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.
Comparison with subroutines
When subroutines are invoked, execution begins at the start, and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations. By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine; from the coroutine's point of view, it is not exiting but calling another coroutine. Thus, a coroutine instance holds state, and varies between invocations; there can be multiple instances of a given coroutine at once. The difference between calling another coroutine by means of "yielding" to it and simply calling another routine (which then, also, would return to the original point) is that the relation between two coroutines which yield to each other is not that of caller and callee, but is instead symmetric.
Any subroutine can be translated to a coroutine which does not call yield.
To implement a programming language with subroutines requires only a single stack that can be preallocated at the start of program execution. By contrast, coroutines, able to call on other coroutines as peers, are best implemented using continuations. Continuations may require allocation of additional stacks, and therefore are more commonly implemented in garbage-collected high-level languages. Coroutine creation can be done cheaply by preallocating stacks or caching previously allocated stacks.
Comparison with generators
Generators, also known as semicoroutines, are also a generalisation of subroutines, but are more limited than coroutines. Specifically, while both of these can yield multiple times, suspending their execution and allowing re-entry at multiple entry points, they differ in that coroutines can control where execution continues after they yield, while generators cannot, instead transferring control back to the generator's caller. That is, since generators are primarily used to simplify the writing of iterators, the yield statement in a generator does not specify a coroutine to jump to, but rather passes a value back to a parent routine.
However, it is still possible to implement coroutines on top of a generator facility, with the aid of a top-level dispatcher routine (a trampoline, essentially) that passes control explicitly to child generators identified by tokens passed back from the generators, as in the sketch below:
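A minimal sketch of such a trampoline in Python (hypothetical names; here the token is simply the registry key of the generator to resume next):

# Hypothetical sketch: a top-level dispatcher (trampoline) that resumes
# whichever generator is named by the token the previous generator yielded.
def trampoline(coros, start):
    token = start
    while token is not None:
        try:
            token = next(coros[token])  # resume; the yielded token names the next coroutine
        except StopIteration:
            token = None

def ping():
    for i in range(3):
        print('ping', i)
        yield 'pong'   # pass control to the coroutine registered as 'pong'
    yield None         # tell the dispatcher to stop

def pong():
    while True:
        print('pong')
        yield 'ping'   # pass control back to 'ping'

trampoline({'ping': ping(), 'pong': pong()}, 'ping')

Note that ping and pong are not in a caller-callee relationship: each simply names the other, and the dispatcher performs the actual transfer.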
Comparison with mutual recursion
Using coroutines for state machines or concurrency is similar to using mutual recursion with tail calls, as in both cases the control changes to a different one of a set of routines. However, coroutines are more flexible and generally more efficient. Since coroutines yield rather than return, and then resume execution rather than restarting from the beginning, they are able to hold state, both variables (as in a closure) and execution point, and yields are not limited to being in tail position; mutually recursive subroutines must either use shared variables or pass state as parameters. Further, each mutually recursive call of a subroutine requires a new stack frame (unless tail call elimination is implemented), while passing control between coroutines uses the existing contexts and can be implemented simply by a jump.
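A small Python sketch contrasting the two approaches (hypothetical): a two-state parity machine, first via mutual recursion, where each transition is a fresh call (and hence a new stack frame without tail-call elimination), then as a coroutine which holds both its variables and its execution point and simply resumes in place:

# State machine via mutual recursion: every transition is a new call.
def even(bits):
    return 'even' if not bits else (odd if bits[0] else even)(bits[1:])

def odd(bits):
    return 'odd' if not bits else (even if bits[0] else odd)(bits[1:])

# The same machine as a coroutine: state lives in the suspended frame.
def parity():
    state = 'even'
    while True:
        bit = yield state              # suspend; resume here with the next bit
        if bit:
            state = 'odd' if state == 'even' else 'even'

print(even([1, 0, 1]))                 # 'even'
p = parity(); next(p)                  # prime the coroutine
print([p.send(b) for b in [1, 0, 1]])  # ['odd', 'odd', 'even']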
Common uses
* State machines within a single subroutine, where the state is determined by the current entry/exit point of the procedure; this can result in more readable code compared to use of goto, and may also be implemented via mutual recursion with tail calls.
* Actor model of concurrency, for instance in video games. Each actor has its own procedures (this again logically separates the code), but they voluntarily give up control to a central scheduler, which executes them sequentially (this is a form of cooperative multitasking; see the sketch after this list).
* Generators, which are useful for streams - particularly input/output - and for generic traversal of data structures.
* Communicating sequential processes where each sub-process is a coroutine. Channel inputs/outputs and blocking operations yield coroutines and a scheduler unblocks them on completion events.
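As referenced in the actor bullet above, a minimal Python sketch of that pattern (all names hypothetical): each actor is a generator that voluntarily yields to give up control, and a central round-robin scheduler resumes them in turn.

from collections import deque

def scheduler(actors):
    ready = deque(actors)
    while ready:
        actor = ready.popleft()
        try:
            next(actor)          # run the actor until its next voluntary yield
            ready.append(actor)  # still alive: reschedule at the back
        except StopIteration:
            pass                 # the actor finished: drop it

def greeter(name, times):
    for _ in range(times):
        print('hello from', name)
        yield                    # voluntarily give up control to the scheduler

scheduler([greeter('A', 2), greeter('B', 3)])  # A and B alternate cooperatively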
stackfulness
In contrast to a stackless coroutine a stackful coroutine can be suspended from within a nested stackframe. Execution resumes at exactly the same point in the code where it was suspended before. With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.
first-class continuation
A first-class continuation can be passed as an argument, returned by a function, and stored in a data structure to be used later. In some implementations (for instance C#'s yield) the continuation cannot be directly accessed or directly manipulated.
Without stackfulness and first-class semantics, some useful execution control flows cannot be supported (for instance cooperative multitasking or checkpointing).
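A short Python illustration of the stackfulness restriction (hypothetical names): a plain generator is a stackless coroutine, so only its top-level body may suspend; a nested routine can only suspend if every intermediate level is itself a generator and delegates explicitly, e.g. via yield from.

def helper():
    return 'ready'            # an ordinary call: it cannot suspend on our behalf

def stackless():
    data = helper()           # fine, but helper itself can never yield for us
    yield data                # suspension is only possible at the top level

def nested():
    yield 'from deep inside'  # a nested suspension point...

def top():
    yield from nested()       # ...works only because this level cooperates too

print(list(stackless()))      # ['ready']
print(list(top()))            # ['from deep inside']

A stackful coroutine has no such restriction: any routine on its private stack may suspend the whole nest of calls.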
[...]
In general, a stackful coroutine is more powerful than a stackless coroutine. So why would we want stackless coroutines? Short answer: efficiency.
A stackful coroutine typically needs to allocate a certain amount of memory to accommodate its runtime stack (which must be large enough), and its context switch is more expensive than the stackless one's; e.g. Boost.Coroutine takes 40 cycles while CO2 takes just 7 cycles on average on my machine, because the only thing a stackless coroutine needs to restore is the program counter.
That said, with language support, a stackful coroutine could probably also take advantage of a compiler-computed maximum stack size, as long as there's no recursion in the coroutine, so memory usage could be improved as well.
Speaking of stackless coroutines, bear in mind that this doesn't mean there's no runtime stack at all; it only means that the coroutine uses the same runtime stack as the host side, so you can call recursive functions as well - all the recursion just happens on the host's runtime stack. In contrast, with a stackful coroutine, when you call recursive functions the recursion happens on the coroutine's own stack.
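A quick Python illustration of that last point (hypothetical):

# The generator below is stackless, yet it can call a recursive function:
# between suspension points the recursion simply runs on the host's stack.
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

def stackless_gen():
    yield factorial(10)   # the recursion happens on the caller's (host's) stack
    yield factorial(5)

g = stackless_gen()
print(next(g), next(g))   # 3628800 120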
C#'s async/await is a compiler feature which rewrites a method into a special finite-state-machine class. All of the method's local variables are automatically moved into fields of that class, and the body of the method is moved into a special method which, depending on the current state, simply jumps by switch to the right await position (
http://www.codeproject.com/Articles/535635/Async-Await-and-the-Generated-StateMachine ). That is similar in spirit to stackless coroutines.
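The shape of that rewrite can be sketched by hand. Here is a hypothetical Python rendition of what such a transformation produces: the method's locals become fields, and resume() dispatches on the saved state to jump back to the right position.

# Hand-written equivalent of the compiler rewrite for something like:
#   def countdown(n): while n > 0: n -= 1; yield n
class CountdownStateMachine:
    def __init__(self, n):
        self.n = n      # a 'local variable' hoisted into a field
        self.state = 0  # which resume position we are at

    def resume(self):
        if self.state == 0:       # entry point
            self.state = 1
        if self.state == 1:       # the loop's resume position
            if self.n > 0:
                self.n -= 1
                return self.n     # 'yield' a value and suspend
            self.state = 2        # finished
        raise StopIteration

sm = CountdownStateMachine(3)
print(sm.resume(), sm.resume(), sm.resume())  # 2 1 0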
In C++, stackless coroutines can be implemented in fewer than 100 lines of code; for example, look at Boost.Asio. Of course the syntactic sugar is less sweet - local variables must be moved into class fields by hand - but everything else is similar in shape and content. For instance, the state machine (with a similar switch inside) is generated automatically by macros. (See the talk by Christopher M. Kohlhoff
http://blip.tv/boostcon/why-c-0x-is-the-awesomest-language-for-network-programming-5368225 , code example -
https://github.com/chriskohlhoff/awesome/blob/master/server.cpp ).
The Boost.Coroutine library provides stackful coroutines for C++ on a wide range of platforms.
Stackful coroutines are much more powerful than stackless ones:
___________________
Major advantage #1:
Stackful coroutines allow you to encapsulate all asynchronous logic inside components, in such a way that client code looks EXACTLY THE SAME as SYNCHRONOUS code.
For example, Python's gevent library monkey-patches standard sockets, which automatically turns all code that uses them asynchronous, without any changes (
http://www.gevent.org/intro.html#monkey-patching ).
Another demo can be found in Boost.Coroutine: it has an example of a simple server which asynchronously reads data from a TCP port and prints it on screen, and all of this happens in one thread. The reading loop looks exactly the same as "normal" blocking code:
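The Boost example itself is C++ and not reproduced here, but the same shape can be sketched in Python with the third-party greenlet package (all names below are hypothetical): the client's reading loop looks like ordinary blocking code, yet read() suspends the whole call stack, and a driver resumes it when data "arrives".

from greenlet import greenlet, getcurrent

def read():
    # Suspend this stackful coroutine; the driver resumes us with data.
    return getcurrent().parent.switch('want-data')

def client():
    while True:
        data = read()        # looks synchronous; actually suspends
        if data is None:
            break
        print('got', data)

task = greenlet(client)
request = task.switch()           # run client until its first read()
for chunk in ('a', 'b', None):    # the 'event loop': feed data as it arrives
    request = task.switch(chunk)  # resume the suspended read() with a chunk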
___________________
Advantage #2, Performance:
When you have chained levels of awaits, special code is executed at each of them: one level awaits another, and so on. That is O(N).
But in the case of stackful coroutines, no matter how long the chain is, each level "implicitly" awaits the bottom one; that is O(1). Among other things this is extremely fast:
http://www.boost.org/doc/libs/1_54_0/libs/coroutine/doc/html/coroutine/performance.html
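To see where the O(N) comes from, consider this hypothetical Python sketch: with stackless generators, every level of the chain must forward the suspension itself, so each resume walks through all the intermediate frames.

def leaf():
    yield 1
    yield 2

def chain(depth):
    if depth == 0:
        yield from leaf()
    else:
        yield from chain(depth - 1)  # every level re-forwards each value: O(N) per hop

print(list(chain(100)))  # [1, 2], but each item passed through 100 levels

___________________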
Advantage #3, the Killer:
Exactly the same await syntax can be emulated with the help of stackful coroutines.
I have made a small proof-of-concept:
https://github.com/panaseleus/await_emu
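In the same spirit, a hypothetical Python sketch of await emulated as an ordinary function on top of stackful coroutines (again using the third-party greenlet package; no compiler rewrite is involved):

from greenlet import greenlet, getcurrent

def await_(operation):
    # Suspend the whole call stack until the driver resumes us with a result.
    return getcurrent().parent.switch(operation)

def compute():
    a = await_(lambda: 2 + 3)   # reads like 'a = await op' in C#
    b = await_(lambda: a * 10)
    print('result:', b)         # result: 50

task = greenlet(compute)
op = task.switch()              # run until the first await_
while op is not None:
    op = task.switch(op())      # the 'scheduler': run the operation, resume with its result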
I am thinking about all these issues w.r.t. Rust - for example, how to implement the await/async emulation I coded for ECMAScript 6:
'use strict';
/*
Inputs a generator function which must 'yield' only Promises, and may
optionally return a value. The return value may optionally be a Promise.
Returns a function which inputs the arguments of the generator function, and
returns a Promise which resolves when the generator function is done.
The return value of the generator function—or undefined—is the resolved value.
The optional '_this' input sets the value of 'this' for the generator function.
Inspiration from the following, and wanting it to work in ES6 without transpiling:
https://thomashunter.name/blog/the-long-road-to-asyncawait-in-javascript/
http://jlongster.com/A-Study-on-Solving-Callbacks-with-JavaScript-Generators#Async-Solution--2--P
Wanted a solution built on correct primitives of Promises & generators (e.g. not node-fibers):
https://github.com/yortus/asyncawait#2-featuregotcha-summary
http://howtonode.org/generators-vs-fibers
https://blog.domenic.me/youre-missing-the-point-of-promises/
*/
function asyncify(func, _this) {
  return function() { // inputs 'arguments'
    return new Promise((resolve, reject) => { // inputs functions to invoke to resolve and reject the returned Promise
      // Function that iterates (via recursion) each Promise 'yield'ed, then the return value, of a generator function 'gen'
      function f(gen, previous/*= undefined*/) {
        const {value, done} = gen.next(previous) // employ ES6 to destructure the returned {value, done} result object into new const vars, https://davidwalsh.name/es6-generators
        if (done)
          // 'value' is the return value or undefined if none,
          // https://davidwalsh.name/es6-generators
          resolve(value)
        else
          // Assume the 'yield'ed 'value' is a Promise; so
          // recurse our iteration function when the Promise resolves
          value.then(_ => f(gen, _)).catch(_ => reject(_))
      }
      f(func.apply(_this, arguments)) // iterate the input 'func' generator function
    })
  }
}
'use strict';
/*
Queue of Promises resolved in FIFO order, that enables chaining asynchronous
events which operate on a value.
Pushing to the end of queue returns a Promise which resolves to the value set
by the last shift off the front of the queue, or to the initialized value in
case no prior shifts occurred.
Shifting off the start of the queue inputs the value to resolve the next queued
Promise (which is saved to the initialized value if no next Promise is queued).
*/
object('currentScript').then(_ => fetch[(_ ? _() : document.getElementsByTagName('script').last())['id']]( // register module output with fetch.js
  function(value) {
    var fifo = []
    this.push = () => new Promise(_ => {fifo.push(_); if (fifo.length == 1) _(value)})
    this.shift = _ => {fifo.shift(); if (fifo.length) {fifo[0](_)} else {value = _}}
  }
))