bugaevc,
@bugaevc@floss.social avatar

Let's look into C++ 20 coroutines 🧵

I assume you're already familiar with general ideas of coroutines, async/await, and CPS / state machine transformation from other languages.

Let me start by criticizing the C++ 20 design, compared to, say, async/await in Rust:

  1. Contrary to the famous C++ "zero overhead" principle, using coroutines pretty much requires pervasive heap allocations and indirect function calls.

Even something as simple as breaking out a part of an async function into a sub-function and immediately awaiting its result will result in more heap allocations and more indirect calls; you won't get any inlining.

They say that a sufficiently smart compiler could optimize these things out, but in practice this never happens, not even for trivial functions, not even with -O3.

This is even more striking considering that Rust async/await has been worked on, in public, since much earlier than 2020. In particular this foundational post from 2016 describes the shortcomings of the approach that's similar to (but still actually better than) the one that C++ 20 ended up taking, and what a much better design looks like: https://aturon.github.io/blog/2016/09/07/futures-design/

  2. C++ 20 coroutines are just way overdesigned.

People say that the Rust async/await design is complicated (futures, tasks, polling, wakers, pinning, oh my!), but compared to C++ 20 coroutines, the Rust design is quite straightforward.

In theory, all these complications & mental gymnastics are intended to make C++ 20 coroutines very flexible, allowing libraries to provide different styles of coroutine-like abstractions instead of hard-wiring a single design into the core language. In practice, this doesn't seem to buy them much flexibility, compared to what would be possible with a more Rust-like design.

  3. You cannot just write code with it! For instance, you cannot write something like this:

    int async_callee() {
        co_return 42;
    }
    int async_caller() {
        int a = co_await async_callee();
        co_return a + 35;
    }

No, you need a bunch of library code/abstractions (that the standard library does not even provide!) to make something like that expressible.

With that being said, let's look at the actual design.

There are a bunch of explanations of this online; each one, naturally, tries to highlight the aspects that its author deemed most important. I read them so you don't have to, and here's my attempt at explaining it.

First thing: any function using any of co_return/co_await/co_yield in its body is a coroutine.

A coroutine can suspend its execution, saving its state into a "coroutine frame" (that's almost definitely heap-allocated), to be resumed later. std::coroutine_handle<> is a small wrapper around a pointer to a coroutine frame. It doesn't imply any ownership, can be cheaply copied around, and can be converted to a raw pointer and back using .address()/::from_address().

Through this handle, you can .resume() a suspended coroutine; to make this work, the coroutine frame (in current implementations) carries pointers to its resume and destroy functions, in effect a tiny vtable, and resuming a coroutine does an indirect call through it. It's important to understand that .resume() is just a regular function call: it's not noreturn or anything like that, and it will, in fact, return, once the coroutine suspends the next time.

A coroutine suspends by co_await'ing an awaitable value.

Specifically, it calls the .await_suspend(std::coroutine_handle<>) method on it, passing a handle to itself. The method should store the handle somewhere and arrange for something to call .resume() on it later, when the awaited value "is ready". For example, it could arrange for handle.resume() to be called by another thread, or in a later iteration of an event loop.

Once .await_suspend() returns, the coroutine is suspended, and control returns from the .resume() call that has resumed the coroutine this time (or to the original caller if this is the first time the coroutine suspends; more on this below).

There are a few knobs to this mechanism: you must also define bool .await_ready(), which says whether the coroutine should suspend at all (returning true skips the suspension), and .await_suspend() can return a boolean (false means "don't suspend after all") or another coroutine handle to tail-resume instead of void. Also, you can override things to transform the operand of co_await, first into an "awaitable" (through the promise's .await_transform()), and then into an "awaiter" (through operator co_await), and that's what the .await_*() methods would actually get called on (this is somewhat like IntoFuture, if you're keeping track).

When you eventually .resume() the coroutine, the first thing it does is call .await_resume() on the awaited value; its return value is what the original co_await expression evaluates to (it can also return void or throw an exception).

Now that we looked at suspending and resuming, let's see how a coroutine gets created in the first place.

You cannot tell from a signature alone whether a function is implemented as a coroutine or not (i.e. whether it has any co_ statements in its body). An external library could flip a function implementation between coroutine / not coroutine without breaking API or ABI. The code to invoke it on the caller side is the same, and you get back an instance of the return type in both cases.

When a function is implemented as a coroutine (because it contains co_ statements in its body), there is an important type that determines things about how it behaves, called the "promise type". Now, this is an example of very unfortunate naming, since this type has nothing to do with the notion of promise objects (aka futures or deferreds), nor with the existing std::promise<> type. A "coroutine policy type" or just "coroutine type" would have been a much better name, but it is what it is.

The "promise type" is not explicit in the signature or implementation of a coroutine. It is determined from the signature: for a coroutine with signature Ret(Arg1, Arg2, Arg3), the "promise type" used is whatever typename std::coroutine_traits<Ret, Arg1, Arg2, Arg3>::promise_type is; you can define your own specializations to make it be your intended promise type.

The primary template covers the common case: it has promise_type = Ret::promise_type, meaning that if your coroutine's return type has a nested typename promise_type, that will be the promise type of the coroutine, unless you define your own specialization. This is meant for types like MyTask<T> or MyFuture<T> that are intended to be used as coroutines' return types.

So, we have figured out what the promise type for a coroutine is, based on its signature. The first thing a coroutine does when invoked is it allocates (almost certainly, on the heap) its frame, which includes among other things an instance of the promise type. It then initializes the instance of the promise type, either with its default constructor, or with a constructor taking (a copy of) all of the coroutine's arguments.

The coroutine then calls .get_return_object() on the created promise, and this is what creates the return value that the coroutine call will return to its caller.

Since the promise type is located within the coroutine frame, it's trivial to convert between a pointer to the frame (which, remember, is what underlies std::coroutine_handle<>) and a reference to the promise; so you can create a handle using std::coroutine_handle<promise_type>::from_promise(*this).

So you could pass your coroutine handle to the return object you're constructing in .get_return_object(). This may or may not be what you want, depending on what sort of model you're implementing.

Once the return object is constructed, the coroutine calls .initial_suspend() on the promise, and co_await's the result. People typically make .initial_suspend() return an instance of either std::suspend_always or std::suspend_never,

which are the two simple, ready-made awaitable types, one of which doesn't suspend the coroutine at all (by returning true from .await_ready()), and the other one suspends the coroutine forever and never resumes it. Someone else can still resume a coroutine suspended on std::suspend_always, if they got a handle to it through other means (such as the way described above).

In practice, .initial_suspend() lets you pick between making a coroutine that doesn't start running until explicitly kicked off (perhaps its result co_await'ed), or a hot-start coroutine that immediately runs until the first "real" suspension point, and only then returns to the caller.

Whether the coroutine suspends immediately on .initial_suspend(), or runs to its first real suspension point and suspends there, this is where it returns to the caller, and the value that it returns is the object created earlier with .get_return_object(). Once again, the caller may not even know that it's a coroutine that it has invoked; from its perspective it's just another function with a return value.

From that first suspension on, the coroutine can be resumed via its handle. Whether you're expecting that to be done by the caller (and the return object holds a copy of the coroutine handle) or by whatever the coroutine is suspended on co_await'ing, is up to your design; both options are possible.

As said, std::coroutine_handle<> is a non-owning pointer; also resuming a completed or already running coroutine is UB. This is naturally a ripe area for ownership / UaF / memory unsafety issues.

When the coroutine completes, it either:

  • calls promise.return_value(value), for 'co_return value;',
  • calls promise.return_void(), for 'co_return;' (or simply reaching the end of body),
  • calls promise.unhandled_exception() if it terminates by throwing an exception that it doesn't itself catch (notably, the exception doesn't fall out of the coroutine to the resumer).

In the implementation of these, you'd want to save the return value (or the exception info) somewhere, to be retrieved later.

Finally, it calls .final_suspend() on the promise (which must be noexcept, since it's too late to throw exceptions at this point) and co_await's the returned value. This mirrors the .initial_suspend() at the start.

The final suspend should really suspend the coroutine (std::suspend_never won't do if anything still has to be read from the promise, since the frame then gets destroyed as soon as the coroutine finishes), perhaps notify someone that the coroutine is done, maybe post its return value somewhere (perhaps resume another coroutine that's co_await'ing...), and finally destroy the coroutine frame by calling handle.destroy(). It then returns, just like any other suspension point, to the last resumer (or the original caller, if the coroutine never suspended on any other suspension points).

Note that the coroutine's local variables are destructed after .return_value(), but before .final_suspend(), which is why you should only save the return value in .return_value(), and notify others about the coroutine returning at .final_suspend() time. This is important if the local state of the coroutine contains resources such as mutex guards that need to be destructed for the coroutine to be considered to have completed, even though the return value itself is available earlier.

There's also co_yield, which calls .yield_value(value) on the promise and co_await's the result. You can build generators with this, I guess,

and in fact C++ 23 comes with an std::generator<> type you can just use (or could, if your codebase was C++ 23+ only). On the inside, it's a type that you can return from a coroutine (and it has all of the promise_type things wired up); on the outside, it implements .begin(), .end(), and the C++ 20 range view stuff, so you can just iterate over it.

That's it! You can find more details in your nearest copy of the C++ standard, or in one of the many explanations on the web.

One final note: it's important to understand that none of this is magic, it's just a state machine transformation and a bunch of compiler-inserted calls (e.g. await_suspend or get_return_object). The calls are real function calls following the ABI; they could be calling into a different translation unit, a different shared library, etc.

Fin.
