Trying Out C++26 Executors (mropert.github.io)
25 points by ingve 6 days ago
The essence of the sender/receiver proposal is this:

- first, start with a sync function:

    result_t foo(params_t);
    ...
    auto result = foo(params);

- make it async by adding a continuation:

    void async_foo(params_t params, invokable<result_t> auto cont) {
        cont(foo(params));
    }
    ...
    invokable<result_t> auto receiver = [](result_t result) { ... };
    async_foo(params, receiver);

- then curry it:

    auto curried_async_foo(params_t params) {
        return [params](invokable<result_t> auto cont) {
            async_foo(params, cont);
        };
    }
    ...
    auto op = curried_async_foo(params);
    op(receiver);

- finally, modify the curried variant to add another required evaluation round:

    auto foo_sender(params_t params) {
        return [params](invokable<result_t> auto cont) {
            return [params, cont] {
                async_foo(params, cont);
            };
        };
    }
    ...
    auto sender = foo_sender(params);
    auto operation = sender(receiver);
    operation();
The actual library uses structures with named methods instead of callables (so you would do operation.start(), for example), plus a separate continuation for the error path (by having the receiver concept implement two named functions), and cancellation support. Also, the final operation is required to be address-stable until it has completed.

The complexity is of course in the details, and I didn't fully appreciate it until I tried to implement a stripped-down version of the model (and I'm sure I'm still missing a lot).
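To make that concrete, here is a rough hand-rolled sketch of what the final step looks like once the callables become structures with named methods. This is my own illustration, not the actual std::execution types: foo_sender, foo_operation and my_receiver are made-up names, and the real concepts carry more machinery (queries, set_stopped for cancellation, etc.):

    #include <exception>
    #include <utility>

    struct params_t { int input; };
    struct result_t { int value; };

    result_t foo(params_t p) { return {p.input * 2}; } // the original sync function

    struct my_receiver {
        void set_value(result_t) { /* success continuation: consume the result */ }
        void set_error(std::exception_ptr) { /* error continuation */ }
    };

    // The "operation": connect() builds it, start() runs it, and it must stay
    // at a fixed address between start() and completion.
    template <class Receiver>
    struct foo_operation {
        params_t params;
        Receiver receiver;

        void start() noexcept {
            try {
                receiver.set_value(foo(params));
            } catch (...) {
                receiver.set_error(std::current_exception());
            }
        }
    };

    // The "sender": just the parameters, plus a way to pair them with a receiver.
    struct foo_sender {
        params_t params;

        template <class Receiver>
        foo_operation<Receiver> connect(Receiver receiver) {
            return {params, std::move(receiver)};
        }
    };

    int main() {
        auto sender    = foo_sender{params_t{21}};
        auto operation = sender.connect(my_receiver{}); // the extra evaluation round
        operation.start();
    }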
The model does work very well with coroutines and can avoid or defer a lot of the expected memory allocations of async operations.
> can avoid or defer a lot of the expected memory allocations of async operations
Is this true in realistic use cases or only in minimal demos? From what I've seen, as soon as your code is complex enough that you need two compilation units, you need some higher level async abstraction, like coroutines.
And as soon as you have coroutines, you need to type-erase both the senders and the scheduler, so you have at least a couple of allocations per continuation.
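For what it's worth, a tiny sketch of where those allocations tend to come from (my own illustration, not the article's code; any_receiver and any_operation are made-up names): once the concrete lambda types are erased behind a uniform interface, the wrapper generally has to put the callable on the heap unless it fits a small-buffer optimization.

    #include <functional>
    #include <utility>

    struct result_t { int value; };

    // Type-erased continuation and type-erased "run this later" handle.
    // std::function is used here as the simplest stand-in for the erasure.
    using any_receiver  = std::function<void(result_t)>;
    using any_operation = std::function<void()>;

    any_operation make_erased_op(any_receiver receiver) {
        // Erasing the receiver, then erasing this closure again, typically
        // costs one heap allocation each once the captures get big enough.
        return [receiver = std::move(receiver)] { receiver(result_t{42}); };
    }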
I can't comment on this particular implementation, but a few years back I played around with a similar idea, so it's not quite a 1-to-1 mapping, but the conclusion I derived from the experiments was the same: it is allocation-heavy. The code was built around similar principles, with type-erasure on top of futures/promises (no coroutines in C++ back then) and work-stealing queues. The code was quite lean, although not as feature-packed as folly, and there was nothing obvious to optimize apart from lock contention and dynamic allocations. I did a couple of optimizations on both of those fronts back then, and it seemed to confirm the hypothesis of heavy allocations.
I have only played very little with it and I don't have anything in production yet, so I can't say.
I did manage to co_await senders without additional allocations though (but the code I wrote is write-only, so I need to re-understand what I did 6 months ago).
I recall that I was able to allocate the operation_state as a coroutine local, and the scheduler is node-based, so the operation_state can just be appended to the scheduler queue (see the sketch below).
You still do need to allocate the coroutine itself, and I haven't played with coroutine allocators yet.
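A rough sketch of that idea, with made-up names (op_node, intrusive_scheduler) and no claim that it matches the real std::execution machinery: the operation state embeds its own queue link, so if it already lives in stable storage (e.g. a coroutine frame), scheduling it is just pointer manipulation with no extra allocation.

    // Hypothetical intrusive node an operation state derives from, so the
    // scheduler never has to allocate queue nodes of its own.
    struct op_node {
        op_node* next = nullptr;
        void (*resume)(op_node*) = nullptr; // type-erased "continue me" hook
    };

    // Hypothetical single-threaded scheduler: an intrusive FIFO of op_nodes.
    class intrusive_scheduler {
        op_node* head = nullptr;
        op_node* tail = nullptr;
    public:
        void enqueue(op_node& op) noexcept { // no allocation, just link the node
            op.next = nullptr;
            if (tail) tail->next = &op; else head = &op;
            tail = &op;
        }
        void run() { // drain the queue
            while (op_node* op = head) {
                head = op->next;
                if (!head) tail = nullptr;
                op->resume(op);
            }
        }
    };

    // An operation state embedding the node; if this object sits inside a
    // coroutine frame, enqueueing it costs no further allocation.
    struct foo_operation : op_node {
        int input;
        explicit foo_operation(int i) : input(i) {
            resume = [](op_node* self) {
                static_cast<foo_operation*>(self)->complete();
            };
        }
        void complete() { /* finish the async work, resume the coroutine, ... */ }
        void start(intrusive_scheduler& sched) noexcept { sched.enqueue(*this); }
    };

    int main() {
        intrusive_scheduler sched;
        foo_operation op{42}; // on the stack here; in practice, in a coroutine frame
        op.start(sched);
        sched.run();
    }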
Thanks, your comment has explained this better than the article
And it does look like an interesting proposal
The article is one level up: it uses both higher-level combinators and the pipe syntax for chaining senders and receivers. I do find that hard to reason about as well.
My reduction to continuations is, I think, derived from Eric Niebler's original presentation where he introduced a prototype of the idea behind senders/receivers.
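For reference, the combinator/pipe style looks roughly like this. It is an approximation based on the C++26 std::execution proposal, not the article's exact code (shipping compilers barely support it yet; the reference implementation, stdexec, spells the same thing with stdexec:: names):

    #include <execution> // std::execution as proposed for C++26
    #include <print>
    #include <utility>

    int main() {
        namespace ex = std::execution;

        // Build a chain of senders with the pipe syntax: produce a value,
        // transform it twice, then block until the whole chain completes.
        auto work = ex::just(21)
                  | ex::then([](int x) { return x * 2; })
                  | ex::then([](int x) { return x + 100; });

        auto [result] = std::this_thread::sync_wait(std::move(work)).value();
        std::println("result = {}", result); // 142
    }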
At the end they mention this text:
> The authors and NVIDIA do not guarantee that this code is fit for any purpose whatsoever.
And they say that it worries them. This is actually a warranty disclaimer (the warranty of fitness for a particular purpose) and has to be written like this to be effective, so I would not read anything into it.
Not only that, almost every software license (FOSS or proprietary) has a similar clause (often in all-caps).
Is it just me or are the code examples for the executors absolutely unreadable/incomprehensible without reading them 5 times?
Even with different formatters I'd much prefer the tbb variant.
C++ has two ways of naming things: `std::incomprehensible_long_name` and `|`.
I love C++ for the power it gives me, but boy do I hate reading C++ code. I know most of these things are for historical reasons and/or done because of parser compatibilities etc. but it's still a pain.
I used to live and breathe C++ in the early 2000s, but haven't touched it since. I can't make sense of modern C++.
Thankfully, you can still write C++ just fine without the "modern" stuff and have not only readable code, but also sane compile times. The notion, explicitly mentioned in the article, that all this insane verbosity also adds 5 seconds to your build for a single executor invocation is just crazy to me (it is far longer than my entire build for most projects).
I am confused. Out of curiosity WDYM by 5 seconds being far longer than your entire build for most projects? That sounds crazy low.