Zig's new plan for asynchronous programs
lwn.net
205 points by messe 10 hours ago
Overall this article is accurate and well-researched. Thanks to Daroc Alden for due diligence. Here are a couple of minor corrections:
> When using an Io.Threaded instance, the async() function doesn't actually do anything asynchronously — it just runs the provided function right away.
While this is a legal implementation strategy, this is not what std.Io.Threaded does. By default, it will use a configurably sized thread pool to dispatch async tasks. It can, however, be statically initialized with init_single_threaded, in which case it does have the behavior described in the article.
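A minimal sketch of that distinction, assuming an accessor like threaded.io() by analogy with Allocator; only std.Io.Threaded and init_single_threaded are named above, and the surrounding API is still in flux:

    const std = @import("std");

    pub fn main() !void {
        // Statically initialized variant: async() just runs the callee
        // immediately on the calling thread, as the article describes.
        // A default-constructed Io.Threaded would instead dispatch async
        // tasks onto a configurably sized thread pool.
        var threaded: std.Io.Threaded = .init_single_threaded;
        const io = threaded.io(); // assumed accessor, not final API
        _ = io;
    }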
The only other issue I spotted is:
> For that use case, the Io interface provides a separate function, asyncConcurrent() that explicitly asks for the provided function to be run in parallel.
There was a brief moment where we had asyncConcurrent() but it has since been renamed more simply to concurrent().
Hey Andrew, question for you about something the article lightly touches on but doesn't really discuss further:
> If the programmer uses async() where they should have used asyncConcurrent(), that is a bug. Zig's new model does not (and cannot) prevent programmers from writing incorrect code, so there are still some subtleties to keep in mind when adapting existing Zig code to use the new interface.
What class of bug occurs if the wrong function is called? Is it "UB" depending on the IO model provided, a logic issue, or something else?
A deadlock.
For example, the function is called immediately, rather than being run in a separate thread, causing it to block forever on accept(), because the connect() is after the call to async().
If concurrent() is used instead, the I/O implementation will spawn a new thread for the function, so that the accept() is handled by the new thread, or it will return error.ConcurrencyUnavailable.
async() is infallible. concurrent() is fallible.
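A minimal sketch of that failure mode (Server, acceptOne, and example are hypothetical stand-ins so the sketch is self-contained; only io.async(), io.concurrent(), and error.ConcurrencyUnavailable come from the discussion, and their exact signatures may differ):

    const std = @import("std");

    const Server = struct {
        fn accept(self: *Server) void {
            // Blocks until a client connects.
            _ = self;
        }
        fn connect(self: *Server) void {
            _ = self;
        }
    };

    fn acceptOne(server: *Server) void {
        server.accept(); // blocks forever if nobody ever connects
    }

    fn example(io: std.Io, server: *Server) !void {
        // A blocking, single-threaded Io implementation is allowed to run
        // acceptOne() to completion right here, before async() returns. It
        // then blocks on accept() forever, because the connect() below is
        // never reached: a deadlock.
        _ = io.async(acceptOne, .{server});

        // concurrent() instead guarantees the callee gets its own execution
        // context (for example a new thread), or fails up front with
        // error.ConcurrencyUnavailable:
        //_ = try io.concurrent(acceptOne, .{server});

        server.connect(); // the other half of the handshake
    }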
I think that Java virtual threads solve this problem in a much better way than most other languages. I'm not sure that it is possible in a language as low level as Zig however.
I think this design is very reasonable. However, I find Zig's explanation of it pretty confusing: they've taken pains to emphasize that it solves the function coloring problem, which it doesn't: it pushes I/O into an effect type, which essentially behaves as a token that callers need to retain. This is a form of coloring, albeit one that's much more ergonomic.
(To my understanding this is pretty similar to how Go solves asynchronicity, except that in Go's case the "token" is managed by the runtime.)
If calling the same function with a different argument would be considered 'function coloring', every function in a program is 'colored' and the word loses its meaning ;)
Zig actually also had solved the coloring problem in the old and abandoned async-await solution because the compiler simply stamped out a sync- or async-version of the same function based on the calling context (this works because everything is a single compilation unit).
In that case JS is not colored either because an async function is simply a normal function that returns a Promise.
As far as I understand, coloring refers to whether async and sync functions share the same calling syntax and interface, i.e.

b = readFileAsync(p)
b = readFileSync(p)

share the same calling syntax, whereas

b = await readFileAsync(p)
readFileAsync(p).then(b => ...)
b = readFileSync(p)

are different. If you have to call async functions with a different syntax or interface, then it's colored.
> If calling the same function with a different argument would be considered 'function coloring', then every function in a program is 'colored' and the word loses its meaning ;)
Well, yes, but in this case the colors (= effects) are actually important. The implications of passing an effect through a system are nontrivial, which is why some languages choose to promote that effect to syntax (Rust) and others choose to make it a latent invariant (Java, with runtime exceptions). Zig chooses another path not unlike Haskell's IO.
> Zig actually also had solved the coloring problem in the old and abandoned async-await solution because the compiler simply stamped out a sync- or async-version of the same function based on the calling context (this works because everything is a single compilation unit).
AFAIK this still leaked through function pointers, which were still sync or async (and this was not visible in their type)
Pretty sure the Zig team is aware of this and has plans to fix it before they re-release async.
The subject of the function coloring article was callback APIs in Node, so an argument you need to pass to your IO functions is very much in the spirit of colored functions and has the same limitations.
In Zig's case you pass the argument whether or not it's asynchronous, though. The caller controls the behavior, not the function being called.
The coloring is not the concrete argument (Io implementation) that is passed, but whether the function has an Io parameter in the first place. Whether the implementation of a function performs IO is in principle an implementation detail that can change in the future. A function that doesn't take an Io argument but wants to call another function that requires an Io argument can't. So you end up adding Io parameters just in case, and in turn require all callers to do the same. This is very much like function coloring.
In a language with objects or closures (which Zig doesn't have first-class support for), one flexibility benefit of the Io object approach is that you can move it to object/closure creation and keep the function/method signature free from it. Still, you have to pass it somewhere.
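Even without closures, the pattern is expressible in Zig by storing the Io at construction time so that later method signatures stay free of it; a minimal sketch (Client and fetch are hypothetical names):

    const std = @import("std");

    const Client = struct {
        io: std.Io,

        fn init(io: std.Io) Client {
            return .{ .io = io };
        }

        fn fetch(self: *Client) !void {
            // Performs its reads and writes through the stored self.io,
            // without taking an Io parameter itself.
            _ = self;
        }
    };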
> Whether the implementation of a function performs IO is in principle an implementation detail that can change in the future.
I think that's where your perspective differs from Zig developers.
Performing IO, in my opinion, is categorically not an implementation detail. In the same way that heap allocation is not an implementation detail in idiomatic Zig.
I don't want to find out my math library is caching results on disk, or allocating megabytes to memoize. I want to know what functions I can use in a freestanding environment, or somewhere resource constrained.
This is also why function coloring is not a problem, and is in fact desirable a lot of the time.
> A function that doesn't take an Io argument but wants to call another function that requires an Io argument can't.
Why? Can’t you just create an instance of an Io of whatever flavor you prefer and use that? Or keep one around for use repeatedly?
The whole “hide a global event loop behind language syntax” is an example of a leaky abstraction which is also restrictive. The approach here is explicit and doesn’t bind functions to hidden global state.
You can, but then you’re denying your callers control over the Io. It’s not really different with async function coloring: https://news.ycombinator.com/item?id=46126310
Scheduling of IO operations isn’t hidden global state. Or if it is, then so is thread scheduling by the OS.
Is that a problem in practice though? Zig already has this same situation with its memory allocators; you can't allocate memory unless you take a parameter. Now you'll just have to take a memory allocator AND an additional io object. Doesn't sound very ergonomic to me, but if all Zig code conforms to this scheme, in practice there will be only one way to do it. So one of the colors will never be needed, or used.
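For illustration, a minimal sketch of such a signature (loadConfig is hypothetical; std.mem.Allocator is the existing convention referred to, std.Io the new interface under discussion):

    const std = @import("std");

    fn loadConfig(gpa: std.mem.Allocator, io: std.Io, path: []const u8) ![]u8 {
        // Would read `path` through `io` into memory allocated from `gpa`.
        _ = io;
        _ = path;
        return gpa.alloc(u8, 0);
    }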
> If calling the same function with a different argument would be considered 'function coloring', then every function in a program is 'colored' and the word loses its meaning ;)
I mean, the concept of "function coloring" in the first place is itself an artificial distinction invented to complain about the incongruent methods of dealing with "do I/O immediately" versus "tell me when the I/O is done"--two methods of I/O that are so very different that they really require very different designs of your application on top of those I/O methods: in a sync I/O case, I'm going to design my parser to output a DOM because there's little benefit to not doing so; in an async I/O case, I'm instead going to have a streaming API.
I'm still somewhat surprised that "function coloring" has become the default lens to understand the semantics of async, because it's a rather big misdirection from the fundamental tradeoffs of different implementation designs.
Function coloring is an issue that arises in practice, which is why people discuss whether some approach solves it or not.
Why do you think it automatically follows that with async I/O you are going to have a streaming API? Async I/O can, just like sync I/O, return a whole, complete result; you are simply not waiting for it to happen, and the called async procedure calls you back once the result is calculated. I think a streaming API requires additional implementation effort, not merely async.
100% agree, but fortunately I don't think it is the "default lens". If it were nobody would be adding new async mechanisms to languages, because "what color is your function" was a self-described rant against async, in favour of lightweight threads. It does seem to have established itself as an unusually persistent meme, though.
If your function suddenly requires a (currently) unconstructable instance "Magic" which you now have to pass in from somewhere top-level, that indeed suffers from the same issue as async/await, aka function coloring.
But most functions don't. They require some POD or float, string or whatever that can be easily and cheaply constructed in place.
1) zig's io is not a viral effect type, you can in principle declare a global io variable and use it everywhere that any library calls for it. Not best practice for a library writer, but if you're building an app, do what you want.
2) There are two things here, there is function coloring and the function coloring problem. The function coloring problem is five things:
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
1. Every function has a color.
2. The way you call a function depends on its color.
3. You can only call a red function from within another red function.
4. Red functions are more painful to call.
5. Some core library functions are red.
You'll have some convincing to do that zig's plan satisfies 4. It's almost certain that it won't satisfy 5.
It's open to debate if zig's plan will work at all, of course.
> 1) zig's io is not an effect type, you can in principle declare a global io variable and use it everywhere that any library calls for it.
That's an effect, akin to globally intermediated I/O in a managed runtime.
To make it intuitive: if you have a global token for I/O, does your concurrent program need to synchronize on it in order to operate soundly? Do programs that fail to obtain the token behave correctly?
how do you "fail to obtain the token"?
The token guards a fallible resource (I/O). You can (temporarily or permanently) fail to obtain it for any reason that would affect the underlying I/O.
Actually it seems like they just colored everything async and you pick whether you have worker threads or not.
I do wonder if there's more magic to it than that because it's not like that isn't trivially possible in other languages. The issue is it's actually a huge foot gun when you mix things like this.
For example your code can run fine synchronously but will deadlock asynchronously because you don't account for methods running in parallel.
Or said another way, some code is thread safe and some code isn't. Coloring actually helps with that.
> Actually it seems like they just colored everything async and you pick whether you have worker threads or not.
There is no 'async' anywhere yet in the new Zig IO system (in the sense of the compiler doing the 'state machine code transform' on async functions).
AFAIK the current IO runtimes simply use traditional threads or coroutines with stack switching. Bringing code-transform-async-await back is still on the todo-list.
The basic idea is that the code which calls into IO interface doesn't need to know how the IO runtime implements concurrency. I guess though that the function that's called through the `.async()` wrapper is expected to work properly both in multi- and single-threaded contexts.
> There is no 'async'
I meant this more as simply an analogy to the devX of other languages.
>Bringing code-transform-async-await back is still on the todo-list.
The article makes it seem like "the plan is set" so I do wonder what that Todo looks like. Is this simply the plan for async IO?
> is expected to work properly both in multi- and single-threaded contexts.
Yeah... about that....
I'm also interested in how that will be solved. RTFM? I suppose a convention could be that your public API must be thread safe and if you have a thread-unsafe pattern it must be private? Maybe something else is planned?
> The article makes it seem like "the plan is set" so I do wonder what that Todo looks like. Is this simply the plan for async IO?
There's currently a proposal for stackless coroutines as a language primitive: https://github.com/ziglang/zig/issues/23446
Agreed. The Haskeller in me screams "You've just implemented the IO monad without language support".
It's not a monad because it doesn't return a description of how to carry out I/O that is performed by a separate system; it does the I/O inside the function before returning. That's a regular old interface, not a monad.
So it's the reader monad, then? ;-)
Yes.
Can you explain for those of us less familiar with Haskell (and monads in general)?
The function coloring problem actually comes up when you implement the async part using stackless coroutines (e.g. in Rust) or callbacks (e.g. in Javascript).
Zig's new I/O does neither of those for now, which is why it doesn't suffer from it; but at the same time it didn't "solve" the problem, it just sidestepped it by providing an implementation that has similar features but not exactly the same tradeoffs.
How are the tradeoffs meaningfully different? Imagine that, instead of passing an `Io` object around, you just had to add an `async` keyword to the function, and that was simply syntactic sugar for an implied `Io` argument, and you could use an `await` keyword as syntactic sugar to pass whatever `Io` object the caller has to the callee.
I don't see how that's not the exact same situation.
In the JS example, a synchronous function cannot poll the result of a Promise. This is meaningfully different when implementing loops and streams, e.g. a game loop, an animation frame, polling a stream.
A great example is React Suspense. To suspend a component, the render function throws a Promise. To trigger a parent Error Boundary, the render function throws an error. To resume a component, the render function returns a result. React never made the suspense API public because it's a footgun.
If a JS Promise were inspectable, a synchronous render function could poll its result, and suspended components would not need to use throw to try and extend the language.
.NET has promises that you can poll synchronously. The problem with them is that if you have a single thread, then by definition while your synchronous code is running, none of the async callbacks can be running. So if you poll a Task and it's not complete yet, there's nothing you can do to wait for its completion.
Well, technically you can run a nested event loop, I guess. But that's such a heavy sync-wrapping-async solution that it's rarely used other than as a temporary hack in legacy code.
I see. I guess JS is the only language with the coloring problem, then, which is strange because it's one of the few with a built-in event loop.
This Io business is isomorphic to async/await in Rust or Python [1]. Go also has a built-in "event loop"-type thing, but decidedly does not have a coloring problem. I can't think of any languages besides JS that do.
> Go also has a built-in "event loop"-type thing, but decidedly does not have a coloring problem.
context is kind of a function color in go, and it's also a function argument.
Maybe I have this wrong, but I believe the difference is that you can create an Io instance in a function that has none
In Rust, you can always create a new tokio runtime and use that to call an async function from a sync function. Ditto with Python: just create a new asyncio event loop and call `run`. That's actually exactly what an Io object in Zig is, but with a new name.
Looking back at the original function coloring post [1], it says:
> It is better. I will take async-await over bare callbacks or futures any day of the week. But we’re lying to ourselves if we think all of our troubles are gone. As soon as you start trying to write higher-order functions, or reuse code, you’re right back to realizing color is still there, bleeding all over your codebase.
So if this is isomorphic to async/await, it does not "solve" the coloring problem as originally stated, but I'm starting to think it's not much of a problem at all. Some functions just have different signatures from other functions. It was only a huge problem for JavaScript because the ecosystem at large decided to change the type signatures of some giant portion of all functions at once, migrating from callbacks to async.
[1]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
It's sans-io at the language level, I like the concept.
So I did a bit of research into how this works in Zig under the hood, in terms of compilation.
First things first, Zig does compile async fns to a state machine: https://github.com/ziglang/zig/issues/23446
The compiler decides at compile time which color to compile the function as (potentially both). That's a neat idea, but... https://github.com/ziglang/zig/issues/23367
> It would be checked illegal behavior to make an indirect call through a pointer to a restricted function type when the value of that pointer is not in the set of possible callees that were analyzed during compilation.
That's... a pretty nasty trade-off. Object safety in Rust is really annoying for async, and this smells a lot like it. The main difference is that it's vaguely late-bound in a magical way; you might get an unexpected runtime error and - even worse - potentially not have the tools to force the compiler to add a fn to the set of callees.
I still think sans-io at the language level might be the future, but this isn't a complete solution. Maybe we should be simply compiling all fns to state machines (with the Rust polling implementation detail, a sans-io interface could be used to make such functions trivially sync - just do the syscall and return a completed future).
> I still think sans-io at the language level might be the future, but this isn't a complete solution. Maybe we should be simply compiling all fns to state machines (with the Rust polling implementation detail, a sans-io interface could be used to make such functions trivially sync - just do the syscall and return a completed future).
Can you be more specific about what is missing, i.e. why sans-io with an explicit state machine would not be a complete solution for static and dynamic analysis? Serializing the state machine sounds excellent for static and dynamic analysis. I'd guess the debugging infrastructure for optimization passes and run-time debugging is missing, or is there more?
Exactly the caveat that they themselves disclose: some scenarios are too dynamic for static analysis.
I wouldn't define it as Sans-IO if you take an IO argument and block/wait on reading/writing, whether that be via threads or an event loop.
With Sans-IO, the IO is _outside_ completely. No read/write at all.
There is a token you must pass around, sure, but because you use the same token for both async and sync code, I think analogizing with the typical async function color problem is incorrect.
Having used Zig a bit as a hobby, I have to ask: why is it more ergonomic? Using await and passing a token have similar ergonomics to me. The one thing you could say is that using some kind of token makes it dead simple to have different tokens. But that's really not something I run into often at all when using async.
> The one thing you could say is that using some kind of token makes it dead simple to have different tokens. But that's really not something I run into often at all when using async.
It's valuable to library authors who can now write code that's agnostic of the users' choice of runtime, while still being able to express that asynchronicity is possible for certain code paths.
But that can already be done using async await. If you write an async function in Rust for example you are free to call it with any async runtime you want.
But you can't call it from synchronous rust. Zig is moving toward all sync code also using the Io interface.
yes, you can:
runtime.block_on(async { })
https://play.rust-lang.org/?version=stable&mode=debug&editio...

Let me rephrase, you can't call it like any other function.
In Zig, a function that does IO can be called the same way whether or not it performs async operations or not. And if those async operations don't need concurrency (which Zig expresses separately to asynchronicity), then they'll run equally well on a sync Io runtime.
> In Zig, a function that does IO can be called the same way whether or not it performs async operations or not.
no, you can't, you need to pass a IO parameter
You will need to pass that for synchronous IO as well. All IO in the standard library is moving to the Io interface. Sync and async.
If I want to call a function that does asynchronous IO, I'll use:
foo(io, ...);
If I want to call one that does synchronous IO, I'll write:
foo(io, ...);
If I want to express that either one of the above can be run asynchronously if possible, I'll write:
io.async(foo, .{ io, ... });
If I want to express that it must be run concurrently, then I'll write:
try io.concurrent(foo, .{ io, ... });
Nowhere in the above do I distinguish whether or not foo does synchronous or asynchronous IO. I only mark that it does IO, by passing in a parameter of type std.Io.
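For completeness, a sketch of what such a foo might look like on the callee side (the body is hypothetical; only the convention of taking a std.Io parameter comes from the discussion above):

    const std = @import("std");

    fn foo(io: std.Io, path: []const u8) void {
        // Does its reads and writes through `io`; whether those end up
        // blocking, thread-pooled, or event-driven is decided by the Io the
        // caller passed in, not by this function.
        _ = io;
        _ = path;
    }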