Golang's big miss on memory arenas
avittig.medium.com | 145 points by andr3wV 8 days ago
Man this person is mediocre at best. You can do fully manual memory management in Go if you want. The runtime is full of tons of examples where they have 0-alloc, Pools, ring buffers, Assembly, and tons of other tricks.
If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
But like… the write-up completely missed that manual memory management exists; Golang just considers it “unsafe”, and that’s a design principle of the language.
You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop.
But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.
As someone who writes Go code that processes around 100B messages per day (which all need to be parsed and transformed), I can confirm that the author’s position is very much misguided.
And it also completely ignores the fascinating world of “GC-free Java”, which more than a few of the clients I work with use: Java with garbage collection entirely disabled. It’s used in finance a lot.
Is it pretty? No.
Is it effective? Yes.
Regarding Go’s memory arenas: do you need to use them everywhere? Absolutely not. Most high-performance code has a central hot path (like the tokenizer example that OP used). You just make that part reuse memory instead of allocating and deallocating, and that’s it.
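A minimal sketch of that kind of reuse using only the standard library's sync.Pool; the processMessage function and buffer type here are illustrative, not anyone's actual pipeline:

    package main

    import (
        "bytes"
        "sync"
    )

    // bufPool hands out reusable buffers so the hot path allocates
    // (almost) nothing per message once it has warmed up.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    // processMessage stands in for the hot part of a parse/transform loop.
    func processMessage(msg []byte) {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset()            // reuse the old backing array
        defer bufPool.Put(buf) // return it when done

        buf.Write(msg)
        // ... parse/transform using buf without further allocations ...
    }

    func main() {
        processMessage([]byte("example payload"))
    }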
Same. I'm genuinely confused by all the comments of 'ah man, this is holding me back' in this thread, and folks claiming it's not possible to do any arena tricks in Go.
I'm not sure if these are just passersby, or people who actually use Go but have never strayed from the std lib.
This isn't true in practice because you won't be able to control where allocations are made in the dependencies you use, including inside the Go standard library itself. You could rewrite/fork that code, but then you lose access to the Go ecosystem.
The big miss of the OP is that it ignores the Go region proposal, which is using lessons learned from this project to solve the issue in a more tractable way. So while Arenas won't be shipped as they were originally envisioned, it isn't to say no progress is being made.
I had to fork Go’s CSV package to make it re-use buffers and avoid defensive copies. But I’m not sure an arena API is a panacea here - even if I can supply an arena, the library needs certain guarantees about how the memory it returns is aliased / used by the caller. Maybe it would still defensively copy into the arena, maybe not. So I don’t see how taking an arena as a parameter lets a function reason about how safely it can use the arena.
I personally loved using Go 8 years ago. When I built a proof of concept for a new project in both Go and Rust, it became clear that Rust would provide the semantics I’m looking for out of the box. Less fighting with the garbage collector or rolling out my own memory management solution.
If I’m doing that with a lot of ugly code - I might as well use idiomatic Zig with arenas. This is exactly the point the author tried to make.
Your last paragraph captures the tension perfectly. Go just isn’t the tool we thought for some jobs, and maybe that’s okay. If you’re going to count nanoseconds or measure total allocations, it’s better to stick to a non-GC language. Or a third option can be to write your hot loops in one such language; and continue using Go for everything else. Problem solved.
> Go just isn’t the tool we thought for some jobs
Go made it explicitly clear when it was released that it was designed to be a language that felt dynamically-typed, but with performance closer to statically-typed languages, for only the particular niche of developing network servers.
Which job that needs to be a network server, where a dynamically-typed language is appropriate, does Go fall short on?
One thing that has changed in the meantime is that many actually dynamically-typed languages have also figured out how to perform more like a statically-typed language. That might prompt you to just use one of those dynamically-typed languages instead, but I'm not sure that gives any reason to see Go as being less than it was before. It still fulfills the expectation of having performance more like a statically-typed language.
Which dynamically typed languages perform like a statically typed language?
It says "more like", not "like". Javascript now performs more like a statically-typed language, as one example. That wasn't always the case. It used to be painfully slow — and was so when Go was created. The chasm between them has shrunk dramatically. A fast dynamically-typed language was a novel curiosity when Go was conceived. Which is why Go ended up with a limited type system instead of being truly dynamically-typed.
> Which job that needs to be a network server, where a dynamically-typed language is appropriate, does Go fall short on?
A job where nanosecond routing decisions need to be made.
> Or a third option can be to write your hot loops in one such language; and continue using Go for everything else. Problem solved.
Or use Go and write ugly code for those hot loops instead of introducing another language and build system. Then you can still enjoy the niceties of GC in other parts of your code.
I can see that being an option in a small team that works closely with one another and wants to keep things simple.
Though it is my personal opinion that forcing a GC-based language to do a task best suited for manual memory management is like swimming against the tide. It’s doable but more challenging than it ought to be. I might even appreciate the challenge but the next person maintaining the code might not.
> If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
A word of caution. If you do this and then you store pointers into that slice, the GC will likely not see them (as if you were just storing them as `uintptr`s)
You need to ensure that everything you put in the arena only references stuff in the same arena.
No out pointers. If you can do that, you're fine.
I still would be wary, even in that case. Go does not guarantee that the address of an allocation won't change over the lifetime of the allocation (although current implementations do not make use of this).
If you really only store references into the same arena, it's better to use an offset from the start of the arena. Then it does not matter whether allocations are moved around.
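A minimal sketch of that offset-based approach; the node type and arena here are made up for illustration:

    package main

    import "fmt"

    // Nodes refer to each other by index into the arena's backing slice
    // instead of by pointer, so nothing breaks if the slice is reallocated
    // or if the runtime were ever to move it.
    type NodeRef int32 // -1 means "nil"

    type Node struct {
        Value int
        Next  NodeRef
    }

    type NodeArena struct {
        nodes []Node
    }

    func (a *NodeArena) New(v int) NodeRef {
        a.nodes = append(a.nodes, Node{Value: v, Next: -1})
        return NodeRef(len(a.nodes) - 1)
    }

    // Get returns a pointer for convenience; don't hold it across New calls,
    // since append may reallocate the backing slice.
    func (a *NodeArena) Get(r NodeRef) *Node { return &a.nodes[r] }

    func main() {
        var a NodeArena
        head := a.New(1)
        second := a.New(2)
        a.Get(head).Next = second
        fmt.Println(a.Get(a.Get(head).Next).Value) // 2
        a.nodes = a.nodes[:0] // "free" everything at once, keeping the capacity
    }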
> If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
Only if the type is not itself a pointer and does not contain any inner pointers.
Otherwise the garbage collector will bite you hard.
Types with inner pointers add difficulty to be sure, but it’s still possible to use them with this pattern. You have to make sure of three things to do so: 1) no pointers outside of the backing memory; 2) an explicit “clear()” function that manually nulls out inner pointers in the stored object (even inner pointers to other things in the backing slice); 3) clear() is called for all such objects that were ever stored before the backing slice is dropped and before those objects are garbage collected.
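For concreteness, here is a minimal sketch of the byte-slice-plus-unsafe pattern being discussed: a bump allocator with a generic New helper. The names and sizes are illustrative, it never grows, and it leans on the rules above (no out-pointers, clear pointer fields before reuse) rather than enforcing them:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Arena bump-allocates out of a plain byte slice. Anything with pointer
    // fields stored in here is invisible to the GC, hence the rules above.
    type Arena struct {
        buf []byte
        off uintptr
    }

    func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

    // New carves out space for a T inside the backing slice and returns a
    // typed pointer into it.
    func New[T any](a *Arena) *T {
        var zero T
        size, align := unsafe.Sizeof(zero), unsafe.Alignof(zero)
        a.off = (a.off + align - 1) &^ (align - 1) // align the bump pointer
        if a.off+size > uintptr(len(a.buf)) {
            panic("arena full")
        }
        p := (*T)(unsafe.Pointer(&a.buf[a.off]))
        a.off += size
        return p
    }

    // Reset forgets everything at once; callers must have cleared any pointer
    // fields in stored objects first.
    func (a *Arena) Reset() { a.off = 0 }

    type point struct{ X, Y int } // no inner pointers, so the safe case

    func main() {
        a := NewArena(1 << 16)
        p := New[point](a)
        p.X, p.Y = 3, 4
        fmt.Println(*p) // {3 4}
        a.Reset()
    }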
Do you have some tips for blog postings, code, articles that explore these topics in Go?
> You can do fully manual memory management in Go if you want. The runtime is full of tons of examples where they have 0-alloc, Pools, ring buffers, Assembly, and tons of other tricks.
The runtime only exposes a small subset of what it uses internally and there's no stable ABI for runtime internals. If you're lucky enough to get big and have friends, they might not break you (some internal linkage is being preserved), but in the general case, for a general user, nope. Updates might make your code untenable.
> If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
AIUI the prior proposal still provided automated lifetime management, though that's related to several of the standing concerns. You can't match that from Go "userspace": finalizers don't get executed on a deterministic schedule. Put simply: that's not the same thing.
As someone else points out, this is also much more fraught with error than just typing what you described. On top of the GC issue pointed out already, you'll also hit memory model considerations if you're doing any concurrency, which you surely are if you actually needed to do this. Once you're doing that you'll run into the issue, if you're trying to compete with systems languages, that Go only provides a subset of the platform-available memory model; in the simplest terms, it only offers acq/rel atomic semantics. It also doesn't expose any notion of what thread you're running on (which can change arbitrarily) or even which goroutine you're running on. This limits your design space quite significantly and bounds your performance for high-frequency, small-region operations. I'd actually hazard an educated guess that an arena written as you casually suggest would perform extremely poorly at any meaningful scale (let's say >=32 cores, still fairly modest).
> You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop.
> But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.
Sure, reductio ad absurdum, though I typically would optimize against the (systems language) compiler long before I drop to assembly; it's 2025, systems compilers are great and have many optimizations, intrinsics and hints.
> Man this person is mediocre at best.
Harsh, I think the author is fine really. I think their most significant error isn't in missing or not discussing other, more difficult things they could do with Go; it's seemingly being under the misconception, prior to the Arena proposal, that Go actually cedes control for lower-level optimization. It doesn't, and it never has, and it likely never will (it will gain other semi-generalized internal optimizations over time, lots of work goes into that).
In some cases you can hack some in on your own, but Go is not well placed as a "systems language" if you mean by that something like "competitive efficiency at upper or lower bound scale tasks", it is much better placed as a framework for writing general purpose servers at middle scales. It's best placed on systems that don't have batteries, and that have plenty of ram. It'll provide you with a decent opportunity to scale up and then out in that space as long as you pay attention to how you're doing along the way. It'll hurt if you need to target state of the art efficiency at extreme ends, and very likely block you wholesale.
I'm glad Go folks are still working on ideas to try to find a way for applications to get some more control over allocations. I'm also not expecting a solution that solves my deepest challenges anytime soon though. I think they'll maybe solve some server cases first, and that's probably good, that's Go's golden market.
The vibe I get from this post is of someone who hasn't routinely used arenas in the past and thinks they're kind of a big deal. But a huge part of the point of an arena is how simple it is. You can just build one. Meanwhile, the idea that arena handles were going to be threaded through every high-allocation path in the standard library is fanciful.
Two big issues in Golang: one is that you can't actually build an arena allocator that can be used for multiple types in a natural way.
The other is that almost no library is written in such a way that buffer re-use is possible (looking at you, typical kafka clients that throw off a buffer of garbage per message and protobuf). The latter could be fixed if people paid more attention to returning buffers to the caller.
You totally can build it using unsafe and generics. I’ve done it with mmap-backed byte slices for arbitrary object storage.
With a number of caveats. You cannot reimplement arenas as the experiment did without special hooks into the runtime. https://github.com/golang/go/blob/master/src/arena/arena.go
Special hooks for context and arena (actually, arena(s) can be part of context) should have eliminated the need to change signatures for threading context and arena handles through the chain of calls. Instead there should have been an API (both internal and user-accessible) to check for and pick, if present, the closest one on the stack (somewhat similar to how you can get the ClassLoader, and the hierarchy of them, in Java).
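A hypothetical sketch of that idea using nothing but today's context.Context (so no runtime hooks, and callees still have to accept a ctx, but no signature needs an explicit arena parameter); the Arena type and helper names are made up:

    package main

    import "context"

    type Arena struct{ /* bump-allocator state elided */ }

    type arenaKey struct{}

    // WithArena stashes an arena in the context, so it travels down the
    // call chain alongside deadlines and cancellation.
    func WithArena(ctx context.Context, a *Arena) context.Context {
        return context.WithValue(ctx, arenaKey{}, a)
    }

    // ArenaFrom returns the nearest enclosing arena, or nil if none was set.
    func ArenaFrom(ctx context.Context) *Arena {
        a, _ := ctx.Value(arenaKey{}).(*Arena)
        return a
    }

    func parse(ctx context.Context, input []byte) {
        if a := ArenaFrom(ctx); a != nil {
            _ = a // allocate scratch objects from the arena here
        }
        // ... otherwise fall back to ordinary heap allocations ...
    }

    func main() {
        ctx := WithArena(context.Background(), &Arena{})
        parse(ctx, []byte("payload"))
    }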
Rust also suffers from libraries returning newly allocated strings and vectors when the API should allow passing a pre-existing string or vector to place the results in.
Granted, the latter leads to more verbose code, and chaining several calls is no longer possible.
But I am puzzled that even performance-oriented libraries both in Go and Rust still prefer to allocate the results themselves.
I'm curious, do you have any arena experience out of c/cpp/rust/zig?
It may be that you can "just" build one, but you can't "just" use it and expect any of the available libraries and built ins to work with it.
How many things would you have to "just" rewrite?
There was never a proposal to automate arenas in Go code, and that wouldn't even make sense: the point of arenas is that you bump-allocate until some program-specific point where you free all at once (that's why they're so great for compiler code, where you do passes over translation units and can just do the memory accounting at each major step).
(Yes: I used arenas a lot when I was shipping C code; they're a very easy way to get big speed boosts out of code that does a lot of malloc).
I'm not sure what the parent posters were referring to, but there's an interesting way in which "automation" might make sense in some languages: implicit arena utilization based on the current call stack, without needing to pass/thread an explicit `arena` parameter through the ecosystem.
One could imagine a language that allows syntax like CallWithArena(functionPointer, someSetupInfo) and any standard library allocation therein would use the arena, releasing on completion or error.
Languages like Python and modern Java would typically use a thread/virtualthread/greenlet-local variable to track the state for this kind of pattern. The fact that Go explicitly avoids this pattern is a philosophical choice, and arguably a good one for Go to stick to, given its emphasis on avoiding the types of implicit "spooky action at a distance" that often plague hand-rolled distributed systems!
But the concept of arenas could still apply in an AlternateLowLevelLanguage where a notion of scoped/threaded context is implicit and language-supported, and arena choice is tracked in that context and treated as a first-class citizen by standard libraries.
Basically what Odin does yea?
Hadn’t known about Odin but yes!
> Operations such as new, free and delete by default will use context.allocator, which can be overridden by the user. When an override happens all called procedures will inherit the new context and use the same allocator.
This sounds a lot like having an allocator as a parameter object a la Scheme[0]. Really cool!
[0] https://standards.scheme.org/corrected-r7rs/r7rs-Z-H-6.html#...
> Yes: I used arenas a lot when I was shipping C code; they're a very easy way to get big speed boosts out of code that does a lot of malloc
This is something I look forward to exploring later in my current pet project. Right now it has possibly the stupidest GC (it just tracks C++ 'new'-allocated objects) but is set up for drop-in arena allocation with placement new, so we'll see how much that matters later on. There are two allocation patterns: statements and whatnot get compiled to static continuation graphs which push and pop secondary continuations and Value objects to do the deed, so I believe the second part, with the rapid temporary object creation, will see the most benefit.
Anyhoo, slightly different pattern where the main benefits will most likely be from the cache locality or whatever, assuming I can even make a placement new arena allocator which is better than the performance of the regular C++ new. Never know, might even add more overhead than just tracking a bunch of raw C++ pointers as I can't imagine there's even a drop of performance which C++ new left on the table?
C++ has the ability to override new and delete, and the standard library supports allocators as type parameters, exactly because the standard implementation is only meant to be good enough.
There are plenty of specialisations that get more performance out, e.g. multi-threaded code in NUMA architectures.
> How many things would you have to "just" rewrite?
The same ones you'd have to rewrite using the (experimental) arenas implementation found in the standard library. While not the only reason, this is the primary reason for why it was abandoned; it didn't really fit into the ecosystem the way users would expect — you included, apparently.
"Memory regions" is the successor, which is trying to tackle the concerns you have. Work on it is ongoing.
> By killing Memory Arenas, Go effectively capped its performance ceiling.
I'm still optimistic about potential improvements. (Granted, I doubt there will be anything landing in the near future beyond what the author has already mentioned.)
For example, there is an ongoing discussion on "memory regions" as a successor to the arena concept, without the API "infection" problem:
There's a bunch of activity ongoing to make things better for memory allocation/collection in Go. GreenTeaGC is one that has already landed, but there are others like the RuntimeFree experiment that aims to progressively reduce the amount of garbage generated by enabling safe reuse of heap allocations, as well as other plans to move more allocations to the stack.
Somehow concluding that "By killing Memory Arenas, Go effectively capped its performance ceiling" seems quite misguided.
That one is kind of interesting given the past criticism of Java and .NET having too many GCs and knobs.
With time Go is also getting knobs, and turns out various GC algorithms are actually useful.
Not sure what you are referring to. There are no knobs involved in the things I mentioned (aside from the one to enable the experiment, but that's just temporary until the experiment completes - one way or the other).
The knobs are the values that can be given to GOGC environment variable.
Also I kind of foresee they will discover there are reasons why multiple GC algorithms are desired, and used in other programming ecosystems, thus the older one might stay anyway.
This does make me appreciate some of the decisions that Zig has made, about passing allocators explicitly and also encouraging the use of the ArenaAllocator for most programs.
Since Zig built up the standard library where you always pass an allocator, they avoided the problem that the article mentions, about trying to retrofit Go's standard library to work with an arena allocator.
Although, that's not the case for IO in Zig. The most recent work has actually been reworking the standard library so that you explicitly pass an IO just like you pass an allocator.
But it's still a young language so it's still possible to rework it.
I really do enjoy using the arena allocator. It makes things really easy if your program follows a cyclical pattern where you allocate a bunch of memory and then, when you're done, just free the entire arena.
I'm a bit split on this one.
Simple arenas are easy enough to write yourself, even if it does make unidiomatic code as the author points out. Pretty much anything that allocates tons of slices sees a huge performance bump from doing this. I -would- like that ability in an easier fashion.
On the other hand, new users will abuse arenas and use them everywhere because "I read they are faster", leading to way worse code quality and bugs overall.
I do agree it would become infectious. Once people get addicted to microbenchmarking code and seeing arenas a bit faster in whatever test they are running, they're going to ask that all allocating functions often used (especially everything in http and json) have the ability to use arenas, which may make the code more Zig-like. Not a dig at Zig, but that would either make the language rather unwieldy or double the number of functions in every package as far as I can see.
> Simple arenas are easy enough to write yourself
You can write an arena yourself, but it is useless if the language doesn't allow you to integrate it, e.g. allocate objects and vars on that arena.
You can, for some things, but it's a rather ugly process. I've mainly used it with slices and strings. So not useless, but certainly not full featured or simple.
What's the downside of having one API to pre-allocate memory to be used by the GC, and a second API to suspend/resume GC operations? When you run out of pre-allocated memory, it will resume GC operations automatically.
I'm naively thinking the performance bottleneck is not with tracking allocations but with constantly freeing them and then reallocating. Let the GC track allocations, but prevent it from doing anything else so long as it is under the pre-allocated memory limit for the process. When resumed, it will free unreferenced memory. That way, the program can suspend GC before a performance-sensitive block and resume it afterwards. APIs don't need to change at all that way.
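Go already exposes a rough approximation of the suspend/resume half of this today via runtime/debug; a minimal sketch (the memory-limit value is purely illustrative):

    package main

    import "runtime/debug"

    func hotSection() {
        // Keep a safety net so a runaway allocation spree can still trigger
        // a collection even while GC pacing is off.
        debug.SetMemoryLimit(2 << 30) // ~2 GiB soft limit, illustrative

        old := debug.SetGCPercent(-1) // suspend normal GC pacing
        defer debug.SetGCPercent(old) // resume it afterwards

        // ... allocation-heavy, latency-sensitive work goes here ...
    }

    func main() { hotSection() }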
Languages like D and C# have such knobs. Remember that .NET was designed to support C++ as well, and on modern .NET Microsoft has slowly been exposing those capabilities to C#.
I was considering a few use cases where an arena would make sense, and I encountered the "abandoned" arena library in the standard library and then read up on why it was never enabled. And yes, in those extremely rare situations, it would be nice to have them. But generally, they make little sense for Go and the projects Go is used in. So I definitely do not share any of the opinions from the blog post.
There is Odin, Zig or Jai (likely next year) as new kids on the block and alternatives to the cancer that is Rust or the more mainstream C, C++, Java or even C#.
Go definitely does not have to try and replace any of them. Go has its own place and has absolutely no reason to be fearful of becoming obsolete.
After all, in rare/extreme cases, one can always allocate big array and use flatbuffers for data structures to put into it.
> When your software team needs to pick a language today, you typically weigh two factors: language performance and developer velocity.
There are obviously other factors in play as well, or languages that are really good at both but weak in other areas (like adoption and mind share) would dominate. And I for sure don't see a lot of Crystal around..
I think the overall sentiment of this post is sound, but arenas aren't the answer to Go's performance challenges. From my perspective, possibly in an effort to keep the language simple, Go's designers didn't care about performance. 'Let the GC handle it' was the philosophy, and as a result you see poor design choices all the way through the standard library. And abstracting everything through interfaces then compounds the issue, because escape analysis can't see through the interface. The standard library is just riddled with unnecessary allocations. Just look at the JSON parser, for instance, and the recent work to improve it.
There are some interesting proposals on short-term allocations, being able to specify that a local allocation will not leak.
Most recently, I've been fighting with the ChaCha20-Poly1305 implementation because someone in their 'wisdom' added a requirement for contiguous memory for the implementation, including extra space for a tag. Both ChaCha20 and Poly1305 are streaming algorithms, but the go authors decide 'you cannot be trusted' - here's a safe one-shot interface for you to use.
Go really needs a complete overhaul of their Standard Library to fix this, but I can't see this ever getting traction due to the focus on not breaking anything.
Go really is a great language, but should include performance / minimising the GC burden as a key design consideration for its APIs.
I agree with nearly all of this, but in my fantasy I think the 'unsafe' package should be the way to break the abstraction layer and adjust things directly when the language doesn't provide a good model.
JSON's just a nightmare though. The inane legacy of UCS2 / UTF16 got baked into Unicode 8, and UTF16 escapes into JSON.
A better route for something like Go IMO is to move to a compacting collector, this would allow them to move to a bump allocator like Java for super fast allocations and would make deallocation effectively "free" by only moving live objects. They may need to make it generational so they aren't constantly moving long lived objects, but that is a memory vs cpu trade off (could be one more GC flag?). If I recall, the previous objection was because of CGo, which would require pinning (since C wouldn't tolerate moved pointers), but every Go dev I know hates CGo and generally avoids it, plus I see they added "runtime.Pinner" in 1.21 which should solve that I suspect (albeit it would suddenly be required I expect for pointers retained in C). Is anyone aware of what other challenges there are moving to a compacting collector/bump allocator?
Go exposes raw pointers to the programmer, and its current GC is entirely non-moving. Even excluding cgo, I think a moving one would probably break real programs that rely on pointer values.
Yes, there's a case to be made that exposing "real" pointers in a GC'd language was a substantial mistake, but I guess it simplified _some_ parts of FFI. The trade-off so far is maybe fine, but it is a shame that there are certain things that can't be done without introducing substantial new costs. Maybe the compiler could learn to do something suuuper clever like recognize when pointers are being used non-transparently and automatically pin those, but it seems fraught with potential error; a trivial example being stuff like &a[0] (that one's easier to catch, others might not be).
True, I forgot about unsafe package. They would probably have to make it a Go 2 thing and add indirection to raw pointers or a need to "pin" them. Since pinning would already exist for CGo I suspect that would make more sense and wouldn't have performance penalty.
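For reference, the runtime.Pinner API mentioned above (available since Go 1.21) looks roughly like this in use; the actual cgo call is elided:

    package main

    import "runtime"

    // passToC demonstrates pinning an object so its address stays stable
    // while foreign code holds a pointer to it.
    func passToC(obj *[64]byte) {
        var pinner runtime.Pinner
        pinner.Pin(obj)      // the object will not be moved while pinned
        defer pinner.Unpin() // release it once the foreign code is done

        // ... hand the now-stable pointer to C here ...
        _ = obj
    }

    func main() {
        passToC(new([64]byte))
    }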
My guess is that when you measure, an arena is not worth the trouble when you run a generational GC, which essentially uses an arena for the eden space already. And if you have an arena, it's probably very short lived and would otherwise live entirely in eden.
Go's GC is not generational.
It's not, but joining the two comments together: sync.Pool is often close to what you want for a subset of cases, and it's sort of a locality-biased generational storage (without actually providing you strong long-term guarantees that it is that).
Going from something like "Go lacks a builtin arena allocator" to "Go risks becoming the COBOL" is a long stretch. First, Go is slower than C/C++/Rust even without complex memory allocation in the picture. Introducing an arena allocator won't fix that. Second, arena allocation often doesn't work for a lot of allocation patterns. Third, a plain arena allocator is easy to implement when needed. Surely a builtin one would be better, but Go won't fall without it.
Interesting that it never talks about direct competitors to the "middle ground" as well, like Java, C#, Erlang, Haskell, various Lisps, etc.
Not all that interesting when you think about it. Doing so would lead to having to admit that the Go team was right that the proposed arena solution isn't right; that there is a better solution out there. Which defies the entire basis of the blog post. The sunk cost fallacy wouldn't want to see all the effort put into the post go to waste upon realizing that the premise is flawed.
The post could have also mentioned that the Go project hasn't given up. There are alternatives being explored to accomplish the same outcome. But, as before, that would invalidate the basis of the post and the sunk cost fallacy cannot stand the idea of having to throw the original premise into the trash.
(I agree with other commenters' assessment about the importance of the author's complaints, and recommend others check out the Go memory regions proposal.)
For those interested, here's an article where Miguel Young implements a Go arena: https://mcyoung.xyz/2025/04/21/go-arenas/. I couldn't find references to Go's own experimental arena API in this article, which is a shame, since it'd be nice if this knowledgeable author had weighed the two against each other. IIUC, Miguel's version and the Go experimental version do have some important differences even apart from the API. IIRC, the Go experimental version doesn't avoid garbage collection. Its main performance benefit is that the Go runtime's view of allocated memory is decreased as soon as `arena.Free` is called. This delays triggering the garbage collector (meaning it will run less frequently, saving cycles).
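For context, the abandoned experimental API (only available when building with GOEXPERIMENT=arenas, Go 1.20+) looked roughly like this; the record type and batch loop are illustrative:

    // Build with GOEXPERIMENT=arenas; the package never left experimental status.
    package main

    import "arena"

    type record struct {
        ID   int64
        Name string
    }

    func processBatch(n int) {
        a := arena.NewArena()
        defer a.Free() // releases everything allocated below in one step

        recs := arena.MakeSlice[record](a, 0, n)
        for i := 0; i < n; i++ {
            r := arena.New[record](a) // allocated inside the arena
            r.ID = int64(i)
            recs = append(recs, *r)
        }
        // ... use recs before Free; touching arena memory afterwards is invalid ...
    }

    func main() { processBatch(1024) }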
If Go refuses to add complexity to gain performance and cannot engineer its way around the GC, it effectively resigns from the pursuit of the high-performance tier.
I'm completely okay with that. In fact I much prefer it.
Writing high performance code is expensive in any language. Expensive in terms of development time, maintenance cost, and risk. It doesn't really matter what language we are talking about. The language usually isn't the limiting factor. Performance is usually lost in the design stage - when people pick the wrong strategies for solving a particular problem.
Not all code needs to be as fast as it can be. The priority for any developer should always be:
1. Correct
2. Understandable
3. Performant
If you haven't achieved 1, then 2 and 3 don't matter. At all. If you haven't achieved 2, then the lifetime cost and risk introduced by your code may not be acceptable. When I was inexperienced I only focused on 3. The code needed to be fast. I didn't care if it was impossible for others to maintain. That works if you want no help. Ever. But that isn't how you create lasting value.

Good programmers achieve all three and respect the priority. The programmers you don't really want on your team only focus on 3. Their code will be OK in the short term, but in the long term it tends to be a liability. I have seen several commercial products have to rewrite huge chunks of code that was impenetrable to anyone but the original author. And I have seen original authors break under the weight of their own code because they can no longer reason about what it does.
Go tries to not be complex. That is its strength. Introducing complexity that isn't needed by the vast majority of developers is a very bad idea.
If I need performance Go can't deliver there are other languages I could turn to. So far I haven't needed to.
(From the other comment I surmise that there are plenty of tricks one can use in Go to solve scenarios where you need to resort to trickery to get higher performance for various cases. So it seems that what you are asking for isn't even needed)
I like the priorities.
I think a core thing that's missing is that code that performs well is (IME) also the simplest version of the thing. By that, I mean you'll be:
- Avoiding virtual/dynamic dispatch
- Moving what you can up to compile time
- Setting limits on sizing (e.g. if you know that you only need to handle N requests, you can allocate the right size at start up rather than dynamically sizing)
Realistically for a GC language these points are irrelevant w.r.t. performance, but by following them you'll still end up with a simpler application than one that has no constraints and hides everything behind a runtime-resolved interface.
I generally don't worry too much about static vs dynamic dispatch. Not that I use a lot of interfaces all over the place, but there are certain places where I do (for instance persistence layer abstraction - where it doesn't actually matter since any overhead caused by that is many orders of magnitude smaller than the cost of what the call does anyway)
Also, if someone can understand the code, they can optimize it if needed. So in a way, trying to express oneself clearly and simply can be a way to help optimization later.
>If you choose lower-level languages like Rust, your team will spend weeks fighting the borrow checker, asynchronicity, and difficult syntax.
It's interesting the author decides to describe Rust in this way, but then spends the next 90% of the article lambasting the Go authors for having the restraint to not turn Go into Rust.
Arenas are simple to write, and if you need one, there are a lot of implementations available. If you want the language to give you complete flexibility on memory allocations then Go is the wrong language to use. Rust and Zig are right there, you pay upfront from that power with "difficult syntax".
Implicit context [1] was one of the coolest features of a programming language I’ve ever seen that no one has ever implemented. And I’m really not sure why. Not just Go but most languages have this context passing problem with varying degrees of solution quality, making this implicit and built in could have opened up so many possibilities, more than just arenas.
As someone already mentioned, Odin does have an implicit context, and I do think it's a good idea here. https://odin-lang.org/docs/overview/#parameters
Doesn't Scala have this? It sounds like a good idea to me too, but I haven't had a chance to try it for real myself and I've heard other people say it's a bad thing.
But maybe it's like exceptions, where people get involved with a project originally written by people who misused all sorts of language constructs and came away thinking the language was awful, or don't learn idiomatic usage or something.
Clojure has had dynamic vars since the beginning (2007?). Johnathan probably got it from elisp though.
Philosophical question, but after reaching critical mass, should languages even aspire to more? I.e. do you risk becoming "master of none"? What's wrong with specialist languages? I.e. best of breed vs best of suite?
I agree with the author that Go is getting squeezed, but it has its use cases. "COBOL of cloud native" implies it's not selected for new things, but I reach for it frequently (Go > Java for "enterprise software" backends, Go > others for CLI tools, obviously cloud native / terraform / CI ecosystem, etc.).
However in "best of suite" world, ecosystem interop matters. C <> Go is a pain point. As is WASM <> Go. Both make me reach for Rust.
> should languages even aspire to more?
Some should, maybe. But Go said right from day one that it doesn't aspire to be anything more than a language that appears dynamically-typed with static-type performance for the creation of network servers. It has no reason to. It was very much built for a specific purpose.
It has found other uses, but that was a surprise to its creators.
> Go is getting squeezed
Is it? I don't really see anything new that is trying to fill the same void. There are older solutions that are still being used, but presumably more would use them if Go hadn't been invented. So it is Go doing the squeezing, so to speak.
Look at PHP. Every year people say PHP got much better than it was in the dark ages.
Yes, it got rid of its rough edges. People look at it positively mostly because it has become more similar to mainstream OOP languages. But it has no identity anymore. It is still simpler for the web than most competitors, but it doesn't matter because you install 30 packages for a hello world anyway. The community doesn't want simplicity, they want easy, with glorious-looking code.
The irony is that PHP is perceived as more attractive by coders, but it's so generic now that a newbie is unlikely to choose it.
Newbies want a compelling catchphrase:
C: So powerful you can shoot your foot off!
Rust: Now that you've shot your foot off, let's not do that a second time.
Javascript: It runs on the server and in the browser.
Typescript: It runs on the server and in the browser, now with types!
In contrast,
PHP: Not sure if I want to be a templating language or general-purpose programming language.
PHP is: see your changes by refreshing your browser. At least that was its initial appeal.
Not the ability to mix SQL injection vulnerabilities into the middle of your HTML?
Regardless, you’re thinking of Perl/CGI. PHP did attract the Perl crowd away from Perl, but it wasn’t for that reason. That was already the norm.
> One concern was that Arenas introduced “Use-After-Free” bugs, a classic C++ problem where you access memory after the arena has been cleared, causing a crash.
In Rust, can the lifetime of objects be tied to that of the arena to prevent this?
Asking as a C/C++ programmer with not much Rust experience.
Yes, or rather, the lifetime of references to the contained objects can be tied to the lifetime of references to the arena. E.g., the bumpalo crate [0] has two relevant methods, Bump::alloc(), which puts a value into the arena and gives you back a reference, and Bump::reset(), which erases everything from the arena.
But Bump::reset() takes a &mut self, while Bump::alloc() takes a &self reference and gives back a &mut T reference of the same lifetime. In Rust, &mut references are exclusive, so creating one for Bump::reset() ends the lifetime of all the old &self references, and thus all the old &mut T references you obtained from Bump::alloc(). Ergo, once you call Bump::reset(), none of the contained objects are accessible anymore. The blogpost at [2] gives a few other crates with this same &self -> &mut T pattern.
Meanwhile, some crates such as slab [1] effectively give you a numeric key or token to access objects, and crates differ in whether they have protections to guarantee that keys are unique even if objects are removed. All UAF protection must occur at runtime.
[0] https://docs.rs/bumpalo/3.19.0/bumpalo/struct.Bump.html
I wish Odin could gain more traction
I didn't realize odin had a similar threading model to go with built-in channels, that's pretty neat. Odin might be my next toy language
It’s a great little language. I just wish it had a bigger standard library.
Bigger?! What more do you need?! There are also other things that are on the way as well.
I am a bit confused about the API pollution issue with arenas. I think it's a valid point to think about, but at the same time I don't think the average dev will take any extra steps just to do the faster thing.
I would like to see a reference to the place/proposal where the Go team has actually rejected the idea of arenas. I have never seen this in their issues.
I wonder whether it would be possible to retrofit Arena allocation transparently (and safely!) onto a language with a moving GC (which IIUC Go currently is not):
You could ask the programmer to mark some callstack as arena allocated and redirect all allocations to there while active and move everything that is still live once you leave the arena marked callstack (should be cheap if the live set is small, expensive but still safe otherwise).
Sure, you drop an active arena pointer into TLS and allocate out of that then pop and free it once you pop the stack. Producing API guarantees that all incoming references are dead before you do that though, that's the real trick.
>Instead of asking the runtime for memory object-by-object, an Arena lets you allocate a large pool of memory upfront. You fill that pool with objects using a simple bump pointer (which is CPU cache-friendly), and when you are done, you free the entire pool at once
>They have been trying to prove they can achieve Arena-like benefits with, for example, improved GC algorithms, but all have failed to land
The new Green Tea GC from Go 1.25 [0]:
Instead of scanning objects we scan whole pages. Instead of tracking objects on our work list, we track whole pages. We still need to mark objects at the end of the day, but we’ll track marked objects locally to each page, rather than across the whole heap.
Sounds like a similar direction: "let's work with many objects at once". They mention better cache-friendliness and all.

Isn't a memory arena an application-level issue? Like with Arrow, I can memory-map a file and expose a known range to an array as a buffer.
Sure, but I think the problem is there is an existing paradigm of libraries allocating their own memory. So you would need to pass allocators around all over the place to make it work. If there was a paradigm of libraries not doing allocations and requiring the caller to allocate this wouldn't be such an issue.
> I think the problem is there is an existing paradigm of libraries allocating their own memory.
That is a problem, and the biggest reason for why the arenas proposal was abandoned. But if you were willing to accept that tradeoff in order to use the Go built-in arenas, why wouldn't you also be willing to do so for your own arenas implementation?
> If there was a paradigm of libraries not doing allocations and requiring the caller to allocate this wouldn't be such an issue.
I suppose that is what was at the heart of trying out arenas in an "official" capacity: to see if everyone with bespoke implementations updated them to use a single Go-blessed way to share around. But there was no sign of anyone doing that, so maybe it wasn't a "big miss" after all. Doesn't seem like there was much interest in collaborating on libraries using the same interface. If you're going to keep your code private, you can do whatever you want.
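Worth noting that the standard library already has an idiom for the caller-allocates style: append-based APIs such as strconv.AppendInt. A minimal sketch of a library function written that way (the AppendRecord function and record type are illustrative):

    package main

    import (
        "fmt"
        "strconv"
    )

    type record struct {
        ID   int64
        Name string
    }

    // AppendRecord encodes r onto dst and returns the extended slice, in the
    // style of strconv.AppendInt: the caller owns the buffer and decides how
    // (and from which pool or arena) it was allocated.
    func AppendRecord(dst []byte, r record) []byte {
        dst = strconv.AppendInt(dst, r.ID, 10)
        dst = append(dst, ',')
        dst = append(dst, r.Name...)
        return append(dst, '\n')
    }

    func main() {
        buf := make([]byte, 0, 1024) // one allocation, reused across records
        for i := 0; i < 3; i++ {
            buf = AppendRecord(buf[:0], record{ID: int64(i), Name: "x"})
            fmt.Print(string(buf))
        }
    }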
Go now has memory regions, an automatic form of arenas: https://go.googlesource.com/proposal/+/refs/heads/master/des...
I think the deeper issue is that Go's garbage collector is just not performant enough. And at the same time, Go is massively parallel with a shared-everything memory model, so as heaps get bigger, the impact of the imperfect GC becomes more and more noticeable.
Java also had this issue, and they spent decades on tuning collectors. Azul even produced custom hardware for it, at one point in time. I don't think Go needs to go in that direction.
> The real reason was the “Infectious API” problem. To get performance benefits, you can’t just create an arena locally; you have to pass it down the call stack so functions can allocate inside it. This forces a rewrite of function signatures.
Sorry, but it doesn't seem that difficult (famous last words). Add a new implicit parameter to all objects just like "this" called "thisArena". When a call to any allocation is made, pass "thisArena" implicitly, unless something else passed explicitly.
That way the arena is viral all the way down and you can create sub-arenas. It also does not require actually passing the arena as parameter.
You don't even need to rewrite any new code, just recompile it.
That design introduces two kinds of overhead at runtime:
- You need a pointer to the allocator (presumably you’d want to leave room for types beyond arenas). That’s 8 bytes of extra size on every object.
- You need to dynamically dispatch on this pointer for every allocation. Dynamic dispatch is a lot cheaper than it used to be on modern architectures, but it’s a barrier for basic inlining, which is a big deal when alloc() for an arena is otherwise just a pointer bump.
At the end of the day there has to be a tradeoff between ease of use and performance. Having spent a lot of time optimizing high throughput services in go, it always felt like I was fighting the language. And that's because I was... sure they could add arenas but that just feels like what it is, a patch over the fact you're working alongside a GC.
It's more like fighting ideology. Each language goes a long way to teach its idiomatic ways, but when it comes to performance most languages break down. Writing fast code makes you feel dirty, but the fault is in the constant signalling of DON'T DO THAT.
What's to prevent someone from implementing arenas in the user space as a stand alone module?
Nothing but the benefit is limited if you can’t pass the arena to functions doing lots of allocating.
>If you choose TypeScript or Python, you’ll hit a performance wall the moment you venture outside of web apps, CRUD servers, and modeling.
This really isn't very accurate. It is for Python, but JavaScript is massively performant. It's so performant that you can write game loops in it provided you work around the garbage collector, which, as noted, is a foible golang shares.
The solution is the same, to pre-allocate memory.
Even Python is kind of debatable, if PyPy had a bit more of mainstream love.
What are the reasons why PyPy hasn't caught on? I know about PyPy for ages, but I still haven't given it a try, I still feel the aftertaste of anaconda...
The biggest issue has been that CPython exposes its internals to native libraries, thus since many Python libraries are actually thin bindings to native libraries, this reduces the interest in using PyPy.
There is now new ABI proposal that should work across Python implementations, proposed by PyPy, but the uptake seems slow.
https://discuss.python.org/t/c-api-working-group-and-plan-to...
https://doc.pypy.org/en/latest/extending.html
With a good enough JIT, native libraries wouldn't be needed to the extent they are.
> It’s brilliant code, but it’s not the kind of Go most teams write or can maintain.
Minimizing allocation inside a loop is not a huge insight, nor very rare in any language including python.
Frankly, it’s not a lack of arenas that is holding Go back. It’s the fact that, in 2025, we have a language with a runtime that is neither generational nor compacting. I can’t trust the runtime to perform well, especially in memory-conscious, long-running programs.
Arenas are one of those patterns that are very easy to underestimate. I didn't know about them when I started programming and I ran into a huge performance issue where I needed to deallocate a huge structure (sometimes tens of GBs consisting of millions of objects) just to make a new one. It was often faster to kill the process and start a new one, but that had other downsides. At some point we added a simple hand-written arena-like allocator and used it along with malloc. The arena was there for objects on that big structure that would all die at the same point, and malloc was for all the other things.
The speed-up was impossible to measure because deallocation that used to take up to 30 seconds (especially after repeat cycles of allocating/deallocating) was now instant.
Even though we had very little experience it was trivial to do in C. IMO it's critical for a performance-oriented language to make using multiple allocators convenient. GC is a known performance killer, but so is malloc in some circumstances.
The author is confused about how performance tuning works. Step one, get it right. Step two, see if it's fast enough for the problem at hand.
There is almost never a step three.
But if there is, it's this: Step three: measure.
Now enter a loop of "try something, measure, go to step 2".
Of the things you can try, optimizing GC overhead is but one of many options. Arenas are but one of many options for how to do that.
And the thing about performance optimizations is that they can be intensely local. If you can remove 100% of the allocations on just the happy path inside of one hot loop in your code, then when you loop back to step two, you might find you are done. That does not require an arena allocator with global applicability.
Go gives realistic programmers the right tools to succeed.
And Go's limitations give people like the author plenty of ammunition to fight straw men that don't exist. Tant pis.
One question that always plagues me when we talk about mixing manual and automatic memory systems is... how does it work? If we have a mixed graph of automatic and manual objects, it seems like we don't have a choice except to have garbage collection enabled for everything and make a new root (call it the programmer) that keeps track of whether or not the object has been explicitly freed.
since we still have the tracing overhead and the same lifetimes, we haven't really gained that much by having manual memory.
D's best take at this is a compile-time assert that basically forbids us from allocating GC memory in the affected region (please correct me if I'm wrong), but that is pretty limited.
does anyone else have a good narrative for how this would work?
There are many automatic memory management systems ranging from the simple clearup of immutable systems (https://justine.lol/sectorlisp2/), to region allocation, to refcounting with cycle collection, and the full-fat tracing.
I'd have thought that allocating a block of memory per-GC type would work. As-per Rust you can use mainly one type of GC with a smaller section for eg. cyclic data allocated in a region, which can be torn down when no longer in use.
If you think about it like a kernel, you can have manual management in the core (eg. hard-realtime stuff), and GC in userland. The core can even time-slice the GC. Forth is particularly amenable as it uses stacks, so you can run with just that for most of the time.
I know there have been solutions in the Java world for >20 years, though I can't comment on how well they work in practice.
From a quick search, _An Implementation of Scoped Memory for Real-Time Java_ (https://people.csail.mit.edu/rinard/paper/emsoft01.pdf) provides a decent overview:
> Real-Time Java extends this memory model to support two new kinds of memory: immortal memory and scoped memory. Objects allocated in immortal memory live for the entire execution of the program. The garbage collector scans objects allocated in immortal memory to find (and potentially change) references into the garbage collected heap but does not otherwise manipulate these objects.
> Each scoped memory conceptually contains a preallocated region of memory that threads can enter and exit. Once a thread enters a scoped memory, it can allocate objects out of that memory, with each allocation taking a predictable amount of time. When the thread exits the scoped memory, the implementation deallocates all objects allocated in the scoped memory without garbage collection. The specification supports nested entry and exit of scoped memories, which threads can use to obtain a stack of active scoped memories. The lifetimes of the objects stored in the inner scoped memories are contained in the lifetimes of the objects stored in the outer scoped memories. As for objects allocated in immortal memory, the garbage collector scans objects allocated in scoped memory to find (and potentially change) references into the garbage collected heap but does not otherwise manipulate these objects.
> The Real-Time Java specification uses dynamic access checks to prevent dangling references and ensure the safety of using scoped memories. If the program attempts to create either 1) a reference from an object allocated in the heap to an object allocated in a scoped memory or 2) a reference from an object allocated in an outer scoped memory to an object allocated in an inner scoped memory, the specification requires the implementation to throw an exception.
There are some interesting experiments going on in the OCaml world that involve what they call 'modes', essentially a second type system for how a value is used separate from what it is. One goal of modes is to solve this problem. It ends up looking a bit like opting-in to a Rust-style borrow-checker for the relevant functions
They could probably learn one or two things on how Java and .NET do arenas, just saying.
They did. That is how they learned arenas are the wrong abstraction and why the project is now looking at memory regions instead.
Doesn't look like it, in the end it will be like generics, half way there.
Looks like the opposite of generics. Go's generics story is intrinsically linked to Java. It was the Java team that told the Go team to not implement generics until they were perfectly satisfied with the solution, and it was the same guy who ultimately designed both Java's and Go's generics. You cannot take a closer look at Java's generics than that.
Except the Go's implementation is not as capable as Java's one.
Philip Wadler delivered a design within Go's team goals.
The Java team told nothing to Go's team; they have acknowledged their anti-generics bias.
". They are likely the two most difficult parts of any design for parametric polymorphism. In retrospect, we were biased too much by experience with C++ without concepts and Java generics. We would have been well-served to spend more time with CLU and C++ concepts earlier."
https://go.googlesource.com/proposal/+/master/design/go2draf...
> Except the Go's implementation is not as capable as Java's one.
I have no idea what you think is on the other side of that exception. Please clarify.
No need to outsource words. If I wanted to converse with Leapcell, I'd go to him directly.
Especially when it doesn't even begin to address the concern. The disconnect is not in where Go generics are limited, but where you find the exception. There is nothing that I can find to "except against". What was "Except" in reference to?
Not sure about .NET, but Java doesn't have arenas..
It’s this simple in .NET
ArrayPool<T>

Will elements of the ArrayPool still be tracked by the GC, with overhead?
Depends on the T.
.NET has value types, explicit stack allocation, low level unsafe programming C style, and manual memory management as well.
java.lang.foreign.Arena
My understanding is that that Arena allows you to allocate memory segments, but you can't do much with them; you can't allocate a var or object on it like in C++ for example, so it's almost useless.
You certainly can, as they were designed as JNI replacement, with the goal to fully support the C ABI of the host platform.
You can either do the whole boilerplate manually with Panama set of APIs, or write a C header file and let jextract do the work of boilerplate generation.
Back in the 1960s, my parents were one of the "first generation" of what we'd call "sport climbers" now. They and their friends climbed all over Scotland, and once they'd done that they climbed in Italy and Austria. They packed everything in, and they packed it all back out, camping, bothying, and bivvying in all conditions.
They and their friends spoke disdainfully of the "short toothbrush brigade". These were the climbers who sawed the handles off their toothbrushes, to save like four grammes in their backpack weight. Massively inconveniencing themselves but they sure were a teaspoon lighter!
This feels like that. Really do you think that playing childish pranks on the garbage collector is going to speed up anything? Pick a faster sorting algorithm or something.