How fast is a macOS VM, and how small could it be?
eclecticlight.co | 249 points by moosia a day ago
>Starting with 4 virtual cores and 8 GB vRAM, where the VM ran perfectly briskly with around 5 GB of memory used, I stepped down to 3 cores and 6 GB, to discover that memory usage fell to 3.9 GB and everything worked well. With just 2 cores and 4 GB of memory only 3.1 GB of that was used, and the VM continued to handle those lightweight tasks normally.
Good reminder that there's a certain amount of memory tied up with each core (probably mainly page cache and concurrency handling etc).
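To put the quoted figures side by side, here is a rough sketch that computes what share of the allocation was in use at each step. The three data points come from the quote; the function name and the derived columns are only illustrative. Cores and memory were reduced together in every step, so these numbers alone cannot show which of the two drives the change.

```python
# (virtual cores, allocated GB, used GB) as given in the quoted excerpt.
RUNS = [(4, 8, 5.0), (3, 6, 3.9), (2, 4, 3.1)]


def summarize(runs):
    """Return one dict per run with the used fraction and used GB per core."""
    rows = []
    for cores, allocated_gb, used_gb in runs:
        rows.append(
            {
                "cores": cores,
                "allocated_gb": allocated_gb,
                "used_gb": used_gb,
                "used_fraction": used_gb / allocated_gb,
                "used_per_core_gb": used_gb / cores,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(
            f"{row['cores']} cores, {row['allocated_gb']} GB allocated: "
            f"{row['used_gb']} GB used "
            f"({row['used_fraction']:.1%} of allocation, "
            f"{row['used_per_core_gb']:.2f} GB per core)"
        )
```

The used share of the allocation rises as the VM shrinks, which suggests a fixed floor plus an elastic part rather than a clean per-core cost.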
As a general rule, the amount of physical memory installed in a computer should also be proportional to the number of hardware threads provided by its CPU.
Besides the fact that the operating system may allocate some memory for each thread, when you launch a multi-threaded application that is able to use all available threads, for instance the compilation of a big software project, it will frequently allocate working memory in proportion to the number of worker threads.
I have encountered many multi-threaded applications that need up to 2 GB per thread to work well.
This corresponds to having 64 GB for a desktop CPU with 32 threads, like Ryzen 9 9950X.
For the compilation example, I have seen software projects, like Chrome/Chromium and its derivatives, where if you do not have enough memory, proportional to the number of hardware threads, e.g. when you have only 32 GB for a 16 core/32 thread CPU, you must reduce the number of concurrent compilations, e.g. with an appropriate parameter to "make -j", leaving some threads and cores idle, because otherwise you may encounter out-of-memory errors.
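A minimal sketch of that rule of thumb, assuming you already have a rough per-job memory estimate (the 2 GB figure is the one mentioned above, not a measured value for any particular project). The function name `max_parallel_jobs` is hypothetical; the result is what you would pass to `make -j`.

```python
import os


def max_parallel_jobs(total_ram_gb, gb_per_job, hw_threads=None):
    """Cap the job count by both hardware threads and available memory.

    total_ram_gb: memory you are willing to give the build, in GB.
    gb_per_job:   assumed peak memory of one compile job, in GB.
    hw_threads:   hardware threads; defaults to what the OS reports.
    """
    if gb_per_job <= 0:
        raise ValueError("gb_per_job must be positive")
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1
    jobs_by_memory = int(total_ram_gb // gb_per_job)
    # Always allow at least one job, even if the estimate exceeds the RAM.
    return max(1, min(hw_threads, jobs_by_memory))


if __name__ == "__main__":
    # 32 GB on a 32-thread CPU at 2 GB per job: memory is the limit.
    print("make -j", max_parallel_jobs(32, 2, 32))
    # 64 GB on the same CPU: all threads can be used.
    print("make -j", max_parallel_jobs(64, 2, 32))
```

In practice you would also leave some headroom for the OS and the linker, so treat the result as an upper bound.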
Compiling flash-attn (Flash Attention) is another great stress test for CPU+RAM, as just using 16 threads can already balloon you into 128GB RAM usage territory. Same thing with needing to limit concurrency when compiling it.
I have this problem with NixOS as one of my build servers doesn’t have enough ram. There doesn’t seem to be a way to know if a compilation is likely to be ram heavy and either use a tagged server with more ram or use few threads on servers with less ram.
It's an important point. I went from 4c/8t and 32GB to 16/32 and 96GB. Dramatically less memory per thread. Some software (looking at you, Vivado) can take incredible amounts of memory per parallel job thus mandating some projects can only run with a subset of my cores. At least until I stepped up my work laptop to 10.66 GB/thread. That seems to be manageable
I'd bet for the null hypothesis: the memory behaviour changes would hold if the core count was kept constant and only the VM's memory size was adjusted.
Agreed. This is the OS adapting to available memory.
Similarly if you started with 4GB and there was 900MB available for user apps, I expect you could launch apps that consume 1500MB just fine; the OS is leaving enough to launch anything, and making use of unused memory for cache/etc.
There is a per-cpu data structure in the xnu kernel, but it is not big enough to tilt the scales when you are talking about RAM in units of gigabytes.
It’s not just the kernel. I wouldn’t be surprised if there’s a fair few userspace services spawning a thread per core.
There is some overhead per-core, you're right, but imo this reduction in usage is likely from how the kernel allocates available memory, which is being reduced as well. The kernel will keep read caches around longer with more memory, it'll prefer to compress memory instead of swap to disk if it has more, it'll purge/cleanup reclaimable memory less often with more memory, etc. It even scales its internal buffer sizes and vnode tables depending on total memory.
All good things imo, it dynamically makes the most of what is available, at the expense of making it harder to see a true baseline of hard min requirement to operate.
Fun things to check, `vm_stat`
$ vm_stat
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 230295.
Pages active: 1206857.
Pages inactive: 1206361.
Pages speculative: 31863.
Pages throttled: 0.
Pages wired down: 470093.
Pages purgeable: 18894.
"Translation faults": 21635255.
Pages copy-on-write: 1590349.
Pages zero filled: 11093310.
Pages reactivated: 15580.
Pages purged: 50928.
File-backed pages: 689378.
Anonymous pages: 1755703.
Pages stored in compressor: 0.
Pages occupied by compressor: 0.
Decompressions: 0.
Compressions: 0.
Pageins: 832529.
Pageouts: 225.
Swapins: 0.
Swapouts: 0.
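The counters are in pages, so they need multiplying by the page size from the header to become readable. A small sketch that parses text in the format shown above and converts a few counters to GiB; the function names are made up for illustration, and the sample is a shortened copy of the output above.

```python
import re

SAMPLE = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 230295.
Pages active: 1206857.
Pages wired down: 470093.
"Translation faults": 21635255.
"""


def parse_vm_stat(text):
    """Return (page_size_bytes, {counter_name: value}) from vm_stat text."""
    header = re.search(r"page size of (\d+) bytes", text)
    if header is None:
        raise ValueError("page size not found in vm_stat header")
    page_size = int(header.group(1))
    counters = {}
    for line in text.splitlines():
        # Lines look like 'Pages free: 230295.'; some names are quoted.
        match = re.match(r'^"?([^":]+)"?:\s+(\d+)\.?\s*$', line.strip())
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return page_size, counters


def pages_to_gib(pages, page_size):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


if __name__ == "__main__":
    size, stats = parse_vm_stat(SAMPLE)
    for name in ("Pages free", "Pages active", "Pages wired down"):
        print(f"{name}: {pages_to_gib(stats[name], size):.2f} GiB")
```

Only the "Pages ..." entries are page counts; entries such as translation faults are event counters and should not be converted.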
edit: no code fence markdown support or am I doing something wrong?
Single inline backticks like `this` aren't recognized (although still useful in my opinion, they just don't change the rendering).
Triple backticks also aren't recognized. However, if you indent by I believe 4 spaces, it formats it in a fixed width font presuming it's code.
Let's try (4 spaces):
    func main() {
        fmt.Println("Hello, HN!")
    }
None for comparison:
func main() { fmt.Println("Hello, HN!") }
Seems I missed the window to be able to edit my message, but I'll remember this info for next time, thanks!
Got an M5 Air recently - my first dive into macOS land, so trying to figure this out too.
Seems essentially impossible to get:
* pytorch
* GPU acceleration
* VM/container like isolation
The virtio-gpu layer gets closest but seems to only pass through graphics GPU not compute GPU so no pytorch
I need this too, and looked into it quite a lot a year ago. I haven’t had time to check out the recent developments with Docker Model Runner (vllm-metal) or podman libkrun. Did neither of those work for you?
vllm-metal isn't GPU access but rather an OpenAI-compatible endpoint, which I can already get via the LM Studio endpoint over the network
>podman libkrun
Haven't tried it but research suggests it's really shaky still. podman libkrun exposes vulkan while torch expects mps on macs. Sounds like one can force vulkan but that's apparently slow and beta-ish?
I got torch to run in a Cirruslabs Tart instance.
By "Instance" do you mean their cloud platform?
Nah, just locally on my macair.
TBF, I only got to the point that using device=mps_device didn't fail. I used Sonoma at the time and the image for the vm was ghcr.io/cirruslabs/macos-sequoia-xcode:16.2-beta-3. Python 3.12, as well, because torch didn't work with later versions.
    import torch
    mps_device = torch.device("mps")
    print('device is', mps_device)
    x = torch.ones(1, device=mps_device)
    print(x)
My only experience with VMs on macOS is colima+docker, and it's relatively painful and inefficient (but usable).
Try Apple's container CLI. I moved a project of mine from colima+docker to it relatively easily, a couple of weekends ago.
Here's an example of how to build a simple Alpine Linux container using Apple's containerization CLI. It also demonstrates how to connect to the container through Tailscale SSH using a Tailscale auth key stored in Apple Keychain:
Does this project aim for docker cli and api compatibility? Searching for Docker on that page yields no results. Though in their example, they do show an example of a Dockerfile referencing docker.io without shame.
Typical Apple behavior, I guess, but grating to see in an OSS tool.
This is a weird take, imho. Should they feel shame for using Dockerfiles in their OCI-standard-compliant tool? Would you be happier if they introduced subtly incompatible Applefiles?
Why are they obliged to emulate the Docker CLI? This limits them to just shadowing someone else's product. Just use Docker if you want their CLI/API, it uses the same virtualization framework under the hood on Macs.
I'm curious to know what kind of project is macOS exclusive?
You're surprised that a project by Apple Inc that is basically a wrapper around the Mac virtualisation framework [0] is Mac exclusive?
[0] https://developer.apple.com/documentation/virtualization
container is really good, I've been using it to sandbox some CLI tools and it starts up in less than a second
Recently got a Mac Mini for local CI purposes (together with Forgejo Actions), took a broad look at the ecosystem and decided to just roll with "build on host" instead. Setting up signing/notarization just looked like an insurmountable task together with isolating it from the host, even with agents. At least the macOS builds are really fast now and the signing/notarization just ~200 lines of Bash...
> the signing/notarization just ~200 lines of Bash
200 lines?! That’s two orders of magnitude too many. What exactly are you doing that you need so much code for signing and notarisation?
Off the top of my head, unlocking the keychain, finding the right identity, notarizing two parts, the binary itself and the .dmg that the .app ships in and some other stuff I'm sure. Can do a deeper look in a bit when I can. Most of the hassle is because it's 100% unattended and I had to do stuff to avoid GUI-prompts for passwords/unlocks, and that the Forgejo Runner has a different security context.
> unlocking the keychain, finding the right identity
You don’t need to do that, you can give options to the CLI to define what profile to use.
> Most of the hassle is because it's 100% unattended and I had to do stuff to avoid GUI-prompts for passwords/unlocks
I have a shell function to which I point my code and it compiles, signs, and notarises it without any more intervention, GUI or password prompts, and I’m pretty sure signing and notarising are literally two lines.
Unfortunately I’m not at my computer now or I’d paste them, but from your description that script is definitely too long.
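For reference, a hedged sketch of what the two-step version might look like, wrapped in Python so the commands can be inspected before running. This is not the commenter's script. It assumes a Developer ID identity is already in an unlocked keychain and that a notarytool credential profile was saved beforehand with `xcrun notarytool store-credentials`; the identity string, profile name, and file name are placeholders.

```python
import subprocess


def codesign_command(path, identity):
    """Build the codesign invocation (hardened runtime, secure timestamp)."""
    return [
        "codesign", "--force", "--options", "runtime",
        "--timestamp", "--sign", identity, path,
    ]


def notarize_command(path, keychain_profile):
    """Build the notarytool invocation; --wait blocks until Apple responds."""
    return [
        "xcrun", "notarytool", "submit", path,
        "--keychain-profile", keychain_profile, "--wait",
    ]


def sign_and_notarize(path, identity, keychain_profile, run=subprocess.run):
    """Sign, then submit for notarization. `run` is injectable for dry runs."""
    for cmd in (
        codesign_command(path, identity),
        notarize_command(path, keychain_profile),
    ):
        run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    sign_and_notarize(
        "MyApp.dmg",                                   # placeholder artifact
        "Developer ID Application: Example (TEAMID)",  # placeholder identity
        "ci-profile",                                  # placeholder profile
        run=lambda cmd, check: print(" ".join(cmd)),
    )
```

This covers only the happy path. Unlocking the keychain from a CI runner with a different security context, which is the part described above as the real hassle, is not handled here.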
I assume you're using notarytool but I doubt that it will work unless you have your keychain unlocked
This matches my experience. Keychain + fully unattended increases the complexity and adds a bunch of landmines that need to be dodged (e.g. GUI prompts like you mentioned).
OrbStack is pretty good. I don't find it inefficient, really.
OrbStack is impressive on the performance and energy efficiency fronts. I'm not aware of anything that comes close. But they're doing something funky under the covers. You can't just start any OS in a VM. It has to be somehow mangled to suit their VM. Thankfully NixOS is available so I'm fine for my use cases. It's still remarkable how efficient it is.
Yeah, it's like WSL. It starts just one VM and then your individual "machines" are LXC containers underneath. If you peek at the vendor-supplied file your NixOS OrbStack Machine includes you can see some of it.
They're constantly doing other optimizations in other ways, too. But that's the one you were pointing at, I think.
That's also what Colima does.
OrbStack isn't open-source though and I can't justify buying a license for every single person in my company just for something functionally equivalent but performing better.
These kinds of things should just be provided by Apple as a first-class thing.
https://github.com/trycua/cua/tree/main/libs/lume had an interesting take on this.
> Starting with 4 virtual cores and 8 GB vRAM, where the VM ran perfectly briskly with around 5 GB of memory used
But... if you start applications inside your VM, won't it want the full 8 GB you've allocated, not the 5 GB it uses at startup?
I don’t assume that macOS virtualization is advanced enough to support memory ballooning, or is that not what you’re referring to?
Edit: I stand corrected!
I don't assume anything either, but a single Google search is enough to dispel that [1]
[1] https://developer.apple.com/documentation/virtualization/vzv...
macOS is generally pretty amazing at efficient memory usage and VM (virtual memory subsystem) handling. So even an 8GB machine can run pretty impressive workloads without having the user think the machine is underpowered.
Important caveat: that’s mostly the case for desktop workloads when you’re multitasking a lot, and not as much for server workloads where you actually need all memory.
Not really. Larger page sizes mean more potential for wasted memory and it has had a long standing memory leak in some core component to where even Calculator can cause an OOM event.
GP is pretty accurate in my experience. Up until last year I was still running an Intel MacBook Pro with 8GB of RAM and successfully multitasked with Blender, Illustrator, Unity, VS Code, and Firefox quite often. The math doesn't make sense, but all stayed responsive even with frequent hops between them. The only OOM events I ran into were memory leaks from Firefox, I believe from an extension.
There's nothing particularly interesting about that. Linux distro-of-your-choice can run the equivalents fine, as can Windows.
Browse /r/macos if you dare to wade into the uninformed cesspool; it's full of OOTB apps causing OOMs (among 3rd party apps) with the past at least two major versions of macOS.
I think there is something interesting there. I was running lighter workloads on similar RAM when I daily drove Debian and was frequently brought to my knees by swapping to death. I had to make conscious choices and manage my RAM usage to avoid it, and still occasionally got T-boned by something I overlooked. I have never had to worry about that with macOS.
I admit I don't have much experience with how Windows handles constrained memory since XP, and XP was abysmal at it just by virtue of being far more bloated than an equivalent Linux distro. It's certainly far more bloated nowadays, but maybe it handles memory pressure better.
None of this should be construed to say that macOS doesn't have serious issues or that it's not in dire need of a Snow Leopard-esque "0 new features" release. That's tangential to its memory handling, where I haven't seen the issues you describe.
Even NT4 handles memory pressure better than modern-day Linux. It's just not a fair comparison; Linux has never dealt with userspace OOM well.
As for macOS...
https://old.reddit.com/r/MacOS/comments/1njf1aj/bravo_apple_...
https://old.reddit.com/r/MacOS/comments/1nxh08n/impressive_m...
https://old.reddit.com/r/MacOS/comments/1jo5pnq/passwords_ap...
https://old.reddit.com/r/MacOS/comments/1gkwxe4/how_is_memor...
https://old.reddit.com/r/MacOS/comments/1seq0ij/freeform_has...
There are _plenty_ more. There is some fundamental library leaking given the range of impacted apps.
Seeing there are thousands running those apps (incl. Freeform) without memory leaks, it could be something else at play here.