Monty: A minimal, secure Python interpreter written in Rust for use by AI

github.com

82 points by dmpetrov 4 hours ago


simonw - 2 hours ago

I got a WebAssembly build of this working and fired up a web playground for trying it out: https://simonw.github.io/research/monty-wasm-pyodide/demo.ht...

It doesn't have class support yet!

But it doesn't matter, because LLMs that try to use a class will get an error message and rewrite their code to not use classes instead.

Notes on how I got the WASM build working here: https://simonwillison.net/2026/Feb/6/pydantic-monty/
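
A minimal sketch of the error-and-rewrite loop described above, under loud assumptions: `run_restricted` is a stand-in that rejects `class` via the standard `ast` module and then calls plain `exec` (it is not Monty's API and not a sandbox), and `fake_model` is a hypothetical placeholder for an LLM call.

```python
import ast
from typing import Optional


class UnsupportedSyntax(Exception):
    """Raised when a snippet uses a construct the stand-in interpreter rejects."""


def run_restricted(source: str) -> dict:
    """Stand-in for a minimal interpreter: refuse classes, then run the code."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            raise UnsupportedSyntax(
                f"line {node.lineno}: class definitions are not supported"
            )
    namespace: dict = {}
    # Assumption: plain exec is used only for illustration; it is NOT a sandbox.
    exec(compile(tree, "<snippet>", "exec"), namespace)
    return namespace


def fake_model(prompt: str, error: Optional[str]) -> str:
    """Hypothetical LLM: writes a class first, rewrites once it sees an error."""
    if error is None:
        return (
            "class Adder:\n"
            "    def add(self, a, b):\n"
            "        return a + b\n"
            "result = Adder().add(2, 3)\n"
        )
    return "def add(a, b):\n    return a + b\nresult = add(2, 3)\n"


def run_with_retries(prompt: str, max_attempts: int = 3):
    """Feed each error message back to the model until a snippet runs."""
    error = None
    for attempt in range(1, max_attempts + 1):
        source = fake_model(prompt, error)
        try:
            return run_restricted(source)["result"], attempt
        except UnsupportedSyntax as exc:
            error = str(exc)  # becomes part of the next prompt
    raise RuntimeError(f"no runnable snippet after {max_attempts} attempts")


value, attempts = run_with_retries("add 2 and 3")
print(value, attempts)
```

The point of the sketch is only that a clear error message is enough for the second attempt to succeed; a real agent would pass the error text back in its next prompt.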

imfing - 40 minutes ago

This is a really interesting take on the sandboxing problem. It reminds me of an experiment I worked on a while back (https://github.com/imfing/jsrun), which embedded V8 into Python to allow running JavaScript with tightly controlled access to the host environment. The goal is similar: running untrusted code from a Python host.

I’m especially curious about where the Pydantic team wants to take Monty. The minimal-interpreter approach feels like a good starting point for AI workloads, but the long tail of Python semantics is brutal. There is a trade-off between keeping the surface area small (for security and predictability) and providing enough language capability to handle the non-trivial snippets that LLMs generate for complex tasks.

avaer - 2 hours ago

This feels like the time I was a Mercurial user before I moved to Git.

Everyone was using Git for reasons that seemed bandwagon-y to me, when Mercurial just had such a better UX and mental model.

Now, everyone is writing agent `exec`s in Python, when I think TypeScript/JS is far better suited for the job (it was always fast + secure, not to mention more reliable and information dense b/c of typing).

But I think I'm gonna lose this one too.

c2xlZXB5 - an hour ago

Maybe a dumb question, but couldn't you use seccomp to limit the syscalls the Python interpreter has access to? For example, if you don't want it messing with your host filesystem, you could just deny it any filesystem-related system calls. What is the benefit of using a completely separate interpreter?

JoshPurtell - 38 minutes ago

Monty is the missing link that's made me ship my Rust-based RLM implementation - and I'm certain it'll come in handy in plenty of other contexts.

Just beware of panics!

_joel - 2 hours ago

Well I love the name, so definitely trying this out later, but first...

And now for something completely different.

Retr0id - an hour ago

I'm enjoying watching the battle for where to draw the sandbox boundaries (and I don't have any answers, either!)

zahlman - 16 hours ago

> Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.

Perhaps if the interpreter is in turn embedded in the executable and runs in-process, but even a do-nothing `uv` invocation takes ~10ms on my system.
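
A rough sketch of that distinction, assuming stock CPython on both sides (Monty itself is not measured, and the helper names are made up for illustration): it compares spawning a fresh interpreter process with compiling and running a trivial snippet inside the current process. The absolute numbers vary by machine.

```python
import subprocess
import sys
import time


def time_process_start() -> float:
    """Seconds to spawn a fresh CPython process that does nothing."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_in_process() -> float:
    """Seconds to compile and exec a trivial snippet inside this process."""
    start = time.perf_counter()
    exec(compile("x = 1 + 1", "<snippet>", "exec"), {})
    return time.perf_counter() - start


# Take the best of a few runs to reduce noise from caches and scheduling.
spawn = min(time_process_start() for _ in range(5))
inline = min(time_in_process() for _ in range(5))

print(f"new process: {spawn * 1e3:.2f} ms")
print(f"in-process:  {inline * 1e6:.2f} us")
```

Process creation dominates the first number, which is why an embedded, in-process interpreter can plausibly start orders of magnitude faster than anything launched as a separate executable.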

I like the idea of a minimal implementation like this, though. I hadn't even considered it from an AI sandboxing perspective; I just liked the idea of a stdlib-less alternative upon which better-thought-out "core" libraries could be stacked, with less disk footprint.

Have to say I didn't expect it to come out of Pydantic.

krick - an hour ago

I don't quite understand the purpose. Yes, it's clearly stated, but what do you mean by "a reasonable subset of Python code" that "cannot use the standard library"? 99.9% of the Python I write for anything ever uses the standard library and then some (requests?). What do you expect your LLM agent to write without that? A pseudo-code sorting algorithm sketch? Why would you even want to run that?

rienbdj - 2 hours ago

If we’re going to have LLMs write the code, why not something more performant? Like pages and pages of Java maybe?

kodablah - 11 hours ago

I'm of the mind that it will be better to construct more strict/structured languages for AI use than to reuse existing ones.

My reasoning is 1) AIs can comprehend specs easily, especially simple ones, 2) it is only valuable to "meet developers where they are" if you really need the developers' history/experience, which I'd argue LLMs don't need as much (or only need because the lang is so flexible/loose), and 3) human-oriented languages were developed to allow extreme human subjectivity, which is way too much wiggle-room/flexibility (and is why people have to keep writing projects like these to reduce it).

We should be writing languages that are super-strict by default (e.g. down to the literal ordering/alphabetizing of constructs, exact spacing expectations), with only opt-in loose modes for humans and tooling to format. I admit I am toying w/ such a lang myself, but in general we can ask more of AI code generation than we can of ourselves.

dmpetrov - 4 hours ago

I like the idea a lot but it's still unclear from the docs what the hard security boundary is once you start calling LLMs - can it avoid "breaking out" into the host env in practice?

ushakov - 23 minutes ago

without a VM boundary this is potentially dangerous as LLMs could figure out ways of escaping the lightweight sandbox.

falcor84 - 2 hours ago

Wow, a start latency of 0.06ms

OutOfHere - 2 hours ago

It is absurd for any user to use a half-baked Python interpreter, especially one that will always lag well behind CPython in its support. I advise sandboxing CPython with OS features instead.
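
One narrow illustration of the "OS features" route, assuming a POSIX system: the sketch below runs CPython in a child process with kernel resource limits set through the standard `resource` module. To be plain about what it is, this uses rlimits only, not seccomp, namespaces, or a VM, so it caps CPU time but does nothing to restrict filesystem or network access. The helper names are made up for the example.

```python
import resource
import subprocess
import sys


def limit_child() -> None:
    """Runs in the child just before exec: cap CPU time, disable core dumps."""
    resource.setrlimit(resource.RLIMIT_CPU, (1, 2))   # soft 1 s, hard 2 s
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core files


def run_limited(source: str) -> subprocess.CompletedProcess:
    """Run a snippet in a separate, isolated-mode CPython with the limits above."""
    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        preexec_fn=limit_child,  # POSIX only
        capture_output=True,
        text=True,
        timeout=10,              # wall-clock backstop in the parent
    )


ok = run_limited("print(2 + 3)")
runaway = run_limited("while True: pass")

print("well-behaved snippet:", ok.returncode, ok.stdout.strip())
print("runaway snippet exit code:", runaway.returncode)
```

A negative return code means the kernel killed the child with a signal once it exceeded its CPU budget. A real deployment would layer syscall filtering and filesystem isolation on top of limits like these.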