The agent harness belongs outside the sandbox

mendral.com

104 points by shad42 13 hours ago


zmmmmm - 11 hours ago

I think it omits the real reason I want to run the harness in the sandbox: I barely trust the harness more than the LLM, at least at this point in time. Harnesses are evolving so rapidly, along with the underlying models, that I don't think they're a reasonable component to rely on for safety constraints. Put more precisely: if your harness can do something the LLM can't, and there is a set of conditions under which the LLM can get that capability invoked, you have to assume the LLM will work out those conditions and execute them. Effectively you have an arm of the lethal trifecta, and pretending otherwise is more dangerous than helpful.

Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?). Longer term, I see that as a dedicated security layer, not part of the harness. It has yet to fully emerge, but I picture something like a hypervisor-type layer that sits outside of everything, authorises access based on context, the human user, etc., and can apply policy, including mediating human intervention at decision points when needed.

jdw64 - 11 hours ago

Personally, I find it fascinating to watch how, whenever a new technology appears, people start competing to define and own its standards.

Manus rebuilt its harness five times in six months. The model stayed the same, but the architecture changed five times.

LangChain re-architected Deep Research four times in one year.

Anthropic also ripped out Claude Code’s agent harness whenever the model improved.

Ever since Mitchell Hashimoto mentioned the harness in February, people have been trying to claim that concept. Eventually, someone will probably sell a book called Harness Engineering. I will buy it, of course. Then I will write a blog post about it that nobody reads, with a link that will be buried under ShowDead as soon as I submit it to HN.

And by that point, IT companies will start asking:

“You’re a new grad, right? You know harness engineering, don’t you?”

tptacek - 10 hours ago

There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.
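
To make the proxy variant concrete, here is a minimal sketch (mine, not Fly.io's tokenizer; the placeholder scheme and the X-Upstream-Url routing header are assumptions for illustration): the agent's environment only ever holds a placeholder, and the proxy it routes through swaps in the real secret on the way out.

    # Minimal sketch: the agent only ever sees PLACEHOLDER_GITHUB; this
    # proxy substitutes the real token on the way out, so a prompt-injected
    # agent has nothing worth exfiltrating.
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    REAL_SECRETS = {"PLACEHOLDER_GITHUB": "ghp_real_token_from_vault"}  # illustrative

    class SecretSwappingProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            auth = self.headers.get("Authorization", "")
            for placeholder, real in REAL_SECRETS.items():
                auth = auth.replace(placeholder, real)  # agent never held `real`
            upstream = urllib.request.Request(
                self.headers["X-Upstream-Url"],  # hypothetical routing header
                data=body,
                headers={"Authorization": auth},
                method="POST",
            )
            with urllib.request.urlopen(upstream) as resp:
                self.send_response(resp.status)
                self.end_headers()
                self.wfile.write(resp.read())

    HTTPServer(("127.0.0.1", 8080), SecretSwappingProxy).serve_forever()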

A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.

There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.

[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).

MrDarcy - 11 hours ago

> A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI.

I don’t get it. Calling an API requires a sandbox in most cases. The others could be abused in service of an un-sandboxed agent with API access.

If the harness is outside the sandbox, then the security model and its boundary are just ambiguous and confusing.

Weryj - 2 hours ago

Agreed, this is exactly what we do.

There's no harm in a string, only in the execution.

I create tools as actors, preconfigured for the LLM context (an in-house agent loop). Preconfigured means you set up a tool's environment before it can be executed. If the LLM calls a bash tool, for instance, the tool actor gets invoked and runs that command against an attached remote VM.

Filesystem operations, likewise, are just reads/writes inside a .zip file, which is overlaid onto the target project at build time.
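
If it helps, a rough sketch of that shape (my names, illustrative only): the tool actor is constructed with its environment before the model can call anything, so the model only ever emits strings.

    # Rough sketch of "tools as actors": the environment (a remote VM handle)
    # is bound at construction time, before the LLM can invoke anything.
    import subprocess
    from dataclasses import dataclass

    @dataclass
    class RemoteVM:
        host: str
        def run(self, command: str) -> str:
            # The string becomes execution only here, on the attached VM.
            result = subprocess.run(["ssh", self.host, command],
                                    capture_output=True, text=True)
            return result.stdout + result.stderr

    @dataclass
    class BashToolActor:
        vm: RemoteVM  # preconfigured environment
        def __call__(self, command: str) -> str:
            return self.vm.run(command)

    # The agent loop hands the actor a string; harm is confined to the VM.
    bash = BashToolActor(vm=RemoteVM(host="sandbox-vm"))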

This article is spot on, and I probably say that because it's self-reinforcing.

skybrian - 11 hours ago

They didn't make a clear argument in favor of that architecture and I'm not really convinced.

On exe.dev the agent (Shelley) runs in a Linux VM, which is the security boundary. All the conversations are saved to a sqlite database, and it knows how to read it, so you can refer to a previous conversation in the database. It's also handy for asking the AI to do random sysadmin stuff, since it can use sudo.
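
For flavor, here's roughly what "it knows how to read it" can look like; the path and schema below are my guesses, not exe.dev's actual ones.

    # Illustrative only: querying prior conversations from a sqlite store.
    import sqlite3

    conn = sqlite3.connect("/var/lib/shelley/conversations.db")  # assumed path
    rows = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE conversation_id = ? ORDER BY id",  # assumed schema
        ("conv-42",),
    ).fetchall()
    for role, content in rows:
        print(f"{role}: {content[:80]}")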

A downside is that there's nowhere in the VM where secrets are safe from possibly getting exfiltrated via an injection attack. But they have "integrations" where you can put secrets into an http proxy server instead of having them locally.

Also, you don't need to use AI at all. You can use the VM as a VM.

saltcured - 12 hours ago

Sure, the experimental, agentically-developed code should be tested in a sandbox. This sandbox should contain the damage of the code execution when it goes wrong.

But shouldn't there really be another sandbox where the agentic tool calls execute? This is to contain the damage of the tool execution when it goes wrong.

And the agent harness itself should either implement or be contained in a third sandbox, which contains the damage of the agent. There should be a firewall layer limiting which tool requests the agent can even make, to contain the damage when the agent formulates inappropriate requests.
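
A hedged sketch of what that firewall layer could look like, with an illustrative policy shape: a check that sits between the agent and the tool executor and rejects requests the agent shouldn't even be able to make.

    # Illustrative policy firewall between the agent and tool execution.
    ALLOWED_TOOLS = {"read_file", "run_tests", "search_code"}
    DENIED_PATH_PREFIXES = ("/etc", "/root", "/home/user/.ssh")

    def authorize(tool: str, args: dict) -> bool:
        if tool not in ALLOWED_TOOLS:
            return False
        path = args.get("path", "")
        return not any(path.startswith(p) for p in DENIED_PATH_PREFIXES)

    def dispatch(tool: str, args: dict, executor) -> str:
        if not authorize(tool, args):
            return f"denied: {tool} {args}"  # damage contained at the boundary
        return executor(tool, args)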

The agent also should not possess credentials, so it cannot leak them to the LLM and allow them to be transformed into other content that might leak out via covert channels.

afshinmeh - 5 hours ago

Agreed, and it's a pattern that OpenAI suggested a few days ago, too [1]. I also built a cross-platform, process-level sandboxing tool that uses parts of OpenAI Codex for the same purpose [2].

[1] https://openai.com/index/the-next-evolution-of-the-agents-sd...

[2] https://github.com/afshinm/zerobox

trjordan - 12 hours ago

Nah. Worse is better.

The reason agents work is because they have access to stuff by default. The whole world is context engineering at this point, and this proposal is to intermediate the context with a bespoke access layer. I put the bare minimum into getting my dev instance into a state where I can develop, because doing stuff (and these days: getting my agent to do stuff) is the goal.

This makes slightly more sense if you're building a SaaS and trying to get others to give you access to their code, their documents, and the rest so you can run agents against it. But the easiest, most powerful way is to just hook the agents up to the place that's already set up.

spankalee - 11 hours ago

This is angling in the right direction, but I think it has two problems:

1) It's still assuming agents have CLIs. This is a very developer-centric concept of agents, and it doesn't map well to either consumer or enterprise agents that aren't primarily working with files. Skills, plans, TODO lists, and memory are good, but they don't have to be modeled as raw file access. Many harnesses have tools for them (see the sketch after these two points).

2) It's talking about a singular sandbox. That's not good enough for prompt injection prevention, secure credential management, and limiting the blast radius of attacks.
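
On point 1, a sketch of memory as a tool rather than raw file access; the JSON tool-definition shape is the common one, and all names here are illustrative.

    # Memory exposed as a tool the harness resolves itself; the agent never
    # touches a filesystem path. Tool schema follows the common JSON shape.
    MEMORY_TOOL = {
        "name": "read_memory",
        "description": "Retrieve a stored memory entry by key.",
        "input_schema": {
            "type": "object",
            "properties": {"key": {"type": "string"}},
            "required": ["key"],
        },
    }

    def handle_read_memory(key: str, store: dict) -> str:
        # The harness owns the store; the model only sees key -> value.
        return store.get(key, "")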

NJL3000 - 11 hours ago

Two points:

- What remains unsolved is what an agent should reasonably have access to, in what context, and for how long (etc.).

For probabilistic code that can run far faster than human-driven code, we don't have a great model yet. We should all spend our energy there…

- Separating or putting controls on the FS resource is no different from putting the agent behind a firewall / allow-deny list.

That doesn't invalidate running a sandbox inside a sandbox for better security.

nvader - 10 hours ago

Hey aluzzardi, thanks for sharing this article!

I'm really intrigued by your point on read-memory vs a dedicated read interface, because it is a real insight about success rates in harness design.

How did you come to the conclusion you did? Could you speak a little to the evaluations you ran, or the data or anecdotes you collected to validate that decision?

I'm also curious about the overall framing of the question, which I'll challenge with: does the agent have to have a "where"?

An agent could be modeled by a set of states and transitions. I don't think that there's anything inherently necessary about the current "one process claude" approach for harnesses, other than convenience. Why hasn't a fully distributed harness, built on functions and tables, gained more mindshare?

jFriedensreich - 5 hours ago

The title is highly misleading: they mean the harness belongs outside the sandbox the agent is working in. Please also run the harness in a sandbox; I don't think any of them is safe for a host. That's also the only valuable information in this marketing-noise article. The rest is full of hidden endorsements of VC buddies (why would anyone build on closed-source sandbox abstractions and claim alternatives need one second to boot?) and signs they cannot reason logically (e.g. "we need shared files across users, so the only two options are building a distributed filesystem or storing in a database", later admitting even the database solution needs last-write-wins resolution on top, and completely ignoring that they could just as well have delegated shared writes to an authoritative file server with retry on conflicts, plus git).

deevus - 5 hours ago

I’ve been working on a sandboxing tool that uses Incus. Originally it was only to run LLMs inside a sandbox, but recently I added MCP so that an agent could spin one up and do work that way.

It currently only exposes a rudimentary set of tools which I’d like to expand. The sandboxes created by MCP are generally ephemeral. The daemon will clean them up after an hour of no usage.

But it’s so cool that they get their own IP and you can ssh straight in. I can see that being very useful when you want to share with a colleague and then close your laptop (assuming it’s running on a remote instance).

https://github.com/deevus/pixels
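
For context, the ephemeral lifecycle described above looks roughly like this when driving the incus CLI from Python; the image name and wiring are illustrative, and this is not code from the project linked above.

    # Sketch of the spawn/exec/reap lifecycle via the standard incus CLI.
    import subprocess

    def spawn_sandbox(name: str) -> None:
        subprocess.run(["incus", "launch", "images:debian/12", name], check=True)

    def run_in_sandbox(name: str, command: str) -> str:
        out = subprocess.run(
            ["incus", "exec", name, "--", "sh", "-c", command],
            capture_output=True, text=True, check=True,
        )
        return out.stdout

    def reap_sandbox(name: str) -> None:
        # The daemon would call this after an hour of inactivity.
        subprocess.run(["incus", "delete", name, "--force"], check=True)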

vursekar - 10 hours ago

> Three engineers trigger the agent on the same incident, and they all see stale state until their sessions end. Conflict resolution, eventual consistency, cache invalidation.

Arguably this is a feature, not a bug. Conflict resolution forces a process for coming to agreement on a common source of truth, which is one of the reasons most Git repos don't allow users to push to main directly. Writing directly to a shared memory database seems like it would result in chaos and a host of side effects once the number of users scales.
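
One way to force that agreement is optimistic concurrency on the shared store, so a stale writer fails and has to re-read instead of silently winning; the schema and helper below are illustrative.

    # Compare-and-swap write against a versioned memory table (illustrative).
    import sqlite3

    def write_memory(conn: sqlite3.Connection, key: str, value: str,
                     expected_version: int) -> bool:
        cur = conn.execute(
            "UPDATE memory SET value = ?, version = version + 1 "
            "WHERE key = ? AND version = ?",
            (value, key, expected_version),
        )
        conn.commit()
        # False means another session wrote first: re-read, merge, retry.
        return cur.rowcount == 1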

lwansbrough - 10 hours ago

I had an idea that devs could build wasm modules that would define tools and instructions, and a harness could load them. Kind of like MCP but with certain assurances about the sandboxing. You could build a package manager around these behaviours.

I still kind of think it’s a decent idea but it’s too close to MCP with drawbacks that make it a harder sell than MCP. It’s hard to compete on functionality from a secure sandbox if users decide they don’t care about security.
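
A minimal sketch of loading such a module, assuming the wasmtime Python bindings; the exported "run" entry point is an assumed convention, not an existing package format.

    # Load a wasm tool with no imports: by construction it gets no host
    # access (no filesystem, no network), only pure computation.
    from wasmtime import Engine, Store, Module, Instance

    engine = Engine()
    store = Store(engine)
    module = Module.from_file(engine, "summarize_tool.wasm")  # hypothetical module
    instance = Instance(store, module, [])  # empty imports => fully sandboxed

    run = instance.exports(store)["run"]  # assumed entry-point convention
    result = run(store)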

qudat - 9 hours ago

Interesting idea. Tangentially related: I've been using my local agent to interact with remote shells via zmx, described here: https://bower.sh/zmx-ai-portal

The use case is different, but this article has some vague similarities around an agent API for remotely executing commands.

avipeltz - 6 hours ago

At least for me, one of the major reasons to run an agent in a sandbox is to save memory on my machine when I'm running multiple agents in parallel. Wouldn't this not help with that?

Koffiepoeder - 12 hours ago

Slightly related: I am looking for:

- Easy single command CLI agent spawning with templates

- Automatic context transfer (i.e. a bit like git worktrees)

- Fully containerised, but remote (a bit like pods)

- Central MITM-proxy zero-trust authn/authz management (no keys or credentials inside the agents); instead, enrichment in the hypervisor/encapsulation layer

- Multi agent follow-up functionalities

- Fully self hosted/FOSS

Basically a very dev-friendly, secure, "kubernetes"-like solution for running remote agents.

Does anyone have an idea of how to achieve this, or potential technologies?

blcknight - 12 hours ago

I'm not sure anyone knows what a harness is at this point. I've heard 17 different definitions of it. It's almost like a buzzword in search of a problem.

sudb - 11 hours ago

Is secretly rerouting reads/writes/edits of skills and memory any easier than just dumping the actual skills and memory files on disk at sandbox startup?
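
For comparison, the "just dump the files" approach is a few lines; paths here are illustrative.

    # Seed the sandbox with skills/memory at creation time instead of
    # intercepting reads at runtime (illustrative paths).
    import shutil

    def seed_sandbox(sandbox_root: str) -> None:
        shutil.copytree("/srv/agent/skills", f"{sandbox_root}/skills")
        shutil.copytree("/srv/agent/memory", f"{sandbox_root}/memory")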

Another benefit of moving the harness outside the sandbox is that you avoid accidentally creating a massive distributed system, so you don't have to think so much about events/communication between your main API and your sandboxes.

pamcake - 8 hours ago

The agent harness needs different sandbox(es) with different privileges. Nothing here justifies leaving its access uncontained. It's a mistake to think and talk about "the sandbox" in the way the article does.

solidasparagus - 12 hours ago

Why are two concurrent sessions updating the same memory key with different values? IMO it probably points to a fundamental flaw in how memory is being thought about and built.

moron4hire - 4 hours ago

The idea of physically separating the agent harness from the artifacts you want it to work on seems to be the wrong answer to the right problem, and a symptom of something worrying in all the agentic workflow work I see in the wild: laziness about how the harness and its available tools are developed. Specifically, there seems to be a desire to build agent tools in an incredibly naive and simplistic way that ignores access control.

My company is not a "sophisticated" software development organization. We're 3000 people, of whom 2000 do nothing with AI, 900 naively jump on every AI bandwagon that comes along (or rather, parrot whatever they read on the Internet to complain about how hard their job is, despite it not having materially changed in the last 15 years), 90 are capable of implementing anything involving AI beyond "just use Claude," and 10 have the experience to really scrutinize what is going on. And our work is of a nature where scrutinizing exactly how results occurred, and from what data, is essential. There are regulations and compliance issues that could land people in prison if we don't and the results are eventually proved to involve inappropriate data. What does that mean? I'll just say we primarily work for DoD.

I have long experience with managers asking for the moon and not listening when the engineering staff raises red flags. They ask us, "Why can't we just do X?", where X is whatever they've recently read about in some MBA-targeted publication bought and paid for by the service provider profiting off of X, with no skin in the game regarding the nuance, because the relevant regulations are written to scapegoat the person in the chair bashing the keys rather than the person making the decisions and holding the former's job over their head. "Why can't we just do X" is not a good-faith question; it's a statement that you need to shut up about your concerns and just do X.

But out of desperation/malicious compliance, I've started developing an agentic harness that can "just get AI to do it" for the data sources on which we work. And I've noticed two things: A) agent harnesses are not that hard to write (honestly, anyone with basic programming competency should be able to do it), and B) they can only ever work on what you give them. I suppose the last point should be obvious, but I've had enough conversations with folks who expect magic that it is clear that it is not actually obvious.

And that's where I get into "extant agent work is lazy." The agent harness I've developed is incapable of accessing data its operator should not have. If you are cleared to only see a subset of the universe of data, then running this harness cannot possibly give you access to more than your clearance. I'm not trying to brag here, because this was not a difficult guarantee to make. In developing the harness and giving it tools to do work, I just developed the same access controls I would have done for a human user accessing an API to the same data. The only thing that is different about my approach is that I didn't use an off-the-shelf harness with tools developed by others. I just wasn't lazy about my job.

My key stakeholder was skeptical that I was able to do this, mostly because he has subconsciously intuited that our organization is not very sophisticated at developing software. He doesn't understand that employing AI isn't magic, and I think that is the case for a lot of the people who use AI the most here. They see products like Claude go to work and think there is some special sauce that requires a "frontier" AI firm to actualize. But the truth is, the more you develop agentic AI capability, the less AI you are actually employing. The AI eventually becomes just an orchestrator of tools that perform work by not-AI means. If you are lazy, you lean on naive tool implementations that let the AI do whatever "it wants." And that's where you get into trouble. But if you show up to your job and are diligent about implementing those tools, there is no possible way the AI can screw you over, because you never gave it unrestricted access to curl or `rm -rf`.

This is why, even if AI does become a permanent fixture of software development (still not convinced, even after all this experience), you're still not getting rid of us software engineers, despite how much you hate us. You still need us to protect your data, and nothing about AI has changed the equation that ends in "data is king." If anything, it's more important than ever.

Edit: I'm specifically developing a multi-user agent, accessed via a Web application over a shared database. Row-level access control is baked into every tool and I can do this with little effort because dependency injection Is A Thing. Thus, the parameters of access control never even reach the AI.
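
To make that concrete, a hedged sketch of the pattern described here (my names and schema, illustrative only): the operator's identity is injected at tool construction time, so the filter is part of the query and never appears in anything the model controls.

    # Row-level access control baked into the tool via dependency injection.
    import sqlite3
    from dataclasses import dataclass

    @dataclass
    class UserContext:
        user_id: str
        clearance: int  # resolved from the web session, never from the model

    @dataclass
    class RecordSearchTool:
        db: sqlite3.Connection
        user: UserContext  # injected; not a parameter in the tool's schema

        def __call__(self, search_term: str) -> list:
            # The clearance filter is fused into the query; the LLM cannot
            # widen it because it never sees or sets these values.
            return self.db.execute(
                "SELECT title FROM records "
                "WHERE title LIKE ? AND clearance_level <= ?",
                (f"%{search_term}%", self.user.clearance),
            ).fetchall()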

Retr0id - 12 hours ago

It took me a while to grok why this made any sense; I think the context is that this is for hosting many agents as a service.

8thcross - 12 hours ago

we are running a harness outside the sandbox, inside a sandbox.
