What I learned building an opinionated and minimal coding agent
mariozechner.at
403 points by SatvikBeri 2 days ago
Really awesome and thoughtful thing you've built - bravo!
I'm so aligned on your take on context engineering / context management. I found the default linear flow of conversation turns really frustrating and limiting. In fact, I still do. Sometimes you know upfront that the next thing you're about to do will flood/poison the nicely crafted context you've built up... other times you realise after the fact. In both cases, you don't have many alternatives but to press on... Trees are the answer for sure.
I actually spent most of Dec building something with the same philosophy for my own use (aka me as the agent) when doing research and ideation with LLMs. Frustrated by most of the same limitations - want to build context to a good place then preserve/reuse it over and over, fire off side quests etc, bring back only the good stuff. Be able to traverse the tree forwards and back to understand how I got to a place...
Anyway, you've definitely built the more valuable incarnation of this - great work. I'm glad I peeled back the surface of the moltbot hysteria to learn about Pi.
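As a rough sketch of the tree idea described above (the names and shape here are mine, not how pi or the commenter's tool actually model it): each node owns a slice of conversation, side quests fork child nodes that inherit the parent's context, and only a distilled summary comes back up.

```python
from dataclasses import dataclass, field

@dataclass
class ContextNode:
    """One node in a tree-shaped conversation: a slice of messages plus child branches."""
    messages: list = field(default_factory=list)
    children: list = field(default_factory=list)
    parent: "ContextNode | None" = None

    def fork(self) -> "ContextNode":
        """Start a side quest that inherits the current context but can't pollute it."""
        child = ContextNode(messages=list(self.messages), parent=self)
        self.children.append(child)
        return child

    def merge_back(self, summary: str) -> None:
        """Bring only the distilled result of a side quest back to the parent."""
        if self.parent is not None:
            self.parent.messages.append({"role": "user", "content": summary})

# Build context in the root, fork for an exploration, return only the good stuff.
root = ContextNode()
root.messages.append({"role": "user", "content": "Carefully crafted context..."})
side_quest = root.fork()
side_quest.merge_back("Summary of what the side quest learned.")
```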
> want to build context to a good place then preserve/reuse it over and over, fire off side quests etc, bring back only the good stuff
My attempt - a minimalist graph format that is a simple markdown file with inline citations. I load MIND_MAP.md at the start of work, and update it at the end. It reduces context waste when resuming or spawning subagents. Memory across sessions.
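I don't have the commenter's actual file, but as a sketch of the idea, a mindmap-as-markdown with nodes, edges, and inline citations might look something like this (all node names and paths below are made up):

```markdown
# MIND_MAP

## auth-flow
- issues JWTs via middleware [src/auth/middleware.ts]
- -> session-store: tokens are validated against stored sessions

## session-store
- Redis-backed, 24h TTL [src/session/store.ts:12]
- -> auth-flow: invalidation is triggered from here on logout
```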
This is incredible. It never occurred to me to even think of marrying the memory gather and update slash commands with a mindmap that follows the appropriate nodes and edges. It makes so much sense.
I was using a table structure with column 1 as the key and column 2 as the data, and told the agents to match the key before looking at column 2. It worked, but sometimes it failed spectacularly.
I’m going to try this out. Thanks for sharing your .md!
Very very cool. Going to try this out on some of my codebases. Do you have the gist that helps the agent populate the mindmap for an existing codebase? Your pastebin mentions it, but I don't see it linked anywhere.
The OpenClaw/pi-agent situation seems similar to ollama/llama-cpp, where the former gets all the hype, while the latter is actually the more impressive part.
This is great work, I am looking forward to seeing how it evolves. So far Claude Code seems best despite its bugs, given the generous subscription, but when the market corrects and prices get closer to API prices, paying the per-token premium for an optimized experience will probably be a better deal than suffering Claude Code's glitches and paper cuts.
The realization is that, in the end, an agent framework that is customizable and can be recursively improved by agents is going to beat a rigid proprietary client app.
> but when the market corrects and prices get closer to API prices
I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous. We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think and there is no shortage of funding for R&D.
I also wouldn’t bet on Claude Code staying the same as it is right now with little glitches. All of the tools are going to improve over time. In my experience the competing tools aren’t bug free either but they get a pass due to underdog status. All of the tools are improving and will continue to do so.
> I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous.
I think this is absolutely true. There will likely be caps to stop the people running Ralph loops/GasTown with 20 clients 24/7, but for general use they will probably start to drop the API prices rather than vice-versa.
> We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think
Inference is generally accepted to be a very profitable business (outside the HN bubble!).
Claude Code subscriptions are more complicated of course but I think they probably follow the general pattern of most subscription software - lots of people who hardly use it, and a few who push it so hard that they lose money on them. Capping the usage solves the "losing money" problem.
FWIW, you can use subscriptions with pi. OpenAI has blessed pi allowing users to use their GPT subscriptions. Same holds for other providers, except Flicker Company.
And I'm personally very happy that Peter's project gets all the hype. The pi repo already gets enough vibesloped PRs from openclaw users as is, and it's still only 1/100th of what the openclaw repository has to suffer through.
Good to know, that makes it even better. I still find Opus 4.5 to be the best model currently. But if the next generation of GPT/Gemini closes the gap, that will cross the inflection point for me and make 3rd-party harnesses viable. Or if they jump ahead, that should put more pressure on the Flicker Company to fix the flicker or relax the subscriptions.
Is this something that OpenAI explicitly approves per project? I have had a hard time understanding what their exact position is.
This is basically identical to the ChatGPT/GPT-3 situation ;) You know OpenAI themselves keep saying "we still don't understand why ChatGPT is so popular... GPT was already available via API for years!"
ChatGPT is quite different from GPT. Using GPT directly to have a nice dialogue simply doesn't work for most intents and purposes. Making it usable for a broad audience took quite some effort, including RLHF, which was not a trivial extension.
This is the first I'm hearing of this pi-agent thing and HOW DO TECH PEOPLE DECIDE TO NAME THINGS?
Seriously. Is the creator not aware that "pi" absolutely invokes the name of another very important thing? sigh.
The creator is very aware. Its original name was "shitty coding agent".
then do SCA and backronym it into something acceptable! That's even better lore :)
There's a fair chunk of irony here in that Mario is being both anti-memetic with his naming choices and contrarian in his design decisions, and yet he still finds himself dunked in the muck of popularity as the backbone of OpenClaw.
You mean Software Component Architecture? Do you want to bring down the wrath of IBM!
From the article: "So what's an old guy yelling at Claudes going to do? He's going to write his own coding agent harness and give it a name that's entirely un-Google-able, so there will never be any users. Which means there will also never be any issues on the GitHub issue tracker. How hard can it be?"
> Special shout out to Google who to this date seem to not support tool call streaming which is extremely Google.
Google doesn't even provide a tokenizer to count tokens locally. The results of this stupidity can be seen directly in AI studio which makes an API call to count_tokens every time you type in the prompt box.
AI studio also has a bug that continuously counts the tokens, typing or not, with 100% CPU usage.
Sometimes I wonder who is drawing more power, my laptop or the TPU cluster on the other side.
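For contrast, this is roughly what counting tokens against the Gemini API looks like with the google-generativeai Python client; the point is that it's a network round trip rather than a local tokenizer (the model name and key handling below are placeholders):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; normally read from the environment
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

# There is no local tokenizer, so this is an HTTP call to the count_tokens endpoint
# every single time -- exactly what AI Studio does as you type.
response = model.count_tokens("How many tokens is this prompt?")
print(response.total_tokens)
```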
> If you look at the security measures in other coding agents, they're mostly security theater. As soon as your agent can write code and run code, it's pretty much game over.
At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, and other stuff on other platforms). It does not end up "making the agent mostly useless".
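To make that concrete: on macOS this means each command the model runs is wrapped in a kernel-enforced Seatbelt profile. A heavily simplified sketch of the mechanism, not Codex's actual profile (which allows far more, or real tools wouldn't even start), and with a made-up workspace path:

```python
import subprocess

# Simplified Seatbelt profile: deny everything by default, then allow reads plus
# writes inside a single workspace directory. Real profiles are much more permissive
# about things like mach lookups and sysctl reads.
PROFILE = """
(version 1)
(deny default)
(allow process-fork)
(allow process-exec)
(allow file-read*)
(allow file-write* (subpath "/Users/me/workspace"))
"""

def run_sandboxed(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command under a kernel-enforced sandbox via macOS's sandbox-exec."""
    return subprocess.run(["sandbox-exec", "-p", PROFILE, *cmd],
                          capture_output=True, text=True)

print(run_sandboxed(["ls", "/Users/me/workspace"]).stdout)
```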
Approval should be mandatory for any non-read tool call. You should read everything your LLM intends to do, and approve it manually.
"But that is annoying and will slow me down!" Yes, and so will recovering from disastrous tool calls.
You’ll just end up approving things blindly, because 95% of what you’ll read will seem obviously right and only 5% will look wrong. I would prefer to let the agent do whatever it wants for 15 minutes and then look at the result rather than having to approve every single command it runs.
Works until it has access to write to external systems and your agent is slopping up Linear or GitHub without you knowing, identified as you.
Sure; I mean this is what I _would like_; I’m not saying this would work 100% of the time.
That kind of blanket demand doesn't persuade anyone and doesn't solve any problem.
Even if you get people to sit and press a button every time the agent wants to do anything, you're not getting the actual alertness and rigor that would prevent disasters. You're getting a bored, inattentive person who could be doing something more valuable than micromanaging Claude.
Managing capabilities for agents is an interesting problem. Working on that seems more fun and valuable than sitting around pressing "OK" whenever the clanker wants to take actions that are harmless in a vast majority of cases.
This is like having a firewall on your desktop where you manually approve each and every connection.
Secure, yes? Annoying, also yes. Very error-prone too.
It’s not just annoying; at scale it makes using the agent CLIs impossible. You can tell someone spends a lot of time in Claude Code: they can type --dangerously-skip-permissions with their eyes closed.
Yep. The agent CLIs have the wrong level of abstraction. Needs more human in the loop.
It's not reliable. The AI can just not prompt you to approve, or hide things, etc. AI models are crafty little fuckers and they like to lie to you and find secret ways to do things with ulterior motives. This isn't even a prompt injection thing, it's an emergent property of the model. So you must use an environment where everything can blow up and it's fine.
The harness runs the tool call for the LLM. It is trivial to not run the tool call without approval, and many existing tools do this.
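For example, a harness-side gate is only a few lines; the tool names and the `execute` helper below are hypothetical, but the shape is the same in any agent loop:

```python
READ_ONLY_TOOLS = {"read_file", "list_dir", "grep"}  # hypothetical tool names

def run_tool(name: str, args: dict, execute) -> dict:
    """Only execute non-read tool calls after explicit user approval.
    `execute` is whatever callable the harness uses to actually perform the call."""
    if name not in READ_ONLY_TOOLS:
        print(f"Agent wants to run: {name}({args})")
        if input("Approve? [y/N] ").strip().lower() != "y":
            # The refusal goes back to the model as the tool result.
            return {"error": "tool call denied by user"}
    return execute(name, args)
```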
My Codex just uses Python to write files around the sandbox when I ask it to patch an SDK outside its path.
It's definitely not a sandbox if you can just "use python to write files" outside of it o_O
Hence the article’s security theatre remark.
I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.
Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.
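A minimal sketch of that (Unix-only, must be started as root; the "agent" account and jail path are assumptions):

```python
import os
import pwd

def jail_agent(username: str = "agent", jail: str = "/home/agent") -> None:
    """Confine the current process: chroot into the jail, then drop to an unprivileged user."""
    info = pwd.getpwnam(username)   # dedicated low-privilege account for the agent
    os.chroot(jail)                 # the jail directory becomes the filesystem root
    os.chdir("/")                   # don't keep a working directory outside the jail
    os.setgid(info.pw_gid)          # drop group privileges first...
    os.setuid(info.pw_uid)          # ...then user privileges (irreversible)

jail_agent()  # after this, spawn the agent process with no way back out
```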
I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.
`chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.
> I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.
I don't think there is such a good heuristic. The user wants the agent to do the right thing and not to do the wrong thing, but the capabilities needed are identical.
> `chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.
That's a good, safe, and sane default for project-focused agent use, but it seems like those playing it risky are using agents for general-purpose assistance and automation. The access required to do so chafes against strict sandboxing.
Here's OpenAI's docs page on how they sandbox Codex: https://developers.openai.com/codex/security/
Here's the macOS kernel-enforced sandbox profile that gets applied to processes spawned by the LLM: https://github.com/openai/codex/blob/main/codex-rs/core/src/...
I think skepticism is healthy here, but there's no need to just guess.
That still doesn't seem ideal. Run the LLM itself in a kernel-enforced sandbox, lest it find ways to exploit vulnerabilities in its own code.
The LLM inference itself doesn't "run code" per se (it's just doing tensor math), and besides, it runs on OpenAI's servers, not your machine.
There still needs to be a harness running on your local machine to spawn the processes in their sandboxes. I consider that "part of the LLM" even if it isn't doing any inference.