Unrolling the Codex agent loop
openai.com394 points by tosh 19 hours ago
394 points by tosh 19 hours ago
The best part about this blog post is that none of it is a surprise – Codex CLI is open source. It's nice to be able to go through the internals without having to reverse engineer it.
Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.
This came as a big surprise to me last year. I remember they announced that Codex CLI is opensource, and the codex-rs [0] from TypeScript to Rust, with the entire CLI now open source. This is a big deal and very useful for anyone wanting to learn how coding agents work, especially coming from a major lab like OpenAI. I've also contributed some improvements to their CLI a while ago and have been following their releases and PRs to broaden my knowledge.
I know very little about typescript and even less about rust. Am I getting the rust version of codex when I do `npm i -g @openai/codex`?
A stand alone rust binary would be nicer than installing node.
yes [0]
> The Rust implementation is now the maintained Codex CLI and serves as the default experience
[0] https://github.com/openai/codex/tree/main/codex-rs#whats-new...
For some reason a lot of people are unaware that Claude Code is proprietary.
Probably because it doesn’t matter most of the time?
Same. If you're already using a proprietary model might as well just double down
But you don't have to be restricted to one model either? Codex being open source means you can choose to use Claude models, or Gemini, or...
It's fair enough to decide you want to just stick with a single provider for both the tool and the models, but surely still better to have an easy change possible even if not expecting to use it.
Codex CLI with Opus, or Gemini CLI with 5.2-codex, because they're open sourced agents? Go ahead if you want but show me where it actually happens with practical values
until Microsoft buys it and enshits it.
This is a fun thought experiment. I believe that we are now at the $5 Uber (2014) phase of LLMs. Where will it go from here?
How much will a synthetic mid-level dev (Opus 4.5) cost in 2028, after the VC subsidies are gone? I would imagine as much as possible? Dynamic pricing?
Will the SOTA model labs even sell API keys to anyone other than partners/whales? Why even that? They are the personalized app devs and hosts!
Man, this is the golden age of building. Not everyone can do it yet, and every project you can imagine is greatly subsidized. How long will that last?
While I remember $5 Ubers fondly, I think this situation is significantly more complex:
- Models will get cheaper, maybe way cheaper
- Model harnesses will get more complex, maybe way more complex
- Local models may become competitive
- Capital-backed access to more tokens may become absurdly advantaged, or not
The only thing I think you can count on is that more money buys more tokens, so the more money you have, the more power you will have ... as always.
But whether some version of the current subsidy, which levels the playing field, will persist seems really hard to model.
All I can say is, the bad scenarios I can imagine are pretty bad indeed—much worse than that it's now cheaper for me to own a car, while it wasn't 10 years ago.
If the electric grid cannot keep up with the additional demand, inference may not get cheaper. The cost of electricity would go up for LLM providers, and VCs would have to subsidize them more until the price of electricity goes down, which may take longer than they can wait, if they have been expecting LLM's to replace many more workers within the next few years.
I can run Minimax-m2.1 on my m4 MacBook Pro at ~26 tokens/second. It’s not opus, but it can definitely do useful work when kept on a tight leash. If models improve at anything like the rate we have seen over the last 2 years I would imagine something as good as opus 4.5 will run on similarly specced new hardware by then.
I appreciate this, however, as a ChatGPT, Claude.ai, Claude Code, and Windsurf user... who has tried nearly every single variation of Claude, GPT, and Gemini in those harnesses, and has tested all the those models via API for LLM integrations into my own apps... I just want SOTA, 99% of the time, for myself, and my users.
I have never seen a use case where a "lower" model was useful, for me, and especially my users.
I am about to get almost the exact MacBook that you have, but I still don't want to inflict non-SOTA models on my code, or my users.
This is not a judgement against you, or the downloadable weights, I just don't know when it would be appropriate to use those models.
BTW, I very much wish that I could run Opus 4.5 locally. The best that I can do for my users is the Azure agreement that they will not train on their data. I also have that setting set on my claude.ai sub, but I trust them far less.
Disclaimer: No model is even close to Opus 4.5 for agentic tasks. In my own apps, I process a lot of text/complex context and I use Azure GPT 4.1 for limited llm tasks... but for my "chat with the data" UX, Opus 4.5 all day long. It has tested so superior.
Is Azure's pricing competitive on openAI's offerings through the api? Thanks!