Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican
simonwillison.net
162 points by simonw a day ago
Very mildly related: Pelicans on Bicycles - Raytracer Edition:
https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-rayt...
Hah, love the weird floating eggs!
Perhaps not eggs, but misplaced pelican neck pouches?
Opus4.1: https://claude.ai/public/artifacts/b47c2dd5-41a6-452c-8701-5...
Sonnet 4.5: https://gemini.google.com/share/a8ebea2c31dd
Gemini 2.5pro: https://gemini.google.com/share/a8ebea2c31dd
“a pelican riding on a bicycle in 3d. Works for mobile”
I blogged about this here, including running your benchmark against gpt-5-codex-mini which did the worst example I've seen by far: https://simonwillison.net/2025/Nov/9/pelican-on-a-bike-raytr...
Here's the codex-mini attempt: https://static.simonwillison.net/static/2025/povray-pelican-...
Installing Rust projects is usually as simple as calling `cargo install`. No need to wait 5-30 minutes for an LLM to figure this out.
People are delegating way too much to LLMs. In turn, this makes your own research or problem-solving skills less sharp.
Quoting my article:
> This is a useful starting point for a project like this—in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works.
"Figure out how to build this" is a shortcut for getting a coding agent primed for future work. If you look at the transcript you can see what it did: https://gistpreview.github.io/?ddabbff092bdd658e06d8a2e8f142...
Ran ls
└ AGENTS.md
CHANGELOG.md
+13 lines
scripts
sdk
Ran cat README.md
└ <p align="center"><code>npm i -g @openai/codex</code><br />or <code>brew
install --cask codex</code></p>
+105 lines
This repository is licensed under the [Apache-2.0 License](LICENSE).
Ran rg -n "rust" -n docs/install.md
└ 23:curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
25:rustup component add rustfmt
26:rustup component add clippy
Ran sed -n '1,200p' docs/install.md
└ ## Install & build
+36 lines
cargo test
```
I'm gathering the build steps for the Rust tool, focusing on running cargo
build inside the codex-rs directory. I'll check dependencies and confirm
the instructions before actually building, aiming to provide a clear summary
afterward.
Ran cargo build
└ Updating git repository `https://github.com/nornagon/crossterm`
Updating git repository `https://github.com/nornagon/ratatui`
+825 lines
Compiling codex-cli v0.0.0 (/Users/simon/Dropbox/dev/codex/codex-rs/cli)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 27s
Ran ls target/debug | head
└ apply_patch
apply_patch.d
+6 lines
codex-execpolicy
That's a decent starting point on seeding the context with information that's relevant to making and then testing the modifications I'm about to ask for.
What useful context is in there? How to call “cargo build”? It already knows that.
The README files that it looked at, and the directory structure it explored.
It now knows what the project is, what dependencies it uses, how it's laid out and the set of binaries that it generates.
Even more importantly: it knows that the project can be built without errors. If it tries a build later and sees an error it will know that the error was caused by code it had modified.
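A minimal sketch of that baseline idea, not taken from the article or from Codex itself: record whether the build passes before any edits, then compare later results against it. The function names `build_passes` and `attribute_failure` are hypothetical, and running `cargo build` inside `codex-rs` is assumed from the transcript above.

```python
import subprocess


def build_passes(cmd, cwd="."):
    """Run a build command and report whether it exited with status 0."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


def attribute_failure(baseline_ok, current_ok):
    """Classify a later build result against the baseline taken before any edits."""
    if current_ok:
        return "ok"
    if baseline_ok:
        # The build worked before the edits, so the edits are the likely cause.
        return "regression: caused by changes made since the baseline"
    # The build never worked, so the failure cannot be pinned on the edits.
    return "unknown: the build was already failing before any changes"


# Hypothetical usage (assumes cargo is installed and codex-rs exists):
#   baseline = build_passes(["cargo", "build"], cwd="codex-rs")
#   ... make changes ...
#   print(attribute_failure(baseline, build_passes(["cargo", "build"], cwd="codex-rs")))
```

Without the baseline, a failing build after an edit is ambiguous; with it, the agent can reasonably blame its own changes.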
You could just tell it to check out the README, but I suspect it would have read it anyway, or worked out the type of project and how it is structured, as the first step of any other command you give it. Without that, it is impossible to add to or update the project.
For a Rust developer, neglecting their ability to debug cargo build issues puts their career at risk. For someone like that, letting AI handle it would be a really shortsighted move.
But Simon isn’t a Rust developer - he’s a motivated individual with a side project. He can now speedrun the part he’s not interested in. That doesn’t affect anyone else’s decisions; you can still choose to learn the details. The ability to skip it if you wish is a huge win for everyone.
> He can now speedrun the part he’s not interested in.
The reductio that people tend to be concerned about is, what if someone is not interested in any aspect of software development, and just wants to earn money by doing it? The belief is that the consequences then start becoming more problematic.
Those people are their own worst enemies.
Some people will always look for ways to "cheat". I don't want to hold back everyone else just because a few people will harm themselves by using this stuff as a replacement for learning and developing themselves.
Do you genuinely believe that this only applies to "a few people"?
This new post gets at the issue: https://news.ycombinator.com/item?id=45868271