How to Code Claude Code in 200 Lines of Code

mihaileric.com

676 points by nutellalover a day ago


lmeyerov - a day ago

Something I would add is planning. A big "aha" for effective use of these tools is realizing they run on dynamic TODO lists. E.g., plan mode is basically bootstrapping how that TODO list gets seeded and how todos get grounded when they're reached, and user interactions are how you realign the list. The todo list is subtle but was a big shift in coding tools, and many seem surprised when we discuss it -- most focus on whether to use plan mode or not, but todo lists are active either way. I ran a fun experiment last month on how well Claude Code solves CTFs, and disabling the TodoList tool and planning costs you 1-2 grade jumps: https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-t... .

Fwiw, I found it funny how the article stuffs "smarter context management" into a breezy TODO bullet point at the end for going production-grade. I've been noticing a lot of NIH/DIY types believing they can do a good job of this and then, when forced to produce results/evals that don't suck in production, losing the rest of the year on that step. (And it's even worse when they decide to fine-tune too.)

jacob019 - 7 hours ago

Seems everyone is working on the same things these days. I built a persistent Python REPL subprocess as an MCP tool for CC; it worked so insanely well that I decided to go all the way. I already had an agentic framework built around tool calling (agentlib), so I adapted it for this new paradigm, and code agent was born.

The agent "boots up" inside the REPL. Here's the beginning of the system prompt:

  >>> help(assistant)

  You are an interactive coding assistant operating within a Python REPL.
  Your responses ARE Python code—no markdown blocks, no prose preamble.
  The code you write is executed directly.

  >>> how_this_works()

  1. You write Python code as your response
  2. The code executes in a persistent REPL environment
  3. Output is shown back to you IN YOUR NEXT TURN
  4. Call `respond(text)` ...
You get the idea. No need for custom file editing tools--Python has all that built in and Claude knows it perfectly. No JSON marshaling or schema overhead. Tools are just Python functions injected into the REPL, zero context bloat.
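For the curious, a minimal sketch of the mechanics (names here are illustrative, not agentlib's actual API): the model's reply IS Python source, executed in one persistent namespace where "tools" are just functions we've injected.

    import io, contextlib, traceback

    session: dict = {}                         # namespace persists across turns

    def respond(text: str) -> None:
        """How the model talks to the user -- itself just a function."""
        print(f"[assistant] {text}")

    session["respond"] = respond               # inject a tool: no JSON, no schema

    def run_turn(llm_code: str) -> str:
        """Execute the model's reply; captured output is its next observation."""
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(llm_code, session)
        except Exception:
            buf.write(traceback.format_exc())  # errors feed back to the model too
        return buf.getvalue()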

I also built a browser control plugin that puts Claude directly into the heart of a live browser session. It can inject element pickers so I can click around and show it what I'm talking about. It can render prototype code before committing to disk, killing the annoying build-fix loop. I can even SSH in from my phone and use TTS instead of typing, surprisingly great for frontend design work. Knocked out a website for my father-in-law's law firm (gresksingleton.com) in a few hours that would've taken 10X that a couple years ago, and it was super fun.

The big win: complexity. CC has been a disaster on my bookkeeping system; there's a threshold past which Claude loses the forest for the trees and makes the same mistakes over and over. Code agent pushes that bar out significantly. Claude can build new tools on the fly when it needs them. Gemini works great too (larger context).

Have fun out there! /end-rant

libraryofbabel - a day ago

It's a great point and everyone should know it: the core of a coding agent is really simple, it's a loop with tool calling.
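For anyone who hasn't seen it spelled out, the whole core is roughly this (a sketch against the Anthropic SDK; `TOOLS` and `run_tool` stand in for your tool schemas and dispatcher):

    import anthropic

    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": "fix the failing test"}]

    while True:
        reply = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=TOOLS,                      # read_file / edit_file / bash ...
            messages=messages,
        )
        messages.append({"role": "assistant", "content": reply.content})
        if reply.stop_reason != "tool_use":   # no tool requested -> done
            break
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}  # your code executes it
            for b in reply.content if b.type == "tool_use"
        ]})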

Having said that, I think if you're going to write an article like this and call it "The Emperor Has No Clothes: How to Code Claude Code in 200 Lines of Code", you should at least include a reference to Thorsten Ball's excellent article from wayyy back in April 2025 entitled "How to Build an Agent, or: The Emperor Has No Clothes" (https://ampcode.com/how-to-build-an-agent)! That was (as far as I know) the first of these articles making the point that the core of a coding agent is actually quite simple (and all the deep complexity is in the LLM). Reading it was a light-bulb moment for me.

FWIW, I agree with other commenters here that you do need quite a bit of additional scaffolding (like TODOs and much more) to make modern agents work well. And Claude Code itself is a fairly complex piece of software with a lot of settings, hooks, plugins, UI features, etc. Although I would add that once you have a minimal coding agent loop in place, you can get it to bootstrap its own code and add those things! That is a fun and slightly weird thing to try.

(By the way, the "January 2025" date on this article is clearly a typo for 2026, as Claude Code didn't exist a year ago and it includes use of the claude-sonnet-4-20250514 model from May.)

Edit: and if you're interested in diving deeper into what Claude Code itself is doing under the hood, a good tool to understand it is "claude-trace" (https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...). You can use it to see the whole dance between the agent and the LLM: every call out to the LLM and the LLM's responses, the LLM's tool call invocations, the responses from the agent to the LLM when tools run, etc. When Claude Skills came out I used this to confirm my guess about how they worked (they're a tool call with all the short skill descriptions stuffed into the tool description base prompt). Reading the base prompt is also interesting. (Among other things, they explicitly tell it not to use emoji, which tracks, as when I wrote my own agent it was indeed very emoji-prone.)

joshmlewis - 21 hours ago

This is cool, but as someone who's built an enterprise-grade agentic loop in-house that's processing a billion-plus tokens a month: there are so many little things you have to account for that greatly magnify complexity in real-world agentic use cases. A for loop is an easy way to get your foot in the door, and it is indeed at the heart of it all, but a multitude of little things compound complexity rather quickly. What happens when a user sends a message after the first one and the agent has already started the tool loop? Seems simple, right? If you are receiving inputs via webhooks (like from a Slack bot), then what do you do? It's not rocket science, but it's also not trivial to do right. What about hooks (guardrails) and approvals? Should you halt execution mid-loop and wait, or implement it as an async Task feature like Claude Code and the MCP spec? If you do it async, then how do you wake the agent back up? Where is the original tool call stored, and how is the output stored for retrieval/insertion? This and many other little things add up and compound on each other.
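Even the "simple" case of a second user message is a design decision. One minimal approach (a sketch) is to queue webhook input and drain it at the top of each loop iteration, rather than interrupting a tool call in flight:

    import queue

    inbox = queue.Queue()                 # holds late-arriving user messages

    def on_slack_webhook(text: str) -> None:
        inbox.put(text)                   # called from your web framework

    def drain_inbox(messages: list) -> None:
        # Fold queued user input into the transcript before the next
        # model call, instead of halting mid tool-loop.
        while not inbox.empty():
            messages.append({"role": "user", "content": inbox.get()})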

I should start a blog with my experience from all of this.

nyellin - a day ago

There's a bit more to it!

For example, the agent in the post will demonstrate 'early stopping', where it finishes before the task is really done. You'd think you could solve this with reasoning models, but that doesn't actually work, even on SOTA models.

To fix 'early stopping' you need extra features in the agent harness. Claude Code does this with TODOs that are injected back into every prompt to remind the LLM what tasks remain open. (If you're curious, somewhere in the public repo for HolmesGPT we have benchmarks with all the experiments we ran to solve this - from hypothesis tracking to other exotic approaches - but TODOs always performed best.)
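The shape of the trick is roughly this (a sketch, not our actual code): keep a live task list and re-inject the open items every turn, so the model can't "forget" them and stop early.

    todos = [
        {"task": "reproduce the bug",  "done": True},
        {"task": "fix the parser",     "done": False},
        {"task": "run the test suite", "done": False},
    ]

    def todo_reminder() -> str:
        open_items = [t["task"] for t in todos if not t["done"]]
        if not open_items:
            return ""
        return ("Open TODOs (do not stop until all are done):\n"
                + "\n".join(f"- {t}" for t in open_items))

    # Appended to the conversation before every LLM call in the loop.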

Still, good article. Agents really are just tools in a loop. It's not rocket science.

floppyd - 21 hours ago

> This is the key insight: we’re just telling the LLM “here are your tools, here’s the format to call them.” The LLM figures out when and how to use them.

This really blew my mind back in the ancient times of 2024-ish. I remember when the idea of agents first reached me and I started reading various "here's how I built an agent that does this" articles, and I was really frustrated at not understanding how the hell the LLM "knows" how to call a tool. It's a program, but LLMs just produce text! Yes, I see you are telling the LLM about tools, but what's next? And when I finally understood that there's no next, no need to do anything other than explaining -- it felt pretty magical, not gonna lie.
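For anyone at the same stage of frustration I was: the "explaining" really is the whole thing. A toy version (illustrative, not from the article):

    # The entire "protocol" is prose in the system prompt:
    SYSTEM = """You can use these tools. To call one, reply with ONLY:
    {"tool": "<name>", "args": {...}}

    read_file(path)            -- return the file's contents
    write_file(path, content)  -- overwrite a file
    """
    # The model emits that JSON as ordinary text; your loop parses it, runs
    # the matching function, and pastes the result into the next prompt.
    # There is no hidden channel.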

prodigycorp - a day ago

This article was more true than not a year ago, but the harnesses are now so far past the simple agent loop that I'd argue this is not even close to an accurate mental model of what Claude Code is doing.

bilater - 2 hours ago

I'm curious how tools like Claude Code or Cursor edit code. Do they regenerate the full file and diff it, or do they just output a diff and apply that directly? The latter feels more efficient, but harder to implement.

ofirpress - a day ago

We (the SWE-bench team) have a 100-line-of-code agent that is now pretty popular in both academic and industry labs: https://github.com/SWE-agent/mini-swe-agent

I think it's a great way to dive into the agent world

sams99 - 14 hours ago

For those interested, edit is a surprisingly difficult problem. It seems easy on the surface, but you are fighting both fine-tuning quirks and real-world hallucinations. I implemented one this week in:

https://github.com/samsaffron/term-llm

It is about my 10th attempt at the problem, so I am aware of a lot of the edge cases. A very interesting bit of research here is:

https://gist.github.com/SamSaffron/5ff5f900645a11ef4ed6c87f2...

Fascinating read.
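The baseline most people start from is exact string replacement that fails loudly instead of guessing (a sketch; the hard part is everything that comes after this):

    def edit_file(path: str, old: str, new: str) -> str:
        text = open(path).read()
        n = text.count(old)
        if n == 0:
            return "ERROR: old text not found - re-read the file and retry"
        if n > 1:
            return f"ERROR: old text matches {n} locations - add more context"
        open(path, "w").write(text.replace(old, new, 1))
        return "OK"
    # The model mis-remembering `old` (whitespace, elided lines) is exactly
    # the hallucination those error messages are there to recover from.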

mirzap - 11 hours ago

The "200 lines" loop is a good demo of the shape of a coding agent, but it’s like "a DB is a B-tree" - technically true, operationally incomplete.

The hard part isn’t the loop - it’s the boring scaffolding that prevents early stopping, keeps state, handles errors, and makes edits/context reliable across messy real projects.

thiagowfx - an hour ago

The blog post starts with:

> I’m using OpenAI here, but this works with any LLM provider

Have you noticed there’s no OpenAI in the post?

ulaw - a day ago

How many Claudes could Claude Code code if Claude Code could code Claude?

m-hodges - a day ago

Also relevant: You Should Write An Agent¹ and, How To Build An Agent.²

¹ https://fly.io/blog/everyone-write-an-agent/

² https://ampcode.com/how-to-build-an-agent

dmvaldman - 18 hours ago

This misses that agentic LLMs are trained via RL to use specific tools. Custom tools are subpar compared to the ones the model has been trained with. That's why Claude Code has an advantage over, say, Cursor, by being vertically integrated.

ozim - 3 hours ago

Magic is not the agent; magic is the neural network that was trained.

Yeah, I agree there are a bunch of BS tools on top that basically try to coerce people into paying for and using their setup, so they become dependent on that provider. The provider does add some value, but they're still so pushy that it's quite annoying.

tptacek - a day ago

What's interesting to me about the question of whether you could realistically compete with Claude Code (not Claude, but the CLI agent) is that it boils down to things any proficient developer could do. No matter how much I'd want to try, I have no hope of building a competitive frontier model --- "frontier model" is a distinctively apt term. But there's no such thing as a "frontier agent", and the Charmbracelet people have as much of a shot at building something truly exceptional as Anthropic does.

vinhnx - 12 hours ago

This reminds me of Amp's article last year [1]. I built my own coding agent [2] with two goals: understand real-world agent mechanics and validate patterns I'd observed across OpenAI Codex and contemporary agents.

The core loop is straightforward: LLM + system prompt + tool calls. The differentiator is the harness: CLI, IDE extension, sandbox policies, filesystem ops (grep/sed/find). But what separates effective agents from the rest is context engineering; Anthropic and Manus have published various articles around this topic [3].

After building vtcode, my takeaway: agent quality reduces to two factors, context management strategy and model capability. Architecture varies by harness, but these fundamentals remain constant.

[1] https://ampcode.com/how-to-build-an-agent [2] https://github.com/vinhnx/vtcode [3] https://www.anthropic.com/engineering/building-effective-age...

afarah1 - a day ago

Reminds me of this 2023 post "re-implementing LangChain in 100 lines of code": https://blog.scottlogic.com/2023/05/04/langchain-mini.html

We did just that back then and it worked great; we used it in many projects after that.

armcat - 13 hours ago

The new mental model is actually (1) skills-based, i.e. https://agentskills.io/home, and (2) one where LLM agents "see all problems as coding problems". Skills are a bunch of detailed Markdown files plus corresponding code libraries and snippets. The loop then works as follows: read only the top-level descriptions in each SKILL.md, use those in-context to decide which skill to pick, then read the chosen skill in depth to choose which code/lib to use, generate new code based on the <problem, code/lib> pair, execute the code, evaluate, repeat. The problem-as-code mental model is also a great way of evaluating, and of creating rewards and guarantees.
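In code, the first step of that loop is tiny (a sketch; real SKILL.md files put the description in YAML frontmatter, here assumed to be the first line):

    from pathlib import Path

    def skill_index(root: str = "skills") -> str:
        # Only the one-line descriptions live in context...
        lines = [f"- {md.parent.name}: {md.read_text().splitlines()[0]}"
                 for md in Path(root).glob("*/SKILL.md")]
        return "Available skills:\n" + "\n".join(lines)

    def load_skill(name: str, root: str = "skills") -> str:
        # ...the full document is read only after the model picks one.
        return Path(root, name, "SKILL.md").read_text()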

RagnarD - 13 hours ago

This feels like a pretty deceptive article title. At the end, he does say:

"What We Built vs. Production Tools This is about 200 lines. Production tools like Claude Code add:

Better error handling and fallback behaviors Streaming responses for better UX Smarter context management (summarizing long files, etc.) More tools (run commands, search codebase, etc.) Approval workflows for destructive operations

But the core loop? It’s exactly what we built here. The LLM decides what to do, your code executes it, results flow back. That’s the whole architecture."

But where are the actual test cases comparing the performance of his little bit of code vs. Claude Code? Is the core of Claude Code really just what he wrote (he boldly asserts "exactly what we built here")? Where's the empirical proof?

MORPHOICES - 10 hours ago

I recently tried out a tool that seemed quite sophisticated at first glance, but once you peel the layers back, the core logic is surprisingly small.

Not trivial. Just... smaller than expected. It really made me think about how often we mistake surface-level complexity and product polish for system depth. Lately, I've been pondering: if I had to re-explain this from scratch, what is its irreducible core?

I'm curious to hear from others: what system surprised you by being simpler than you expected? Where was the real complexity? What do people tend to overestimate?

johnsmith1840 - a day ago

Here's the bigger question. Why would you?

Claude Code feels like the first commodity agent. In theory it's simple, but in practice you'll have to maintain a ton of random crap you get no value from maintaining.

My guess is eventually all "agents" will be wiped out by Claude Code or something equivalent.

Maybe the companies won't die, but all those startups will just hook up a generic agent wrapper and let it do its thing directly. My bet is that the company that wins this is the one with the most training data to tune their agent to use their harness correctly.

schmuhblaster - 19 hours ago

As an experiment over the holidays, I had Opus create a coding agent in a Prolog DSL (more than 200 lines, though) [0], and I was surprised how well the agent worked out of the box. So I guess the latest one or two generations of models have reached a stage where the agent harness around the model is less important than it used to be.

[0] https://news.ycombinator.com/item?id=46527722

hazrmard - a day ago

This reflects my experience. Yet, I feel that getting reliability out of LLM calls with a while-loop harness is elusive.

For example

- how can I reliably have a decision block to end the loop (or keep it running)?

- how can I reliably call tools with the right schema?

- how can I reliably summarize context / excise noise from the conversation?

Perhaps, as the models get better, they'll approach some threshold where my worries just go away. However, I can't quantify that threshold myself and that leaves a cloud of uncertainty hanging over any agentic loops I build.

Perhaps I should accept that it's a feature and not a bug? :)

lucideer - 11 hours ago

This is a really great post, concise & clear & educational. I do find the title slightly ironic though when the code example goes on to immediately do "import anthropic" right up top.

(it's just an HTTP library wrapping Anthropic's REST API; reimplementing it - including auth - would add enough boilerplate to the examples to make this post less useful, but I just found it funny alongside the title choice)

emsign - 13 hours ago

> The LLM never actually touches your filesystem.

But that's not correct. You give it write access to files that it then compiles and executes. Those files could include code that runs with the rights of the executing user and manipulates the system. It already has one foot past the door, and you'd have to set up all kinds of safeguards to make sure it doesn't walk through completely.

It's a fundamental problem with giving agentic AI rights on your system. Which, at the same time, kind of is the whole purpose of agentic AI.

bjacobso - a day ago

https://www.youtube.com/watch?v=aueu9lm2ubo

santiagobasulto - 7 hours ago

I'm surprised this post has so many upvotes. This is a gross oversimplification of what Claude Code (and other agents) can do. On top of that, it's very poorly engineered.

jackfranklyn - a day ago

The benchmark point is interesting but I think it undersells what the complexity buys you in practice. Yes, a minimal loop can score similarly on standardised tasks - but real development work has this annoying property of requiring you to hold context across many files, remember what you already tried, and recover gracefully when a path doesn't work out.

The TODO injection nyellin mentions is a good example. It's not sophisticated ML - it's bookkeeping. But without it, the agent will confidently declare victory three steps into a ten-step task. Same with subagents - they're not magic, they're just a way to keep working memory from getting polluted when you need to go investigate something.

The 200-line version captures the loop. The production version captures the paperwork around the loop. That paperwork is boring but turns out to be load-bearing.

utopiah - 12 hours ago

Why limit it to a few tools from a tool registry when running in a full sandbox, using QEMU or something thinner like Podman/Docker, literally takes 10 lines of code? You can still use your real files with a mount point to a directory.
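Roughly (a sketch; `PROJECT_DIR` is whatever directory you want the agent to see):

    import subprocess

    def sandboxed_bash(cmd: str) -> str:
        # Every tool command runs inside a throwaway container with no
        # network and only the project directory mounted.
        out = subprocess.run(
            ["docker", "run", "--rm", "--network=none",
             "-v", f"{PROJECT_DIR}:/work", "-w", "/work",
             "python:3.12-slim", "bash", "-lc", cmd],
            capture_output=True, text=True, timeout=120,
        )
        return out.stdout + out.stderr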

To be clear I'm not implying any of that is useful but if you do want to go down that path then why not actually do it?

loeg - 21 hours ago

Would be nice to have a different wrapper around Claude, even something bare bones, as long as it's easy to modify. Claude code terminal has the jankiest text editor I've ever used -- holding down backspace just moves the cursor. It's something about the key repeat rate. If you manually press backspace repeatedly, it's fine, but a rapid repeat rate confuses the editor. (I'm pretty sure neither GNU readline nor BSD editline do this.) It makes editing prompts a giant pain in the ass.

duncancarroll - 21 hours ago

> "But here’s the thing"

This phrase feels like the new em dash...

pbw - 17 hours ago

I have mixed feelings about the "Do X in N lines of code" genre. I applaud people taking the time to boil something down to its very essence, and implement just that, but I feel like the tone is always, "and the full thing is lame because it's so big," which seems off to me.

rcarmo - a day ago

I think mine have a little more code, but they also have a lot more tools:

- https://github.com/rcarmo/bun-steward

- https://github.com/rcarmo/python-steward (created with the first one)

And they're self-replicating!

cadamsdotcom - 21 hours ago

The devil is in the details, so actually, the Emperor in this analogy most definitely does have clothes.

For example, post-training / finetuning the model specifically to use the tools it’ll be given in the harness. Or endlessly tweaking the system prompt to fine-tune the model’s behavior to a polish.

Plus - both OpenAI and Qwen have models specifically intended for coding.

bochoh - 19 hours ago

I’ve had decent results with this for context management in large code bases so far https://github.com/GMaN1911/claude-cognitive

__0x01 - 16 hours ago

These LLM tools appear to have an unprecedented amount of access to the file systems of their users.

Is this correct, and if so do we need to be concerned about user privacy and security?

kristopolous - 19 hours ago

The source code link at the bottom of the article goes to YouTube for some reason

kirjavascript - a day ago

here's my take, in 70 lines of code: https://github.com/kirjavascript/nanoagent/blob/master/nanoa...

erelong - 15 hours ago

Kind of a meta thought but I guess we could just ask a LLM to guide us through creating some of these things ourselves, right? (Tools, agents, etc.)

oli5679 - a day ago

https://github.com/mistralai/mistral-vibe

This is a really nice open source coding agent implementation. The use of async is interesting.

OsrsNeedsf2P - 21 hours ago

Yea.. our startup greatly underestimated how hard it is to make a good agent loop. Handling exit conditions, command timeouts, context management, UI, etc. is surprisingly hard to do seamlessly.

- a day ago
[deleted]
andai - 20 hours ago

    from ddgs import DDGS  # maintained successor of the duckduckgo_search package

    def web_search(query: str, max_results: int = 8) -> list[dict]:
        """Minimal search tool; each result is a dict with title/href/body keys."""
        return DDGS().text(query, max_results=max_results)
nxobject - a day ago

I'll admit that I'm tickled pink by the idea of a coding agent recreating itself. Are we at a point where agents can significantly and autonomously improve themselves?

- a day ago
[deleted]
mudkipdev - 21 hours ago

Is this not using constrained decoding? You should pass the tool schemas to the "tools" parameter to make sure all tool calls are valid.
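For reference, an Anthropic-style tool schema looks roughly like this (OpenAI's is analogous):

    TOOLS = [{
        "name": "read_file",
        "description": "Read a file from the working directory.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }]
    # Passed as tools=TOOLS, the API returns structured tool_use blocks
    # (and can constrain decoding) instead of free text you must parse.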

_def - 16 hours ago

Are there useful open source "agents" already ready to use with local LLMs?

mephos - 16 hours ago

How much Claude could a Claude code code if a Claude code could code Claude

computerex - 13 hours ago

I think some of the commenters are missing the point of this article. Claude Code is a low-level harness around the model, and low-level thin wrappers are unreasonably effective at code editing. My takeaway: imagine how good code-editing systems will be once our tools are not merely wrappers around the model, once we have tall vertical systems that use engineering+LLMs to solve big chunks of problems. I could imagine certain classes of software being "solved".

Imagine an SDK that's dedicated to customizing tools like Claude Code/Cursor CLI to produce a class of software like B2B enterprise SaaS. Within the bounds of the modeled domain(s), these vertical systems would ultimately crush even the capabilities of the thin low-level wrappers we have today.

dangoodmanUT - 21 hours ago

This definitely will handle large files or binary files very poorly

- 13 hours ago
[deleted]
d4rkp4ttern - 18 hours ago

This is consistent with how I've defined the 3 core elements of an agent:

- Intelligence (the LLM)

- Autonomy (loop)

- Tools to have "external" effects

Wrinkles that I haven't seen discussed much are:

(1) Tool-forgetting: the LLM forgets to call a tool (and instead outputs plain text). Some may say these concerns will disappear as frontier models improve, but there will always be a need for your agent scaffolding to work well with weaker LLMs (cost, privacy, etc.), and as long as the model is stochastic there will always be a chance of tool-forgetting.

(2) Task-completion-signaling: Determining when a task is finished. This has 2 sub-cases: (2a) we want the LLM to decide that, e.g. search with different queries until desired info found, (2b) we want to specify deterministic task completion conditions, e.g., end the task immediately after structured info extraction, or after acting on such info, or after the LLM sees the result of that action etc.

After repeatedly running into these types of issues in production agent systems, we've added mechanisms for them in the Langroid [1] agent framework, which has a blackboard-like loop architecture that makes them easy to incorporate.

For issue (1) we can configure an agent with `handle_llm_no_tool` [2] set to a "nudge" that is sent back to the LLM when a non-tool response is detected (it can also be set to a lambda function that takes other actions). As others have said, grammar-based constrained decoding is an alternative, but it only works for LLM APIs that support it.

For issue (2a) Langroid has a DSL[3] for specifying task termination conditions. It lets you specify patterns that trigger task termination, e.g.

- "T" to terminate immediately after a tool-call,

- "T[X]" to terminate after calling the specific tool X,

- "T,A" to terminate after a tool call, and agent handling (i.e. tool exec)

- "T,A,L" to terminate after tool call, agent handling, and LLM response to that

For (2b), in Langroid we rely on tool-calling again, i.e. the LLM must emit a specific DoneTool to signal completion. In general we find it useful to have orchestration tools for unambiguous control flow and message flow decisions by the LLM [4].
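The generic shape of both mechanisms, outside any framework (a sketch; `llm`, `TOOLS`, `run_tool`, and `tool_result` are placeholders, not Langroid's API):

    DONE_TOOL = {"name": "done", "description": "Signal task completion.",
                 "input_schema": {"type": "object", "properties": {}}}

    def step(messages) -> bool:
        reply = llm(messages, tools=TOOLS + [DONE_TOOL])
        if not reply.tool_calls:                    # (1) tool-forgetting
            messages.append({"role": "user",
                             "content": "Respond with a tool call, not prose."})
            return False                            # nudge, loop again
        for call in reply.tool_calls:
            if call.name == "done":                 # (2b) explicit completion
                return True
            messages.append(tool_result(call, run_tool(call)))
        return False                                # keep looping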

[1] Langroid https://github.com/langroid/langroid

[2] Handling non-tool LLM responses https://langroid.github.io/langroid/notes/handle-llm-no-tool...

[3] Task Termination in Langroid https://langroid.github.io/langroid/notes/task-termination/

[4] Orchestration Tools: https://langroid.github.io/langroid/reference/agent/tools/or...

m3kw9 - 2 hours ago

Why the pointless exercise? Claude Code itself can do all that.

erichocean - a day ago

The tip of the spear in agentic code harnesses today is RL-training them as dedicated conductor/orchestrator models.

Not 200 lines of Python.

_andrei_ - a day ago

yes, it's an agent

dana321 - a day ago

[flagged]