You should write an agent

fly.io

1051 points by tabletcorry 4 days ago


dave1010uk - 4 days ago

Two years ago I wrote an agent in 25 lines of PHP [0]. It was surprisingly effective, even back then, before tool calling was a thing, when you had to coax the LLM into returning structured output. I think it even worked with GPT-3.5 for trivial things.

In my mind LLMs are just UNIX string manipulation tools like `sed` or `awk`: you give them an input and a command and they give you an output. This is especially true if you use something like `llm` [1].

It then seems logical that you can compose calls to LLMs, loop and branch, and combine them with other functions (see the sketch after the links below).

[0] https://github.com/dave1010/hubcap

[1] https://github.com/simonw/llm
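
Something like this, for example. A rough sketch, assuming the `llm` CLI from [1] is installed and configured with a default model; the helper function, prompts, and branch condition are all made up for illustration:

```python
# Treat an LLM call like a Unix filter: input on stdin, an instruction as the
# "command", output on stdout. Assumes the `llm` CLI [1] is on PATH.
import subprocess

def llm(prompt: str, stdin: str = "") -> str:
    """Pipe input through the `llm` CLI, like `sed` or `awk`."""
    result = subprocess.run(["llm", prompt], input=stdin,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

commits = subprocess.run(["git", "log", "--oneline", "-20"],
                         capture_output=True, text=True).stdout

# Compose: one call's output is the next call's input.
summary = llm("Summarize these commits in one paragraph.", stdin=commits)

# Branch: an LLM call as a predicate.
if llm("Answer only YES or NO: do these mention a bug fix?",
       stdin=commits).startswith("YES"):
    print(llm("List the bug fixes as bullet points.", stdin=commits))
else:
    print(summary)
```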

ericd - 4 days ago

Absolutely, especially the part about just rolling your own alternative to Claude Code - build your own lightsaber. Having your coding agent improve itself is a pretty magical experience. And then you can trivially swap in whatever model you want (Cerebras is crazy fast, for example, which makes a big difference for these many-turn tool-call conversations with big lumps of context, though gpt-oss 120b is obviously not as good as one of the frontier models). Add note-taking/memory, and ask it to write key facts there. Add voice transcription so that you can reply much faster (LLMs are amazing at taking in imperfect transcriptions and understanding what you meant). Each of these things takes on the order of a few minutes, and it's super fun.
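
The whole loop really is short. A minimal sketch of the model-swapping part, using the OpenAI Python SDK against an OpenAI-compatible endpoint; the base URL, model name, and `run_shell` tool here are just examples, not anyone's actual setup:

```python
# A minimal agent loop against any OpenAI-compatible endpoint. Swapping models
# is just a different base_url/model string. URL and model are examples only.
import json, subprocess
from openai import OpenAI

client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool for a coding agent
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user",
             "content": "List the files here, then summarize the README."}]

while True:
    resp = client.chat.completions.create(
        model="gpt-oss-120b", messages=messages, tools=tools)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:          # no tool requested: final answer
        print(msg.content)
        break
    for call in msg.tool_calls:     # run each requested tool, feed results back
        args = json.loads(call.function.arguments)
        out = subprocess.run(args["command"], shell=True,
                             capture_output=True, text=True)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": out.stdout + out.stderr})
```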

losvedir - 3 days ago

I appreciate the goal of demystifying agents by writing one yourself, but for me the key part is still a little obscured by using OpenAI APIs in the examples. A lot of the magic has to do with tool calls, which the API helpfully wraps for you: a format for defining tools, and parsed responses telling you which tools the model wants to call.

I'm kind of missing the bridge between that and the fundamental fact that everything in and out is tokens.

Is it fair to say that the tool abstraction the library provides is essentially some niceties around a prompt like: "Defined below are certain 'tools' you can use to gather data or perform actions. If you want to use one, please return the tool call you want and its arguments, delimited before and after with '###', and stop. I will invoke the tool call and then reply with the output delimited by '==='"?

Basically, telling the model, earlier in the context window, how to use tools. I still don't totally understand how a model knows when to stop generating tokens, but presumably those instructions get it to output the request for a tool call in a certain format and stop. Then the agent harness knows to look for those delimiters, extract the tool call, execute it, and add the output to the context so the LLM keeps going.

Is that basically it? Or is there more magic there? Are the tool call instructions in some sort of permanent context, or could the interaction be demonstrated in a fine-tuning step, inferred by the model, and live just in its weights?
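
Here's roughly what I'm picturing, as a concrete sketch: tool calling with no tools API at all, just a system prompt that teaches a homemade protocol plus a stop sequence. The protocol, delimiters, and model name are all invented for illustration; my understanding is that production models are additionally fine-tuned on a special tool-call token format, so part of the answer does live in the weights.

```python
# Tool calling with no tools API: a system prompt teaches a homemade protocol,
# and a stop sequence makes the model halt right after emitting a tool call.
# Everything here (delimiters, protocol, model name) is invented for illustration.
import subprocess
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You can run shell commands. To run one, reply with exactly:\n"
    "TOOL shell\n<command>\nEND_TOOL\n"
    "and nothing else. I will reply with the command's output between === lines.\n"
    "When you have the final answer, reply in plain text with no TOOL block."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "How many Python files are in this directory?"},
]

while True:
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",     # any chat model; name is illustrative
        messages=messages,
        stop=["END_TOOL"],        # generation halts when this sequence appears
    )
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})

    if "TOOL shell" not in reply: # no tool call: this is the final answer
        print(reply)
        break

    # Extract the command between the delimiters and run it.
    command = reply.split("TOOL shell", 1)[1].strip()
    out = subprocess.run(command, shell=True, capture_output=True, text=True)

    # Feed the output back into the context, delimited as promised.
    messages.append({"role": "user",
                     "content": f"===\n{out.stdout}{out.stderr}\n==="})
```

As far as the stopping question goes: a harness-level stop sequence like the one above, or the end-of-sequence token the model learned in training, is literally how it "knows" when to stop.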

riskable - 4 days ago

It's interesting how much this makes you want to write Unix-style tools that do one thing and only one thing really well. Not just because it makes coding an agent simpler, but because it's much more secure!

hoppp - 4 days ago

I should? What problems can I solve that can only be solved with an agent? As long as every AI provider is operating at a loss, starting a sustainably monetizable project doesn't feel that realistic.