My Agent Skill for Test-Driven Development

saturnci.com

131 points by laxmena a day ago


simonw - 5 hours ago

This article would benefit from a date. It looks like it's recent (Internet Archive first grabbed it on May 29th) but it's the kind of information that can quickly become stale as models and agents improve.

(I've been getting solid results recently from simply telling Claude Code and Codex "Test with uv run pytest, use red/green TDD".)
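
For readers unfamiliar with the term, here is a minimal sketch of one red/green cycle as it might look under pytest. The `slugify` function and its expected behavior are hypothetical, chosen only for illustration; nothing here comes from the linked article.

```python
import re

# Step 1 (red): write the test first. Before `slugify` exists, running
# `uv run pytest` fails on this test, which shows the test is able to fail.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello, World!") == "hello-world"

# Step 2 (green): add the simplest implementation that makes the test pass.
# `slugify` is a hypothetical example function, not part of any library.
def slugify(text: str) -> str:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)
```

The point of the red step is to see the failure before writing the implementation, so that a later pass actually means something.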

SubiculumCode - an hour ago

One issue that I've run into with codex has been excessive use of fallback routines. Perhaps this is good practice in professional programming in many situations, but for mine (in this case, computing geodesic distances and analysis), a silent bad fallback means the processed data is not what I thought it was, e.g. it used an inaccurate geodesic method in place of the accurate one.
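
One way to guard against this, sketched under assumptions: make any fallback opt-in and return the name of the method that actually ran. The names `compute_distance`, `FallbackNotAllowed`, and the `accurate`/`approximate` callables are hypothetical and stand in for whatever geodesic routines are in use.

```python
class FallbackNotAllowed(RuntimeError):
    """Raised instead of silently switching to a less accurate method."""


def compute_distance(a, b, accurate, approximate=None, allow_fallback=False):
    """Return (distance, method_name) so the caller knows which method ran.

    `accurate` and `approximate` are caller-supplied callables (hypothetical
    stand-ins for an exact and an approximate geodesic routine).
    """
    try:
        return accurate(a, b), "accurate"
    except Exception as exc:
        # Fall back only when the caller explicitly opted in.
        if not allow_fallback or approximate is None:
            raise FallbackNotAllowed(
                "accurate method failed; refusing to fall back silently"
            ) from exc
        return approximate(a, b), "approximate"
```

A test that asserts the returned method name is "accurate" would then turn an unexpected fallback into a visible failure rather than quietly different data.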

fowlie - 3 hours ago

Haven't tried this, but I've recently become a big fan of Matt Pocock's skills. Workflow: /grill-with-docs -> /to-prd -> /to-issue -> /tdd. That will interview relentlessly until there is a "shared understanding" using "ubiquitous language", then it will spec all requirements with user stories, create issues and implement them using TDD.

zuzululu - 5 hours ago

TDD sounds great on paper for agentic development, but you quickly realize it balloons the token cost. Often I write some feature and then it's repurposed or removed, and code is refactored and moved around as time goes on. With TDD I would be taxed heavily and velocity would slow to a crawl.

After trying out TDD, I find the waterfall approach better, especially when you have a multi-agent setup. Also, I found that in some cases the tests were just superficial hallucinations that never actually tested the components written, or there was some context corruption that ultimately triggered a false positive, which kicked off a completely unintentional refactoring.

dluxem - 5 hours ago

I believe using a skill here is the wrong approach. LLMs already know what TDD is and how to do it, just like object oriented programming.

If this is encoded in a skill, that skill essentially has to be loaded for everything your LLM is doing. This is probably one of the few areas where direct instructions via AGENTS.md are best, and I don't believe it requires much direction here to force the issue.

But I think the OP is just trying to have their agent work in a very specific way -- that is fine too.

> 5. Show me the test and ask for approval before continuing

jvuygbbkuurx - 5 hours ago

All of these posts are missing actual comparisons of results. I read exactly opposite "you should do x" advice every day. If TDD actually was better, it would simply be in the system prompts already.

realty_geek - 3 hours ago

As an aside, check out Jason's podcast (codewithjason.com) - it's pretty good.

The latest one is with "Uncle Bob Martin" who has some interesting takes on coding with AI from .... can I say an oldie?

servercobra - 5 hours ago

This overall is pretty close to how I've set up my implementation skill. One thing I'm curious about is how well the analogies like "We don't make dinner in a dirty kitchen." work vs something a lot more straightforward. Any input OP?

__mharrison__ - 4 hours ago

Testing is so important for development.

Even more so when coding with agents. I think it is probably the biggest lever to keep AI in guardrails.

(It's also why I wrote my latest book, Effective Testing, because I routinely find that my clients are very poor at testing.)

enraged_camel - 3 hours ago

Spawning separate agents to review the original agent's implementation results in a very noticeable increase in code quality and decrease in bugs. This is why I encode two or three rounds of sub-agent review during the planning process, where I tell the agent authoring the plan to include those review rounds at the end. If the code is particularly load-bearing, I then ask a fourth agent, usually from the other frontier lab.

All of this burns more tokens of course, but probably way less than coming back to the code later to fix bugs. It is also slower, but in the long run saves time.

nullc - 2 hours ago

If you don't follow up with a pass of injecting bugs and validating that the tests fail in the presence of bugs... then you've only confirmed that the tests can pass and they may be substantially useless.
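
A hand-rolled sketch of that check, assuming a hypothetical `clamp` function and a tiny in-line test suite. Mutation-testing tools automate the same idea; this only illustrates the principle.

```python
def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


def buggy_clamp(value, low, high):
    # Injected bug: the upper bound is ignored.
    return max(low, value)


def run_tests(impl):
    """Return True if every check passes for the given implementation."""
    cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
    return all(impl(*args) == expected for args, expected in cases)


# The suite is only trustworthy if it passes on the real code
# and fails once the bug is injected.
tests_pass_on_real = run_tests(clamp)
tests_catch_mutant = not run_tests(buggy_clamp)
```

If `tests_catch_mutant` came out false, the suite would be passing regardless of the upper-bound behavior, which is exactly the "substantially useless" case described above.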

behnamoh - 6 hours ago

Snake oil. Just ask the model, all these custom agents/skills haven't proven that useful in practice.

steno132 - 5 hours ago

Test-driven development is one of the worst ideas nowadays in the LLM age. We have models that can consistently write expert-level, usually bug-free code for you and rapidly fix even complex bugs in your codebase.

The token cost and tech debt introduced by tests are just not worth it. There are usually no bugs, and if there are, you can fix them quickly if and when it's needed.