The short leash AI coding method for beating Fable

blog.okturtles.org

56 points by Riseed 5 hours ago


sothatsit - 2 hours ago

This “short leash” seems like more of a crutch to me, and a sign of not giving the AI enough detail on the problem to begin with, or not reviewing and iterating on its output.

Hand-holding great models like Fable through implementation is a waste of time, and a waste of Fable. You can have increasingly nuanced discussions with stronger models, and they write a lot better code than they used to. The process of discussing designs and their implementations, questioning things that look weird to you, and actually reading the AI’s responses also helps to find better solutions.

For example, one time I wanted to write a greedy solver for a problem, and in my discussion with Opus on the idea it suggested using an existing MILP library to solve the problem exactly. I’d never even heard of MILP, but my final implementation ended up being better and simpler than what I’d have done alone.
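To illustrate why an exact formulation can beat a greedy one, here is a minimal sketch. It does not show the commenter's actual problem, which is not described. It assumes a toy 0/1 knapsack with made-up items, and it uses plain subset enumeration as a stand-in for the MILP library, which would search the same space of choices far more efficiently. The names `ITEMS`, `CAPACITY`, `greedy`, and `exact` are hypothetical.

```python
from itertools import combinations

# Hypothetical items as (name, weight, value); not from the original comment.
ITEMS = [("a", 10, 60), ("b", 20, 100), ("c", 30, 120)]
CAPACITY = 50


def greedy(items, capacity):
    """Take items in order of value/weight ratio while they still fit."""
    chosen, remaining = [], capacity
    by_ratio = sorted(items, key=lambda item: item[2] / item[1], reverse=True)
    for name, weight, _value in by_ratio:
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return chosen


def exact(items, capacity):
    """Check every subset and keep the most valuable one that fits.

    This brute force only stands in for an exact solver; it is exponential
    in the number of items and is meant for tiny inputs.
    """
    best, best_value = [], 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            weight = sum(item[1] for item in subset)
            value = sum(item[2] for item in subset)
            if weight <= capacity and value > best_value:
                best, best_value = [item[0] for item in subset], value
    return best


def total_value(names, items):
    """Sum the values of the named items."""
    values = {name: value for name, _weight, value in items}
    return sum(values[name] for name in names)


print(greedy(ITEMS, CAPACITY), total_value(greedy(ITEMS, CAPACITY), ITEMS))
print(exact(ITEMS, CAPACITY), total_value(exact(ITEMS, CAPACITY), ITEMS))
```

On this toy input the greedy pass picks the two items with the best ratios and then has no room for the third, while the exact search finds a more valuable pair that fills the capacity completely.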

ed_mercer - 2 hours ago

I feel like OP is still in the year 2025.

> The AI will have gone off the rails multiple times and you will only notice it later when you actually try to use the software.

Except that said AI can now use your software itself and find and fix bugs, not to mention drive new features.

> Your agent might go “off the rails” and start doing something you don’t want it to do

This happens, but far less often than it used to, and the case for fully autonomous agents is getting stronger, not weaker.

> It is humanly impossible to build your own understanding of a codebase

This again feels outdated. I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

jonplackett - 3 hours ago

I thought this was how everyone who can actually code uses AI for anything that’s actually important.

Am I wrong? Are you guys just YOLOing everything these days?

giancarlostoro - an hour ago

Here I thought this was about Fable the video game, then I remembered Anthropic's model got named Fable. It's going to be painful to Google one of my favorite game series, just like Googling "Rust server" does not give you Rust programming results, but Rust the video game results. I wish Google had fixed this problem long ago; it seems like something trivial for them to fix.

afro88 - 2 hours ago

Maybe I'm too optimistic, but given appropriate skills and references (not just for writing but also reviewing) and intelligent use of subagents for isolated reviews and checks, you can lengthen the leash a bit.

But you still need to properly review plans and PRs to keep a good mental model of the codebase. This effectively limits the number of tasks being done in parallel to maybe 2-3. Though you'll be mentally exhausted and probably start to make mistakes or take shortcuts in reviews yourself.

fny - 2 hours ago

AI is a junior to mid-level engineer. If you treat it as such, you get the best of both vibe coding and rigorous engineering without all this paranoia.

Since the very beginning I've run Claude from an isolated VM in yolo mode. This is just like giving an engineer their own laptop. Claude works on a feature up to a PR-worthy point. I review the diff, just like I would with another engineer, massage it into the right shape, and move on.

Inexperienced engineers make the same mistakes described. I've even seen rm -rf, albeit not from root! I would have lost my mind micromanaging someone with all permissions denied.

moezd - 3 hours ago

LLMs are still next-token predictors. Just because you can give them vaguer instructions and they still find the right steps to follow doesn't mean they're intelligent. It means you're speaking the same language as the harness they trained your model on.

And that has a limit. If you are stuck at PoC level or simple apps, you have no idea how limited the current models still are. There you really need to break tasks down, not just trust a token predictor to list steps that sound good. There has to be a human in the loop somewhere, because by the time you start skipping permissions, best case you hit the jackpot; more likely you get a suboptimal solution and wasted tokens. And what's genuinely still terrifying is when the model ignores instructions and does some stupid nonsense, ruining your day. It really is as sharp as a CNC machine. It's not not useful, but it could be dangerous, so maybe don't try to carve wood with a monster machine, or park your Ferrari in that cramped neighbourhood if you don't know how to parallel park.

sscaryterry - 5 hours ago

There really wasn't much substance to this article.

steezeburger - 2 hours ago

I find it hard to stay engaged doing this. I do get good results, but it's just hard to not get distracted when it's doing the work.

WhitneyLand - 2 hours ago

This post seems like some decent advice mixed in with a lot of overconfidence and unverifiable claims.

“expert developers whose skills have reached the point where they outclass any and all “frontier AI models” in their area of expertise”

Are any developers saying they outclass any and all frontier models? I’d say at best it’s mixed at this point. The best developers still do certain things better, but not even close to all things.

“The problem is that even code written and/or reviewed by Fable 5, will stink”

I’m skeptical. Example prompt and output please.

bonsai_spool - 3 hours ago

I'm curious whether Opus4.8 or similar can attain Mythos level through good system prompting and steering. You would expect this to work if it's true that the strength of Mythos is its unwillingness to quit before it gets a desired outcome.

YuechenLi - an hour ago

I mean, the key is to stop trying to one-shot everything. The main problem I've found with LLM code is that they always try to take the shortest possible path to the solution, so a lot of the time Codex would write code that meets the requirements of the prompt but misses something that causes it to not work in the non-ideal scenario.

The solution for that is pretty easy too, it's just iteration: you describe the exact problem you have with the code and why it is not running correctly and ask them to provide a narrow fix that addresses the bug. It's not that complicated.

hungryhobbit - 3 hours ago

I <3 how everyone and their brother feels qualified to write advice to hundreds? thousands? of other developers about AI ... based on a couple months of experience as a personal user.

I mean, it's like writing a book about how to use React or Django or some other major software ... after you used it for one project for a month!

Authors: I know this is the Internet, and I know bloggers blog about whatever pops into their head ... but if you are going to act like an authority, how about you learn more than the average reader before you start telling them authoritatively what to do?

kissgyorgy - 3 hours ago

This is probably slower than writing the code yourself. Doesn't make sense to me. Using an agent without YOLO mode is not worth it.

I'd rather tightly control the output with skills you've written yourself, prompts, plans, etc., and get as close as possible to what you would write yourself.

8note - 2 hours ago

... fable on the restart seems to be more like opus and very turn limited?

if you want to beat it, give it more turns before it has to "wrap up a session"

avereveard - 3 hours ago

Seems hella inefficient.

A better method starts with realizing that everything every program does is data transformation and/or movement.

Then you ask the LLM to subdivide the data into a tree along the domain model, classifying streaming vs. storing nodes.

Then, for each node, you discuss the best data structure with the AI.

Then you ask for an interface that fully encapsulates the structure, where every mutation only goes from a valid state to a valid state and nothing else is allowed to touch the state.

And that's mostly it: just connect all the interfaces until input goes to the monitor, to storage, to an API, or wherever the destination is.
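As a rough illustration of the "interface that fully encapsulates the structure" step, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Inventory` node, its non-negative stock invariant, and the `InvalidTransition` error are hypothetical and not from the comment. The point is only that each mutation validates first and then changes state, so a rejected call leaves the object untouched, and readers only ever receive a copy.

```python
class InvalidTransition(Exception):
    """Raised when a mutation would leave the state invalid."""


class Inventory:
    """Hypothetical storing node: stock counts per SKU, never negative."""

    def __init__(self):
        # By convention nothing outside this class touches _stock directly.
        self._stock = {}

    def add(self, sku, qty):
        """Increase stock; rejects non-positive quantities."""
        if qty <= 0:
            raise InvalidTransition("qty must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + qty

    def remove(self, sku, qty):
        """Decrease stock; rejects anything that would go below zero."""
        if qty <= 0:
            raise InvalidTransition("qty must be positive")
        if self._stock.get(sku, 0) < qty:
            raise InvalidTransition("not enough stock for " + sku)
        self._stock[sku] -= qty

    def snapshot(self):
        """Return a copy so callers cannot mutate the internal state."""
        return dict(self._stock)


inv = Inventory()
inv.add("widget", 3)
inv.remove("widget", 2)
try:
    inv.remove("widget", 5)  # would go negative, so it is rejected
except InvalidTransition as err:
    print("rejected:", err)
print(inv.snapshot())
```

Python only enforces the privacy of `_stock` by convention; a language with real access modifiers or an immutable state type would make the "nothing else is allowed to touch the state" part stricter.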