Don't fall into the anti-AI hype
antirez.com | 797 points by todsacerdoti 20 hours ago
I don't understand the stance that AI currently is able to automate away non-trivial coding tasks. I've tried this consistently since GPT 3.5 came out, with every single SOTA model up to GPT 5.1 Codex Max and Opus 4.5. Every single time, I get something that works, yes, but then when I start self-reviewing the code, preparing to submit it to coworkers, I end up rewriting about 70% of the thing. So many important details are subpar about the AI solution, and many times fundamental architectural issues cripple any attempt at prompting my way out of it, even though I've been quite involved step-by-step through the whole prototyping phase.
I just have to conclude 1 of 2 things:
1) I'm not good at prompting, even though I am one of the earliest AI in coding adopters I know, and have been consistent for years. So I find this hard to accept.
2) Other people are just less picky than I am, or they have a less thorough review culture that lets subpar code slide more often.
I'm not sure what else I can take from the situation. For context, I work on a 15 year old Java Spring + React (with some old pages still in Thymeleaf) web application. There are many sub-services, two separate databases, and this application also needs to interface two-way with customer hardware. So, not a simple project, but still. I can't imagine it's way more complicated than most enterprise/legacy projects...
> non-trivial coding tasks
I’ve come back to the idea LLMs are super search engines. If you ask it a narrow, specific question, with one answer, you may well get the answer. For the “non-trivial” questions, there always will be multiple answers, and you’ll get from the LLM all of these depending on the precise words you use to prompt it. You won’t get the best answer, and in a complex scenario requiring highly recursive cross-checks— some answers you get won’t be functional.
It’s not readily apparent at first blush the LLM is doing this, giving all the answers. And, for a novice who doesn’t know the options, or an expert who can scan a list of options quickly and steer the LLM, it’s incredibly useful. But giving all the answers without strong guidance on non-trivial architectural points— entropy. LLMs churning independently quickly devolve into entropy.
>I’ve come back to the idea LLMs are super search engines.
Yes! This is exactly what it is. A search engine with a lossy-compressed dataset of most public human knowledge, which can return the results in natural language. This is the realization that will pop the AI bubble if the public could ever bring themselves to ponder it en masse. Is such a thing useful? Hell yes! Is such a thing intelligent? Certainly NO!
While I agree, I can't help but wonder: if such a "super search engine" were to have the knowledge on how to solve individual steps of problems, how different would that be from an "intelligent" thing? I mean that, instead of "searching" for the next line of code, it searches for the next solution or implementation detail, then using it as the query that eventually leads to code.
Having knowledge isn't the same as knowing. I can hold a stack of physics papers in my hand but that doesn't make me a physics professor.
LLMs possess and can retrieve knowledge but they don't understand it, and when people try to get them to do that it's like talking to a non-expert who has been coached to smalltalk with experts. I remember reading about a guy who did this with his wife so she could have fun when travelling to conferences with him!
> Non-trivial coding tasks
A coding agent just beat every human in the AtCoder Heuristic optimization contest. It also beat the solution that the production team for the contest put together. https://sakana.ai/ahc058/
It's not enterprise-grade software, but it's not a CRUD app with thousands of examples in github, either.
> Other people are just less picky than I am
I think this is part of it.
When coding style has been established among a team, or within an app, there are a lot of extra hoops to jump through, just to get it to look The Right Way, with no detectable benefit to the user.
If you put those choices aside and simply say: does it accomplish the goal per the spec (and is safe and scalable[0]), then you can get away with a lot more without the end user ever having a clue.
Sure, there's the argument for maintainability, and vibe coded monoliths tend to collapse in on themselves at ~30,000 LOC. But it used to be 2,000 LOC just a couple of years ago. Temporary problem.
[0]insisting that something be scalable isn't even necessary imo
> When coding style has been established
It feels like you're diminishing the parent commenter's views, reducing it to the perspective of style. Their comment didn't mention style.
Style = syntax, taste, architecture choices, etc. Things you would see on a 15-year-old Java app.
i.e. not a greenfield project.
Isn't coding style a solved problem with claude.md files or something?
You can control some simple things that way. But the subtle stylistic choices that many teams agree on are difficult to articulate clearly. Plus they don’t always do everything you tell them to in the prompts or rule files. Even when it’s extremely clear sometimes they just don’t. And often the thing you want isn’t clear.
Do you have any references for "vibe coded monoliths tend to collapse in on themselves at ~30,000 LOC"? I haven't personally vibed up anything with that many LOC, so I'm legitimately curious if we have solid numbers yet for when this starts to happen (and for which definitions of "collapse").
> When coding style has been established among a team, or within an app, there are a lot of extra hoops to jump through, just to get it to look The Right Way, with no detectable benefit to the user.
Morphing an already decent PR into a different coding style is actually something that LLMs should excel at.
What's that old adage? "Programs must be written for people to read, and only incidentally for machines to execute."[1]
Agreed, but:
There's been a notable jump over the course of the last few months, to where I'd say it's inevitable. For a while I was holding out for them to hit a ceiling where we'd look back and laugh at the idea they'd ever replace human coders. Now, it seems much more like a matter of time.
Ultimately I think over the next two years or so, Anthropic and OpenAI will evolve their product from "coding assistant" to "engineering team replacement", which will include standard tools and frameworks that they each specialize in (vendor lock in, perhaps), but also ways to plug in other tech as well. The idea being, they market directly to the product team, not to engineers who may have specific experience with one language, framework, database, or whatever.
I also think we'll see a revival of monolithic architectures. Right now, services are split up mainly because project/team workflows are also distributed so they can be done in parallel while minimizing conflicts. As AI makes dev cycles faster that will be far less useful, while having a single house for all your logic will be a huge benefit for AI analysis.
I actually think it’s the opposite. We’ll see fewer monorepos because small, scoped repos are the easiest way to keep an agent focused and reduce the blast radius of their changes. Monorepos exist to help teams of humans keep track of things.
This doesn't make any sense. If the business can get rid of their engineers, then why can't the user get rid of the business providing the software? Why can't the user use AI to write it themselves?
I think instead the value is in getting a computer to execute domain-specific knowledge organized in a way that makes sense for the business, and in the context of those private computing resources.
It's not about the ability to write code. There are already many businesses running low-code and no-code solutions, yet they still have software engineers writing integration code, debugging and making tweaks, in touch with vendor support, etc. This has been true for at least a decade!
That integration work and domain-specific knowledge is already distilled out at a lot of places, but it's still not trivial. It's actually the opposite. AI doesn't help when you've finally shaved the yak smooth.
> If the business can get rid of their engineers, then why can't the user get rid of the business providing the software?
A lot of businesses are the only users of their own software. They write and use software in-house in order to accomplish business tasks. If they could get rid of their engineers, they would, since then they'd only have to pay the other employees who use the software.
They're much less likely to get rid of the user employees because those folks don't command engineer salaries.
There's no chance LLMs will be an engineering team replacement. The hallucination problem is unsolvable and catastrophic in some edge cases. Any company using such a team would be uninsurable and sued into oblivion.
Writing software is actually one of the domains where hallucinations are easiest to fix: you can easily check whether it builds and passes tests.
If you want to go further, you can even require the LLM to produce a machine checkable proof that the software is correct. That's beyond the state of the art at the moment, but it's far from 'unsolvable'.
If you hallucinate such a proof, it'll just not work. Feed back the error message from the proof checker to your coding assistant, and the hallucination goes away / isn't a problem.
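To make that loop concrete, here is roughly its shape. This is a sketch, not a real agent framework: `askModel` is a stand-in for whatever LLM client/agent applies edits to the working tree, and `npm test` for whatever build and test command the project actually uses.

```typescript
// Sketch of the build/test feedback loop described above.
import { execSync } from "node:child_process";

declare function askModel(prompt: string): Promise<string>; // hypothetical stand-in

async function fixUntilGreen(maxAttempts = 5): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      execSync("npm test", { stdio: "pipe" });
      return true; // builds and passes: the obvious hallucinations are gone
    } catch (err: any) {
      // The failure output plays the same role as a proof checker's error:
      // a mechanical signal the model can react to on the next attempt.
      const failureOutput = String(err.stdout ?? err.message);
      await askModel(
        `The test suite failed with:\n${failureOutput}\nFix the code.`
      );
    }
  }
  return false; // still red after maxAttempts: hand back to a human
}
```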
> you can easily check whether it builds and passes tests.
This link was on HN recently: https://spectrum.ieee.org/ai-coding-degrades "...recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. It does this by removing safety checks, or by creating fake output that matches the desired format, or through a variety of other techniques to avoid crashing during execution."
The trend for LLM-generated code is to build and pass tests but not deliver the functionality needed. Also, please consider how SQLite is tested: https://sqlite.org/testing.html
The ratio between test code and the code itself is a mere 590 times (590 LOC of tests per LOC of actual code); it used to be more than 1100.
Here is notes on current release: https://sqlite.org/releaselog/3_51_2.html
Notice fixes there. Despite being one of the most, if not the most, tested pieces of software in the world, it still contains errors.
> If you want to go further, you can even require the LLM to produce a machine checkable proof that the software is correct.
Haha. How do you reconcile a proof with actual code? Tests and proofs can only detect issues that you design them to detect. LLMs and other people are remarkably effective at finding all sorts of new bugs you never even thought to test against. Proofs are particularly fragile as they tend to rely on pre/post conditions with clean deterministic processing, but the whole concept just breaks down in practice pretty quickly when you start expanding what's going on in between those, and then there's multithreading...
> Writing software is actually one of the domains where hallucinations are easiest to fix: you can easily check whether it builds and passes tests.
What tests? You can't trust the tests that the LLM writes, and if you can write detailed tests yourself you might as well write the damn software.
Ah, most of the problem in programming is writing the tests. Once you know what you need, the rest is just typing.
I can see an argument where you can get non-programmers to create the inputs and outputs of said tests, but if they can do that, they are basically programmers.
This is of course leaving aside that half the stated use cases I hear for AI are that it can 'write the tests for you'. If it is writing the code and the tests it is pointless.
You focused on writing software, but the real problem is the spec used to produce the software, LLMs will happily hallucinate reasonable but unintended specs, and the checker won’t save you because after all the software created is correct w.r.t. spec.
Also tests and proof checkers only catch what they’re asked to check, if the LLM misunderstands intent but produces a consistent implementation+proof, everything “passes” and is still wrong.
This is why every one of my coding agent sessions starts with "... write a detailed spec in spec.md and wait for me to approve it". Then I review the spec, then I tell it "implement with red/green TDD".
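For anyone who hasn't done the red/green part with an agent: the failing test comes first and gets reviewed, then the agent's only job is to make it go green. A tiny illustrative example; vitest and the function name are arbitrary choices here, not part of anyone's real setup:

```typescript
// Red: this test is written and reviewed before the implementation exists,
// so the first run fails. The agent's only job is to make it pass (green).
import { describe, it, expect } from "vitest";
import { splitFullName } from "./splitFullName"; // does not exist yet

describe("splitFullName", () => {
  it("splits a simple first/last name", () => {
    expect(splitFullName("Ada Lovelace")).toEqual({
      first: "Ada",
      last: "Lovelace",
    });
  });

  it("keeps multi-word last names intact", () => {
    expect(splitFullName("Ludwig van Beethoven")).toEqual({
      first: "Ludwig",
      last: "van Beethoven",
    });
  });
});
```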
Same, and similarly something like a "create a holistic design with all existing functionality you see in tests and docs plus new feature X, from scratch", then "compare that to the existing implementation and identify opportunities for improvement, ranked by impact" when the code starts getting too branchy. (aka "first make the change easy, then make the easy change"). Just prompting "clean this code up" rarely gets beyond dumb mechanical changes.
Given so much of the work of managing these systems has become so rote now, my only conclusion is that all that's left (before getting to 95+% engineer replacement) is an "agent engineering" problem, not an AI research problem.
In order to prove safety you need a formal model of the system and formally defined safety properties that are both meaningful and understandable by humans. These do not exist for enterprise systems
An exhaustive formal spec doesn't exist. But you can conservatively prove some properties. E.g. program termination is far from sufficient for your program to do what you want, but it's probably necessary.
(Termination in the wider sense: for example an event loop has to be able to finish each run through the loop in finite time.)
You can see e.g. Rust's or Haskell's type system as another lightweight formal model that lets you make and prove some simple statements, without having a full formal spec of the whole desired behaviour of the system.
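Even an ordinary type system buys you a slice of this. A small TypeScript sketch of "making illegal states unrepresentable", which the compiler then re-checks on every change; the domain is made up for illustration:

```typescript
// A discriminated union as a lightweight proof obligation: the compiler
// rejects any code path that forgets a case or mixes up the fields.
type Payment =
  | { status: "pending" }
  | { status: "settled"; settledAt: Date; receiptId: string }
  | { status: "failed"; reason: string };

function describePayment(p: Payment): string {
  switch (p.status) {
    case "pending":
      return "awaiting settlement";
    case "settled":
      return `settled at ${p.settledAt.toISOString()} (${p.receiptId})`;
    case "failed":
      return `failed: ${p.reason}`;
    default: {
      // Exhaustiveness check: adding a new status without handling it here
      // becomes a compile-time error instead of a runtime surprise.
      const unreachable: never = p;
      return unreachable;
    }
  }
}
```

Like termination, it says nothing about whether the program does what you want, but it's a cheap necessary condition a machine can check.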
I am but a lowly IC, with no notion of the business side of things. If I am an IC at, say, a FANG company, what insurance has been taken out on me writing code there?
I think you should try harder to find their limits. Be as picky as you want, but don't just take over after it gave you something you didn't like. Try again with a prompt that talks about the parts you think were bad the first time. I don't mean iterate with it, I mean start over with a brand new prompt. Try to figure out if there is a prompt that would have given you the result you wanted from the start.
It won't be worth it the first few times you try this, and you may not get it to where you want it. I think you might be pickier than others and you might be giving it harder problems, but I also bet you could get better results out of the box after you do this with a few problems.
The biggest frustration with LLMs for me is people telling me I'm not prompting it in a good way. Just think about any other product where they sell you a half-baked product and then repeatedly tell you that you are not using it properly.
But that's not how most products work.
If you buy a table saw and can't figure out how to cut a straight line in a piece of wood with it - or keep cutting your fingers off - but didn't take any time at all to learn how to use it, that's on you.
Likewise a car, you have to take lessons and a test before you can use those!
Why should LLMs be any different?
It seems generally agreed that LLMs (currently) do better or worse with different programming languages at least, and maybe with other project logistical differences.
The fact that an LLM works great for one user on one project does not mean it will work equally great for another user on a different project. It might! It might work better. It might work worse.
And both users might be using the tool equally well, with equal skill, insofar as their part goes.
A table saw does not advertise to be a panacea which will make everyone obsolete.
You should ignore anyone who says that LLMs are a panacea that will make everyone obsolete.
If my mum buys a copy of Visual Studio, is it their fault if she cannot code?
It's more like I buy Visual Studio, it crashes at random times, and I get a response like "you don't know how to use the IDE."
It's not like that though.
It's like you buy Visual Studio and don't believe anyone who tells you that it's complex software with a lot of hidden features and settings that you need to explore in order to use it to its full potential.
Maybe if your coding style is already close to what an LLM like Claude outputs, you’ll never have these issues? At least it generally seems to be doing what I would do myself.
Most of the architectural failures come from it still not having the whole codebase in mind when changing stuff.
I actually think it's less about code style and more about the disjointed way end outcomes seem to be the culmination of a lot of prompt attempts over the course of a project/implementation.
The funny thing is reviewing stuff claude has made isn't actually unfamiliar to me in the slightest. It's something I'm intimately familiar with and have been intimately familiar with for many years, long before this AI stuff blew up...
..it's what code I've reviewed/maintained/rejected looks like when a consulting company was brought on board to build something. Such a company that leverages probably underpaid and overworked laborers both overseas and US based workers on visas. The delivered documentation/code is noisy+disjointed.
Why not post a github gist with prompt and code so that people here can give you their opinion?
On the subpar code, would the code work, albeit suboptimally?
I think part of the problem a lot of senior devs are having is that they see what they do as an artisanal craft. The rest of the world just sees the code as a means to an end.
I don't care how elegantly my toaster was crafted as long as it toasts the bread and doesn't break.
There is some truth to your point, but you might want to consider that often seniors concerned with code quality aren't being pedantic about artisanal craft; they are worried about the consequences of bad code...
- it becomes brittle and rigid (can't change it, can't add to it)
- it becomes buggy and impossible to fix one bug without creating another
- it becomes harder to tell what it's doing
- plus it can be inefficient / slow / insecure, etc.
The problem with your analogy is that toasters are quite simple. The better example would be your computer, and if you want your computer to just run your programs and not break, then these things matter.
Perhaps a better analogy is the smartphone or personal computer.
Think of all the awful cheapest android phones and Windows PCs and laptops that are slow, buggy, have not had a security update in however long and are thus insecure, become virtually unusable within a couple years. The majority of the people in the world live on such devices either because they don't know better or have no better option. The world continues to turn.
People are fine with imperfection in their products, we're all used to it in various aspects of our lives.
Code being buggy, brittle, hard to extend, inefficient, slow, insecure. None of those are actual deal breakers to the end user, or the owners of the companies, and that's all that really matters at the end of the day in determining whether or not the product will sell and continue to exist.
If we think of it in terms of evolution, the selection pressure of all the things you listed is actually very weak in determining whether or not the thing survives and proliferates.
Hattmall said it well with this:
> The usefulness is a function of how quickly the consequences from poor coding arrive and how meaningful they are to the organization.
I would just add that these hypothetical senior devs we are talking about are real people with careers, accountability and responsibilities. So when their company says "we want the software to do X" those engineers may be responsible for making it happen and accountable if it takes too long or goes wrong.
So rather than thinking of them as being irrationally fixated on the artisanal aspect (which can happen) maybe consider in most cases they are just doing their best to take responsibility for what they think the company wants now and in the future.
There’s for sure legitimacy to the concern over the quality of output of LLMs and the maintainability of that code, not to mention the long term impact on next generation of devs coming in and losing their grasp on the fundamentals.
At the same time, the direction of software by and large seems to me to be going in the direction of fast fashion. Fast, cheap, replaceable, questionable quality.
Not all software can tolerate this, as I mentioned in another comment, flight control software, the software controlling your nuclear power plant, but the majority of the software in the world is far more trivial and its consumers (and producers) more tolerant of flaws.
I don’t think of seniors as purely irrationally fixated on the artisanal aspect, I also think they are rationally, subconsciously or not, fearful of the implications for their career as the bottom falls out of this industry.
I could be wrong though! Maybe high quality software will continue to be what the industry strives for and high paying jobs to fix the flawed vibe coded slop will proliferate, but I’m more pessimistic than to think that.
The usefulness is a function of how quickly the consequences from poor coding arrive and how meaningful they are to the organization.
Like in finance if your AI trading bot makes a drastic mistake it's immediately realized and can be hugely consequential, so AI is less useful. Retail is somewhat in the middle, but for something like marketing or where the largest function is something with data or managerial the negatives aren't as quickly realized so there can be a lot of hype around AI and what it may be able to do.
Another poster commented how very useful AI was to the insurance industry, which makes total sense, because even then if something is terribly wrong it has only a minor chance of ever being an issue and it's very unlikely that it would have a consequence soon.
> I don't care how elegantly my toaster was crafted as long as it toasts the bread and doesn't break.
A consumer or junior engineer cares whether the toaster toasts the bread and doesn’t break.
Someone who cares about their craft also cares about:
- If I turn the toaster on and leave, can it burn my house down, or just set off the smoke alarm?
- Can it toast more than sliced uniform-thickness bread?
- What if I stick a fork in the toaster? What happens if I drop it in the bathtub while on? Have I made the risks of doing that clear in such a way that my company cannot be sued into oblivion when someone inevitably electrocutes themselves?
- Does it work sideways?
- When it fills up with crumbs after a few months of use, is it obvious (without knowing that this needs to be done or reading the manual) that this should be addressed, and how?
- When should the toaster be replaced? After a certain amount of time? When a certain misbehavior starts happening?
Those aren’t contrived questions in service to a tortured metaphor. They’re things that I would expect every company selling toasters to have dedicated extensive expertise to answering.
My contention is:
> A consumer
is all that ultimately matters.
All those things you’re talking about may or may not matter some day, after years and a class action lawsuit that may or may not materialize or have any material impact on the bottom line of the company producing the toaster, by which time millions of units of subpar toasters that don’t work sideways will have sold.
The world is filled with junk. The majority of what fills the world is junk. There are parts of our society where junk isn’t well tolerated (jet engines, mri machines) but the majority of the world tolerates quite a lot of sloppiness in design and execution and the companies producing those products are happily profitable.
Have you tried asking one of your peers who claims to get good results to run a test with you? Where you both try to create the same project, and share your results?
I and one or two others are _the_ AI use experts at my org, and I was by far the earliest adopter here. So I don't really have anyone else with significantly different experiences than me that I could ask.
At work, I have the same difficulty using AI as you. When working on deep Jiras that require a lot of domain knowledge, bespoke testing tools, but maybe just a few lines of actual code changes across a vast codebase, I have not been able to use it effectively.
For personal projects on the other hand, it has expedited me what? 10x, 30x? It's not measurable. My output has been so much more than what would have been possible earlier, that there is no benchmark because these level of projects would not have been getting completed in the first place.
Back to using at work: I think it's a skill issue. Both on my end and yours. We haven't found a way to encode our domain knowledge into AI and transcend into orchestrators of that AI.
I get the best results when using code to demonstrate my intention to an LLM, rather than try and explain it. It doesn't have to be working code.
I think that mentally estimating the problem space helps. These things are probabilistic models, and if there are a million solutions the chance of getting the right one is clearly unlikely.
Feeding back results from tests really helps too.
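To make the first point concrete: the "code that doesn't have to work" can be as little as a typed skeleton with TODOs pasted into the prompt. Everything below is invented purely for illustration:

```typescript
// Non-working skeleton pasted into the prompt: the types and TODOs carry the
// intent more precisely than a paragraph of prose would.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number; // exponential backoff starts from this value
}

async function fetchWithRetry<T>(url: string, policy: RetryPolicy): Promise<T> {
  // TODO: retry only on 5xx and network errors, never on 4xx
  // TODO: sleep baseDelayMs * 2^attempt between attempts, with jitter
  // TODO: surface the last error if all attempts fail
  throw new Error("not implemented - this is the part I want generated");
}
```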
I think the answer will lie somewhere closer to social psychology and modern economics than to anything in software engineering.
I have been doing the same since GPT-3. I remember a time, probably around 4o when it started to get useful for some things like small React projects but was useless for other things like firestore rules. I think that surface is still jagged, it's just that it's less obviously useless in areas that it's weaker.
Things really broke open for me when I adopted Windsurf with Opus 4, and then again with Opus 4.5. I think the way the IDE manages the context and breaks down tasks helps extend LLM usefulness a lot, but I haven't tried Cursor and haven't really tried to get good at Claude Code.
All that said, I have a lot of experience writing in business contexts and I think when I really try I am a pretty good communicator. I find when I am sloppy with prompts I leave a lot more to chance and more often I don't get what I want, but when I'm clear and precise I get what I want. E.g. if it's using sloppy patterns and making bad architectural choices, I've found that I can avoid that by explaining more about what I want and why I want it, or just being explicit about those decisions.
Also, I'm working on smaller projects with less legacy code.
So in summary, it might be a combination of 1, 2 and the age/complexity of the project you're working on.
If you follow antirez's post history, he was a skeptic until maybe a year ago. Why don't you look at his recent commits and judge for yourself? I suppose the majority of his most recent code is relevant for this discussion.
https://github.com/antirez?tab=overview&from=2026-01-01&to=2...
I don't think I'd be a good judge because I don't have the years of familiarity and expertise in his repos that I do at my job. A lot of the value of me specifically vs an LLM at my job is that I have the tribal knowledge and the LLM does not. We have gotten a lot better at documentation, but I don't think we can _ever_ truly eliminate that factor.
> Every single time, I get something that works, yes, but then when I start self-reviewing the code, preparing to submit it to coworkers, I end up rewriting about 70% of the thing.
Have another model review the code, and use that review as automatic feedback?
I've been using a `/feedback ...` command with claude code where I give it either positive or negative feedback about some action it just did, and it'll look through the session to make some educated guesses about why it did some thing - notably, checking for "there was guidance for this, but I didn't follow it", or "there was no guidance for this".
the outcome is usually a new or tweaked skill file.
it doesn't always fix the problem, but it's definitely been making some great improvements.
CodeRabbit in particular is gold here. I don't know what they do, but it is far better at reviewing than any AI model I've seen. From the deep kinds of things it finds, I highly suspect they have a lot of agents routing code to extremely specialized subagents that can find subtle concurrency bugs, misuse of some deep APIs, etc. I often have to do the architectural/big picture/how-this-fits-into-project-vision review myself, but for finding actual bugs in code, or things that would be self-evident from reading one file, it is extremely good.
That is actually a gold tip. Codex CLI is way less pleasant to use than Opus, but way better at finding bugs, so I combine them.
Codex is a sufficiently good reviewer I now let it review my hand-coded work too. It's a really, really good reviewer. I think I make this point often enough now that I suspect OpenAI should be paying me. Claude and Gemini will happily sign off work that just doesn't work, OpenAI is a beast at code-review.
Instead of rewriting yourself have you tried telling the agent what it did wrong and do the rewrite with it? Then at the end of the session ask it to extract a set of rules that would have helped to get it right the first time. Save that in AGENTS.md. If you and your team do this a few times it can lead to only having to rewrite 5% of the code instead of 70%.
> Instead of rewriting yourself have you tried telling the agent what it did wrong and do the rewrite with it?
I have, it becomes a race to the bottom.
AI is a house painter, wall to wall, with missed spots and drips. Good coders are artists. That said, artists have been known to use assistants on backgrounds. Perhaps the end case is a similar coder/AI collaborative effort?
I'm not sure if I got into this weird LLM bubble where they give me bad advice to drive engagement, because I can't resist trying to correct them and tell them how absurdly wrong they are.
But it is astounding how terrible they are at debugging non-trivial assembly in my experience.
Anyone else have input here?
Am I in a weird bubble? Or is this just not their forte?
It's truly incredible how thoughtless they can be, so I think I'm in a bubble.
I think you're spot on.
So many people hyping AI are only thinking about new projects and don't even distinguish between what is a product and what is a service.
Most software devs employed today work on maintaining services that have a ton of deliberate decisions baked in that were decided outside of that codebase and driven by business needs.
They are not building shiny new products. That's why most of the positive hype about AI doesn't make sense when you're actually at work and not just playing around with personal projects or startup POCs.
I think it comes down to what you mean by sub par code. If you're talking a mess of bubblesorts and other algorithmic problems, that's probably a prompting issue. If you're talking "I just don't like the style of the code, it looks inelegant" that's not really a prompting issue, models will veer towards common patterns in a way that's hard to avoid with prompts.
Think about it like compiler output. Literally nobody cares if that is well formatted. They just care that they can get fairly performant code without having to write assembly. People still dip to assembly (very very infrequently now) for really fine performance optimizations, but people used to write large programs in it (miserably).
There's a huge amount you're missing by boiling down their complaint to "bubble sorts or inelegant code". The architecture of the new code, how it fits into the existing system, whether it makes use of existing utility code (IMO this is a huge downside; LLMs seem to love to rewrite a little helper function 100x over), etc.
These are all important when you consider the long-term viability of a change. If you're working in a greenfield project where requirements are constantly changing and you plan on throwing this away in 3 months, maybe it works out fine. But not everyone is doing that, and I'd estimate most professional SWEs are not doing that, even!
It's a combination of being bad at prompting and different expectations from the tool. You expect it to be one shot, and then rewrite things that don't match up to what you want.
Instead I recommend that you use LLMs to fix the problems that they introduced as well, and over time you'll get better at figuring out the parts that the LLM will get confused by. My hunch is that you'll find your descriptions of what to implement were more vague than you thought, and as you iterate, you'll learn to be a lot more specific. Basically, you'll find that your taste was more subjective than you thought and you'll rid yourself of the expectation that the LLM magically understands your taste.
> But what was the fire inside you, when you coded till night to see your project working? It was building.
I feel like this is not the same for everyone. For some people, the "fire" is literally about "I control a computer", for others "I'm solving a problem for others", and yet for others "I made something that made others smile/cry/feel emotions" and so on.
I think there is a section of programmers who actually do like the actual typing of letters, numbers and special characters into a computer, and for them, I understand LLMs remove the fun part. For me, I initially got into programming because I wanted to ruin other people's websites, then I figured out I needed to know how to build websites first, then I found it more fun to create and share what I've done with others, and they tell me what they think of it. That's my "fire". But I've met so many people who don't care an iota about sharing what they built with others; it matters nothing to them.
I guess the conclusion is, not all programmers program for the same reason, for some of us, LLMs helps a lot, and makes things even more fun. For others, LLMs remove the core part of what makes programming fun for them. Hence we get this constant back and forth of "Can't believe others can work like this!" vs "I can't believe others aren't working like this!", but both sides seems to completely miss the other side.
You’re right of course. For me there’s no flow state possible with LLM “coding”. That makes it feel miserable instead of joyous. Sitting around waiting while it spits out tokens that I then have to carefully look over and tweak feels like very hard work. Compared to entering flow and churning out those tokens myself, which feels effortless once I get going.
Probably other people feel differently.
I'm the same way. LLMs are still somewhat useful as a way to start a greenfield project, or as a very hyper-custom google search to have it explain something to me exactly how I'd like it explained, or generate examples hyper-tuned for the problem at hand, but that's hardly as transformative or revolutionary as everyone is making Claude Code out to be. I loathe the tone these things take with me and hate how much extra bullshit I didn't ask for they always add to the output.
When I do have it one-shot a complete problem, I never copy paste from it. I type it all out myself. I didn't pay hundreds of dollars for a mechanical keyboard, tuned to make every keypress a joy, to push code around with a fucking mouse.
I’m a “LLM believer” in a sense, and not someone who derives joy from actually typing out the tokens in my code, but I also agree with you about the hype surrounding Claude Code and “agentic” systems in general. I have found the three positive use cases you mentioned to be transformative to my workflow on its own. I’m grateful that they exist even if they never get better than they are today.
> I didn't pay hundreds of dollars for a mechanical keyboard, tuned to make every keypress a joy, to push code around with a fucking mouse
Can’t you use vim controls?
> and hate how much extra bullshit I didn't ask for they always add to the output.
I can recommend for that problem to make the "jumps" smaller, e.g. "Add a react component for the profile section, just put a placeholder for now" instead of "add a user profile".
With coding LLMs there's a bit of a hidden "zoom" functionality in doing that, which can help calibrate the speed/involvement/thinking that you and the LLM each do.
Three things I can suggest to try, having struggled with something similiar:
1. Look at it as a completely different discipline; don't consider it leverage for coding - it's its own thing.
2. Try using it on something you just want to exist, not something you want to build or are interested in understanding.
3. Make the "jumps" smaller. Don't one-shot the project. Do the thinking yourself, and treat it as a junior programmer: "Let's now add react components for the profile section and mount them. Don't wire them up yet" instead of "Build the profile section". This also helps in finding the right speed so that you can keep up with what's happening in the codebase.
> Try using it on something you just want to exist, not something you want to build or are interested in understanding.
I don't get any enjoyment from "building something without understanding" — what would I learn from such a thing? How could I trust it to be secure or to not fall over when I enter a weird character? How can I trust something I do not understand or have not read the foundations of? Furthermore, why would I consider myself to have built it?
When I enter a building, I know that an engineer with a degree, or even a team of them, have meticulously built this building taking into account the material stresses of the ground, the fault lines, the stresses of the materials of construction, the wear amounts, etc.
When I make a program, I do the same thing. Either I make something for understanding, OR I make something robust to be used. I want to trust the software I'm using to not contain weird bugs that are difficult to find, as best as I can ensure that. I want to ensure that the code is clean, because code is communication, and communication is an art form — so my code should be clean, readable, and communicative about the concepts that I use to build the thing. LLMs do not assure me of any of this, and the actively hamstring the communication aspect.
Finally, as someone surrounded by artists, who has made art herself, the "doing of it" has been drilled into me as the "making". I don't get the enjoyment of making something, because I wouldn't have made it! You can commission a painting from an artist, but it is hubris to point at a painting you bought or commissioned and go "I made that". But somehow it is acceptable to do this for LLMs. That is a baffling mindset to me!
> You can commission a painting from an artist, but it is hubris to point at a painting you bought or commissioned and go "I made that". But somehow it is acceptable to do this for LLMs. That is a baffling mindset to me!
The majority of the work on a lot of famous masterpieces of art was done by apprentices. Under the instruction of a master, but still. No different than someone coming up with a composition, and having AI do a first pass, then going in with photoshop and manually painting over the inadequate parts. Yet people will knob gobble renaissance artists and talk about lynching AI artists.
I've heard this analogy regurgitated multiple times now, and I wish people would not.
It's true that many master artists had workshops with apprenticeships. Because they were a trade.
By the time you were helping to paint portraits, you'd spent maybe a decade learning techniques and skill and doing the unimportant parts and working your way up from there.
It wasn't a half-assed, slop some paint around and let the master come fix it later. The people doing things like portrait work or copies of works were highly skilled and experienced.
Typing "an army of Garfields storming the beach at Normandy" into a website is not the same.
Lately I've been interested in biosignals, biofeedback and biosynchronization.
I've been really frustrated with the state of Heart Rate Variability (HRV) research and HRV apps, particularly those that claim to be "biofeedback" but are really just guided breathing exercises by people who seem to have the lights on and nobody home. [1]
I could have spent a lot of time reading the docs to understand the Web Bluetooth API, and given that getting anything Bluetooth working with a PC is super hit and miss, I'd have expected a high risk of spending hours rebooting my computer and otherwise futzing around to debug connection problems.
Although it's supposedly really easy to do this with the Web Bluetooth API, I amazingly couldn't find any examples, which made me all the more apprehensive that there was some reason it doesn't work. [2]
As it was Junie coded me a simple webapp that pulled R-R intervals from my Polar H10 heart rate monitor in 20 minutes and it worked the first time. And in a few days, I've already got an HRV demo app that is superior to the commercial ones in numerous ways... And I understand how it works 100%.
I wouldn't call it vibe coding because I had my feet on the ground the whole time.
[1] For instance, I am used to doing meditation practices with my eyes closed and not holding a freakin' phone in my hand. Why they expect me to look at a phone to pace my breathing when it could talk to me or beep at me is beyond me. For that matter, why they try to estimate respiration by looking at my face when they could get it off the accelerometer if I put it on my chest when I am lying down is also beyond me.
[2] let's see, people don't think anything is meaningful if it doesn't involve an app, nobody's gotten a grant to do biofeedback research since 1979 so the last grad student to take a class on the subject is retiring right about now...
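For anyone curious, the Web Bluetooth side really is short. This is a condensed sketch in the spirit of what I ended up with, not the actual app; it assumes a device exposing the standard Heart Rate Service (like the H10), the Web Bluetooth type definitions, an HTTPS page, and a user gesture, with error handling and UI omitted:

```typescript
// Read R-R intervals from a standard Heart Rate Service device.
async function streamRrIntervals(onRr: (rrMs: number) => void) {
  const device = await navigator.bluetooth.requestDevice({
    filters: [{ services: ["heart_rate"] }],
  });
  const server = await device.gatt!.connect();
  const service = await server.getPrimaryService("heart_rate");
  const hrm = await service.getCharacteristic("heart_rate_measurement");

  hrm.addEventListener("characteristicvaluechanged", (event) => {
    const data = (event.target as BluetoothRemoteGATTCharacteristic).value!;
    const flags = data.getUint8(0);
    const hr16Bit = (flags & 0x01) !== 0;  // bit 0: 16-bit heart rate value
    const rrPresent = (flags & 0x10) !== 0; // bit 4: R-R intervals present
    let offset = hr16Bit ? 3 : 2;
    if ((flags & 0x08) !== 0) offset += 2;  // bit 3: energy expended field
    if (!rrPresent) return;
    // R-R intervals are uint16 values in units of 1/1024 second.
    for (; offset + 1 < data.byteLength; offset += 2) {
      onRr((data.getUint16(offset, /* littleEndian */ true) / 1024) * 1000);
    }
  });
  await hrm.startNotifications();
}
```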
>I don't get any enjoyment from "building something without understanding" — what would I learn from such a thing? How could I trust it to be secure or to not fall over when i enter a weird character? How can I trust something I do not understand or have not read the foundations of? Furthermore, why would I consider myself to have built it?
All of these questions are irrelevant if the objective is 'get this thing working'.
You seem to read a lot into what I wrote, so let me phrase it differently.
These are ways I'd suggest to approach working with LLMs if you enjoy building software, and are trying to find out how it can fit into your workflow.
If this isn't you, these suggestions probably won't work.
> I don't get any enjoyment from "building something without understanding".
That's not what I said. It's about your primary goal. Are you trying to learn technology XYZ and found a project so you can apply it, vs. you want a solution to your problem, nothing exists, so you're building it?
What's really important is that whether you understand in the end what the LLM has written or not is 100% your decision.
You can be fully hands off, or you can be involved in every step.
I build a lot of custom tools, things with like a couple of users. I get a lot of personal satisfaction writing that code.
I think comments on YouTube like "anyone still here in $CURRENT_YEAR" are low effort noise, I don't care about learning how to write a web extension (web work is my day job) so I got Claude to write one for me. I don't care who wrote it, I just wanted it to exist.
>When I enter a building, I know that an engineer with a degree, or even a team of them, have meticulously built this building taking into account the material stresses of the ground, the fault lines, the stresses of the materials of construction, the wear amounts, etc.
You can bet that "AI" is coming for this too. The lawsuits that will result when buildings crumble and kill people because an LLM "hallucinated" will be tragic, but maybe we'll learn from it. But we probably won't.
I think the key thing here is in point 2.
I’ve wanted a good markdown editor with automatic synchronization. I used to used inkdrop. Which I stopped using when the developer/owner raised the price to $120/year.
In a couple hours with Claude code, I built a replacement that does everything I want, exactly the way I want. Plus, it integrates native AI chat to create/manage/refine notes and ideas, and it plugs into a knowledge RAG system that I also built using Claude code.
What more could I ask for? This is a tool I wanted for a long time but never wanted to spend the dozens of hours dealing with the various pieces of tech I simply don’t care about long-term.
This was my AI “enlightenment” moment.
Really interesting. How do you find the quality of the code and the final result to be? Do you maybe have this public, would love to check it out!
The incredible thing (to me) is that this isn’t even remotely a new thing: it’s reviewing pull requests vs writing your own code. We all know how different that feels!
Correct, provided you were the one who wrote an incredibly specific feature request that the pull request solved for you.
I could imagine a world where LLM coding was fun. It would sound like "imagine a game, like Galaxians but using tractor trailers, and as a first person shooter." And it pumps out a draft and you say, "No, let's try it again with an army of bagpipers."
In other words, getting to be the "ideas guy", but without sounding like a dipstick who can't do anything.
I don't think we're anywhere near that point yet. Instead we're at the same point where we are with self-driving: not doing anything but on constant alert.
Prompt one:
imagine a game, like Galaxians but using tractor trailers,
and as a first person shooter. Three.js in index.html
Result: https://gisthost.github.io/?771686585ef1c7299451d673543fbd5d
Prompt two:
No, let's try it again with an army of bagpipers.
Result: https://gisthost.github.io/?60e18b32de6474fe192171bdef3e1d91
I'll be honest, the bagpiper 3D models were way better than I expected! That game's a bit too hard though, you have to run sideways pretty quickly to avoid being destroyed by incoming fire.
Here's the full transcript: https://gisthost.github.io/?73536b35206a1927f1df95b44f315d4c
There's a reason why bagpipes are banned under the Geneva convention!
> There's a reason why bagpipes are banned under the Geneva convention!
I know this is not Reddit, but when I see such a comment, I can't resist posting a video of "the internet's favorite song" on an electrical violin and bagpipes:
> Through the Fire and Flames (Official Video) - Mia x Ally
Can it make it work on mobile?
Yes, but I didn't bother here (not part of the original prompt).
You're welcome to drop the HTML into a coding agent and tell it to do that. In my experience you usually have to decide how you want that to work - I've had them build me on-screen D-Pad controls before but I've also tried things like getting touch-to-swipe plus an on-screen fire button.
For me the excitement is palpable when I've asked it to write a feature, then I go test it and it entirely works as expected. It's so cool.
There are multiple self driving car companies that are fully autonomous and operating in several cities in the US and China. Waymo has been operating for many years.
There are full self driving systems that have been in operation with human driver oversight from multiple companies.
And the capabilities of the LLMs in regards to your specific examples were demonstrated below.
The inability of the public to perceive or accept the actual state of technology due to bias or cognitive issues is holding back society.
I have both; for embedded and backend I prefer entering code; once in the flow, I produce results faster and feel more confident everything is correct. For frontend (except games), I find everything annoying and a waste of time manually, as do all my colleagues. LLMs really made this excellent for our team and myself. I like doing UX, but I like drawing it with a pen and paper and then doing experiments with controls/components until it works. This is now all super fast (I can usually just take a photo of my drawings and Claude makes it work) and we get excellent end results that clients love.
> For me there’s no flow state possible with LLM “coding”.
I would argue that it's the same question as whether it's possible to get into a flow state when being the "navigator" in a pair-programming session. I feel you and agree that it's not quite the same flow state as typing the code yourself, but when a session with a human programmer or Claude Code is going well for me, I am definitely in something quite close to flow myself, and I can spend hours in the back and forth. But as others in this thread said, it's about the size of the tasks you give it.
I can say I feel that flow state sometimes when it all works but I certainly don't when it doesn't work.
The other day I was making changes to some CSS that I partially understood.
Without an LLM I would have looked at the 50+ CSS spec documents and the 97% wrong answers on Stack Overflow and all the splogs, and would have bumbled around and tried a lot of things and gotten it to work in the end and not really understood why and experienced a lot of stress.
As it was I had a conversation with Junie about "I observe ... why does it work this way?", "Should I do A or do B?", "What if I did C?" and came to understand the situation 100% and wrote a few lines of code by hand that did the right thing. After that I could have switched it to Code mode and said "Make it so!" but it was easy when I understood it. And the experience was not stressful at all.
You're not alone. I definitely feel like this is / will be a major adaptation required for software engineers going forward. I don't have any solutions to offer you - but I will say that the state that's enabled by fast feedback loops wasn't always the case. For most of my career build times were much, much longer than they are today, as an example. We had to work around that to maintain flow, and we'll have to work around this, now.
I feel the same way often but I find it to be very similar to coding. Whether coding or prompting when I’m doing rote, boring work I find it tedious. When I am solving a hard problem or designing something interesting I am engaged.
My app is fairly mature with well established patterns, etc. When I’m adding “just CRUD” as part of a feature it’s very tedious to prompt agents, reviewing code, rinse & repeat. Were I actually writing the code by hand I would probably be less productive and just as bored/unsatisfied.
I spent a decent amount of time today designing a very robust bulk upload API (compliance fintech, lots of considerations to be had) for customers who can’t do a batch job. When it was finished I was very pleased with the result and had performance tests and everything.
Yes this is exactly what I feel. I disconnect enough that if it’s really taking its time I will pull up Reddit and now that single prompt cost me half an hour.
This.
To me, using an LLMs is more like having a team of ghostwriters writing your novel. Sure, you "built" your novel but it feels entirely different to writing it yourself.
And if you write novels mostly because you enjoy watching them sell, as opposed to sharing ideas with people, you don't care.
To scientists, the purpose of science is to learn more about the world; to certain others, it's about making a number of dollars go up. Mathematicians famously enjoy creating math, and would have no use for a "create more math" button. Musicians enjoy creating music, which is very different from listening to it.
We're all drawn to different vocations, and it's perverse to accept that "maximize shareholder value" is the highest.
I feel differently! My background isn't programming, so I frequently feel inhibited by coding. I've used it for over a decade but always as a secondary tool. Its fun for me to have a line of reasoning, and be able to toy with and analyze a series of questions faster than I used to be able to.
Ditto. Coding isn't what I specifically do, but it's something I will choose to do when it's the most efficient solution to a problem. I have no problem describing what I need a program to do and how it should do so in a way that could be understandable even to a small child or clever golden retriever, but I'm not so great at the part where you pull out that syntactic sugar and get to turning people words into computer words. LLMs tend to do a pretty good job at translating languages regardless of whether I'm talking to a person or using a code editor, but I don't want them deciding what I wanted to say for me.
> no flow state possible with LLM “coding”
I've hit a flow state setting it up to fly. When it flies is when the human gets out of the loop so the AI can look at the thing itself and figure out why centering the div isn't working to center the div, or why the kernel isn't booting. Like, getting to a point, pre-AI, where git bisect running in a loop is the flow state. Now, with AI, setting that up is the flow.
Well, are you the super developer that never runs into issues or challenges? For me, and I think most developers, coding is like a continuous stream of problems you need to solve. For me an LLM is very useful, because I can now develop much faster. I don't have to think about which sorting algorithm should be used or which trigonometric function I need for a specific case. My LLM buddy solves most of those issues.
When you don't know the answer to a question you ask an LLM, do you verify it or do you trust it?
Like, if it tells you merge sort is better on that particular problem, do you trust it or do you go through an analysis to confirm it really is?
I have a hard time trusting what I don't understand. And even more so if I realize later I've been fooled. Note that it's the same with humans though. I think I only trust technical decisions I don't understand when I deem the risk of being wrong low enough. Otherwise I'll invest in learning and understanding enough to trust the answer.
For all these "open questions" you might have, it is better to ask the LLM to write a benchmark and actually see the numbers. Why rush? Spend 10 minutes and you will have a decision backed by some real feedback from code execution.
But this is just a small part of a much grander testing activity that needs to wrap the LLM code. I think my main job has moved to 1. architecting and 2. ensuring the tests are well done.
What you don't test is not reliable yet; looking at code is not testing, it's "vibe-testing" and should be an antipattern: no LGTM for AI code. We should not rely on our intuition alone, because it is not strict enough, and it makes everything slow - we should not "walk the motorcycle".
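And the benchmark really doesn't need to be fancy. Something in this spirit, with the two candidates swapped in for whatever is actually being compared (the sorts below are just placeholders), usually settles the argument in a couple of minutes:

```typescript
// Minimal micro-benchmark: time the candidates on the same input and read
// the numbers instead of arguing about them.
function timeIt(label: string, fn: () => void): void {
  const start = performance.now();
  fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms`);
}

const input = Array.from({ length: 10_000 }, () => Math.random());

const builtinSort = (xs: number[]) => [...xs].sort((a, b) => a - b);

const insertionSort = (xs: number[]) => {
  const out = [...xs];
  for (let i = 1; i < out.length; i++) {
    let j = i;
    while (j > 0 && out[j - 1] > out[j]) {
      [out[j - 1], out[j]] = [out[j], out[j - 1]];
      j--;
    }
  }
  return out;
};

timeIt("built-in sort", () => builtinSort(input));
timeIt("insertion sort", () => insertionSort(input));
```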
Ok. I also have the intuition that more tests and formal specifications can help there.
So far, my biggest issue is, when the code produced is incorrect, with a subtle bug, then I just feel I have wasted time to prompt for something I should have written because now I have to understand it deeply to debug it.
If the test infrastructure is sound, then maybe there is a gain after all even if the code is wrong.
Often those kinds of performance things just don't matter.
Like right now I am working on algorithms for computing heart rate variability and only looking at a 2 minute window with maybe 300 data points at most so whether it is N or N log N or N^2 is beside the point.
When I know I'm computing the right thing for my application and know I've coded it up correctly and I am feeling some pain about performance, that's another story.
> I have a hard time trusting what I don't understand
Who doesn't? But we have to trust them anyway, otherwise everyone should get a PhD on everything.
Also, for people who "have a hard time trusting", they might just give up when encountering things they don't understand. With AI at least there is a path for them to keep digging deeper and actually verify things to whatever level of satisfaction they want.
I tell it to write a benchmark, and I learn from how it does that.
IME I don't learn by reading or watching, only by wrestling with a problem. ATM, I will only do it if the problem does not feel worth learning about (like jenkinsfile, gradle scripting).
But yes, the bench result will tell something true.
I like writing. I hate editing.
Coding with an LLM seems like it’s often more editing in service of less writing.
I get that this is a very simplistic way of looking at it, and when done right it can produce solutions, even novel solutions, that maybe you wouldn’t have on your own. Or maybe it speeds up a part of the writing that is otherwise slow and painful. But I don’t know; as somebody who doesn’t really code, every time I hear people talk about it that’s what it sounds like to me.
> I think there is a section of programmer who actually do like the actual typing of letters, numbers and special characters into a computer...
Reminds me of this excerpt from Richard Hamming's book:
> Finally, a more complete, and more useful, Symbolic Assembly Program (SAP) was devised—after more years than you are apt to believe during which most programmers continued their heroic absolute binary programming. At the time SAP first appeared I would guess about 1% of the older programmers were interested in it—using SAP was “sissy stuff”, and a real programmer would not stoop to wasting machine capacity to do the assembly. Yes! Programmers wanted no part of it, though when pressed they had to admit their old methods used more machine time in locating and fixing up errors than the SAP program ever used. One of the main complaints was when using a symbolic system you do not know where anything was in storage—though in the early days we supplied a mapping of symbolic to actual storage, and believe it or not they later lovingly pored over such sheets rather than realize they did not need to know that information if they stuck to operating within the system—no! When correcting errors they preferred to do it in absolute binary addresses.
I think this is beside the point, because the crucial change with LLMs is that you don’t use a formal language anymore to specify what you want, and get a deterministic output from that. You can’t reason with precision anymore about how what you specify maps to the result. That is the modal shift that removes the “fun” for a substantial portion of the developer workforce.
> because the crucial change with LLMs is that you don’t use a formal language anymore to specify what you want, and get a deterministic output from that
You don't just code, you also test, and your safety is just as good as your test coverage and depth. Think hard about how to express your code to make it more testable. That is the single way we have now to get back some safety.
But I argue that manually inspecting the code and thinking it through in your head is still not strict checking; it is vibe-testing as well. Only code backed by tests is not vibe-based. If needed, use TLA+ (generated by the LLM) to test, or go as deep as necessary.
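As a small illustration of "express your code to make it more testable" (the function and the invariant below are invented for the example, not anything from this thread): keep the logic pure, then pin it down with an executable property-style check rather than an eyeball review.

    import random
    import unittest

    def normalize(samples):
        # Scale a non-empty list of numbers into [0, 1]. Pure: no I/O, no globals.
        lo, hi = min(samples), max(samples)
        if lo == hi:
            return [0.0] * len(samples)
        return [(s - lo) / (hi - lo) for s in samples]

    class NormalizeInvariants(unittest.TestCase):
        def test_output_stays_in_unit_interval(self):
            # Property-style check over random inputs instead of a handful of hand-picked examples.
            for _ in range(200):
                samples = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
                out = normalize(samples)
                self.assertEqual(len(out), len(samples))
                self.assertTrue(all(0.0 <= x <= 1.0 for x in out))

    if __name__ == "__main__":
        unittest.main()

A check like this is something an agent can run against its own output; a reviewer's mental simulation is not.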
That's not it for me, personally.
I do all of my programming on paper, so keystrokes and formal languages are the fast part. LLMs are just too slow.
I'd be interested in learning more about your workflow. I've certainly used plaintext files (and similar such things) to aid in project planning, but I've never used paper beyond taking a few notes here and there.
Not who you’re replying to, but I do this as well. I carry a pocket notebook and write paragraphs describing what I want to write. Sometimes I list out the fields of a data structure. Then I revise. By the time I actually write the code, it’s more like a recitation. This is so much easier than trying to think hard about program structure while logged in to my work computer with all the messaging and email.
Yes this is my technique as well.
Others have different prerogatives, but I personally do not want to work more than I need to.
It's not about fun. When I'm going through the actual process of writing a function, I think about design issues: about how things are named, about how the errors from this function flow up, about how scheduling is happening, about how memory is managed. I compare the code to my ideal, and this is the time when I realize that my ideal is flawed or incomplete.
I think a lot of us don't get everything specced out up front; we see how things fit and adjust accordingly. Most of the really good ideas I've had were not formulated in the abstract, but were realizations arrived at in the process of spelling things out.
I have a process, and it works for me. Different people certainly have other ones, and other goals. But maybe stop telling me that, instead of interacting with the compiler directly, it's absolutely necessary that I describe what I want to a well-meaning idiot and patiently correct them, even though they are going to forget everything I just said in a moment.
> ... stop telling me that, instead of interacting with the compiler directly, it's absolutely necessary that I describe what I want to a well-meaning idiot and patiently correct them, even though they are going to forget everything I just said in a moment.
This perfectly describes the main problem I have with the coding agents. We are told we should move from explicit control and writing instructions for the machine to pulling the slot lever over and over and "persuading the machine" hoping for the right result.
I don't know what book you're talking about, but it seems that you intend to compare the switch to an AI-based workflow to using a higher-level language. I don't think that's valid at all. Nobody using Python for any ordinary purpose feels compelled to examine the resulting bytecode, for example, but a responsible programmer needs to keep tabs on what Claude comes up with, configure a dev environment that organizes the changes into a separate branch (as if Claude were a separate human member of a team) etc. Communication in natural language is fundamentally different from writing code; if it weren't, we'd be in a world with far more abundant documentation. (After all, that should be easier to write than a prompt, since you already have seen the system that the text will describe.)
> Nobody using Python for any ordinary purpose feels compelled to examine the resulting bytecode, for example,
The first people using higher level languages did feel compelled to. That's what the quote from the book is saying. The first HLL users felt compelled to check the output just like the first LLM users.
Yes, and now they don't.
But there is no reason to suppose that responsible SWEs would ever be able to stop doing so for an LLM, given the reliance on nondeterminism and a fundamentally imprecise communication mechanism.
That's the point. It's not the same kind of shift at all.
Hamming was talking about assembler, not a high level language.
Assembly was a "high level" language when it was new -- it was far more abstract than entering in raw bytes. C was considered high level later on too, even though these days it is seen as "low level" -- everything is relative to what else is out there.
The same pattern held through the early days of "high level" languages that were compiled to assembly, and then the early days of higher level languages that were interpreted.
I think it's a very apt comparison.
If the same pattern held, then it ought to be easy to find quotes to prove it. Other than the one above from Hamming, we've been shown none.
Read the famous "Story of Mel" [1] about Mel Kaye, who refused to use optimizing assemblers in the late 1950s because "you never know where they are going to put things". Even in the 1980s you used to find people like that.
The Story of Mel counts against the narrative because Mel was so overwhelmingly skilled that he was easily able to outdo the optimizing assembler.
> you intend to compare the switch to an AI-based workflow to using a higher-level language.
That was the comparison made. AI is an eerily similar shift.
> I don't think that's valid at all.
I don't think you made the case by cherry-picking what it can't do. This is exactly the same situation as when SAP appeared: there weren't symbols for every situation binary programmers were handling at the time. This doesn't change the obvious and practical improvement that abstractions provided. Granted, I'm not happy about it, but I can't deny it either.
Contra your other replies, I think this is exactly the point.
I had an inkling that the feeling existed back then, but I had no idea it was documented so explicitly. Is this quote from The Art of Doing Science and Engineering?
In my feed 'AI hype' outnumbers 'anti-AI hype' 5-1. And anti-hype moderates like antirez and simonw are rare. To be a radical in AI is to believe that AI tools offer a modest but growing net positive utility to a modest but growing subset of hackers and professionals.
Well put.
AI obviously brings big benefits into the profession. We just have not seen exactly what they are yet, or how it will unfold.
But personally I feel that a future of not having to churn out yet another crud app is attractive.
In theory “not having to churn out yet another crud app” doesn’t require AI, any ol code generator will do. AI is a really expensive way (in terms of gpus/tpus) to generate boilerplate, but as long as that cost is massively subsidized by investors, you may as well use it.
The problem I see is not so much in how you generate the code. It is about how to maintain the code. If you check in the AI generated code unchanged then do you start changing that code by hand later? Do you trust that in the future AI can fix bugs in your code. Or do you clean up the AI generated code first?
LLMs remove the familiarity of “I wrote this and deeply understand this”. In other words, everything is “legacy code” now ;-)
For those who are less experienced with the constant surprises that legacy code bases can provide, LLMs are deeply unsettling.
This is the key point for me in all this.
I've never worked in web development, where it seems to me the majority of LLM coding assistants are deployed.
I work on safety critical and life sustaining software and hardware. That's the perspective I have on the world. One question that comes up is "why does it take so long to design and build these systems?" For me, the answer is: that's how long it takes humans to reach a sufficient level of understanding of what they're doing. That's when we ship: when we can provide objective evidence that the systems we've built are safe and effective. These systems we build, which are complex, have to interact with the real world, which is messy and far more complicated.
Writing more code means that's more complexity for humans (note the plurality) to understand. Hiring more people means that's more people who need to understand how the systems work. Want to pull in the schedule? That means humans have to understand in less time. Want to use Agile or this coding tool or that editor or this framework? Fine, these tools might make certain tasks a little easier, but none of that is going to remove the requirement that humans need to understand complex systems before they will work in the real world.
So then we come to LLMs. It's another episode of "finally, we can get these pesky engineers and their time wasting out of the loop". Maybe one day. But we are far from that today. What matters today is still how well do human engineers understand what they're doing. Are you using LLMs to help engineers better understand what they are building? Good. If that's the case you'll probably build more robust systems, and you _might_ even ship faster.
Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on? "Let's offload some of the understanding of how these systems work onto the AI so we can save time and money". Then I think we're in trouble.
" They make it easier to explore ideas, to set things up, to translate intent into code across many specialized languages. But the real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping. Tools keep getting smarter. The nature of learning loop stays the same."
Learning happens when your ideas break, when code fails, when unexpected things happen. And in order to have that in a coding agent you need to provide a sensitive skin, which is made of tests; they provide pain feedback to the agent. Inside a good test harness the agent can't break things; it moves in a safe space with greater efficiency than before. So it was the environment providing us with understanding all along, and we should make an environment where the AI can understand the effects of its actions.
I don't think "understanding" should be the criterion; you can't commit your eyes in the PR. What you can commit is a test that enforces that understanding programmatically. And we can do many, many more tests now than before. You just need to ensure testing is deep and well designed.
> Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on?
This is a key question. If you look at all the anti-AI stuff around software engineering, the pervading sentiment is “this will never be a senior engineer”. Setting aside the possibility of future models actually bridging this gap (this would be AGI), let’s accept this as true.
You don’t need an LLM to be a senior engineer to be an effective tool, though. If an LLM can turn your design into concrete code more quickly than you could, that gives you more time to reason over the design, the potential side effects, etc. If you use the LLM well, it allows you to give more time to the things the LLM can’t do well.
Why can't you use LLMs with formal methods? Mathematicians are using LLMs to develop complex proofs. How is that any different?
I don't know why you're being downvoted, I think you're right.
I think LLMs need different coding languages, ones that emphasise correctness and formal methods. I think we'll develop specific languages for using LLMs with that work better for this task.
Of course, training an LLM to use it then becomes a chicken/egg problem, but I don't think that's insurmountable.
Maybe. I think we're really just starting this, and I suspect that trying to fuse neural networks with symbolic logic is a really interesting direction to explore.
That's kind of not what we're talking about. A pretty large fraction of the community thinks programming is stone cold over because we can talk to an LLM and have it spit out some code that eventually compiles.
Personally, I think there will be a huge shift in the way things are done. It just won't look like Claude.
I suspect that we are going to have a wave of gurus who show up soon to teach us how to code with LLMs. There’s so much doom and gloom in these sorts of threads about the death of quality code that someone is going to make money telling people how to avoid that problem.
The scenario you describe is a legitimate concern if you’re checking in AI generated code with minimal oversight. In fact I’d say it’s inevitable if you don’t maintain strict quality control. But that’s always the case, which is why code review is a thing. Likewise you can use LLMs without just checking in garbage.
The way I’ve used LLMs for coding so far is to give instructions and then iterate on the result (manually or with further instructions) until it meets my quality standards. It’s definitely slower than just checking in the first working thing the LLM churns out, but it’s still been faster than doing it myself, and I understand it exactly as well because I have to in order to give instructions (design) and iterate.
My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
On the contrary, legacy code has, by definition, been battle tested in production. I would amend the definition slightly to:
“Legacy code is code that is difficult to change.”
Lacking tests is one common reason why this could be, but not the only possible reason.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
On the contrary, legacy code has, by definition, been battle tested in production. I would amend the definition slightly to:
“Legacy code is code that is difficult to write tests for”.
How do you know that it's actually faster than if you'd just written it yourself? I think the review and iteration part _is_ the work, and the fact that you started from something generated by an LLM doesn't actually speed things up. The research that I've seen also generally backs this idea up -- LLMs _feel_ very fast because code is being generated quickly, but they haven't actually done any of the work.
Because I’ve been a software engineer for over 20 years. If I look at a feature and feel like it will take me a day and an LLM churns it out in a hour including the iterating, I’m confident that using the LLM was meaningfully faster. Especially since engineers (including me) are notoriously bad at accurate estimation and things usually take at least twice as long as they estimate.
I have tested throwing several features at an LLM lately and I have no doubt that I’m significantly faster when using an LLM. My experience matches what Antirez describes. This doesn’t make me 10x faster, mostly because so much of my job is not coding. But in term of raw coding, I can believe it’s close to 10x.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
Unfortunately, "tests" don't do it, they have to be "good tests". I know, because I work on a codebase that has a lot of tests and some modules have good tests and some might as well not have tests because the tests just tell you that you changed something.
I see where you're coming from, and I agree with the implication that this is more of an issue for inexperienced devs. Having said that, I'd push back a bit on the "legacy" characterization.
For me, if I check in LLM-generated code, it means I've signed off on the final revision and feel comfortable maintaining it to a similar degree as though it were fully hand-written. I may not know every character as intimately as that of code I'd finished writing by hand a day ago, but it shouldn't be any more "legacy" to me than code I wrote by hand a year ago.
It's a bit of a meme that AI code is somehow an incomprehensible black box, but if that is ever the case, it's a failure of the user, not the tool. At the end of the day, a human needs to take responsibility for any code that ends up in a product. You can't just ship something that people will depend on not to harm them without any human ever having had the slightest idea of what it does under the hood.
Take responsibility by leaving good documentation of your code and a beefy set of tests; future agents and humans will then have a point to bootstrap from, not just plain code.
I think it was Cory Doctorow who compared AI-generated code to asbestos. Back in its day, asbestos was in everything, because of how useful it seemed. Fast forward decades and now asbestos abatement is a hugely expensive and time-consuming requirement for any remodeling or teardown project. Lead paint has some of the same history.
Get your domain names now! AI Slop Abatement, the major growth industry of the 2030s.
You don't just code with AI, you provide two things:
1. a detailed spec, the result of your discussions with the agent about the work; once it gets it, you ask the agent to formalize it into docs
2. an extensive suite of tests to cover every angle; the tests are generated, but you have to ensure their quality, coverage and depth
I think, to make a metaphor, that specs are like the skeleton of the agent, tests are like the skin, while the agent itself is the muscle and cerebellum, and you are the PFC. Skeleton provides structure and decides how the joints fit, tests provide pain and feedback. The muscle is made more efficient between the two.
In short the new coding loop looks like: "spec -> code -> test, rinse and repeat"
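A toy rendering of that loop, with everything below (the feature, function name, and test cases) invented for illustration: the spec is a doc the agent can read, the tests encode the spec, and the generated code only counts once the tests pass.

    import re

    SPEC = """
    slugify(title):
      - lowercase the input
      - replace each run of non-alphanumeric characters with a single '-'
      - strip leading and trailing '-'
    """

    def slugify(title: str) -> str:
        # The implementation the agent would produce against SPEC above.
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

    def test_slugify_matches_spec():
        # Tests derived from the spec; quality and coverage stay the human's job.
        assert slugify("Hello, World!") == "hello-world"
        assert slugify("  --Already--Slugged--  ") == "already-slugged"

    if __name__ == "__main__":
        test_slugify_matches_spec()
        print("spec satisfied; rinse and repeat")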
Is it really much different from maintaining code that other people wrote and that you merged?
Yes, this is (partly) why developer salaries are so high. I can trust my coworkers in ways not possible with AI.
There is no process solution for low performers (as of today).
The solution for low performers is very close oversight. If you imagine an LLM as a very junior engineer who needs an inordinate amount of hand holding (but who can also read and write about 1000x faster than you and who gets paid approximately nothing), you can get a lot of useful work out of it.
A lot of the criticisms of AI coding seem to come from people who think that the only way to use AI is to treat it as a peer. “Code this up and commit to main” is probably a workable model for throwaway projects. It’s not workable for long term projects, at least not currently.
A Junior programmer is a total waste of time if they don't learn. I don't help Juniors because it is an effective use of my time, but because there is hope that they'll learn and become Seniors. It is a long term investment. LLMs are not.
It’s a metaphor. With enough oversight, a qualified engineer can get good results out of an underperforming (or extremely junior) engineer. With a junior engineer, you give the oversight to help them grow. With an underperforming engineer you hope they grow quickly or you eventually terminate their employment because it’s a poor time trade off.
The trade off with an LLM is different. It’s not actually a junior or underperforming engineer. It’s far faster at churning out code than even the best engineers. It can read code far faster. It writes tests more consistently than most engineers (in my experience). It is surprisingly good at catching edge cases. With a junior engineer, you drag down your own performance to improve theirs and you’re often trading off short term benefits vs long term. With an LLM, your net performance goes up because it’s augmenting you with its own strengths.
As an engineer, it will never reach senior level (though future models might). But as a tool, it can enable you to do more.
> It’s far faster at churning out code than even the best engineers.
I'm not sure I can think of a more damning indictment than this tbh