The future of software development is software developers
codemanship.wordpress.com
421 points by cdrnsf 3 months ago
Most people in this thread are quibbling about the exact degree of utility LLMs provide, which is a tedious argument.
What's more interesting to me is, per the article, the concern regarding everyone who is leaning into LLMs without realizing (or downplaying) the exorbitant, externalized cost. Our current LLM usage is being subsidized to the point of being free by outside investment. One day, when the well runs dry, you must be able to either pay the actual cost (barring grand technology breakthroughs) or switch back to non-LLM workflows. I run local LLMs infrequently, and every single prompt makes my beefy PC sound like a jet engine taking off. It's a great reminder to not become codependent.
As someone who works on the design and construction of datacenters, I cannot stress enough how apropos this comment is. Even before the first conversation in your IDE starts, the load on national and local government resources, local utility capacity, and roadway infrastructure is enormous. We're all paying whether we're using the tools or not.
Nearly nobody cares about the load on "national and local government resources, local utility capacity, and roadway infrastructure" for any other day-to-day activity. Why should they care about the same for AI, which for most people is "out there online" somewhere? Relatedly, crypto bros worried about electricity usage only insofar as it was an expense, and whether they could move closer to hydro dams.
The parent comment's point is that we _should_ care because cheap frontier-model access (that many of us have quickly become hopelessly dependent on) might be temporary.
It's ridiculous that anyone who has seen anything in technology in the last 30 years can say, "Better be careful, they might stop subsidizing this and then it's gonna get expensive!" I can buy a 1 TB flash drive for $100. Please, even those with every reason to amortize the hardware over the longest horizon possible only go out 6 years. "64K should be enough for anyone," right?
I think the heavy investor subsidization / speculation makes this different. The high cost of early 1 TB flash drives was largely borne by buyers.
Yeah, I can't wait to buy some RAM for my PC! Oh, wait, the AI companies are buying up all the RAM sticks on the planet and driving their prices to comical highs. Surely these beacons of ethics and morality won't do the same with their services that are actively hemorrhaging billions of dollars; they're providing these services to us out of the goodness of their black hearts and not any kind of monetary incentive, after all!
Yes, hardware has become cheaper, but services all enshittify the moment the investors start to ask for some return.
If expert devs have junior devs to assign work to, whose code they review and integrate, do they become "hopelessly dependent" on junior devs?
My experience of expert devs is that those who are happy to have the extra leverage are not slowed much when they have to do the work themselves.
In no cases have I seen experts become “dependent” on the junior devs.
They do quite soon after they have become managers or product owners or “architects”.
They should care because they are expensive. If we become dependent on something that is expensive, we have to maintain a certain level of economic productivity to sustain our dependence.
For AI, once these companies or shareholders start demanding profit, then users will be footing the bill. At this rate, it seems like it'll be expensive without some technological breakthrough as another user mentioned.
For other things, like roads and public utilities, we have to maintain a certain level of economic productivity to sustain those as well. Roads for example are expensive to maintain. Municipalities, states, and the federal government within the US are in lots of debt associated with roads specifically. This debt may not be a problem now, but it leaves us vulnerable to problems in the future.
> Nearly nobody cares about ...
That's an accurate and sad truth about humanity in general, isn't it? We all feel safer and saner if we avoid thinking about how things really are. It's doubly true if our hands are dirty to some extent.
At the same time, I submit that ignoring the effectiveness of very small contingents of highly motivated people is a common failure mode of humanity in general. Recall that "nearly nobody" also describes "people who are the President of the United States." Observe how that tiny rounding error of humanity is responsible for quite a bit of the way the world goes - for good or ill. Arguably, that level of effectiveness doesn't even require much intelligence.
> Why should they care about the same for AI which for most people is “out there online” somewhere?
Well, some will be smart enough to see the problem. Some portion thereof will be wise enough to see a solution. And a portion of those folks will be motivated enough to implement it. That's all that's required. Very simple even if it's not very easy or likely.
I always liken it to using Uber in ~2012. It was fun to get around major metro areas for dirt cheap. But then prices rose dramatically over the next decade+ as the company was forced to wean itself off of VC subsidies.
It's been common since the year dot for new businesses to compete on price to attract customers and gain market share. It wasn't invented by Uber.
Same with Airbnb. Oh and Moviepass. Those were the days.
Except none of those cost structures are based primarily on a resource that gets cheaper over time, a.k.a. compute.
Compute isn't getting cheaper; growth right now is supply-constrained by memory, which, if you haven't seen the news recently…
Training is getting exponentially more expensive. And inference isn’t that cheap unless you can do it locally
... and people kept using Uber.
Uber and Lyft put all the taxis out of business and now cost as much as the taxis they displaced
Ever notice that even where Uber doesn’t operate most of ride sharing alternatives work pretty much the same way? Go to South Asia, China, Middle East, or South East Asia.
Consumers pick those services because of what Uber pioneered — trust and convenience. You know exactly how much you pay, you pay everything upfront, you know you are dropped off where you need to be. There are of course exceptions, but exceptions they are.
Cost may be the initial selling point, but people stick with Uber and similar services despite higher cost, not because they don't have other options.
> because of what Uber pioneered — trust
I really dislike this retrofitting of history. In the last few years I've read of many occurrences of serious SA by Uber drivers, and zero by normal taxis.
Not everywhere. Here the government fucked Uber et al. big time by requiring the companies to pay for taxi licenses, if I remember correctly. That is, if they want to deliver a taxi service.
There's a lot in this comment that doesn't exactly fit.
First of all, there could be other solutions, such as B2B subsidizing individual user plans, or more fine grained model tiering per cost.
Also, yes, you can get some access for free, but even today the higher tiers of proprietary models are around $200/mo for individual users, which might still be subsidized but is definitely not free, and is quite a chunk of money at $2,400 a year!
I don't know what your setup is at the moment, but it's possible more efficient hardware and stack are available that you're not utilizing. Of course this depends on what models you're trying to run.
I think that smaller models will become a lot better, and hardware will become more optimized as well. We're starting to see this with NPUs and TPUs.
All this means running models will cost less, and maybe upgrading the power grid will also reduce the cost of energy, making it more affordable.
I don't see any way that AI will go away because it "hits a wall". We have long passed the point of no return.
You are looking at it from the individual's PoV, but the OP is taking a bird's-eye view. The total amount of effort already deployed today to provide all the existing AI services is enormous: data centers, electricity, planning/attention (entities focused on AI have less time to work on something else), components (Nvidia shortage, RAM shortage), etc.
This is not about finance, but about the real economy and how much of it, and/or its growth, is diverted to AI. The real economy is being reshaped, influencing a lot of other sectors independent of AI use itself. AI heavily competes with other uses for many kinds of actual real resources - without having equally much to show for it yet.
Just an example: https://www.technologyreview.com/2025/05/20/1116327/ai-energ... (it is worth skim-reading it all; the headline on its own is useless)
This is a good point but you can see the price "ceiling" by examining the prices for PCs that can effectively run local models. A DGX Spark is ~$4k (plus power) for example.
That's not nothing, but it's still not very much to pay compared to, e.g., the cost of an FTE.
You're counting the cost of running the model, but what about training it? You can't count the compute and data costs at $0.
You can assume that already-published open weights models are available at $0, regardless of how much money was sunk into their original development. These models will look increasingly stale over time but most software development doesn't change quickly. If a model can generate capable and up-to-date Python, C++, Java, or Javascript code in 2025 then you can expect it to still be a useful model in 2035 (based on the observation that then-modern code in these languages from 2015 works fine today, even if styles have shifted).
>2025-2035
Depending on other people to maintain backward compatibility so that you can keep coding like it’s 2025 is its own problematic dependency.
You could certainly do it but it would be limiting. Imagine that you had a model trained on examples from before 2013 and your boss wants you to take over maintenance for a React app.
You're all reasoning about a strange world in which no open-weight coding models would ever be trained again. Even in a world where VC spending vanished completely, coding models are such a valuable utility that, at the very least, companies and individuals would crowdsource them on a recurring basis, keeping them up to date.
The value of this technology has been established, it's not leaving anytime soon.
SOTA models cost hundreds of millions to train. I doubt anyone is crowdsourcing that.
And that’s assuming you already have a lot of the infrastructure in place.
I think FAANG and the like would probably crowdsource it, given that, according to the hypothesis presented, they would only have to do it every few years, and ostensibly they are realizing improved developer productivity from these models.
I don’t think the incentive to open source is there for $200 million LLM models the same way it is for frameworks like React.
And for closed source LLMs, I’ve yet to see any verifiable metrics that indicate that “productivity” increases are having any external impact—looking at new products released, new games on Steam, new startups founded etc…
Certainly not enough to justify bearing the full cost of training and infrastructure.
2013 was pre-LLM. If devs continue relying on LLMs and their training were to stop (which I find unlikely), the tools around the LLMs would still continue to evolve, and new language features would get less attention, used only by people who don't like to use LLMs. Then it would be a race of popularity between new languages (and features) and LLMs steering 'old' programming languages and APIs. It's not always the best technology that wins; often it's the most popular one. You know what happened during the browser wars.
That can't come anywhere close to running the current SotA models, though.
Most things don't require SotA models.
But still, right now you don't have to worry, since even these SotA models are subsidized so heavily that you can use them for free on websites; and if you don't even want to type, you can use a cheaper or even free model with something like opencode to act as a mini agent.
Usually I just end up with everything focused in a single file, which isn't really best practice, but it's usually for prototyping purposes anyway, so it ends up being really good.
uv scripts are good for Python, and I usually create single-file Go main.go programs as well, since they compile fast, cross-compile easily, produce a single binary, and stay simple, so yeah :)
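For anyone who hasn't tried them, a uv script is just a Python file carrying PEP 723 inline metadata that uv reads to fetch its dependencies on the fly. A minimal sketch; the dependency and URL here are only illustrative:

    # fetch_status.py - run with: uv run fetch_status.py
    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #     "requests",
    # ]
    # ///
    import requests

    # uv creates an ephemeral venv with 'requests' installed before running this.
    resp = requests.get("https://api.github.com")
    print(resp.status_code)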
I find the cost discussion to be exceedingly more tedious. This would be a more compelling line of thinking if we didn't have highly effective open-weight models like qwen3-coder, glm 4.7 etc. which allow us to directly measure the cost of running inference with large models without confounding factors like VC money. Regardless of the cost of training, the models that exist right now are cheap and effective enough to push the conversation right back to "quibbling about the exact degree of utility LLMs provide".
>I run local LLMs infrequently, and every single prompt makes my beefy PC sound like a jet engine taking off. It's a great reminder to not become codependent.
I would try setting the GPU to run at a lower power level. I set my GPU power level to 80% and it becomes much quieter, and only runs maybe 5% slower at most.
Also I 100% agree with the rest of your comment. We can only power the current growth we are seeing for so long.
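In case it helps, the 80% trick is a single nvidia-smi call; here's a hedged sketch wrapped in Python, assuming an NVIDIA card with nvidia-smi on PATH and root access (the 0.8 factor mirrors the setting above):

    import subprocess

    # Read the board's default power limit in watts (first GPU only).
    query = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.default_limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    default_watts = float(query.strip().splitlines()[0])

    # Cap the GPU at 80% of its default limit. Needs root; resets on reboot
    # unless persistence mode is enabled.
    subprocess.run(
        ["nvidia-smi", "-pl", str(int(default_watts * 0.8))],
        check=True,
    )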
LLMs cannot generate coherent sentences
LLMs writing prose is too robotic
LLMs output is too dependent on prompts to be interesting
LLMs take too much RAM to run effectively
LLMs take too much electricity to run locally
LLMs work locally but are a bit too slow for my taste
LLMs output mostly correct code but it isn't applicable to my codebase
LLMs make tool calls to pull in additional context
LLM-outputted code works for most developers but not my codebase <---- you are currently here
isn't this template supposed to mean that all the previous considerations are now obsolete?
I guess you could argue that the standard LLM sentence structure is too robotic but prompting mostly fixes that.
The rest is no longer true, indeed
So what'll happen to all these companies building on top of OpenAI licenses? I don't hear these warnings in professional circles, only online.
A competitive coding model like Devstral 2 runs fast enough to be very helpful: https://www.hardware-corner.net/devstral-2-hardware-requirem...
The required hardware fits the budget of a professional developer.
Putting LLMs in the cloud allows the hardware to be utilised better and to have sharding of big models.
> One day when the well runs dry, you must be able to either pay the actual cost
What multiple of the current cost do you expect? Currently, GitHub Copilot and ChatGPT for Business cost $19/month and €29/month respectively. Even a 10×–20× increase will still be economically viable in a professional setting if the tools continue to save hours of work.
These tools (e.g. ChatGPT Pro) lose money at $200/month.
So expect, maybe, $1,000 a month? Until your business is dependent on these LLMs. Then they can extract basically all your margin lol
At a $1,000/month price point, wouldn't the economics start favoring buying GPUs and running local LLMs? Even if they're weaker, local models can still cover enough use cases to justify the switch.
Claude max is like $100/mo, and if you’re a daily user you’re likely going to need max
The cost is coming down fast. You can get a $2,000 desktop machine (AMD 395) that can effectively run ChatGPT 3.5-level LLMs at over 100 tokens per second.
If you wrote this comment 70 years ago, when computers were the size of rooms, it would make a lot of sense; and yet we know how history played out, with everyone carrying a supercomputer in their pocket.
For some reason it feels like people are under the assumption that hardware isn't going to improve or something?
Writing my comment on this post, I kind of feel like LLMs are similar to WordPress or drag-and-drop tools, although perhaps more inconsistent too; not sure.
I 100% share the codependency worry too, and I wrote a similar comment some 2-3 days ago: the AI companies providing these services are either seriously loss-making while subsidizing usage, or barely net zero. I doubt that will continue, so I guess the bubble will burst and prices will rise.
We will see. Perhaps something like Google, which can feed on advertising, can provide such subsidies for longer, but the fact of the matter is that I have no allegiance to any model, as some might have; I will simply shift to the cheapest thing that is still enough for my queries or prototypes, mostly, I suppose.
I am sorry, but this kind of level-headed and realistic take is completely unacceptable on hackernews, and you should be ashamed of yourself. This is not a place for rational discussion when it comes to LLMs.
LLMs are amazing and they will change everything, and then everything will be changed.
After working with agent-LLMs for some years now, I can confirm that they are completely useless for real programming.
They never helped me solve complex problems with low-level libraries. They can not find nontrivial bugs. They don't get the logic of interwoven layers of abstractions.
LLMs pretend to do this with big confidence and fail miserably.
For every problem, I need to turn my brain to ON MODE and wake up; the LLM never wakes up.
It surprised me how well it solved another task: I told it to set up a website with some SQL database and scripts behind it. When you click here, show some filtered list there. Worked like a charm. A very solved problem and very simple logic, done a zillion times before. But this saved me a day of writing boilerplate.
I agree that there is no indication that LLMs will ever cross the border from simple-boilerplate-land to understanding-complex-problems-land.
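For scale, that "click here, show some filtered list there" task is roughly this much boilerplate per endpoint. A minimal sketch, assuming an SQLite table named items already exists; all names here are made up:

    import sqlite3
    from flask import Flask, g, jsonify, request

    app = Flask(__name__)

    def get_db():
        # One SQLite connection per request, stored on Flask's g object.
        if "db" not in g:
            g.db = sqlite3.connect("items.db")
            g.db.row_factory = sqlite3.Row
        return g.db

    @app.teardown_appcontext
    def close_db(exc):
        db = g.pop("db", None)
        if db is not None:
            db.close()

    @app.route("/items")
    def filtered_items():
        # "Show some filtered list there": /items?category=books
        # returns only the matching rows.
        category = request.args.get("category")
        db = get_db()
        if category:
            rows = db.execute(
                "SELECT id, name, category FROM items WHERE category = ?",
                (category,),
            ).fetchall()
        else:
            rows = db.execute("SELECT id, name, category FROM items").fetchall()
        return jsonify([dict(r) for r in rows])

    if __name__ == "__main__":
        app.run(debug=True)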
I can confirm that they are completely useless for real programming
And I can confirm, with similar years of experience, that they are not useless. Absolutely incredible tools that have saved hours and hours helping me understand large codebases, brainstorm features, and point out gaps in my implementation or understanding.
I think the main disconnect in the discourse is that there are those pretending they can reliably just write all the software, when anyone using them regularly can clearly see they cannot.
But that doesn't mean they aren't extremely valuable tools in an engineer's arsenal.
Same. I started coding before hitting puberty, and I'm well into my 30s.
If you know the problem space well, you can let LLMs (I use Claude and ChatGPT) flesh it out.
> I use Claude and ChatGPT
Both for code? For me, it's Claude only for code. ChatGPT is for general questions.
Yes, I use them in tandem. Generally Claude for coding and ChatGPT when I run out of tokens in Claude.
I also use ChatGPT to summarise my project. I ask it to generate Markdown and PDFs explaining the core functionality.
I feel like I have to be strategic with my use of Claude Code: frequently clearing out sessions to minimize context, writing the plan out to a file so that I can review it more effectively myself and even edit it, breaking problems down into consumable chunks, attacking those chunks in separate sessions, etc. It's a lot of prep work I have to do to make the tool thrive. That doesn't mean it's useless, though.
"real programming"
Perhaps you're doing some amazing low-level work, but it feels like you're way overestimating how much of our industry does that. A massive amount of developers show up to work every day and just stitch together frameworks and libraries.
In many ways, it feels similar to EVs. Just because EVs aren't yet, and may never be, effective for moving massive amounts of cargo in a day with minimal refueling doesn't mean they aren't an effective solution for the bulk of drivers, who have an average commute of 40 miles a day.
> After working with agent-LLMs for some years now, I can confirm that they are completely useless for real programming
This is a bit of no-true-Scotsman, no? For you, "real programming" is "stuff LLMs are bad at," but a lot of us out in the real world are able to effectively extract code that meets the requirements of our day jobs by tossing natural-language descriptions into LLMs.
I actually find the rise of LLM coding depressing and morally problematic (re copyright / ownership / license laundering), and on a personal level I feel a lot of nostalgia for the old ways, but I simply can't levy an "it's useless" argument against this stuff with any seriousness.
I only use it sparingly thus far, and for small things, but I don't find it depressing at all; rather, timely.
All those many, many languages, frameworks, libraries, APIs, and their many, many iterations: soooo much time lost on minute details. The natural-language description, even highly detailed down to being directly algorithmic, is a much better level for me. I have gotten more and more tired of coding, but maybe part of it is too much JavaScript and its quickly changing environment and tools, for too many years (not any more, though). I have felt for quite some time that I'm wasting way too much time chasing all those many, many details.
I'm not pro-high-level-programming per se - I started a long time ago with 8 bit assembler and knowing every one of the special registers and RAM cells. I cherish the memories of complex software fitting on a 1.44 MB floppy. But it had gotten just a bit too extreme with all the little things I had to pay attention to that did not contribute to solving the actual (business) problem.
I feel it's a bit early even if it's already usable, but I hope they can get at least one more giant leap out of AI in the next decade or so. I am quite happy to be able to concentrate on the actual task, instead of the programming environment minutiae, which has exploded in size and complexity across platforms.
"they are completely useless for real programming"
You and I must have completely different definitions of "real programming". In this very comment, you described a problem that the model solved. The solution may not have involved low-level programming, or discovering a tricky bug entrenched in years-worth of legacy code, but still a legitimate task that you, as a programmer, would've needed to solve otherwise. How is that not "real programming"?
I wouldn't describe the LLM's actions in the example as "solving a problem" so much as "following a well-established routine". If I were to, for instance, make a PB&J sandwich, I wouldn't say that what I'm doing is "real cooking" even if it might technically fit the definition.
If an LLM controlling a pair of robot hands was able to make a passable PB&J sandwich on my behalf, I _guess_ that could be useful to me (how much time am I really saving? is it worth the cost? etc.), but that's very different from those same robo-hands filling the role of a chef de cuisine at a fine dining restaurant, or even a cook at a diner.
In this analogy you're clearly a private chef with clients who have very specific wishes and allergies.
The rest of us are just pumping out CRUD-burgers off the API assembly line. Not exactly groundbreaking stuff.
LLMs are really good with burgers, but not so much being a private chef.
Every useful CRUD app becomes its own special snowflake with time and users.
Now if your CRUD app never gets any users sure it stays generic. But we’ve had low code solutions that solve this problem for decades.
LLMs are good at stuff that probably should have been low-code in the first place, but couldn't be for reasons. That's useful, but it comes with a ton of trade-offs. And these kinds of solutions cover a lot less ground than you'd think.
I'm old enough to remember the "OMG low-code is going to take our jeeeerbbs!" panic :D
Like LLMs they took away a _very_ specific segment of software, Zapier, n8n, NodeRED etc. do some things in a way that bespoke apps can't - but they also hit a massive hard wall where you either need to do some really janky shit or just break out Actual Code to get forward.
People are saying Codex 5.2 fully solved crypto challenges in the 39C3 CTF last weekend.
Three months ago I would have agreed with you, but anecdotal evidence says Codex 5.2 and Opus 4.5 are finally there.
You'll get a vastly different experience the more you use these tools and learn their limitations and how you can structure things effectively to let them do their job better. But lots of people, understandably, don't take the time to actually sit down and learn it. They spend 30 seconds on some prompt not even a human would understand, and expect the tooling to automatically spend 5 hours trying its hardest at implementing it, then they look at the results and conclude "How could anyone ever be productive with this?!".
People say a lot of things, and there is a lot of context behind what they're saying that is missing, so then we end up with conversations that basically boil down to one person arguing "I don't understand how anyone cannot see the value in this" with another person thinking "I don't understand how anyone can get any sort of value out of this", both missing the other's perspective.
Prompt engineering is just good transfer notes and ticket writing, which is something a majority of the devs I've worked with don't enjoy or excel at
I've been using Codex and Claude Sonnet for many months now for personal (Codex) and work (Sonnet) and I agree. Three months ago these tools were highly usable, now with Codex 5.2 and Sonnet 4.5 I think we're at the point where you can confidently rely on them to analyze your repo codebase and solve, at the very least, small scoped problems and apply any required refactor back throughout the codebase.
6-12+ months ago the results I was getting with these tools were highly questionable but in the last six months the changes have been pretty astounding
Sonnet is dumb as a bag of bricks compared to Opus, perhaps you meant Opus? I never use sonnet for anything anymore, it’s either too verbose or just can’t handle tasks which Opus one shots.
These anecdotes feel so worthless. I notice almost no difference between the two and get generally high quality results from either. This is also a worthless anecdote. I'm guessing what kind of codebase you are working in matters a lot as well as the tasks you're giving it.
I use the Copilot extension in VS Code, which links back to my enterprise GitHub account, where I have Claude Sonnet 4.5 available amongst other things. I'm not familiar with Opus. I just open the Copilot Chat window in my VS Code, configure it to use Sonnet 4.5, tell it what I need and it writes the responses and code for me. I'm not using it for large tasks. Most of my usage is "examine this codebase and tell me how to fix xyz problem" or "look at this source code file and show me the code to implement some feature, make sure to examine the entire codebase for insight into how it should be integrated with the rest of the project"
There's other more advanced coding AI tools but this has accomplished most all of my needs so far
The Copilot extension in VS Code includes Opus as well. It costs three times as much as Sonnet, so I'd expect it to perform better or handle more complex tasks, but if you're happy with Sonnet (I am too), more power to you.
It’s crazy how different my experience is. I think perhaps it’s incredibly important what programming language you are using, what your project and architecture is like. Agents are making an extraordinary contribution to my productivity. If they jacked my Claude Code subscription up to $500/month I would be upset but almost certainly would keep paying it, that’s how much value it brings.
I’m in enterprise ERP.
It sounds like you use your personal Claude Code subscription for work of your employer, but that is not something I would ever consider doing personally so I imagine I must be mistaken.
Can you elaborate slightly on what you pay for personally and what your employer pays for with regards to using LLMs for Enterprise ERP?
Freelancers regularly use tools such as Copilot and Claude, it's always handled professionally and in agreement with their customers. I've seen other freelancers do it plenty of times in the last 1-2 years at my customer sites.
Why so narrow minded?
I'm inquisitive not narrow minded.
The GP didn't mention anything about freelancing so unless you know them or stalked them you are perhaps being narrow minded here.
They also never said anything about being employed.
You are being narrow minded here.
Again, I disagree and reaffirm my being full of inquisitiveness.
You are being downright unpleasant and I don't think we should continue this conversation further until you open your mind.
I own my own business.
Interesting and thanks for clarifying that aspect. I have a few more questions if you would be able to answer any of them at any level of detail I would appreciate it.
How much would you be willing to pay to continue using Claude on a monthly basis before you stopped?
Do you currently maintain the new (as of two weeks ago) cash reserve to ensure it continues working when limits are reached, and if so, how much do you set aside for it?
Finally, do you send your customer's code or data directly to Claude or do you use it indirectly on generic stuff and then manually specialize the outputs?
Even more important than those things is how well you can write and communicate your ideas. If you cannot communicate an idea such that a human could implement it as you wanted without asking extra questions, an LLM isn't gonna be able to either.
As someone who has managed engineers for many years I find those skills immediately applicable to the LLM domain. If you aren't used to communicating what you are trying to build to other engineers I think using the AI is harder as you need to develop those skills.
I'd take it a step further and say that any engineer who is used to collaborating with others, engineers or not, should have these skills already. But as most of us know, communication is a generally lacking skill among the population at large, even among engineers.
Natural language programming has arrived, in my opinion. If you're not a developer or don't have any programming experience, it won't help much.
> After working with agent-LLMs for some years now, I can confirm that they are completely useless for real programming.
"completely useless" and "real programming" are load bearing here. Without a definition to agree on for those terms, it's really hard not to read that as you're trying to troll us by making a controversial unprovable claim that you know will get people that disagree with you riled up. What's especially fun is that you then get to sneer at the abilities of anybody making concrete claims by saying "that's not real programming".
How tiresome.
Who cares about semantics.
Ultimately it all boils down to the money: show me the money. OAI has to show money, and so do its customers from using this tool.
But nope, the only thing out there where it matters is hype. Nobody is on an earnings call clearly showing how they had a numerical jump in operating efficiency.
Until I see that, this technology has a dated shelf life and only those who already generate immense cash flows will fund its continued existence given the unfavourable economics of continued reinvestment where competition is never-ending.
The "real programming" people are moving the goalposts of their no true scotsman fallacy so fast they're leaving Roadrunner style dust behind them.
Yes, there are things LLMs can't do at all, some where they are actively dangerous.
But also, there are decently sized parts of "software development" where any above-average LLM can speed up the process, as long as whoever is using it knows how to do so and doesn't fight the tool.
Who cares. Focus on what matters. OAI knows this considering they are dedicating a lot of their resources toward figuring out how to become profitable.
Isn't OAI only unprofitable because they are putting all their money into more training?
The product-market fit for LLMs has already clearly been found; there's just no moat to it. Tokens are a commodity.
Maybe they'll cut off the free tiers in 2026 and the only thing left will be China and OpenRouter.
agreed. we should instead be sneering at the AI critics because "you're holding it wrong"
> After working with agent-LLMs for some years now
Some years? I don't remember any agents being any good at all before just over a year ago with Cursor, and things really didn't take off until Claude Code.
Which isn't to say you weren't working with agent-LLMs before that, but I just don't know how relevant anything but recent experience is.
> I can confirm that they are completely useless for real programming
Can you elaborate on "real programming" ?
I assume you mean the hard problems rather than the simple, already-solved ones; the value of the work is measured in those terms. Easy problems have boilerplate solutions and have been solved numerous times in the past. LLMs excel here.
Hard problems require intricately woven layers of logic and abstraction, and LLMs still struggle there, since they do not have causal models. The value, however, is in the solution of these kinds of problems, since the easy problems are assumed to be solved already.
> After working with agent-LLMs for some years now, I can confirm that they are completely useless for real programming.
> They never helped me solve complex problems with low-level libraries. They can not find nontrivial bugs. They don't get the logic of interwoven layers of abstractions.
This was how I felt until about 18 months ago.
Can you give a single, precise example where modern day LLMs fail as woefully as you describe?
I had to disable baby Ceph (DeepSeek 3.1) from writing changes in Continue because he's like a toddler. But he did confirm some solutions, wrote a routine, turned me on to some libraries, etc.
So I see what you're saying: he comes up with the wrong answers a lot on problems involving a group of classes in related files.
However, it's Continue, so it can read files in VS Code, which is really nice and helps a lot with its comprehension; sometimes it does find the issue, or at least the nature of the issue.
I tend to give it bug n-1 to pre-digest while I work on bug n.
>After working with agent-LLMs for some years now, I can confirm that they are completely useless for real programming.
>They never helped me solve complex problems with low-level libraries. They can not find nontrivial bugs. They don't get the logic of interwoven layers of abstractions.
>LLMs pretend to do this with big confidence and fail miserably.
This is true for most developers as well. The mean software developer, especially if you outsource, has failure modes worse than any LLM and round-trip time is not seconds but days.
The promise of LLMs is not that they solve the single most difficult tasks for you instantly, but that they do the easy stuff well enough that they replace offshore teams.
> The promise of LLMs is not that they solve the single most difficult tasks for you instantly, but that they do the easy stuff well enough that they replace offshore teams.
But that's exactly the *promise* of LLMs by the hypepeople behind it.
>But that's exactly the promise of LLMs by the hypepeople behind it.
I do not know and do not care what the "hypepeople" say. I can tell you that, by pure logic alone, LLMs will be superior at simple and routine tasks sooner, which means they will compete with outsourced labor first.
LLMs need to be measured against their competition and their competition right now is outsourced labor. If an LLM can outperform an offshore team at a fraction of the cost, why would any company choose the offshore team? Especially when the LLM eliminates some of the biggest problems with offshore teams (communication barriers, round trip times).
If LLMs take any programmer jobs, they will at the very beginning make those outsourced jobs obsolete, so the only relevant question is whether they have done that or are in the process of doing so. If they haven't, then their impact will be minimal; if they have, then their impact will be massive. I think this line of thinking is a far better benchmark than asking whether an LLM gets X or Y question wrong Z% of the time.
> If an LLM can outperform an offshore team at a fraction of the cost,..
And "a few moments later" happens the same as with those "cost effective" clouds.
[1] https://www.heise.de/en/news/IDC-Many-companies-want-partly-...
[2] https://www.idc.com/resource-center/blog/storm-clouds-ahead-... (original)
In the end, it all comes down to ROI: if spending X dollars a month brings in an additional 5X in revenue, then it's gonna be worth it, right?
Then again, I have some suspicion that a lot of consumer-focused end products using LLMs in the backend (hello, chatbots) and expecting big returns for all those tokens spent may have some bad news coming... If the bubble starts popping, I'm guessing it starts there...
Outsourced devs wielding smart models are even cheaper than onshore and the models lift all boats wrt capability.
The bottleneck will soon be ideas for the things to build.
> The bottleneck will soon be ideas for the things to build
No, it won't. The utility of LLMs is already growing asymptotically now...
>Outsourced devs wielding smart models are even cheaper than onshore
But they do not compete. They have totally different jobs.
I bet you trusted the Blockchain bros and were first in line to buy NFTs too. No?
Why would you trust the hype when you can verify this stuff yourself pretty easily.
Claude is currently porting my rust emulator to WASM. It's not easy at all, it struggles, I need to guide it quite a lot but it's way easier to let him do it than me learning yet another tech. For the same result I have 50% the mental load...
The idea that they're good for development is propped up a lot by people able to have a React + Tailwind site spun up fast. You know what also used to be able to scaffold projects quickly? The old init scripts and generators!
I really really want this to be true. I want to be relevant. I don’t know what to do if all those predictions are true and there is no need (or very little need) for programmers anymore.
But something tells me “this time is different” is different this time for real.
Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me. I’m basically just the conductor of all those processes.
Oh, and don't ask about coding. If you use AI for tasks above, as a result you'll get very well defined coding task definitions which an AI would ace.
I’m still hired, but I feel like I’m doing the work of an entire org that used to need twenty engineers.
From where I’m standing, it’s scary.
I was a chef in Michelin-starred restaurants for 11 years. One of my favorite positions was washing dishes. The goal was always to keep the machine running on its 5-minute cycle. It was about getting the dishes into racks, rinsing them, and having them ready and waiting for the previous cycle to end—so you could push them into the machine immediately—then getting them dried and put away after the cycle, making sure the quality was there and no spot was missed. If the machine stopped, the goal was to get another batch into it, putting everything else on hold. Keeping the machine running was the only way to prevent dishes from piling up, which would end with the towers falling over and breaking plates. This work requires moving lightning fast with dexterity.
AI coding agents are analogous to the machine. My job is to get the prompts written, and to do quality control and housekeeping after it runs a cycle. Nonetheless, like all automation, humans are still needed... for now.
If it requires an expert engineer/dishwasher to keep the flow running perfectly, the human is the bottleneck in the process. This sounds a lot more like the past before AI to me. What AI does is just give you enough dishes that they don’t need to be washed at all during dinner service. Just let them pile up dirty or throw them away and get new dishes tomorrow it’s so immaterial to replace that washing them doesn’t always make sense. But if for some reason you do want to reuse them, then, it washes and dries them for you too. You just look over things at the end and make sure they pass your quality standards. If they left some muck on a plate or lipstick on a cup, just tell it not to let that happen again and it won’t. So even your QC work gets easier over time. The labor needed to deal with dirty dishes is drastically reduced in any case.
> humans are still needed... for now
"AI" doesn't have a clue what to do on its own. Humans will always be in the loop, because they have goals, while the AI is designed to placate and not create.
The amount of "AI" garbage I have to sift through to find one single gem is about the same or more work than if I had just coded it myself. Add to that the frustration of dealing with a compulsive liar, and it's just a fucking awful experience for anyone that actually can code.
Humans are still needed, but they just got down-skilled.
> got down-skilled.
who's to say that it's a down?
Orchestrating and doing higher level strategic planning, such that the sub-tasks can be AI produced, is a skill that might be higher than programming.
> Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me.
That is just not true, assuming you have a modicum of competence (which I assume you do). AIs suck at all these tasks; they are not even as good as an inexperienced human.
For all we know, you two could be comparing a Nokia 3310 and a workstation PC based on the hardware, yet both just say "this computer is better than that computer."
There are a ton of models out there, run in a ton of different ways, that can be used with different harnesses, and people use different workflows. There are just so many variables involved that I don't think it's either fair or accurate for anyone to claim "This is obviously better" or "This is obviously impossible."
I've been in situations where I hit my head against some hard to find bug for days, then I put "AI" (but what? No one knows) to it and it solves it in 20 minutes. I've also asked "AI" to do trivial work that it still somehow fucked up, even if I could probably have asked a non-programmer friend to do it and they'd be able to.
The variance is great, and the fact that system/developer/user prompts matter a lot for what the responses you get, makes it even harder to fairly compare things like this without having the actual chat logs in front of you.
> The variance is great
This strikes me as a very important thing to reflect on. When the automobile was invented, was the apparent benefit so incredibly variable?
> was the apparent benefit so incredibly variable?
Yes, lots of people were very vocally against horseless-carriages, as they were called at the time. Safety and public nuisance concerns were widespread, the cars were very noisy, fast, smoky and unreliable. Old newspapers are filled with opinions about this, from people being afraid of horseless-carriages spooking other's horses and so on. The UK restricted the adoption of cars at one point, and some Canton in Switzerland even banned cars for a couple of decades.
Horseless-carriages were commonly ridiculed as being just for "reckless rich hobbyists" and similar.
I think the major difference is that cars produced immediate, visible externalities, so it was easy for opposition to focus on public safety in public spaces. In contrast, AI has less physically visible externalities, although they are as important, or maybe even more important, than the ones cars introduced.
Yeah, I agree about the negative externalities, but I'm curious about the perceived benefits. Did anybody argue that cars were actually slower than horse and carriage? (Were they, at first?)
The cars were obviously faster than the typical horse transportation and I don't think anyone tried to argue against that, but laws typically restricted cars so they couldn't go faster than horses, at least in highly populated areas like cities. As others mentioned too, the benefit of not needing roads to go places were highlighted as a drawback of cars too. People argued that while cars might go faster, the result would be that the world would be worse off in total.
Sure, but my point is that people could at least agree the cars were faster. That is decidedly not true for LLMs; maybe due to alignable vs. non-alignable differences.
Is this a trick question? Yes it was. A horse could go over any terrain while a car could only really go over very specific terrain designed for it. We had to terraform the world in order to make the automobile so beneficial. And it turned out that this terraforming had many unintended consequences. It's actually a pretty apt comparison to LLMs.
Who would I be trying to trick if it was? You didn't answer the question anyway. I'm not wondering whether cars were seen as strictly better than horses in all situations; I'm wondering whether people disagreed so vehemently about whether cars were faster road transportation than horses.
LLMs generate the most likely code given the problem they're presented and everything they've been trained on, they don't actually understand how (or even if) it works. I only ever get away with that when I'm writing a parser.
> they don't actually understand how
But if it empirically works, does it matter whether the "intelligence" "understands" it?
Does a chess engine "understand" the moves it makes?
It matters if AGI is the goal. If it remains a tool to make workers more productive, then it doesn't need to truly understand, since the humans using the tools understand. I'm of the opinion AI should have stood for Augmented (Human) Intelligence outside of science fiction. I believe that's what early pioneers like Douglas Engelbart thought. Clearly that's what Steve Jobs and Alan Kay thought computing was for.
AGI is such a meaningless concept. We can't even fully define what human intelligence is (or when a human failing at something means they lack intelligence). It's just philosophy.
AGI is about as well defined as "full self-driving" :D
It's a useless philosophical discussion.
If it empirically works, then sure. If instead every single solution it provides beyond a few trivial lines falls somewhere between "just a little bit off" and "relies entirely on core library functionality that doesn't actually exist" then I'd say it does matter and it's only slightly better than an opaque box that spouts random nonsense (which will soon include ads).
Those are 2024-era criticisms of LLMs for code.
Late 2025 models very rarely hallucinate nonexistent core library functionality - and they run inside coding agent harnesses so if they DO they notice that the code doesn't work and fix it.
This sounds like you're copy-pasting code from ChatGPT's web interface, which is very 2024.
Agentic LLMs will notice if something is crap and won't compile and will retry, use the tools they have available to figure out what's the correct way, edit and retry again.
Depends on how he defined "better". If he uses the word "better" to mean "good enough to not fail immediately, and done in 1/10th of the time", then he's correct.
I think I've been using AI wrong. I can't understand testimonies like this. Most times I try to use AI for a task, it is a shitshow, and I have to rewrite everything anyway.
Have you tried Opus 4.5 (or similar recent models)? With Claude code 2, it's actually harder to mess things up IMO
I remember when about a year ago people were asking the same thing about gpt-4.5, the answer is always “yes, I’ve tried them all”
Ok, but have you tried claude-sonnet-GPT-codex-4.5-thinking-fast? That's the game changer. Anyone saying bad things about vibe coding without trying claude-sonnet-GPT-codex-4.5-thinking-fast is like a dinosaur to me, doomed to extinction. Seriously, give claude-sonnet-GPT-codex-4.5-thinking-fast a try, you'll thank me ;)
Fair. Well personally they didn't work well for me (on a huge, complex codebase) until the latest batch. Now they do.
I don’t know about right/wrong. You need to use the tools that make you productive. I personally find that in my work there are dozens of little scripts or helper functions that accelerate my work. However I usually don’t write them because I don’t have the time. AI can generate these little scripts very consistently. That accelerates my work. Perhaps just start simple.
Instead of generating, exporting or copy-pasting existing code just seems more reliable to me, and it also takes very little time.
I think what matters most is just what you're working on. It's great for crud or working with public APIs with lots of examples.
For everything else, AI has been a net loss for me.
> there are dozens of little scripts or helper functions that accelerate my work. However I usually don’t write them because I don’t have the time
People who write things like this can't expect to be taken seriously.
Before AI you didn't have time to write things that saved you time? So you just ended up spending (wasting) more time by going the long way? That was a better choice than just doing the thing that would have saved you time?
Do you tell the AI the patterns/tools/architecture you want? Telling agents to "build me XYZ, make it gud!" is likely to precede a mess; telling it to build a modular monolith using your library/tool list, your preferred folder structure, and the other patterns/algorithms you use will leave you with something that might have some minor style issues or not be perfectly canonical, but will be approximately correct within a reasonable margin, or within 1-2 turns of being so.
You have to let go of the code looking exactly a certain way, but having code _work_ a certain way at a coarse level is doable and fairly easy.
We are way beyond this. Now you use your plain text prompt to generate a requirements spec that the AI will follow when implementing your project
Kiro is just trying to build a product around exactly what I'm talking about. I'm not a fan, because it's simultaneously too heavyweight and agents don't respect all the details of the specs it creates enough to make the time investment in super-detailed specs worthwhile.
I have a spec driven development tool I've been working on that generates structured specs that can be used to do automatic code generation. This is both faster and more robust.
Honestly, even this isn't really true anymore. With Opus 4.5 and 5.2 Codex in tools like Cursor, Claude Code, or Codex CLI, "just do the thing" is a viable strategy for a shockingly large category of tasks.
Just do the thing can produce functional code, but even with Opus4.5/Codex5.2, there are still plenty of moments where the way it decides to do something is cringe.
Agree. But it's increasingly the case, IME, that for a lot of tasks you can start with that. If it does it well, great. If it does something stupid, it's easy enough to ask it to completely rework the stupid thing in a better way, and it can do it quickly. That's still a huge shift compared to the olden days (three months ago) when you needed to really break things down into small chunks for it to get to a success state.
>You have to let go of the code looking exactly a certain way, but having code _work_ a certain way at a coarse level is doable and fairly easy.
So all that bullshit about "code smells" was nonsense.
A lot of code smells matter more for humans than LLMs (and LLMs have their own unique code smells). For example, nested ternary operators are a great source of bugs in human code, but agents could care less; conversely, humans handle multiple files with the same variable names and lots of duplicated code well, whereas this stuff confuses agents.
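To make the nested-ternary point concrete, a toy sketch in Python (all names made up); it's legal and compact, and an agent emits it happily, but it's a classic source of human misreading:

    # Nested conditional expression: the "smell" in question.
    def shipping_cost(weight_kg, express):
        return 25 if express else (10 if weight_kg > 5 else 5)

    # The human-friendly version of the same logic:
    def shipping_cost_clear(weight_kg, express):
        if express:
            return 25
        if weight_kg > 5:
            return 10
        return 5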
>but agents could care less,
The phrase is "couldn't care less". If you "could care less" then you actually care about it. If you "couldn't care less" then there's no caring at all.
Have you tried using $NEWEST_MODEL?
It’s because depending on the person the newest model crossed the line into being useful for them personally. It’s not like a new version crosses the line for everyone. It happens gradually. Each version more and more people come into the fold.
For me Claude code changed the game.
Yes, it is trivially true that each new person who recommends LLMs is a new person coming into the fold.
You get new people recommending the latest version all the time to people who are unconvinced because that version is usually what brought them into the fold.
What you’re mocking is somewhat of a signal of actual improvement of the models and that improvement as a result becoming useful to more and more people.
How much time/effort have you put in to educate yourself about how they work, what they excel at, what they suck at, and what your responsibility is when you use them…? This effort is directly proportional to how well they will serve you.
>> From where I’m standing, it’s scary.
You are being fooled by randomness [1]
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
I've worked with a lot of interns, fresh outs from college, overseas lowest bidders, and mediocre engineers who gave up years ago. All over the course of a ~20 year career.
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?
They’re capable of looking up documentation, correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
I'm using Claude code to help me learn Godot game programming.
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a Tower Defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. Only after watching more tutorials did I figure out I was asking for the wrong thing. (TileMapLayer is a much better choice.)
LLMs still suffer from Garbage in Garbage out.
Don't use LLMs for Godot game programming.
Edit: Major engine changes have occurred since the models were trained, so you will often be given code that refers to nonexistent constants and functions and that is not aware of useful new features.
Before coding I just ask the model, "What are the best practices in this industry to solve this problem? What tools/libraries/approaches do people use?"
After coding I ask it, "Review the code. Do you see any parts for which there are common libraries implementing them? Are there ways to make it more idiomatic?"
You can also ask it, "This is an idea on how to solve it that somebody told me; what do you think about it, and are there better ways?"
> before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches do people use?"
Just for the fun of it, and so you lose your "virginity" so to speak: next time the magic machine gives you the answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out. Then sit back and marvel at the machine saying you were right and that it had misled you, producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).
Both the before and after are better done manually. What you are describing is fine for the heck of it (I've vibe coded a Whisper-related Rust port today without having any actual Rust skills), but I'd never use fully vibed software in production. That's irresponsible in multiple ways.
Ask a model to
"Write a chess engine where pawns move backward and kings can jump like nights"
It will keep slipping back into real chess rules. It learned chess; it did not understand the concept of "rules".
Or
Ask it to reverse a made up word like
"Reverse the string 'glorbix'"
It will get it wrong on the first try. You would not fail.
Or even better ask it to...
"Use the dxastgraphx library to build a DAG scheduler."
dxastgraphx is a nonexistent library...
Marvel at the results...tried in both Claude and ChatGPT....
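For contrast, the reversal itself is pure deterministic computation; a one-line sketch in Python:

```python
# Reversing a string is plain computation: it works for any input,
# including made-up words that never appeared in any training data.
word = "glorbix"
print(word[::-1])  # xibrolg
```

No previously seen pattern is needed; that's the gap being pointed at.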
I've just tried the dxastgraphx one in pi with Opus 4.5. This was its response:
I couldn't find a library called dxastgraphx in either pip (Python) or npm (JavaScript) package registries. This library doesn't appear to exist.
Did you perhaps mean one of these popular DAG/graph libraries?
Python:
- networkx - comprehensive graph library with DAG support
- graphlib - Python standard library (3.9+) with TopologicalSorter
- dask - parallel computing with DAG task scheduling
JavaScript/TypeScript:
- graphlib - graph data structures
- dagre - DAG layout
Would you like me to build a DAG scheduler using one of these existing libraries, or would you like me to implement one from scratch? Let me know which language and approach you prefer.

Tried ChatGPT (free version). It answered the first question correctly.
Answer to second question:
"I can do that, but there’s a catch: dxastgraphx is not a known or standard Python (or JS) library as of now. I don’t have any public API or documentation for it to target directly.
To avoid guessing wrong, here are the two sensible options:"
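For what it's worth, one of the real alternatives the models suggest, Python's standard-library graphlib, is enough for a minimal DAG scheduler. A rough sketch with a made-up task graph, assuming Python 3.9+:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
graph = {
    "compile": set(),
    "build": {"compile"},
    "test": {"build"},
}

ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
    for task in ts.get_ready():  # tasks whose dependencies have all finished
        print(f"running {task}")
        ts.done(task)            # mark complete, unblocking dependents
```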
Why would I ask the model to reverse the string 'glorbix,' especially in the context of software engineering?
Just tried to reverse the string you provided using Gemini. It worked fine on the first try.
In Gemini:
> The reverse of the string 'gssssyeekxxuoaqaaaiooo9aaa7zjx' is:
xjz7aaa9oooiiaaaqauxxkyeessssg
> Youve made a mistake
``` I see where I tripped up! Let's re-examine that long string carefully.
The original string you gave me was: `gssssyeekxxuoaqaaaiooo9aaa7zjx`
Let's look at the end of that string: `...aaa7zjx`. When we reverse it, it should start with: `xjz7aaa...`
Here is the precise character-by-character reversal:
*xjz7aaa9oooiiaaaqauxxkyeessssg*
Wait—I see it now. In my previous response, I included an extra *'i'* and missed one *'o'* in the middle section. Let's do a strict count to get it perfect:
* *Original:* `g` `ssss` `y` `ee` `k` `xx` `u` `o` `a` `q` `aaa` *`iooo`* `9` `aaa` `7` `z` `j` `x`
* *Correct Reverse:* `x` `j` `z` `7` `aaa` `9` *`oooi`* `aaa` `q` `a` `o` `u` `xx` `k` `ee` `y` `ssss` `g`
The correct reversed string is: *xjz7aaa9oooiiaaaqauoxxkyeessssg*
Thank you for catching that. It’s easy for the eyes to glaze over when there are so many repeating vowels!
Since I've stumbled a bit on these, would you like to give me one more string to see if I can get it right on the first try? ```
After more back-and-forths it consistently fails at this task; even though when strictly dividing up the tokens it gets them right, the final answer is always wrong.
You’re trying to interrogate a machine as you would a human and presenting this as evidence that machines aren’t humans. Yes, you’re absolutely right! And also completely missing the point.
Why would you expect an LLM or even a human to succeed in these cases? “Write a piece of code for a specification that you can’t possibly know about?” That’s why you have to do context engineering, just like you’d provide a reference to a new document to an engineer writing code.
This is exactly what happened to me: novel or uncommon = hallucinate or invent wrong.
It is OK for getting snippets, for example, and saying "please make this MVVM style" (I did it). It is not perfect, but it saves time.
For very broad or novel reasoning, as of today... forget it.
They do all those things you've mentioned more efficiently than most of us, but they fall woefully short as soon as novelty is required. Creativity is not in their repertoire. So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce. But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
I choose to look at it as an opportunity to spend more time on the interesting problems, and work at a higher level. We used to worry about pointers and memory allocation. Now we will worry less and less about how the code is written and more about the result it built.
Take food for example. We don't eat food made by computers even though they're capable of making it from start to finish.
Sure we eat carrots probably assisted by machines, but we are not eating dishes like protein bars all day every day.
Our food is still better enjoyed when made by a chef.
Software engineering will be the same. No one will want to use software made by a machine all day every day. There are differences in the execution and implementation.
No one will want to read books entirely dreamed up by AI. Subtle parts of the books make us feel something only a human could have put right there right then.
No one will want to see movies entirely made by AI.
The list goes on.
But you might say "software is different". Yes but no: when there is a ton of choice for a type of software due to the productivity increase, choice will become more prominent and the human-driven software will win.
Even today we pick the best terminal emulation software because we notice the difference between exquisitely crafted and bloated cruft.
You should look at other engineering disciplines. How many highway overpasses have unique “chef quality” designs? Very few. Most engineering is commodity replication of existing designs. The exact same thing applies to software engineering. Most of us engineers are replicating designs that came earlier. LLMs are good at generating the rote designs that make up the bulk of software by volume. Who benefits from an artisanal REST interface? The best practices were codified over a decade ago.
> How many highway overpasses have unique “chef quality” designs?
Have you ever built a highway overpass? That kind of engineering is complex and interdisciplinary. You need to carry out extensive traffic pattern analysis and soil composition testing to even know where it should go.
We're at a point where we've already automated all the simple stuff. If you want a website, you don't type out html tags. You use Squarespace or Wordpress or whatever. If you need a backend, you use Airtable. We already spend most of our time on the tricky stuff. Sure, it's nice that LLMs can smooth the rough edges of workflows that nobody's bothered to refine yet, but the software commodities of the world have already been commodified.
Just like cooking in the Middle Ages: as the kitchens, hygiene, etc. got better, so did the chefs and so did the food.
This is just a transition.
Re: the REST API, you're right. But again, we use Roombas to vacuum when the floor layout is friendly to them; not all rooms can be vacuumed by Roombas. A simple REST API can be emitted one-shot by an LLM, and there is no room for interpretation. But ask a future LLM to make a new kind of social network and you'll end up with a mash-up of the existing ones.
Same thing: you and I won't use a manual screwdriver when we have 100 screws to drive and we own an electric drill.
That didn't reinvent screws nor the assembly of complex items.
I'm keeping positive in the sense that LLMs will enable us to do more, and to learn faster.
The sad part about vibe coding is you learn very little. And to live is to learn.
You'll notice people vibecoding all day become less and less attached to the product they work on. That's because they've given away the dopamine hits of the many "a-ha" moments that come from programming. They'll lose interest. They won't learn anymore and die off (career-wise).
So, businesses that put LLM first will slowly lose talent over time, and business that put developers first will thrive.
It's just a transition. A fast one that hits us like a wall, and it's confusing, but software for humans will be better made by humans.
I've been programming since the 80s. The level of complexity today is bat shit insane. I welcome the LLM help in managing 3 code bases of 3 languages spread across different architectures (my job) to keep sane!
I disagree with the vibecoding take. It's a new skill that absolutely has a place in a developer's skillset, and it may be of great importance for some kinds of projects. You can learn so much by vibecoding little projects that otherwise would never see the light of day.
There is a part of this that is true. But when you get to the nuanced parts of every "replicated design", or need tweaks, or what the AI gave you is just wrong, quality deteriorates.
For many tasks it is ok, for others it is just a NO.
For software maintenance and evolution I think it won't cut it.
The same way a Wordpress website can do a set of useful things. But when you need something specific, you just drop to programming.
You can have your e-commerce web. But you cannot ask it to give you a "pipeline execution as fast as possible for calculating and solving math for engineering task X". That needs SIMD, parallelization, understanding the niche use you need, etc., which most people probably do not do all the time and which requires specific knowledge.
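To make the gap concrete (a toy contrast, not the engineering task itself): the same reduction written as a plain loop versus a vectorized call that hits SIMD-backed kernels under the hood.

```python
import numpy as np

def dot_loop(a, b):
    # What generic, line-by-line generated code tends to look like.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# np.dot dispatches to an optimized BLAS kernel (SIMD, cache-aware);
# on typical hardware it is orders of magnitude faster than the loop.
print(dot_loop(a, b), np.dot(a, b))  # same value, wildly different speed
```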
Is your argument that we only want things that are hand-crafted by humans?
There are lots of things like perfectly machined nails, tools, etc. that are much better done by machines. Why couldn't software be one of those?
> So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce.
The same thing over and over again should be a SaaS, some internal tool, or a plugin. Computers are good at doing the same thing over and over again and that's what we've been using them for
> But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
Even if the high level description of a task may be similar to another, there's always something different in the implementation. A sports car and a sedan have roughly the same components, but they're not engineered the same.
> We used to worry about pointers and memory allocation.
Some still do. Not every case will have a runtime that handles allocations and a garbage collector. And even in those, you will see memory leaks.
> Now we will worry less and less about how the code is written and more about the result it built.
Wasn't that Dreamweaver?
I think your image of LLMs is a bit outdated. Claude Code with well-configured agents will get entirely novel stuff done pretty well, and that’s only going to get better over time.
I wouldn’t want to bet my career on that anyway.
As of today NONE of the known AI codebots can solve correctly ANY of the 50+ programming exercises we use to interview fresh grads or summer interns. NONE! Not even level 1 problems that can be solved in fewer than 20 lines of code with a bit of middle school math.
After 25+ years in this field, having interviewed ~100 people for both my startup and other companies, I'm having a hard time believing this. You're either in an extremely niche field (such as to make your statement irrelevant to 99.9% of the industry), or it's hyperbole, or straight up bs.
Interviewing is an art, and IME "gotcha" types of questions never work. You want to search for real-world capabilities, and like it or not the questions need to match those expectations. If you're hiring summer interns and the SotA models can't solve those questions, then you're doing something wrong. Sorry, but having used these tools for the past three years, this is extremely hard to believe.
I of course understand if you can't, but sharing even one of those questions would be nice.
I agree, it’s hard to believe. Hopefully the original comment author can share one of those questions.
I promise you that I can show you how to reliably solve any of them using any of the latest OpenAI models. Email me if you want proof; josh.d.griffith at gmail
I'd watch that show, ideally with a few ground rules though, e.g.
- the problems to solve must NOT be part of the training set
- the person using the tool (e.g. OpenAI, Claude, DevStral, DeepSeek, etc) must NOT be able to solve problems alone
as I believe otherwise the 1st is "just" search and the 2nd is basically offloading the actual problem solving to the user.
> the person using the tool (e.g. OpenAI, Claude, DevStral, DeepSeek, etc) must NOT be able to solve problems alone
I think this is a good point, as I find the operator's input is often forgotten when considering the AI's output. If it took me an hour and decades of expertise to get the AI to output the right program, did the AI really do it? Could someone without my expertise get the same result?
If not, then maybe we are wasting our time trying to mash our skills through vector space via a chat interface.
It's definitely scary in a way.
However I'm still finding a trend even in my org; better non-AI developers tend to be better at using AI to develop.
AI still forgets requirements.
I'm currently running an experiment where I try to get a design and then execute on an enterprise 'SAAS-replacement' application [0].
AI can spit forth a completely convincing looking overall project plan [1] that has gaps if anyone, even the AI itself, tries to execute on the plan; this is where a proper, experienced developer can step in at the right steps to help out.
IDK if that's the right way to venture into the brave new world, but I am at least doing my best to be at the forefront of how my org is using the tech.
[0] - I figured it was a good exercise for testing limits of both my skills prompting and the AI's capability. I do not expect success.
> I’m basically just the conductor of all those processes.
a car moves faster than you, can last longer than you, and can carry much more than you. But somehow, people don't seem to be scared of cars displacing them(yet)? Perhaps autodriving will in the near future, but there still needs to be someone making decisions on how best to utilize that car; surely, it isn't deciding to go to destination A without someone telling it.
> I feel like I’m doing the work of an entire org that used to need twenty engineers.
and this is great. A combine harvester does the work of what used to be an entire village for a week in a day. More output for less people/resources expended means more wealth produced.
> a car moves faster than you, can last longer than you, and can carry much more than you. But somehow, people don't seem to be scared of cars displacing them(yet)?
People whose lives were based around using horses for transportation were very scared of cars replacing them though, and correctly so, because horses for transportation are something people use for leisure today, not necessity. I feel like that's a more apt analogy than comparing cars to any human.
> More output for less people/resources expended means more wealth produced.
This is true, but it probably also means that this "more wealth produced" will be more concentrated, because it's easier to convince one person using AI that you should have half of the wealth they produce than to convince 100 people that you should have half of what they produce. From where I'm standing, it seems to have the same effects (though not as widespread or impactful, yet) as industrialization, which induced that side effect as well.
Analogies are not going to work. But it's just as likely that, in the worst case, we are stagecoach drivers who have to use cars when we just really love the quiet slowness of horses.
And parent is scared of being made redundant by AI because they need their job to pay for their car, insurance, gas and repairs.
> a car moves faster than you, can last longer than you, and can carry much more than you. But somehow, people don't seem to be scared of cars displacing them(yet)?
???
Cars replaced horses, not people.
In this scenario you are the horse.
Well no, you'd be the horse driver who becomes a car driver
> Well no, you'd be the horse driver who becomes a car driver
Well, that's the crux of the argument. The pro-AI devs are making the claim that devs are the horse-drivers; the anti-AI devs are making the claim that devs are the horses themselves.
There is no objective way to verify who is right in this case, we just have to see it play out.
I don't really understand what you are saying... Anyways glad you got what I am saying at least
That's kind of the point of the article, though.
Sure LLMs can churn out code, and they sort of work for developers who already understand code and design, but what happens when that junior dev with no hard experience builds their years of experience with LLMs?
Over time those who actually understand what the LLMs are doing and how to correct the output are replaced by developers who've never learned the hard lessons of writing code line by line. The ability to reason about code gets lost.
This points to the hard problem that the article highlights. The hard problem of software is actually knowing how to write it, which usually takes years, sometimes up to a decade of real experience.
Any idiot can churn out code that doesn't work. But working, effective software takes a lot of skill that LLMs will be stripping people of. Leaving a market there for people who have actually put the time in and understand software.
My experience with these tools is far and away nowhere close to this.
If you're really able to do the work of a 20 man org on your own, start a business.
This is not how I think about it. Me and the coding assistant together are better than me or the coding assistant separately.
For me it's not about me or the coding assistant, it's me and the coding assistant. But I'm also not a professional coder; I don't identify as a coder. I've been fiddling with programming my whole life but never had it as a title; I've worked more from the product side or from the stakeholder side, but always got more involved, as I could speak with the dev team.
This also makes it natural for me to work side-by-side with the coding assistant, compared maybe to pure coders, who are used to keeping the coding side to themselves.
I have been using the most recent Claude, ChatGPT and Gemini models for coding for a bit more than a year, on a daily basis.
They are pretty good at writing code *after* I have thoroughly described what to do, step by step. If you miss a small detail they get loose and the end result is a complete mess that takes hours to clean up. This still requires years of coding experience and planning ahead in your head; you won't be able to spare that, or replace developers with LLMs. They are like autocomplete on steroids, that's pretty much it.
Yes what you are describing is exactly what Kiro solves
> Through Kiro, we reinvented how developers work with AI agents.
Even according to its documentation it is still built for developers, so my point still stands. You need dev experience to use this tool, same as other LLM-based coding tools.
I am sorry to say you are not a good programmer.
I mean, AIs can drop something fast, the same way you cannot beat a computer at adding or multiplying.
After that, you find mistakes, false positives, code that does not work fully, and the worse part is the last one: code that does not work fully but also, as a consequence, that you do NOT understand yet.
That is where your time shrinks: now you need to review it.
Also, they do not design systems better. Maybe partial pieces. Give them something complex and they will hallucinate worse solutions than what you already know if you have, let us say, over 10 years of experience programming in a language (or maybe 5).
Now multiply this unreliability problem as the code you "AI-generate" grows.
Now you have a system you do not know if it is reliable and that you do not understand to modify. Congrats...
I use AI moderately for the tasks it is good at: generate some scripts, give me this small typical function and I review it.
Review my code: as a person that knows the language well, I will discard part of its mistakes and hallucinations and will maybe find a few valuable things.
Also, when it reviewed my code and found problems, I saw that LLMs really need to hallucinate errors that do not exist to justify their help. This is just something LLMs seem to not be accurate at.
Also, when problems get a bit more atypical or pass a certain level of difficulty, they get much more unreliable.
All in all: you are going to need humans. I do not know how many, and I do not know how much the models will improve. I just know that they are not reliable, and this "generate fast but unreliable" versus "now I do not know the codebase" trade-off is a fundamental obstacle that I think is, if not very difficult, impossible to work around.
I feel you, it's scary. But the possibilities we're presented with are incredible. I'm revisiting all these projects that I put aside because they were "too big" or "too much for a machine". It's quite exciting
>> Coding AIs design software better than me
Absolutely flat out not true.
I'm extremely pro-faster-keyboard: I use the faster keyboards at almost every opportunity I can. I've been amazed by their debugging skills (in fairness, I've also been very disappointed many times), I've been bowled over by my faster keyboard's ability to whip out HTML UIs in record time, and I've been genuinely impressed by my faster keyboard's ability to flag flaws in PRs I'm reviewing.
All this to say: I see lots of value in faster keyboards, but add all the prompts, skills and hooks you like, explain in as much detail as you like about modularisation, and still "agents" cannot design software as well as a human.
Whatever the underlying mechanism of an LLM (to call it a next-token predictor is dismissively underselling its capabilities), it does not have a mechanism to decompose a problem into independently solvable pieces. While that remains true (and I've seen zero precursor of a coming change here; the state of the art today is equivalent to having the agent employ a todo list), LLMs cannot design better than humans.
There are many simple CRUD line-of-business apps where they design well enough (more accurately, the problem is small/simple enough) that this lack of design skill in LLMs or agents doesn't matter. But don't confuse that with being able to design software in the more general use case.
Exactly: for the thing that has been done on GitHub 10000x over, LLMs are pretty awesome and they speed up your job significantly (though it's arguable whether you would be better off using some already-built abstraction in that case).
But try to do something novel and... they become nearly useless. Not like anything particularly difficult, just something that's so niche it's never been done before. It will most likely hallucinate some methods and call it a day.
As a personal anecdote, I was doing some LTSpice simulations and tried to get Claude Sonnet to write a plot expression to convert reactance to apparent capacitance in an AC sweep. It hallucinated pretty much the entire thing, and got the equation wrong (it assumed the source was unit current, while LTSpice models AC circuits with unit voltage; this surely is on the internet, but apparently has never been written alongside the need to convert an impedance to capacitance!).
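For the record, the conversion itself is short: with a unit-voltage AC source, the apparent capacitance follows from the reactance (the imaginary part of the impedance) as C = -1 / (2*pi*f * Im(Z)). A sketch in Python over made-up sweep data, not LTSpice's own plot-expression syntax:

```python
import numpy as np

freq = np.array([1e3, 1e4, 1e5])           # sweep frequencies in Hz
Z = 1 / (1j * 2 * np.pi * freq * 100e-12)  # impedance of an ideal 100 pF capacitor

# Apparent capacitance from the reactance: C = -1 / (2*pi*f * Im(Z))
C_apparent = -1 / (2 * np.pi * freq * Z.imag)
print(C_apparent)  # ~1e-10 F at every frequency, i.e. 100 pF
```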
Try having your engineers pick up some product work. Clients do NOT want to talk to bots.
> Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me.
They don't do any of that better than me; they do it poorer and faster, but well enough for most of the time.
There will be a need. Don't worry. Most people still haven't figured out how to properly read and interpret instructions. So they build things incorrectly - with or without AI
Seriously. The bar is that low. When people say "AI slop" I just chuckle because it's not "AI" it's everyone. That's the general state of the industry.
So all you have to do is stay engaged, ask questions, and understand the requirements. Know what it is you're building and you'll be fine.
More than any other effect they have, LLMs breed something called "learned helplessness". You just listed a few things it may stay better than you at, and a few things that it is not better than you at and never will be.
Planning long running projects and deciding are things only you can do well!! Humans manage costs. We look out for our future. We worry. We have excitement, and pride. It wants you to think none of these things matter of course, because it doesn't have them. It says plausible things at random, basically. It can't love, it can't care, it won't persist.
WHATEVER you do, don't let it make you forget that it's a bag of words and you are something almost infinitely more capable, not in spite of human "flaws" like caring, but because of them :)
Plus I think I've almost never seen so little competition for what I think are the real prizes! Everyone's off making copies of copies of copies of the same crappy infrastructure we already have. They're busy building small inconsequential side projects so they can say they built something using an LLM.
> They're busy building small inconsequential side projects
Unironically, sending a program off to build those for me has saved me an almost endless amount of time. I'm a pretty distracted individual, and pretty anal about my workflow/environment, so lots of times I've spent hours going down rabbit holes to make something better, when I could have just sucked it up and done it the manual way instead, even if that takes mental energy.
Now I can still do those things, but instead of spending hours I spend just a couple of minutes, then come back after 20-30 minutes to something that lets me avoid that stuff wholesale. Once you start stacking these things, it tends to save a lot of time and, more importantly, mental energy.
So the programs by themselves are basically "small inconsequential side projects", because they're not "production-worthy, web-scale SaaS ready to earn money", but they help me and others who are building those things in a big way.
But isn't that exactly the kind of learned helplessness being discussed? As a fellow distracted individual, I have seen instant gratification erode all of my most prized hobbies and skills. Why read a book when I can scroll on my phone? My distress tolerance is lower than ever. LLMs feel like a bridge too far, for me anyway.
Nothing has been eroded for me, in fact it had the opposite effect. It's easier to get into new hobbies, easier to develop skills, I value reading on my own more than I did before. At least for me, LLMs act as multipliers of what I can and want to do, it hasn't removed my passion for music production, 3D, animation or programming one bit, if anything it's fueled those passions and let me do stuff within them faster and better.
Nothing I could make would be very good. So the only reason I would, say, write, is in order to write, not to have produced an essay. Hobbies are ways to pass time productively. If it took less time, it wouldn't be a better use of time, but a worse one.
It's not about being able to do more faster, but about being able to get help faster with what you wanted to do. For example, before LLMs, if I wanted to figure out how to do something with a specific analog synth, I basically spent time reading manuals and browsing internet forums, piecing together whatever I could find into something actionable, sometimes slightly wrong, but at least in the right direction.
Nowadays, I fire off the LLM to figure it out for me, then try out what I get back, and I can move on to actually having fun playing on the synth, rather than trying to figure out how to do what I wanted to do.
The end goal for me with my hobbies is more or less the same: have fun. But for me the fun is not digging through manuals; it is to "do" or "use" or "perform" or whatever. I like music production because I like to make music, not because I like digging through manuals for some arcane knowledge.
But looking up information via an LLM is an entirely different category of usage. I have no problem with that (well, much less of a problem).