LLMs as the new high level language
federicopereiro.com
165 points by swah 5 days ago
After re-reading the post once again, because I honestly thought I was missing something obvious that would make the whole thing make sense, I started to wonder if the author actually understands the scope of a computer language. When he says:
> LLMs are far more nondeterministic than previous higher level languages. They also can help you figure out things at the high level (descriptions) in a way that no previous layer could help you dealing with itself. […] What about quality and understandability? If instead of a big stack, we use a good substrate, the line count of the LLM output will be much less, and more understandable. If this is the case, we can vastly increase the quality and performance of the systems we build.
How does this even work? There is no universe I can imagine where a natural language can be universal, self-descriptive, unambiguous, and have a smaller footprint than any purpose-specific language that came before it.
To be generous and steelman the author, perhaps what they're saying is that at each layer of abstraction, there may be some new low-hanging fruit.
Whether this is doable through orchestration or through carefully guided HITL by various specialists in their fields - or maybe not at all! - I suspect will depend on which domain you're operating in.
If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.
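As a concrete check of that property, here's a minimal sketch (assuming gcc is installed; "main.c" stands in for a hypothetical source file): build twice, hash, compare.

    # Sketch: compile the same source twice and compare hashes of the outputs.
    # Assumes gcc is on the PATH; "main.c" is a hypothetical input file.
    import hashlib
    import subprocess

    def build_and_hash(source, out):
        subprocess.run(["gcc", "-O2", "-o", out, source], check=True)
        with open(out, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    print(build_and_hash("main.c", "build1") == build_and_hash("main.c", "build2"))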
Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!
> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.
There’s a related issue that gives me deep concern: if LLMs are the new programming languages we don’t even own the compilers. They can be taken from us at any time.
New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?
(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)
> I want to see some assurance we get the same results every time
Genuine question, but why not set the temperature to 0? I do this for non-code related inference when I want the same response to a prompt each time.
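Concretely, something like this (a minimal sketch using the OpenAI Python client; the model name is just a placeholder):

    # Minimal sketch: pin temperature (and a seed, where the API exposes one)
    # to reduce run-to-run variation. The model name is a placeholder.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
        temperature=0,
        seed=42,  # best-effort only; providers don't guarantee identical output
    )
    print(resp.choices[0].message.content)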
A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to the lack of batch invariance [1]
[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
Thank you for this, this was a really interesting read about batch invariance, something I didn't even know about.
This still doesn't help when you update your compiler to use a newer model
Greedy decoding gives you that guarantee (determinism). But I think you'll find it to be unhelpful. The output will still be wrong the same % of the time (slightly more, in fact) in equally inexplicable ways. What you don't like is the black box unverifiable aspect, which is independent of determinism.
What people don’t like is that the input-output relation of LLMs is difficult, if not impossible, to reason about. While determinism isn’t the only factor here (you can have a fully deterministic system that is still unpredictable in practical terms), it is still a factor.
If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to the lack of batch invariance [1]
[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
The question is: if we keep the same context and model, and the same LLM configuration (quantization etc.), does it provide the same output at same prompt?
If the answer is no, then we cannot be sure to use it as a high-level language. The whole purpose of a language is providing useful, concise constructs to avoid something not being specified (undefined behavior).
If we can't guarantee that the behavior of the language is going to be the same, it is no better than prompting someone some requirements and not checking what they are doing until the date of delivery.
Anyone doing benchmarks with managed runtimes, or serverless, knows it isn't quite true.
Which is exactly one of the examples the AOT-only, no-GC crowd uses to argue why their approach is better.
But there is functional equivalence. While I don't want to downplay the importance of performance, we're talking about something categorically different when comparing LLMs to compilers.
Reproducible builds exist. AOT/JIT and GC are just not very relevant to this issue, not sure why you brought them up.
Even those are way more predictable than LLMs, given the same input. But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.
> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.
They are, actually. A "fresh chat" with an LLM is non-deterministic but also stateless. Of course agentic workflows add memory, possibly RAG etc. but that memory is stored somewhere in plain English; you can just go and look at it. It may not be stateless but the state is fully known.
Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs like I would do with runtimes, I would need to take the delta between versions, plus that between whatever memory they may have. I don’t see how that helps with reproducibility.
Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the entirety of such inputs with the same outputs?
Are you certain you can predict the JIT-generated machine code given the JVM bytecode?
Without taking into account everything else that the JIT uses in its decision tree?
For a single execution, to a certain extent, yes.
But that’s not the point I’m trying to make here. JIT compilers are vastly more predictable than LLMs. I can take any two JVMs from any two vendors, and over several versions and years, I’m confident that they will produce the same outputs given the same inputs, to a certain degree, where the input is not only code but GC, libraries, etc.
I cannot do the same with two versions of the same LLM offering from a single vendor, that had been released one year apart.
Enough so that I've never had a runtime issue because the compiler did something odd once, and correct the next time. At least in C#. If Java is doing that, then stop using it...
If the compiler had an issue like LLMs do, then half my builds would be broken, running the same source.
> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.
Give a spec to a designer or developer. Do you get the same result every time?
I’m going to guess no. The results can vary wildly depending on the person.
The code generated by LLMs will still be deterministic. What is different is the product team tools to create that product.
At a high level, does using LLMs to do all or most of the coding ultimately help the business?
This comparison holds up to me only in the long-standing debate over "LLMs as the new engineer", not "LLMs as a new programming language" (like here).
I think there are important distinctions there, predictability being one of them.
The intermediate product argument is the strongest point in this thread. When we went from assembly to C, the debugging experience changed fundamentally. When we went from C to Java, how we thought about memory changed. With LLMs, I'm still debugging the same TypeScript and Python I was before.
The generation step changed. The maintenance step didn't. And most codebases spend 90% of their life in maintenance mode.
The real test of whether prompts become a "language" is whether they become versioned, reviewed artifacts that teams commit to repos. Right now they're closer to Slack messages than source files. Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold.
>With LLMs, I'm still debugging the same TypeScript and Python I was before.
Aren't you telling Claude/Codex to debug it for you?
We went from Assembly to Fortran, with several languages in between, until C came to be almost 15 years later.
"Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold."
1. OK, let's run 100 instances of the prompt under the hood: 1-2 will hallucinate, 3-5 will produce something different from the remaining 90%, and it can compile based on what that 90% of answers agree on
2. Computer memory is also not 100% reliable, but we somehow live with it without a manual man-in-the-middle checking layer?
Computer memory, even cheap consumer grade stuff, has much higher reliability than 90%. Otherwise your computer would be completely unusable!
I wonder what ECC is for. So, unless you're Google and you're having to deal with "mercurial cores"...
Also, sorry, but what did I just actually attempt to read?
Okay but if you aren’t using RAIM or a TMR system then is he really wrong?
And if you weren’t being snarky I’m sure you could understand. Generate 100 answers. Compare them. You’ll find ~90% the same. Choose that one.
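A rough sketch of the idea, with `generate` as a hypothetical stand-in for whatever model call you use:

    # Sketch of majority voting over repeated samples.
    # `generate` is a hypothetical stand-in for the actual model call.
    from collections import Counter
    from typing import Callable

    def majority_answer(generate: Callable[[str], str], prompt: str, n: int = 100) -> str:
        # Light normalisation so trivial whitespace differences don't split the vote.
        answers = [generate(prompt).strip() for _ in range(n)]
        answer, _votes = Counter(answers).most_common(1)[0]
        return answer

The hard part, of course, is deciding when two generated programs count as "the same" - exact string matching rarely cuts it for code.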
One thing I think the “LLM as new high-level language” framing misses is the role of structure and discipline. LLMs are great at filling in patterns, but they struggle with ambiguity, the exact thing we tolerate in human languages.
A practical way to get better results is to stop prompting with prose and start providing explicit models of what we want. In that sense, UML-like notations can act as a bridge between human intent and machine output. Instead of:
“Write a function to do X…”
we give:
“Here’s a class diagram + state machine; generate safe C/C++/Rust code that implements it.”
UML is already a formal, standardized DSL for software structure. LLMs have no trouble consuming textual forms (PlantUML, Mermaid, etc.) and generating disciplined code from them. The value isn’t diagrams for humans but constraining the model’s degrees of freedom.
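For example (a sketch only; the diagram and the prompt wording are illustrative):

    # Sketch: hand the LLM a textual model (PlantUML) instead of loose prose.
    # The diagram and the prompt wording below are illustrative only.
    STATE_MACHINE = """
    @startuml
    [*] --> Idle
    Idle --> Running : start()
    Running --> Idle : stop()
    Running --> Faulted : error()
    Faulted --> Idle : reset()
    @enduml
    """

    PROMPT = (
        "Here is a state machine in PlantUML. Generate Rust code with an enum "
        "for the states and a method per transition. Reject any transition "
        "that is not in the diagram.\n" + STATE_MACHINE
    )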
Isn't this a little bit of a category error? LLMs are not a language. But prompts to LLMs are written in a language, more or less a natural language such as English. Unfortunately, natural languages are not very precise and full of ambiguity. I suspect that different models would interpret wordings and phrases slightly differently, leading to behaviors in the resulting code that are difficult to predict.
Not really, because when they are fed into agents, those agents will take over tasks that previously required writing some kind of classical program.
I have already watched integrations between SaaS being deployed with agents instead of classical middleware.
I've seen them too. They are not pretty.
Like microservices, cloud, and whatever other cool new tech for delivering something that could be done on a laptop, they aren't going away.
Right, but that's the point -- prompting an LLM still requires 'thinking about thinking' in the Papert sense. While you can talk to it in 'natural language' that natural language still needs to be _precise_ in order to get the exact result that you want. When it fails, you need to refine your language until it doesn't. So prompts = high-level programming.
You can't think all the way about refining your prompt for LLMs as they are probabilistic. Your re-prompts are just retrying until you hit a jackpot - refining only works to increase the chance to get what you want.
When making them deterministic (setting the temperature to 0), LLMs (even new ones) get stuck in loops for longer streams of output tokens. The only way to make sure you get the same output twice is to use the same temperature and the same seed for the RNG used, and most frontier models don't have a way for you to set the RNG seed.
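With a local model you can get much closer; a sketch using Hugging Face transformers (gpt2 here purely because it is small):

    # Sketch: deterministic local generation via greedy decoding.
    # gpt2 is used only because it is small; any causal LM works the same way.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(0)  # moot under greedy decoding, but pins any sampling paths
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("def reverse_string(s):", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=64)
    print(tok.decode(out[0]))

(Run greedily like this, small models will happily repeat themselves, which is exactly the loop problem mentioned above.)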
Randomness is not a problem by itself. Algorithms in BQP are probabilistic too. Different prompts might have different probabilities of successful generation, so refinement could be possible even for stochastic generation.
And provably correct one-shot program synthesis based on an unrestricted natural language prompt is obviously an oxymoron. So, it's not like we are clearly missing the target here.
>Different prompts might have different probabilities of successful generation, so refinement could be possible even for stochastic generation.
Yes, but that requires a formal specification of what counts as "success".
In my view, LLM based programming has to become more structured. There has to be a clear distinction between the human written specification and the LLM generated code.
If LLMs are a high level programming language, it has to be clear what the source code is and what the object code is.
I don't think framing LLMs as a "new programming language" is correct. I was addressing the point about randomness.
A natural-language specification is not source code. In most cases it's an underspecified draft that needs refinement.
Programs written in traditional PLs are also often probabilistic. It seems that the same mechanisms could be used to address this in both types (formal methods).
Huh?
What's an example of a probabilistic programming language?
Race conditions, effects of memory safety and other integrity bugs, behaviours of distributed systems, etc.
Ah sorry I read your comment wrong. Yes I agree we can and do make probabilistic systems; we've just to date been using deterministic tools to do so.
The article starts with a philosophically bad analogy in my opinion. C-> Java != Java -> LLM because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple months back:
https://www.observationalhazard.com/2025/12/c-java-java-llm....
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
What would you say if someone has a project written in, let's say, PureScript and then they use a Java backend to generate/overwrite and also version control Java code. If they claim that this would be a Java project, you would probably disagree right? Seems to me that LLMs are the same thing, that is, if you also store the prompt and everything else to reproduce the same code generation process. Since LLMs can be made deterministic, I don't see why that wouldn't be possible.
PureScript is a programming language. English is not. A better analogy would be what would you say about someone who uses a No Code solution that behind the scenes writes Java. I would say that's a much better analogy. NoCode -> Java is similar to LLM -> Java.
I'm not debating whether LLMs are amazing tools or whether they change programming. Clearly both are true. I'm debating whether people are using accurate analogies.
> PureScript is a programming language. English is not.
Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write. If it can do that, why couldn’t it be used to tell a computer exactly what program to write?
I don’t think you can do that. Or at least if you could, it would be an unintelligible version of English that would not seem much different from a programming language.
I agree with your conclusion but I don't think it'd necessarily be unintelligible. I think you can describe a program unambiguously using everyday natural language, it'd just be tediously inefficient to interpret.
To make it sensible you'd end up standardising the way you say things: words, order, etc and probably add punctuation and formatting conventions to make it easier to read.
By then you're basically just at a verbose programming language, and the last step to an actual programming language is just dropping a few filler words here and there to make it more concise while preserving the meaning.
I think so too.
However I think there is a misunderstanding between being "deterministic" and being "unambiguous". Even C is an ambiguous programming language, but it is "deterministic" in that it behaves in the same ambiguous/undefined way under the same conditions.
The same can be achieved with LLMs too. They are "more" ambiguous of course and if someone doesn't want that, then they have to resort to exactly what you just described. But that was not the point that I was making.
I don't think it would be unintelligible.
It would be very verbose, yes, but not unintelligible.
Why not?
Here's a very simple algorithm: you tell the other person (in English) literally what key they have to press next. So you can easily have them write all the java code you want in a deterministic and reproducible way.
And yes, maybe that doesn't seem much different from a programming language which... is the point no? But it's still natural English.
No. Natural language is vague, ambiguous and indirect.
Watch these poor children struggle with writing instructions for making a sandwich:
> Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write
Various attempts have been made. We got COBOL, BASIC, SQL, … Programming languages need to be formal, and English is not that.
English can be ambiguous. Programming languages like C or Java cannot
English CAN be ambiguous, but it doesn't have to be.
Think about it. Human beings are able to work out ambiguity when it arises between people with enough time and dedication, and how do they do it? They use English (or another equivalent human language). With enough back and forth, clarifying questions, or enough specificity in the words you choose, you can resolve any ambiguity.
Or, think about it this way. In order for the ambiguity to be a problem, there would have to exist an ambiguity that could not be removed with more English words. Can you think of any example of ambiguous language, where you are unable to describe and eliminate the ambiguity only using English words?
Human beings are able to work out the ambiguity because a lot of meaning is carried in shared context, which in turn arises out of cultural grounding. That achieves disambiguation, but only in a limited sense. If humans could perfectly disambiguate, you wouldn't have people having disputes among otherwise loving spouses and friends, arising out of merely misunderstanding what the other person said.
Programming languages are written to eliminate that ambiguity because you don't want your bank server to make a payment because it misinterpreted ambiguous language in the same way that you might misinterpret your spouse's remarks.
Can that ambiguity be resolved with more English words? Maybe. But that would require humans to be perfect communicators, which is not that easy because again, if it were possible, humans would have learnt to first communicate perfectly with the people closest to them.
COBOL was designed under the same principles: a simple, unambiguous, English-like language that works for computers.
C can absolutely be ambiguous: https://en.wikipedia.org/wiki/Undefined_behavior
A deterministic prompt + seed used to generate an output is interesting as a way to record entirely, and deterministically, how code came about, but it's also not a thing people are actually doing. Right now, everyone is slinging around LLM outputs without any attempt at being reproducible; no seed, nothing. What you've described and what the article describes are very different.
Yes, you are right. I was mostly speaking in theoretical terms - currently people don't work like that. And you would also have to use the same trained LLM of course, so using a third party provider probably doesn't give that guarantee.
But it would be possible in theory.
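If someone did want to work that way, the checked-in artifact might look something like this (a sketch; all names, fields, and paths are made up):

    # Sketch: a "generation manifest" committed next to the generated code,
    # recording enough to reproduce (or at least audit) the generation step.
    # All names and paths here are made up.
    import hashlib
    import json

    PROMPT = "Implement the CSV parser described in spec section 3 ..."

    manifest = {
        "model": "example-model-v1",      # exact model/weights identifier
        "prompt_sha256": hashlib.sha256(PROMPT.encode()).hexdigest(),
        "temperature": 0.0,
        "seed": 42,
        "output_file": "src/csv_parser.py",
    }

    with open("generation.json", "w") as f:
        json.dump(manifest, f, indent=2)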
I would like to hijack the "high level language" term to mean dopamine hits from using an LLM.
"Generate a Frontend End for me now please so I don't need to think"
LLM starts outputting tokens
Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use
Then out of a shadowy alleyway a man in a trenchcoat approaches
"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"
"How much?"
"Oh that'll be $200.... and your muscle memory for running maven commands"
"Shut up and take my money"
----- 5 months later, washed up and disconnected from cloud LLMs ------
"Anyone got any spare tokens I could use?"
> and your muscle memory for running maven commands
Here's $1000. Please do that. Don't bother with the LLM.
I can't tell if your general premise is serious or not, but in case it is: I get zero dopamine hits from using these tools.
My dopamine rush comes from solving a problem, learning something new, producing a particularly elegant and performant piece of code, etc. There's an aspect of hubris involved, to be sure.
Using a tool to produce the end result gives me no such satisfaction. It's akin to outsourcing my work to someone who can do it faster than me. If anything, I get cortisol hits when the tool doesn't follow my directions and produces garbage output, which I have to troubleshoot and fix myself.
If you're disconnected from cloud LLM's you've got bigger problems than coding can solve lol
IDK how everyone else feels about it, but a non-deterministic “compiler” is the last thing I need.
I think it's technically possible to achieve determinism with LLM output. The LLM makers typically make them non-deterministic by default but it's not inherent to them.
I may have bad news for you on how compilers typically work.
The difference is that what most languages compile to is much much more stable than what is produced by running a spec through an LLM.
A language or a library might change the implementation of a sorting algorithm once in a few years. An LLM is likely to do it every time you regenerate the code.
It’s not just a matter of non-determinism either, but about how chaotic LLMs are. Compilers can produce different machine code with slightly different inputs, but it’s nothing compared to how wildly different LLM output is with very small differences in input. Adding a single word to your spec file can cause the final code to be far more unrecognizably different than adding a new line to a C file.
If you are only checking in the spec, which is the logical conclusion of “this is the new high level language”, then every time you regenerate your code all of the thousands upon thousands of unspecified implementation details will change.
Oops, I didn’t think I needed to specify what’s going to happen when a user tries to do C before A but after B. Yesterday it didn’t seem to do anything but today it resets their account balance to $0. But after the deployment 5 minutes ago it seems to be fixed.
Sometimes users dragging a box across the screen will see the box disappear behind other boxes. I can’t reproduce it though.
I changed one word in my spec and now there’s an extra 500k LOC to implement a hidden asteroids game on the home page that uses 100% of every visitor’s CPU.
This kind of stuff happens now, but the scale with which it will happen if you actually use LLMs as a high level language is unimaginable. The chaos of all the little unspecified implementation details constantly shifting is just insane to contemplate as user or a maintainer.
> A language or a library might change the implementation of a sorting algorithm once in a few years.
I think GP was referring to heuristics and PGO.
That makes sense, but I was addressing more than just potential compiler non-determinism.
Deterministic compilation, aka reproducible builds, has been a basic software engineering concept and goal for 40+ years. Perhaps you could provide some examples of compilers that produce non-deterministic output along with your bad news.
Account created 11 months ago. They're probably just some slop artist with too much confidence. They probably don't even know what a compiler is.
He is a software engineer with a comp.sci masters degree with about 15 years industry experience with primarily C++. Currently employed at a company that you most likely know the name of.
Compilers aim to be fully deterministic. The biggest source of nondeterminism when building software isn't the compiler itself, but build systems invoking the compiler nondeterministically (because iterating the files in a directory isn't necessarily deterministic across different machines).
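The usual fix is boring: sort before you iterate. A sketch (the src/ layout is hypothetical, and the same applies to any build script):

    # Sketch: directory listing order is not guaranteed to be stable across
    # machines, so sort file lists before handing them to the compiler.
    import os
    import subprocess

    sources = sorted(f for f in os.listdir("src") if f.endswith(".c"))
    subprocess.run(
        ["gcc", "-O2", "-o", "app", *(os.path.join("src", f) for f in sources)],
        check=True,
    )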
If you are referring to timestamps, build IDs, compile-time environments, hardwired heuristics for optimization, or even bugs in compilers -- those are not the same kind of non-determinism as in LLMs. The former can be mitigated by long-standing practices of reproducible builds, while the latter is intrinsic to LLMs if they are meant to be more useful than a voice recorder.
You'll need to share with the class because compilers are pretty damn deterministic.
Not if they are dynamic compilers.
Two runs of the same programme can produce different machine code from the JIT compiler, unless everything in the universe that happened in first execution run, gets replicated during the second execution.
Do these compilers sometimes give correct instructions and sometimes incorrect instructions for the same higher level code, and it's considered an intrinsic part of the compiler that you just have to deal with? Because otherwise this argument is bunk.
they in fact do have bugs, yes, inescapably so (no one provides formal proofs for production level compilers)
That’s 100% correct, but importantly JIT compilers are built with the goal of outputting semantically equivalent instructions.
And the vast, vast majority of the time, adding a new line to the source code will not result in an unrecognizably different output.
With an LLM, changing one word can and frequently does cause the output to be 100% different. Literally no lines are the same in a diff. That’s such a vastly different scope of problem that comparing them is pointless.
Only mostly, and only relatively recently. The first compiler is generally attributed to Grace Hopper in 1952. 2013 is when Debian kicked off their program to do bit-for-bit reproducible builds. Thirteen years later, Nixos can maybe produce bit-for-bit identical builds if you treat her really well. We don't look into the details because it just works and we trust it to work, but because computers are all distributed systems these days, getting a bit-for-bit identical build out of the compiler is actually freaking hard. We just trust them to work well enough (and they do), but they've had three fourths of a century to get there.
Compilers are about 10 orders of magnitude more deterministic than LLMs, if not more.
Currently it’s about closing that gap.
And 10 orders is an optimistic value - right now LLMs are random, with some probability of solving the real problem (and I mean real systems, not a PoC landing page or a 2-3 model CRUD). Every month they are getting visibly better, of course.
The “old” world may output different assembly or bytecode every time, but running it will produce the same outputs - maybe slower, maybe faster. LLMs, for the same prompt, can generate a working, a non-working, or a faking solution.
As always - what a time to be alive!
I have been using them everywhere since the late 1990s; it is called a managed runtime.
That is a completely different category. I've never experienced a logic error due to a managed runtime and only once or twice ever due to a C++ compiler.
I have certainly experienced crashes due to JIT miscompilations, even though it was a while back, on WebSphere with IBM's Java implementation.
Also it is almost impossible to guarantee two runs of an application will trigger the same machine code output, unless the JIT is either very dumb on its heuristics and PGO analysis, or one got lucky enough to reproduce the same computation environment.
> Also it is almost impossible to guarantee two runs of an application will trigger the same machine code output
As long as the JIT is working properly, it shouldn't matter: the code should always run "as if" it was being run on an interpreter. That is, the JIT is nothing more than a speed optimization; even if you disable the JIT, the result should still be the same.
Well I've been seeing on HN how everyone else feels about it and I'm terrified.
A compiler that can turn cash into improved code without round tripping a human is very cool though. As those steps can get longer and succeed more often in more difficult circumstances, what it means to be a software engineer changes a lot.
LLMs may occasionally turn bad code into better code but letting them loose on “good” or even “good enough” code is not always likely to make it “better”.
A novice prefers declarative control, an expert prefers procedural control
Beginner programmers want: "make this feature"
Experienced devs want: control over memory, data flow, timing, failure modes
That is why abstractions feel magical at first and suffocating later, which is what sparks this whole debate.
I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
The implementations that come out are buggy or just plain broken
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Give me a whiteboard and a fellow engineer anyday
I'm seeing the same thing with my own little app that implements several new heuristics for functionality and optimisation over a classic algorithm in this domain. I came up with the improvements by implementing the older algorithm and just... being a human and spending time with the problem.
The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what the colour red feels like to see.
> but when it comes to the hard stuff they still suck.
Also much of the really annoying, time consuming stuff, like frontend code. Writing UIs is not rocket science, but hard in a bad way and LLMs are not helping much there.
Plus, while they are _very_ good at finding common issues and gotchas quickly that are documented online (say you use some kind of library that you're not familiar with in a slightly wrong way, or you have a version conflict that causes an issue), they are near useless when debugging slightly deeper issues and just waste a ton of time.
Well I think that’s kind of the point or value in these tools. Let the AI do the tedious stuff, saving your energy for the hard stuff. At least that’s how I use them: just save me from all the typing and tedium. I’d rather describe something like Auth0 integration to an LLM than do it all myself. Same goes for the typical list of records: click one, view the details, then a list of related records and all the operations that go with that. It’s so boring - let the LLM do that stuff for you.
This is one of my favourite activites with LLMs as well. After implementing some sort of idea for an algorithm, I try seeing what an LLM would come up with. I hint it as well and push it in the correct direction with many iterations but never tell the most ideal one. And as a matter of fact they can never reach the quality I did with my initial implementation.
There's a very low chance this is possible. If you can share the problem, I'm 90% sure an LLM can come up with a non-buggy implementation.
It's easy to claim this and just walk away. But it's better for the overall discussion to provide the example.
One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.
For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
Perhaps we'll go the way of the Space Shuttle? One group writes a highly-structured, highly-granular, branch-by-branch 2500 page spec, and another group (LLM) writes 25000 lines of code, then the first group congratulates itself on producing good software without having to write code?
>Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages.
What did Javascript/Python do to Java? They are not interchangeable nor comparable. I don't think Federico's opinion is worth reading further.
Three of Java's top application categories are webapps, banking/financial services, and big data. Node and Pyspark have displaced quite a lot of that.
Most serious banking apps and financial services are still written in Java, it hasn't displaced much of anything. Big data is a relatively 'new' fad that is already becoming less and less relevant.
The US military loves zillion-page requirements documents. Has anyone (besides maybe some Ph.Dork at DARPA) tried feeding a few to coder LLMs to generate applications - and then thrown them at test suites ?
After working with the latest models I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kind of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on, you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.
The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work, people are just catching up with that fact, and the people that just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
If all you (not you specifically, more of a royal “you” or “we”) are is a collection of skills centered around putting code into an editor and opening pull requests as fast as possible, then sure, you might be cooked.
But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.
The question isn't whether businesses will have zero human element to them; the question is whether AI leaves a big enough gap that technical skills are still required, such that technical roles are still hired for. Someone in product can have all of those skills without a computer science degree, with no design experience, and AI will do the technical work at the level of design, implementation, and maintenance. What I am seeing with the new models isn't just writing code; it's taking fundamental problems as input and designing holistic software solutions as output - and the quality is there.
I am only seeing that if the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go. Otherwise you end up with an absolute mess that may work at least for "happy path" cases but completely breaks down as the product needs change. I've described a case of this in some detail in another comment.
> the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go
That is exactly what I recommend, and it works like a charm. The person also has to have realistic expectations for the LLM, and be willing to work with a simulacrum that never learns (as frustrating as it seems at first glance).
When your title is software engineer, good luck convincing the layoff machine about your taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems.
Ah the age old 'but humans have heart, and no machine can replicate that' argument. Good luck!
The process of delivering useful, working software for nontrivial problems cannot be reduced to simply emitting machine instructions as text.
Yes, so you need some development and SysOps skills (for now), not all of that other nonsense you mentioned.
It turns out that corporations value these things right up until a cheaper almost as good alternative is available.
The writing is on the wall for all white collar work. Not this year or next, but it's coming.
If all white collar work goes, we’re going to have to completely restructure the economy or collapse completely.
Being a plumber won’t save you when half the work force is unemployed.
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.
We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.
Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.
Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.
> Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.
All of those jobs are mandated by law to be done by accredited and liable humans.
> Why pay an expensive architect to design your new office building, when AI will do it for peanuts?
Will it? AI is getting good at some parts of programming because of RLVR. You can test architectural designs automatically to some extent but not entirely, because people tend to want unique buildings that stand out (if it weren't the case architects would have already become a niche profession due to everyone using prefabs all the time). At some point an architectural design has to be built and you can't currently simulate real building sites at high speed inside a datacenter. This use case feels marginal.
There's going to be a lot of cases like this. The safe jobs are ones where there's little training data available online, the job has a large component of unarticulated experience or intuition, and where you can't verify purely in software whether the work artifact is correct or not.
> what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.
So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?
The answer is the AI. It's already handling complex issues and debugging solely by gathering its own context, doing major refactors successfully, and doing feature design work. The people that will be held responsible will be the product owners, but it won't be for bugs, it will be for business impact.
My point is that SWEs are living on a prayer that AI will be perched on a knife's edge where there will still be some amount of technical work to make our profession sustainable, and from what I'm seeing that's not going to be the case. It won't happen overnight, but I doubt my kids will ever even think about a computer science degree or doing what I did for work.
I work in the green energy industry and we see it a lot now. Two years ago the business would've had to either buy a bunch of bad "standard" systems which didn't really fit, or wait for their challenges to be prioritised enough for some of our programmers. Today 80-90% of the software which is produced in our organisation isn't even seen by our programmers. It's built by LLMs in the hands of various technically inclined employees who make it work. Sometimes some of it scales up enough that our programmers get involved, but for the most part, the quality matters very little. Sure, I could write software that does the same faster and with much less compute, but when the compute is $5 a year I'd have to write it rather fast to make up for the cost of my time.
I make it sound like I agree with you, and I do to an extent. Hell, I'd want my kids to be plumbers or similar, where I would've wanted them to go to a university a couple of years ago. With that said, I still haven't seen anything from AIs to convince me that you don't need computer science. To put it bluntly, you don't need software engineering to write software, until you do. A lot of the AI-produced software doesn't scale, and none of our agents have been remotely capable of making quality and secure code even in the hands of experienced programmers. We've not seen any form of change there over the past two years either.
Of course this doesn't mean you're wrong either, because we're going to need a lot fewer programmers regardless. We need the people who know how computers work, but in my country that is a fraction of the total IT worker pool available. In many CS educations they're not even taught how a CPU or memory functions. They are instead taught design patterns, OOP and clean architecture. Which are great when humans are maintaining code, but even small abstractions will cause L1-L3 cache misses. Which doesn't matter, until it does.
And what happens when the AI can't figure it out?
Same situation as when an engineer can't figure something out, they translate the problem into human terms for a product person, and the product person makes a high level decision that allows working around the problem.
Uh that's not what engineers do; do you not have any software development experience, or rather any outside of vibe coding? That would explain your perspective. (for context I am 15+ yr experience former FAANG dev)
I don't meant this to sound inflammatory or anything; it's just that the idea that when a developer encounters a difficult bug they would go ask for help from the product manager of all people is so incredibly outlandish and unrealistic, I can't imagine anyone would think this would happen unless they've never actually worked as a developer.
As a product owner I ask you to make a button that when I click auto installs an extension without user confirmation.
Staff engineer (also at FAANG), so yes, I have at least comparable experience. I'm not trying to summarize every level of SWE in a few sentences. The point is that AI's fallibility is no different than human fallibility. You may fire a human for a mistake, but it won't solve the business problems they may have created, so I believe the accountability argument is bogus. You can hold the next layer up accountable. The new models are startlingly good at direction setting, technical-to-product translation, providing leadership guidance on technical matters, and providing multiple routes around roadblocks.
We're starting to see engineers who run into bugs and roadblocks feed them into AI, which is not only root-causing the problem but suggesting and implementing the fix and taking it into review.
Surely at some point in your career as a SWE at FAANG you had to "dive deep" as they say and learn something that wasn't part of your "training data" to solve a problem?
I would have said the same thing a year or two ago, but AI is capable of doing deep dives. It can selectively clone and read dependencies outside of its data set. It can use tool calls to read documentation. It can log into machines and insert probes. It may not be better than everyone, but it's good enough and continuing to improve such that I believe subject matter expertise counts for much less.