LLMs are eroding my software engineering career and I don't know what to do
human-in-the-loop.bearblog.dev
577 points by poisonfountain 5 hours ago
Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of a finance product. That first pillar is still there. Maybe the author isn't aware of the impact they have, but I know, with the evidence of reverted PRs, that when I step outside my area of deep knowledge I can no longer call BS on the agents. Our most capable agent, with access to the same kind of distributed systems the author talks about, is regularly wrong, frequently myopic, and just outright dumb constantly. It's the expertise of engineers on the team that pushes it back on track.
Posting this under a burner so I don't dox myself: I work in FinTech on a regulated product. We have access to Mythos. Mythos identified part of our codebase that it confidently asserted was not compliant with a particular regulation and we were at grave risk by allowing it to operate the way it was.
Except this was not the case, it had of course hallucinated what the regulation actually required (I know this because the code in question had already been reviewed by human counsel). This is (supposedly) the most bleeding-edge model available.
We use a lot of genAI to help us write code, but there is no way in the mid-term we could ever rely on these tools to actually build compliant financial products. We'd have to be totally mad. Yes, lots of Fintech companies are using these agents to accelerate, but anyone who's using them to actually ship product without a human actually digging into it is opening themselves up to a world of risk.
I have worked on highly regulated areas in finance (risk). Compliance is a highly creative art, often requiring lots of out-of-the-box thinking and non-obvious solutions. The people I found worst at this were IT. They tend to over-interpret regulation, and super-restrict beyond what is needed for actual de-facto compliance.
My guess is the model makes the same mistakes as the programmers: taking 'rules' literally, unaware of sectoral joint understanding, validated interpretations and habits. (btw. this is often on the non-tech side also a difference between regulatory and legal. The former are much more result oriented while the latter are primarily risk averse.)
> IT. They tend to over-interpret regulation, and super-restrict beyond what is needed for actual de-facto compliance.
IME this is less the fault of IT and more so bad auditors that won't consider, or just don't understand, what compensating controls are. If it doesn't meet their little checklist exactly, they fail the audit.
It's cause IT never has to live with the consequences of their decisions. Who cares if the other department keeps bleeding talent because you twisted the knobs so hard no one wants to work in your system?
Sounds like communication between departments sucks. If IT develops for them, you’d expect there to be a feedback loop?
Yes. Exactly. This is not a reflection of where I am now in any way shape or form. Just my observation of previous places I've worked.
Who gets in trouble if it turns out you are actually held to the literal rule?
Contrary to what you indicate rules are not declared in a vacuum, for people to read and then algorithmically 'implement'. There are many ways to interpret regulation, and there will be both accompanying clarifications, as well as compliance departments negotiating with regulators on what is an acceptable and sufficient compliance action. Then there furthermore is a risk that will be calculated vs the cost and opportunity costs etc.
As an enterprise architect, these are all part of the meetings you have with compliance when you are working on major projects. I have had the privilege of working with some excellent compliance officers, and they are the opposite of the nay-saying caricature that is often painted of them. I found these people to be extremely creative and helpful, working together towards solutions rather than stalling or nixing viable progress.
I also work in finance and my recent experience with regulators is really discouraging. DOGE wiped out a large amount of the regulators in government. It seems like most of the regulators remaining are the inexperienced and low tenure. Within the past few months we've attempted to roll out new financial products. When we attempt to send our proposal to them, they can't even tell us who we're supposed to send it to.
It doesn't feel like we're living in the same world of regulation that existed prior to DOGE.
> There are many ways to interpret regulation,
Then the rules should enumerate all the ways. From your posts, you come across as if programmers don't know what they are doing which is insulting to those who work in mission critical industries like aviation where a programmer could be criminally charged if he/she didn't implement the specs STRICTLY.
The point was about who is on the hook and why they might be less permissive.
I'm not implying anything else. I used your own "literal" wording to refer to the "more strict than yours" interpretation.
I suppose I should have used scare quotes around "literal".
'The company' would be on the hook. Inside, it might be the compliance team that signed off on the solution, but it usually is not the sort of blame game at that point. I'm not saying these scapegoat trails do not exist, but they are far less common than you would imagine if you only read about them in the press.
Company politics, feudal wars, fiefdom protections, backstabbing and outright sabotaging, now there's a daily occurrence and many minions are cannon fodder in those skirmishes, but they usually stay clear of regulatory issues minefields.
I am skeptical that developers who implement a non-compliant solution that gets a company in trouble get off scot-free.
If the company you work for actually had such a no-fault culture, I doubt you'd be criticizing programmers so aggressively for being sticklers, but would instead be trying to understand and account for the systemic factors (including human factors) behind their behavior.
That's why you work with your Legal/Compliance Team to make sure you stay in line. They can explain when a rule applies and when it doesn't. This needs the engineering side to be able to explain what's happening, and translate it into the business process as closely as possible, and the legal side to be able to apply the law to the case.
If you think rules are literal then you aren’t aware of how the world works.
There’s a reason it’s called “judgement”
In your world, do subordinates ever get scapegoated for bending the rules at a boss's behest?
...And that judgement could take them literally. So what is your point?
My point was simply that it's easy to scoff at someone else being careful if it's their neck and not yours.
They could but they don't. That's pretty much the whole job. You can also appeal decisions to a more reasonable party if you draw RobotJudge3000 for your trial
IMHO even if we are using auditing tools I believe we must use deterministic tools for critical analysis like this. Such rule- and pattern-based systems may not scale beyond a certain point but they can be accurate.
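To make the "rule- and pattern-based" idea concrete, here is a minimal sketch of a deterministic checker. The rule IDs, patterns, and messages are hypothetical and do not come from any real regulation or tool; the only point is that the same input always produces the same findings.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str         # hypothetical identifier, not a real regulation clause
    pattern: re.Pattern  # what to look for in one line of source code
    message: str


# Hypothetical rules, for illustration only.
RULES = [
    Rule("R1", re.compile(r"\blog\w*(\.\w+)?\(.*card_number"),
         "card number passed to a logging call"),
    Rule("R2", re.compile(r"verify\s*=\s*False"),
         "TLS verification disabled"),
]


def scan(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, rule_id, message) for every rule hit, in file order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append((lineno, rule.rule_id, rule.message))
    return findings


if __name__ == "__main__":
    sample = "x = 1\nlogger.info(card_number)\nrequests.get(url, verify=False)\n"
    for finding in scan(sample):
        print(finding)
```

A checker like this only finds what someone wrote a rule for, which is the scaling limit mentioned above, but it never changes its answer between runs.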
3 years max. Maybe 5 if you are lucky. The models will continue to improve. The exponential gains in compute efficiency that have been ongoing for 70+ years will continue and that will result in even smarter models. There are dramatic hardware changes in the pipeline.
But really that particular issue could have been solved by literally just telling it in a markdown file or instructions something like "verify all facts or compliance requirements with web search and include citations in responses".
This is akin to “don’t make mistakes”
“Verify all facts and compliance requirements” leaves enormous holes even if you assume the LLM has a concept of facts and requirements (it does not).
What facts? What requirements? For what industry? For what subset of that industry? For what country or countries that you will be doing business in? Are these current “facts” and “requirements” or is the LLM referencing a dusty article from 1992 for which the subject matter has been radically overhauled?
In my job I regularly see small but incredibly important mistakes like this lead to major issues. Some of those are human driven but increasingly the defense of the person responsible has turned into “Claude said it was fine though!”
It can make mistakes and will sometimes, but what he specifically mentioned was a case where it did not pull up a reference that it needed. So using a web search tool effectively would make a big difference.
It still does not rise to the standard he requires, which your response indicated would be easy for the model to achieve with a simple prompt.
Additionally, using a specific tool does not suddenly give the model common sense enough to say “this piece of information doesn’t answer the question of whether this solution fits in this specific industry at this time in this place”.
Stuff like that is risk tolerance... it's not strictly codified and it's more akin to probability. Different companies at different stages, in different industries will all interpret their risk differently... how will a smarter model improve that?
Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".
IME people would benefit greatly from the process, albeit tedious and time-consuming, of testing out the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable but unreliable and inconsistent a model can be even when given the same context. If you have ever completed a long, complicated task with an agent and then lost the session and tried doing the same thing again from scratch you may have had the experience of seeing the subtle changes that come up in the model's thinking which lead it to accept or reject certain paths and ignore or incorporate prompt instructions like the one you've provided.
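The repeated-run test described above can be automated. A minimal sketch, assuming a hypothetical `ask` callable that wraps whatever model and harness you use (no real API is implied), which sends the same prompt several times and reports how often the most common answer appears:

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def consistency(ask: Callable[[str], str], prompt: str, runs: int = 10) -> tuple[str, float]:
    """Send the same prompt `runs` times; return the most common answer and its share."""
    answers = Counter(ask(prompt).strip().lower() for _ in range(runs))
    top_answer, count = answers.most_common(1)[0]
    return top_answer, count / runs


if __name__ == "__main__":
    # Stand-in for a real model: cycles through canned answers to mimic drift.
    canned = cycle(["compliant", "compliant", "non-compliant", "compliant"])

    def fake_model(prompt: str) -> str:
        return next(canned)

    answer, share = consistency(fake_model, "Is module X compliant?", runs=4)
    print(answer, share)
```

Exact string matching only works for short, constrained answers; free-form output would need some other comparison, which this sketch does not attempt.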
The classic 3-5 year window for a new technology that is uncertain and requires just a few more breakthroughs to get there...
It was my impression that a whole lot of products are only pretending to be compliant, and that it's much more profitable to operate like that.
I've worked in fintech for 30 years. I've never seen a product that was intentionally "only pretending to be compliant" with laws.
I've seen accidental non-compliance. I've seen what I would call negligent compliance, where a company attempted to be compliant but didn't meet full, correct compliance (one example I've seen is that a company assigned resources to compliance and forgot to increase resources as workload increased, causing them to be increasingly behind on compliance work), but I've never seen a company that just decided to pretend to be compliant knowing that they were not.
In my experience this is not representative of most fintechs. Of course there are both cases of real intentional noncompliance, and accidental, but by and large it seems like everyone’s trying to innovate within the law.
This makes sense because these companies want to become large companies and contract with large companies. Large companies, by and large, try to follow the law (while trying to bend it to the limit) because they're aware they have a big target on their back and no CEO wants to be on the front page of the papers for tanking a company in such a stupid fashion.
Even if that's the case, I feel like accurately knowing which regulations you're in compliance with and which you're not would be kind of important from a risk management perspective. From a "maximize profits" perspective (which I'm not saying is good but what you're saying you thought they operated with), you'd want to know the potential gain from ignoring a given regulation and the likelihood of getting caught (along with the cost of the punishment if that happens). This is the kind of math that I'd expect a finance company to be pretty familiar with, and giving that up for a fuzzy "idk if we're in compliance or not" check seems like a pretty huge liability (unless there's confidence in not being liable for blindly trusting the LLM, which I hope is not the future we're headed for but I guess I can never be totally confident in us not somehow ending up with rules that defy common sense).
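The "math" in question is roughly an expected-value calculation. A rough sketch with made-up numbers (the gain, probability, and penalty below are illustrative assumptions, not data from any company):

```python
def expected_value(gain: float, p_caught: float, penalty: float) -> float:
    """Expected payoff of ignoring a rule: the gain minus the probability-weighted penalty."""
    if not 0.0 <= p_caught <= 1.0:
        raise ValueError("p_caught must be a probability between 0 and 1")
    return gain - p_caught * penalty


if __name__ == "__main__":
    # Hypothetical figures: 100k gained, 1M penalty if caught.
    # If you do not know whether you are compliant, you do not know p_caught,
    # so the estimate swings from clearly positive to clearly negative.
    for p in (0.05, 0.25, 0.5):
        print(p, expected_value(gain=100_000, p_caught=p, penalty=1_000_000))
```

The calculation is only as good as its inputs, which is the point of the comment: a fuzzy compliance check leaves `p_caught` and `penalty` as guesses.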
Companies that are growing tend towards faking compliance. Many financial rules like pci only kick in at certain scales. So a company growing very quickly will often be behind the curve but will do everything to seem like they are compliant. Then they would hire people like me to come in and make them actually compliant. More often than not, making an effort at improvement was enough to keep the ball rolling.
I think it's the same throughout startup software to be honest. It's just easier to point out when there's clear rules.
Security, GDPR, backups, build pipelines, disaster recovery, most of it will be faked, half-heartedly done once or ignored entirely.
Then there's the more abstract things like scalability, idempotency when integrating with external APIs, error recovery, accessibility, UX, etc.
Almost always that sort of stuff will have been entirely ignored, or there will be a fig leaf over a real mess of misunderstood standards or manual intervention steps.
Startup developers usually have to be generalists as they often wear many hats, so things that need deeper domain knowledge get done to a bare minimum.
The dynamic of agent codes human reviews does seem like the only sane one for the foreseeable future. Even Anthropic themselves still fall back to this.
The problem is that sucks, even if all software engineers keep their jobs and salaries, the floor is still pulled out from under us. Imagine if a surgeon's job was to supervise robot surgeons from a remote computer, or a woodworker just signs off on work before the machines do all the cutting and assembly. Sure they still have important jobs in their field but the soul & humanity of their skill is gone.
"Soul and humanity" is doing a lot of work here.
Does the woodworker who shapes wood using a handsaw use less "soul" than the one who uses a machine?
Does the musician who uses a DAW and VSTs instead of analogue tape recorders create music with less "soul"?
Does the painter who buys acrylic paint instead of synthesizing their own dye from plants use less "soul"?
As technological innovation progresses, the barrier to creation falls. The process of creating something is not to be conflated with the final piece of art itself.
Not _my_ opinion, but I just wanted to share that many people (in the Midwest) do believe that anything synthetic that is not readily made from simple materials has "less soul". It's a sorta test of "if I dropped you off in the jungle, can you still produce works of soul? Or are you just another cog in the machine.".
Does the carpenter who used to build custom fit cabinets with hand and power tools put in the same creativity when he just carries around a scanner, scans the area, the customers use software to select the layout, approve the work, then the CNC cuts out the wood, then all that's left is to put the screws in the holes and go home.
This isn't like the step from hand saws to power saws, and it's disingenuous to pretend like it is. This is what the startup machine has been doing to every industry... finding... "inefficiencies" and "optimizing" them.
Your analogies are flawed. DAWs and skill saws generate nothing. They take skill to operate, and a novice cannot use these tools at all unless they know the craft.
Compare this to prompting an LLM: “Generate a third-person game with a view from above where you can steal cars, shoot at people, run from the police, etc.” Anybody with access to the tool can do this, and the results are just another uninspiring GTA clone that you would imagine.
The latter is more like a carpenter ordering their “work” from alibaba than it is like using a skill saw.
Except it's not just a tool.
It's when a woodworker, musician or painter completely outsources their work and just marks what's wrong, sending those parts back. Yes, the final art piece might be the same, but the artist definitely uses less of their "soul".
I never found there to be much soul and humanity in the job to begin with. Coding personal projects has soul, but for me at least the demands of high-velocity sprint-based software development to match business needs removed most of the soul and humanity long before AI got good at coding. And I mean, I totally understand why it has to be like that. In most businesses, you do better by shipping decent software fast than by shipping great software slowly. I don't have a problem with that in principle. But it does mean that for me, the software development side of things has never had much soul and humanity to begin with. It was just being a glorified assembly line worker, with the sprints being the assembly line. Of course, others may have had very different experiences, but that has been mine.
For me, AIs have actually made the job more soulful, not less. For one thing, it lets me use the part of my mind that is good at human language, not just the part of my mind that is good at software. This makes the job feel a bit less one-dimensional in terms of what parts of me are engaged while doing it. For another, I find it liberating to no longer have to think much about boilerplate code or to spend time roaming around the Internet looking up documentation of various language syntax and API details, the vast majority of which are arbitrary rather than being based on any kind of mathematical beauty. For me it makes the job more soulful that I can think of the job on a higher level instead of having to spend effort on arbitrary and tedious details.
Of course there is still the question of "will the job even exist in a few years, at least for more than a relatively small number of people?". But that's a separate question. For now at least, I am finding that for me AIs have brought a lot more soul and humanity to the job than it ever had before.
I think there is a big difference between a surgeon, who is performing a specific task with a clear outcome, to a woodworker, who might produce a unique piece of art or a functional chair. I think the surgeon-type tasks will be replaced eventually. More interesting are the woodworker types, which has some similarities to SWEs.
When industrialization hit, we definitely lost a ton of craftsmanship and craftsmen, but a standard Ikea chair is less likely to wobble than the average chair at a much better price (for a random example). Yes, we traded artistry for convenience, but what we really did was bifurcate our needs between "some place stable to sit" from "a beautiful chair for my home". Most people wanted the former more than the latter, and the same applies to software.
If we split the roles into buckets, many woodworkers disappeared, some became artisans, some became designers for industrially-produced products, and some catered to Luddites for a long transitional period. Despite Anthropic's claims, SWEs won't disappear in a year but over a generation or two, no matter how good LLMs become.
Obviously software is much more complicated and integrated into other elements of business, which in a way makes it more vulnerable to AI taking over and in another way will be at the mercy of larger shifts to how businesses organize human roles and responsibilities. What we call "taste" comes down to "intent" - what the hell does a company do? What should it be doing and how should it operate? These will be the only questions that matter and the one thing LLMs can't replace since they will always choose the most default path. So I think humans' roles will be to inject intent/taste at different levels of abstraction throughout an organization.
After a couple of years of this their expertise will be gone too and then nobody is qualified to supervise the clankers.
I've worked on projects in the airline and health industry which are highly regulated too. The regulations can be incredibly difficult to process and implement, and make sure you adhere to everything correctly. I've been involved in multiple scenarios where people have made false assertions about compliance or lack of. I'd still place a bet that the SOA models make _far_ less mistakes than humans.
They might make fewer mistakes, but they aren't evenly distributed. They don't use logic when making mistakes, it is gaps in the training data and how large of a span they have to bridge in the latent space. Just as they aren't smart like humans, they aren't stupid like humans. Don't mistake rate for quality.
For some reason, tons of people seem to be in camps at both extremes. It's either "AI sucks don't trust it!" or "AI is so much better than humans!"
But the most reasonable take, which I'm happy to see reflected in so many comments in this thread, is… use both.
Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI. Then the unique shortcomings of each party can be covered by the other's strengths.
AI review is never going to beat a fully resourced human review.
It might beat an underresourced human review, on time, efficiency, cost metrics. But on the metric of accuracy, throwing unlimited humans at a problem will still beat throwing unlimited AI at it
That's an irrelevant comparison because cost is always a constraint, so there are not going to be unlimited AI or humans. The question is how to optimally combine them for a given cost.
> Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI.
You can do that, sure. But doing so negates any improvements in speed the LLM brought. And at that point, you may as well just do it yourself to begin with.
When Google showed up on the scene I found I no longer needed to memorize basic syntax and other such things. If I couldn't remember on the fly, I'd just do a quick google search and move on. This freed space in my mind to instead focus on bigger & better things.
I use GenAI tools when coding a lot, but I do not vibe code. I go through everything it generated, and we iterate. And yes, it doesn't save me a lot of time. But what it does do is free up mental capacity in a similar manner. But instead of syntax, it's more complicated patterns. Maybe I don't remember how to stitch something together, but I know it can be done. Instead of spending the time to look it up and then code it, I just tell it to do it for me.
Yeah, humans reviewing the AI review can only detect the false positives, where the LLM claims something is non-compliant and flags it for review/correction by a human or another agent. Human review can’t find the false negatives (true deficiencies not flagged) unless you do a full audit yourself to find whatever deficiencies the AI missed.
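The asymmetry is easy to see with sets. A toy sketch, where the finding names are invented and `true_deficiencies` is something you would only know after a full audit:

```python
def review_flags(ai_flags: set[str], true_deficiencies: set[str]) -> dict[str, set[str]]:
    """Split an AI review into what a human reviewing only the flags can and cannot see."""
    return {
        # Flags the human confirms as real problems.
        "confirmed": ai_flags & true_deficiencies,
        # Flags the human rejects: the false positives.
        "false_positives": ai_flags - true_deficiencies,
        # Real problems the AI never flagged: invisible to a flag-only review.
        "missed": true_deficiencies - ai_flags,
    }


if __name__ == "__main__":
    result = review_flags(
        ai_flags={"kyc-check", "audit-log", "rate-limit"},
        true_deficiencies={"audit-log", "data-retention"},
    )
    for bucket, items in result.items():
        print(bucket, sorted(items))
```

The human only ever looks at `ai_flags`, so the "missed" bucket stays empty from their point of view no matter how many items are actually in it.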
I feel like you're missing the point that it's more thorough to use both. Speed isn't the only factor that matters.
This makes sense, but a logical next step is to have one AI write code, and then have another AI, instead of humans, verify it.
Or are current AIs too similar for that to be fruitful?
not according to my experience.
regulation questions, even the simple ones, AI gets wrong all the time. it wasn't Mythos, but other models like opus.
I can adjust the view on this topic if/when we get access to mythos.
>I'd still place a bet that the SOA models make _far_ less mistakes than humans.
Genuine question: your top coder seems to be producing the most error-free code from your perspective, has the deepest knowledge of the architecture and codebase, and is faster on the trigger than the others.
But your top coder has proven and verifiable dementia, where they will confidently assume the existence of apis and code that do not exist, mix up the purpose of others and forget other things, and you can't predict when and how they will introduce errors into the system or the severity of such errors.
Are you really comfortable letting this person with dementia generate most of your codebase in the airline and health industry?
I also hope you have an iron-clad agreement that prevents the model provider from doing silent updates because all your evidence of correctness you collected thus far goes out the window in that case.
Another genuine question:
You have witnessed a human coder and the AI you're using make the same important mistake. Assuming you do not have the time and resources to retrain, fine-tune, and test your frontier model:
Who would you trust not to make the same mistake multiple times in the future after you have warned them that their job depends on it, the AI or the human?
Your top coder has guard rails in place to prevent him autonomously going free - right? This is how you should approach agentic development with LLMs. Like it or not, we are the final bastion, the gatekeepers. The hallucination thing I think is mostly overblown and from speaking to colleagues it seems to vary wildly depending on which model and harness you are using - always go for SOA. In the last 3 months I can count on one hand where it's done something wrong and that's primarily as I'm operating it with guard rails and giving it context.
>Your top coder has guard rails in place to prevent him autonomously going free - right?
The parent is implying they would prefer an AI when working in the airline and health industry because it makes less errors. Read the comment again.
They have not said, "Hey, I work in the airline and health industry and I'd love to use AI for a couple of the bullshit IT UIs we have as long as we can put guardrails on the AI to stay in its lane."
I asked a yes or no question. The guardrails you can put to mitigate errors are the same guardrails pre-AI for the humans (tests, regressions, reviews). If you were wary of employing a top lead engineer with verifiable dementia prior to AI for a mission critical system, logic implies you should think twice giving that much responsibility to an AI as well.
> The hallucination thing I think is mostly overblown
Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.
>from speaking to colleagues it seems to vary wildly depending on which model and harness you are using
You have partially answered my question it would seem.
> Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.
No, but the same can be said for your colleagues. You might call what the LLM does hallucinations, I'd call them mistakes. I think we have totally forgotten that humans make them all the time and are confidently wrong too.
Your original question, doesn't really get to the bottom of the point I'm trying to make, and I don't really feel it fairly represents the issue we are talking about here. They are not the same things.
>No, but the same can be said for your colleagues.
That's absolutely false. My colleagues don't routinely and confidently invent apis that are not there, or spectacularly and repeatedly misunderstand the purpose of certain functions or exhibit extreme forgetfulness. Especially when I've warned them. Hallucinations and confabulations in otherwise healthy individuals are mental disorders. When I ask them why they made a certain kind of error, I can expect to get a reasonable answer. No one has uttered the phrase "Bob hallucinated again while writing those tests" when the Bob in question is a human.
Well, your experience doesn't align with mine. I have been using, and am part of an organisation that is extensively using, Claude with Opus for everything for about 3 months now and I am not experiencing the problems you describe. We'll have to agree to disagree here.
> I'd still place a bet that the SOA models make _far_ less mistakes than humans.
Well too bad, the problem is that they also produce things much faster than humans so errors will compound quicker.
This stupid argument again. The number of mistakes _does not matter_. Get. This. In. Your. Head. The predictability of the _type_ of error is what matters. For LLMs and machine learning in general the error distribution is not what you would expect and it is not possible to predict either.
In some sense, you should still act on this, since if an external auditor relies on the same stack, it'll still cause you headaches.
I use Opus 4.8 and GPT 5.5 and haven't suffered from hallucinations in months. But we also put a lot of effort into our harness.
Opus 4.8 and gpt constantly hallucinate stuff as well. If you haven’t encountered or caught it that’s something different. Of course these days it’s mostly confidently asserting a wrong thing.
Sometimes the harness can only be a human.
And this is fine. Developing new software with a really smart intern is the same, you, as an expert, need to bring your experience/expertise on the table to have everything right. Because experience needs time.
> it had of course hallucinated what the regulation actually required
Did it do the correct job once you put the regulations doc(s) in the context?
What I usually do when in doubt is challenge the AI. “Please quote the section of regulation the product is non compliant with”. It usually admits it hallucinated the whole thing.
It sometimes says that even if it hasn't though, so like everything with LLMs, you can't actually rely on that.
100%. Unfortunately those not in the depths of mission critical systems or regulated products will continue to believe that producing tons of code quickly using LLMs without humans in these systems is acceptable.
Here's an example of what we will continue to see with folks fully immersed in gen AI psychosis:
"The creator of claude code said that he no longer writes code for about 6 months and now has Claude doing all his work now. He also said recently that he no longer prompts Claude and now has it running in loops and it is self-improving itself and performing better than a human!"
If the code produced by the LLM is perfect, the LLM takes the credit. But when a disaster happens, you cannot blame the LLM and it then falls on the human who did it.
I don't think SWEs heavily vibe-coding with LLMs realize the risk in not understanding what the code the LLM is producing is doing even after generating tests (lol). We will see more of this too. [0]
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
Why is it such a dramatic statement for Boris to claim that he no longer writes code?
Are people on HN still typing out functions by hand one character at a time?
It would be like a developer in 2020 claiming that he only writes assembly because compilers can’t be trusted. No one is taking that person seriously. If you chose a career in tech you made a decision to work in one of the fastest moving fields in human history. Now it’s time to get over it, learn the new tools and adapt.
>Are people on HN still typing out functions by hand one character at a time?
Well I use tab completion, of course. And I copy-paste snippets from LLM more often than from SO now. But otherwise not much has changed in my career in the last 5 years. Is this different for you?
I'm not fundamentally opposed to code generation, and I use LLMs for some tasks, but I don't see myself vibecoding whole pages of production code. I vibecoded a throwaway note-taking app for myself though.
> Now it’s time to get over it, learn the new tools and adapt.
If the AI is producing what you tell it to, why are you needed?
> Now it’s time to get over it, learn the new tools and adapt.
No, thank you. I have used the new tools, determined that they aren't helpful to me, and set them aside as I would with any other bad tool. I don't feel the need to let hype take the steering wheel.
> Now it’s time to get over it, learn the new tools and adapt.
Exactly. You are free to use openclaw or a coding agent to build a competing bank, hedge-fund, hospital or even a new airliner because the previous ones were built by humans. Surely an AI can do it better by itself.
So why haven't you done it yet?
> Are people on HN still typing out functions by hand one character at a time?
Yes, me. Yes, I tried LLMs for what I am doing and will try again in a few months. No, there was no noticeable or clear improvement over doing it manually.
Yes, I am using some LLMs for some purposes, but Claude Code brought only a slight improvement, if any, not worth introducing a proprietary dependency.
It is because HN is contrarian and behind the times.
I work at a big tech company and I don't know a single person that still hand writes code. Most people haven't hand written code for at least half a year now.
I do wonder what sort of bug is making its rounds on HN that people here find this so shocking and unbelievable.
> Why is it such a dramatic statement for Boris to claim that he no longer writes code?
Because we can actually see the disjointed slop that Anthropic produces. And when issues happen, they can't fix them for weeks on end because no one understands what code does anymore, and all of their "hard problems causing issues" they blog about are literally "if we had actual engineers this wouldn't even be an issue to begin with". Like this bullshit they had in spring: https://www.anthropic.com/engineering/april-23-postmortem
> It would be like a developer in 2020 claiming that he only writes assembly because compilers can’t be trusted.
LLMs are not compilers. For a few very obvious reasons I'll leave as an exercise to figure out
False-positive rate is so high with Mythos according to friends and other reporting I have seen.
The original Mythos release used ASan to filter false-positives so it was able to maintain a good FPR, but when Mythos moves into domains that don't have a readily available oracle to help filter hits, the result is a deluge of false bullshit.
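To make the "oracle" point concrete, here is a minimal sketch, not how Mythos or ASan actually work together (that isn't described here). The finding names, the `real` set, and the `oracle` callable are all hypothetical toy stand-ins; in practice the oracle would be something like re-running a reproducer under a sanitizer.

```python
from typing import Callable, Iterable


def filter_with_oracle(findings: Iterable[str], oracle: Callable[[str], bool]) -> list[str]:
    """Keep only the findings an independent oracle can confirm."""
    return [f for f in findings if oracle(f)]


def false_positive_share(reported: list[str], real: set[str]) -> float:
    """Fraction of reported findings that are not real issues."""
    if not reported:
        return 0.0
    return sum(1 for f in reported if f not in real) / len(reported)


# Toy data (hypothetical): four reported findings, only one of them real.
reported = ["finding-a", "finding-b", "finding-c", "finding-d"]
real = {"finding-a"}

# Stand-in oracle: confirms a finding only if it actually reproduces.
oracle = lambda finding: finding in real

confirmed = filter_with_oracle(reported, oracle)
share_before = false_positive_share(reported, real)
share_after = false_positive_share(confirmed, real)
```

Without a callable like `oracle` for the domain (regulatory compliance, say), there is nothing to plug into the filter step, and every reported finding lands on a human.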
Have you added "Make no mistakes" to the proompt? Mythos can't go wrong then, must be a skill issue.
it's shocking people don't realize you're being ironic
AI cannot fail, it can only be failed
My current favorite in that area (because I saw it in the wild) is:
"Make it better" with no additional or reasonable previous explanation of what better might mean.
"AI will figure it out", meant not for pattern extraction but for a full-blown analysis with an equally generic prompt, all confidently stated by an executive telling the people working on it how it works.
If you talk to it like a programmer talks to a computer, it works a lot better.
So the question remains whether non-programmers will adapt, the LLMs will accept a wider range of input styles, or... it's just another abstraction layer for devs to use.
I've observed this in the wild where someone is iterating with an LLM and giving it only negative feedback. For example responding to edits with "don't make it blue" rather than "keep the existing button shape, and change the color back to green".
The LLM doesn't really come back the way a human would and say "so what color do you want?".. it just, guesses. Now abstract that to more complex tasks.
I realize they’re being ironic, it’s just a poor contribution to an otherwise productive conversation.
what am i missing?
you take a spec and create tests, every little thing
you use another ai to verify these tests against the spec
you review the tests vs the spec (at one point human review)
you put the tests off limits to change / wall them.
you let the ai write the software that fulfills the tests.
there will be some gaps where you repeat the cycle above
if the tests fulfill the spec, the code will fulfill the spec
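The "wall off the tests" step above could be enforced mechanically. This is a minimal sketch under assumptions: the helper names `snapshot` and `changed_files` are made up for illustration, and it only detects tampering with the test files; it says nothing about whether the tests actually match the spec.

```python
import hashlib
from pathlib import Path


def snapshot(test_dir: Path) -> dict[str, str]:
    """Map each file under test_dir to the SHA-256 of its contents."""
    return {
        str(p.relative_to(test_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(test_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return files added, removed, or modified since the baseline."""
    return sorted(
        name
        for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    )
```

Take `snapshot()` once after the human review step, store it somewhere the agent cannot write, and reject any change set where `changed_files()` is non-empty.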
>you take a spec and create tests, every little thing
A spec detailed enough and unambiguous enough to be translated into machine execution deterministically is called code.
Unlike a compiler, AI can build with a spec that is not detailed enough or unambiguous enough: It does so by filling in the gaps with educated guesses.
This is safe if and only if you take the time to later read the output, understand what its guesses were, and judge whether they were acceptable. No AI can do this for you because the truth lies in your original intentions, which it does not have access to.
The jury is still out on how reliable and time-consuming this is vs writing the code yourself; it is not immediately obvious that it is faster or requires a smaller cognitive load.
Code is not a spec. It's an instruction set. It can be a spec if you try hard but that's not an inherent property of code. For example you can write code to be a compiler..that makes it a spec. But hello world is not a spec.
As for whether or not LLMs can write unit tests. The answer is yes.
Hello world is a spec. The spec says to produce the text hello world on standard output.
Try running it without a compatible ABI. See how far you get.
Not sure what the point is. We can update the spec with "in the presence of a compatible ABI".
All I'm saying is a program isn't VHS. It's a VHS tape. At that point it's largely philosophy. Can you reconstruct a VHS format from a VHS tape? Sure.
If each step requires micro-steps iterating with an LLM with human review to prevent hallucinations creeping in.. at some point you might just be better off letting the human do the work.
Particularly as tokenmaxxing has ended and people are being charged more economic prices. If pricing goes up 5-10x the way Uber, etc. did on the path to profitability.. even more so.
IME, regulatory compliance is something you are rarely able to test for in a nice little box or with well-known suite. So there's no easy "this complies" in many situations, no matter how many lawyers, compliance officers, and llm's you run it past.
so, what's the difference to human engineering?
other than there are "internal micro feedback loops" during development?
I walked down that path for a few months. The more you constrain LLM's, the more underhanded they behave in order to produce something that satisfies all the constraints.
Doing the above doesn't actually make the model smarter, so, if it couldn't get to correct code with fewer steps, then the light you see at the end of the tunnel is an oncoming train.
This is such an abstract principle that the principle itself cannot be refuted. The plan sounds fine on paper. "Just iterate bro". But it entirely depends on what rational agents you put into the system. Obviously, if I sub in a 5 year old child everywhere, this loop breaks. Humans and AI, sometimes one is better than the other at certain things, we're still learning.
The only way to test this is to test it out, in real life. Sometimes people see results, sometimes people don't. Note that yes, I am including the entire iteration process - even after iterating, people still don't see results with AI.
I have had both positive and negative experiences with AI, over multi-week projects. But apparently on hackernews, anything positive about AI is proof that AI is superhuman and taking over, and all follies about AI are lies by stupid humans who secretly have psychological dispositions to fear AI. Sometimes the AI genuinely isn't good enough. Are we not allowed to say that now? We might not know why, but it's just the truth.
The other solution is to formally analyze the entire space of possible actions the agent can take a priori. Then yes, you can definitively say whether or not the principle breaks or not. Can you, though? Can you give a formal specification for the space of possible actions for AI and show that your loop never breaks, or breaks less than humans, or any other sensible criteria? If not, then you can't just give an abstract principle and start making inferences from that.
It’s impossible to write a spec that’s unambiguous, complete, and correct in natural language. Thus prompts will always generate unreliable software.
Is that all that Mythos did?
Did it find any real potential issue, optimization/simplification opportunities, or sparked any thought-provoking discussion within your organization?
Or was it purely a net negative experience?
Read their comment. It's a negative anecdote surrounded by them using genAI all the time.
You're the only one coming away thinking there was a net negative experience.
In regulated industries none of those matter if the tool invents compliance issues or breaks compliance.
The only thought-provoking discussion should be "why the hell do we have this stochastic parrot anywhere near our codebase"
I think that what technical people fail to understand is that a lot of the time, "compliance" is not the same as a binary compiles/does not compile. For a lot of rules/regulations, compliance means "making enough effort that legal is willing to back you up".
A system which will just randomly decide to give the legal team reasons to not back you up is:
* A system whose output will get brought up in lawsuits and make legal's job harder.
* A system that will make the dev team perpetually chase its tail while it oscillates between the several different valid interpretations of the rules.
Odd take. So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done, the whole thing is useless?
Not saying that is the situation, I don’t know. But if “one error is too many” is your point of view… do you think the humans in these orgs are 100% perfect 100% of the time?
> So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done
How many gaps have humans not caught?
> But if “one error is too many” is your point of view
Yes, in regulated industries "one error is too many" is the only right approach.
Yes, humans also make errors, and there you have a range of options: from tracing and finding the causes of the error (and tightening processes) to literally jailing those responsible. Your hallucination machine will happily "identify" 17 gaps, and create 34 more. And no, there are no processes to make it better. The "make no mistakes" incantation will happily be ignored for obvious reasons, regardless of how many forms of it you throw at it.
Isn't that a net positive though? (Not sure about the human and tech cost.) I'm guessing that without using Mythos, those conversations would never have been had, and confidence in the compliance of the product would've been lower.
I love using AI tools as casinos. It's epic in helping to forge ideas and kickstart thought processes. You basically have the entirety of world knowledge at your fingertips to have a pint with.
your parent:
> the code in question had already been reviewed by human counsel
> I'm guessing that without using Mythos, those conversations would never have been had, and confidence in the compliance of the product would've been lower.
The conversations had already been had and the product made compliant. Mythos just pulled new rules out of its ass and of course the product wasn't compliant with those. So they do a fire drill and find that to be the case at great expense.
Yeah, you can frame it as "more checking is always better" if you want, but that's just the same old "other people's resources are valueless" sleight of hand we see on everything. It probably was mostly wasteful work.
There's a chapter in Simple Sabotage about how to undermine a white collar organization from the inside. One of the key tactics is to hold meetings that revisit decided upon points, and to invent unnecessary process / checking.
So, in this case, the LLM's behavior was equivalent to the behavior of the resistance during WWII.
I think that book should be required reading for all engineering students.
> It's the expertise of engineers on the team that push it back on track.
But how are you so sure your colleagues are not more "expert" than you? Prior to LLMs there was room for very good engineers and mediocre engineers to work together in 99% of the companies out there. With LLMs, only the "best" engineers will survive, because nobody needs mediocre engineers anymore.
This being HN, I imagine every engineer reading this thinks they are in the top 5-10% of their company/city/country, and therefore they think they are not "mediocre" engineers who can be affected by the introduction of LLMs. Statistically, they are probably wrong. So, it's all about ego. Chances are you are not a rockstar and LLMs will eventually take over your job.
As usual, the only winners here are corporations and executives. Most of us are the last monkeys in the chain, and so we'll get screwed.
The corporations and executives are already winning if you swallowed the concept of 'rockstar' engineer. Sure there are more and less experienced engineers, but even interns can and often do provide good input and spot mistakes made by seniors. The 'rockstar' engineer at most tech companies simply equates to the somewhat autistic guy with a brown nose who's working 15 hour days for a pat on the head from management (and making many mistakes in the process).
For the most part there aren’t 10x engineers
But there are certainly 0.1x engineers
There certainly are 10x engineers, it's just that they get most of the x from turning down bad ideas and saving work.
I've long thought a 10x engineer is one with just the right amount of analysis paralysis - not too much or too little. It's not that they're 10x engineers, it's that everyone else is 0.1x due to a confluence of reasons. And the ones we call 0.1x are 0.01x.
Even if we forget "rockstar", there are certainly different levels of engineers. More experience doesn't automatically mean better either. That is not to say experience doesn't matter. It matters quite a bit. Sure , good interns can sometimes have good feedback or spot mistakes. But not consistently enough.
All of this to say that it's not just experience that makes one a better engineer.
Experience is one of the only objective signals we have, but you're right it's not the only one. I've seen plenty great junior engineers and interns, and plenty of incompetent staff/principal engineers.
> because nobody needs mediocre engineers anymore.
This is giving too much credit to LLMs. I think LLMs are great and incredibly useful in both personal and professional settings. However, they exist on a separate plane from human workers, in the tools category.
Sooner or later, people will find out that LLMs only overlap with the existing human hierarchy (e.g. junior dev X%, senior dev Y%, etc.), but almost never 100%. If the overlap were 100% for a certain position, you were probably using the humans wrong there to begin with, since humans have one of the most prized things, of which I don't see a single ounce in LLMs: initiative
It's hard to show initiative without a pulse. Most agents don't have that (yet). But can't be too hard to build.
Exactly. Same with tractors. Once they arrived, nobody benefited except Big Tractor.
Famously a net loss for humanity.
> With LLMs, only the "best" engineers will survive, because nobody needs mediocre engineers anymore.
I don't think this is true.
A good engineer doesn't have infinite throughput. In my opinion the best engineers should be constantly bottlenecked because they solve difficult problems. They don't have time for grunt work. Every company needs less than perfect engineers, AI assisted or not.
Well almost 70% of the developers in the industry can't write a fizz buzz.
But, besides coding skills (which some possess), the engineering, social, and business ones are close to non existent.
Did you pull this 70% out of your ass or from some other place? It's quite obviously not reality.
https://blog.codinghorror.com/why-cant-programmers-program/
There was also another study I cannot find where 56% of engineering graduates struggled to write a fizz buzz.
I think people highly underestimate how low the average developer's level is, closed in their bubbles of mostly well-established software teams, forgetting that for each of them there are 10 software consultants in southern Europe gluing APIs together by trial and error on Java 8 monstrosities.
> Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of a finance product.
Dunno how much longer that is going to remain true for your specific employer - all the fintech companies I deal with personally have had some sort of AI account for their devs since last year.
Even places like Jane Street have employees posting blogs (one of which was on the HN front page about 60m ago) saying they mostly direct agents.
How long do you think your specific employer is going to hold out?
Sorry if I was unclear. I don't work in finance. I do work with agents. I think expert engineers in finance who are guiding agents are adding a lot of value because of their knowledge of finance. Because I lack that knowledge of finance, even given access to agents, I would not accept a role guiding agents in a finance company because I wouldn't be able to guide the agents well and my/our output would be bad.
Unfortunately every software related industry is embracing LLM/Codegen. Your banks, fintechs, insurance. Everyone. Your concerns are the same I'm having, yet it's regularly dismissed or hand-waved away as "don't worry about it the delivery velocity/ROI is worth it"
It's not so much about velocity or quality, both of which LLM do (or will) provide.
The real question is about accountability and liability.
When a major data leak is going to happen, who will they sue or fire ? That is the value engineers provide. They understand, confirm, and take ownership.
This is what I'm wondering too. We've signed a confidentiality agreement with all the big players (as I'm sure all other companies have done), which is supposed to ensure our data is both segregated and not used for training. I don't trust these companies not to do just that; their business is in taking what we have and training their models.
Yeah, I always wonder if they do some type of obfuscation and transformation on the private data and find a way to backdoor the info without technically using it directly.
This question has been easily answered by many companies.
You, the IC, the developer prompting the code extruder, are ultimately responsible for its outputted code and its behaviour.
You may feel pressured to push out thousands of lines of code a day. You may see those thousands of lines refactored several times over the lifespan of a merge request. You may be asked to continue this in the long term with all the mental fatigue that entails.
When it's too much for you to sustainably deal with and you turn to using LLMs to review the code, that will still, presumably, fall on you at the end of the day.
The output is your responsibility.
Ostensibly, due-diligence should not change. But people are lazy, just as they've always been around testing/QA/definition-of-done.
I'm not even certain that laziness gets them further along than it used to; I think it's that people have not had their overconfidence painfully corrected yet. Behaviors will re-align pretty fast when people realize that no, they're not going to get away with just pressing a button and saying everything is "good". That is happening right now.
Don't worry, we can throw it all in 55 gallon drums and dump it over a cliff when the time comes.
Just having this discussion with someone about AI in healthcare and how issues are going to be handled.
If a nurse does something incorrectly, they can lose their license. Ensuring that nurse will never be a nurse again. There is a very clear path of accountability and very clear ways to mitigate it.
For instance, if a nurse is drunk and you recognize there is a pattern of people showing up drunk, you institute drug tests and breathalyzers and move on.
While we probably won't have LLMs autonomously performing procedures, they are 100% parsing documentation, reading lab results, making suggestions, etc. And right now, the burden has been placed squarely on the clinicians themselves. It'll feed them the data, ask if they approve/agree, and then essentially wash its hands of accountability. Let's say an LLM starts incorrectly reading lab results, how is that fixed/remedied? A prompt update? Additional safeguards? Adjusting the temperature? Changing a model?
This is a far different type of engineering that still feels pretty new. Granted, I'm still an amateur in this space (I use Claude Code a decent bit), but it feels really opaque to me right now.
> When a major data leak is going to happen, who will they sue or fire ? That is the value engineers provide. They understand, confirm, and take ownership.
This goes for serious incidents, disasters, outages and security breaches.
If there was an investigation and the answer was "a piece of software was vibe coded with AI" why would anyone trust the software vendor after that?
When has any company ever faced consequences from atrociously bad code leaking data or negatively impacting their customers?
Even Solarwinds is still alive.
EU companies are judged guilty of negligence because backups were not totally disconnected (even though distant site) and ransomware did destroy them.
So that is starting to dig deeper than a plain mistake. I guess we will soon-ish witness the first AI slop trial going on, this will be interesting to follow
Are banks that concerned about velocity? Because moving fast and breaking things in the banking sector can get extremely expensive. It's also not a who-gives-a-shit industry like operating a taxi service or hosting images, but a very tightly regulated sector.
I might have been a bit broad with the brush. I can't speak for banks, but I can speak for the fintech/money-movement space (e.g. Remitly, Wise, Revolut).
It's a race to get first-to-market for backend integrations/features. It's given rise to a culture of "move fast break things" where safety is only for some core features, but absolutely not for the constellation of other services we provide. Failure rates have increased almost a percentage point since Codegen/LLM adoption was mandated from up top.
You would think regulators would be on top of this, but our industry runs on all actors "self reporting" their outages. Most don't unless they can't hide it (>1h)
'Keeping up with regulations' may as well be a separate field from the core stuff. It has the same pressures as any other development effort. Managers will want the integration to the KYC service LLM'd as quickly as possible.
Reg PRs - for the ones with complex requirements what I am seeing is that time to initial PR is very short, and a ping-pong between the reviewer and developer begins, because in my cases (not all) the developer vibe-coded parts, and they didn't really understand the requirements deeply or their code, and it takes multiple iterations for them to fix it. You can argue this is a human problem but this is the net effect I'm seeing.
I am not sure but for complex cases it seems to me that the earlier sum of moderately long PR time + moderately long review time has been replaced by very short PR time + even longer review time. I am not sure if there's a net gain in these cases. Sometimes even if the code is functionally correct, it's verbose enough (e.g., too many intermediate functions) that I think they will impact future reviews.
> That first pillar is still there. Maybe the author isn't aware of the impact they have, but I know, with the evidence of reverted PRs, that when I step outside my area of deep knowledge I can no longer call BS on the agents. Our most capable agent, with access to the same kind of distributed systems the author talks about, is regularly wrong, frequently myopic, and just outright dumb constantly. It's the expertise of engineers on the team that push it back on track.
I'd posit there's another layer. You have domain knowledge, certainly. But more valuable still is the wisdom to find more.
Anthropic and OpenAI can stick financial regulations in the training data all they want, but the AI systems will never learn to anticipate the future, or reach out to clients, partners, or regulators in complicated situations.
> AI systems will never learn to anticipate the future
Citation needed. I don’t see any reason these systems shouldn’t be able to speculate; indeed some would say that’s all they do, even about the past.
I agree with this experience. LLMs are great and save me a lot of time, but they need frequent nudges to avoid going down a completely wrong path. I just don't feel like the management dream of "every engineer has 3 agents working for them full time" is quite a reality yet. I'm not saying it won't get there, or that I feel secure being a software engineer until I'm of retirement age, but I also think it's important to understand the limitation of the tools. You do need to know your codebase. You do need to iterate on small chunks of it at a time. You do need to carefully understand every line of code you're putting into production. LLMs are amazing at generating a lot of proposals, but you need to carefully consider each one.
Most surprising to me about the article was the desire for OP's company to use AI for design docs. I feel like AI-generated design docs are some of the worst -- basically treating English as a programming language. They aren't enjoyable to read, and they often miss the forest for the trees. A human written sketch explaining why we're here and what we're working towards is still meaningful and important. If you want code-level details of every decision and algorithm, we have code for that.
I have mixed feelings on whether these documents are useful LLM inputs. I did a project where I carefully paired with Claude Code on producing a specification that another model would actually implement. I'm not sure it saved me any time, and it was very un-fun. (I kind of blame Opus 4.7 xhigh for this. It ain't speedy.) I feel like I can nitpick code to get exactly what I want, but defining exactly what I want an auto-mode LLM to go and do, in English, is much more difficult. I don't think the PLAN.md I generated would have been useful for a human trying to understand the system (too verbose), and Claude Code still made its usual mistakes that I have reminded it a billion times not to make (t.Context() in tests, not context.Background()!), so I'm just not sure it was worth it. I would say I probably wouldn't do it again in the near future. A rough sketch to get humans on board and to get the high level details worked out, written by hand, and then pairing with the LLM on actually typing in the code seems the most productive to me. But I do try to go outside my comfort zone once in a while to test the edges of these tools. They are very impressive and are worth a lot of the hype. (I know I will never write a YAML file again. I hate it more than anything, and Claude is amazing at it. But I worry I wouldn't feel the same way if I hadn't already had 8 years of k8s experience.)
Yeah I'm constantly shocked at how simultaneously smart and dumb Opus can be. It can tell me a LOT about my codebase but it will miss very critical clarifications that I begin with. And when I call it out it obviously remembered it, it just ignored it.
> I pilot LLMs all day
Love the metaphor. Planes are sophisticated machines capable of auto-piloting, but humans are still needed to ultimately pilot the beast.
You pilot LLMs all day but that might not last.
A lot of companies are investing money in “AI factories” that are going to automate a lot of software development (that is, steer LLMs) on the basis of Jira tickets (or Linear/Trello cards or whatever).
a year ago I would have agreed, but the gap is getting smaller all the time... these things can do 90% of the work, and how many people does a company really need for the remaining 10%? certainly not as many as they needed before
The things can do 90% of the work ... but only if used by the right people.
I've seen first hand what less experienced developers produce using the same models, your 90% accuracy suddenly drops to 50%...
With Opus 4.8 we're frankly approaching 100% of the work, but only if tasked by the right people. A decade ago I worked as an enterprise architect and left it because I preferred coding. Now I'm an enterprise architect again, and we're at the point where I've set up a Microsoft Fabric and integrated an ADLS Gen2 with a Lakehouse, building Dimension and Fact tables for our Business Intelligence people with Cowork. A month ago I didn't know what Dimension and Fact tables were in a data warehouse, and now I've not only set up a flow for it, I've made it more accurate than what they had before because I understood how BC365 worked and the previous consultants didn't.
We had a PoC in place to get fabric, it had like 500 hours allocated for what I did in a week with cowork, and my product is actually on secure vnet network with Azure identity security with both a test and a production environment delivering actual data.
Cowork even made the damn powerpoint slideshows for decision makers.
The single saving grace right now is that it apparently isn't easy for everyone to do this yet. But I didn't use a whole lot of my knowledge of software engineering to make any of it happen, not even the pandas and arrow code that moves the data behind the scenes. I mainly used my knowledge of NIS2 compliance and general data architecture in a step-by-step process. To me anyone with common sense should be capable of doing this, and I really don't think I'm special... but then I teach other people AI at our company and they can barely get it to create a running program. Which is fine for now, but I have to work another 20ish years before I retire, and by then a lot of young people will have grown up with AI, and like I said, I'm not special. I think the only thing that differentiates me is that I mash the buttons until it works but also have decades of security and compliance hammered into me.
"I ended up working in software development roles in the domains of finance, bookkeeping and payment processing, where I had great autonomy and a close and candid relationship with Product Managers and stakeholders.
I learnt a lot about the domain and how to effectively write programs for it: PCI compliance, double-entry ledgers, escrows, reconciliation, payment lifecycles, bank transfer idempotency, etc.
It was, then, obvious that I should focus my career on becoming an expert on that domain to stand out as a professional and differentiate myself in a field that showed signs of an increasing need for domain specialists."
The backend is the bit that "does stuff" so it's the part that needs to be correct.
He said "Last year, I got hired by a company in the finance workspace.".
My career path is surprisingly similar to the author's. Weirdly enough, what he takes as the first pillar to fall is the one I see most undamaged currently.
LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations. They're great at refactoring, translating between languages, even tracing bugs in existing code, but there are always many things subtly wrong when iterating on and expanding our domain.
This might be because the companies I worked for happen to be tackling complex domains precisely for moat-building reasons. They stay in business explicitly because there's not a book out there you can read to build a clone, the knowhow stays inside.
Also, a fintech whose managers recommend speeding up design docs with AI sounds way too careless to be in the money handling business. It's way, way too easy to end up with millions incorrectly allocated, particularly if you deal with high volumes of small transactions. These bugs are always a bitch to deal with because correcting the logic is just step one, you then have to correct all the wrongly calculated data in immutable DBs, move around the red tape and client comms, and your fix is bound to become a gotcha that new features and observability have to take into account ("remember that there's a bump in the data in february 2 because we had incident X".)
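A rough sketch of why the data correction is the painful part: in an append-only ledger you can't edit the wrong rows, you can only post a reversal plus the right entries, and the bad rows stay in history forever. Everything here (`Entry`, `Ledger`, the signed-amount convention, the account names, the incident label) is a hypothetical toy model, not anyone's real ledger implementation.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account: str
    amount: Decimal  # signed amount; a balanced posting sums to zero (assumed convention)
    memo: str


class Ledger:
    """Append-only: entries are never edited or deleted."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def post(self, *entries: Entry) -> None:
        # Double-entry invariant: every posting must balance.
        if sum((e.amount for e in entries), Decimal("0")) != 0:
            raise ValueError("unbalanced posting")
        self.entries.extend(entries)

    def balance(self, account: str) -> Decimal:
        return sum((e.amount for e in self.entries if e.account == account), Decimal("0"))

    def correct(self, wrong: list[Entry], right: list[Entry], incident: str) -> None:
        # Undo the wrong entries with mirror-image rows, then post the right ones.
        reversal = [Entry(e.account, -e.amount, f"reversal ({incident})") for e in wrong]
        self.post(*reversal)
        self.post(*right)


ledger = Ledger()
# A buggy fee calculation charged 10.00 where 1.00 was due.
wrong = [Entry("customer", Decimal("-10.00"), "fee"), Entry("fees", Decimal("10.00"), "fee")]
ledger.post(*wrong)
right = [Entry("customer", Decimal("-1.00"), "fee"), Entry("fees", Decimal("1.00"), "fee")]
ledger.correct(wrong, right, incident="incident X")
```

The balances end up right, but the ledger now holds six rows for one fee, which is exactly the "bump in the data" that later features and dashboards have to know about.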
This. Once you're building something that genuinely hasn't been built before, LLMs cannot be trusted with any architectural decisions. I'm building a product based around various physics simulations, so it's purely first principles, but without active research, thinking, and challenging, it produces computational code literally hundreds of orders of magnitude slower WHILE implementing absurd fallbacks and shortcuts that effectively result in a useless calculation.
This is the case perhaps 95% of the time.
Oversight is very important, and architectural thinking cannot yet be outsourced, only execution.
I have had a similar experience when trying it too. I couldn't even drive Claude Opus 4.7 to get PETSc to compile properly (with all the optional dependencies).
> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.
This is domain expertise - software engineers are not needed for that. Of course, senior SWEs are often experts in it, but they aren't necessary.
Traditionally it's been useful for frictionless production for engineers to be able to do maybe 90% of their work without consulting the business experts, but this is the whole crux of the moment TFA discusses - "tradition" is over.
In this new world it's now the job of a senior engineer not to have this domain expertise themselves, but to know how to ensure the agents have it, or can acquire it, and that it is verifiably correct.
Senior engineers who hang on to the idea that their advanced business domain expertise makes them safe will soon be as dead in the water as juniors who haven't pivoted.
I can't even get Claude or GPT-5 to consistently produce good flows for common use cases, much less domain-specific shit. They have deep vocabulary though, which makes them sound better informed than they are.
They are very good at writing code and debugging visible errors, but that's like 50% the harness.
> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.
Would a skill which forces you and the LLM to reach a shared understanding of the product features and the regulations those features are supposed to capture be of help here? The main idea is we provide documents to the LLM and it asks a lot of questions which clear up ambiguity and possible misconceptions the LLM might have. I would suggest taking a look at skills. They are really helpful.
> The main idea is we provide documents to the LLM and it asks lot of questions which clear ambiguity and possible misconceptions the LLM might have
This kind of works, but the difficulty is that you have to be very explicit about everything. It was mentioned in a spec document that a particular Excel file is treated as a source of truth throughout the whole company and that it is treated as an append-only database. The agent still decided to add a check to see if a previous row was modified. It pushed back when asked why it had decided to do so: "What if someone entered it wrong and had to correct it?" Valid question, but it's not my team's responsibility to check for it.
This check makes sense from a traditional development viewpoint and that's why the agent did it. I would say it's good practice too, but it's beyond the scope of the project it was working on. If what you are doing is beyond the norm, you have to watch out for things like this.
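For readers wondering what such a check looks like, here is a rough sketch of the kind of "was a previous row modified?" guard an agent might add for an append-only table. It assumes rows are kept in order and fingerprinted on each run; `row_digest` and `find_modified_rows` are made-up names, not anything from the project described above.

```python
import hashlib


def row_digest(row):
    """Stable fingerprint of one row, given as a tuple of cell values."""
    joined = "\x1f".join(str(cell) for cell in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_modified_rows(previous_digests, current_rows):
    """Return indexes of previously seen rows that changed or disappeared.

    previous_digests: digests recorded on the last run, in row order.
    current_rows: rows as read now; anything past the old length is a new append.
    """
    modified = []
    for index, old_digest in enumerate(previous_digests):
        if index >= len(current_rows):
            modified.append(index)  # row was deleted
        elif row_digest(current_rows[index]) != old_digest:
            modified.append(index)  # row was edited in place
    return modified


rows_v1 = [("2024-01-01", "A", 10), ("2024-01-02", "B", 20)]
digests_v1 = [row_digest(r) for r in rows_v1]
rows_appended = rows_v1 + [("2024-01-03", "C", 30)]
rows_edited = [("2024-01-01", "A", 10), ("2024-01-02", "B", 25)]
```

Here `find_modified_rows(digests_v1, rows_appended)` finds nothing to report, while the edited version flags index 1. Whether a project should carry this guard at all is the scoping decision the agent could not make on its own.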
Sure but finding their shortcomings and patching them with skills takes real trial and error. They are incapable of identifying their own shortcomings for you.
>> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.
My company also deals with a lot of complex regulations and domain-specific system implementations, which AIs used to struggle with. We were able to solve the problem with well-organized claude.md/agents.md files. On top of that we also implemented supermemory.ai, so newly made decisions are always recalled by AI agents when starting new sessions.
I always remember the infamous Steve Jobs quote "Ideas are cheap". If execution is everything, and frontier LLMs solve execution, then ideas are the gateway to abundance now, but abundance alone does not guarantee "stickiness".
What I think is often overlooked is the human "Willingness" and "Care" of staying with the thing, for lack of a better term. What I mean by that is that a lot of people just don't care enough, or don't want to, build, maintain, and own things. Sure you can ship V1 faster, but will you remain on the grind?
I think a great example of what probably will happen is found in Suno, the AI Music thing. I don't know if y'all have tried it, but it now produces really good stuff. What's happening there? A lot of people play with their own little universe and get tired quickly, move away from it, and only a few prolific creators stay and turn it into a "job like" environment.
We may have shifted the scale and the economics of "delegation" and "execution" but I think there are still a lot of other factors to consider.
> Suno, the AI Music thing. I don't know if y'all have tried it, but it now produces really good stuff
I played with it a bit, and no, it doesn't! And I am talking as someone with limited music culture, musicians are likely to be even more critical.
For the first few tries, it sounds impressive and the tunes are catchy. It used to sound wrong in the background but they mostly (but not completely) fixed that. However, after a few dozen songs, it starts to always sound the same. It is all generic stuff, the songs tell no story, it is a bit like the kind of music that accompanies corporate advertisements. You can try to be more precise in your prompt, but I never had any success; it will just ignore most of the details that could make your song interesting.
The most interesting result I had was actually when I managed to get it off rails, a bug more or less. I asked it to mix two very different genres together, and it made something unsettling in a way I don't remember hearing before. But as always, further working on it proved extremely difficult, as it always tried to go back to making generic stuff, ignoring the details you give it.
Suno can do remixes though. And it is a bit like with code. LLMs are very good at porting: when you already have something that works, they can make it work in another language. But if you just have an idea, they will screw up anything original. If you want an LLM to implement your idea properly, you have to give it so much guidance that it amounts to writing the code yourself, while struggling with the ambiguity of natural languages.
re: SUNO
I was actually discussing that with a guy I met the other day, an old-school producer who did successful stuff 30 years ago. He used SUNO to reinterpret old ideas of his; in his judgement it did an excellent job and lets him create many songs daily if he wants.
Sounds familiar? The good old "let AI be steered by experienced X and boost productivity".
All in all, gun to the head, I think I am so critical because using these tools means surrendering to big corpos. It is not a democratic tool. If it were, I would probably be using it. I have finally given up and started messing with local models (well, I already did with images), but general local models are useless.
Or maybe it's me? I cannot for one moment let go and converse with the machine. I can give orders to the machine.
The tech is fantastic, but the fact that it's in the hands of corpos with every interest in never letting us do shit without them makes me one hundred and one percent against it.
I think this is a question of how much control the user is able to have over the end product. Music creation in particular is very difficult... I've produced music for 4-5 years, and the granularity with which one has to control the finest pieces is often mindblowingly frustrating. It takes years to develop a decent ear for mixing.
By giving up that control, you do get to a quality end result sooner, but that end result can only be an approximation to your original vision, since you're giving up the control required to shape the sound to that granular level.
Additionally, without the knowledge of how you got from A to B, you don't know what else is possible (or impossible.) In the process of doing something manually, you may stumble across a particular setting or effect that creates something you never even considered. And now, that is knowledge you can use on the next project.
Suno is completely incapable of producing heavy metal. I can't speak for other genres bc I don't listen to them, but what it produces is completely hollow and devoid of what makes metal metal. I also think most metal fans will categorically reject AI-made metal on principle.
Just verified: it can't make a decent techno track, nor a drone track, nor anything experimental. Its creativity is subpar; it feels like listening to a producer who knows where things go but is tired of playing, with zero interest in creating or performing. It gives off that kind of vibe.
I mean, even if it could produce generic metal, would it produce Igorrr? Meshuggah? Tim Henson? Babymetal? All of these are driven by other things than just producing metal. I agree pure AI music would probably be rejected unless there was some point to it. I could see it having some part, but then as a weird instrument. Take a model for music, randomly mutate internal weights and then let it produce a drum beat. Keep doing that until you hit some limit and perhaps that is interesting.
Metal, punk, hardcore - any type of heavy music, really, should reject AI-made slop. If you’re a fan and/or maker of them and are not just wearing the genres as an aesthetic, you fully know they are a rejection of corporate and governmental control.
Yeah I have played with Suno a lot and I find that no matter how I change the genre, lyrics, etc. there's some underlying quality I can't quite name that my brain recognizes and quickly gets tired of. It's fun in a novelty sense, for now.
> But as always, further working on it proved extremely difficult, as it always tried to go back to making generic stuff, ignoring the details you give it.
It's like any LLM, it's not a tool for if you know exactly what you want with all these knobs and fine grained controls.
> The most interesting result I had was actually when I managed to get it off rails, a bug more or less. I asked it to mix two very different genres together, and it made something unsettling in a way I don't remember hearing before.
I don't think that's a bug or unexpected, it's what AI is good for. I do these (very) old Blues covers of modern songs and it's terrific at that sort of conversion thing.
In 2024 some people were saying, illustrators will be fine, the models can't even get the number of fingers right! They were wrong.
> If execution is everything, and frontier LLMs solve execution, then ideas are the gateway to abundance now, but abundance alone does not guarantee "stickiness".
They don't "solve" execution.
If you're willing to push them enough, and put in place the system that they can actually get working code, they can solve execution - but that IS engineering!!
They are far from doing that by default now (replacing engineering).
Maybe in 3 years. They're moving fast.
But you can't ask them to build you a better Rust compiler, sit back and watch, and get a result today.
Totally, I meant that more through the lens of how folks are perceiving it. They solve the execution part of the "one shot" aspect mentioned in the post. You still need to do a lot of plumbing, orchestration, supervision, etc. I think it will get cheaper and cheaper over time, though not magical enough to one shot a Rust compiler from "write a Rust compiler make no mistakes" haha.
Today is when ground needs to be broken on the data centers to run it in 3 years.
Suno is a good example. I've written lyrics for a lot of songs and then "produced" them with Suno, a process that involves dozens to hundreds of remix/cover/extend revisions or a lot of time in their editor to get it sounding the way I want it to. The songs are songs that I like and will listen to in my playlist but they haven't gotten much traction on Suno's algorithm. I haven't tried to promote them much elsewhere either but when I have posted them they get a few likes at best. I'm not disappointed because I was creating the music for myself and just sharing it as a side effect but what I take away from this is that getting people to pay attention to and enjoy something that you've created takes a lot of work. You have to market it, get it in front of them, get them to pay attention to it and I'm convinced you also need to give them a reason to like it by associating it with something whether that's a video, a story, a persona or some other vibe. If you want it to "stick" you need to do all of that over and over again for the same audience so that they learn it.
That is what takes determination and why you have to really care about the thing you are trying to sell to people. You have to stick to it before they will stick to it.
Same here, I vibe coded my perfect alarm & reminders & productivity app for Android, (Promptly AI link below) that does TTS and Gemini calls and other things that rapacious alarm-clock marketing masters charge dozens of bucks per month for, but at some point the day job and dislike of the marketing grind is just too much, summer is here and yeah...
https://play.google.com/store/apps/details?id=com.sixteenam....
> I always remember of the infamous Steve Jobs quote "Ideas are cheap". If execution is everything, and frontier LLMs solve execution, then ideas are the gateway to abundance now, but abundance alone does not guarantee "stickiness".
https://x.com/chamath/status/2033385903520129161
> I think a great example of what probably will happen is found in Suno, the AI Music thing. I don't know if y'all have tried it, but it now produces really good stuff. What's happening there? A lot of people play with their own little universe and get tired quickly, move away from it, and only a few prolific creators stay and turn it into a "job like" environment.
https://en.wikipedia.org/wiki/Sturgeon%27s_law
Sturgeon's law states, "Ninety percent of everything is crap". The adage was coined by American science fiction author and critic Theodore Sturgeon while defending the merits of the genre. Sturgeon observed that most works in any field were low quality. Therefore, science fiction was not uniquely inferior.
Could you elaborate on the AI Music tool? My impression was that it's used as a one-shot generation tool. I don't know much about music but I imagine artists need intermediary steps, track separation, instrument customization and other stuff I'm oblivious about. Without these, it's hard for me to imagine it being used for professional work.
The frontier music models, the paid/pro Opus 4.8 equivalent ones, are more capable now, and Suno has a "harness" like Claude Code on their Studio tool that lets you iterate on the generation by doing stem splitting, track separation, edits that stay within the tempo, rhythmic structure, etc.
I guess we have very different ideas around what makes good music. Every single Suno produced song sounds like a 60kbps extremely compressed mp3 while also having extremely generic, uninspiring structures and complete lack of interesting sonic/instrumental layers.
It's great that people find joy in it, but as someone that is critical of both music production and fidelity, the current offerings fall incredibly short of anything I would ever want to listen to.
Suno produces plausible cheesy stuff that is otherwise sonically awful, ringing across the full spectrum due to how it works. As a musician I would not use it - I like to keep some creative power. Some people around me use it for samples… and then their tracks ring. But it works for them as they are advertising producers. Mind you - I’ve used the paid version and I know a thing or two about music production.
As an information architect I find it amazing that it works so well, but it is useless to me except as a great thing to play with… a toy really. I’m much more fascinated by Strudel.cc, and LLMs do a great job of teaching me about it, myself being mostly an autodidact.
As a dev I struggle to maintain coherence with Claude Code even though I’ve piped more than 10b tokens since Jan. Certain trivial stuff is easily remedied, but even more of the devil lives in the abundance of details now. So the task moves one level up in terms of abstraction, but is not solved.
If guys were good at typing one and the same thing in one and the same lang, which there is nothing wrong with given how crafts have gone for ages, then they will struggle to compete with the GPTs. But if they work from the architectural and operational perspective … well - work and demand just increased, so please stop whining.
> I don't know if y'all have tried it, but it now produces really good stuff.
Does it? It produces passable stuff that is fine. However the lack of passion and care completely disinterests me.
Code has never been the "execution" in the "ideas are cheap" mantra.
What is valuable is the whole business chain that delivers value to the end user.
> I don't know if y'all have tried it
No. I assumed that at best it will be no better than average human-made music available to listeners.
> but it now produces really good stuff.
Does it? Do you have examples?
(note: I actually do not care about all "hand-made" and have no preference for once-off over serially made products)
Suno produces 7m "professional" songs per day. Can't think of a better example of a slop generator. Many songs that will never get more than a handful of listens, if at all.
True of human-made things as well. Most video essays don't get more than a dozen views, most gameplay streams similarly. People playing their guitar and uploading, same. SoundCloud, YouTube, Twitch. Human-made app store apps are the same story. Most are not downloaded by even 100 people. Most GitHub repos don't even get a handful of stars.
LLMs don’t “solve” execution at all. They aid and accelerate it.
Don't kid yourself.
The high watermark of what can be "solved" (read: one shotted) is rising, and will continue to rise. Look at the gig economy (Fiverr etc) for simple programming/design tasks: LLMs have taken over completely with their execution.
Agents are getting good, but professing they are surpassing you in domain and architectural knowledge with no special prompting is basically self reporting at this point. It could be that your job wasn't that complicated or your personal knowledge wasn't that strong; either way, same result.
Don't get me wrong, I am sure we will get to all three of these pillars, probably by next year. I am not naive.
I've posted this before but worth posting again:
I work in DevOps at a firm that has been very enthusiastic about using LLMs (in the good sense).
The phases were basically:
- try out having the LLM do "a lot"
- now even more
- now run multiple agents
- back to single agents but have the agents build tools
- tools that are deterministic AND usable by both the humans (EDIT: and the LLMs)
The reasons:
1. Deterministic tools (for both deployments and testing) get you a binary answer and it's repeatable
2. In the event of an outage, you can always fall back to the tool that a human can run
3. It's faster. A quick script can run in <30 seconds but "confabulating" always seemed to take 2-3 minutes.
Really, we are back to this article: https://spawn-queue.acm.org/doi/10.1145/3194653.3197520 aka "make a list of tasks, write scripts for each task, combine the scripts into functions, functions become a system"
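A minimal sketch of what "deterministic tool with a binary answer" can mean in practice, assuming each check is a plain function returning pass/fail plus a detail string. The check itself (`check_config_has_keys`) and the config keys are invented for illustration; the point is only that a human and an agent run the same thing and get the same answer.

```python
def check_config_has_keys(config, required):
    """One small task as one small function: pass/fail plus a reason."""
    missing = [key for key in required if key not in config]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "all required keys present"


def run_checks(checks):
    """Run every (name, zero-argument check) pair; return (all_passed, report lines)."""
    all_ok = True
    lines = []
    for name, check in checks:
        ok, detail = check()
        all_ok = all_ok and ok
        lines.append(f"{'PASS' if ok else 'FAIL'} {name}: {detail}")
    return all_ok, lines


def main(config):
    """Return a shell-style exit code: 0 if everything passed, 1 otherwise."""
    checks = [
        ("deploy config", lambda: check_config_has_keys(config, ["region", "replicas"])),
    ]
    ok, lines = run_checks(checks)
    print("\n".join(lines))
    return 0 if ok else 1


good_exit = main({"region": "eu-west-1", "replicas": 3})
bad_exit = main({"region": "eu-west-1"})
```

Wired up as a script, the return value would go to `sys.exit`, so the same command works in CI, in an incident runbook, or as a tool an agent calls.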
-- END of original post --
What I would add:
If you let LLMs do whatever they want, they will happily make code. You can add tests to confirm that the code works (which you used to do with human code, right?). You can also read the code.
When you read the code, you'll find that they sometimes do totally bananas things that still produce working code (I've seen humans do this too but that's another story).
In other words, you still need to make sure the system being built makes sense.
More succinctly:
Coding may be dead but software engineering is alive and kicking.
Domain knowledge and architectural skills are not gone. I can say even Opus 4.7 and GPT 5.5 get domain-specific stuff wrong. I use both, because when I am not sure I ask both and also check with Gemini. But these days, I ask those even when I am sure - it's like I get something confirmed from a peer. And yes, you have to be the gatekeeper - the speed breaker in a way - LLMs still lack a lot of context. And even if they get more context, they will end up costing a lot and still have no accountability. In accounting, one wrong entry and the whole system can be seen as "unreliable" - that's why you are needed. The interesting part is "who takes over" - accountants who become coders, or coders who become accountants. And the latter looks more likely, in any profession. And when that happens - the bar will be raised in these other white-collar professions too, just like what is happening in tech.
Opus is getting good at architecture - I need fewer "pushbacks" either because I have learnt to say the right thing or it has learnt to do the right thing - I do not know which one.
> I don't know what to do.
Ride the wave. You rode it when websites/webapps were the wave. I came into the software industry before the internet and kept changing my horse. You are never too old to learn new tricks. The new wave creates new kinds of work and workers. Be one of them. Ride the beast, master the tools. It's the same game again.
This here.
Overall society feels more turbulent, but this is otherwise all the same song and dance all over again.
The 90s and 00s had this wave of "object oriented programming changes everything". Hey we're doing this thing that's been done successfully 100s of times before, but now it's OO. Writing some code involving an airplane? Just purchase this omni-airplane object that does everything for airplanes (an actual thing I was told in college).
That's weird, OO isn't the be-all and end-all? Code gen, get this Ruby on Rails running. Look at me building this website in two seconds. Code gen everywhere.
Huh, that's going to a funny place... TDD. If you aren't TDDing then you're such a bad engineer that you should be locked in prison (real conversation I observed). Oh wait, not TDD, BDD. That fixes it.
Lean, no Agile, no agile like with a small a ... but it was first, no Scrum, no XML wait that was last decade, JSON, and finally SAFe.
Hey, have you seen this chat bot thingy?
Every iteration brings good stuff if you're paying attention. But it also brings a lot of hype and anxiety. Experiment and learn.
The one thing that's remained constant for me is that nearly everyone would rather die than think carefully about the consequences of their dreams coming true. And as long as that remains true they'll continue to pay for someone else to ride the hype dragon on their behalf.
> Overall society feels more turbulent, but this is otherwise all the same song and dance all over again.
The thing is... everything you mentioned had only brought the need to retrain.
This new hotness AI? It's bringing actual layoffs, and not just of the boom bust cycle kind, but permanent, industrial-revolution kind that lasts for decades.
It is?
Covid overhiring, no more 0% interest rates, that one accounting change, and companies needing a "growth" sounding way to announce layoffs. Maybe that's bringing actual layoffs in the name of AI?
> The company is now hiring again for a few roles and domain familiarity is not a strong differentiator anymore. We used to list "Software Engineer - Area". Now it's just "Software Engineer" and the team assignment comes after the offer is accepted.
> Of course, this is good for brilliant engineers that never had the chance to get deep into the domain and now have better chances at getting a job, but it's also sad to think that other brilliant engineers that spent their lives collecting domain knowledge are now competing on the same lane.
If the author's vision of the future is correct, then competent software engineers are safe. Domain knowledge can be learnt much quicker than how to apply good engineering principles.
Engineers whose main competitive advantage is domain knowledge are probably not that brilliant at engineering. They might still find employment in other areas of the industry where they accumulated domain knowledge.
> Domain knowledge can be learnt much quicker than how to apply good engineering principles.
There was an entire thread a week ago about how domain expertise has always been the real moat: https://news.ycombinator.com/item?id=48340411
And I'd still question it. The experience of just… knowing what a good architecture looks like without being able to really put it in words is what makes a good engineer to me. These people can pick up relevant regulations or industry terms and deliver value quickly enough.
> If the author's vision of the future is correct, then competent software engineers are safe. Domain knowledge can be learnt much quicker than how to apply good engineering principles.
I think this is true in some things and less true in others.
It's a pretty high moat getting into stuff like simulation software because the people working on numerical methods overwhelmingly have PhDs and it's a mixed skill set. Domain expertise here requires you to know maths to a high level. Even mechanical engineers often struggle here; it's often applied mathematicians and physicists turned devs that work on this stuff.
I worked on a fairly gnarly signal processing thing a while back that required bringing together knowledge of physics and software and maths and I found explaining it to people was tricky as their eyes glazed over at some point because their knowledge typically only covered one part of those.
How is "without being able to really put it in words" a mark of experience? Surely an engineer should be able to justify why an architecture should be arranged the way it is!
There are plenty of deeply skilled, experienced people (in all fields, not just ours) who struggle to explain that knowledge to others. Being a practitioner and being a teacher aren't the same skill.
It's perfectly possible to put that sort of knowledge into words, but not in a condensed "recipe" that can be explained in a meeting, that will go into a single Hacker News comment, that will cover all cases, or that will satisfy LLM users looking for the easy way out.
Pretty much every area of knowledge is full of those. That's why people publish books, that's why people go to college or get PhDs, that's why people with experience get hired.
You're not wrong that a rationale is required.
But the master knowing when to break the rules because of tacit knowledge, without being able to explain it, is a real effect.
>>"Domain knowledge can be learnt much quicker than how to apply good engineering principles."
I'm not sure that's universally true. Good software engineers who are arrogant about easily acquired domain knowledge have been the downfall of many an ERP system.
There's SO much IT that's literally all about putting business rules into the system.
> Good software engineers who are arrogant about easily acquired domain knowledge
This is a problem of arrogance, not of domain expertise.
Having worked in a few different industries, I'd wager that for the vast majority of them, a competent person can probably learn 80% of the required domain knowledge in under 6 months. For the remaining 20%, as long as the person is not arrogant, they will seek help from colleagues who have been around for longer.
On the other hand, solid engineering principles will take 10-15 years of actually experimenting and learning in practice what makes a system resilient and durable.
> Domain knowledge can be learnt much quicker than how to apply good engineering principles.
Partially disagree. Broad-strokes domain knowledge can be learned quickly, but honing that domain knowledge with nuance and consideration for complexity, particularly for organisations that are unique and are not often thought of as 'software development houses', can take years if not decades.
Yet I still see (and code review) 'professional' software developers that don't follow good software engineering practice.
> Engineers whose main competitive advantage is domain knowledge are probably not that brilliant at engineering.
The same is also true of engineers without domain knowledge, certainly in my experience. Maybe we just got unlucky...
>Domain knowledge can be learnt much quicker than how to apply good engineering principles.
Can it? I'm of the opposite opinion. You can improve methodology much faster than gaining specialized knowledge.
You can enforce and fast-track the former because it's a matter of approach.
The latter is subject to the person's learning affinity, capacity and availability at the time and can't be forced beyond reasonable facilitation. It also builds on itself, with the corollary that there's a much steeper curve early on.
The development and acquisition of valuable domain knowledge is a hard, risky, expensive and slow process. Because the valuable domain knowledge isn't yesterday's, it's today's and tomorrow's. In fields where domain knowledge matters, it is also deeply intertwined with engineering - you won't task Jeff Dean to develop Unreal Engine from scratch.
With that said, there are still many SWE principles that are not fully internalized or adequately practiced by domain knowledge experts, and that will remain the case as long as domain knowledge remains valuable, because software engineering is yet another domain.
This same complaint comes up on the topic of generic coding interviews, although shadowed behind the bigger complaints about simply disliking them. When people develop domain expertise they want to use that as a moat around their job. They want interviews to focus on stories about the things they’ve been exposed to on their past jobs, not test their abilities.
If you’ve been lucky enough to get jobs that expose you to the right things then you have a big advantage when the interviewers are looking for those specific things instead of your generic abilities or potential. It feels nice because you’re competing against a much smaller pool of people.
Unless you are not lucky enough to have been exposed to those specific domains yet. You can be a great engineer and even someone who learns quickly, but if you can’t point to the lines on your resume that match the job description then nothing else matters when the interviewers are playing experience bingo with your resume.
The move to generic coding interviews changed that. It was no longer enough to say that you had exposure to a topic at a past job. You had to show your coding skills, too. It wasn’t enough to ride on your credentials any more, which was highly frustrating to the well-credentialed.
However if you didn’t have the exact experience then the world of job opportunities becomes much larger. The people I know who like coding interviews the most (other than the rare competitive programming enjoyer) are people who are highly talented but came from less credentialed backgrounds: They don’t have an amazing university on their resume, they had to work at some company you’ve never heard of in their small town, but they are great at programming and just want a chance to prove that so they can move up to better companies. They’re never going to be picked by a company that’s looking for exact domain experience, but as companies open up job listings to people without that exact experience they have a chance to prove themselves.
The other people who relied on that domain experience to lock other candidates out of the hiring process don’t like it at all, though.
How is software engineering not a domain? If other domains can be easily learnt, surely this one can too.
> Domain knowledge can be learnt much quicker than how to apply good engineering principles.
What kind of domains did you have in mind?
That's the right question. I don't like this dichotomy between domain and engineering. It seems to come from people who just build different CRUD apps for mobile and websites for businesses in different industries and that's what they call domain.
Not like a webdev entering game engine design or a database engineer entering computer vision research, or someone working in embedded hard-realtime systems switching to making video editing GUIs.
That's a fair question. I suspect highly specialised industries are harder (rocket and space, defense, nuclear, etc), but for things like finance (most of it, anyway) and retail, which IME make the bulk of the tech jobs out there, it's certainly nothing out of this world.
That's an extraordinarily rosy view of the future.
I'm old enough to remember the dot-com crash, specifically the years afterwards. In 2002-2003, the unemployment rate of software engineers was something like 40%. In fact, the only reason it wasn't higher was because of the number of people who had permanently left the field to become plumbers (or other trades).
I think this is going to be worse. In the dot-com crash, what really happened is that non-businesses got funded and then the capital markets basically ceased to function to a large degree. That's not what's happening now. Yes, huge amounts of money are going into AI companies but the change is more structural.
Other industries have gone through this. In the 1980s a bunch of industries were intentionally destroyed or offshored in areas that have never recovered. This has continuing social, economic and political impacts. I think people are being naive here thinking this can't or won't happen in tech.
> I think people are being naive here thinking this can't or won't happen in tech.
What would this future look like? Software developer salaries burrowing into the ground?
There's quite a large cushion for software salaries to decline before permanent structural unemployment were to set in.
It's not really feasible for "normal" businesses to hire developers at current salaries.
Tech companies will probably shrink in headcount, but all the non-tech kind of businesses can increase developer headcount.
Current tech salaries are far above other fields while requiring (or at least they used to) significantly less training or time investment to get into.
Phase 1 is more likely that software comp will normalize with other professions, and more hiring will happen at the fringes rather than being concentrated in a few big companies.
That isn't going to happen. To me what is going on is that no one really reads anything positive, so there is every incentive to write as hyperbolically and negatively as possible to try to rise through the noise.
The reality is this is all the standard lump of labor fallacy. I am not a software engineer but it is obvious to me at some point I will be using claude code or whatever to automate tasks. I won't be taking software engineering jobs, I will be using code to do what is done manually today that you wouldn't bother paying a software engineer to handle.
Today's software engineers will just be higher up the stack from me the same way they are today.
In 20 years, many of us will be working in sectors of the economy that don't exist today.
The idea we get something as powerful as AI and it doesn't create new businesses and sectors is just stupid.
Imagine telling someone in 1997 they are going to be getting deliveries from Amazon all the time in the mail. What kind of idiot would believe this? I don't even read that many books!
There will be a handful of people who make stratospheric compensation, a bit like we have now.
Everyone else will have extreme job uncertainty, getting laid off multiple times, losing compensation as a result (ie equity vesting) with compensation that at first stagnates and then starts to slowly decline in real terms.
A lot of the big tech companies will likely spend less effort on non-core activities. Think of all the things Google does. Anything that's purely internal will be gutted staffing-wise because it's the safest testbed for shifting the engineer-AI balance on teams before rolling it out further.
If you listen to non-tech people now you hear tales of applying for hundreds of jobs and getting no response. That will become more normal. What's worse is that AI seems to be to blame here. Companies all use the same AI ATS systems and I've seen allegations that candidate scoring gets cached for upwards of a year. So if the system happens to give you a bad score, literally nobody will see your application because you'll get filtered out before any human sees you.
I was watching a VC give a talk from some conference in France and the general sentiment is that no companies are being funded with teams greater than 5. Why? AI. So don't think you can startup your way out of this slump unless you're somebody who has the connections and CV to get funded anyway, in which case you might well have some of those stratospheric options anyway, at least for now.
This is exactly opposite my experience in my 25 year career.
The best people I've worked with were the people who learned the ins and outs of the business they were making software for, not the people who learned how to write code really well or read logs or learn software architecture patterns. Those people (and I've been one of those people) often go around looking for nails for their hammers rather than really focusing on the customer need.
It takes a really sharp brain to pick up and learn an area of expertise that has nothing to do with software development, and figure out how software development makes that domain better.
Only if the domain is shallow and mostly digital.
Applied to real-world complex businesses, good luck.
> Maybe I should consider transforming my woodworking hobby into a profession...
Whatever your feelings on the future of the industry are, it's hard to imagine you'll find more professional success in artisan woodworking than artisan software.
Custom furniture/cabinetry is already a pretty tough market, and woodworking is such a common programmer hobby that if a significant chunk of us decided to make a go of it the market would get heavily oversupplied pretty fast :).
I’ve had people tell me I should try selling some of the furniture I make and my response is always that I made the mistake of turning a hobby into a career once, I don’t intend to make that mistake again, and at least software still pays pretty well.
I'm threading this now and have paired AI-assisted development with woodworking knowledge. Partially chose to work on this because I wanted to build in a domain that the models might have a tougher time understanding.
Parallels and interests overlap everywhere between programming and woodworking: decisions about tooling, tolerances, sequencing, and what can be easily fixed later.
The models get rectangles pretty well and it has been fun exploring a parametric casework planner for my own shop.
Depends what you mean by woodworking
I work with a guy who does decking (gardens, caravans, etc) and builds sheds, fences, things like that and he does very well indeed (he's also incredibly good at it to be fair)
Most people would just call that construction
Construction working entirely with wood
If only there was another word for that...?
Wood construction is not typically considered woodworking, although there is often a lot of overlap. But the skills needed to make furniture are pretty different from the skills needed to make decks, fences, etc.
Carpentry is not the same thing as woodworking, to be fair. The latter has the connotation of making furniture, trim, and other such items that people want to look nice. Carpentry does not necessarily have that connotation. It's a kind of "all squares are rectangles, but not all rectangles are squares" situation.
I have a historic house with a hand-carved, uniquely shaped door. The jamb rotted and we paid a woodworker $4k to create a replacement. The door itself would easily cost $25k to replace. So, move to a major historic area with hand-carved doors and you could make some decent money.
You assume there are enough of these jobs to go around and that you can just show up and do some extremely intricate work. Repairing historic doors and more elaborate woodworking isn’t easy to learn, as the knowledge mostly doesn’t exist online anywhere. I also own a historic house and often ask the top-tier LLMs for details, e.g. about my staircase; they always give wrong answers, as this knowledge is simply too exotic and not in their training set. And no one online talks about these things: 99% of woodworking videos on YouTube are focused on beginners, and you can’t replace a professional education by watching videos and reading books. That will protect woodworkers with these skills, of course, but it’s wrong to assume you can just break into this market and be successful. Most devs with woodworking hobbies are really shit at their craft and struggle to create even a regular elaborate cabinet; no way they will be able to compete with good craftsmen for these few lucrative projects.
look at layoffs.fyi. chances are he will be laid off pretty soon. and if not tomorrow, give it a couple extra years until AI gets even better. it is a one-way road, down the hill.
not woodworking. farming. get a plot of land and grow your own food. do not participate in the economy at all. that's the only survival.
My comment was about the fact that even if you're laid off, you're more likely to find success in artisanal software than artisanal woodworking. That statement is not an assertion that you're guaranteed success, just that software is more likely to sustain you than woodworking is.
Layoffs also don't really tell you anything. Is it actually LLMs that are causing layoffs or is it deteriorating economic conditions and uncertainty amidst war, oil shocks, etc.? Is it junior employees being laid off, or seniors? If it's the former, someone with 10+ years of professional experience might not have reason to be concerned. I happen to believe that, LLMs or not, the software development field already had far too many jobs, employing a large number of clueless people who contributed somewhere between zero and negative value to their organizations, and that it was overdue for a correction anyways.
true, agree on the latter point.
but for "woodwork" / personal-farm I still believe he is better off than in software. at least he will be employed and have food on the table.
You are not allowed to have land without participating in the economy. The government forces you to acquire land by buying it, and to pay taxes in dollars.
I mean you can sell surplus at market. but the key point is you do not pay taxes on food you grow in your backyard and eat yourself! nor are you subject to any market collapse. as long as the sun shines and the rain pours, your food grows in your backyard, no matter the S&P or inflation rates.
> get a plot of land and grow your own food
Rejecting industrialized society is actually very expensive
"industrialized society" just rejected 160,000 software engineers this year. other industries are no better. you are either a wage-slave barely making it, or getting laid off, as those people are not needed.
There were layoffs yes. The solution is to pursue something that on a small scale is net negative financially? Get out of here.
Artisanal food for hipsters is always going to be a market. People are willing to pay a premium for locally or regionally grown produce, fruit, eggs and meat.
However, it's a risky business so I'd only recommend getting started if you either (!) are FIRE already even after sinking 3 million bucks into purchasing land and machinery as well as constructing all the buildings, or if you join a cooperative/union, or if you've got experienced farmers in your family.
Everything else - especially following "prepper" influencers shilling books and holding more public speeches to shill for said books than they are actually working on their farm - is a recipe for certain disaster.
If in doubt... first try raising a few dozen chickens in your yard as a starting point.
> Whatever your feelings on the future of the industry are, it's hard to imagine you'll find more professional success in artisan woodworking than artisan software.
A small percentage of the market, maybe a fraction of a percent, are still willing to pay for hand-built goods - bonus if it's thoroughly modern but retro (steam-punk keyboards, maybe).
Exactly zero percent of the market is willing to pay for hand-built software.
> Exactly zero percent of the market is willing to pay for hand-built software.
You took this statistic out of your rear end?
It is fairly obvious that the majority of people who buy software (>99%) don't really care how it's built. They care a lot about the outcome of using it, they care a little bit about whether there are bugs or not, and they care about the cost a lot, but beyond that nothing seems to matter to the purchaser. Even obvious things like whether or not there are tests, documentation, SLAs for fixes, or backwards compatibility between versions don't really seem to matter much.
That doesn't mean you couldn't carve out a niche providing hand-built software to people it does matter to, because the software industry is large, but saying 'zero percent of the market is willing to pay for it' isn't really wrong. The part that does care is just a rounding error.
(One massive caveat though ... the argument assumes that 'hand built' means 'higher quality than AI-assisted', and that's probably not true for >99% of developers.)
> You took this statistic out of your rear end?
We are less than a year into good-enough coding agents, and as of right now there is not a single job opening I see that offers a salary for non-AI output.
[flagged]
That odor you are smelling is entirely generated on your end.
you are saying this based on your own experience but YMMV, it is not universally true, especially not in developing countries
> you are saying this based on your own experience but YMMV, it is not universally true, especially not in developing countries
My experience of job postings advertised is exactly the same as everyone else's for the same filters.
This is not a "my personal feeling is that...", this is "I can't find an advertisement, posting or role that doesn't demand, instruct or promise that the successful candidate would be working closely with AI".
We're less than a year in, and I do not see dev jobs advertised on (for example) indeed.com with any sort of criteria omitting AI.
Imagine what it would look like in 5 years.
I have never used indeed.com before, but I just took a look and the very first software engineer posting I looked at doesn't make a single mention of AI. You have a penchant for making easily falsifiable assertions.