2025: The Year in LLMs

250 points by simonw 5 hours ago

This is a good tooling survey of the past year. I have been watching it as a developer re-entering the job market. The job descriptions closely parallel the timeline used in the post. That's bizarre to me because these approaches are changing so fast. I see jobs for "Skill and Langchain experts with production-grade 0>1 experience. Former founders preferred". That is an expertise that is just a few months old and startups are trying to build whole teams overnight with it. I'm sure January and February will have job postings for whatever gets released that week. It's all so many sand castles.

waldrews - 4 hours ago

Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...

throwup238 - 4 hours ago

> they voted to add some syntactic sugar to Java...
I remember when we just wanted to rewrite everything in Rust.
Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.
- OGEnthusiast - 3 hours ago
  
  Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.
  - SauntSolaire - 2 hours ago
    
    Instead they were building crypto mining warehouses in rural America and propping up artificial currencies like BTC.
  - zahlman - 2 hours ago
    
    Speaking of which, we never found out the details (strike price/expiration) of Michael Burry's puts, did we? It seems he could have made bank if he'd waited one more month...
    
    kamranjon - an hour ago
    
    I think they expire in March 2026 if the NVIDIA stock drops to $140 a share? Something close to that I think.
  - quaintpartridge - 2 hours ago
    
    They were, just not as many. https://www.wired.com/story/the-worlds-biggest-bitcoin-mine-...

AndyNemmity - 4 hours ago

These are excellent every year, thank you for all the wonderful work you do.

tkgally - 3 hours ago

Same here. Simon is one of the main reasons I’ve been able to (sort of) keep up with developments in AI.
I look forward to learning from his blog posts and HN comments in the year ahead, too.
- password4321 - 4 minutes ago
  
  Don't forget you can pay Simon to keep up with less!
  > At the end of every month I send out a much shorter newsletter to anyone who sponsors me for $10 or more on GitHub
  https://simonwillison.net/about/#monthly

websiteapi - 3 hours ago

I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.

ori_b - 3 hours ago

My prediction: If we can successfully get rid of most software engineers, we can get rid of most knowledge work. Given the state of robotics, manual labor is likely to outlive intellectual labor.
- beardedwizard - 2 hours ago
  
  "Given the state of robotics" reminds me a lot of what was said about llms and image/video models over the past 3 years. Considering how much llms improved, how long can robotics be in this state?
  I have to think 3 years from now we will be having the same conversation about robots doing real physical labor.
  "This is the worst they will ever be" feels more apt.
  - chii - an hour ago
    
    but robotics had the means to do majority of the physical labour already - it's just not worth the money to replace humans, as human labour is cheap (and flexible - more than robots).
    With knowledge work being less high-paying, physical labour supply should increase as well, which drops their price. This means it's actually less likely that the advent of LLM will make physical labour more automated.
  - Davidzheng - an hour ago
    
    Robotics is coming FAST. Faster than LLM progress in my opinion.
    
    wh0knows - an hour ago
    
    Curious if you have any links about the rapid progression of robotics (as someone who is not educated on the topic).
    It was my feeling with robotics that the more challenging aspect will be making them economically viable rather than simply the challenge of the task itself.
- BobbyJo - an hour ago
  
  I would have agreed with this a few months ago, but something I've learned is that the ability to verify an LLM's output is paramount to its value. In software, you can review its output and add tests, on top of other adversarial techniques, to verify the output immediately after generation.
  With most other knowledge work, I don't think that is the case. Maybe actuarial or accounting work, but most knowledge work exists at a cross section of function and taste, and the latter isn't an automatically verifiable output.
  - throw1235435 - 25 minutes ago
    
    I also believe this - I think it will probably just disrupt software engineering and any other digital medium with mass internet publication (i.e. things RLVR can use). For the short term future it seems to need a lot of data to train on, and no other profession has posted the same amount of verifiable material. The open source altruism has disrupted the profession in the end; just not in the way people first predicted. I don't think it will disrupt most knowledge work for a number of reasons. Most knowledge professions have "credentials" (i.e. gatekeeping) and they can see what is happening to SWEs and are acting accordingly. I'm hearing it firsthand at least locally in things like law, even accounting, etc. Society will ironically respect these professions more for doing so.
    Any data, verifiability, rules of thumb, tests, etc are being kept secret. You pay for the result, but don't know the means.
simonw - 3 hours ago

I nearly added a section about that. I wanted to contrast the thing where many companies are reducing junior engineering hires with the thing where Cloudflare and Shopify are hiring 1,000+ interns. I ran out of time and hadn't figured out a good way to frame it though so I dropped it.

the_mitsuhiko - 3 hours ago

> The (only?) year of MCP

I like to believe, but MCP is quickly turning into an enterprise thing so I think it will stick around for good.

simonw - 3 hours ago

I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.
- Alex-Programs - 2 hours ago
  
  I never quite got what was so "hot" about it. There seems to be an entire parallel ecosystem of corporates that are just begging to turn AI into PowerPoint slides so that they can mould it into a shape that's familiar.

npalli - 3 hours ago

Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?

simonw - 3 hours ago

Given how badly my 2025 predictions aged I'm probably going to sit that one out! https://simonwillison.net/2025/Jan/10/ai-predictions/
- zahlman - 2 hours ago
  
  Making predictions is useful even when they turn out very wrong. Consider also giving confidence levels, so that you can calibrate going forward.
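  A minimal sketch of what that calibration could look like, assuming each prediction is written down with a probability and later marked true or false. The Brier score used here is one common scoring rule among several; the forecasts below are made up for illustration.

```python
# Hypothetical forecasts: (stated confidence that the prediction comes true, whether it did).
forecasts = [
    (0.9, True),
    (0.7, False),
    (0.6, True),
    (0.2, False),
]


def brier_score(forecasts):
    """Mean squared gap between stated confidence and outcome (0.0 is perfect, lower is better)."""
    return sum((p - float(hit)) ** 2 for p, hit in forecasts) / len(forecasts)


score = brier_score(forecasts)
print(f"Brier score: {score:.3f}")
```

  Always answering 50% scores 0.25 under this rule, so anything consistently below that suggests the stated confidence levels carry some information.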
- DANmode - 2 hours ago
  
  Don’t be a bad sport, now!!

didip - 2 hours ago

Indeed. I don't understand why Hacker News is so dismissive about the coming of LLMs, maybe HN readers are going through 5 stages of grief?

But LLM is certainly a game changer, I can see it delivering impact bigger than the internet itself. Both require a lot of investments.

crystal_revenge - an hour ago

> I don't understand why Hacker News is so dismissive about the coming of LLMs
I find LLMs incredibly useful, but if you were following along the last few years the promise was for “exponential progress” with a teaser of world-destroying superintelligence.
We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
- aoeusnth1 - 40 minutes ago
  
  We're very clearly seeing exponential progress - even above trend, on METR, whose slope keeps getting revised to a higher and higher estimate each time. Explain your perspective on the objective evidence against exponential progress?
cebert - 2 hours ago

Many people feel threatened by the rapid advancements in LLMs, fearing that their skills may become obsolete, and in turn act irrationally. To navigate this change effectively, we must keep open minds, keep adaptable, and embrace continuous learning.
- chii - an hour ago
  
  > in turn act irrationally
  it isn't irrational to act in self-interest. If LLM threatens someone's livelihood, it matters not that it helps humanity overall one bit - they will oppose it. I don't blame them. But I also hope that they cannot succeed in opposing it.
  - Davidzheng - an hour ago
    
    It's irrational to genuinely hold false beliefs about capabilities of LLMs. But at this point I assume around half of the skeptics are emotionally motivated anyway.
- nickphx - an hour ago
  
  rapid advancements in what? hallucinations..? FOMO marketing? certainly nothing productive.
zvolsky - an hour ago

The idea of HN being dismissive of impactful technology is as old as HN. And indeed, the crowd often appears stuck in the past with hindsight. That said, HN discussions aren't homogeneous, and as demonstrated by Karpathy in his recent blogpost "Auto-grading decade-old Hacker News", at least some commenters have impressive foresight: https://karpathy.bearblog.dev/auto-grade-hn/
asielen - 24 minutes ago

It is an overcorrection because of all the empty promises of LLMs. I use Claude and ChatGPT daily at work and am amazed at what they can do and how far they have come.
BUT when I hear my executive team talk and see demos of "Agentforce" and every saas company becoming an AI company promising the world, I have to roll my eyes.
The challenge I have with LLMs is they are great at creating first draft shiny objects and the LLMs themselves over promise. I am handed half baked work created by non technical people that now I have to clean up. And they don't realize how much work it is to take something from a 60% solution to a 100% solution because it was so easy for them to get to the 60%.
Amazing, game changing tools in the right hands but also give people false confidence.
Not that they are not also useful for non-technical people but I have had to spend a ton of time explaining to copywriters on the marketing team that they shouldn't paste their credentials into the chat even if it tells them to and their vibe coded app is a security nightmare.
probably_wrong - 38 minutes ago

Speaking for myself: because if the hype were to be believed we should have no relational databases when there's MongoDB, no need for dollars when there's cryptocoins, all virtual goods would be exclusively sold as NFTs, and we would be all driving self-driving cars by now.
LLMs are being driven mostly by grifters trying to achieve a monopoly before they run out of cash. Under those conditions I find their promises hard to believe. I'll wait until they either go broke or stop losing money left and right, and whatever is left is probably actually useful.
- simonw - 35 minutes ago
  
  The way I've been handling the deafening hype is to focus exclusively on what the models that we have right now can do.
  You'll note I don't mention AGI or future model releases in my annual roundup at all. The closest I get to that is expressing doubt that the METR chart will continue at the same rate.
  If you focus exclusively on what actually works, the LLM space is a whole lot more interesting and less frustrating.
vunderba - 14 minutes ago

> I don't understand why Hacker News is so dismissive about the coming of LLMs.
Eh. I wouldn’t be so quick to speak for the entirety of HN. Several articles related to LLMs easily hit the front page every single day, so clearly there are plenty of HN users upvoting them.
I think you're just reading too much into what is more likely classic HN cynicism and/or fatigue.
Night_Thastus - an hour ago

LLMs hold some real utility. But that real utility is buried under a mountain of fake hype and over-promises to keep shareholder value high.
LLMs have real limitations that aren't going away any time soon - not until we move to a new technology fundamentally different and separate from them - sharing almost nothing in common. There's a lot of 'progress-washing' going on where people claim that these shortfalls will magically disappear if we throw enough data and compute at it when they clearly will not.
- Gigachad - an hour ago
  
  Pretty much. What actually exists is very impressive. But what was promised and marketed has not been delivered.
  - coffeebeqn - 16 minutes ago
    
    Yes and most of the investment has been kind of post-GPT4 betting that things will get exponentially more impressive
  - visarga - 28 minutes ago
    
    I think the missing ingredient is not something the LLMs lack, but something we as developers don't do - we need to constrain, channel, and guide agents by creating reactive test environments around them. Not vibes, but hard tests, they are the missing ingredient to coding agents. You can even use AI to write most of these tests but the end result depends on how well you structured your code to be testable.
    If you inherit 9000 tests from an existing project you can vibe code a replacement on your phone in a holiday, like Simon Willison's JustHTML port. We are moving from agents semi-randomly flailing around to constraint satisfaction.
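    A minimal sketch of the "hard tests as constraints" idea, assuming the inherited suite can be reduced to input/expected pairs. The test cases, the `failing_cases` helper and the two attempts are all hypothetical stand-ins, not anything from the JustHTML port.

```python
# Hypothetical spec: (input, expected output) pairs standing in for an inherited test suite.
TEST_CASES = [
    ("  hello  ", "hello"),
    ("a\tb", "a b"),
    ("x  y", "x y"),
]


def failing_cases(candidate):
    """Return the cases a candidate gets wrong; an empty list means it is accepted."""
    failures = []
    for text, expected in TEST_CASES:
        try:
            actual = candidate(text)
        except Exception as exc:  # a crash counts as a failure, not a reason to stop
            actual = repr(exc)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Two stand-ins for agent-generated attempts at "normalize whitespace".
def attempt_one(text):
    return text.strip()


def attempt_two(text):
    return " ".join(text.split())


for attempt in (attempt_one, attempt_two):
    failures = failing_cases(attempt)
    verdict = "accepted" if not failures else f"rejected ({len(failures)} failing)"
    print(attempt.__name__, verdict)
```

    The list of failures is the useful part: it can be fed back to the agent as the next prompt, which is what turns flailing into constraint satisfaction.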
  - rustystump - 35 minutes ago
    
    Markets never deliver. That isn't new. I do think LLMs are not far off from Google in terms of impact.
    Search, as of today, is inferior to frontier models as a product. However, the best case still misses expected returns by miles, which is where the grousing comes from.
    Generative art/AI is still up in the air for staying power but I'd predict it isn't going away.
snigsnog - 2 hours ago

The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level. Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
The most wide-appeal possibility is people loving 100%-AI-slop entertainment like that AI Instagram Reels product. Maybe I'm just too disconnected with normies but I don't see this taking off. Fun as a novelty like those Ring cam vids but I would never spend all day watching AI generated media.
- raincole - an hour ago
  
  The early internet and smartphones (the Japanese ones, not iPhone) were definitely not "immediately" adopted by the masses, unlike LLMs.
  If "immediate" usefulness is the metric we measure, then the internet and smartphones are pretty insignificant inventions compared to LLMs.
  (of course it's not a meaningful metric, as there is no clear line between a dumb phone and a smart phone, or a moderately sized language model and an LLM)
- JumpCrisscross - 2 hours ago
  
  > AI is not even close to that level
  Kagi’s Research Assistant is pretty damn useful, particularly when I can have it poll different models. I remember when the first iPhone lacked copy-paste. This feels similar.
  (And I don’t think we’re heading towards AGI.)
- nen-nomad - an hour ago
  
  ChatGPT has roughly 800 million weekly active users. Almost everyone around me uses it daily. I think you are underestimating the adoption.
- SgtBastard - an hour ago
  
  … the internet was not immediately useful in a million different ways for almost every person.
  Even if you skip ARPAnet, you’re forgetting the Gopher days and even if you jump straight to WWW+email==the internet, you’re forgetting the mosaic days.
  The applications that became useful to the masses emerged a decade+ after the public internet and even then, it took 2+ decades to reach anything approaching saturation.
  Your dismissal is not likely to age well, for similar reasons.
  - chii - an hour ago
    
    the "usefulness" excuse is irrelevant, and the claim that phones/internet is "immediately useful" is just a post hoc rationalization. It's basically trying to find a reasonable reason why opposition to AI is valid, and is not in self-interest.
    The opposition to AI is from people who feel threatened by it, because it either threatens their livelihood (or family/friends'), and that they feel they are unable to benefit from AI in the same way as they had internet/mobile phones.
- fragmede - an hour ago
  
  > The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level.
  Those are some very rosy glasses you've got on there. The nascent Internet took forever to catch on. It was for weird nerds at universities and it'll never catch on, but here we are.
- what-the-grump - 28 minutes ago
  
  A year after the iPhone came out… it didn’t have an App Store, barely was able to play video, barely had enough power to last a day. You just don’t remember or were not around for it.
  A year after llms came out… are you kidding me?
  Two years?
  10 years?
  Today, adding an MCP server to wrap the same API that’s been around forever for some system makes the users of that system prefer NLI over the GUI almost immediately.
- staticassertion - an hour ago
  
  > Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
  I know a lot of "normal" people who have completely replaced their search engine with AI. It's increasingly a staple for people.
  Smartphones were absolutely NOT immediately useful in a million different ways for almost every person, that's total revisionist history. I remember when the iPhone came out, it was AT&T only, it did almost nothing useful. Smartphones were a novelty for quite a while.

andrewinardeer - an hour ago

Thank you. Enjoyed this read.

AI slop videos will no doubt get longer and "more realistic" in 2026.

I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline. That would be the responsible thing to do. But I can't see Alphabet doing that on YT, xAI doing that on X or Meta doing that on FB/Insta as they all have skin in the video gen game.

vanderZwan - 2 hours ago

Speaking of new year and AI: my phone just suggested "Happy Birthday!" as the quick-reply to any "Happy New Year!" notification I got in the last hours.

I'm not too worried about my job just yet.

pants2 - an hour ago

It won't help to point out the worst examples. You're not competing with an outdated Apple LLM running on a phone. You're competing with Anthropic frontier models running on a multimillion dollar rack of servers.

sanreau - 3 hours ago

> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi

...and the best of them all, OpenCode[1] :)

[1]: https://opencode.ai

d4rkp4ttern - 33 minutes ago

Can OpenCode be used with the Claude Max or ChatGPT Pro subscriptions, i.e., without per-token API charges?
- simonw - 30 minutes ago
  
  Apparently it does work with Claude Max: https://opencode.ai/docs/providers/#anthropic
  I don't see a similar option for ChatGPT Pro. Here's a closed issue: https://github.com/sst/opencode/issues/704
simonw - 3 hours ago

Good call, I'll add that. I think I mentally scrambled it with OpenHands.
- the_mitsuhiko - 3 hours ago
  
  Thanks for adding pi to it though :)
logicprog - 2 hours ago

I don't know why you're downvoted, OpenCode is by far the best.
nineteen999 - 3 hours ago

How did I miss this until now! Thank you for sharing.

syndacks - 2 hours ago

I can’t get over the range of sentiment on LLMs. HN leans snake oil, X leans “we’re all cooked” —- can it possibly be both? How do other folks make sense of this? I’m not asking for a side, rather understanding the range. Does the range lead you to believe X over Y?

johnfn - an hour ago

I believe the spikiness in response is because AI itself is spiky - it’s incredibly good at some classes of tasks, and remarkably poor at others. People who use it on the spikes are genuinely amazed because of how good it is. This does nothing but annoy the people who use it in the troughs, who become increasingly annoyed that everyone seems to be losing their mind over something that can’t even do (whatever).
thisoneisreal - an hour ago

My take (no more informed than anyone else's) is that the range indicates this is a complex phenomenon that people are still making sense of. My suspicion is that something like the following is going on:
1. LLMs can do some truly impressive things, like taking natural language instructions and producing compiling, functional code as output. This experience is what turns some people into cheerleaders.
2. Other engineers see that in real production systems, LLMs lack sufficient background / domain knowledge to effectively iterate. They also still produce output, but it's verbose and essentially missing the point of a desired change.
3. LLMs also can be used by people who are not knowledgeable to "fake it," and produce huge amounts of output that is basically besides-the-point bullshit. This makes those same senior folks very, very resentful, because it wastes a huge amount of their time. This isn't really the fault of the tool, but it's a common way the tool gets used and so it gets tarnished by association.
4. There is a ridiculous amount of complexity in some of these tools and workflows people are trying to invent, some of which is of questionable value. So aside from the tools themselves people are skeptical of the people trying to become thought leaders in this space and the sort of wild hacks they're coming up with.
5. There are real macro questions about whether these tools can be made economical to justify whatever value they do produce, and broader questions about their net impact on society.
6. Last but not least, these tools poke at the edges of "intelligence," the crown jewel of our species and also a big source of status for many people in the engineering community. It's natural that we're a little sensitive about the prospect of anything that might devalue or democratize the concept.
That's my take for what it's worth. It's a complex phenomenon that touches all of these threads, so not only do you see a bunch of different opinions, but the same person might feel bullish about one aspect and bearish about another.

zahlman - 2 hours ago

I'm not really convinced that anywhere leans heavily towards anything; it depends which thread you're in etc.
It's polarizing because it represents a more radical shift in expected workflows. Seeing that range of opinions doesn't really give me a reason to update, no. I'm evaluating based on what makes sense when I hear it.

aussieguy1234 - 3 hours ago

> The year of YOLO and the Normalization of Deviance #

On this including AI agents deleting home folders, I was able to run agents in Firejail by isolating vscode (Most of my agents are vscode based ones, like Kilo Code).

I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...

Took a bit of tweaking, vscode crashing a bunch of times with not being able to read its config files, but I got there in the end. Now it can only write to my projects folder. All of my projects are backed up in git.

smileson2 - 2 hours ago

forgot to mention the first murder-suicide instigated by chatgpt

DANmode - 2 hours ago

These are his highlights as a killer blogger,
not AI’s highlights.
Easy with the hot take.

sho_hn - 3 hours ago

Not in this review: Also the record year in intelligent systems aiding in and prompting human users into fatal self-harm.

Will 2026 fare better?

simonw - 3 hours ago

I really hope so.
The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.
But the number of users is exploding too. If they make things 5x less likely to happen but sign up 10x more people it won't be good on that front.
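To make that arithmetic concrete, here is a rough sketch. It assumes every user is affected independently at the same per-user rate, and the baseline numbers are invented purely for illustration.

```python
def expected_incidents(users, rate_per_user):
    """Expected number of incidents, assuming each user is affected independently at the same rate."""
    return users * rate_per_user


# Hypothetical baseline numbers, chosen only to make the arithmetic concrete.
baseline_users = 1_000_000
baseline_rate = 1e-5

before = expected_incidents(baseline_users, baseline_rate)
after = expected_incidents(baseline_users * 10, baseline_rate / 5)

print(f"before: {before:.0f}, after: {after:.0f}, ratio: {after / before:.1f}x")
```

Whatever the baseline is, 10x the users at one fifth of the rate works out to twice as many expected incidents.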
andai - 3 hours ago

Also essential self-fulfilment.
But that one doesn't make headlines ;)
- sho_hn - 3 hours ago
  
  Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults I expect it to make more headlines than all the fun roadtrips.
  What I find interesting with chat bots is that they're "web apps" so to speak, but with safety engineering aspects that type of developer is typically not exposed to or familiar with.
  - simonw - 3 hours ago
    
    One of the tough problems here is privacy. AI labs really don't want to be in the habit of actively monitoring people's conversations with their bots, but they also need to prevent bad situations from arising and getting worse.
    
    walt_grata - 2 hours ago
    
    Until AI labs have the equivalent of an SLA for giving accurate and helpful responses it won't get better. They're not even able to measure if the agents work correctly and consistently.
measurablefunc - 3 hours ago

The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956

DrewADesign - 3 hours ago

You’re absolutely right! You astutely observed that 2025 was a year with many LLMs and this was a selection of waypoints, summarized in a helpful timeline.

That’s what most non-tech-person’s year in LLMs looked like.

Hopefully 2026 will be the year where companies realize that implementing intrusive chatbots can’t make better ::waving hands:: ya know… UX or whatever.

For some reason, they think it's helpful to distractingly pop up chat windows on their site because their customers need textual kindergarten handholding to … I don’t know… find the ideal pocket comb for their unique pocket/hair situation, or had an unlikely question about that aerosol pan release spray that a chatbot could actually answer. Well, my dog also thinks she’s helping me by attacking the vacuum when I’m trying to clean. Both ideas are equally valid.

And spending a bazillion dollars implementing it doesn’t mean your customers won’t hate it. And forcing your customers into pathways they hate because of your sunk costs mindset means it will never stop costing you more money than it makes.

I just hope companies start being honest with themselves about whether or not these things are good, bad, or absolutely abysmal for the customer experience and cut their losses when it makes sense.

Night_Thastus - 2 hours ago

They need to be intrusive and shoved in your face. This way, they can say they have a lot of people using them, which is a good and useful metric.
zahlman - 2 hours ago

As much as I side with you on this one, I really don't think this submission is the right place to rant about it.
ronsor - 2 hours ago

> For some reason, they think it's helpful to distractingly pop up chat windows on their site...
Companies have been doing this "live support" nonsense far longer than LLMs have been popular.
- DrewADesign - an hour ago
  
  There was also point source pollution before the Industrial Revolution. Useless, forced, irritating chat was ‘nowhere close’ to as aggressive or pervasive as it is now. It used to be a niche feature of some CRMs and now it’s everywhere.
  I’m on LinkedIn Learning digging into something really technical and practical and it’s constantly pushing the chat fly out with useless pre-populated prompts like “what are the main takeaways from this video.” And they moved their main page search to a little icon on the title bar and sneakily now what used to be the obvious, primary central search field for years sends a prompt to their fucking chatbot.

techpression - 2 hours ago

Nothing about the severe impact on the environment, and the hand waviness about water usage hurt to read. The referenced post was missing every single point about the issue by making it global instead of local. And as if data center buildouts are properly planned and dimensioned for existing infrastructure…

Add to this that all the hardware is already old and the amount of waste we’re producing right now is mind boggling, and for what, fun tools for the use of one?

I don’t live in the US, but the amount of tax money being siphoned to a few tech bros should have heads rolling and I really don’t want to see it happening in Europe.

But I guess we got a new version number on a few models and some blown up benchmarks so that’s good, oh and of course the svg images we will never use for anything.

simonw - 2 hours ago

"Nothing about the severe impact on the environment"
I literally said:
"AI data centers continue to burn vast amounts of energy and the arms race to build them continues to accelerate in a way that feels unsustainable."
AND I linked to my coverage from last year, which is still true today (hence why I felt no need to update it): https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-envi...

skydhash - 3 hours ago

[flagged]

dang - 2 hours ago

Could you please stop posting dismissive, curmudgeonly comments? It's not what this site is for, and destroys what it is for.
We want curious conversation here.
https://news.ycombinator.com/newsguidelines.html
n2d4 - 2 hours ago

This is extremely dismissive. Claude Code helps me make a majority of changes to our codebase now, particularly small ones, and is an insane efficiency boost. You may not have the same experience for one reason or another, but plenty of devs do, so "nothing happened" is absolutely wrong.
2024 was a lot of talk, a lot of "AI could hypothetically do this and that". 2025 was the year where it genuinely started to enter people's workflows. Not everything we've been told would happen has happened (I still make my own presentations and write my own emails) but coding agents certainly have!
- bandrami - 2 hours ago
  
  Did you ship more in 2025 than in 2024?
  - GCUMstlyHarmls - 2 hours ago
    
    Shipping in 2025: https://x.com/trq212/status/2001848726395269619
  - DANmode - 2 hours ago
    
    I definitely did.
    Objectively 0->1 lots of backlog.
  - wickedsight - 2 hours ago
    
    I definitely did.
- skydhash - 2 hours ago
  
  And this is one of the vague "AI helped me do more".
  This is me touting for Emacs
  Emacs was a great plus for me over the last year. The integration with various tooling with comint (REPL integration), compile (build or report tools), TUI (through eat or ansi-term), gave me a unified experience through the buffer paradigm of emacs. Using the same set of commands boosted my editing process and the easy addition of new commands make it easy to fit my development workflow to the editor.
  This is how easy it is to write a non-vague "tool X helped me" and I'm not even an English native speaker.
  - thunky - 7 minutes ago
    
    > This is how easy it is to write a non-vague "tool X helped me" and I'm not even an English native speaker.
    Your example is very vague.
    See if you can spot the problem in my review of Excel in your style:
    "It's great and I like how it's formula paradigm gave me a unified experience. It's table features boosted my science workflows last year".
  - n2d4 - an hour ago
    
    That paragraph could be the truth, or it could be a lie. Maybe Emacs really did make you more efficient, or you made it all up, I don't know. Best I can do is trust you.
    If you don't trust me, I can't conclusively convince you that AI makes me more efficient, but if you want I'm happy to hop on a screen-share and elaborate in what ways it has boosted my workflow. I'm offering this because I'm also curious what your work looks like where AI cannot help at all.
    E-mail address is on my profile!
MattRix - 3 hours ago

I’m not sure how to tell you how obvious it is you haven’t actually used these tools.
- skydhash - 3 hours ago
  
  Why do people assume negative critique is ignorance?
  - sothatsit - 2 hours ago
    
    You did not make a negative critique. You completely dismissed the value of coding agents on the basis that the results are not predictable, which is both obvious and doesn’t matter in practice. Anyone who has given these tools a chance will quickly realise that 1) they are actually quite predictable in doing what you ask them to, and 2) them being non-deterministic does not at all negate their value. This is why people can immediately tell you haven’t used these tools, because your argument as to why they’re useless is so elementary.
  - dmd - 3 hours ago
    
    People denied that bicycles could possibly balance even as others happily pedaled by. This is the same thing.
    
    blibble - 2 hours ago
    
    people also said that selling jpegs of monkeys for millions of dollars was a pump and dump scam, and would collapse
    they were right
    
    sothatsit - 2 hours ago
    
    JPEGs with no value other than fake scarcity are very different from coding agents that people actively use to ship real code.
    
    rhubarbtree - 2 hours ago
    
    It’s possible this is correct.
    It’s also possible that people more experienced, knowledgeable and skilled than you can see fundamental flaws in using LLMs for software engineering that you cannot. I am not including myself in that category.
    I’m personally honestly undecided. I’ve been coding for over 30 years and know something like 25 languages. I’ve taught programming to postgrad level, and built prototype AI systems that foreshadowed LLMs. I’ve written everything from embedded systems to enterprise, web, mainframes, real time, physics simulation and research software. I would consider myself a 7/10 or 8/10 coder.
    A lot of folks I know are better coders. To put my experience into context: one guy in my year at uni wrote one of the world’s most famous crypto systems; another wrote large portions of some of the most successful games of the last few decades. So I’ve grown up surrounded by geniuses, basically, and whilst I’ve been lectured by true greats I’m humble enough to recognise I don’t bleed code like they do. I’m just a dabbler. But it irks me that a lot of folks using AI profess it’s the future but don’t really know anything about coding compared to these folks. Not to be a Luddite - they are the first people to adopt new languages and techniques, but they also are super sceptical about anything that smells remotely like bullshit.
    One of the wisest insights in coding is the aphorism “beware the enthusiasm of the recently converted.” And I see that so much with AI. I’ve seen it with compilers, with IDEs, paradigms, and languages.
    I’ve been experimenting a lot with AI, and I’ve found it fantastic for comprehending poor code written by others. I’ve also found it great for bouncing ideas. And the code it writes, beyond boilerplate, is hot garbage. It doesn’t properly reason, it can’t design architecture, it can’t write code that is comprehensible to other programmers, and treating it as a “black box to be manipulated by AI” just leads to dead ends that can’t be escaped, terrible decisions that will take huge amounts of expert coding time to undo, subtle bugs that AI can’t fix and are super hard to spot, and often you can’t understand their code enough to fix them, and security nightmares.
    Testing is insufficient for good code. Humans write code in a way that is designed for general correctness. AI does not, at least not yet.
    I do think these problems can be solved. I think we probably need automated reasoning systems, or else vastly improved LLMs that border on automated reasoning much like humans do. Could be a year. Could be a decade. But right now these tools don’t work well. Great for vibe coding, prototyping, analysis, review, bouncing ideas.
    
    tehnub - 3 hours ago
    
    People did?
    
    measurablefunc - 3 hours ago
    
    Bicycles don't balance, the human on the bicycle is the one doing the balancing.
    
    dmd - 3 hours ago
    
    Yes, that is the analogy I am making. People argued that bicycles (a tool for humans to use) could not possibly work - even as people were successfully using them.
    
    measurablefunc - 3 hours ago
    
    People use drugs as well but I'm not sure I'd call that successful use of chemical compounds without further context. There are many analogies one can apply here that would be equally valid.
    
    moralestapia - 3 hours ago
    
    [flagged]
    
    skydhash - 3 hours ago
    
    Please tell me which one of the headings is not about increased usage of LLMs and derived tools and is about some improvement in the axes of reliability or any kind of usefulness.
    Here is the changelog for OpenBSD 7.8:
    https://www.openbsd.org/78.html
    There's nothing here that says: We made it easier to use more of it. It's about using it better and fixing underlying problems.
    
    simonw - 3 hours ago
    
    The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.
    Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.
    
    walt_grata - 3 hours ago
    
    If it actually does that without an argument. I can't believe I have to say that about a computer program
    
    skydhash - 2 hours ago
    
    > The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.
    Does it? It's all prompt manipulation. Shell scripts are powerful, yes, but not really a huge improvement over having a shell (REPL interface) to the system. And even then a lot of programs just use syscalls or wrapper libraries.
    > can try the code, see that it doesn't work and fix the problem.
    Can you really say that happens reliably?
    
    dham - 2 hours ago
    
    You're welcome to try the LLMs yourself and come to your own conclusions. From what you've posted, it doesn't look like you've tried anything in the last 2 years. Yes, LLMs can be annoying, but there has been progress.
    
    simonw - 2 hours ago
    
    Depends on what you mean by "reliably".
    If you mean 100% correct all of the time then no.
    If you mean correct often enough that you can expect it to be a productive assistant that helps solve all sorts of problems faster than you could solve them without it, and which makes mistakes infrequently enough that you waste less time fixing them than you would doing everything by yourself then yes, it's plenty reliable enough now.
    
    noodletheworld - 2 hours ago
    
    I know it seems like forever ago, but Claude Code only came out in 2025.
    It's very difficult to argue against the point that Claude Code:
    1) was a paradigm shift in terms of functionality, despite, to be fair, at best, incremental improvements in the underlying models.
    2) produced results that are, I estimate, an order of magnitude better in terms of output.
    I think it's very fair to distill “AI progress 2025” to: you can get better results (up to a point; better than raw output anyway; scaling to multiple agents has not worked) without better models with clever tools and loops. (…and video/image slop infests everything :p).
    
    bandrami - 2 hours ago
    
    Did more software ship in 2025 than in 2024? I'm still looking for some actual indication of output here. I get that people feel more productive but the actual metrics don't seem to agree.
    
    skydhash - 2 hours ago
    
    I'm still waiting for the Linux drivers to be written because of all the 20x improvements that AI hypers are touting. I would even settle for Apple M3 and M4 computers to be supported by Asahi.
    
    noodletheworld - 2 hours ago
    
    I am not making any argument about the productivity of using AI vs. not using AI.
    My point is purely that, compared to 2024, the quality of the code produced by LLM inference agent systems is better.
    To say that 2025 was a nothing burger is objectively incorrect.
    Will it scale? Is it good enough to use professionally? Is this like self driving cars where the best they ever get is stuck with an odd shaped traffic cone? Is it actually more productive?
    Who knows?
    I'm just saying… LLM coding in 2024 sucked. 2025 was a big year.
  - kakapo5672 - 2 hours ago
    
    Whenever someone tells me that AI is worthless, does nothing, scam/slop etc, I ask them about their own AI usage, and their general knowledge about what's going on.
    Invariably they've never used AI, or at most very rarely. (If they used AI beyond that, this would be an admission that it was useful at some level).
    Therefore it's reasonable to assume that you are in that boat. Now that might not be true in your case, who knows, but it's definitely true on average.
    
    snigsnog - 2 hours ago
    
    It's not worthless, it's just not world-changing as is, even in the fields where it's most useful, like programming. If the trajectory changes and we reach AGI then this changes too, but right now it's just a way to
    - fart out demos that you don't plan on maintaining, or want to use as a starting place
    - generate first-draft unit tests/documentation
    - generate boilerplate without too much functionality
    - refactor in a very well covered codebase
    It's very useful for all of the above! But it doesn't even replace a junior dev at my company in its current state. It's too agreeable, makes subtle mistakes that it can't permanently correct (GEMINI.md isn't a magic bullet, telling it to not do something does not guarantee that it won't do it again), and you as the developer submitting LLM-generated code for review need to review it closely before even putting it up (unless you feel like offloading this to your team) to the point that it's not that much faster than having written it yourself.
  - LewisVerstappen - 2 hours ago
    
    because your "negative critique" is just idiotic and wrong
senordevnyc - 2 hours ago

This comment is legitimately hilarious to me. I thought it was satire at first. The list of what has happened in this field in the last twelve months is staggering to me, while you write it off as essentially nothing.
Different strokes, but I’m getting so much more done and mostly enjoying it. Can’t wait to see what 2026 holds!
- ronsor - 2 hours ago
  
  People who dislike LLMs are generally insistent that they're useless for everything and have infinitely negative value, regardless of facts they're presented with.
  Anyone that believes that they are completely useless is just as deluded as anyone that believes they're going to bring an AGI utopia next week.

justatdotin - 2 hours ago

[flagged]

simonw - 2 hours ago

Got a good news story about that one? I'm always interested in learning more about this issue, especially if it credibly counters the narrative that the issue is overblown.
- justatdotin - 2 hours ago
  
  [flagged]
  - dang - 2 hours ago
    
    I'm not sure what the issue is here but it's not ok to cross into personal attack on HN. We ban accounts that do that, so please don't do it again.
    https://news.ycombinator.com/newsguidelines.html
    
    justatdotin - an hour ago
    
    how is that a personal attack?
    a personal attack would be eg calling him a DC.
    all I did was point out the intellectual dishonesty of his argument. that's an attack on his intellectually dishonest argument, not his person.
    by all means go ahead and ban me
    
    dang - 20 minutes ago
    
    "I will not pretend you are engaging honestly" is well into the realm of personal attack, and you can't do that here.
    Ditto for "I am very disappointed about your BULLSHIT" in the GP comment.
  - simonw - 2 hours ago
    
    What's not credible about Andy Masley's work on this?
    (For anyone else reading this thread: my comment originally just read "Got a good news story about that one?" - justatdotin posted this reply while I was editing the comment to add the extra text.)

anonnon - 2 hours ago

Why do the mods allow Simon to spam HN with his blogposts and his comments, which he often posts just for the sake of including a link back to his blog? Seriously, go look at his post history and see how often he includes a link to his blog, however tangentially related, when he posts a comment. I actually flagged this submission, which I never do, and encourage others to do likewise.

dang - 2 hours ago

He's one of the most valuable writers on LLMs, which are one of the major topics at present. That's not spam.
- anonnon - an hour ago
  
  > He's one of the most valuable writers on LLMs
  Is he, really? Most of his blog posts are little more than opportunistic, buttressing commentary on someone else's blog post or article, often with a bit of AI apologia sprinkled in (for example, marginalizing people as paranoid for not taking AI companies at their word that they aren't aggressively scraping websites in violation of robots.txt, or exfiltrating user data in AI-enabled apps).
simonw - 2 hours ago

Probably because my content gets a lot more upvotes than it does flags.
If this post was by anyone other than me would you have any problems with its quality?
firexcy - 21 minutes ago

I appreciate his work for being more informative and organized than average AI-related content. Without his blogging, it would be a struggle to navigate the bombastic and narcissistic Twitter/Reddit posts for AI updates. The barrier to entry for AI reporting is so low that you just need to give a bit more care to be distinguished, and he is getting the deserved attention for doing exactly that in a systematic and disciplined manner. (I do believe many on HN are more than capable but not interested in doing the same.) Personally, I sometimes find his posts more congratulatory or trivial than I like, but I have learned to take what I want and ignore what I don’t.

blutoot - an hour ago

I hope 2026 will be the year when software engineers and recruiters will stop the obsession with leetcode and all other forms of competitive programming bullshit

agentifysh - 3 hours ago

What amazing progress in such a short time. The future is bright! Happy New Year y'all!

castwide - 3 hours ago

2025: The Year in LLMs

I will never stop treating hallucinations as inventions. I dare you to stop me. i double dog dare y