Is it a bubble?
oaktreecapital.com
278 points by saigrandhi a day ago
> In many advanced software teams, developers no longer write the code; they type in what they want, and AI systems generate the code for them.
What a wild and speculative claim. Is there any source for this information?
At $WORK, we have a bot that integrates with Slack that sets up minor PRs. Adjusting tf, updating endpoints, adding simple handlers. It does pretty well.
Also in a case of just prose to code, Claude wrote up a concurrent data migration utility in Go. When I reviewed it, it wasn't managing goroutines or waitgroups well, and the whole thing was a buggy mess and could not be gracefully killed. I would have written it faster by hand, no doubt. I think I know more now and the calculus may be shifting on my AI usage. However, the following day, my colleague needed a nearly identical temporary tool. A 45 minute session with Claude of "copy this thing but do this other stuff" easily saved them 6-8 hours of work. And again, that was just talking with Claude.
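To be concrete, the part it kept fumbling was the shutdown plumbing: a context cancelled on SIGINT/SIGTERM, workers that drain a channel, and a WaitGroup so nothing exits mid-write. Roughly this shape (a minimal sketch with made-up worker and record names, not the actual tool):

    package main

    import (
        "context"
        "fmt"
        "os/signal"
        "sync"
        "syscall"
    )

    // worker drains the jobs channel until it is closed or the context is
    // cancelled, so a Ctrl-C stops the migration cleanly instead of killing
    // it mid-write.
    func worker(ctx context.Context, id int, jobs <-chan string, wg *sync.WaitGroup) {
        defer wg.Done()
        for {
            select {
            case <-ctx.Done():
                return
            case rec, ok := <-jobs:
                if !ok {
                    return
                }
                fmt.Printf("worker %d migrating %s\n", id, rec) // placeholder for the real migration step
            }
        }
    }

    func main() {
        // Cancel the context on SIGINT/SIGTERM so every goroutine can wind down.
        ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
        defer stop()

        jobs := make(chan string)
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go worker(ctx, i, jobs, &wg)
        }

        // Feed records, but stop feeding as soon as shutdown is requested.
        for _, rec := range []string{"record-1", "record-2", "record-3"} {
            select {
            case <-ctx.Done():
            case jobs <- rec:
                continue
            }
            break
        }
        close(jobs)

        wg.Wait() // wait for in-flight work before exiting
    }

The select on ctx.Done() in both the feeder and the workers is what makes the kill graceful.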
I am doing a hybrid approach, really. I write much of my scaffolding, I write example code, I modify quick things the AI made to be more like what I want, I set up guard rails and some tests, then have the AI go to town. Results are mixed but trending up still.
FWIW, our CEO has declared us to be AI-first, so we are to leverage AI in everything we do, which I think is misguided. But you can bet they will be reviewing AI usage metrics, and lower won't be better at $WORK.
> we are to leverage AI in everything we do
Sounds like the extremely well-repeated mistake of treating everything like a nail because hammers are being hyped up this month.
You should periodically ask Claude to review random parts of code to pump your metrics.
Has the net benefit that it points out things that are actually wrong and overlooked.
AI reviews have the benefit of making me feel like an idiot in one bullet point and then a genius in the next.
But it also points out tons of your deliberate design choices as bugs, and will recommend removing things it doesn't understand.
Great time to research if those choices are still valid or if there's a better way. In any regard, it's just an overview, not a total rewrite from the AI's perspective.
just like any junior dev
consider rewriting in rust
that's gonna be painful, as the borrow checker really trips up LLMs
I do a lot of LLM work in rust, I find the type system is a huge defense against errors and hallucinations vs JavaScript or even Typescript.
why periodically? Just set it up in an agentic workflow and have it work until your token limit is hit.
If companies want to value something as dumb as LoC then they get what they incentivized
This is a great response, even for a blue collar worker understanding none of its complexities (I have no code creation abilities, whatsoever — I can adjust parameters, and that's about it... I am a hardware guy).
My layperson anecdote about LLM coding is that using Perplexity is the first time I've ever had the confidence (artificial, or not) to actually try to accomplish something novel with software/coding. Without judgments, the LLM patiently attempts to turn my meat-speak into code. It helps explain [very simple stuff I can assure you!] what its language requires for a hardware result to occur, without chastising you. [Raspberry Pi / Arduino e.g.]
LLMs have encouraged me to explore the inner workings of more technologies, software and not. I finally have the knowledgeable apprentice to help me with microcontroller implementations, albeit slowly and perhaps somewhat dangerously [1].
----
Having spent the majority of my professional life troubleshooting hardware problems, I often benefit from rubber ducky troubleshooting [0], going back to the basics when something complicated isn't working. LLMs have been very helpful in this roleplay (e.g. garage door openers, thermostat advanced configurations, pin-outs, washing machine not working, etc.).
[0] <https://en.wikipedia.org/wiki/Rubber_duck_debugging>
[1] "He knows just enough to be dangerous" —proverbial electricians
¢¢
What really comes through in this description is a fear of judgement from other people, which I think is extremely relatable for anyone who's ever posted a question on Stack Overflow. I don't think it's a coincidence that the popularity of these tools coincides with a general atmosphere of low trust and social cohesion in the US and other societies over the last decade.
It took me a while to realize you were using "$WORK" as a shell variable, not as a reference to Slack's stock ticker prior to its acquisition by $CRM.
> FWIW, our CEO has declared us to be AI-first, so we are to leverage AI in everything we do, which I think is misguided. But you can bet they will be reviewing AI usage metrics, and lower won't be better at $WORK.
I've taken some pleasure in having GitHub copilot review whitespace normalization PRs. It says it can't do it, but I hope I get my points anyway.
> it wasn't managing goroutines or waitgroups well, and the whole thing was a buggy mess and could not be gracefully killed
First pass on a greenfield project is often like that, for humans too I suppose. Once the MVP is up, a refactor pass with Opus ultrathink to look for areas of weakness and improvement usually tightens things up.
Then as you pointed out, once you have solid scaffolding, examples, etc, things keep improving. I feel like Claude has a pretty strong bias for following existing patterns in the project.
I think your experience matches well with mine. There are certain workloads and use cases where these tools really do well and legitimately save time; these tend to be concise, well-defined tasks with good context to draw from. With the wrong tasking, the results can be pretty bad and a time sink.
I think the difficulty is exercising the judgement to know where that productive boundary sits. That's harder than it sounds, because we're not used to adjudicating machine reasoning that can appear human-like, so we tend to treat it like a human, which is, of course, an error.
I find ChatGPT excellent for writing scripts in obscure scripting languages - AppleScript, Adobe Cloud products, IntelliJ plugin development, LibreOffice, and others.
All of these have a non-trivial learning curve and/or poor and patchy docs.
I could master all of these the hard way, but it would be a huge and not very productive time sink. It's much easier to tell a machine what I want and iterate with error reports if it doesn't solve my problem immediately.
So is this AGI? It's not self-training. But it is smart enough to search docs and examples and pull them together into code that solves a problem. It clearly "knows" far more than I do in this particular domain, and works much faster.
So I am very clearly getting real value from it. And there's a multiplier effect, because it's now possible to imagine automating processes that weren't possible before, and glue together custom franken-workflows that link supposedly incompatible systems and save huge amounts of time.
My thoughts as well: good at some things and terrible at others, and with the latter you will lose time.
Some things are best written by yourself.
And this is with the mighty Claude Opus 4.5.
The line right after this is much worse:
> Coding performed by AI is at a world-class level, something that wasn’t so just a year ago.
Wow, finance people certainly don't understand programming.
World class? Then what am I? I frequently work with Copilot and Claude Sonnet, and they can be useful, but trusting them to write code for anything moderately complicated is a bad idea. I am impressed by their ability to generate and analyse code, but their code almost never works the first time, unless it's trivial boilerplate stuff, and their analysis is wrong half the time.
It's very useful if you have the knowledge and experience to tell when it's wrong. That is the absolutely vital skill to work with these systems. In the right circumstances, they can work miracles in a very short time. But if they're wrong, they can easily waste hours or more following the wrong track.
It's fast, it's very well-read, and it's sometimes correct. That's my analysis of it.
Is this why AI is telling us our every idea is brilliant and great? Because their code doesn't stand up to what we can do?
Because people who couldn’t code, but now can, have zero understanding of the ‘path to production quality code’.
Of course it is mind-blowing for them.
Copilot is easily the worst (and probably slowest) coding agent. SOTA and Copilot don't even inhabit similar planes of existence.
I've found Opus 4.5 in copilot to be very impressive. Better than codex CLI in my experience. I agree Copilot definitely used to be absolutely awful.
> I frequently work with Copilot and Claude Sonnet, and it can be useful, but trusting it to write code for anything moderately complicated is a bad idea
This sentence and the rest of the post read like horoscope advice. Like “It can be good if you use it well, it may be bad if you don't”. It's pretty much the same as saying a coin may land on heads or on tails.
Saying “a coin may land on heads or on tails” is useful when other people are saying “we will soon have coins that always land on heads”.
They don’t. I’ve gone from rickety, slow Excel sheets and maybe some Python functions to automate the small things I could figure out, to building out entire data pipelines. It’s incredible how much more efficient we’ve gotten.
> Including how it looks at the surrounding code and patterns.
Citation needed. Even with specific examples, “follow the patterns from the existing tests”, etc., Copilot (GPT-5) still insists on generating tests using the wrong methods (“describe” and “it” in a codebase that uses “suite” and “test”).
An intern, even an intern with a severe cognitive disability, would not be so bad at pattern following.
Do you think smart companies seeking to leverage AI effectively in their engineering orgs are using the $20 slopify subscription from Microsoft?
You get what you pay for.
Every time a new model or tool comes out, the AI boosters love to say that n-1 was garbage and finally AI vibecoding is the real deal and it will make you 10x more productive.
Except six months ago n-1 was n and the boosters were busy ruining their credibility saying that their garbage tier AI was world class and making them 10x more productive.
Today’s leading world-class agentic model is tomorrow’s horrible garbage tier slop generator that was patently never good enough to be taken seriously.
This has been going on for years, the pattern is obvious and undeniable.
Of course not, why would they? They understand making money, and what makes money right now? What would be antithetical to making money? Why might we be doing one thing and not another? The lines are bright and red and flashing.
I completely agree. This guy is way outside his area of expertise. For those unaware, Howard Marks is a legendary investment manager with an impressive decades-long track record, and these “insights” letters are likewise legendary in the money management business. Personally, I would say his wisdom is one notch below Warren Buffett's. I am sure he is regularly asked (badgered?) by investors what he thinks about the current state and future of AI (LLMs) and how it will impact his investment portfolio. The audience of this letter is investors (real and potential), as well as other investment managers.
Follow-up: This letter feels like a "jump the shark" moment.
Ref: https://blog.codinghorror.com/has-joel-spolsky-jumped-the-sh...
First time reading this. It's actually funny how disliking exceptions seemed crazy then but it's pretty normal now. And writing a new programming language for a certain product, well, it could turn out to be pretty cool, right? It's how we get all those Elms and so on.
It's not. And if your team is doing this you're not "advanced."
Lots of people are outing themselves these days about the complexity of their jobs, or lack thereof.
Which is great! But it's not a +1 for AI, it's a -1 for them.
Part of the issue is that I think you are underestimating the number of people not doing "advanced" programming. If it's around ~80-90%, then that's a lot of +1s for AI
Wrong. 80% of code not being advanced is quite strictly not the same as 80% of people not doing advanced programming.
I completely understand the difference, and I am standing by my statement that 80-90% of programmers are not doing advanced programming at all.
Why do you feel like I'm underestimating the # of people not doing advanced programming?
Theoretically, if AI can do 80-90% of programming jobs (the ones not in the "advanced" group), that would be an unequivocal +1 for AI.
I think you're crossing some threads here.
"It's not. And if your team is doing this you're not "advanced." Lots of people are outing themselves these days about the complexity of their jobs, or lack thereof.
Which is great! But it's not a +1 for AI, it's a -1 for them.
" Is you, right?
It's true for me. I type in what I want and then the AI system (compiler) generates the code.
Doesn't everyone work that way?
Describing a compiler as "AI" is certainly a take.
I used to hand-roll the assembly, but now I delegate that work to my agent, clang. I occasionally override clang or give it hints, but it usually gets it right.
clang doesn't "understand" the hints because it doesn't "understand" anything, but it knows what to do with them! Just like codex.
Given an input, clang will always give the same output; not quite the same for LLMs. Also, nobody ever claimed compilers were intelligent or that they "understood" things.
An LLM will also give the same output for the same input when the temperature is zero[1]. It only becomes non-deterministic if you choose for it to be. Which is the same for a C compiler. You can choose to add as many random conditionals as you so please.
But there is nothing about a compiler that implies determinism. A compiler is defined by function (taking input on how you want something to work and outputting code), not design. Implementation details are irrelevant. If you use a neural network to compile C source into machine code instead of more traditional approaches, it most definitely remains a compiler. The function is unchanged.
[1] "Faulty" hardware found in the real world can sometimes break this assumption. But a C compiler running on faulty hardware can change the assumption too.
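Concretely, "choosing for it to be deterministic" just means pinning temperature to 0 in the request and re-running it. A minimal sketch in Go against an OpenAI-compatible chat completions endpoint (assumes the standard OpenAI URL and an OPENAI_API_KEY env var; the model name is arbitrary); run it twice and diff the output:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        // Request body for an OpenAI-compatible chat completions endpoint.
        // temperature 0 makes sampling greedy (the argmax token every step).
        body, _ := json.Marshal(map[string]any{
            "model":       "gpt-4o-mini", // assumed model name
            "temperature": 0,
            "messages": []map[string]string{
                {"role": "user", "content": "Say exactly one word."},
            },
        })

        req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
        req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        out, _ := io.ReadAll(resp.Body)
        fmt.Println(string(out)) // compare this across runs
    }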
Currently, LLMs from major providers are not deterministic with temp=0; there are startups focusing on this issue (among others): https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
You can test that yourself in 5 seconds and see that even at a temp of 0 you never get the same output
Works perfectly fine for me.
Did you do that stupid HN thing where you failed to read the entire comment and then went off to try it on faulty hardware?