After outages, Amazon to make senior engineers sign off on AI-assisted changes
arstechnica.com · 564 points by ndr42 20 hours ago
https://www.ft.com/content/7cab4ec7-4712-4137-b602-119a44f77... (https://archive.ph/wXvF3)
https://twitter.com/lukolejnik/status/2031257644724342957 (https://xcancel.com/lukolejnik/status/2031257644724342957)
This "mandatory meeting" is just the usual weekly company-wide meeting where recent operational issues are discussed. There was a big operational issue last week, so of course this week will have more attendance and discussion. This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a molehill here.

The article claims:

> He asked staff to attend the meeting, which is normally optional.

Is that false? It also discusses a new policy:

> Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.

Is that inaccurate? It is good context that this is a regularly scheduled meeting. But regularly scheduled meetings can have newsworthy things happen at them.

When an SVP asks you to do something in a mass email, it's very much optional. Dave Treadwell is an SVP; his org is likely in the tens of thousands, and there is no way to even have a mandatory meeting for that many people. My SVP asks me to do things all the time, indirectly. I do probably 5% of them.

> org is likely in the tens of thousands, there is no way to even have a mandatory meeting for that many people.

OK, this is pretty off-topic, but is this still true? I get that you can't have 10K people all actively participate in the meeting at the same time, but doesn't Zoom have a feature where you can broadcast to thousands and thousands? Doesn't X/Twitter have a feature like this? (Although, to be fair, the last time I heard about that it was part of a headline like "DeSantis announcement of Presidential run on X/Twitter delayed for hours as X/Twitter's tech stack collapses under 200K viewers.") But still, nowadays it seems like it should be possible to have 10K employees all tune in at the same time and call it a meeting, yes?
Yes, but at that point it's an all-hands presentation, and you are basically doing a very careful presentation, thinking about every minute, because of how many hours the "meeting" is costing you. Very different from the typical weekly/monthly outage meeting, where discussion is actually expected, instead of being a ritual.

> but doesn't Zoom have a feature where you can broadcast to thousands and thousands?

They have webinar/event support for 5,000+ participants; viewers can raise hands and use chat feedback for questions, and the meeting host can invite people to be visible.

The meeting isn't the hard part; after all, shareholder meetings have huge audiences too. Enforcing mandatory attendance for tens of thousands of employees is the hard part, so it's more likely mandatory in name only.

With tens of thousands in a meeting, cracking a 30-second stupid joke probably costs several thousand dollars.

Right, but if you say something essential in a meeting with 10 people and it has to percolate through five levels of management to reach the front lines and gets watered down, much more could be lost, even millions. Scale cuts both ways. What matters isn't how big the meeting is; it's how important the material is, and how well it is presented.

I don't think I've ever heard a top leader say anything essential in such a meeting. The stuff they work on is not related to my job at all. It's all Gartner-level strategy stuff. In our company they do take time talking about it in large calls, but it's always boring and never relevant, with a lot of political spin you have to poke through to see the real message. If I ever attend, I just put it on mute and look at the slides while I do some real work. That way my attendance gets registered and it doesn't stress me out later with too much stuff left hanging. That percolation is also translation of what they say into things that are relevant at my level.
Like what we will be working on next year, and whether there will be bonuses or job losses. I couldn't give a crap about the company's strategy as a whole, and that's not my job anyway. Why should I? I'm not here because I believe in some holy mission. I just wanna do something I like and get paid.

Most of those meetings are pretty damn fluffy. No one goes back to their desk and does anything different because they've introduced new company values and the acronym is S.M.I.L.E. But this meeting is a course correction for how they're using AI, which is a huge initiative. He'll be trying to sell the right balance of "keep using the technology, but don't fuck anything up." Too cautious, and everyone freezes and there's a slowdown[0]. Too soft, and everyone thinks it's "another empty warning not to fuck up," and they go right back to fucking everything up, because the real message was "don't you dare slow down." After the talk, people will have conversations about "what did they really mean?"

[0] If you hate AI, feel free to flip the direction of the effect.

Well, this is the main problem with AI right now, isn't it? How to use it successfully without having it fuck up. How are they expecting some juniors to do this when the industry as a whole doesn't know where to begin yet? Like that Meta AI expert who wiped her whole mailbox with openclaw. These are the people who should come up with the answers. PS: I mostly hate AI, but I do see some potential. Right now it feels like we're entering a fireworks bunker looking for a pot of gold, with only a box of matches for illumination.

What we need to know from management is exactly what you mention. Do we go all out and accept that shit will hit the fan once in a while (the old "move fast and break things"), or do we micromanage and basically work manually like before? And that they accept the risk either way. That kind of strategy is really business-leader work. Blaming it on your techs when it inevitably goes wrong is not.
Because the tech as it is right now is very non-deterministic. One day it works magic and the next day it blows up. And yes, that SMILE thing was a good example. Been in too many of those time-wasters.

Unless that 30-second stupid joke is what gets the audience to take your request seriously. Sometimes people will help you when you don't come across like a self-interested corporate tool.

I have never in my long life heard a joke from upper management during a meeting/presentation that wasn't awkward and cringe. Just get to the point: tell us how many people are getting fired, so the people who aren't fired can get back to work, and you can go back to running this company into the ground. Sorry, I got flashbacks...

If you assume everyone is making $100k, it only takes 20 people in a meeting for it to cost $1k an hour.

Wasn't it Shopify that had a system for tracking how much each meeting cost based on attendees? I may be misremembering the company, though.

I was thinking about this in recent weeks and I think I've actually changed my mind on it. It's not really possible to measure how much it would cost to not have a meeting, and I think it's pretty obvious that if there were no meetings ever, it would hurt a company a lot.

Yeah, I agree it's a silly metric. But it's also a good reminder that meetings do have a cost associated with them, so they should stay short, focused, and held only when necessary. "This could have been an e-mail" should never need to be said.

Is that because you delegate or descope? Why is an SVP doing this if it's just gonna be ignored? Are you saying SVPs' words are not important and should be ignored? This is not what I remember from back in the day when Bezos sent his email with a question mark (or maybe an exclamation mark).

That's not really what the headline attempts to communicate, though. It specifically emphasizes "mandatory" and "AI breaking things."
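As an aside on the meeting-cost arithmetic above ($100k salaries, 20 people ≈ $1k an hour): a minimal sketch of the attendee-based cost tracker described, assuming roughly 2,000 working hours per year. All figures are illustrative.

```python
# Rough salary cost of a meeting, as in the $100k-salary example above.
# Assumes ~2,000 working hours per year; all inputs are illustrative.

def meeting_cost(attendees: int, avg_salary: float, hours: float = 1.0,
                 working_hours_per_year: float = 2000.0) -> float:
    """Return the approximate salary cost of a meeting in dollars."""
    hourly_rate = avg_salary / working_hours_per_year
    return attendees * hourly_rate * hours

# 20 people at $100k for one hour, matching the comment above:
print(meeting_cost(20, 100_000))        # 1000.0
# A 10,000-person all-hands at the same average salary:
print(meeting_cost(10_000, 100_000))    # 500000.0
```

The metric is crude (it ignores opportunity cost and the cost of *not* meeting), which is exactly the objection raised in the thread.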
Nobody was going to click on "Regularly scheduled Amazon staff meeting will include discussion of operational improvement."

> He asked staff to attend the meeting, which is normally optional.

If I get a note from my boss like that, I consider it mandatory.

Yeah, I don't understand why people are pretending not to understand this:

> He asked staff to attend the meeting, which is normally optional.

clearly means that while normally the meeting would be optional, this time it's not.

But it gets less mandatory the more layers up you go. If I get an email from an SVP CC'd to the entire division saying everyone should go to a meeting, I will almost certainly be able to ascertain the contents of that meeting in 10 seconds from someone else who did attend.

Surely your boss notices your non-attendance. If it's actually really mandatory, my manager will probably also relay that directly to me. And that resets the count for "less mandatory the more layers up you go."

Starting to wonder if some people who complain about all-day meetings just don't realize they are optional.

The day is not far off when my agents will attend meetings, share my opinions, and collect a summary for me. If everyone does the same, agents run the meetings and share summaries with their parent humans. Each of us has LLMs/agents with our contextual data. It is another level of multitasking.

Then I spin up another agent to listen to the agent who went to the meeting and make any necessary adjustments to the output of my coding agents based on the new rules it heard about from the meeting agent.

>> He asked staff to attend the meeting, which is normally optional.
> Is that false?

Judging from the comment above, no: the meeting happens every week, and this week they were asked to attend. It's not false. But it's also weaselly worded. Note that the article doesn't say that he told staff they have to attend the meeting. It says he "asked" staff to attend the meeting. Which, again, it's really normal for there to be an encouragement of "hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events."

As for the second quote: senior engineers have always been required to sign off on changes from junior engineers. There's nothing new there. And there is nothing specific to AI that was announced. This entire meeting and message is basically just saying, "hey, we've been getting a little sloppy at following our operational best practices; this is a reminder to be less sloppy." It's a massive nothingburger.

> It says he "asked" staff to attend the meeting

Being "asked" by your boss to attend an optional meeting is pretty close to being required; it's just got a little anti-friction coating on it.

That really isn't the culture at Amazon. There are all-team meetings that happen all the time, and every now and then there is a reminder that "hey, we're gonna be talking about an interesting topic, so you might want to join," but it is certainly not a mandate or expectation that everyone will join. Different companies have different cultures. Weird that people can't grok this.

"If you could just go ahead and attend that meeting, that would be greaaaaaaat..." "Did ya get the memo... about that meeting? I'll just have my secretary forward you another copy of that memo, OK? Yeaaaaaaah..."

Your characterization of the event as a simple reminder to follow established best practices is directly contradicted by the briefing note of the meeting, which specifically mentions a lack of best practices related to AI.
Which makes me skeptical of your assessment of the situation in general.

> Under "contributing factors" the note included "novel GenAI usage for which best practices and safeguards are not yet fully established".

> senior engineers have always been required to sign off on changes from junior engineers.

Definitely a team-by-team question. If it were required, it would be a crux rule that the code review isn't approved without an L6 approver.

It's part of the change-management process that all code is reviewed. This is required by several different compliance agreements. What's probably happened is that poor peer reviews from other junior engineers got missed.

That's a lot of code reviews to send upstream.

It didn't seem to make the news, but at least in NYC the entire Amazon storefront was broken all afternoon on Friday. Items weren't displaying prices, and it was impossible to add anything to your cart. It lasted from about 2pm to 5pm. It's especially strange because if a computer glitch had brought down a large retail competitor like Walmart, I probably would have seen something, even though their sales volume is lower.

Over the weekend I was trying to return a pair of shoes and get a different size, and I kept getting 500s trying to go to the store page for the shoes.

Funny, I was automatically refunded for a pair of shoes that Amazon thought I never received, even though I'm wearing them right now. I couldn't even find a way to dispute the refund, so I just took the win... That explains why it kept changing the estimated delivery date. It was doing weird things.

Sometimes you squeeze clay and it comes out the oddest places. There were other stressors last week. https://www.pcmag.com/news/amazon-cloud-services-disrupted-i...

A little birdie told me someone pushed duplicate data into one of Amazon's core NoSQL systems that runs most of e-commerce. The front end of the site broke in weird ways, but it certainly wasn't taking orders.
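The "no approval without an L6 approver" rule mentioned above can be sketched as a simple merge gate. This is illustrative only: the level numbers follow the comment, but the function and its policy details are hypothetical, not Amazon's internal review tooling.

```python
# Illustrative sketch of a "senior sign-off" merge gate for AI-assisted
# changes. The L6 threshold is from the thread; everything else is made up.

SENIOR_LEVEL = 6  # hypothetical: an L6 engineer or above counts as senior

def may_merge(author_level: int, approver_levels: list[int],
              ai_assisted: bool) -> bool:
    """AI-assisted changes from junior/mid-level authors need a senior
    approver; any other change just needs at least one approval."""
    if not approver_levels:
        return False
    if ai_assisted and author_level < SENIOR_LEVEL:
        return any(level >= SENIOR_LEVEL for level in approver_levels)
    return True

# An L4's AI-assisted change with only an L5 approval is blocked...
print(may_merge(4, [5], ai_assisted=True))       # False
# ...but passes once an L6 signs off.
print(may_merge(4, [5, 6], ai_assisted=True))    # True
```

The public analogue would be a branch-protection rule requiring review from a designated group before merge; the point of the comment is that no such hard rule existed, only process convention.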
It's always sobering to see a news story about something you have an insider perspective on. I am not in that specific meeting, but it made me chuckle that a weekly ops meeting would somehow get media attention. It's been an Amazon thing forever. Wait until the public learns about CoEs!

A weekly ops meeting where they talk about ensuring PRs with AI contributions get extra scrutiny? I think that's significant news.

Exactly. This is real-world pushback on the "software is solved" narrative from AI labs. Also, most orgs try to copy Amazon for some reason, more than other big tech firms. "At our org, we disagree and commit"... yeah, you made that one up yourself. Anyway, this is going to have a lot of impact in my view.

There was nothing mentioned in the meeting or messaging about PRs with AI contributions. There are no extra requirements for review or scrutiny of AI-generated code. The media reports have been excessively misleading about this. It's not extra scrutiny. Doing code reviews for every commit is a standard practice at Amazon and has been for a decade plus.

I'd expect CoEs to be coming up with AI code action items, though, not more thorough human checks.

There's an explicit tension: SWEs would love that as a "get out of jail free" card, but their management chain is being evaluated by ajassy on AI/ML adoption. Admitting AI code as the root cause of a CoE is gonna look really bad unless/until your peers are also copping to it.

I think it's question 2 or 3 in a why chain, but 4 and 5 need to be why the agent screwed up, and there need to be action items around giving the AI better guardrails, context, or tooling. "Get a person to look at it" is a cop-out action item, and best intentions only: nothing you could actually apply to make development better across the whole company.

> Feels like the media is making a mountain out of a molehill here.

That's been their job ever since cable news was invented.

It's been a bit longer than that.
https://en.wikipedia.org/wiki/Yellow_journalism

It probably goes back as long as they have been shouting news in the town square in Rome, or before that even. Word around the campfire is that telling stories and exaggerating them to get people's attention is as old as humanity. But good journalism is still something else.

This reply chain is confusing, but I'm guessing it got merged from another thread that had a different title? Must have, as the comments are hours older than the OP.

> This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a molehill here.

Are you completely missing the point of the submission? It's not about "Amazon has a mandatory weekly meeting" but about the contents of that specific meeting: AI-assisted tooling leading to "trends of incidents," having a "large blast radius," and "best practices and safeguards are not yet fully established." No one cares how often the meeting is held in general, or whether it's mandatory or not.

> Are you completely missing the point of the submission

No, and that's what people are noting: the headline deliberately tries to blow this up into a big deal. When did you last see an HN post about Amazon's mandatory meeting to discuss a human-caused outage, or a post-mortem? It's not because they don't happen...

Amazon has had a really bad string of outages recently. Assuming they're internally treating this as business as usual in post-mortems, then perhaps the newsworthy thing is actually that they aren't taking their outages seriously enough.

> the headline deliberately tries to blow this up into a big deal

I do not understand how "company that runs half the internet has had major recent outages and now explicitly names lax or non-existent LLM usage guidelines as a major reason" can possibly not be a big deal in the midst of an industry-wide hype wave over how the world's biggest companies now run agent teams shipping 150 pull requests an hour.
The chain of events is "AWS has been having a pretty awful time as far as outages go," and now "the result of an operational meeting is that the company will cut down on the use of autonomous AI." You don't need CoT-level reasoning to come to the natural conclusion here.

If we could, as a species, collectively, stop measuring the relevance of a piece of news by how much we like hearing it, please?

The defensiveness is almost as interesting as the meeting itself. Way too many people have tied their egos to the success of AI.

And too many people have their egos tied to its failure, too.

I'm a massive AI skeptic. If anyone were to be jumping up and down on the corpse of AI and this incessant drive to use it everywhere, it'd be me. But I also work at Amazon. I got the email. I attended the meeting. I can personally attest that there are no new requirements for AI-generated code. The articles about this meeting are extremely misleading, if not outright wrong. But instead of believing the person who was actually in the room, this thread is full of people dismissing my first-hand account of the situation because it doesn't align with the "haha, AI failed" viewpoint.

Not just their egos, but their paychecks. This place is either going to get very quiet or really weird when the hype train derails and the AI bubble bursts.

The subject of the media coverage is not AWS; it is a peer organization to AWS that runs on significant amounts of non-AWS infrastructure. They are both part of an umbrella called Amazon but are not at all the same thing. Maybe your CoT-level reasoning isn't so robust.

It's hard to take this objection seriously. The publication is literally called the Financial Times. It's not exactly crazy for them to think that their readers might care about the entity that shows up in the stock ticker rather than how the company happens to divide things up internally.
Even if it weren't a finance publication, I have trouble imagining you making this argument about a headline like "Google deals with outages in the cloud" on the grounds that it's misleading to refer to it as anything other than GCP. I think you're fundamentally not understanding how people communicate about this sort of thing if you actually think that someone saying "Amazon" is misleading in any meaningful way.

The message and meeting being discussed here have nothing to do with AWS or any outages AWS has faced recently.

I think you're missing the point of the discussion. I don't blame you, because this is just bad reporting (and potentially intentionally malicious, to make you think it's about AWS). But the meeting and discussion were with the Amazon retail teams, about Amazon retail processes and Amazon retail services. The teams and processes that handle this are entirely separate from any AWS outages you are thinking of. The outages that Amazon retail has faced also have nothing to do with AI, and there was no "explicit call out" about AI causing anything.

This is correct. We ran them on Wednesdays in Alexa. Jassy actually used to come and sit in ours once a quarter or so when he was running AWS.

The core message of the article is that Amazon has been having issues with AI slop causing operational reliability concerns, and that seems to be 100% accurate.

What has really happened is that those employees were made into "reverse centaurs": https://www.theguardian.com/us-news/ng-interactive/2026/jan/...

Who is the media you're accusing here? This is a Twitter post. As far as I can tell, the poster does not work at a media company. What is worth pointing out is how quickly people blame "the media" for how people use, consume, and spread information on social networks.

The source is not a Twitter post; it's a Financial Times article (which the poster failed to cite).
I believe it varies by group. AWS started the weekly operations meeting; effectively, every service's on-call from the last week had to attend. Then it grew massive, so they made it optional. Alexa had a similar meeting that tried to replicate what AWS did. A lot of time was spent reviewing load tests getting ready for the holiday season, Prime Day, and the Super Bowl (Super Bowl ads used to cause crazy TPS spikes for Alexa). And a lot of finger-pointing if there was an outage from one team. While it probably did help raise the operational bar, so much engineer time was wasted on busywork/paperwork documenting an error or fix instead of improving the actual service.

> Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off

Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching what they'd spend if they had just done it themselves. I.e., senior review is valuable, but it does not make bad code good. This is one major facet of probably the single biggest problem of the last couple of decades in system management: the misunderstanding by management that making something idiot-proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot-proof").

When I was really early in my career, a mentor told me that code review is not about catching bugs but spreading context (i.e., increasing the bus factor). Catching bugs is a side effect, but unless you have a lot of people review each pull request, it's basically just gambling. The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests at more levels, and spend time reducing code complexity.
The problem, I think, is that people don't get promoted for preventing issues.

This depends on the industry. I work on industrial machine-control software, and we spend a huge amount of time on tests. We have to for some parts (human-safety critical), but other parts would just be expensive if they failed (loss of income for customers, and possibly damaged equipment). The key to making this scalable is to make as few parts as possible critical, and to make the potential bad outcomes as benign as possible. (This lets you go to a lower rating in whatever safety standard applies to your industry.) You still need tests for the less critical parts, though: while downtime is better than injury, if you want to sell future machines to your customers, you need a good track record. At least if you don't want to compete on cost.

> make as few parts as possible critical, and make the potential bad outcomes as benign as possible

This is a good lesson for anyone, I think. Definitely something I'm going to think more about. Thanks for sharing!

One of the major things code review does is prevent that one guy on your team who is sloppy or incompetent from messing up the codebase, without singling him out. If you told someone, "I don't trust you, run all code by me first," it wouldn't go well. If you tell them "everyone's code gets reviewed," they're OK with it. Everyone is sloppy sometimes.

I wonder if what code review does is limit velocity (act as a brake) so that things don't change too fast (which is often a good thing). You don't get paid for features or code shipped. People don't pay $200 a head for fine dining based on the number of carrot chops or garlic crushes. The chops and crushes are necessary, but not what you should be optimizing for.

> people don't get promoted for preventing issues.

They do, but only after a company has been burned hard. They can also be promoted for their area being enough better that everyone notices.
Still, the best way to a promotion is to write a major bug that you can come in at the last moment and be the hero for fixing.

That could work, but plenty of quiet heroes weren't promoted for fixing critical bugs. They fixed it too soon. You have to wait until the effect is visible on someone's dashboard somewhere.

Goodhart's law strikes again: "When a measure becomes a target, it ceases to be a good measure."

You have to make sure it doesn't arrive at you before it is on the dashboard. Otherwise you are the reason the time-to-fix-a-bug metric is blowing up. Unless you can make the problem so obscure that other smart people asked to help can't figure it out, thus making you look bad.

That is in no way guaranteed. Sometimes finding too many security issues makes you unpopular. Two years afterward, we got hit with ransomware. And obviously "I told you so" isn't a productive discussion topic at that point.

That's not preventing the issue, though. The closest you can get to this is to have a competitor be burned hard and demonstrate how your code base has the exact same issue. But even that isn't guaranteed. "That can't happen here" is a hard mindset to disrupt unless you yourself are already in the C-suite.

Code reviews are great for spreading context, but they are also very good at finding bugs. If you want to find bugs, review is one of the best ways to do it. https://entropicthoughts.com/code-reviews-do-find-bugs

I think of code review as more about ensuring understandability. When you spend hours gathering context, designing, iterating, debugging, and finally polishing a commit, your ability to judge the readability of your own change has been tainted by your intimate familiarity with it.
Getting a fresh pair of eyes to read it and leave comments like "why did you do it this way" or "please refactor to use XYZ for maintainability" leaves you with something that will be easier to navigate and maintain by the junior interns who will end up fixing your latent bugs five years later.

> The problem, I think, is people don't get promoted for preventing issues.

Cleaning up structural issues across a couple of orgs is a senior => principal promo; I've seen it a couple of times.

> When I was really early in my career, a mentor told me that code review is not about catching bugs but spreading context (i.e. increasing bus factor.) Catching bugs is a side effect

This BS is what I tell my juniors when I want them to fuck off with their reviews and focus on my actual work. Sounds very insightful, though.

Expert reviews are just about the only thing that makes AI-generated code viable, though doing them after the fact is a bit sketchy; to be efficient, you kind of need to keep an eye on what the model is doing as it works. Unchecked, AI models output code that is as buggy as it is inefficient. In smaller greenfield contexts, it's not so bad, but in a large code base it performs much worse, as it does not have access to the bigger picture. In my experience, you should be spending something like 5-15x the time the model takes to implement a feature on reviewing it and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually be high quality, correct, and good. If you do not do that due diligence, the model will produce a staggering amount of low-quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.

If you spend 5-15x the time reviewing what the LLM is doing, are you saving any time by using it?
No, but that's the crux of the AI problem in software. Time to write code was never the bottleneck. AI is most useful for learning, either via conversation or by seeing examples. It makes writing code faster too, but only a little once you take review into account. The cases where it shines are high-profile and exciting to managers, but not common enough to make a big difference in practice. E.g., AI can one-shot a script to fetch logs from a paginated API, convert them to ndjson, and save to files grouped by week, with minimal code review, but only if I'm already experienced enough to describe those requirements, and, most importantly, that's not what I'm doing every day anyway.

I'm finding that in some cases I'm dealing with even more code, given how much code AI outputs. So yeah, for some tasks I find myself extremely fast, but for others I find myself spending ungodly amounts of time reviewing code I never wrote to make sure it doesn't destroy the project with unforeseen, convincing slop.

A related dirty secret that's going to become clear from all this is that a very large proportion of code in the wild (yes, even in 2026; maybe not in FAANG and friends, but across all code written for pay in the entire economy) has limited or no automated test coverage, and is often written against only a limited recorded spec, usually fleshed out only to the (very partial) degree needed as a given feature is being worked on. What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage and extensive specs. I think we're going to find there's very little time savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written and much more detailed specs that otherwise never would have been generated.
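The 5-15x review-multiplier question a few comments up reduces to simple arithmetic. A sketch with placeholder numbers; none of these durations are measurements, just an illustration of the break-even point:

```python
# Net wall time for an AI-assisted change vs. writing it by hand.
# The 5-15x review multiplier is from the thread; all durations are
# hypothetical placeholders.

def ai_assisted_time(model_minutes: float, review_multiplier: float) -> float:
    """Total time: model generation plus human review/fix-up."""
    return model_minutes + model_minutes * review_multiplier

def saves_time(model_minutes: float, review_multiplier: float,
               handwritten_minutes: float) -> bool:
    """Does the AI-assisted path beat the hand-written estimate?"""
    return ai_assisted_time(model_minutes, review_multiplier) < handwritten_minutes

# A 10-minute generation with 15x review time costs 160 minutes total:
print(ai_assisted_time(10, 15))      # 160
# ...a net loss vs. an estimated 120 minutes by hand at the high end,
print(saves_time(10, 15, 120))       # False
# but still a win if review only takes 5x.
print(saves_time(10, 5, 120))        # True
```

The point the thread makes is that the multiplier, not the generation speed, dominates the outcome, and it varies wildly by task.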
I guess the bright-side take on this is that we may end up with better-tested and better-specified software? Though so much of the industry is used to skipping those parts, especially the less capable (as far as software goes) orgs that really need the help, and the relative amateurs and non-software-professionals some hope will become extremely productive with these tools, that I'm not sure we'll manage to drag processes and practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs." We may end up stuck at "it's very aggressive autocomplete" as far as LLMs' useful role goes, for most projects, indefinitely. On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.

> better-specified software

Code is the most precise specification we have for interfacing with computers.

Sure, but if you define the code as the only spec, then it is usually a terrible spec, since the code specifies its bugs too. And one of the benefits of having a spec (or tests) is that you have something against which to evaluate the program in order to decide whether its behavior is correct. Incidentally, I think in many scenarios LLMs are pretty good at converting code to a spec, and indeed spec to code (of quality equal to that of the input spec).

There are some cases where AI is generating binary machine code, albeit in small amounts. What do we have when we don't have the code?

Machine code is still code, even if the representation is a bit less legible than the punch cards we used to use.
You’re missing the point of a spec. The spec is as much for humans as it is for the machine, yes? A spec should be made beforehand and agreed on by stakeholders. It says what the software should do, so it’s for whoever is implementing, modifying, and/or testing the code. And unfortunately devs have a tendency toward poor documentation.

Re: productivity, if LLMs are a genuine boost for 1/3 of the work, neutral 1/3 of the time, and actually worse 1/3 of the time, it's likely we aren't really seeing performance improvements yet, because 1) people are using them for everything and 2) we're still learning how to best use them. So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and CEOs are expecting.

Bingo. Hopefully there are some business opportunities for us in that truth.

> because the time will just go into tests that wouldn't otherwise have been written

Writing tests to ensure a program is correct is the same problem as writing a correct program.

Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance, not correctness.

Ensuring correct programs is like cleaning, in the sense that you can only push dirt around; you can't get rid of it. You can push uncertainty around, but you can't eliminate it. This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication. As Douglas Adams noted: ultimately you've got to know where your towel is.

A competent programmer proves the program he writes correct in his head. He can certainly make mistakes in that, but it’s very different from writing tests, because proofs abstract (or quantify) over all states and inputs, which tests cannot do.

These companies don't care about saving time or lowering operating costs; they have massive monopolies to subsidize their extremely poor engineering practices with.
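The Amdahl's-law point above can be made concrete with a back-of-envelope calculation. Assume, purely for illustration (these numbers are invented, not from the comment), that LLMs double speed on 1/3 of the work, change nothing on 1/3, and slow the remaining 1/3 to 0.8x:

```python
def overall_speedup(fractions_and_speedups):
    """Amdahl-style composition: each (fraction_of_work, local_speedup)
    pair contributes fraction/speedup to the new total time."""
    new_time = sum(frac / speed for frac, speed in fractions_and_speedups)
    return 1 / new_time


# Hypothetical mix: 1/3 doubled, 1/3 unchanged, 1/3 slowed to 0.8x.
mix = [(1 / 3, 2.0), (1 / 3, 1.0), (1 / 3, 0.8)]
print(f"{overall_speedup(mix):.2f}x")  # prints "1.09x"
```

A 2x boost on a third of the work nets out to roughly a 9% overall speedup under these assumptions, which is the gap between headline demos and measured team throughput that the comment is pointing at.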
If the mandate is to force LLM usage or lose your job, you don't care about saving time; you care about saving your job.

One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do other than copy each other. It has to end.

The submitter with their name on the Jira ticket saves time; the reviewer who has to actually verify the work loses a lot of time, and likely just lets issues slip through.

To be honest, sometimes it's still beneficial. For fairly straightforward changes it's probably a wash, but ironically enough, it's often the trickier jobs where LLMs can be beneficial, as they will provide an ansatz that can be refined. They're also very good at tedious chores.

And spotting stuff in review! Sometimes it’s false positives, but on several occasions I’ve spent ~15-30 minutes teaching-reviewing a PR in person, checked afterwards, and the AI matched every one of the points.

Some, but not very much. Writing code is hard. AI will do a lot of the tedious code that you procrastinate writing. Also, when you are writing code yourself, you are implicitly checking it while, at the back of your mind, retaining some form of the entire system as a whole. People seem to gloss over this... As a CEO, if people didn't function like this I'd be awake at night sweating.

That’s the reverse-centaur issue I see: humans are not great at repetitive, nuanced, similar-seeming tasks, and putting the onus on humans to retroactively approve high volumes of critical code has them managing a critical failure mode at their weakest and worst. Automated reviews should be enhancing known good-faith code; manual review of high volumes of superficially sound but subversive code is begging for issues over time.
Which results in the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place. An LLM workflow that yields a 10x engineer but psychopathically lies and sabotages client-facing processes/resources once a quarter is likely a NNPP (net negative producing programmer) once opportunity and volatility costs are factored in.

> Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place

The math depends on the importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time; the important thing is that the app was delivered in a week a year ago and has been solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s?) manual development that would take half a year and cost millions. A mistake in nuclear reactor control code would be a totally different thing: whatever time savings you made on coding are irrelevant if they allowed a critical bug to slip through. Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from coding with LLMs. And there are more axes than this low-to-high failure cost that affect the math. For example, even a non-important but large app will likely soon degrade into an unmanageable state if developed with too little human intervention, and you will be forced to start from scratch, losing a lot of time.

I have found AI extremely good at finding all those really hard bugs, though. AI is a greater force multiplier when there is a complex bug than in greenfield code.

Sort of. I work on a system too large for anyone to know the whole thing.
Often people who don't know each other do something that will break the other. (Often because of the sheer number of different people involved; most individuals go years between causing such a break.)

No, I’m keeping up with the system as a whole, because I’m always working at a system level when I’m using AI instead of worrying about the “how”.

No, you’re not. The “how” is your job to understand, and if you don’t, you’ll end up like the devs in the article. We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.

No, in my case the “how” is:

1. I spoke to sales to find out about the customer
2. I read every line of the contract (SOW)
3. I did the initial requirements gathering with the client over a couple of days - or maybe up to 3 weeks
4. I designed every single bit of AWS architecture and code
5. I did the design review with the client
6. I led the customer acceptance testing

> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding.

I assure you the mid-level developers, or god forbid the foreign contractors, were not “experts”. With 30 years of coding experience and, at the time, 8 years of pre-LLM AWS experience, it’s been well over a decade - ironically starting before LLMs - since my responsibility was only for code I wrote with my own two hands.

Yes, and trusting an LLM here is not a good idea. You know it will make important mistakes. I’m not saying trusting cheap devs is a good idea either; I do think cheap devs are actually at risk here.