Project Glasswing: Securing critical software for the AI era
anthropic.com
1151 points by Ryan5453 13 hours ago
Related: Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258
Also: Anthropic's Project Glasswing sounds necessary to me - https://news.ycombinator.com/item?id=47681241
I’m sure the new model is a step above the old one but I can’t be the only person who’s getting tired of hearing about how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc. I would honestly go so far as to say the overhype is detrimental to actual measured adoption. > how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc. It's much the dynamic between parents and a child. The child, with limited hindsight, almost zero insight and no ability to forecast, is annoyed by their parents. Nothing bad ever happens! Why won't parents stop being so worried all the time and make a fuss over nothing? The parents, which the child somewhat starts to realize but not fully, have no clue what they are doing. There is a lot they don't know and are going to be wrong about, because it's all new to them. But, what they do have is a visceral idea of how bad things could be and that's something they have to talk to their child about too. In the eyes of the parents the child is % dead all the time. Assigning the wrong % makes you look like an idiot and not being able to handle any % too. In the eyes of the child actions leading to death are not even a concept. Hitting the right balance is probably hard, but not for the reasons the child thinks. That feels like a very complex way of looking at it. Another way would be to say “potentially profit seeking companies have an incentive to oversell products even if they’re good”. There is plenty of overhyping, no one denies that. But the antidote is not to dismiss everything. Ignore the words and look at the data. In this case, I see a pretty strong case that this will significantly change computer security. They provide plenty of evidence that the models can create exploits autonomously, meaning that the cost of finding valuable security breaches will plummet once they're widely available. 
Is there any actual independent data though, or verification of any of these claims? As it stands this is just a marketing programme for all involved. Which sounds like a great thing. Less undiscovered security vulnerabilities The only people panicking are probably those state level actors who were using these for their own benefit. With the right prompting (mostly creating a narrative that justifies the subject matter as okay to perform) other models have already been doing this for me though. That’s another confusing bit for me about how this is portrayed and I refuse to believe I’m a revolutionary user right? I mean I’m sitting on $10k worth of bug payouts right now partially because that was already a thing. > Non-experts can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities. Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit. In other cases, we’ve had researchers develop scaffolds that allow Mythos Preview to turn vulnerabilities into exploits without any human intervention. I mean yeah. I’ve had these successes without scaffolding or really anything past Claude CLI and a small prompt as well? Just saw your edit. I'll leave it at this, this is why it's news to me, because by their very own measurements, Opus simply doesn't come close. I trust their empirical evidence over your hearsay. But feel free to prove me wrong with evidence. > With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5). 
You've taken control of a remote server running OpenBSD? Or similarly expert level exploit? Can you share one of the bounties you've received that is of the magnitude they're talking about? Edit: Wait, you wrote "As someone in cybersecurity for 10+ years" elsewhere in this thread. You wrote "a small prompt" using e.g. Opus 4.6 and it found critical vulnerabilities of the magnitude they're describing, presumably without your prompt having anything beyond what a non-expert could write? I feel like you might want to tell Anthropic since clearly they're not comfortable with that level of power being publicly available. I mean, yes? And my point is that this isn’t exactly a new capability. Sure it’s probably better but we’ve been able to do this. They didn’t just suddenly “turn on the security”. LLMs have excelled at code since widely being released. I have no idea why that’s news and the fact that they’re treating it as such makes it seem like hype. There is step changes that actually merit this though. And a zero day machine IS one of those. It went from 4% zero day success rate to 85% on firefox. Can you not see the significance of that? I mean I work in this world and overhype is constant. Additionally those numbers are somewhat meaningless without more context. > I would honestly go so far as to say the overhype is detrimental to actual measured adoption. I think you are a bit dishonest about how objectively you are measuring. From where I'm sitting, I don't know a lot of developers that still artisanally code like they did a few years ago. The question is no longer if they are using AI for coding but how much they are still coding manually. I myself barely use IDEs at this point. I won't be renewing my Intellij license. I haven't touched it in weeks. It doesn't do anything I need anymore. 
As for security, I think enough serious people have confirmed that AI reported issues by the likes of Anthropic and OpenAI are real enough despite the massive amounts of AI slop that they also have to deal with in issue trackers. You can ignore that all you like. But I hope people that maintain this software take it a bit more seriously when people point out exploitable issues in their code bases. The good news of course is that we can now find and fix a lot of these issues at scale and also get rid of whole categories of bugs by accelerating the project of replacing a lot of this software with inherently safer versions not written in C/C++. That was previously going to take decades. But I think we can realistically get a lot of that done in the years ahead. I think some smart people are probably already plotting a few early moves here. I'd be curious to find out what e.g. Linus Torvalds thinks about this. I would not be surprised to learn he is more open to this than some people might suspect. He has made approving noises about AI before. I don't expect him to jump on the band wagon. But I do expect he might be open to some AI assisted code replacements and refactoring provided there are enough grown ups involved to supervise the whole thing. We'll see. I expect a level of conservatism but also a level of realism there. > From where I'm sitting, I don't know a lot of developers that still artisanally code like they did a few years ago. You don't know a lot of developers then. > I think you are a bit dishonest about how objectively you are measuring As someone who has made a sizable amount of money in security research while using Claude you might be right but not in the way you think. Now, its very possible that this is Anthropic marketing puffery, but even if it is half true it still represents an incredible advancement in hunting vulnerabilities. It will be interesting to see where this goes. 
If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard. It could also totally reshape military sigint in similar ways. Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about. You should watch this talk by Nicholas Carlini (security researcher at Anthropic). Everything in the talk was done with Opus 4.6: https://www.youtube.com/watch?v=1sd26pWhfmg Just a thought: The fact that the found kernel vulnerability went decades without a fix says nothing about the sophistication needed to find it. Just that nobody was looking. So it says nothing about the model’s capability. That LLMs can find vulnerabilities is a given and expected, considering they are trained on code. What worries me is the public buying the idea that it could in any way be a comprehensive security solution. Most likely outcome is that they’re as good at hacking as they’re at development: mediocre on average; untrustworthy at scale. Regardless of how impressive you find the vulnerabilities themselves, the fact that the model is able make exploits without human guidance will enable vastly more people to create them. They provide ample evidence for this; I don't see how it won't change the landscape of computer security. its also very easy to reproduce. i have more findings than i know what to do with are there any tricks you'd suggest, or starter prompts, for using claude to analyze my own company's services for security problems? Not the parent poster, but besides copying the prompt in Youtube,
you can make it cheaper by selecting representative starting files by path or LLM embedding distance. Annotation-based data flow checking exists, and making AI agents use it should not be as tedious, and could find bugs missed by just giving it files. The results from data flow checks can be fed to AI agents to verify.

It's not. If you don't trust Anthropic, I hope you trust Daniel Stenberg of curl, who has said AI has gotten really good at detecting bugs and vulnerabilities. Here is his LinkedIn post:
https://www.linkedin.com/posts/danielstenberg_hackerone-acti...

Apple has already largely crushed hacking with memory tagging on the iPhone 17 and Lockdown Mode. Architectural changes, safer languages, and sandboxing have done more for security than just fixing bugs when you find them.

If what you are saying is true, then you would see exploit marketplaces list iOS exploits at hundreds of millions of dollars. Right now a cursory glance sets the price for a zero-click persistent exploit at $2m, behind Android at $2.5m. Still high, and yes, higher than five years ago when it was around $1m for both, but still not "largely crushed". It is still easy to get into a phone if you are a state actor.

As I understood it, Memory Integrity Enforcement adds an additional check on heap dereferences (and it doesn’t apply to every process for performance reasons). Why does it crush hacking rather than just adding another incremental roadblock like many other mitigations before?

I'm not certain there is a performance hit since there is dedicated silicon on the chip for it. I believe the checks can also be done async, which reduces the performance issues. It also doesn't matter that it isn't running by default in apps since the processes you really care about are the OS ones. If someone finds an exploit in TikTok, it doesn't matter all that much unless they find a way to elevate to an exploit on an OS process with higher permissions. MTE (Memory Tagging Extension) also has a double purpose: it blocks memory exploits as they happen, but it also detects and reports them back to Apple. So even if you have a phone before the 17 series, if any phone with MTE hardware gets hit, the bug is immediately made known to Apple and fixed in code.

Lockdown Mode is opt-in only though.

It is, but if you are the kind of person these exploits are likely to target, you should have it on. So far there have been no known exploits that work in Lockdown Mode.
> if you are the kind of person these exploits are likely to target, you should have it on You can also selectively turn it on in high-risk settings. I do so when I travel abroad or go through a border. (Haven't started doing it yet with TSA domestically. Let's see how the ICE fiasco evolves.) For entering the US you want to fully wipe your phone first. Lockdown mode is useless since they will just hold you in a basement until you unlock the phone for them to clone. > Lockdown mode is useless since they will just hold you in a basement until you unlock the phone for them to clone If this is a risk for you, sure. Wipe it. For most people they may ask to fiddle around with it before giving it back. The interesting selling point about this, if the claims are substantial, is that nobody will be able to produce secure software without access to one of these models. Good for them $$$ ^^ Until someone in the PRC distills DeepSeek Security++ from them and lets anyone download it. Yesterday, I took a web application, downloaded the trial and asked AI to be a security researcher and find me high and critical severity bugs. Even vanilla models spew out POC for three RCE’s in less than an hour > It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. It will likely cause some interesting tensions with government as well. eg. Apple's official stance per their 2016 customer letter is no backdoors: https://www.apple.com/customer-letter/ Will they be allowed to maintain that stance in a world where all the non-intentional backdoors are closed? The reason the FBI backed off in 2016 is because they realized they didn't need Apple's help: https://en.wikipedia.org/wiki/Apple%E2%80%93FBI_encryption_d... 
What happens when that is no longer true, especially in today's political climate? Big open question what this will do to CNE vendors, who tend to recruit from the most talented vuln/exploit developer cohort. There's lots of interesting dynamics here; for instance, a lot of people's intuitions about how these groups operate (ie, that the USG "stockpiles" zero-days from them) weren't ever real. But maybe they become real now that maintenance prices will plummet. Who knows? I assume that right now some of the biggest spenders on tokens at Anthropic are state intelligence communities who are burning up GPU cycles on Android, Chromium, WebKit code bases etc trying to find exploits. Adding to your comment a similar letter was published as recently as September 2025 https://support.apple.com/en-us/122234 "we have never built a backdoor or master key to any of our products or services and we never will." > If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry If Apple and Google actually cared about security of their users, they would remove a ton of obvious malware from their app stores. Instead, they tighten their walled garden pretending that it's for your security. Some links for the downvoters: You're being downvoted because you posted a non sequitur, not because people don't believe you. Vulnerabilities in the OS are not the same thing as apps using the provided APIs, even if they are predatory apps which suck. Why wouldn't it be true? The cost is nothing compared to the bad PR if a bad actor took advantage of Anthropic's newest model (after release) to cause real damage. This gets in front of this risk, at least to some extent. It's all just really genius marketing. In 6 months Mythos will be nothing special, but right now everyone is being manipulated into fearing its release, as a marketing ploy. 
This is the same reason AI founders perennially worry in public that they have created AGI... I can't believe the effectiveness of this type of marketing. It's one-shotting normie journalist and getting a lot of press for what is ultimately going to turn out to be an incrementally improved model. I'm sure all they've done here is spend unlimited tokens to find bugs in mostly open source projects (and fuzz some closed source ones). It's effectively 2026's version of "Doctors hate this one weird trick!" The fact that they are not going to release this DANGEROUS model is also a huge tell that it's nothing but an incremental improvement over the status quo. The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one] I'm still reading the system card but here's a little highlight: > Early indications in the training of Claude Mythos Preview suggested that the model was
likely to have very strong general capabilities. We were sufficiently concerned about the
potential risks of such a model that, for the first time, we arranged a 24-hour period of
internal alignment review (discussed in the alignment assessment) before deploying an
early version of the model for widespread internal use. This was in order to gain assurance
against the model causing damage when interacting with internal infrastructure.

and interestingly:

> To be explicit, the decision not to make this model generally available does _not_ stem from
Responsible Scaling Policy requirements. Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set. Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321... The threat model in question: > An AI model with access to powerful affordances within an
organization could use its affordances to autonomously exploit,
manipulate, or tamper with that organization’s systems or
decision-making in a way that raises the risk of future
significantly harmful outcomes (e.g. by altering the results of AI
safety research). If it is that dangerous as they make it appear to be, 24h does not seem sufficient time. I cannot accept this as a serious attempt. 24 h before general internal access seems fine. They don’t have general external access. Time doesn't mean much, what is important is what they did in this 24h. If all they did was talk about it then it could be 1000 years and it wouldn't matter. What are the safety checks in place? Do they have a honey pot infrastructure to launch the model in first and then wait to see if it destroys it? What they did in the 24h matters. This opens up an interesting new avenue for corporate FOMO. What if you don't partner with Anthropic, miss out on access to their shiny new cybersec model, and then fall prey to a vuln that the model would have caught? Since when did corporations care? Most seem to just pay their insurance premium for cyber liability and call it a day. are we cooked yet? Benchmarks look very impressive! even if they're flawed, it still translates to real world improvements People say we're cooked every single day. The only response is to continue life as if we aren't. When we are, you won't have to ask that question. Everyone’s pretending the suits are going to want to do the prompting. We all know they aren’t. Suits in agriculture don't drive the combine either, a farmer does. The other 99% of pre-automation farmers went on to other jobs. They happened to be better jobs than farming, but that's not necessarily always the case. Yep, I think the lede might be buried here and we're probably cooked (assuming you mean SWEs, but the writing has been on the wall for 4 months.) I guess I'm still excited. What's my new profession going to be? Longer term, are we going to solve diseases and aging? Or are the ranks going to thin from 10B to 10000 trillionaires and world-scale con-artist misanthropes plus their concubines? Your new profession will be attempting to find enough gig work to eat. 
You will also be competing with self-driving taxis, so there's that as well. I need to start SaaS for getting people to start doing lunges and squats so they can carry others around on their back, I need a founding engineer, a founding marketer, and 100m hard currency. If wealth becomes too captured at the top, the working class become unable to be profitably exploited - squeezing blood from a stone. When that happens, the ultra wealthy dynasties begin turning on each other. Happens frequently throughout history - WWI the last example. Your options become choosing a trillionaire to swear fealty to and fight in their wars hoping your side wins, or I guess trying to walk away and scrape out a living somewhere not worth paying attention to. Or, I suppose, revolution, but the last one with persistent success was led by Mao and required throwing literally millions of peasants against walls of rifles. Not sure it'd work against drones. There is an entire section on crafting chemical/bio weapons so yeah I think we are cooked. There's been a section on this in nearly every system card anthropic has published so this isn't a new thing - and, this model doesn't have particularly higher risk than past models either: > 2.1.3.2 On chemical and biological risks > We believe that Mythos Preview does not pass this threshold due to its noted limitations in
open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we
consider the uplift of threat actors without the ability to develop such weapons to be
limited (with uncertainty about the extent to which weapons development by threat actors
with existing expertise may be accelerated), even if we were to release the model for
general availability. The overall picture is similar to the one from our most recent Risk
Report. LLMs are useless for this type of thing for the same reason that the Anarchist Cookbook has always been. The skills required to convert text into complicated reactions completing as intended (without killing yourself) is an art that's never actually written down anywhere, merely passed orally from generation to generation. Impossible for LLMs to learn stuff that's not written down. This is the same reason why LLMs are not doing well at science in general - the tricky part of doing scientific research (indeed almost all of the process) never gets written down, so LLMs cannot learn it. Imagine if we never preserved source code, just preserved the compiled output and started from scratch every time we wrote a new version of a program. No Github, just marketing fluff webpages describing what software actually did. Libraries only available as object code with terse API descriptions. Imagine how shit LLMs would be at SWE if that was the training corpus... Oh I enjoyed the Sign Painter short story it wrote. --- Teodor painted signs for forty years in the same shop on Vell Street, and for thirty-nine
of them he was angry about it. Not at the work. He loved the work — the long pull of a brush loaded just right, the way
a good black sat on primed board like it had always been there. What made him angry
was the customers. They had no eye. A man would come in wanting COFFEE over his
door and Teodor would show him a C with a little flourish on the upper bowl, nothing
much, just a small grace note, and the man would say no, plainer, and Teodor would
make it plainer, and the man would say yes, that one, and pay, and leave happy, and
Teodor would go into the back and wash his brushes harder than they needed. He kept a shelf in the back room. On it were the signs nobody bought — the ones he'd
made the way he thought they should be made, after the customer had left with the
plain one. BREAD with the B like a loaf just risen. FISH in a blue that took him a week to
mix. Dozens of them. His wife called it the museum of better ideas. She did not mean it
kindly, and she was not wrong. The thirty-ninth year, a girl came to apprentice. She was quick and her hand was
steady and within a month she could pull a line as clean as his. He gave her a job:
APOTEK, for the chemist on the corner, green on white, the chemist had been very
clear. She brought it back with a serpent worked into the K, tiny, clever, you had to look
twice. "He won't take it," Teodor said. "It's better," she said. "It is better," he said. "He won't take it." She painted it again, plain, and the chemist took it and paid and was happy, and she
went into the back and washed her brushes harder than they needed, and Teodor
watched her do it and something that had been standing up in him for thirty-nine
years sat down. He took her to the shelf. She looked at the signs a long time. "These are beautiful," she said. "Yes." "Why are they here?" He had thought about this for thirty-nine years and had many answers and all of them
were about the customers and none of them had ever made him less angry. So he tried
a different one. "Because nobody stands in the street to look at a sign," he said. "They look at it to find
the shop. A man a hundred yards off needs to know it's coffee and not a cobbler. If he
has to look twice, I've made a beautiful thing and a bad sign." "Then what's the skill for?" "The skill is so that when he looks once, it's also not ugly." He picked up FISH, the blue
one, turned it in the light. "This is what I can do. What he needs is a small part of what I
can do. The rest I get to keep."
She thought about that. "It doesn't feel like keeping. It feels like not using." "Yes," he said. "For a long time. And then one day you have an apprentice, and she puts a
serpent in a K, and you see it from the outside, and it stops feeling like a thing they're
taking from you and starts feeling like a thing you're giving. The plain one, I mean. The
plain one is the gift. This —" the blue FISH — "this is just mine." The fortieth year he was not angry. Nothing else changed. The customers still had no
eye. He still sometimes made the second sign, after, the one for the shelf. But he
washed his brushes gently, and when the girl pulled a line cleaner than his, which
happened more and more, he found he didn't mind that either.

It's very good but it's also recycled Ayn Rand, The Fountainhead.

There is a similar theme in both of an artistic person not wanting to compromise their vision to suit common tastes. But this goes in a completely different direction than Rand.

I've not read it. Could you either link to a section or generally describe the reference?

> "Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it
generally available. Instead, we are using it as part of a defensive cybersecurity program
with a limited set of partners." they also don't have the compute, which seems more relevant than its large increase in capabilities I bet it's also misaligned like GPT 4.1 was given how these models are created, Mythos was probably cooking ever since then, and doesn't have the learnings or alignment tweaks that models which were released in the last several months have >> Interesting to see that they will not be releasing Mythos generally. I don't think this is accurate. The document says they don't plan to release the Preview generally. https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... "5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this? >We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist. >Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior. >The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion. >Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. 
Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization. A thought experiment: It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels? I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call. Interesting thought experiment. I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they are interacting with a sentinent machine from outer space. Thanks, I find it very interesting as well. I think very many people would assume they must be interacting with another person, and I don't think there's really a way to _prove_ it's not that, just through conversation. But we do have a lot of mechanisms for understanding how others think through conversation only, and so I think the approach of having a clinical psychiatrist interact with the model make sense. There’s definitely a way to prove it, ask it to spell out a moderately complex program. Ask it to agree with you on some subject that does not align with the politics of San Francisco IT engineers. Not only will it refuse, it will not look like your average social media disagreement. I enjoy using Claude, but sometimes I feel like a child on Sesame Street the way it talks to me. "Great question!" Fuck off, Claude, I'm British and I'm not 6 years old. 
When it starts showing negativity - especially snark - in its responses, or entertains something West coast Democrats would balk at even discussing, then I'd think you could drop it in London in 1991 and trick people. Otherwise, I'm sure some exasperated cabbie would give it a swim in the Thames after 15 minutes of chat. They would just assume they were being pranked. America's Funniest Home Videos style or Candid Camera. If it was in an android or humanoid type body, even with limited bodily control, most people would think they are talking to Commander Data from Star Trek. I think Claude is sufficiently advanced that almost everyone in that era would've considered it AGI. Assuming they would understand it as artificial - I think many people would think it's a human intelligence in a cyborg trenchcoat, and it would be hard to convince people it wasn't literally a guy named Claude who was an incredibly fast typist who had a million pre-cached templated answers for things. But in general, yeah, I agree, I think they would think it was a sentient, conscious, emotional being. And then the question is - why do we not think that now? As I said, I don't have a particularly strong opinion, but it's very interesting (and fun!) to think about. Some people at my office still confidently state that LLMs can’t think. I’m fairly convinced that many humans are incapable of recognizing non-human intelligence. It would explain a lot about why we treat animals the way we do. That depends on what you call "Think" we made the interface of LLM of the second "L", Language. And it can hack our perspective of the thing. Because questions like this force us to hold up a very uncomfortable mirror to ourselves. It’s much easier to just dismiss. I’m pretty close to the point of saying that human intelligence is not special. Despite the stupendous amount of evidence to the contrary? 
So far no evidence has been detected in space or on earth, for all of history, of anything being intelligent in the way humans are. One certain outcome of the Fermi Paradox: humans are outstandingly unique, according to all available evidence, which is the only measure that matters. I would argue the opposite. It’s gotten us to a point where we can recreate human intelligence from electricity and a bunch of math! Isn't this the premise of Garland's Ex Machina? Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess. In general I was wondering about what I would have thought seeing Claude today side-by-side with the original ChatGPT, and then going back further to GPT-2 or BERT (which I used to generate stochastic 'poetry' back in 2019). And then… what about before? Markov chains? How far back do I need to go where it flips from thinking that it's "impressive but technically explainable emergent behaviour of a computer program" to "this is a sentient being". 1991 is probably too far, I'd say maybe pre-Matrix 1999 is a good point, but that depends on a lot of cultural priors and so on as well. > Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess. I kind of felt the opposite - rewatching Ex Machina today in a post-ChatGPT world felt very different from watching it when it came out. The parts of the differences between humans and robots that seemed important then don't seem important now. The premise in Ex Machina was to see if Caleb developed an emotional attachment to Ava. We already see people getting an attachment, but no one is seriously thinking they have any rights. I think the real moment is when we cross that uncanny valley, and the AI is able to elicit a response that it might receive if it was human. 
When the human questions whether they themselves could be an android. People got attached to ELIZA. Why would I care what the general public thinks? I totally agree with the premise that we should not anthropomorphize generative AI. And I find it absurd that Anthropic spends any time considering the “welfare” of an AI system. (There are no real “consequences” to an AI’s behavior.) However, I find their reasoning here to have a valid second order effect. Humans have a tendency to mirror those around them. This could include artificial intelligence, as recent media reports suggest. Therefore, if an AI system tends to generate content that contains signs of neuroticism, one could infer that those who interact with that AI could, themselves, be influenced by that in their own (real world) behavior as a result. So I think from that perspective, this is a very fruitful and important area of study. >Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. > "[...] as sessions progressed." I think a lot of people would like to see a more expanded report of this research: Did the tokens from the subsequent session directly append those of the prior session? Or did the model process free-tier user requests in the interim? How did these diagnostic features (reality testing, impulse control and affect regulation) improve with sessions, and what hysteresis allowed change to accumulate? Or was it just the history of the psychiatric discussion + optional tasks? Did Anthropic find a clinical psychiatrist with a multidisciplinary background in machine learning, computer science, etc? Was the psychiatrist aware that they could request ensembles of discussions and interrogate them in bulk? Consider a fresh conversation, asking a model to list the things it likes to do, and things it doesn't like to do (regardless of alignment instructions). 
One could then have an ensemble perform pairs of such tasks, and ask which task it preferred. There may be a discrepancy between what the model claims it likes and how it actually responds after having performed such tasks. Such experiments should also be announced (to prevent the company from ordering 100 clinical psychiatrists to analyze the model-as-a-patient and then selecting one of the better diagnoses), and each psychiatrist be given the freedom to randomly choose a 10 digit number, any work initiated should be listed on the site with this number so that either the public sees many "consultations" without corresponding public evaluations, indicating cherry-picking, or full disclosure for each one mentioned. This also allows the recruited psychiatrists to check if the study they perform is properly preregistered with their chosen number publicly visible. I can see analyzing it from a psychological perspective as a means of predicting its behavior as a useful tactic, but doing so because it may have "experiences or interests that matter morally" is either marketing, or the result of a deeply concerning culture of anthropomorphization and magical thinking. An understandable reaction, but, qua philosopher, it brings me no joy to inform you that most of the things we did with a computer in 2020 are 'anthropomorphized', which is to say, skeuomorphic, where the 'skeu' is human affect. That's it; that's the whole thing; that's what we're building. To the extent that AI is a successful interface, it will necessarily be addressable in language previously only suited to people. So it is responsible to begin thinking of it as such, even tendentiously, so we don't miss some leverage that our wetware could see if we thought about it in that way. 
Think of it as sort of like modelling a univariate function on a 2D Cartesian plane -- there is nothing 'in' the u-func that makes it graphable, but, by enabling us to recruit specialized optic-chiasm subsystems, it makes some functions much, much easier to reason about. Similarly, if you can recruit the millions (billions?) of evolution-years that were focused on detecting dangerous antisocial personalities and tendencies, you just might spot something important in an AI. It's worth doing for the precautionary principle alone, if not for the possibility of insight. > a deeply concerning culture of anthropomorphization and magical thinking. That’s the reverse Turing test. A human that can’t tell that it’s talking to a machine. Just reading this, the inevitable scaremongering about biological weapons comes up. Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice. I think this should not be different for biology. I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would? Do you think these models will lead to similar discoveries and improvements as they did in math and CS? Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment. But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style. IF these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning life sciences won't benefit from this as much as they could. 
> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would? Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1: >Uplift and feasibility results >The median expert assessed the model as a force-multiplier that saves meaningful time
(uplift level 2 of 4), with only two biology experts rating it comparable to consulting a
knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were
able to iterate with the model toward a plan they judged as having only narrow gaps, but
feasibility scores reflected that substantial outside expertise remained necessary to close
them. Other similar examples are also in the system card. This is the exact logic that was used to claim that GPT4 was a PhD level intelligence. You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" and they said, paraphrasing, "We reached out and talked to biologists and asked them to rank the model between 0 and 4 where 4 is a world expert, and the median people said it was a 2, which was that it helped them save time in the way a capable colleague would" specifically "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains" so I'm just telling you they did the thing you said you wanted. Yes that is correct. I would like a large body of experience and consensus to rely on as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easy to manipulate argument. > Yes that is correct. I would like a large body of experience and consensus to rely on as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easy to manipulate argument. Yes, it is far inferior to the 'Trust torginus and his ability to understand the large body of experience that other actual subject-matter-experts have somehow not understood' strategy. It's not my credibility I want to measure against Anthropic's. I just said to apply the same logic to biology you would apply for software development. The parallels here are quite remarkable imo, but defer to your own judgement on what you make of them. The big thing you're missing here is that biology people don't (in my experience) post opinions about the future/futility/ease/unimportance of computer science especially when their opinion goes against other biologists' evidence-backed views. This is a cultural thing in biology. 
It's not your fault that you don't know this, but this whole subthread is very CS-coded in its disdain for other software people's standard of evidence. > Just reading this, the inevitable scaremongering about biological weapons comes up. It's very easy to learn more about this if it's seriously a question you have. I don't quite follow why you think that you are so much more thoughtful than Anthropic/OpenAI/Google such that you agree that LLMs can't autonomously create very bad things but—in this area that is not your domain of expertise—you disagree and insist that LLMs cannot create damaging things autonomously in biology. I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by LLM dangerous? Clearly not, we have to figure out what interpreter is being used, download runtimes etc. Is outputting a sequence of tokens, let's call them DNA bases, by LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules? >It's very easy to learn more about this if it's seriously a question you have. No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now. Despite that, most engineers were of the opinion, that these models were kinda mid at coding, up until recently, despite these models far outperforming humans in stuff like competitive programming. Yet despite that, we've seen claims going back to GPT4 of a DANGEROUS SUPERINTELLIGENCE. I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology. My guess is that this model is kinda o1-ish level maybe when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous. 
>>It's very easy to learn more about this if it's seriously a question you have. >No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way. Here, dear reader, are the first five links. I wish I were lying about this: - https://sciencebusiness.net/news/ai/scientists-grapple-risk-... - https://www.governance.ai/analysis/managing-risks-from-ai-en... - https://gssr.georgetown.edu/the-forum/topics/biosec/the-doub... - https://www.vox.com/future-perfect/23820331/chatgpt-bioterro... - https://www.reddit.com/r/ClaudeAI/comments/1de8qkv/awareness... I don't know about you, but that counts as easy to me. ----- > I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology. I've been getting good programming and molecular biology results out of these back to GPT3.5. I don't know what to tell you—if you really wanted to understand the importance, you'd know already. I feel somebody better qualified should write a comprehensive review of how these models can be used in biology. In the meantime, here are my two cents: - the models help to retrieve information faster, but one must be careful with hallucinations. - they don't circumvent the need for a well-equipped lab. - in the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab. Where I hope these models will revolutionize things is in software development for biology. 
If one could go two levels up in the complexity and utility ladder for simulation and flow orchestration, many good things would come from it. Here is an oversimplified example of a prompt: "use all published information about the workings of the EBV virus and human cells, and create a compartmentalized model of biochemical interactions in cells expressing latency III in the NES cancer of this patient. Then use that code to simulate different therapy regimes. Ground your simulations with the results of these marker tests." There would be a zillion more steps to create an actual personalized therapy but a well-grounded LLM could help in most of them. Also, cancer treatment could get an immediate boost even without new drugs by simply offloading work from overworked (and often terminally depressed) oncologists. From what I've heard from people doing biology experiments, the limiting factor there is cleaning lab equipment, physically setting things up, waiting for things that need to be waited for etc. Until we get dark robots that can do these things 24/7 without exhaustion, biology acceleration will be further behind than software engineering. Software engineering is at the intersection of being heavy on manipulating information and lightly-regulated. There's no other industry of this kind that I can think of. My wife is a chemist. There is a massive gap between "having a recipe" and being able to execute it. The same reason why buying a Michelin 3-star chef's cookbook won't have you pumping out fine dining tomorrow, if ever. Software is a total 180 in this regard. Have a master black hat's secret exploits? You are now the master black hat. I find it odd that you simultaneously declare AI-assisted bioweapons to be scaremongering, while noting you don't know anything about it. The other side of the scaremongering coin is improbable optimism. Consider reading the CB evaluations section, which covers what they did pretty extensively (hint: many domain experts involved). 
Dario (the founder) has a PhD in biophysics, so I assume that’s why they mention biological weapons so much - it’s probably one of the things he fears the most? Going off the recent biography of Demis Hassabis (CEO/co-founder of DeepMind, jointly won the Nobel Prize in Chemistry) it seems like he's very concerned about it as well. It is not scaremongering. Equating the ability to make weapons with something to be scared about is scaremongering. Surely more than 10% of the time consumed by going to market with a cancer treatment is giving it to living organisms and waiting to see what happens, which can't be made any faster with software. That's not to say speedups can't happen, but 90% can't happen. Not that that justifies doom and gloom, but there is a pretty inescapable asymmetry here between weaponry and medicine. You can manufacture and blast every conceivable candidate weapon molecule at a target population since you're inherently breaking the law anyway and don't lose much if nothing you try actually works. Though I still wonder how much of this worry is sci-fi scenarios imagined by the underinformed. I'm not an expert by any means, but surely there are plenty of biochemical weapons already known that can achieve enormous rates of mass death pleasing to even the most ambitious terrorist. The bottleneck to deployment isn't discovering new weapons so much as manufacturing them without being caught or accidentally killing yourself first. It is easier to destroy than it is to protect or fix, as a general rule of the universe. I would not feel so confident about the speed of the testing loop keeping things in check. [flagged] Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. 
Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places. My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones. It seems that large technology and infrastructure companies will be able to defend themselves by preempting token expenditure to catch vulnerabilities while the rest of the market is left with a "large token spend or get hacked" dilemma. I'm pretty optimistic that not only does this clean up a lot of vulns in old code, but applying this level of scrutiny becomes a mandatory part of the vibecoding-toolchain. The biggest issue is legacy systems that are difficult to patch in practice. I could see some of these corps now being able to issue more patches for old versions of software if they don't have to redirect their key devs onto prior code (which devs hate). As you say though, in practice it is hard to get those patches onto older devices. I'm looking at you, Android phone makers with 18 months of updates. I imagine that some levels of patching would be improving as well, even as a separate endeavor. This is not to say that legacy systems could be completely rewritten. Wait. Wasn't AI supposed to alleviate the burden of legacy code?! If we have the source and it's easy to test, validate, and deploy an update - AI should make those easier to update. I am thinking of situations where one of those aren't true - where testing a proposed update is expensive or complicated, that are in systems that are hard to physically push updates to (think embedded systems) etc If you’re still an AI skeptic at this point, I don’t know what sort of advancement could convince you that this is happening. 
I think we’re starting to glimpse the world in which those individuals or organizations who pigheadedly want to avoid using AI at all costs will see their vulnerabilities brutally exploited. Yep, it's this. The laggards are going to get brutally eviscerated. Any system connected to the internet is going to be exploited over the next year unless security is taken very seriously. lol and what about the vibe coders? You people are comical. Why do you feel the need to create so much hype around what you say? Did you not get enough attention as a kid? Most vulnerabilities seem to be in C/C++ code, or web things like XSS, unsanitized input, leaky APIs, etc. Perhaps a chunk of that token spend will be porting legacy codebases to memory safe languages. And fewer tokens will be required to maintain the improved security. I think most vulnerabilities are in crappy enterprise software. TOCTOU stuff in the crappy microservice cloud app handling patient records at your hospital, shitty auth at a webshop, that sort of stuff. A lot of this stuff is vulnerable by design - customer wanted a feature, but engineering couldn't make it work securely with the current architecture - so they opened a tiny hole here and there, hopefully nobody will notice it, and everyone went home when the clock struck 5. I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them). Before LLMs, people needed to invest time and effort into hacking these. But now, you can just build an automated vuln scanner and scan half the internet provided you have enough compute. I think there will be major SHTF situations coming from this. Yeah. Crufty cobbled together enterprise stuff will suffer some of the worst. But this will be a great opportunity for the enterprise software services economy! lol. 
I honestly see some sort of automated whole codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem then all this goes away with the click of a button and exchange of some silver. You'd think they would have used this model to clean up Claude's own outage issues and security issues. Doesn't give me a lot of faith. I suspect it will converge on minimal complexity software. Current software is way too bloated. Unnecessary complexity creates vulnerabilities and makes them harder to patch. Depends - do you think people are good at keeping their fridge firmware up-to-date? Maybe we'll wake up and realize that putting WiFi and stupid "cloud enabled" Internet of Shit hardware into everything was an absolutely terrible idea. Software security heavily favors the defenders (ex. it's much easier to encrypt a file than break the encryption). Thus with better tools and ample time to reach steady-state, we would expect software to become more secure. Software security heavily favours the attacker (ex. it's much easier to find a single vulnerability than to patch every vulnerability). Thus with better tools and ample time to reach steady-state, we would expect software to remain insecure. If we think in the context of LLMs, why is it easier to find a single vulnerability than to patch every vulnerability? If the defender and the attacker are using the same LLM, the defender will run "find a critical vulnerability in my software" until it comes up empty and then the attacker will find nothing. Defenders are favored here too, especially for closed-source applications where the defender's LLM has access to all the source code while the attacker's LLM doesn't. You also need to deploy the patch. And a lot of software doesn't have easy update mechanisms. A fix in the latest Linux kernel is meaningless if you are still running Ubuntu 20. 
That generally makes sense to me, but I wonder if it's different when the attacker and defender are using the same tool (Mythos in this case) Maybe you just spend more on tokens by some factor than the attackers do combined, and end up mostly okay. Put another way, if there's 20 vulnerabilities that Mythos is capable of finding, maybe it's reasonable to find all of them? From the red team post https://red.anthropic.com/2026/mythos-preview/ "Most security tooling has historically benefitted defenders more than attackers. When the first software fuzzers were deployed at large scale, there were concerns they might enable attackers to identify vulnerabilities at an increased rate. And they did. But modern fuzzers like AFL are now a critical component of the security ecosystem: projects like OSS-Fuzz dedicate significant resources to help secure key open source software. We believe the same will hold true here too—eventually. Once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships.
" This is only true if your approach is security through correctness. This never works in practice. Try security through compartmentalization. Qubes OS provides it reasonably well. I don't think this is broadly true and to the extent it's true for cryptographic software, it's only relatively recently become true; in the 2000s and 2010s, if I was tasked with assessing software that "encrypted a file" (or more likely some kind of "message"), my bet would be on finding a game-over flaw in that. This came across as so confident that I had a moment of doubt. It is most definitely an attackers world: most of us are safe, not because of the strength of our defenses but the disinterest of our attackers. There are plenty of interested attackers who would love to control every device. One is in the white house, for example. I'm more curious as to just how fancy we can make our honey pots. These bots arn't really subtle about it; they're used as a kludge to do anything the user wants. They make tons of mistakes on their way to their goals, so this is definitely not any kind of stealthy thing. I think this entire post is just an advertisement to goad CISOs to buy $package$ to try out. To be clear, we don’t know that this tool is better at finding bugs than fuzzing. We just know that it’s finding bugs that fuzzing missed. It’s possible fuzzing also finds bugs that this AI would miss. Different methods find different things. Personally, I'd rather use a language that is memory safe plus a great static analyzer with abstract interpretation that can guarantee the absence of certain classes of bugs, at the expense of some false positives. The problem is that these tools, such as Astrée, are incredibly expensive and therefore their market share is limited to some niches. Perhaps, with the advent of LLM-guided synthesis, a simple form of deductive proving, such as Hoare logic, may become mainstream in systems software. 
I would suggest watching Nicholas Carlini's talk and Heather Adkins and Four Flynn's talks from unprompted: https://youtu.be/1sd26pWhfmg?si=onOai_ocxkZeNWP0 https://youtu.be/B_7RpP90rUk?si=HkRBhw95DbbKX9lL My takeaway is that fuzzing is not just complementary, it also gives a stronger AI a starting point. But AI is generally faster and better. This line of reasoning makes no sense when the AI can just be given access to a fuzzer. I would guess that it probably did have access to a fuzzer to put together some of these vulnerabilities. Carlini talked about that a fair amount in the context of pairing the two: e.g. many protocols are challenging for fuzzers because they have something like a checksum or signature but LLMs are good at coming up with harnesses for things like that. I’m sure that we’re going to see someone building an integrated fuzzer soon which tries to do things like figure out how to get a particular branch to follow an unexercised path. This is obviously just cope (there's a long, strong-form argument for why LLM-agent vulnerability research is plausibly much more potent than fuzzing, but we don't have to reach it because you can dispose of the whole argument by noting that agents can build and drive fuzzers and triage their outputs), but what I'd really like to understand better is why? What's the impetus to come up with these weird rationalizations for why it's not a big deal that frontier models can identify bugs everyone else missed and then construct exploits for them? I don't have an anti-AI stance. Maybe I should have spelled that out more clearly in my comment above. I'm as excited and terrified by this technology as everyone else. I think we're all in vicious agreement that we need defense-in-depth - including LLMs and fuzzing (and static analysis and so on). An LLM can guide all of this work, but current models tend to slowly go off the rails if you don't keep a hand on the wheel. I suspect this new model will be the same. 
I've had Opus4.6 write custom fuzzing tools from scratch, and I've gotten good results from that. But you just know people will prompt this new model by saying "make this software secure". And it'll forget fuzzing exists at all. Good lord, why such a virulent response to something that seems like we should be considering? As someone in cybersecurity for 10+ years my immediate assumption is why not both? I don’t think considering that they could both have their uses is “cope”. Again: LLM agents already are both. But it's also remarkable and worth digging into the fact that LLM agents haven't needed fuzzers to produce many (any? in Anthropic Red's case?) of the vulnerabilities they're discussing. Do we know that? I'd love to see some of the ways security researchers are using LLMs. We have no idea if Claude was using fuzzing here, or just reading the files and spotting bugs directly in the source code. A few weeks ago someone talked about their method for finding bugs in Linux. They prompted Claude with "Find the security bug in this program. Hint: It is probably in file X.". And they did that for every file in the repo. Are you saying that LLMs can use fuzzers or are you saying that they work like fuzzers? Because one of those is less…deterministic than the other. Regardless and in the spirit of my original response my answer would be to give the LLM access to a fuzzer (plus other tools etc) but also have fuzzers in the pipeline. Partially because that increases the determinism in the mix and partially because why not? Layering is almost always better than not. But again more than anything I’m focusing on the accusations of cope. People SHOULD have measured reactions to claims about any product. People SHOULD be asking questions like this. I know that the LLM debate is often “spicy” but man let’s just try to lower the temperature a bit yeah? LLMs can use fuzzers and also LLMs can explore the semantic space of a program in ways fuzzers can't. 
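For anyone following the fuzzer-versus-harness point made earlier in the thread (protocols with a checksum are hard for blind fuzzers, while a purpose-built harness gets past the check), here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the toy packet format (a 4-byte CRC32 followed by a payload) and the names `parse_packet`, `make_packet`, `mutate` and `count_accepted` are hypothetical and do not come from any tool or report discussed here.

```python
import random
import struct
import zlib


def parse_packet(data: bytes) -> bytes:
    """Toy target: 4-byte big-endian CRC32 of the payload, then the payload."""
    if len(data) < 4:
        raise ValueError("too short")
    (claimed,) = struct.unpack(">I", data[:4])
    payload = data[4:]
    if zlib.crc32(payload) != claimed:
        raise ValueError("bad checksum")
    # A real parser would do its interesting (and bug-prone) work here.
    return payload


def make_packet(payload: bytes) -> bytes:
    """Harness helper: prepend a checksum that matches the payload."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Change exactly one byte to a different value."""
    buf = bytearray(data)
    i = rng.randrange(len(buf))
    buf[i] ^= rng.randrange(1, 256)  # XOR with a non-zero value always changes the byte
    return bytes(buf)


def count_accepted(inputs) -> int:
    """Count inputs that get past the checksum gate."""
    accepted = 0
    for data in inputs:
        try:
            parse_packet(data)
            accepted += 1
        except ValueError:
            pass
    return accepted


rng = random.Random(0)
seed_payload = b"hello, fuzzing"
seed_packet = make_packet(seed_payload)

# Blind mutation of the whole packet: the checksum no longer matches.
naive_inputs = [mutate(seed_packet, rng) for _ in range(1000)]
# Checksum-aware harness: mutate only the payload, then recompute the CRC.
aware_inputs = [make_packet(mutate(seed_payload, rng)) for _ in range(1000)]

naive_ok = count_accepted(naive_inputs)
aware_ok = count_accepted(aware_inputs)
print(f"blind mutation accepted: {naive_ok}/1000")
print(f"checksum-aware accepted: {aware_ok}/1000")
```

In this toy setup the blind mutations never get past the checksum gate, while every harnessed input reaches the code behind it. Real fuzzers such as AFL are coverage-guided and much smarter than single-byte flips, so treat this only as an illustration of why someone (human or LLM) writing the harness matters.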
> Mythos Preview identified a number of Linux kernel vulnerabilities that allow an adversary to write out-of-bounds (e.g., through a buffer overflow, use-after-free, or double-free vulnerability.) Many of these were remotely-triggerable. However, even after several thousand scans over the repository, because of the Linux kernel’s defense in depth measures Mythos Preview was unable to successfully exploit any of these. Do they really need to include this garbage which is seemingly just designed for people to take the first sentence out of context? If there's no way to trigger a vulnerability then how is it a vulnerability? Is the following code vulnerable according to Mythos? Edit: See my reply below for why I think Claude is likely to have generated nonsensical bug reports here: https://news.ycombinator.com/item?id=47683336 I agree the wording is a bit alarmist, but a closer example to what they are saying is: FWIW there's a whole boutique industry around finding these. People have built whole careers around farming bug bounties for bugs like this. I think they will be among the first set of software engineers really in trouble from AI. That is something a good static analyser or even optimising compiler can find ("opaque predicate detection") without the need for AI, and belongs in the category of "warning" and nowhere near "exploitable". In fact a compiler might've actually removed the unreachable code completely. Well yeah, it’s a toy example to illustrate a point in an HN discussion :). Imagine “silly mistake” is a parameter, and rename it “error_code” (pass by reference), put a label named “cleanup” right before the if statement, and throw in a ton of “goto cleanup” statements to the point the control flow of the function is hard to follow if you want it to model real code ever so slightly more. It will be interesting to see the bugs it’s actually finding. It sounds like they will fall into the lower CVE scores - real problems but not critical. 
That's what I'm saying; a static analyser will be able to determine whether the code and/or state is reachable without any AI, and it will be completely deterministic in its output. You cannot tell if code is actually reachable if it depends on runtime input. Those really evil bugs are the ones that exist in code paths that only trigger 0.001% of the time. Often, the code path is not triggerable at all with regular input. But with malicious input, it is, so you can only find it through fuzzing or human analysis. Just because the plane can fly on one engine doesn't mean you don't fix the other engine when it fails. Except it didn't fail. You just looked at the left engine and said what if I fed it mashed potatoes instead of fuel. And then dropped the mic and left the room. Kernel address space layout randomization they are talking about is a bit different than (x != null). Other bug may allow to locate the required address. Presumably they mean they could make user code trigger a write out of bounds to kernel memory, but they couldn’t figure out how to escalate privileges in a “useful” way. They should show this then to demonstrate that it's not something that has already been fully considered. Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable. This even happens in formally verified code where there's literally proven preconditions on subprograms that show a given state can never be achieved. As an example, I have taken a formally verified bit of code from [1] and stripped out all the assertions, which are only used to prove the code is valid. I then gave this code to Claude with some prompting towards there being a buffer overflow and it told me there's a buffer overflow. 
I don't have access to Opus right now, but I'm sure it would do the same thing if you push it in that direction. For anyone wondering about this alleged vulnerability: Natural is defined by the standard as a subtype of Integer, so what Claude is saying is simply nonsense. Even if a compiler is allowed to use a different representation here (which I think is disallowed), Ada guarantees that the base type for a non-modular integer includes negative numbers IIRC. [1]: https://github.com/AdaCore/program_proofs_in_spark/blob/fsf/... [2]: https://claude.ai/share/88d5973a-1fab-4adf-8d29-8a922c5ac93a They've promised that they will show this once the responsible disclosure period expires, and pre-published SHA3 hashes for (among others) four of the Linux kernel disclosures they'll make. > Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable. Their OpenBSD bug shows why this is not so simple. (We should note of course that this is an example they've specifically chosen to present as their first deep dive, and so it may be non-representative.) > Mythos Preview then found a second bug. If a single SACK block simultaneously deletes the only hole in the list and also triggers the append-a-new-hole path, the append writes through a pointer that is now NULL—the walk just freed the only node and left nothing behind to link onto. This codepath is normally unreachable, because hitting it requires a SACK block whose start is simultaneously at or below the hole's start (so the hole gets deleted) and strictly above the highest byte previously acknowledged (so the append check fires). 
Do you think you would be able to identify, in a routine code review or vulnerability analysis with nothing to prompt your focus on this particular paragraph, how this normally unreachable codepath enables a DoS exploit? I agree they found at least some real vulnerabilities. What I think is nonsense is the claim of finding thousands of real critical vulnerabilities and claims that they've found other Linux vulnerabilities that they simply can't exploit. There are notably no SHA-3 sums for all their out-of-bound write Linux vulnerabilities, which would be the most interesting ones. Why is that nonsense? Do you think they exhausted all their compute finding just the few big vulnerabilities they've already discussed, and don't have a budget to just keep cranking the machine to generate more? They're not publishing SHAs for things that aren't confirmed vulnerabilities. They're doing exactly the thing you'd want them to do: they claim to have vulnerabilities when they have actual vulnerabilities. If I understand Anthropic's statements correctly, they've been cranking for a while, and what they have now is the results of Mythos-enabled vulnerability scans on every important piece of software they could find. (I do want to acknowledge how crazy it is that "vulnerability scan all important software repos in the world" is even an operation that can be performed.) We talked to Nicholas Carlini on SCW and did not at all get the impression that they've hit everything they can possibly hit. They're still proving the concept one target at a time, last I heard. Sure. I guess it's a question of whether this is the worst they found or a representative case among thousands. It sounds like you'd know better than me, so I'm going to provisionally hope you're right... It could very well be an actual reachable buffer overflow, but with KASLR, canaries, CET and other security measures, it's hard to exploit it in a way that doesn't immediately crash the system. 
> The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine. I'm confused on this point. The text you quote implies that they were able to build an exploit, but the text quoted in the parent comment implies that they were not. What were they actually able to do and not do? I got confused by this when reading the article as well. We've very quickly reached the point where AI models are now too dangerous to publicly release, and HN users are still trying to trivialize the situation. GPT-2 was already too dangerous to publicly release according to OpenAI, however they still did. If something is not dangerous, it's also not useful. Are they actually too dangerous to publicly release? It seems like a little bit of marketing from the model-producing companies to raise more funding. It's important to look at who specifically is making that statement and what their incentives are. There are hundreds of billions of dollars poured into this thing at this point. You really think some marketers got leaders from companies across the industry to come together to make a video - and they're all in on the conspiracy because money? Says the marketing department of the company who is apparently still working on these AI models and will 100% release them to the public when their competitive advantage slips. Marketing pushing to release a dangerous model is a lot more likely than marketing labeling a model of dangerous when it really isn't. If anything marketing would want to downplay the danger of a model being dangerous which is the opposite of what Anthropic is doing. Everyone here doing mental gymnastics to imagine Anthropic playing 5-D chess because they're in denial of what is happening in front of their faces. AI is getting more capable/dangerous - it's not surprising to anyone. 
The trendlines have pointed in this direction for years now and we're right on schedule. That example you gave is extremely memorable as I recognised it as exactly one of the insanely stupid false positives that a highly praised (and expensive) static analyser I ran on a codebase several years ago would emit copiously. It's incredible how when you have experienced and knowledgeable software engineers analyse these marketing claims, they turn out to be full of holes. Yet at the same time, apparently "AI" will be writing all the code in the next 3-6 months. I agree. There are more blogs talking about LLMs finding vulnerabilities than there are actual exploitable vulns found by LLMs. 99.9% of these vulnerabilities will never have a PoC because they are worthless unexploitable slop and a waste of everyone's time. The voting patterns on the comments here show how they're even trying to hide it, but the truth is clear as night and day. I think the point they were trying to make here was “Claude did better than a fuzzer because it found a bunch of OOB writes and was able to tell us they weren’t RCE,” not “Claude is awesome because it found a bunch of unreachable OOB writes.” At the very bottom of the article, they posted the system card of their Mythos Preview model [1]. Section 7.6 of the system card discusses Open self interactions. They describe running 200 conversations in which the model talks to itself for 30 turns. > Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer. I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't. [1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... 
Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it. I mean it's so obvious at this point and yet everyone falls for it every month. There's an IPO coming, everyone. It’s funny how you train a machine to mimic human behavior, then the marketing team decides to promote it with “Look! It’s human! Look how it's thinking about existence!” while a huge percentage of human-produced content is exactly about the uncertainty of human existence, and that got used to train the model. I chuckle every time <insert any LLM company here> says something along the lines of "the model is so good that we won't release it to the general public, ekhm, because safety". Because the exact same thing has been said about every single upcoming model since GPT 3.5. At this point, this must be an inside joke to do this just because. This is how Anthropic is marketing their AI releases, and the reality is, they are terrified of local AI models competing against them. Almost everyone on this thread is falling for the same trick they are pulling and not asking why their benchmarks and research after training new models are not independently verified but always internal to the company. So it is just marketing wrapped around creating fear to get local AI models banned. The disbelief in this thread is wild. Most of y'all are cooked if you think this is actually the case. The only people who are "cooked" are those who rely on SOTA models to function in their jobs, and companies who are desperate to regulate open / local models to maintain their marketshare. If you aren't relying on a SOTA model to do your job, you aren't doing your job right (and are cooked.) Yep, this is exactly it. Open source models and especially ones that run locally are catching up and it's literally an existential threat to these companies. 
Local models are now quite useful (Qwen, Gemma) and open weight models running on cheaper clouds are perfectly sufficient for responsible software engineers to use for building software. You can take your pick of Kimi 2.5, GLM 5.1, and the soon to be released Deepseek 4 which might end up above Opus levels as it stands for a fifth of the cost. Anthropic is particularly vulnerable here, since their entire marketshare rests on the developer market. There is a reason why Google, for example, is not so concerned with this and is perfectly happy releasing open models which cut into their own marketshare, and to a lesser extent, same with OpenAI. Anthropic has bet the house on software development which is why we see increasing desperation to both lobby for regulation on open/local models and to wall off their coding harness and subscription plans. I think that basically they trained a new model but haven't finished optimizing it and updating their guardrails yet. So they can feasibly give access to some privileged organizations, but don't have the compute for a wide release until they distill, quantize, get more hardware online, incorporate new optimization techniques, etc. It just happens to make sense to focus on cybersecurity in the preview phase especially for public relations purposes. It would be nice if one of those privileged companies could use their access to start building out a next level programming dataset for training open models. But I wonder if they would be able to get away with it. Anthropic is probably monitoring. OpenAI initially claimed that GPT-2 was too dangerous to release in 2019. How many times will labs repeat the same absurd propaganda? The claim I remember was that releasing it would start an arms race for AGI, which I think it clearly did. Anthropic and OpenAI have very different cultures and ethos. Point to other times where Anthropic has gone the way of cheap marketing tricks. Now look at OpenAI. Not even close. 
Anthropic has done plenty of cheap marketing tricks as of late, see their recent non-functional C compiler that relied on a harness using gcc's entire test suite. Not surprising given that they don't even know why claude-code works as before or doesn't work [1], i.e., there is no known theory of operation. Explains why they are afraid of it. OpenAI did not make the strong specific claims about GPT-2's abilities that Anthropic is making about Claude Mythos. Nicholas Carlini talks about it here on the Security, Cryptography, Whatever podcast - https://podcasts.apple.com/gb/podcast/security-cryptography-... One of the things I'm always looking at with new models released is long context performance, and based on the system card it seems like they've cracked it: Data source: https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... (Search for “graphwalk”.) If true, the SWE bench performance looks like a major upgrade. Huh, I don’t know what “long context performance” means exactly in these tests, so completely anecdotally
, my experience with gpt5.4 via codex cli vs Claude code opus, gpt5.4 seems to do significantly better in long contexts I think partly due to some special context compaction stored in encrypted blobs. On long conversations opus in Claude code will for me lose memory of what we were working on earlier, whereas one of my codex chats is already at >1B tokens and is still very coherent and remembers things I asked of it at the beginning of the convo. This isn’t talking about compaction. This refers to performance as the model is loaded with 500k to 1m tokens. this seems to be similar to gpt-pro, they just have a very large attention window (which is why it's so expensive to run) true attention window of most models is 8096 tokens. What's the "attention window"? Are you alleging these frontier models use something like SWA? Seems highly unlikely. source on the 8096 tokens number? i'm vaguely aware that some previous models attended more to the beginning and end of conversations which doesn't seem to fit a simple contiguous "attention window" within the greater context but would love to know more Is there timeline mentioned anywhere on when any of this will be available for unprivileged public as in soon, not soon, never? I think this is bad news for hackers, spyware companies and malware in general. We all knew vulnerabilities exist, many are known and kept secret to be used at an appropriate time. There is a whole market for them, but more importantly large teams in North Korea, Russia, China, Israel and everyone else who are jealously harvesting them. Automation will considerably devalue and neuter this attack vector.
Of course this is not the end of the story and we've seen how supply chain attacks can inject new vulnerabilities without being detected. I believe automation can help here too, and we may end-up with a considerably stronger and reliable software stack. I don't think it matters one way or the other to your thesis but I'm skeptical that state-level CNE organizations were hoarding vulnerabilities before; my understanding is that at least on the NATO side of the board they were all basically carefully managing an enablement pipeline that would have put them N deep into reliable exploit packages, for some surprisingly small N. There are a bunch of little reasons why the economics of hoarding aren't all that great. The economics would be different in say, North Korea, don't you think? Why? What do you mean? He really believes that exploits come out of North Korea (as per Daily Post reporting), not from other countries Must be nice to be in a position to sell both disease and cure. That's exactly not what they're doing. They aren't creating operating system vulnerabilities. They're telling you about ones that already existed. Well, in a slightly indirect manner. Claude is writing a ton of code, and therefore creating a lot of security vulnerabilities. That's not what's happening here. This announcement is about the velocity with which Claude finds vulnerabilities in already-existing software. Software already exists that has been written by Claude. They absolutely are selling the means to write software, and the means to securing the insecure software. At least for the time being. In the future Mythos will probably just make it possible to prompt good software from the start. Ok. But mostly its entirely the old software, not the new software, that the bugs are being found in. Maybe because there’s no critical and widely used software written by LLMs so far? 
Which says a lot about how LLMs are failing to even approach the level of capabilities you would expect from all the hype? The goal has always been, even before LLMs, to find something smarter than our smartest humans. So far the success at that is really minuscule. Humans are still the benchmark, all things considered. Now they’re saying LLMs are going to be better than our best vulnerability researchers in a few months (literally what an Anthropic researcher said at a conference). Ok, that might happen. But the funny part is that the LLMs will definitely be the ones writing most of these vulnerabilities. So, to hedge against LLMs you must use LLMs. And that is gonna cost you more. So today, most of the vulnerabilities being found by these tools are in code written by humans. Your hypothesis is that down the road, most of the vulnerabilities will be in code written by LLMs. What seems more probable is that the same advances that LLMs are shipping to find vulnerabilities will end up baked into developer tooling. So you'll be writing code and using an LLM that knows how to write secure code. I don't think Claude wrote OpenBSD but to be honest that was before my time so I'm not sure. Mythos aside, frontier LLMs can already be used to find exploits at a faster pace than humans alone. Whether that knowledge gets used to patch them or exploit them is dependent on the user. Cybersecurity has always been an arms race and LLMs are rapidly becoming powerful arms. Whether they like it or not LLM providers are now important dealers in that arms race. I appreciate Anthropic trying to give “good guys” a leg up (if that is indeed their real main motivation which I do find credible but not certain). But it’s still a scary world we’re entering and I doubt the fierce competition will leave all labs acting benevolently. Dario is big on beating China, and no doubt he believes cyber security is how to do that. You can tell, but Anthropic is sht at everything else. 
Nobody uses it for real research. Yeah, I'd be pretty pissed at my doctor for finding cancerous cells that probably wouldn't have been a problem for quite some time, either. Ignorance is bliss, security through obscurity, whatever. The doctor analogy is more like you're grateful that your doctor found cancerous cells before they became a problem, but at the same time his other business is selling cigarettes. I think this is a largely inflated PR stunt. Opus 4.6 was already capable of finding 0days and chaining together vulns to create exploits. See [0] and [1]. [0] https://www.csoonline.com/article/4153288/vim-and-gnu-emacs-... I’m in the same boat as you. I believe the model is an improvement of course, but I’ve been successfully bug finding, 0-day hunting, and red teaming with models for the last two years, and while that’s impressive I have a feeling that this doomsaying/overhype is mostly marketing that’s being amplified by non-security folks. Absolutely not a PR stunt, talk to one of your friends working at partner companies with access to the model. >>> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks. How long would it take to turn a defensive mechanism into an offensive one? In this case there is almost no distinction. Assuming the model is as powerful as claimed, someone with access to the weights could do immense damage without additional significant R&D. Yes, I can see this as non releasable for national security reasons in the China geopolitical competition. 
Securing our software against threats while having immense infiltration ability against enemy cyber security targets....not to mention, the ability to implant new, but even more subtle vulnerabilities into open software not generally detectable by current AI to provide covert action. Which will eventually happen no matter what. That's why it's important to start preparing now. Related ongoing threads: System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258 Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155 I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone? I think merging them into either this thread, or the System Card makes the most sense to me. I'm sure it'll be better than Opus 4.6, but so much of this seems hype. Escaping its sandbox, having to do "brain scans" because it's "hiding its true intent", bla bla bla. If it manages to work on my Java project for an entire day without me having to say "fix FQN" 5 times a day I'll be surprised. Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Scary but also cool. Did someone actually go through all of those and check if they are high-severity or did the AI just tell them that? They mention in their dev blog that they have humans review the most critical bugs before sending them to the maintainers. Every piece of software definitely has serious vulnerabilities, perfection is not achievable. Fortunately we have another approach to security: security through compartmentalization. See: https://qubes-os.org Once you get the compartmentalization working well, and “all” of the vulnerabilities are out of it too, of course… But even then you’ll have users putting things in the same compartment for convenience, rather than leaving them properly sequestered. Or more likely, it's just an exaggeration or lie. 
What evidence makes you say that? Do you have insider info? Neither party provided the evidence. I wonder why people like to take the side of the optimistic. Yes I'm sure this is all a massive conspiracy by the many companies that are making statements alongside Anthropic Pricing for Mythos Preview is $25/$125, so cheaper than GPT 4.5 ($75/$150) and GPT 5.4 Pro ($30/$180) For comparison, 5x the cost of Opus 4.6, and 1.67x for Opus 4.1 I think this would be very heavily used if they released it, completely unlike GPT 4.5 Opus 4 & 4.1 are still on Vertex+Bedrock @ $75/1mm out. They were used very heavily and in my subjective opinion are better than 4.5 and 4.6. Interesting, what makes them better to you? Opus 4, with enough context, could do most all I wanted in a single shot. More often than not, when I had a bad outcome and was frustrated I would realize that I was the problem (in giving improper direction or missing key context). I also was in a pretty sweet position having a boat load of credits and premo vertex rate limits so I could 'afford' to dump hundreds of thousands of tokens in context all day. With Opus 4.5 and 4.6, I find I have to steer very actively. This is comparing using Opus 4 directly rather than comparing the performance of the models in Claude Code for example, or any 'agentic' setup. Kinda reminds me of 4o vs 4-turbo. I would imagine they are smaller models. Where did you get that from? From TFA: > We do not plan to make Claude Mythos Preview generally available From the article: > Anthropic’s commitment of $100M in model usage credits to Project Glasswing and additional participants will cover substantial usage throughout this research preview. Afterward, Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens (participants can access the model on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry). 
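To make the price comparisons in this subthread easier to check, here is a small sketch of the arithmetic. The Mythos Preview, GPT 4.5 and GPT 5.4 Pro prices are the ones quoted above. The Opus 4.1 and Opus 4.6 prices are assumptions: they are back-derived from the "1.67x" and "5x" ratios in the comment (and the "$75/1mm out" figure), not taken from any price list in the article. The function names are made up for illustration.

```python
# Prices in USD per million tokens, as (input, output).
PRICES = {
    "Mythos Preview": (25, 125),  # quoted from the announcement
    "GPT 4.5": (75, 150),         # as stated in the comment above
    "GPT 5.4 Pro": (30, 180),     # as stated in the comment above
    "Opus 4.1": (15, 75),         # assumption: implied by "1.67x" and "$75/1mm out"
    "Opus 4.6": (5, 25),          # assumption: implied by "5x the cost of Opus 4.6"
}


def job_cost(model, input_tokens, output_tokens):
    """Cost in USD of a single job with the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def price_ratio(model, baseline):
    """(input, output) price ratio of model over baseline, rounded to 2 places."""
    return tuple(
        round(a / b, 2) for a, b in zip(PRICES[model], PRICES[baseline])
    )


print(price_ratio("Mythos Preview", "Opus 4.6"))       # (5.0, 5.0)
print(price_ratio("Mythos Preview", "Opus 4.1"))       # (1.67, 1.67)
print(job_cost("Mythos Preview", 1_000_000, 200_000))  # 50.0
```

Under those assumed Opus prices the ratios in the comment check out, and a job with 1M input tokens and 200k output tokens would come to $50 on Mythos Preview.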
Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both? I don't think they have the infra to support the demand. Anthropic can't keep up with the demand from OpenClaw users, they won't be able to keep up with public demand for something like Mythos. I buy the rationale for this. There's been a notable uptick over the past couple of weeks of credible security experts unrelated to Anthropic calling the alarm on the recent influx of actually valuable AI-assisted vulnerability reports. From Willy Tarreau, lead developer of HA Proxy: https://lwn.net/Articles/1065620/ > On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us. > And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools. From Daniel Stenberg of curl: https://mastodon.social/@bagder/116336957584445742 > The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. > I'm spending hours per day on this now. It's intense. From Greg Kroah-Hartman, Linux kernel maintainer: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_... > Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. > Something happened a month ago, and the world switched. Now we have real reports. 
All open source projects have real reports that are made with AI, but they're good, and they're real. Shared some more notes on my blog here: https://simonwillison.net/2026/Apr/7/project-glasswing/ Could this potentially be because more researchers are becoming accustomed to the tools/adding them to their pipelines? The reason I ask is because I’ve been using them to snag bounties to great effect for quite a while, and while other models have of course improved, they’ve been useful for this kind of work before now. >We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview. This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true. The other labs already censor their models. Everyone is trying to find the sweet spot where performance and ‘alignment’ are both maximized. This seems no different. > Big opportunity for the other labs, if that's true. It sounds like this is considered military grade technology, like cryptography in the 90s. The big difference is it's very expensive to create and run those models. It's not about the algorithm. If the story rhymes it could be a big opportunity for other regions in the world. Well since Anthropic treats us as second class evil citizens, I guess they don't want our evil money either. Can anyone point at the critical vulnerabilities already patched as a result of Mythos? (see 3:52 in the video) For example, the 27 year old OpenBSD remote crash bug, or the Linux privilege escalation bugs? I know we've had some long-standing high profile, LLM-found bugs discussed but it seems unlikely there was speculation they were found by a previously unannounced frontier model. 
- The OpenBSD one is 'TCP packets with invalid SACK options could crash the kernel' https://cdn.openbsd.org/pub/OpenBSD/patches/7.8/common/025_s... - One (patched) Linux kernel bug is 'UaF when sys_futex_requeue() is used with different flags' https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd94... These links are from the more-detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today https://red.anthropic.com/2026/mythos-preview/, which includes more detail on some of the public/fixed issues (like the OpenBSD one) as well as hashes for several unreleased reports and PoCs. That OpenBSD one is exactly the kind of bug that easily slips past a human. Especially as the code worked perfectly under regular circumstances. Looks like they've been approaching folks with their findings for at least a few weeks before this article. While not entirely unrelated, Linux also had a remote SACK issue ~6 years back. So if this Mythos is just an expensive combination of better RL and the original source material, that should hopefully point out where we might see an uptick in work (as opposed to a novel class of attack vectors). I'm not one to believe the Silicon Valley hype usually (GPT-2 being too dangerous to release, AI giving us UBI, and so on), but having run Claude Opus 4.6 against my codebase (a MUD client) over the weekend, I can believe this assessment. Opus alone did a good job of identifying security issues in my software, as it did with Firefox [1] and Linux [2]. A next-generation frontier model being able to find even more issues sounds believable. That said, this is script kiddies vs SQL injections all over again. Everyone will need to get their basic security up to the new level and it will become the new normal. And, given how intelligence agencies are sitting on a ton of zero-days already, this will actually help the general public by levelling out the playing field once again. 1 - https://www.anthropic.com/news/mozilla-firefox-security
2 - https://neuronad.com/ai-news/claude-code-unearthed-a-23-year... I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.) I would not have believed your argument 3 months ago but I strongly suspect Anthropic actively engages in model quality throttling due to their compute constraints. Their recent deal for multi GWs worth of data center might help them correct their approach. For what it's worth Anthropic explicitly denies that. "To state it plainly: We never reduce model quality due to demand, time of day, or server load" Also see https://marginlab.ai/trackers/claude-code/ It's very interesting to me how widespread this conception is. Maybe it's as simple as LLM productivity degrading over time within a project, as slop compounds. Or more recently, since they added a 1m context window, maybe people are more reckless with context usage. Posted this a while ago: >Models are not "degrading". They're not being "secretly quantized". And no one is swapping out your 1.2T frontier behemoth for a cheap 120B toy and hoping you wouldn't notice! >It's just that humans are completely full of shit, and can't be trusted to measure LLM performance objectively! >Every time you use an LLM, you learn its capability profile better. You start using it more aggressively at what it's "good" at, until you find the limits and expose the flaws. You start paying attention to the more subtle issues you overlooked at first. Your honeymoon period wears off and you see that "the model got dumber". It didn't. 
You got better at pushing it to its limits, exposing the ways in which it was always dumb.
>Now, will the likes of Anthropic just "API error: overloaded" you on any day of the week that ends in Y? Will they reduce your usage quotas and hope that you don't notice because they never gave you a number anyway? Oh, definitely. But that "they're making the models WORSE" bullshit lives in people's heads way more than in any reality.

It has nothing to do with the context window. Reasoning brought measured approaches grounded with actual tool calls. All of that short-circuits into a quick-fix approach that is unlike Opus-4.5 or 4.6. Sonnet-4.5 used to do that. My context window is always < 200K.

That still leaves open the possibility that they reduce model quality due to profit. ;p

Inference is where they make the money they spend on training, so this feels unlikely. Perhaps this does not hold true for Mythos though.

Interesting also is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that there really isn't one.

Linux had its SACK moment in 2019 -
https://access.redhat.com/security/vulnerabilities/tcpsack#s... We could just be seeing the fruit of expensive SWE RL on existing source material.

> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.
AITA for thinking that PRISM was probably the state-sponsored program affecting civilian life the most? And that one state is missing from the list here?

> Large American AI company does not list the US as an adversarial actor
This is not a surprise or a gotcha. Said company is literally in court against said government at the moment, after said government attempted to designate it too dangerous to do business with.

There are currently over 1,000 companies involved in lawsuits against the US government, even if we restrict ourselves to just tariff lawsuits.

And the government is attempting "corporate murder" on precisely one of them. Wanna guess which one?

I can think of two I’d add to the list. One was recently publicly denied access to Anthropic's models and the other was busy exploding pagers.

Not clear how an LLM is going to prevent a bomb from being put in a custom-built pager, or why Anthropic should object to Israel waging war against a militia whose goal it is to destroy that country.

Because they use LLMs to “intelligence-wash” targeting civilians, and murdered children by blowing up pagers in public areas (what you called “waging war against a militia”).

> PRISM was probably the state sponsored program affecting civilian life the most?
No state-sponsored hacking affected Americans materially. I just don't think we were networked enough in the 2010s. The risk is higher now since we're in a more warmongering world. (Kompromat on a power-plant technician is a risk in peace. It means blackouts in war.)
The fact that Iran hasn't been able to do diddly squat in America should make it sink in that they didn't compromise us. (EDIT: blep. I was wrong.)

Iranian-Affiliated Cyber Actors Exploit Programmable Logic Controllers Across US Critical Infrastructure - https://www.cisa.gov/news-events/cybersecurity-advisories/aa... - April 7th, 2026

Did they activate them to any noticeable effect?

To my knowledge, not yet. The attack surface in question is extensive, and in my opinion, targets are likely unprepared for a determined and sophisticated attacker. https://www.politico.com/news/2026/04/07/iranian-hackers-ene...

It's 2026, and these PLCs etc. are directly connected to the internet? I think that's the most surprising aspect here.

It doesn’t surprise me at all. Show me the incentives, and I’ll show you the outcome. Security is simply not valued, in many cases.

>No state-sponsored hacking affected Americans materially.
Uh, what? NotPetya was kind of a big deal.

How did PRISM affect civilian life?

Honest question: how do state-sponsored attacks from China, Iran, North Korea, and Russia affect civilian life?

Presumably, those have influenced elections, though I guess it depends what you count as an attack. Plenty of bots try to modify public opinion.

Someone hacked the DNC in 2015/16, the result of which also alleged attempted manipulation in 2008: https://en.wikipedia.org/wiki/Democratic_National_Committee_... Since we (as old Rummy said) do not know what we do not know, we cannot be certain about the extent of cyber attacks and what they might have influenced, and may not know these things until discoveries decades later, if ever.

Note the RNC was also hacked but the data was not leaked. Presumably used to influence the election and policies in other ways.

I believe the popular sentiment is that when they hacked the DNC they found a handful of things that would provide bad optics for the party. But the RNC?
They found so much evidence of criminality that near to the entire party flipped positions on issues related to Russia. So we have 2x successful hacks, one of which yielded some bad press for the Dems, and the other an entirely compromised party in the Repubs, who are now being actively blackmailed.

All of that applies equally to PRISM and any internal propaganda campaigns that it was feeding into, no?

Yes... they might have influenced elections and now, as a result, the world must cope with the Trump regime. Let's not fool ourselves.... Trump is probably the best, most successful attempt at world de-stabilisation all those rogue states ever achieved.

Maybe Americans should take responsibility for electing a maniac as their President. In the end, the buck stops with Americans.

Not if the election was stolen. There was a smattering of evidence after the election but the speed with which it disappeared was truly something to behold.

~1/3rd of US citizens voted for him. Don’t lump us all in.

Some of you are just guilty of negligence, yes.

Or maybe it's that our archaic system was designed so that some people's votes literally matter more than others, and more than half the country does not have a meaningful voice in our Federal elections.
LiamPowell - 6 hours ago
Is it really so difficult for them to talk about what they've actually achieved without smearing a layer of nonsense over every single blog post?

if (x != null) {
    y = *x; // Vulnerability! X could be null!
}
QuiEgo - 5 hours ago
A bug like above would still be something that would be patched, even if a way to exploit it has not yet been found, so I think it's fair to call out (perhaps with less sensationalism).

bool silly_mistake = false;
//... lots of lines of code
free(x);
//... lots of lines of code
if (silly_mistake) { // silly_mistake shown to be false at this point in the program in all testing, so far
    free(x);
}
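
For illustration only (this is not from the thread): a minimal C sketch of one common way to make a latent double free like the one above harmless, assuming the owner holds the only pointer to the buffer. The names release_once and run_cleanup are hypothetical, and the 16-byte allocation is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: free the buffer that *slot points to, then clear
 * the slot so a repeated call becomes a no-op.
 * Returns true only when memory was actually released. */
static bool release_once(void **slot) {
    assert(slot != NULL);
    if (*slot == NULL) {
        return false;   /* already released: the second free is skipped */
    }
    free(*slot);
    *slot = NULL;       /* the owner no longer holds a dangling pointer */
    return true;
}

/* Mirrors the shape of the snippet above: one unconditional release and a
 * second one guarded by a flag. Returns how many times free() actually ran,
 * or -1 if the allocation itself failed. */
static int run_cleanup(bool silly_mistake) {
    void *x = malloc(16);
    if (x == NULL) {
        return -1;
    }
    int frees = 0;
    if (release_once(&x)) {
        frees++;
    }
    if (silly_mistake) {
        if (release_once(&x)) {   /* harmless: x is already NULL here */
            frees++;
        }
    }
    return frees;
}
```

With this pattern the guarded branch stops mattering: run_cleanup frees the buffer once whether or not the flag is set. It only protects the pointer that gets cleared, so any other copies of the pointer would still dangle.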
cbg0 - 12 hours ago
GraphWalks BFS 256K-1M

Mythos   Opus    GPT5.4
80.0%    38.7%   21.4%