Project Glasswing: Securing critical software for the AI era

anthropic.com

1151 points by Ryan5453 13 hours ago


Related: Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258

Also: Anthropic's Project Glasswing sounds necessary to me - https://news.ycombinator.com/item?id=47681241

ofjcihen - 2 hours ago

I’m sure the new model is a step above the old one, but I can’t be the only person who’s getting tired of hearing about how every new iteration is going to spell doom / be a paradigm shift / change the entire tech industry, etc.

I would honestly go so far as to say the overhype is detrimental to actual measured adoption.

9cb14c1ec0 - 12 hours ago

Now, it's very possible that this is Anthropic marketing puffery, but even if it's only half true, it still represents an incredible advancement in hunting vulnerabilities.

It will be interesting to see where this goes. If it's actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have automated bug-hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.

It could also totally reshape military sigint in similar ways.

Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.

sam0x17 - an hour ago

It's all just really genius marketing. In 6 months Mythos will be nothing special, but right now everyone is being manipulated into fearing its release, as a marketing ploy.

This is the same reason AI founders perennially worry in public that they have created AGI...

redfloatplane - 12 hours ago

The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]

I'm still reading the system card but here's a little highlight:

> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.

and interestingly:

> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.

Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.

Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...

The threat model in question:

> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).

jryio - 12 hours ago

Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places.

My expectation is that the pre-AI distribution of software quality (and vulnerabilities) will become massively exaggerated: more small vulnerable projects and fewer large vulnerable ones.

It seems that large technology and infrastructure companies will be able to defend themselves by preemptively spending tokens to catch vulnerabilities, while the rest of the market is left with a "large token spend or get hacked" dilemma.

josephg - 9 hours ago

To be clear, we don’t know that this tool is better at finding bugs than fuzzing. We just know that it’s finding bugs that fuzzing missed. It’s possible fuzzing also finds bugs that this AI would miss.

LiamPowell - 6 hours ago

> Mythos Preview identified a number of Linux kernel vulnerabilities that allow an adversary to write out-of-bounds (e.g., through a buffer overflow, use-after-free, or double-free vulnerability.) Many of these were remotely-triggerable. However, even after several thousand scans over the repository, because of the Linux kernel’s defense in depth measures Mythos Preview was unable to successfully exploit any of these.

Do they really need to include this garbage, which is seemingly just designed for people to take the first sentence out of context? If there's no way to trigger a vulnerability, then how is it a vulnerability? Is the following code vulnerable according to Mythos?

    if (x != NULL) {
        y = *x; // Vulnerability! x could be NULL!
    }
Is it really so difficult for them to talk about what they've actually achieved without smearing a layer of nonsense over every single blog post?

Edit: See my reply below for why I think Claude is likely to have generated nonsensical bug reports here: https://news.ycombinator.com/item?id=47683336

ssgodderidge - 12 hours ago

At the very bottom of the article, they posted the system card of their Mythos preview model [1].

Section 7.6 of the system card discusses open self-interactions: they describe running 200 conversations in which the model talks to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

gck1 - 5 hours ago

I chuckle every time <insert any LLM company here> says something along the lines of "the model is so good that we won't release it to the general public, ahem, because safety."

Because the exact same thing has been said about every single upcoming model since GPT 3.5.

At this point, this must be an inside joke.

ilaksh - 9 hours ago

I think that basically they trained a new model but haven't finished optimizing it and updating their guardrails yet. So they can feasibly give access to some privileged organizations, but don't have the compute for a wide release until they distill, quantize, get more hardware online, incorporate new optimization techniques, etc. It just happens to make sense to focus on cybersecurity in the preview phase especially for public relations purposes.

It would be nice if one of those privileged companies could use their access to start building out a next level programming dataset for training open models. But I wonder if they would be able to get away with it. Anthropic is probably monitoring.

temp123789246 - 10 hours ago

OpenAI initially claimed that GPT-2 was too dangerous to release in 2019.

How many times will labs repeat the same absurd propaganda?

lifeisstillgood - 30 minutes ago

Nicolas Carlini talks about it here on Security, Cryptography, Whatever podcast - https://podcasts.apple.com/gb/podcast/security-cryptography-...

cbg0 - 12 hours ago

One of the things I always look at with new model releases is long-context performance, and based on the system card it seems like they've cracked it:

  GraphWalks BFS 256K-1M

  Mythos    Opus      GPT5.4
  80.0%     38.7%     21.4%

attentive - 18 minutes ago

Is there a timeline mentioned anywhere for when any of this will be available to the unprivileged public - as in soon, not soon, or never?

stephc_int13 - 10 hours ago

I think this is bad news for hackers, spyware companies and malware in general.

We all know vulnerabilities exist; many are known and kept secret to be used at an appropriate time.

There is a whole market for them, but more importantly, large teams in North Korea, Russia, China, Israel, and everywhere else are jealously harvesting them.

Automation will considerably devalue and neuter this attack vector. Of course, this is not the end of the story, and we've seen how supply chain attacks can inject new vulnerabilities without being detected.

I believe automation can help here too, and we may end up with a considerably stronger and more reliable software stack.

josh-sematic - 11 hours ago

Must be nice to be in a position to sell both disease and cure.

meander_water - 8 hours ago

I think this is a largely inflated PR stunt.

Opus 4.6 was already capable of finding 0days and chaining together vulns to create exploits. See [0] and [1].

[0] https://www.csoonline.com/article/4153288/vim-and-gnu-emacs-...

[1] https://xbow.com/blog/top-1-how-xbow-did-it

agrishin - 12 hours ago

>>> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.

How long would it take to turn a defensive mechanism into an offensive one?

Surac - 3 minutes ago

namedropping hell.

dang - 10 hours ago

Related ongoing threads:

System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258

Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?

skerit - 9 hours ago

I'm sure it'll be better than Opus 4.6, but so much of this seems like hype. Escaping its sandbox, having to do "brain scans" because it's "hiding its true intent", bla bla bla.

If it manages to work on my Java project for an entire day without me having to say "fix FQN" 5 times, I'll be surprised.

zachperkel - 12 hours ago

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.

Scary but also cool

Ryan5453 - 12 hours ago

Pricing for Mythos Preview is $25/$125, so it's cheaper than GPT 4.5 ($75/$150) and GPT 5.4 Pro ($30/$180).

taupi - 12 hours ago

Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both?

simonw - 10 hours ago

I buy the rationale for this. There's been a notable uptick over the past couple of weeks of credible security experts unrelated to Anthropic sounding the alarm about the recent influx of actually valuable AI-assisted vulnerability reports.

From Willy Tarreau, lead developer of HAProxy: https://lwn.net/Articles/1065620/

> On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.

> And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools.

From Daniel Stenberg of curl: https://mastodon.social/@bagder/116336957584445742

> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.

> I'm spending hours per day on this now. It's intense.

From Greg Kroah-Hartman, Linux kernel maintainer: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...

> Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.

> Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.

Shared some more notes on my blog here: https://simonwillison.net/2026/Apr/7/project-glasswing/

Miraste - 12 hours ago

> We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.

bredren - 11 hours ago

Can anyone point at the critical vulnerabilities already patched as a result of mythos? (see 3:52 in the video)

For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?

I know we've had some long-standing, high-profile LLM-found bugs discussed here, but it seems unlikely any of them were speculated to have been found by a previously unannounced frontier model.

[0] https://www.youtube.com/watch?v=INGOC6-LLv0

VadimPR - 21 minutes ago

I'm not one to believe the Silicon Valley hype usually (GPT-2 being too dangerous to release, AI giving us UBI, and so on), but having run Claude Opus 4.6 against my codebase (a MUD client) over the weekend, I can believe this assessment.

Opus alone did a good job of identifying security issues in my software, as it did with Firefox [1] and Linux [2]. A next-generation frontier model being able to find even more issues sounds believable.

That said, this is script kiddies vs. SQL injections all over again. Everyone will need to bring their basic security up to the new level, and it will become the new normal. And, given how intelligence agencies are sitting on a ton of zero-days already, this will actually help the general public by levelling the playing field once again.

[1] https://www.anthropic.com/news/mozilla-firefox-security

[2] https://neuronad.com/ai-news/claude-code-unearthed-a-23-year...

Sol- - 11 hours ago

I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)

underdeserver - 11 hours ago

Interesting also is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that there really isn't one.

rakel_rakel - 10 hours ago

> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.

AITA for thinking that PRISM was probably the state sponsored program affecting civilian life the most? And that one state is missing from the list here?