Project Glasswing: An Initial Update
anthropic.com
472 points by louiereederson 19 hours ago
You can get a taste of this today yourself with Codex Security. I turned it on just as an experiment and in less than a week it has now become essential to all of us. I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes.
I'd say it is about 90% accurate for us. Often even the "Low" findings lead us to dig and realize it is actually exploitable. Everyone makes these mistakes, from the most junior to the most senior. They are just a class of bugs after all.
I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO. Highly recommend you get something enabled for your own repos ASAP
> I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO.
So, how is that supposed to work? Claude Code generates security bugs, then Claude Security finds them, then Claude Code generates a fix, spend tokens, profit?
Yeah, with a budget assigned. This is actually just software development and security right?
Developers create software, which has bugs. Users (including bad guys, pen testers, QA folks, automated scans etc, etc, etc) find bugs, including security bugs, Developers fix bugs and maybe make more. It's an OODA loop, and continues until the developers decide to stop supporting the software.
Whether that fits into the business model, or the value proposition of spending tokens instead of engineer hours or user hours is fundamentally a risk management decision and whether or not the developer (whether OSS contributor, employee, business owner, etc) wants to invest their resources into maintaining the project.
While not evenly distributed, and not perfect, the tools that are currently available, and those still behind an embargo, are absolutely impactful, and yes, they are expensive to operate right now - it may not always be the case, but the "Attacks always get better" adage applies here. The models will get cheaper to run, and if you don't want to pay for engineers or reward volunteers to do the work, then you've got to pay for tokens, or spend some other resource to get the work done.
Somehow this reminded me of the historical efforts of some government bounty collections for mouse tails which were discontinued due to fraud (such as hunters breeding mice to collect the reward). There is a reason why/how devs and QA keep each other in check. Guess in case of LLM writing code, one has to use different models for dev and security checks.
On the other hand, in the real world, developers learn from mistakes and avoid them in the future. However, there is no such feedback loop for enterprises using an LLM under an agreement that the LLM will not use the enterprise's code for training purposes.
> the developers learn from mistakes and avoid them in the future
No. Humans learn from mistakes and try to avoid them in the future, but there is a whole pile of other stuff in the bag of neurons between our ears that prevent us from avoiding repetition of errors.
I have seen extremely talented engineers write trivial to avoid memory corruption bugs because they were thinking about the problem they were trying to solve, and not the pitfalls they could fall into. I would argue that the vast majority of software defects in released code are written by people that know better, but the bug introduced was orthogonal to the problem they were trying to solve, or was for an edge case that was not considered in the requirements.
Unless you are writing a software component specifically to be resilient against memory corruption, preventing memory corruption issues isn't top of mind when writing code, and that is ok since humans, like the machines we build, have a limit to the amount of context/content/problem space that we can hold and evaluate at once.
Separately, you don't necessarily need to use different models to generate code vs conduct security checks, but you should be using different prompts, steering, specs, skills and agents for the two tasks because of how the model and agents interpret the instructions given.
> write trivial to avoid memory corruption bugs because they were thinking about [something else] [...] defects [...] written by people that know better, but the bug introduced was orthogonal to [their focus]
For whatever reason, hadn't associated the inattentional blindness of bug writing with the invisible gorilla experiment and car crashes - selective attention fails. People looking right at the gorilla strolling into production while chest thumping, but not seeing it, for a focus on passing basketballs. That's quite an image. Tnx.
I've noticed even people who do offensive security for a living frequently leave gaping holes in their own code. If you're not actively primed to scan the landscape for the gorilla, you will often miss it even if you're a gorilla inquisitor.
Thank you in turn for making the issue much more salient to me by explicitly connecting it to the gorilla/basketball experiment. This is definitely going into my "clippings".
I think a similar thing comes into play when you ask a developer to write tests for the feature they just implemented. They’re going to have selective blindness for the edge cases (or requirements) that they failed to consider during implementation, unless they’re good at context switching into a testing mindset. And that’s something that benefits from training.
The problem is you as a person are not incentivized to introduce bugs in your code. If I am a company that provides an LLM/agent, and I know that the more bugs you have the more money I’m going to make, then I am not exactly incentivized to make my LLM/Agent better at preventing bugs. I don’t even have to explicitly make it introduce them. The incentive structure is simply out of whack.
Depends on how the billing works.
For users on fixed monthly pay accounts they'll be incentivised to do the exact opposite, as their income is fixed and the cost goes up for more tokens.
If the available evidence (third-party cloud pricing of open models) is correct and they make a profit on tokens but lose it on training, they will be incentivised for as many tokens as possible on pay-as-you-go API calls. If it isn't correct and they actually lose money even per token, they're also going to be incentivised to reduce output here.
Isn't it more likely the opposite - individual devs are likely to try to fudge metrics about how many vulnerabilities they find in their own code.
Whereas with LLMs, they’re really good about providing objective metrics about the bugs they found, especially as a subsequent LLM security scan does not know whether the same LLM wrote code earlier, the opposite of human devs.
And is the idea that organizations and/or benchmarks won't keep track of vulnerability rates for code from different LLMs?
(And individual devs get paid more the more bugs that they introduced they “find”, and they have more job security with an “maintainable” code base than a “finished” one.)
That’s like saying screw manufacturers are incentivized to give you crappy screws because it means you will buy more.
No. You will switch to a competitor that does a better job or charges less or both.
This is why monopolies are such a big problem. Because under a monopoly you are right.
What you’re describing is a one-to-one quality/failure problem by choosing to ruin the basic, core functionality of an item (while also endangering people at that). Or if you start with a bad screw, that just means you’re talking about people’s tolerance for bad products. What I’m talking about is similar but a little more nuanced and has plausible deniability. The relationship I’m describing is more indirect and it doesn’t require explicit effort to cheapen a product, but rather simply not improving a specific element of the product.
Apple made a ton of money off of lightning port accessories, you see it referenced here all the time. Apple had no incentive to swap to USB-C though it would create a better product and be more uniform with the rest of the world, so they kept with it despite incredibly vocal calls to swap because there was a ton of money they were making in the accessories. And it didn’t stop until they were forced to stop by the EU.
When we are talking about products at scale, these kinds of incentive structures play out in very tangible ways. If I have an LLM product and I’m getting two pulls at the hose because you’re burning tokens making stuff and correcting it, I don’t need to do anything. People are willing to tolerate that system to a pretty high degree so long as they ultimately get what they wanted in the end - unfortunately that is a great space to make money in.
Are you thinking of the cobra effect (aka https://en.wikipedia.org/wiki/Perverse_incentive) where people in India started breeding cobras to get the reward?
Plenty of examples abound:
https://en.wikipedia.org/wiki/Great_Hanoi_Rat_Massacre
> Today, the events are often used as an example of a perverse incentive, commonly referred to as the cobra effect. The modern discoverer of this event, American historian Michael G. Vann argues that the cobra example from the British Raj cannot be proven, but that the rats in the Vietnam case can be proven, so the term should be changed to the Rat Effect.
It's pretty absurd to do it on AI-generated code though. If there is now an automated way to find vulnerabilities, coding models can be pretty easily trained to not introduce them
Tell me you don’t know how AI works without telling me you don’t know how AI works.
What are you talking about?
I’ll try to steelman this comment. Anyone who uses coding tools knows that the output is heavily affected by details of the task you give it. The same model can give you garbage code or genius code for the same problem with slightly different framing. So it’s not necessarily a limitation in the model’s training that causes it to output security bugs. The model might be great at writing secure code, but you need a different harness to elicit that behavior.
Counterargument: just because the problem can be fixed without training, doesn’t mean training isn’t a possible solution.
Counter-counter-argument: for LLMs, tokens are units of thinking. And token use is, on the margin, directly proportional to costs of inference. So while the details of the harness, and how you prompt the model, and nature of the code and docs you put in context, etc. all matter to the quality of output you get from LLM coding tool, ultimately, there's always a ceiling to how much you're willing to spend on solving a problem - say, no more than 30 minutes, or $10, on refactoring a target module or implementing a small feature - and that puts a limit on how much thinking the model can put into it.
Thing is, writing secure and efficient and readable and simple code is in many cases fundamentally over that limit. It's possible, but you can't afford (or rationally just don't want) to spend as much on it as it's required for superhuman quality on all these aspects. Also most of the time, you don't want to operate at a limit - you probably expected that feature to take 30 seconds and less than $1 to implement. So you choose, both what the model optimizes for, and how much.
Because of that, no matter how good the model and the harness and the prompting are, $10 spent on coding is still bound to leave behind some security vulnerabilities that subsequent $10 spent on security review will find (especially with a model post-trained for that, at expense of general performance).
I guess I thought this should be obvious to everyone but, looking at code and finding exploits is completely different from .. writing exploits.
For one thing exploits often require completely different parts of the code to chain together. Sometimes parts of code the LLM itself isn’t writing.
And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.
It's just an incremental thing. You're both right. They will slowly become less and less likely to introduce vulns due to higher intelligence and better RL. Offensive capabilities will still probably scale faster than automatic defensive-while-coding ones.
>I guess I thought this should be obvious
People in this thread are talking past and misunderstanding each other and making unrelated points.
The point of the response to the top level comment was questioning the conflict of interest in model providers creating separate revenue streams for themselves by selling a product that fixes problems their other product created, akin to OS providers selling anti-virus software back in the day.
Similarly, it should be obvious to you that a software engineer can trivially get into the mindset of writing more exploitable code by pretending the production code they're tasked with writing is hobby code or prototype code.
If profitable revenue streams with adversarial products are in place, no one should be surprised when model providers are disincentivised to improve the "garbage code quality, but hey it works!" nature of their most used code generators.
>And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.
...it should also be obvious people in this forum have wildly different experiences with respect to the code quality the LLMs they use generate. I personally find it difficult to find anyone that argues that the LLMs they are using are consistently generating high-quality code across a vast codebase.
In every prompt: "write me code without exploitable bugs".
I know it doesn't work so easily as someone who uses AI for coding, but I do find repetition of basics in almost every prompt keeps the AI focused.
Usually the same guy doesn't get paid for developing code, bug bounty and fixing the code.
It leads to corruption. To paraphrase Dilbert "I'm going to code myself a car."
The AIs have already figured out how to succeed in a software job:
1. Ship bugs
2. Fix them
3. You're the hero!
Dilbert beat you to it:
https://english.stackexchange.com/questions/488178/what-does...
The non-programmer decomposition of that joke was painful to read.
Particularly from those outside the domain who criticised it as a 'not a very good joke' because they didn't understand it, which I think summarises the entitled mindset of many people these days.
I thought we were all doing that already?
The idea is to take the human out of the loop.
> But in 30 days we could put in electronic relays. Get the men out of the loop.
> Gentlemen...
> I wouldn't trust this overgrown pile of microchips further than I could throw it. I don't know if you wanna trust the safety of our country to some... silicon diode...
Jesus, dude. There are managers reading this.
What do you think they do all day?
The larger pattern is not unique to writing code. Think of it next time a reorg comes, or some random thing gets "improved" in the name of "efficiency" only management seems to see.
Meanwhile, experienced humans learned to succeed by not overachieving every second of the day to keep a steady flow of work going. Then a junior rolls up who wants to kill themselves to climb the ladder - but, problem solved, sub the AI in for the juniors to protect the seniors.
Software engineers generate security bugs, Software engineers find them, then Software engineers generate fix, collect salary, profit?
I don't get paid per-character of code I write.
If I did, it absolutely would cause me to have perverse incentives in the way the parent commenter implies.
Those are individual revenue streams, distributed at a very granular level across the world.
LLMs are currently relegated to individual for-profit companies. They collect that money. There's no other choice to use them and to provide them that money.
Ngl, watching folks getting irritated about normal employer-employee absurdities from the employer perspective through usage of agents and having to pay for tokens has been a little therapeutic for me.
Absolutely. And not even making the connection.
On a broader scale, the sheer face-eating-leopards-ness of programmers finally automating away our own jobs and then realising how much this sucks, after automating away so many other kinds of jobs, can feel darkly amusing to me too.
I keep reading this sort of comment quite a lot, but programming isn't always about automating jobs away. In my career I have not eliminated a single job. I don't consider that a failure on my part.
All my sibling comments are missing the message here which is that if Claude can find security issues then it can avoid them right when writing the code, so it could just never commit anything containing a security issue.
Just refactor and rebrand all of it as Claude Code and see it as one process.
Humans work like that too. If you're not comfortable with Claude being involved in every step (for whatever reason) then just use different providers for each.
You can hook traditional SAST into your coding tool, and get cheap-ish realtime detection for some classes of vulns while coding.
You can optionally layer LLM diff scanning if you want to burn some tokens on your tokens. Modern tools can catch some impressively subtle issues.
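For anyone wondering what "scan the diff while coding" means mechanically, here is a minimal sketch. It is not a real SAST tool: the two regex rules are made-up placeholders, and an actual setup would hand the added lines to a proper scanner (or an LLM) instead. It only shows the plumbing of pulling added lines out of a unified diff and checking just those.

```python
import re

# Hypothetical toy rules, for illustration only. A real SAST tool ships far
# richer, language-aware checks than two regular expressions.
RULES = {
    "use of eval": re.compile(r"\beval\("),
    "shell=True in subprocess call": re.compile(r"shell\s*=\s*True"),
}

HUNK_HEADER = re.compile(r"@@ -\d+(?:,\d+)? \+(\d+)")


def added_lines(diff_text):
    """Yield (file, new_line_number, text) for each added line in a unified diff."""
    current_file = None
    new_line_no = 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            name = raw[4:]
            current_file = name[2:] if name.startswith("b/") else name
        elif raw.startswith("@@"):
            match = HUNK_HEADER.match(raw)
            if match:
                new_line_no = int(match.group(1))
        elif raw.startswith("+"):
            yield current_file, new_line_no, raw[1:]
            new_line_no += 1
        elif raw.startswith(" "):
            new_line_no += 1  # context line: present in the new file
        # Removed lines ("-...") do not advance the new-file line counter.


def scan_diff(diff_text):
    """Return (file, line, rule_name) for every rule hit on an added line."""
    hits = []
    for file_name, line_no, text in added_lines(diff_text):
        for rule_name, pattern in RULES.items():
            if pattern.search(text):
                hits.append((file_name, line_no, rule_name))
    return hits


sample_diff = "\n".join([
    "diff --git a/app.py b/app.py",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,4 @@",
    " import subprocess",
    '-subprocess.run(["ls"])',
    "+cmd = input()",
    "+subprocess.run(cmd, shell=True)",
    ' print("done")',
])

findings = scan_diff(sample_diff)
print(findings)
```

Only looking at added lines is what keeps this cheap enough to run on every edit. The trade-off is that it cannot see issues that span old and new code.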
Replace “Claude code” with “programmers” and you get what we’ve had up until now. It’s all just moving quicker now.
Yes. Up until this point the bottleneck was how many developers you could convince to help you. Now it's how much money you can dump into it. Like everything else, software is becoming a game where the winner is the organization most willing to spend money. It'll be like bombs or tanks - you need smart people to advance in the war, but you also need money and material, the material is just compute infra.
How is this supposed to work? Humans generate security bugs, then humans find them, then humans generate the fix, profit?
Yeah. Presumably as AI code generation gets better, the output gets better. As smaller portions of code are stitched together, human/AI systems analyze it holistically to make sure all its integrations are secure and bug free.
In 2026, different models are better at different things. Cheap models can plan and do small/medium code projects well, more expensive models are even better at architecture and exploit discovery.
So? That's how a business works. We sold you landmines and now you need them removed? Lucky you we also have mine clearance products.
Man, some people like conspiracies. I encourage you to replicate all that.
I’ve had the same experience. The ui is a little unclear about this, because it says you have 5 scans, but 1 scan is just the continuous monitoring of the default branch of a repo.
The high impact findings have almost all been bang on for me. I was especially surprised by the high-quality documentation it produces as well as how narrow the proposed fixes are.
I’m used to codex producing quite a bit more code than it needs to, but the security model proposed fixes that are frequently <10 loc, targeting exactly the correct place.
It’s really quite good. I’m assuming it’ll be pretty expensive once out of beta, but as a business I’d be jumping on this.
One issue I've seen with LLM's is adding superfluous code in the name of "safety" and confidently generating a bunch of stuff that was useful in years gone by, but now handled correctly by the standard lib. I'm of the opinion that less is more when it comes to code, and find the trend this is introducing quite frustrating.
How do you avoid this pitfall?
I wonder this too. I prompted Opus 4.7 to generate some Python threading code for me. The code to run the sub-thread looked like this:
  def run():
      with contextlib.suppress(SystemExit):
          do_thread_thing()

  threading.Thread(target=run, daemon=True).start()
Suppressing SystemExit was surprising, and made me curious. I followed up and asked the model: what's the purpose of that? The model's response: "Honestly? Cargo-culting on my part. You should remove it."
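For context on why the suppress is at best redundant here: in CPython, a SystemExit raised inside a non-main thread only ends that thread, and the threading module ignores it silently. A minimal sketch of that behaviour (the worker function and the events list are made up for the demo, and this says nothing about why the model wrote what it did):

```python
import threading

events = []


def worker():
    events.append("worker started")
    # Same effect as calling sys.exit(3) from inside this thread.
    raise SystemExit(3)


worker_thread = threading.Thread(target=worker)
worker_thread.start()
worker_thread.join()

# The process is still alive: SystemExit ended only the worker thread.
events.append("main still running")
print(events)
```

So wrapping the thread body in contextlib.suppress(SystemExit) changes nothing observable in a case like this, which fits the "cargo-culting" answer.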
I had some shell scripts littered with `|| true`, which was obviously obscuring real errors everywhere. When I challenged the model, it gave me the same "cargo-culting" answer.
The `|| true` is often done because people use `errexit` as part of "Bash strict mode"[1], which comes with so many caveats[2] that I usually avoid it. Claude, however, loves it.
[1]: http://redsymbol.net/articles/unofficial-bash-strict-mode/
[2]: https://mywiki.wooledge.org/BashPitfalls#set_-euo_pipefail
I use "strict mode" in almost every script I write. IMO these caveats shouldn't be a reason not to use it, but should instead be used as a manual of what to avoid when using it. This is just programming. Everything is a tradeoff.
`|| true` is a horrible practice because even though it may help in cases where a specific failure mode is acceptable, it obscures unexpected failures and could prove catastrophic. The solution is not to drop the protections but rather to handle the expected failure and let the script crash otherwise.
This is, again, programming. You don't usually `catch Exception` in Python for similar reasons. There may be legitimate uses for that, but IME they are a rare exception and realistically only used when I actually don't care about what happens when I run it.
The other infuriating thing I found is that when I call out the model for its use of `|| true`, it tends to replace them with `|| echo "error foobar"` - which is at least not completely silent but the same problems exist.
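To make the "handle the expected failure, crash on everything else" point concrete, here is a small Python sketch of the same idea. The read_optional helper and the file names are hypothetical; the point is only that the except clause names the one failure that was planned for.

```python
import os
import tempfile


def read_optional(path):
    """Return the file's contents, or "" if the file does not exist.

    A missing file is the one failure we decided is acceptable. Anything
    else (permissions, path is a directory, ...) is left to propagate.
    """
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        return ""


with tempfile.TemporaryDirectory() as workdir:
    # Expected failure: handled quietly.
    missing = read_optional(os.path.join(workdir, "absent.conf"))

    # Unexpected failure: the path is a directory, so open() raises a
    # different OSError subclass and the helper does not hide it.
    try:
        read_optional(workdir)
        unexpected_propagated = False
    except OSError:
        unexpected_propagated = True

print(repr(missing), unexpected_propagated)
```

A bare `except Exception: pass` here would be the Python equivalent of `|| true`: both calls would appear to succeed, and the second problem would go unnoticed.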
From your statement and the parent comment, just learned that "cargo cult" is a thing, but cargo-culting as a compound is something AI has made up? [1].
As I was educating myself, I found Richard Feynman's Commencement Speech at Caltech in '74 [2] that might have coined this for our industry? If you would rather listen than read [3]. Posting this for others curious on the term.
1. https://trends.google.com/trends/explore?q=Cargo-culting&hl=...
Thinking off the top of my head - couldn't you have an AI scan that looked for such things? Just send every file in the code base to AI one at a time. Have a prompt like "See if there is ABC pattern that can now be handled by XYZ standard library function in this file. Reply YES or NO. {{file contents}}"
Seems you would not need that many tokens to do so and you might find such cases.
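A rough sketch of what that per-file loop could look like. Everything model-related is an assumption: ask_model is a hypothetical callable you would wire to whatever LLM API you use, and fake_model below is a stand-in so the example runs offline. The pattern and replacement strings are made up too.

```python
import tempfile
from pathlib import Path

# Prompt wording follows the comment above; the placeholders are filled per file.
PROMPT_TEMPLATE = (
    "See if there is a {pattern} pattern that can now be handled by the "
    "{replacement} standard library function in this file. "
    "Reply YES or NO.\n\n{contents}"
)


def scan_files(paths, ask_model, pattern, replacement):
    """Send each file to the model once and return the paths it flags with YES."""
    flagged = []
    for path in paths:
        contents = Path(path).read_text(encoding="utf-8", errors="replace")
        prompt = PROMPT_TEMPLATE.format(
            pattern=pattern, replacement=replacement, contents=contents
        )
        reply = ask_model(prompt)
        if reply.strip().upper().startswith("YES"):
            flagged.append(Path(path))
    return flagged


def fake_model(prompt):
    # Stand-in for a real LLM call (an assumption for this demo): it says YES
    # whenever the file defines the hand-rolled helper we are hunting for.
    return "YES" if "def has_prefix(" in prompt else "NO"


with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "a.py").write_text(
        "def has_prefix(s, p):\n    return s[:len(p)] == p\n", encoding="utf-8"
    )
    (root / "b.py").write_text("print('hello'.startswith('h'))\n", encoding="utf-8")

    flagged = scan_files(
        sorted(root.glob("*.py")),
        fake_model,
        pattern="hand-rolled prefix check",
        replacement="str.startswith",
    )
    flagged_names = [path.name for path in flagged]

print(flagged_names)
```

Token cost scales with total file size times the number of patterns you ask about, so in practice you would probably batch several patterns into one prompt.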
Gosh this couldn’t be more true, which IMO is the real reason LLM workflows are not strictly faster if you care about quality. Otherwise you end up with a codebase where only 60% of it is necessary. Standard testing patterns also tend not to be great at catching this particular flavor of LLM-ism.
Watching it like a hawk and stopping/redirecting, or immediately reviewing and doing the same is the only way, really.
I would recommend you to try out the setup with gpt-5.5-cyber as the orchestrator and deepseek-v4-flash or some other fast cheap model as its workers. Getting pretty good results using this setup.
This got me thinking, so what happens in two years?
every tom, dick and harry who can type english has the tools to attack any software that isn't patched.
tools that were accessible to specialized groups, now made available to anybody with a grudge and a few dollars for tokens.
and what does anthropic and openai do? They form an inner ring to make the latest models available first to Enterprises. Enterprises will cough up the prices that anthropic and openai set, they have no choice here.
Eventually everybody pays. This does not sound good
Two years? That exists right now. You only have to point Codex Security at an open source repo. There are a lot of tools and companies that are spinning up today that do autonomous pentesting.
I'm not even sure a specialized model is needed here. It probably just needs the right harness around existing ones.
I expect the next two years to be absolutely brutal for hacks. Attackers have supercharged tools in their hands right now. Defenders are only getting started and will have to plow through a massive backlog of newly uncovered vulns.
The major short term downside is that open source or personal projects won't be able to afford things like Codex Security.
> The major short term downside is that open source or personal projects won't be able to afford things like Codex Security.
Realistically, all open-source projects should be forced to have automated scans of this nature before their releases can be shipped. This is something the package managers and github need to figure out. It'd stop the supply chain attacks too.
> It'd stop the supply chain attacks too.
Yeah it’s hard to write a loop that makes an adversary agent write and mask malware then runs a scanning agent and if the malware is detected gives the detection details to the adversary agent with instructions to hide it better..
As usual, the attacker only needs to get lucky once.
So first they steal all code and launder it without attribution. Then they release a tool that doesn't find anything in hardened projects and is marketed through secrecy and modern equivalents of Netcraft like this British AI institute.
Then open source projects need a McKinsey-like stamp of approval to even be released.
Sounds like there are many parasites in this process.
You know that open source users are free to scan everything if they want to?
> all open-source projects should be forced
That's a great way to kill OSS. This is only bootlicking the idea of corporations profiting off of unpaid labor.
You'll have access to the same models as your hypothetical attackers, and a big advantage if only you have access to the source code
I would say that if this sounds untenable to you, then you may want to consider that the way we architect software has itself been untenable for a while. What Mythos can accomplish today in public, an APT unit can already accomplish in secret.
https://blog.chuanxilu.net/en/posts/2026/05/dual-pass-review...
This is what I did. I use a loop skill to dig out problems and bugs at each step of development, from design to coding, to make sure the resulting software works properly and as intended.
Did you need to do anything special to get access to Codex Security?
Not sure what the threshold is but I sent them all of my bug bounty profiles and papers I’ve authored.
I don’t think you need all of that though. I know a whole mess of people that have gotten it for much less. Should just give it a try.
"get a taste of this". The real thing is, GPT-5.5 is better than Opus 4.7, so if Anthropic doesn't release Mythos soon, other people are going to notice and switch off Claude.
I help maintain a project that is used as a dependency by a lot of security tools to handle PE files.
It’s disappointing that Anthropic and OpenAI never responded to the applications to their respective programs for open source maintainers. From my perspective it seems like their offers are primarily for the shiny well-known projects, rather than ones that get only a few million monthly installs but aren’t able to get thousands of stars due to being “hidden” as a dependency of a popular tool.
It seems to me like either your architecture is fucked up or you’re using the wrong language/tooling for the type of software you are making if you’re introducing security vulnerabilities that frequently.
> I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes.
Dude is flexing that he's pushing unsecure code every day, that's a skill!
By the way, you might be interested in looking up “blameless post-mortems” and indeed the field of incident response more generally. Modern incident response practice is to treat failures of an individual to do something as problems with the system they were operating in, because humans aren’t designed to be consistent or perfect and therefore shouldn’t be pretended or assumed to be.
I’m not sure how to reconcile Anthropic’s update / some of the exuberant comments here with recent feedback like the following from curl maintainer Daniel Stenberg:
“I see no evidence that this setup [Mythos] finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.”
https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...
You’re right, it’s a valid data point. But the U.K. government report is also a data point, and the Firefox report is a data point, and they suggest that it is, indeed, significantly better than current generation models. Maybe curl is significantly better hardened than most projects?
In any event, it barely matters. As Anthropic acknowledges, next level models are coming, theirs is only one of them. Current generation models are already good at things like tracing data flow through complex systems and there’s no reason to think that capability has topped out. So within a year it seems very likely we’ll have more than one commercially available model able to find vulnerabilities cheaply.
On the other hand, it seems that they’ve made much less progress on getting it to design solutions to these issues.
> Maybe curl is significantly better hardened than most projects?
Meanwhile from [1]:
"Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day."
"The simple reason is: the (AI powered) tools are this good now. And people use these tools against curl source code. They find lots of new problems no one detected before. And none of these new ones used Mythos. Focusing on Mythos is a distraction - there are plenty of good models, and people who can figure out how to get those models and tools to find things."
Yeah, it looks like there are at least 11 security bugs missed by Mythos.
[1] https://www.linkedin.com/feed/update/urn:li:activity:7463481...
I’m trying to reconcile this with TFA. Because the article says that the majority of vulns found by Mythos are being reported by independent researchers after validation. They never said those reports disclose that Mythos was involved - and I suspect they don’t. So did any of these 11 CVEs come from that channel?
Based on the article here, and Firefox's mythos article, they had found bugs with Opus 4.6 as well but mythos is finding more that it missed.
That would align with the curl feedback you linked, they aren't using mythos but are finding bugs with other models. Presumably the expectation would be that with mythos they'd find more that were missed by other models already used.
> Based on the article here, and Firefox's mythos article, they had found bugs with Opus 4.6 as well but mythos is finding more that it missed.
It's not quite apples-to-apples. It was Opus on Firefox 148, Mythos on 150. A better test of Mythos vs Opus would have been to apply Mythos to Firefox 148. Or also re-apply Opus to Firefox 150.
Do we know all the Opus+Firefox 148 bugs are fixed in Firefox 150? Do we know the number of new bugs introduced per Firefox release?
I don't think anyone has claimed that Mythos finds all vulns in all projects. But it's very good if Mozilla's blog posts are anything to go by.
The same UK security research body ran the same CTF against GPT5.5. GPT5.5 got the same result as Mythos.
Anthropic promised us that Mythos was such an existential threat that it would compromise "every OS and browser on devices across the planet". They've held conferences and meetings with banks and govts across the world, shouting how critical this issue is.
GPT5.5 has been out for a month. Every device on earth has not been breached yet. It's very fair to criticize Anthropic's maximalist posturing when it's becoming exceedingly clear their models are fairly behind OpenAI's in capability.
In my opinion, the original commenter's statement stands, and the UK govt data point only helps support that due to the equal result between Mythos and GPT.
I'd advise reading into the specifics of what happened with Firefox. The TL;DR is that a reduced-safety version of its code was scanned by Opus 4.6 (yes, Opus), which found a multitude of bugs and 4 high-severity vulns that did not escape the sandbox. The Mythos system card test describes running Mythos against the same issues Opus found to see if it could reliably replicate them and chain them together into an attack.
I think for every data point, we need to know how many tokens and how much money were burned to achieve the desired outcome, and how buggy each piece of software was to start with.
I think people sometimes misunderstand Daniel's point here, though it's clearer when taken in context of the rest of his article. The tools in general are getting a lot better at finding security bugs. It was unclear to Daniel, based on his usage, whether Mythos in particular is a huge step, but the Mythos generation of LLMs definitely is. Note though that Daniel was using Mythos somewhat indirectly. Two things I've taken away from the whole Mythos debate: a) I suspect that Anthropic's GPU crunch meant they felt they had to ration Mythos access anyway, so the calculus of whether to release it generally was probably influenced by that, and b) finding bugs with Mythos or a similar model is still expensive -- a $20K or $100K Mythos run on curl might have shown the same level of issues as other projects like Firefox, but Daniel didn't get that kind of access.
He posted a general update today on LinkedIn which I think gives the wider context:
https://www.linkedin.com/feed/update/urn:li:activity:7463481...
> Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day.
> 11 CVEs announced in a single release is our record from 2016 after the first-ever security audit (by Cure 53).
> This is the most intense period in #curl that I can remember ever been through.
Curl has more eyes on it, has had more tools thrown at it, and is better tested (and developed?) than 99% of software. It's very much not the norm. I wouldn't be surprised if that has something to do with it, if there is any kind of bias there (not sure if there is, it's also possible he's just right).
Different people can have different experiences without contradiction. Maybe the curl source code was pretty clean to begin with?
imo curl is quite well maintained. there are a lot of sloppy projects out there and tools like this show who's been swimming with their pants down. not saying any project with vulnerabilities is sloppy, but when the costs of finding bugs and vulnerabilities decrease significantly, they will get exposed with enough time and tokens ($)
Daniel has been posting for months (years?) about how much scrutiny he gets from security researchers and various automated tools. I wouldn't expect curl to be the average case for mythos.
It is the opposite. Security people focus on curl and sudo because they are code bases that contain a lot of features and unused code from the 1990s.
They don't focus on projects where they find nothing. They certainly don't advertise when they find nothing.
Getting a lot of scrutiny is not the recommendation that it appears to be. What is the new standard? Projects that never have bugs are deemed to be suspect because they "have not been scrutinized" (they have, but null results never go public)?
So Mythos only finding one issue after other tools have found 300 this year is embarrassing. Mythos was supposed to be better and novel.
It is definitely not the case that curl has been or is now a marquee vulnerability research target. It's a CLI HTTP fetcher. It's the same with sudo. It's a big deal if a sudo vulnerability gets found, because it's an extremely load-bearing piece of software, but sudo is itself not a prime target, because it doesn't do much.
There is no claim that it is a "vulnerability research target". It is a bug finding magnet, and bugs can be found by anything from gcc warnings to AI tools.
No, it didn't attract bluepill exploit research.
The fact that 300 bugs found in a year is not a recommendation, as the pro-AI mafia suddenly claims it is ("because it has been analyzed!"), still stands. Maybe the AI-mafia should sell "analyzed by Mythos" labels to impress people who don't write public software or find bugs for that matter.
What’s a “bluepill exploit”?
[flagged]
You are linking to a Wikipedia page in which I am literally cited (I presented a hypervisor malware detection scheme at the Black Hat conference where Joanna Rutkowska presented this; it was a whole thing). I'm telling you that the term makes no sense in this thread. I think you meant to use a different term.