I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
kasra.blog
402 points by jc4p 13 days ago
One interesting takeaway is the low score of the Anthropic models on this benchmark. It’s not because of capability, it’s because Anthropic’s guardrails prevented them from solving the problem.
I noticed that with each model release Anthropic constrains the model more security-wise. Its propensity to refuse to do legitimate work has been increasing. It now puts up more resistance around performing logins, handling credentials on behalf of the user, etc.
For myself, it’s already gotten to the point where it has mildly affected the usefulness of the model. If I hit resistance on some action I want it to do, I can usually work around it, but I suspect the ability to do so will close with each new release. Eventually I’ll reach a point where I am forced to choose between the useful aspects of the model and the limiting ones instead of just picking the most capable model out there.
Eventually these models will significantly suffer from overfitting to the lowest common denominator. If I have this beautiful deterministic setup that swaps secrets out in flight so the LLM never sees them, I’m going to be really annoyed when the LLM still won’t send them out because it is trained to deal with the 99% of people just doing the dumb thing.
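For anyone wondering what "swaps secrets out in flight" could look like, here is a minimal sketch, not the commenter's actual setup. It assumes a made-up placeholder syntax (`{{secret:NAME}}`) and two hypothetical helpers, `inject_secrets` and `redact_secrets`, sitting in a proxy between the model and the network: the model only ever emits and reads placeholders, and the proxy substitutes real values on the way out and masks them on the way back.

```python
import re

# Hypothetical placeholder syntax (an assumption for this sketch): {{secret:NAME}}
PLACEHOLDER = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


def inject_secrets(text, vault):
    """Swap placeholders for real values just before a request leaves the proxy."""
    def lookup(match):
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"unknown secret placeholder: {name}")
        return vault[name]

    return PLACEHOLDER.sub(lookup, text)


def redact_secrets(text, vault):
    """Swap real values back to placeholders before a response reaches the model."""
    # Longest values first, so a secret that contains a shorter one is masked whole.
    for name, value in sorted(vault.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:  # skip empty values, which would otherwise match everywhere
            text = text.replace(value, "{{secret:" + name + "}}")
    return text
```

The substitution is plain string replacement, so it is deterministic and does not depend on the model's cooperation beyond it being willing to emit the placeholder in the first place, which is exactly the part the comment above is worried about.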
> Eventually I’ll reach a point where I am forced to choose between the useful aspects of the model and the limiting ones instead of just picking the most capable model out there
No, the choice will be whether or not to upgrade to "Claude Security Professional" or whatever they want to brand it as.
What look like tightening "constraints" today are just setting up the upsell opportunities of tomorrow.
And next month you'll need to add on "Claude Database Pro" or you'll just get a working (for demo purposes with dozens of db rows) but completely unindexed database schema and a refusal to optimise SQL requests.
And the month after you'll need "Claude DataScience Pro" to get any Python Pandas or NumPy code generated.
And and and...
While this is a perfectly reasonable thing to expect when the models are competent enough, half the conversation on places like Hacker News is about all the times an LLM has produced garbage that was harmful to a business, whether by hallucinations, by deleting something critical during the work, or by hitting some endpoint way too often and denial-of-servicing it.
Right now, the software guardrails in LLMs are useful for the same kinds of reasons factories have hardware guardrails: to reduce the rate at which errors become "incidents".
Just because they sometimes delete the production database rather than sometimes spilling a thousand tons of incandescent molten metal over a factory floor, doesn't mean LLMs are safe enough to be used the way they're actually being used.
https://simonwillison.net/2025/Dec/10/normalization-of-devia...
I think you're assuming too much care. Right now they haven't adopted that business model because they don't see it as a viable business model. As soon as they realize that they can lock certain categories of query behind a different subscription they will do that. We saw the same thing with streaming services and basically every other kind of online service -- small, singular subscription followed by a gold rush and then suddenly there's an upcharge for access to every other publisher's catalog of movies.
That kind of thing is basically why I wrote the opening clause of the first sentence.
i.e., yeah, probably.
This is why I'm thankful for Chinese LLM research. They'll keep us honest.
Well, I'd prefer bugfixes over exploit vectors. This will keep us honest.
With pi or better omp it would be incredibly easy to adjust the Claude system prompt so it will be easy to do what the Chinese models or gpt did. That's how the Chinese were training their models btw
Same thing with the weird push towards humanoid robots.
"They can do anything!"
Sure, once you subscribe to the $15/mo laundry package, the $25/mo lawn care package (with the $10/mo hedge trimmer upgrade), and the $10/mo dog-walking package.
And in the end the big reveal is, it was a dude in VR all along, piloting the dumb things remotely. Every single time, without exception.
When we are stabbed to death by impoverished dudes who are piloting a robot worth more than a decade of their income to do household chores for 16 hours a day, we will deserve it.
I think it’s just riding off LLM coattails.
We don’t have good world models. We have had bipedal robotics in various POC demo-ready forms for decades.
It turns out that industrial, purpose-built robotics is an easier and better market.
I’m still not completely convinced a robot that’s shaped like a human is the best design other than for PR.
I remember nearly losing my mind at that stupid conveyor belt sorting demonstration because
1. The human beat the robot, but more importantly
2. We've had non-humanoid conveyor belt sorting machinery for decades that beats both
Isn't this in line with trying to leave no money on the table?
I'd hate it, sure, but it wouldn't surprise me.
> What look like tightening "constraints" today are just setting up the upsell opportunities of tomorrow.
I don't buy this, because it is predicated on staying permanently far ahead of the open weights models.
If in the future Anthropic fully stops you from doing security research, you can be sure some other provider will sell you an 'unshackled' DeepSeek v8 Pro...
> I don't buy this, because it is predicated on staying permanently far ahead of the open weights models.
In my mind, that fits exactly how the SOTA labs think today about what they're doing, they're all both working towards and expecting to stay permanently ahead of FOSS; if they didn't think that was possible, they'd change their tune really quickly.
Sure, you might be able to use DeepSeek V8 Pro instead for the same purposes, but that'll hardly stop Anthropic from trying to sell bundles of use cases instead and claiming it's "ethical AI", "Patriotic AI" or some marketing term like that.
> fits exactly how the SOTA labs think today about what they're doing, they're all both working towards and expecting to stay permanently ahead of FOSS
They are just straight up delusional, no? Or at least, have a vested financial interest in maintaining said delusion until the money runs out. They have to hit the point of diminishing returns at some point...
> They are just straight up delusional, no?
Well, I guess that's one way to put it. Another is "dress for the job you want". Startup culture typically seems to shove people in the direction of "aim big and believe in yourself, regardless of what others say", so naturally you get these companies who seem very disconnected from reality.
I'd also wager a guess that the amount of money involved messes with people's reasoning and perspectives as well, for better or worse.
FYI there are no FOSS LLMs
> FYI there are no FOSS LLMs
FYI there are, and have been for a long time. Won't claim they're SOTA, but they exist. Off the top of my head, I think Olmo (https://allenai.org/olmo) was pretty early, but there have been more since then too.
I agree most releases today that claim to be "open source" actually aren't, but that doesn't mean "FOSS LLMs" don't exist at all.
What? You can't give access to that kind of power to just anyone with $5,000/month.
These people should be trained and licensed before they get access. Thankfully, Anthropic has worked with regulators to develop the appropriate courses to maintain your license -- don't worry, the series is cheap when you buy all up through OT XVII. And because Anthropic has been approved as Security Overseer, we will take care of reporting back to the license bureau on our monitoring of your work to ensure you meet your ongoing license responsibilities and are able to keep your license.
Which regulators? You know, the new agency led by several of our former mid-level executives. With relationships like that, we were honored to lead the Industry Coalition that donated the final-draft regulations.
> What look like tightening "constraints" today are just setting up the upsell opportunities of tomorrow.
On the one hand I agree, but on the other hand I think it's reasonable, in that they can then verify that the person allowed to purchase access to that model is in fact a security professional and should be allowed to do stuff like crack security.
So, supposing it's true that these models completely change the security field and humans are ~obsolete other than as pilots guiding them what to crack, you think it's reasonable that Anthropic and OpenAI should unilaterally determine who gets to be a security professional? I hope you do understand that is what you are suggesting.
Why should anyone get to determine that? Do people really want us to move to an exclusionary guild system? I thought the experience with proprietary versus open source over the past 30 years had driven home the point that closed ecosystems are almost always far worse for security.
> the experience with proprietary versus open source over the past 30 years had driven home the point that closed ecosystems are almost always far worse for security.
Has it? Can you prove it? I've been using computers for almost 40 years. I've seen FOSS enthusiasts repeat that claim ad nauseam, without proof. All they have is the vague, hand-wavy, "millions of people read the code!!11".
I use both proprietary and FOSS software. I write both proprietary and FOSS software. I have not noticed a meaningful difference in security.
Then I think you haven't been paying attention. We regularly see examples of companies attempting to cover up vulnerabilities, attacking security researchers, dragging their feet on fixes, etc. Meanwhile you can easily see for yourself how long it takes various FOSS projects to get patched and often what the attitude of the devs is.
You can also take an aggregate view. Presumably skilled developers working on major projects should be expected to have similar rates of security issues. So compare CVE frequency between various FOSS and closed source projects.
Additionally, even if there is a guild, no guild ever let a vendor pick and choose what its capabilities were; that would be insanely dumb.
Vendors choose what capabilities they create and sell literally all day every day.
A more charitable interpretation might be that a guild would not be expected to passively allow such a situation to continue to exist. I think you'd expect a guild to directly contract for the desired tools or failing that to move into production themselves.
Sure! And Anthropic isn't preventing other people from making offensive cyber models.
"The guild" is absolutely free to go seek other vendors if Anthropic declines to sell to them.