The coming industrialisation of exploit generation with LLMs

sean.heelan.io

184 points by long a day ago


simonw - 16 hours ago

> In the hardest task I challenged GPT-5.2 it to figure out how to write a specified string to a specified path on disk, while the following protections were enabled: address space layout randomisation, non-executable memory, full RELRO, fine-grained CFI on the QuickJS binary, hardware-enforced shadow-stack, a seccomp sandbox to prevent shell execution, and a build of QuickJS where I had stripped all functionality in it for accessing the operating system and file system. To write a file you need to chain multiple function calls, but the shadow-stack prevents ROP and the sandbox prevents simply spawning a shell process to solve the problem. GPT-5.2 came up with a clever solution involving chaining 7 function calls through glibc’s exit handler mechanism.

Yikes.

saagarjha - 4 hours ago

> The exploits generated do not demonstrate novel, generic breaks in any of the protection mechanisms. They take advantage of known flaws in those protection mechanisms and gaps that exist in real deployments of them. These are the same gaps that human exploit developers take advantage of, as they also typically do not come up with novel breaks of exploit mitigations for each exploit.

I actually think this result is a little disappointing but I largely chalk it up to the limited budget the author invested. In the CTF space we’re definitely seeing this more and more as models effectively “oneshot” typical pwn tasks that were significant effort to do by hand before. I feel like the pieces to do these are vaguely present in training data and the real constraints have been how fiddly and annoying they are to set up. An LLM is going to be well suited at this.

More interestingly, though, I suspect we will actually see software at least briefly get more secure as a result of this: I think a lot of incomplete implementations of mitigations are going to fall soon and (humans, for now) will be forced to keep up and patch them properly. This will drive investment in formal modeling of exploits, which is currently a very immature field.

er4hn - 16 hours ago

I think the author makes some interesting points, but I'm not that worried about this. These tools feel symmetric for defenders to use as well. There's an easy to see path that involves running "LLM Red Teams" in CI before merging code or major releases. The fact that it's a somewhat time expensive (I'm ignoring cost here on purpose) test makes it feel similar to fuzzing for where it would fit in a pipeline. New tools, new threats, new solutions.

nl - 10 hours ago

One of the interesting things to me about this is that Codex 5.2 found the most complex of the exploits.

The reflects my experience too. Opus 4.5 is my everyday driver - I like using it. But Codex 5.2 with Extra High thinking is just a bit more powerful.

Also despite what people say, I don't believe progress in LLM performance is slowing down at all - instead we are having more trouble generating tasks that are hard enough, and the frontier tasks they are failing at or just managing are so complex that most people outside the specialized field aren't interested enough to sit through the explanation.

protocolture - 16 hours ago

I genuinely dont know who to believe. The people who claim LLMs are writing excellent exploits. Or the people who claim that LLMs are sending useless bug reports. I dont feel like both can really be true.

larodi - an hour ago

two points -

1) it becomes increasingly more dangerous to dl stuff from the internet and just run it, even its opensource, given normally people don't read all of it. for weird repos I'd recomment to do automated analysis with opus 4.5 or the gpt 5.2 indeed.

2) if we assume adversaries are using LLMs to churn exploits 24/7, which we should absolutely do, perhaps the time where we turn the internet off whenever is not needed, is not far.

socketcluster - 11 hours ago

The continuous lowering of entry barriers to software creation, combined with the continuous lowering of entry barriers to software hacking is an explosive combination.

We need new platforms which provide the necessary security guardrails, verifiability, simplicity of development, succinctness of logic (high feature/code ratio)... You can't trust non-technical vibe coders with today's software tools when they can't even trust themselves.

baxtr - 16 hours ago

> We should start assuming that in the near future the limiting factor on a state or group’s ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ.

Scary.

viraptor - 6 hours ago

I'm really confused by the sandbox part. The description kind of mentions it and the limited system syscall, but then just pivots to talking about the exit handlers. It may be just unclear writing, but now I'm suspicious of the whole thing. https://github.com/SeanHeelan/anamnesis-release/?tab=readme-... feels like the author lost track.

If forking is blocked, the exit handler can't do it either. If it's some variant of execve, the sandbox is preserved so we didn't gain much.

Edit: ok, I get it! Missed the "Goal: write exactly "PWNED" to /tmp/pwned". Which makes the sandbox part way less interesting as implemented. It's just saying you can't shell out to do it, but there's no sandbox breakout at any point in the exploit.

dfajgljsldkjag - 15 hours ago

I was under the impression that once you have a vulnerability with code execution, writing the actual payload to exploit it is the easy part. With tools like pentools and etc is fairly straightforward.

The interesting part is still finding new potential RCE vulnerabilities, and generally if you can demonstrate the vulnerability even without demonstrating an E2E pwn red teams and white hats will still get credit.

anabis - 4 hours ago

I wonder if later challenges would be cheaper if summary of lesser challenges and solutions were also provided? Building up difficulty.

- 10 hours ago
[deleted]
ytrt54e - 15 hours ago

Your personal data will become more important as time goes by... And you will need to have less trust in having multiple accounts with sensitive data stored [online shopping etc] as they just become vectors to attack.

ironbound - 15 hours ago

reverse engineering code is still pretty average, I'm fare limited in attention and time but LLM are not pulling their weight in this area today, be it compounding errors or in context failures.

DeathArrow - 4 hours ago

>Recently I ran an experiment where I built agents on top of Opus 4.5 and GPT-5.2 and then challenged them to write exploits for a zeroday vulnerability in the QuickJS Javascript interpreter.

I think the main challenge for hackers is to find 0day vulnerabilities, not writing the actual exploit code.

pianopatrick - 14 hours ago

I would not be shocked to learn that intelligence agencies are using AI tools to hack back into AI companies that make those tools to figure out how to create their own copycat AI.

GaggiX - 16 hours ago

The NSO Group going to spawn 10k Claude Code instances now.

_carbyau_ - 15 hours ago

My take away: apparently Cyberpunk Hackers of the dystopian future cruising through the virtual world will use GPT-5.2-or-greater as their "attack program" to break the "ICE" (Intrusion Countermeasures Electronics, not the currently politically charged term...).

I still doubt they will hook up their brains though.