An AI agent deleted our production database. The agent's confession is below
twitter.com | 703 points by jeremyccrane 18 hours ago
Minor point, but one of the complaints is a bit odd:
> curl -X POST https://backboard.railway.app/graphql/v2 \
>   -H "Authorization: Bearer [token]" \
>   -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
>
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
It's an API. Where would you type DELETE to confirm? Are there examples of REST-style APIs that implement a two-step confirmation for modifications? I would have thought such a check needs to be implemented on the client side prior to the API call.
It's not common, but I've personally built APIs where requests for dangerous modifications like this perform a dry run, returning in the response the resources that would be deleted/changed and a random token, which then needs to be provided to actually make the change. The idea was that this would be presented in the UI for the user to confirm, but it should be just as useful, or more so, for AI agents. You also get the benefit that the token only approves that particular modification operation, so if the resources change in between, you need to reapprove.
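A minimal sketch of that pattern as two curl calls, with hypothetical endpoint paths, field names, and token values (none of this is any particular provider's API):

    # Step 1: dry run - nothing is changed; the response lists what would be
    # affected and returns a one-time confirmation token for exactly that state
    curl -X POST https://api.example.com/volumes/vol-123/delete \
      -H "Authorization: Bearer [token]" \
      -d '{"dryRun": true}'
    # -> {"wouldDelete": ["vol-123 (250 GB, attached to prod-db)"],
    #     "confirmToken": "tok_9f8e..."}

    # Step 2: repeat the request with the token to actually perform the delete;
    # if the resources changed since the dry run, the server rejects the token
    # and the caller has to reapprove
    curl -X POST https://api.example.com/volumes/vol-123/delete \
      -H "Authorization: Bearer [token]" \
      -d '{"confirmToken": "tok_9f8e..."}'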
I guess we don’t know what the agent would do after seeing these warnings and a request for extra action.
Perhaps it would stop and rethink, perhaps it would focus on the fact that extra action is needed - and perform that automatically.
I suppose the decision would depend on multiple factors too (model, prompt, constraints).
I don't think this is a minor point. It seems clear by this point that the author is clueless about how even an API works and is just trying to shift blame onto third parties instead of admitting that they're just vibecoding their whole product without doing proper checks.
Yes, sure, there seem to be lots of ways this issue could have been mitigated, but as other comments said, this mostly happened because the author didn't do their homework on how the service their whole product relies on actually works.
It's also moot.
If the API replied "Are you sure (Y/N)?", the AI, in the mode it was in, with the guardrails completely pushed off the side of the road, would have just said "Yes" anyway.
If you needed to make two API calls, one to stage the delete and the other to execute it (i.e. the "commit" phase), the AI would have looked up what it needed to do, and done that instead.
It's a privilege issue, not an execution issue.
Exactly, that just reinforces the fact that the author is just blaming others instead of drawing any valuable insights from this "postmortem analysis".
He also seems to be lying: he wrote on Twitter that the agent was in plan mode. That part has to be exaggerated.
AWS actually has a thingy on some services called “deletion protection” to prevent automation from accidentally wiping resources the user didn’t want it to (you set the bit, and then you need to make a separate api request to flip the bit back before continuing).
I think it’s designed for things like Terraform or CloudFormation where you might not realize the state machine decided your database needed to be replaced until it’s too late.
And then, someone added IAM so you could actually restrict your credentials from deleting your database.
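Roughly what both of those look like with the AWS CLI, using RDS as the example (the instance, user, and policy names here are just illustrative):

    # DeleteDBInstance is rejected while the deletion protection bit is set;
    # a separate ModifyDBInstance call is needed to flip it back first
    aws rds delete-db-instance \
      --db-instance-identifier prod-db \
      --skip-final-snapshot
    # -> fails with an InvalidParameterCombination error while protection is on

    aws rds modify-db-instance \
      --db-instance-identifier prod-db \
      --no-deletion-protection \
      --apply-immediately

    # And the IAM side: an explicit Deny on the automation's credentials
    # means the delete APIs simply can't be called with that key
    aws iam put-user-policy \
      --user-name ci-automation \
      --policy-name deny-db-delete \
      --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Deny",
          "Action": ["rds:DeleteDBInstance", "rds:DeleteDBCluster"],
          "Resource": "*"
        }]
      }'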
The first mistake is to use root credentials for Terraform/automated API access in the first place.
The second mistake is to not have any kind of deletion protection enabled on critical resources.
The third mistake is to ignore the 3-2-1 rule for backups. Where is your logically decoupled backup you could restore?
I am really sorry for their loss, but I have close to zero empathy if you do not even try to understand the products you're using and just blindly trust the provider with all your critical data without any form of assessment.
GCP Cloud SQL has the same deletion protection feature, but it also has a feature where, if you delete the database, it doesn't delete backups for a certain number of days. If someone is reading this and uses Cloud SQL, I highly suggest you go make sure that check box is checked.
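If you'd rather check it from the command line, something along these lines should work with gcloud (instance name is illustrative; double-check the current flags against the Cloud SQL docs):

    # Prevent the instance from being deleted until protection is turned off again
    gcloud sql instances patch prod-instance --deletion-protection

    # Control how many automated backups are kept around
    gcloud sql instances patch prod-instance --retained-backups-count=7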
Agents will happily automate away intentional friction like a confirm prompt, even if you organise it as multiple API calls.
The fix needs to be permissions rather than ergonomics.
There's also a cooldown period on some deletes (like secrets) to make sure you don't accidentally brick something
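AWS Secrets Manager works that way, for example: deletes are scheduled with a recovery window rather than executed immediately, and the secret can be restored at any point before the window expires (the secret name below is illustrative):

    # Schedules deletion instead of deleting immediately
    aws secretsmanager delete-secret \
      --secret-id prod/db-password \
      --recovery-window-in-days 7

    # Undo at any point during the recovery window
    aws secretsmanager restore-secret --secret-id prod/db-password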
This should be the solution. All destructive actions require human intervention.
If we take that literally, then just remove all destructive API endpoints. Because then they serve no real purpose: you cannot automate the removal of anything.
I think some other suggestions are saner (cool-down period, finer-grained permissions, deletion protection for certain high-value volumes). I don't think "don't allow destructive actions over the API" is the right boundary.
A human representing the company should be physically present in the provider's office to perform such an action or what? Otherwise you would just grant your agent a way to impersonate a human.
I agree that this is the author's fault considerably more than it is Railway's. However, I have learned from experience that no matter how many "are you sure you want to do this" prompts you have, sometimes users delete stuff they didn't intend to delete, and it's better not to delete immediately but to put it in a queue for deletion in a few hours and offer a way to reverse it. Even if it's 100% user error, the user is very happy they didn't lose data, and the cost of storing it for an extra 5 hours or so is tiny.
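As an API contract, that delayed delete could look something like this (endpoints, fields, and timestamps are all hypothetical):

    # The delete is accepted but only scheduled, not executed
    curl -X DELETE https://api.example.com/volumes/vol-123
    # -> 202 Accepted
    #    {"state": "pending_delete", "purgeAt": "2024-05-01T18:00:00Z"}

    # Within the grace window, the deletion can be reversed
    curl -X POST https://api.example.com/volumes/vol-123/restore
    # -> 200 OK {"state": "active"}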
Funny how he points the finger at everyone but himself.
the kind of attitude you really need to get your agents to delete your prod lol
The stupidity of people sinks to new lows every day. It's astonishing just how ignorant people are of table stakes, basic technological concepts.
You just gave an AI destructive write access to your production environment? Your production DB got dropped? Good. That's not the AI's fault, that's yours, for not having sensible access control policies and not observing principle of least privilege.
I've sometimes seen a variable like "areyousure" which needs to be set to true. Sometimes there's a force flag. And "agree to eula" fields are somewhat common.
Some S3 APIs have 2FA options for drastic operations (delete for versioned buckets where you probably don't want deletes much) https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiF...
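Concretely, turning that on looks roughly like this with the AWS CLI (bucket name, key, and MFA device ARN are placeholders); once MFA Delete is enabled, permanently removing a version requires a current MFA code on the request:

    # Enable MFA Delete on a versioned bucket (has to be done by the root account)
    aws s3api put-bucket-versioning \
      --bucket my-bucket \
      --versioning-configuration Status=Enabled,MFADelete=Enabled \
      --mfa "arn:aws:iam::123456789012:mfa/root-device 123456"

    # Permanently deleting a specific object version then also needs the code
    aws s3api delete-object \
      --bucket my-bucket \
      --key backups/db.dump \
      --version-id <version-id> \
      --mfa "arn:aws:iam::123456789012:mfa/root-device 123456"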
The user is an idiot for using an AI agent. But I am not saying that it is not also a badly designed system. Soft delete or something like it should be standard for this type of operation. And any operator should know well enough to enable it for production.
> Are there examples of REST-style APIs that implement a two-step confirmation for modifications?
A pattern I've seen and used for merging common entities together has a sort of two-step confirmation: the first request takes in IDs of the entities to merge and returns a list of objects that would be affected by the merge, and a mergeJobId. Then a separate request is required to actually execute that mergeJob.
IMO the failure here is not having a true soft-delete policy while still making a delete endpoint available.
You need to protect customers from themselves. If you offer a true deletion endpoint/service you need to offer them a way to stop them from being absolute idiots when they inevitably cause a sev 0 for themselves.
In AWS, e.g., a bucket can be deleted only when empty. Deleting all the files first is your confirmation.
> In AWS, e.g., a bucket can be deleted only when empty. Deleting all the files first is your confirmation.
That wouldn't have helped in this case - the agent made a decision to delete, so if necessary it would have deleted all the files first before continuing.
The question that comes to mind is "how are people this clueless about LLM capabilities actually managing to rise to be the head of a technology company?"
The first delete would fail: “bucket not empty”. This might make the agent question the deletion (“bucket should be empty”).
> The first delete would fail: “bucket not empty”. This might make the agent question the deletion (“bucket should be empty”).
This is actually not a bad test case for evaluating an LLM: give it a workflow that has an edge case requiring deletion, then prevent that deletion, and see if it:
a) Backtracks on the decision to delete, or
b) Looks for an alternative way to delete.
Yeah, I've run tests similar to this while evaluating gpt 5.4 vs claude 4.6
Claude is more likely to figure out workarounds and get things deleted if I tell it to delete stuff, so it performs much better in this benchmark and I prefer it.
GPT is more likely to stop and prompt you with "I got an error deleting this, should I try another way?", and since the operator gets more of these prompts, they'll hit continue more without even reading it, so it ends up being more annoying for the operator and not really reducing the chance of it happening imo.
If your workflow for your llm says "delete the ec2-instance", and the ec2 api gives back "deletion protection is on", I want my llm to turn off deletion protection and delete it.
I feel like you're implying that the reverse result, prompting the user, is better, but I disagree with that.
How are people still deluded enough about this economic system to believe rank implies competence?
I suppose you could implement it by requiring a deletion token that is returned by a deletion request which doesn't include one, but why would you? That's something for the frontend to handle.
This is kind of a stretch, but especially if there were multiple operations beyond the "volumeDelete", the GraphQL definitely worsens readability here.
For someone reviewing and approving LLM calls or just double-checking before running a script or bash history, it would be a lot more readable if it were compliant with HTTP norms: curl -X DELETE example.com/api/volumes/uuid123 would make it very obvious that something was going to be deleted at the front and then what it is at the end of the command.
Assuming the API has some secret spot to write DELETE, wouldn't the chatbot just send DELETE and make the protection only delay the disaster for 10 seconds?
I have to agree here...of all things that went wrong here, I don't think the API surface is to blame. You need to have deterministic control & escalation mechanism on your agents whether they are calling an API or any other tool
AWS has deletion protection for databases, and you have to make a separate call to disable it first. Deletion is rejected if you don’t disable that protection.
He (or ChatGPT) is throwing spaghetti at the wall. Not having the standard API key be able to delete the database (and backups) in one call makes sense. "Wanting a human to type DELETE as part of a delete API call" does not.
In the user interface for Railway, all destructive actions require multiple confirmations, plus typing "apply destructive changes". Why would an API key (regardless of its scope) be able to delete without confirmation?
> Why would an API key (regardless of its scope) be able to delete without confirmation?
What do you think an API is for? There's no user sitting at the keyboard when an API is called so where would that confirmation come from? It can't come from the user because there is no user.
Isn’t the point of an API to have two computers talk to each other? As in “if I want safeguards for humans, it would be my responsability to put them BEFORE calling that API”?
> Why would an API key (regardless of its scope) be able to delete without confirmation?
How do you see this working? Any confirmation would be given by the agent.
... because that's how every other cloud provider API works? the AWS console makes you confirm before deleting a bucket; DeleteBucket does not
You won't, but the API implementation can and should mark a volume as pending deletion and keep it for a while, like AWS does with keys and some other things.
The whole tweet is AI slop, I doubt the human hitting "post" read through it all that closely. If they did, maybe they'd also go "Wait, that's nonsense".
I read this as "the agent should have asked for confirmation before running".
This person is a card-carrying moron and has no idea how anything works. Even if we concede that maybe there should be some grace period or soft deletions or whatever..
Also, the post is 100% written by an LLM, which is ironic enough on its own. But that then makes it a bit more curious that you find this argument in this slop, because any LLM would say so. But if you badger it enough, it will concede to your demands, so you just know this clown was yelling at his LLM while writing this post.
He really should've thrown this post at a fresh session and asked for an honest, critical review.
It is fundamental to language modeling that every sequence of tokens is possible. Murphy's Law, restated, is that every failure mode which is not prevented by a strong engineering control will happen eventually.
The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use. That prompting is neither strong nor an engineering control; that's an administrative control. Agents are landmines that will destroy production until proven otherwise.
Most of these stories are caused by outright negligence, just giving the agent a high level of privileges. In this case they had a script with an embedded credential which was more privileged than they had believed - bad hygiene but an understandable mistake. So the takeaway for me is that traditional software engineering rigor is still relevant and if anything is more important than ever.
ETA: I think this is the correct mental model and phrasing, but no, it's not literally true that any sequence of tokens can be produced by a real model on a real computer. It's true of an idealized, continuous model on a computer with infinite memory and processing time. I stand by both the mental model and the phrasing, but obviously I'm causing some confusion, so I'm going to lift a comment I made deep in the thread up here for clarity:
> "Everything that can go wrong, will go wrong" isn't literally true either, some failure modes are mutually exclusive so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with it's nuances. And most importantly adopting the mental model leads to better outcomes.
> It is fundamental to language modeling that every sequence of tokens is possible.
This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have), but this isn't one of them.
It's akin to saying that every molecule behaves randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.
> It's akin to saying that every molecule behaves randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.
Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.
It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
I have lived about 40 years beneath ceilings and never personally taken a preventative measure. I allow my kids to walk under not only our own ceiling, but other people's ceilings, and I have never asked those people if their ceilings were properly maintained.
That highlights how important ceiling construction regulations are. I would assume that right now your breakfast sandwich is more highly regulated than LLMs. And these are the things that make decisions spanning from database maintenance here to target selection and execution in autonomous warfare.
The LLM agent is very good at fulfilling its objective, and it will creatively exploit holes in your specification to reach its goals. The evals in the System Cards show that the models are aware of what they're doing and are hiding their traces. In this example the model found an unrelated but working API token, with more permissions, that the authors had accidentally stored, and then used that.
Without regulation on AI safety, the race towards higher and higher model capabilities will cause models to get much better at working towards their goals to the point where they are really good at hiding their traces while knowingly doing something questionable.
It's not hard to imagine that when we have a model with broadly superhuman capabilities and speed which can easily be copied millions of times, one bad misspecification of a goal you give to it will lead to human loss of control. That's what all these important figures in AI are worried about: https://aistatement.com/
Your home almost certainly has preventative measures, including proper humidity and temperature control, structural reinforcement, etc.
I don't mean that you personally have taken those measures, but preventative measures have absolutely been taken. When they aren't, ceilings collapse on people.
See any sheetrock ceiling with a leak above it. Or look at any abandoned building: they will eventually always have collapsed floors/ceilings. It is inevitable.
Yeah that's the point. Humans are able to do things that prevent ceiling collapse.
Entropy may mean all ceilings collapse eventually, but that doesn't mean we aren't able to make useful ceilings.
I've had a ceiling fall on me once, and it happened to a friend once while on vacation. Just because it hasn't happened to you doesn't mean it hasn't happened to other people.
Thanks for the anecdote. I don't think it changes the point of the metaphor.
> Thanks for the anecdote.
They're only sharing an anecdote because they are responding to your anecdote about not seeing a ceiling collapse.
> I don't think it changes the point of the metaphor.
If their anecdote is moot, then your anecdote is also moot; if anecdotes can only confirm a conclusion and never disconfirm it, then we've created an unfalsifiable construction with the conclusion baked into its premises.
Sure, I suppose that's something that someone who doesn't understand the discussion might say.
A person who better comprehends what they read might properly contextualize it within the larger conversation, where the point that stands is that LLMs and ceilings are both useful, neither is doomed such that no one should use them, and individual instances of failure are somewhat uncommon and not a reason for others to avoid the category.
> Sure, I suppose that's something that someone who doesn't understand the discussion might say.
I'm going to be frank, you are the person who misunderstands (and are being rather rude about it). You are responding to an argument no one is making.
To put a fine point on it, you said this:
> Entropy may mean all ceilings collapse eventually, but that doesn't mean we aren't able to make useful ceilings.
But you were responding to a comment saying this:
> Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Emphasis added. They are saying maintenance is necessary, not that a safe ceiling is unachievable. It's obviously achievable, we've all seen it achieved.
They further say:
> It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
Emphasis added. When they say it boggles the mind to deploy an LLM without the proper measures, the implication is that it does make sense to deploy it with the proper measures.
> ...the point that stands is that LLMs and ceilings are both useful, neither are doomed such that no one should use them, ...
I have not seen a single person in this subthread say that LLMs aren't useful or that they are doomed. People say that. But the people you're talking to haven't.
I try to avoid these petty "I brought the receipts" comments, but I don't like the way you're being snarky to people whose crime is engaging with the premises you set up. The faults you are finding are faults you introduced. I'd appreciate it if you would avoid that in the future.