The Other Half of AI Safety

personalaisafety.com

48 points by sofiaqt 2 hours ago


nojs - an hour ago

> Every week, somewhere between 1.2 and 3 million ChatGPT users, roughly the population of a small country, show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model.

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

Well, obviously “routing to a human” is not feasible at that scale. And cold-exiting the conversation is probably worse for the user than answering carefully.

Legend2440 - 2 hours ago

I don't buy that ChatGPT is actually doing these users any harm.

I think OpenAI is doing the best they reasonably can with a very difficult class of users, whose problems are neither their fault nor within their power to fix.

ngruhn - 2 hours ago

The bad cases make headlines. But I think it's quite possible that AI is helping a lot of people in distress. Many people are uncomfortable opening up to humans, or have no one to talk to, or can't afford to fork over whatever hourly rate a therapist charges.

ianbutler - 2 hours ago

OpenAI has 900 million weekly active users, so the article's 1.2-3 million works out to roughly 0.13-0.33% having problems. That's actually way less than population-level measures in the US for the same symptoms; suicidal ideation alone affects a bigger percentage of people.

https://www.cdc.gov/mmwr/volumes/74/wr/mm7412a4.htm
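
Back-of-envelope on that rate, using the article's own numbers (a quick Python check; nothing here beyond the figures quoted above):

    # weekly users showing crisis signals (1.2-3M, per the article) vs. 900M WAU
    low, high, wau = 1.2e6, 3.0e6, 900e6
    print(f"{low / wau:.2%} to {high / wau:.2%}")  # 0.13% to 0.33%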

Yokohiii - 29 minutes ago

I don't think that governments or civil society at large have found a good balance on mental health. Expecting profit-oriented companies to be on par or better is weird.

Don't get me wrong, mental health is important and should be considered and improved. But companies won't do it just for the sake of it.

timf34 - an hour ago

I sympathize with the piece; evaluating how LLMs interact with mentally vulnerable users is something I've been actively working on: https://vigil-eval.com/

The biggest observation so far is that the latest models are night and day compared to LLMs from even 6 months ago (OpenAI + Anthropic; Google is still very poor!)

mbgerring - 43 minutes ago

“AI safety” as it’s understood today is an entire faith-based belief system, incubated in a cult-like community with a high propensity for drug abuse and mental illness, over more than a decade.

The reason that real-world harms caused by AI can’t get a hearing in what is now the mainstream AI safety community is that these harms were never part of the core tenets of the cult.

Best of luck to anyone working on reality-based AI harm reduction, you have many hard battles in front of you.

adampunk - 2 hours ago

>Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

There aren't enough humans.

b65e8bee43c2ed0 - an hour ago

The big labs could crank up their (brand) safety dials to the point where their chatbots give GOODY-2 responses to everything beyond PG-13, and guess what? There are a hundred other services available, built upon Chinese models 5-10% behind Western SOTA.

It is no longer 2023. Let go of whatever delusions you might hold about un-opening this Pandora's box.

photochemsyn - an hour ago

The ‘tobacco warning label’ approach sounds good, but I’m not sure whether it stopped that many people from smoking or was just a means for corporations to limit their liability. Corporate culture being what it is, having warnings like the following pop up every time a client opens an LLM app would not be that popular in the C-suite. Possible examples:

AI MENTAL SAFETY WARNING:

> This chatbot can sound caring, certain, and personal, but it is not a human and cannot protect your mental health. It may reinforce false beliefs, emotional dependence, suicidal thinking, manic plans, paranoia, or poor decisions. Do not use it as your therapist, sole confidant, crisis counselor, doctor, lawyer, or source of reality-testing.

AI TECHNICAL SAFETY WARNING:

> This AI may generate plausible but destructive technical instructions. Incorrect commands can erase data, expose secrets, compromise security, damage systems, or brick hardware. Never run commands you do not understand. Always verify AI-generated code, scripts, and shell commands before execution.
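
The technical warning is at least actionable today. A bare confirmation gate covers the common case (a rough Python sketch, not a real tool; run_reviewed is a made-up name):

    import subprocess

    def run_reviewed(command: str) -> None:
        # never pipe model output straight into a shell: show the
        # command first, execute only after explicit approval
        print(f"AI-generated command:\n    {command}")
        if input("Run this? Type 'yes' to execute: ").strip().lower() == "yes":
            subprocess.run(command, shell=True, check=True)
        else:
            print("Skipped.")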

Now, if I’m running my own open-source model on my own hardware, I can’t really blame the model if I myself make bad decisions based on its advice - that’s like growing your own tobacco from seed in your garden, drying and curing it, then complaining about the health effects after you smoke it. If I give it agentic capabilities on my LAN without understanding the risks, same old story - with great power comes great responsibility.

wilg - 2 hours ago

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human? This is one of many questions I can’t find concrete answers for.

I don't know if there are studies or concrete data either way, but it seems at least plausible that continuing the conversation could be more effective (read: saves more lives) than stopping it.

adamnemecek - an hour ago

Autodiff is preventing any meaningful discussion about safety; systems trained with autodiff cannot be made safe.

avazhi - an hour ago

If you are using LLMs for emotional support or social interactions, you’ve got personal problems, and it isn’t on the LLM provider to babysit them. Same with people who unironically pay for OnlyFans or whatever.

I don’t even work in tech, and I detest the Facebook/Zuckerbergs of the world, but it’s obnoxious and trite seeing tech companies get scapegoated for what are ultimately social and societal problems, not tech problems.

As a solution, it’d probably make sense to start with how disconnected most modern families are in terms of support and accountability.

From ChatGPT to Instagram, tech companies follow the contours of how society already operates.

simonw - 2 hours ago

"There is no independent audit, no time series, no disclosed methodology, so we have no idea whether the real figure is higher, whether it is growing, or how it compares across the other frontier models, none of which publish equivalent data."

Tip for writers: aggressively filter the "no X, no Y, no Z" pattern out of your writing. Whether or not you used AI to help you write, it's such a red flag now that you should be actively avoiding it in anything you publish.
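
If you want to catch it mechanically before you publish, a crude regex lint is enough (a rough sketch; the pattern just looks for three or more comma-separated "no ..." clauses):

    import re

    # flag "no X, no Y, no Z": two "no ...," clauses followed by a third "no"
    TRICOLON = re.compile(r"(?:\bno\b[^,.;]*,\s*){2,}\bno\b", re.IGNORECASE)

    draft = ("There is no independent audit, no time series, "
             "no disclosed methodology.")
    if TRICOLON.search(draft):
        print("red flag: 'no X, no Y, no Z' pattern found")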