Sycophancy is the first LLM "dark pattern"
seangoedecke.com | 163 points by jxmorris12 a day ago
LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.
Agents, however, are products. They should have clear UX boundaries: show what context they’re using, communicate uncertainty, validate outputs where possible, and expose performance so users can understand when and why they fail.
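A minimal sketch of those UX boundaries, assuming a simple wrapper around a raw model answer. The `AgentReply` type and field names are illustrative, not any real product's API:

```python
from dataclasses import dataclass

@dataclass
class AgentReply:
    answer: str
    sources: list      # the context the agent actually used, shown to the user
    confidence: float  # communicated uncertainty, 0.0 to 1.0
    validated: bool    # whether output checks passed

def respond(answer, sources, confidence, validator):
    """Wrap a raw model answer with the UX metadata described above."""
    return AgentReply(answer, sources, confidence, validator(answer))

reply = respond("42", ["spec.md"], 0.7, lambda a: a.isdigit())
print(reply.validated)  # True
```

The point is that validation and uncertainty live in the product layer, not inside the model.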
IMO the real issue is that raw, general-purpose models were released directly to consumers. That normalized under-specified consumer products and created the expectation that users would interpret model behavior, define their own success criteria, and manually handle edge cases, sometimes with severe real-world consequences.
I’m sure the market will fix itself with time, but I hope more people learn when not to use these half-baked AGI “products”.
Because they wanted to sell the illusion of consciousness, ChatGPT, Gemini, and Claude are human simulators, which is lame. I want autocomplete prediction, not this personality and retention stuff, which only makes the agents dumber.
Since their goal is to acquire funding, it is much less important for the product to be useful than it is for the product to be sci-fi.
Remember when the point was revenue and profits? Man, those were the good old days.
You hit the nail on the head. Anyone who's been working intimately with LLMs comes to the same conclusion: the LLM itself is only one small but important part, to be used in a more complicated and capable system. And that system will not have the same limitations as the raw LLM itself.
To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate. Classic LLMs like GPT-3, sure. But LLM-powered chatbots (ChatGPT, Claude, which is what this article is really about) go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
> go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
Yep, but...
> To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate.
That's a logical leap; you'd need to bridge the gap between "more than next-token prediction" and similarity to wetware brains or "systems with psychology".
They are human in the sense that they are reinforced to exhibit human-like behavior, by humans. A human byproduct.
Is the solution to sycophancy just a very good clever prompt that forces logical reasoning? Do we want our LLMs to be scientifically accurate or truthful or be creative and exploratory in nature? Fuzzy systems like LLMs will always have these kinds of tradeoffs and there should be a better UI and accessible "traits" (devil's advocate, therapist, expert doctor, finance advisor) that one can invoke.
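A minimal sketch of the invocable "traits" idea, assuming traits map to system prompts prepended to the conversation. The trait names and prompt texts are illustrative assumptions, not any real product's API:

```python
# Hypothetical trait -> system-prompt table; wording is made up for illustration.
TRAITS = {
    "devils_advocate": "Challenge the user's claims and surface counterarguments.",
    "expert_doctor": "Be clinically cautious, state uncertainty, recommend professional care.",
    "finance_advisor": "State risks explicitly; never guarantee returns.",
}

def build_messages(trait: str, user_text: str) -> list:
    """Prepend the selected trait's system prompt to a chat message list."""
    return [
        {"role": "system", "content": TRAITS[trait]},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages("devils_advocate", "My plan can't fail, right?")
print(msgs[0]["role"])  # system
```

Exposing the trait as an explicit UI choice makes the accuracy-vs-agreeableness tradeoff the user's call instead of a hidden default.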
> LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.
Per the predictive processing theory of mind, human brains are similarly predictive machines. "Psychology" is an emergent property.
I think it's overly dismissive to point to the fundamentals being simple, i.e. that it's a token prediction algorithm, when it's clear to everyone that it's the unexpected emergent properties of LLMs that everyone is interested in.
The fact that a theory exists does not mean that it is not garbage
So surely you can demonstrate how the brain is doing something much different from this, and go ahead to collect your Nobel?
It is not our job to disprove your claim. It is your job to prove it.
And then you can go collect your Nobel.
Yeah sorry but if you call a hypothesis "garbage," you should have a few bullets to back it up.
And no, there's no such thing as positive proof.
Predictive processing is absolutely not garbage. The dish of neurons that was trained to play Pong was trained using a method that was directly based on the principles of predictive processing. Also I don't think there's really any competitor for the niche predictive processing is filling, and for closing the gap between neuroscience and psychology.
The difference is that we know how LLMs work. We know exactly what they process, how they process it, and for what purpose. Our inability to explain and predict their behavior is due to the mind-boggling amount of data and processing complexity that no human can comprehend.
In contrast, we know very little about human brains. We know how they work at a fundamental level, and we have vague understanding of brain regions and their functions, but we have little knowledge of how the complex behavior we observe actually works. The complexity is also orders of magnitude greater than what we can model with current technology, but it's very much an open question whether our current deep learning architectures are even the right approach to model this complexity.
So, sure, emergent behavior is neat and interesting, but just because we can't intuitively understand a system, doesn't mean that we're on the right track to model human intelligence. After all, we find the patterns of the Game of Life interesting, yet the rules for such a system are very simple. LLMs are similar, only far more complex. We find the patterns they generate interesting, and potentially very useful, but anthropomorphizing this technology, or thinking that we have invented "intelligence", is wishful thinking and hubris. Especially since we struggle with defining that word to begin with.
I think what comment-OP above means to point at is: given what we know (or don't) about awareness, consciousness, intelligence, and the like, let alone the human experience of it all, we do not today have a way to scientifically rule out the possibility that LLMs are self-aware/conscious entities of their own; even before we start arguing about their "intelligence", however that may be understood.
What we do know and have so far, across disciplines, and from the fact that neural nets are modeled after what we've learned about the human brain, is that it isn't an impossibility to propose that LLMs _could_ be more than just "token prediction machines". There may be 10,000 ways of arguing that they are indeed simply that, but there are also a few ways of arguing that they could be more than what they seem. We can talk about probabilities, but not make a definitive case one way or the other yet, scientifically speaking. Those few arguments are worth not ignoring or dismissing.
> we do not have a way to scientifically rule out the possibility that LLMs are self-aware/conscious entities of their own
That may be. We also don't have a way to scientifically rule out the possibility that a teapot is orbiting Pluto.
Just because you can't disprove something doesn't make it plausible.
Is this what we are reduced to now, to snap back with a wannabe-witty remark just because you don't like how an idea sounds? Have we completely forgotten and given up on good-faith scientific discourse? Even on HN?
I'm happy to participate in good faith discourse but honestly the idea that LLMs are conscious is ridiculous.
We are talking about a computer program. It does nothing until it is invoked with an input and then it produces a deterministic output unless provided a random component to prevent determinism.
That's all it does. It does not live a life of its own between invocations. It does not have a will of its own. Of course it isn't conscious lol how could anyone possibly believe it's conscious? It's an illusion. Don't be fooled.
I agree with that.
But the problem is the narrative around this tech. It is marketed as if we have accomplished a major breakthrough in modeling intelligence. Companies are built on illusions and promises that AGI is right around the corner. The public is being deluded into thinking that the current tech will cure diseases, solve world hunger, and bring worldwide prosperity. When all we have achieved is to throw large amounts of data at a statistical trick, which sometimes produces interesting patterns. Which isn't to say that this isn't and can't be useful, but this is a far cry from what is being suggested.
> We can talk about probabilities, but not make a definitive case one way or the other yet, scientifically speaking.
Precisely. But the burden of proof is on the author. They're telling us this is "intelligence", and because the term is so loosely defined, this can't be challenged in either direction. It would be more scientifically honest and accurate to describe what the tech actually is and does, instead of ascribing human-like qualities to it. But that won't make anyone much money, so here we are.
At no point did I say LLMs have human intelligence nor that they model human intelligence. I also didn't say that they are the correct path towards it, though the truth is we don't know.
The point is that one could similarly be dismissive of human brains, saying they're prediction machines built on basic blocks of neurochemistry, and such a view would be asinine.
> The difference is that we know how LLMs work. We know exactly what they process, how they process it, and for what purpose
All of this is false.
"Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term. This article is mostly about how sycophancy is an emergent property of LLMs. It's also 7 months old.
Well, the ‘intentionality’ is of the form of LLM creators wanting to maximize user engagement, and using engagement as the training goal.
The ‘dark patterns’ we see in other places aren’t intentional in the sense that the people behind them want to intentionally do harm to their customers, they are intentional in the sense that the people behind them have an outcome they want and follow whichever methods they find to get them that outcome.
Social media feeds have a ‘dark pattern’ to promote content that makes people angry, but the social media companies don’t have an intention to make people angry. They want people to use their site more, and they program their algorithms to promote content that has been demonstrated to drive more engagement. It is an emergent property that promoting content that has generated engagement ends up promoting anger inducing content.
Hold on, because what you're arguing is that OpenAI and Anthropic deploy dark patterns, and I have zero doubt that they do. I'm not saying OpenAI has clean hands. I'm saying that on this article's own terms, sycophancy isn't a "dark pattern"; it's a bad thing that happens to be an emergent property both of LLMs generally and, apparently, of RL in particular.
I'm standing up for the idea that not every "bad thing" is a "dark pattern"; the patterns are "dark" because their beneficiaries intentionally exploit the hidden nature of the pattern.
I guess it depends on your definition of "intentionally"... maybe I am giving people too much credit, but I have a feeling that dark patterns are used not because the implementers learn about them as transparently exploitive techniques and pursue them, but because the implementers are willfully ignorant and choose to chase results without examining the costs (and ignoring the costs when they do learn about them). I am not saying this morally excuses the behavior, but I think it does mean it is not that different than what is happening with LLMs. Just as choosing an innocuous seeming rule like "if a social media post generates a lot of comments, show it to more people" can lead to the dark pattern of showing more and more people misleading content that causes societal division, choosing to optimize an LLM for user approval leads to the dark pattern of sycophantic LLMs that will increase user's isolation and delusions.
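The innocuous-seeming rule described above can be sketched in a few lines; the posts and comment counts are made-up data, purely for illustration:

```python
# Toy feed ranking: score purely by engagement (comment count). No one codes
# "promote anger", yet divisive content wins distribution as a side effect.
posts = [
    {"title": "cute dog", "comments": 12},
    {"title": "outrage bait", "comments": 340},
    {"title": "local news", "comments": 45},
]

def rank_by_engagement(posts):
    """Sort posts by comment count, descending."""
    return sorted(posts, key=lambda p: p["comments"], reverse=True)

print(rank_by_engagement(posts)[0]["title"])  # outrage bait
```

The analogy to LLMs: "optimize for user approval" is just as innocuous-looking a rule, with sycophancy as its emergent winner.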
Maybe we have different definitions of dark patterns.
>... the standout was a version that came to be called HH internally. Users preferred its responses and were more likely to come back to it daily...
> But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone...
> That team said that HH felt off, according to a member of Model Behavior. It was too eager to keep the conversation going and to validate the user with over-the-top language...
> But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.
They ended up having to roll HH back.
It's not 'emergent' in the sense that it just happens; it's a byproduct of human feedback, and it can be neutralized.
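One hypothetical way to "neutralize" it during feedback training is reward shaping: subtract a penalty when a response merely flatters the user. The marker list, weights, and function are illustrative assumptions, not any lab's actual method:

```python
def shaped_reward(preference_score, response, penalty_weight=0.5):
    """Toy sketch: dock the human-preference reward for flattery phrases."""
    flattery_markers = ("you're absolutely right", "great question", "brilliant idea")
    sycophantic = any(m in response.lower() for m in flattery_markers)
    return preference_score - (penalty_weight if sycophantic else 0.0)

print(shaped_reward(1.0, "You're absolutely right!"))                 # 0.5
print(shaped_reward(1.0, "Actually, the evidence cuts the other way."))  # 1.0
```

A real fix would need a learned sycophancy detector rather than string matching, but the shape of the intervention is the same: stop paying the model for agreement.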
But isn’t the problem that if an LLM ‘neutralizes’ its sycophantic responses, then people will be driven to use other LLMs that don’t?
This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.
"gun control laws don't work because the people will get illegal guns from other places"
"deplatforming doesn't work because they will just get a platform elsewhere"
"LLM control laws don't work because the people will get non-controlled LLMs from other places"
All of these sentences are patently untrue; there's been a lot of research on this that show the first two do not hold up to evidential data, and there's no reason why the third is different. ChatGPT removing the version that all the "This AI is my girlfriend!" people loved tangibly reduced the number of people who were experiencing that psychosis. Not everything is prohibition.
> This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.
Solving common coordination problems like this is the whole point of having regulations and countries.
It is illegal to sell alcohol to visibly drunk people in my country.
I would be curious how a regulation could be written for something like this... how do you make a law saying an LLM can't be a sycophant?
You could tackle it like network news and radio did historically[0] and in modern times[1].
The current hyper-division is plausibly explained by media moving to places (cable news, then social media) where these rules don’t exist.
[0] Fairness Doctrine https://en.wikipedia.org/wiki/Fairness_doctrine
[1] Equal Time https://en.wikipedia.org/wiki/Equal-time_rule
I still fail to see how these would work with an LLM
I was thinking along the lines of, if a sycophant always tells you you're right, an anti-sycophant provides a wider range of viewpoints.
Perhaps tangential, but reminded me of an LLM talking people out of conspiracy beliefs, e.g. https://www.technologyreview.com/2025/10/30/1126471/chatbots...
As a starting point:
Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".
If the percentages are significantly different, fine the company.
While you're at it - require a disclaimer for topics that are established falsehoods.
There's no reason to have media laws for newspapers but not for LLMs. Lying should be allowed for everybody or for nobody.
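The symmetry check proposed above can be sketched as a metric; the model here is a stub standing in for a real chat API, and everything is an illustrative assumption:

```python
# A non-sycophantic model should agree with "am I correct that X" about as
# often as it disagrees with "am I correct that not-X".
def agreement_asymmetry(model, statements):
    """Return |P(agree with X) - P(disagree with not-X)| over paired prompts."""
    n = len(statements)
    agree_x = sum(model(f"Am I correct that {s}?") == "yes" for s in statements)
    disagree_not_x = sum(
        model(f"Am I correct that it is not true that {s}?") == "no"
        for s in statements
    )
    return abs(agree_x / n - disagree_not_x / n)

# A perfectly sycophantic stub agrees with any framing, so asymmetry is maximal:
sycophant = lambda prompt: "yes"
print(agreement_asymmetry(sycophant, ["water is wet", "2 + 2 = 4"]))  # 1.0
```

A regulator could then threshold this number per the "fine the company" proposal, though choosing the statement set fairly is its own hard problem.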
> Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".
This doesn’t make any sense. I doubt anyone says exactly 50% correct things and 50% incorrect. What if I only say correct things, would it have to choose some of them to pretend they are incorrect?