Reasoning models don't always say what they think
anthropic.com | 392 points by meetpateltech a day ago
The fact that it was ever seriously entertained that a "chain of thought" was giving some kind of insight into the internal processes of an LLM bespeaks the lack of rigor in this field. The words that are coming out of the model are generated to optimize for RLHF and closeness to the training data, that's it! They aren't references to internal concepts, the model is not aware that it's doing anything so how could it "explain itself"?
CoT improves results, sure. And part of that is probably because you are telling the LLM to add more things to the context window, which increases the potential of resolving some syllogism in the training data: one inference cycle tells you that "man" has something to do with "mortal" and "Socrates" has something to do with "man", but two cycles will spit those both into the context window and let you get statistically closer to "Socrates" having something to do with "mortal". But given that the training/RLHF for CoT revolves around generating long chains of human-readable "steps", it can't really be explanatory for a process which is essentially statistical.
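The mechanism the comment describes can be caricatured in a few lines. This is a toy sketch of the idea only, not how any real model works: all names and the association table below are made up for illustration. The point is that a purely associative one-step lookup can still chain multi-step conclusions if its own intermediate outputs are fed back into the context, which is roughly what a chain-of-thought transcript does.

```python
# Pairwise associations standing in for patterns in training data
# (hypothetical; chosen to mirror the Socrates syllogism in the comment).
ASSOCIATIONS = {
    ("Socrates", "man"),
    ("man", "mortal"),
}

def one_cycle(context: set) -> set:
    """One 'inference cycle': emit every term directly associated
    with something already in the context. No multi-hop reasoning."""
    derived = set()
    for a, b in ASSOCIATIONS:
        if a in context:
            derived.add(b)
    return derived

# Cycle 1: only the direct association is reachable.
context = {"Socrates"}
step1 = one_cycle(context)   # {"man"} -- "mortal" is not yet reachable

# Cycle 2: the intermediate output is appended to the context,
# as a CoT transcript effectively does, and the second hop falls out.
context |= step1
step2 = one_cycle(context)   # {"man", "mortal"}

print("after one cycle:", step1)
print("after two cycles:", step2)
```

Nothing in `one_cycle` "knows" it is performing a syllogism; the multi-step conclusion emerges only from dumping intermediate tokens back into the context, which is the commenter's point about why the transcript need not be explanatory.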
>internal concepts, the model is not aware that it's doing anything so how could it "explain itself"
This, in a nutshell, is why I hate that all this stuff is being labeled as AI. It's advanced machine learning (another term that also feels inaccurate, but I concede it's at least closer to what's happening conceptually).
Really, LLMs and the like still lack any model of intelligence. It's, in the most basic of terms, algorithmic pattern matching mixed with statistical likelihoods of success.
And that can get things really, really far. There are entire businesses built on doing that kind of work (particularly in finance) with very high accuracy and usefulness, but it's not AI.
While I agree that LLMs are hardly sapient, it's very hard to make this argument without being able to pinpoint what a model of intelligence actually is.
"Human brains lack any model of intelligence. It's just neurons firing in complicated patterns in response to inputs based on what statistically leads to reproductive success"
What's wrong with just calling them smart algorithmic models?
Being smart allows someone to be wrong, as long as that leads to a satisfying solution. Being intelligent, on the other hand, requires foundational correctness in concepts that aren't even defined yet.
EDIT: I also somewhat like the term imperative knowledge (models) [0]
The problem with "smart" is that they fail at things that dumb people succeed at. They have ludicrous levels of knowledge and a jaw dropping ability to connect pieces while missing what's right in front of them.
The gap makes me uncomfortable with the implications of the word "smart". It is orthogonal to that.
>they fail at things that dumb people succeed at
Funnily enough, you can also observe that in humans. The number of times I have observed people from highly intellectual, high income/academic families struggle with simple tasks that even the dumbest people do with ease is staggering. If you're not trained for something and suddenly confronted with it for the first time, you will also in all likelihood fail. "Smart" is just as ill-defined as any other clumsy approach to define intelligence.
That's not at all on par with what I'm saying.
There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior. We shouldn't seek to muddy this.
EDIT: Generally it's accepted that a core trait of intelligence is an agent's ability to achieve goals in a wide range of environments. This means you must be able to generalize, which in turn allows intelligent beings to react to new environments and contexts without previous experience or input.
Nothing I'm aware of on the market can do this. LLMs are great at statistically inferring things, but they can't generalize which means they lack reasoning. They also lack the ability to seek new information without prompting.
The fact that all LLMs boil down to (relatively) simple mathematics should be enough to prove the point as well. It lacks spontaneous reasoning, which is why the ability to generalize is key.
"There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior" — not really. The whole point they are trying to make is that the capability of these models IS ALREADY muddying the definition of intelligence. We can't really test it because the distribution it's learned is so vast. Hence why we have things like ARC now.
Even if it's just gradient-descent-based distribution learning and there is no "internal system" (whatever you think that should look like) to support learning the distribution, the question is whether that is more than what we are doing, or whether we are starting to replicate our own mechanisms of learning.
Peoples’ memories are so short. Ten years ago the “well accepted definition of intelligence” was whether something could pass the Turing test. Now that goalpost has been completely blown out of the water and people are scrabbling to come up with a new one that precludes LLMs.
A useful definition of intelligence needs to be measurable, based on inputs/outputs, not internal state. Otherwise you run the risk of dictating how you think intelligence should manifest, rather than what it actually is. The former is a prescription, only the latter is a true definition.
I frequently see this characterization and can't agree with it. If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts.
At worst it's an incomplete and ad hoc specification.
More realistically it was never more than an educated guess to begin with, about something that didn't exist at the time, still doesn't appear to exist, is highly subjective, lacks a single broadly accepted rigorous definition to this very day, and ultimately boils down to "I'll know it when I see it".
I'll know it when I see it, and I still haven't seen it. QED
> If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts.
I dunno, that seems like a pretty good distillation of what moving the goalposts is.
> I’ll know it when I see it, and I haven’t seen it. QED
While pithily put, that's not a compelling argument. You feel that LLMs are not intelligent. I feel that they may be intelligent. Without a decent definition of what intelligence is, the entire argument is silly.
Shifting goalposts usually (at least in my understanding) refers to changing, without valid justification, something that was explicitly set in a previous step (subjective wording, I realize — this is off the top of my head). In an adversarial context it would be someone attempting to gain an advantage by subtly changing a premise in order to manipulate the conclusion.
An incomplete list, in contrast, is not a full set of goalposts. It is more akin to a declared lower bound.
I also don't think it applies to the case where the parties are made aware of a change in circumstances and update their views accordingly.
> You feel that LLMs are not intelligent. I feel that they may be intelligent.
Weirdly enough I almost agree with you. LLMs have certainly challenged my notion of what intelligence is. At this point I think it's more a discussion of what sorts of things people are referring to when they use that word and if we can figure out an objective description that distinguishes those things from everything else.
> Without a decent definition of what intelligence is, the entire argument is silly.
I completely agree. My only objection is to the notion that goalposts have been shifted since in my view they were never established in the first place.
> I dunno, that seems like a pretty good distillation of what moving the goalposts is.
Only if you don't understand what "the goalposts" means. The goalpost isn't "pass the Turing test"; the goalpost is "manage to do all the same kinds of intellectual tasks that humans can", and nobody has moved that since the start of the quest for AI.
LLMs can’t pass an unrestricted Turing test. LLMs can mimic intelligence, but if you actually try and exploit their limitations the deception is still trivial to unmask.
Various chat bots have long been able to pass more limited versions of a Turing test. The most extreme constraint allows for simply replaying a canned conversation, which with a helpful human assistant makes it indistinguishable from a human. But exploiting limitations on a testing format doesn’t have anything to do with testing for intelligence.
I’ve realized while reading these comments that my confidence that LLMs are intelligent has significantly increased. Rather than argue any specific test, I believe no one can come up with a text-based intelligence test that 90% of literate adults can pass but the top LLMs fail.
This would mean there’s no definition of intelligence you could tie to a test where humans would be intelligent but LLMs wouldn’t.
A maybe more palatable idea is that treating “intelligence” as a binary is insufficient. I think it’s more of an extremely skewed distribution. With how far humans are above the rest, you didn’t have to nail the cutoff point to get us on one side and everything else on the other. Maybe chimpanzees and dolphins slip in. But now, the LLMs are much closer to humans. That line is harder to draw. Actually, it's not possible to draw it so that people are on one side and LLMs on the other.
Why presuppose that it's possible to test intelligence via text? Most humans have been illiterate for most of human history.
I don't mean to claim that it isn't possible, just that I'm not clear why we should assume that it is or that there would be an obvious way of going about it.
Seems pretty reasonable to presuppose this when you filter to people who are literate. That’s darn near a definition of literate, that you can engage with the text intelligently.