Measuring progress toward AGI: A cognitive framework
blog.google
137 points by surprisetalk a day ago
When people imagined AI/AGI, they imagined something that could reason like we can, except at the speed of a computer, which we always envisioned would lead to the singularity. In a short period of time, AI would be so far ahead of us and our existing ideas that the world would become unrecognizable.
That's not what's happening here, and it's worth remembering: A caveman from 200K years ago would have been just as intelligent as any of us here today, despite not having language or technology, or any knowledge.
In Carolyn Porco's words: "These beings, with soaring imagination, eventually flung themselves and their machines into interplanetary space."
When you think of it that way, it should be obvious that LLMs are not AGI. And that's OK! They're a remarkable piece of technology anyway! It turns out that LLMs are actually good enough for a lot of use cases that would otherwise have required human intelligence.
And I echo ArekDymalski's sentiment that it's good to have benchmarks to structure the discussions around the "intelligence level" of LLMs. That _is_ useful, and the more progress we make, the better. But we're not on the way to AGI.
The amount of things LLMs can do is insane.
It's interesting to me how much effort the AI companies (and bloggers) put into claiming they can do things they can't, when there's almost an unlimited list of things they actually can do.
Only because they have the entire sum of human knowledge, compressed and encoded, at their disposal. There are models for everything in there, but they can only do what has been done before.
What's more amazing to me is that the average human, only able to hold a relatively small body of knowledge in their mind, can generate things that are completely novel.
People assume training on past data means no novelty, but novelty comes from recombination. No one has written your exact function, with your exact inputs, constraints, and edge cases, yet an LLM can generate a working version from a prompt. That’s new output. The real limitation isn’t novelty, it’s grounding.
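To make it concrete, here's a minimal sketch (assuming the OpenAI Python SDK; the prompt, the dedupe_events spec, and the model name are my own illustrations, not anything anyone has published):

    # Minimal sketch: ask a model for a function nobody has written verbatim.
    # Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    # A spec specific enough that no training example matches it exactly:
    prompt = (
        "Write a Python function dedupe_events(events) that takes a list of "
        "(timestamp, user_id, payload) tuples, drops duplicates sharing "
        "user_id and payload within a 5-second window, and keeps the earliest."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)

The point isn't that the output is profound; it's that it's a working artifact that didn't exist before the prompt.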
That’s only part of it. The true intelligence is knowing what to recombine, and this is where AI falls short.
I hear this constantly. Can you produce something novel, right here, demonstrably, that an LLM couldn't have produced? Nobody ever can, but it's sure easy to claim.
I'm going to assume you mean this seriously, so I will answer with that in mind.
Yes, I can.
- I can build an unusual but functional piece of furniture, not describe it, not design it: I can create a chair I can sit on. An LLM is just an algorithm. I am a physically embodied intelligence in a physical world.
- I can write a good piece of fiction. LLMs have not demonstrated the ability to do that yet. They can write something similar, but it fails on multiple levels if you've been reading any of the most recent examples.
- I can produce a viable natural intelligence capable of doing anything human beings do (with a couple of decades of care, and training, and love). One of the perks of being a living organism, but that is an intrinsic part of what I am.
- I can have a novel thought, a feeling, based on qualia that arise from a system of hormones, physics, complex actions and inhibitors, outrageously diverse senses, memories, quirks. Few of which we've even begun to understand let alone simulate.
- And yes, I can both count the 'r's in strawberry, and make you feel a reflection of the joy I feel when my granddaughter's eyes shine when she eats a fresh strawberry, and I think how close we came to losing her one night when someone put 90 rounds through the house next door, just a few feet from where she and her mother were sleeping.
So yeah, I'm sure I can create things an LLM can't.
So the only things I am seeing here are physical or personal (I have no idea how you feel or what your emotions are. You are a black box just as an LLM is a black box.)
The only thing you mentioned is the fan fic, and I would happily take the bet that an LLM could win out against a skilled person based on a blind vote.
Me personally? No. Us collectively? Absolutely.
Was an individual mind responsible for us as humanity landing on the moon? No. Could an individual mind have achieved this feat? Also no.
Put differently, we should be comparing the compressed blob of human knowledge against humanity as a collective rather than as individuals.
Of course, if my individual mind could be scaled such that it could learn and retain all of human knowledge in a few years, then sure, that would be a fair comparison.
I want to see an LLM create an entirely novel genre of music that synthesizes influences from many different other genres and then spreads that genre to other musicians. None of this insulated crap. Actual cultural spread of novel ideas.
Because most of these things are not multi-trillion-dollar ideas. "We found a way to make illustrators, copyeditors, and paralegals, and several dozen other professions, somewhat obsolete" in no way justifies the valuations of OpenAI or Nvidia.
Perhaps not. But I find myself using LLMs instead of a search engine like Google.
This does have value.
To you, yes, but the compute to return that search costs them far more than a simple search query, and on top of that it's hard to monetize.
It doesn't; most searches are cached, and most of the inference that gets returned is also cached, unless you are always asking unique things.
This is literally the first time I've heard this. What is your source? I can type the exact same query three times and though the general meaning may be the same, the actual output is unique every single time. How do you explain this if it's cached?
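The variation doesn't need a cache explanation at all: models sample from a probability distribution over next tokens, so identical prompts routinely produce different text. A toy sketch (the token probabilities here are invented):

    import random

    # Invented toy distribution over next tokens; real models have ~100k tokens.
    next_token_probs = {"the": 0.4, "a": 0.3, "this": 0.2, "one": 0.1}

    def sample(probs, temperature=0.8):
        # Raising p to 1/T and renormalizing is equivalent to softmax(logits/T).
        tokens = list(probs)
        weights = [probs[t] ** (1.0 / temperature) for t in tokens]
        return random.choices(tokens, weights=weights)[0]

    # The same "prompt", three runs: usually three different continuations.
    print([sample(next_token_probs) for _ in range(3)])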
> Because most of these things are not multi-trillion-dollar ideas.
That's right, but there's more. When you think about the cost of compute and power for these LLM companies, they have no choice. It MUST be a multi-trillion-dollar idea or it's completely uninvestable. That's the only way they can sucker more and more money into this scheme.
I don't know about OpenAI, but Nvidia's valuation seems more justifiable based on their actually known revenue and profit, and because it's publicly traded.
Though if the bubble(?) bursts and Nvidia starts selling fewer units year-over-year, that could be problematic.
This reminds me of "Devin". You know, the first "AI software engineer", which had the hype of the day but turned into a huge flop.
They had ridiculous demos of Devin e.g. working as a freelancer and supposedly earning money from it.
We're waaay past the era when getting funded meant your idea had any promise at all.
It looks like the company (Cognition) is actively hiring (20+ job openings last I checked). That doesn't sound like a "flop" to me...
Think about it: why would they be hiring actual human beings if Devin actually works? Seems like the purest example of "dogfooding"...
This just keeps being the "the Emperor has no clothes" moment for all these AI bull companies.
Microsoft just replaced their native Windows Copilot application with an Electron one. Highly ironic.
Obviously the native version should run much faster and use less memory. If Copilot (via either GPT or Claude) is so godlike at either agentic or guided coding, why didn't they just improve or rewrite the native Copilot application to be blazing fast, with all known bugs fixed?
I've been pushing Opus pretty hard on my personal projects. While repeatability is very hard to achieve, I'm seeing glimpses of Opus being well beyond human capabilities.
I'm increasingly convinced that the core mechanism of AGI is already here. We just need to figure out how to tie it together.
Can you give an example of something beyond the human level you’ve experienced?
Generating 3000 lines of esoteric rendering code within minutes, to rasterize generative graphics of anything you can imagine, and it just works? From natural language instructions. Seriously, think about that, my dude.
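For a sense of scale, here's a hand-rolled toy of that kind of code (the pattern formula is arbitrary, and it writes a plain PPM file to avoid any dependencies); an LLM will happily emit a few thousand lines of much fancier variations on this theme:

    import math

    # Render an arbitrary interference pattern to a PPM image (no libraries).
    W, H = 256, 256
    with open("out.ppm", "w") as f:
        f.write(f"P3\n{W} {H}\n255\n")
        for y in range(H):
            for x in range(W):
                v = math.sin(x * 0.07) * math.cos(y * 0.07) + math.sin((x + y) * 0.03)
                c = int((v + 2.0) / 4.0 * 255)  # map [-2, 2] to [0, 255]
                f.write(f"{c} {int(c * 0.6)} {255 - c} ")
            f.write("\n")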
That is amazing, but this specific example doesn't seem all that different from what a compiler does, just another level of abstraction higher.
But that's not what AGI is. Restructuring data the way they do is very impressive but it's fundamentally different from novel creativity.
And many of them are so unexpected, given the unusual nature of their intelligence emerging from language prediction. They excel wherever you need to digest or produce massive amounts of text. They can synthesize some pretty impressive solutions from pre-existing stuff. Hell, I use it like a thesaurus to suss out words or phrases that are new or on the tip of my tongue. They have a great hold on the general corpus of information, much better than any search engine (even before the internet was cluttered with their output). It's much easier to find concrete words for what you're looking for through an indirect search via an LLM. The fact that, say, a 32GB model seemingly holds approximate knowledge of everything implies some unexplored relationship between intelligence and compression.
What can't they do? Pretty much anything reliably or unsupervised. But then again, who can?
They also tend to fail creatively, given that they synthesize existing ideas. And with things involving physical intuition. And tasks involving meta-knowledge of their tokens (like asking them how long a given word is). And they tend to yap too much for my liking (perhaps this could be fixed with an additional thinking stage to increase terseness before reporting to the user).
My current way of thinking about LLMs is "an echo of human intelligence embedded in language".
It's kind of like in those sci-fi or fantasy stories where someone dies and what's left behind as a ghost in the ether or the machine isn't actually them; it's just an echo, a shallow, incomplete copy.
> some unexplored relationship between intelligence and compression.
I don't think it's unexplored at all; this is basically what information theory is all about. At some level, it becomes incompressible...
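Right: prediction and compression are two views of the same problem. An ideal coder spends about -log2(p) bits on a symbol the model assigned probability p (Shannon; arithmetic coding gets close), so a better predictor is literally a better compressor. A toy sketch, with the model probabilities invented for illustration:

    import math

    text = "abababababab"

    def bits_to_encode(s, prob):
        # Ideal code length: -log2(p) bits per symbol, summed over the text.
        return sum(-math.log2(prob(s[:i], ch)) for i, ch in enumerate(s))

    def uniform(ctx, ch):
        return 1 / 26  # knows nothing: every letter equally likely

    def learned(ctx, ch):
        if not ctx:
            return 1 / 26
        expected = "b" if ctx[-1] == "a" else "a"
        return 0.95 if ch == expected else 0.05 / 25  # has learned the pattern

    print(round(bits_to_encode(text, uniform), 1))  # 56.4 bits
    print(round(bits_to_encode(text, learned), 1))  # 5.5 bits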
The hype has gotta keep going or the money will dry up. And hype can be quantified by velocity and acceleration, rather than distance. They need to keep the innovation accelerating, or the money stops. This is of course completely unreasonable, but also why the odd claims keep happening.
Why would the money dry up when we have companies willing to spend $1000/developer/month on AI tooling, while they would have balked at $5/user/month for some basic tooling 2-3 years ago?
First, in some cases it is more than $1000/dev/month.
Companies spending $1000+/developer are doing it in the hope that at some point those $1000/month will replace the developer's salary. Or because by doing so more investors will put more money into them.
Take away the promise of AI replacing developers and see how much a company is willing to pay for LLMs. It is not zero, as there are very good cases for LLM-assisted coding and agentic engineering.
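A back-of-envelope makes the same point; the salary figure below is an assumption for illustration, not a number from this thread:

    # If AI tooling costs $1000/dev/month (figure from this thread) and a
    # fully loaded developer costs, say, $15k/month (assumed), the tooling
    # only has to buy a ~7% productivity lift to break even; no replacement
    # of anyone required.
    monthly_tooling = 1000
    loaded_dev_cost = 15000
    print(f"break-even productivity gain: {monthly_tooling / loaded_dev_cost:.1%}")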
for example?
Claiming they can be reliable lawyers.[1]
Claiming they can give safe, regulated financial advice. [2]
Claiming you can put your whole operation on autopilot with minimal oversight and no negative consequences. [3]
[1] https://www.ftc.gov/news-events/news/press-releases/2024/09/...
[2] https://www.businessinsider.com/generative-ai-exaggeration-o...
[3] https://www.answerconnect.com/blog/business-tips/ai-customer...
Claiming they will replace software engineers in 6-12 months, every 6 months [4]
[4] https://finance.yahoo.com/news/anthropic-ceo-predicts-ai-mod...
so you're saying AI can do all those things?
or that you can't read that GP was talking about what AI CAN do?
Well, for starters, they definitively passed the Turing test a few years ago. The fact that many regard them as equivalent in skill to a junior dev is also, IMO, the stuff of science fiction.
they passed, sure
how do you market that as a product that is needed by other people?
there are already companies that advertise AI dating partners, AI therapists, and AI friends - and that gets a lot of flame for being manipulative and harmful
This is a bit of an anti-evolutionary perspective. At some point in our past, we were something much less intelligent than we are now. Our intelligence didn't spring out of thin air. Whether or not AI can evolve is yet to be seen I think.
Sure, but then basically whatever it was, it was not "us". "Us" and our intelligence had to appear at some point. It's 100% not "anti-evolutionary" to say that some number of years ago humans became as mentally capable as a baby born today. We just have to figure out how many years ago that was. It wasn't last decade. As far as I know, most anthropologists agree it was around ~70k years ago (not 200k).
LLMs are not AGI, something else may be in the future. Acknowledging this has nothing to do with evolution.
I could gather that you disagreed with GP, but I don't see a salient point in your response? You are ostensibly challenging GP on the idea that a Homo sapiens baby from 200,000 years ago would have been capable of modern mental feats if raised in the present day.
> This is a bit of an anti-evolutionary perspective.
Nice, seems like you have something meaningful to add.
> At some point in our past, we were something much less intelligent than we are now.
I agree with this, but "at some point in our past"? Is that the essence of this rebuttal?
> Our intelligence didn't spring out of thin air.
Again, I could not tell what this means, nor do I see the relevance.
> Whether or not AI can evolve is yet to be seen I think.
The OP is very pointedly talking about LLMs. Is that what you mean to reference here with "AI"?
I implore you to contribute more meaningfully. Especially when leading with statements like "This is a bit of an anti-evolutionary perspective", you ought to elaborate on them. However, your username suggests maybe you are just trolling?
If you think you are equipped to discuss the evolution of general intelligence in Homo, and you haven't read about GWAS and polygenic scores for educational attainment (EDU PGS), then at this point you are either a naive layman or a convinced discourse commando.
Because it is a really hard and hopeless endeavor to make an objective case that current human populations have similar polygenic scores on key mental traits and diseases compared to 200k years ago.
The myth that humans have remained unchanged for 200k years is forever parroted as truth.
What is the origin of this silly myth? It comes either from the anatomical similarity of fossils to modern-day humans, or from a comparison to modern (5k years ago) humans being conflated with 200k-year-old humans.
> convinced discourse commando.
What is a convinced discourse commando?
> A caveman from 200K years ago would have been just as intelligent as any of us here today, despite not having language or technology, or any knowledge.
Source? This does not sound possibly true to me (by any common way we might measure intelligence).
The phrase you’re looking for is “anatomically modern human”, which has been around for 200,000 years: https://en.wikipedia.org/wiki/Early_modern_human
How do you arrive at the statement that a caveman would have the same intelligence as a human today? Intelligence is surely not usually defined as the cognitive potential at birth but as the current capability. And the knowledge an average human has today through education surely factors into that.
Your attempt to commingle intelligence and knowledge is not needed to support your initial question. The original statement that a caveman 200K years ago would have the same intelligence as a modern human was simply asserted without any supporting evidence, and so it is valid to question the claim. You do not need to give a counterclaim, as that would unnecessarily shift the burden of proof.
Knowledge is a thing you can use intelligence on, but not a component of intelligence itself.
The knowledge that everything is made out of atoms/molecules, however, makes it much easier to reason about your environment. And based on this knowledge you also learn algorithms, how to solve problems, etc. I don't think it's possible to completely separate knowledge from intelligence.
But an intelligent being could learn that. Do you think they become more intelligent if you tell them things are made out of atoms? To me the answer is very simple: no, they don't become more intelligent.
There’s a lot of research out there about the general flexibility of the brain to adapt to whatever stimulus you pump into it. For example taxi cab drivers have larger areas in their hippocampus dedicated to place cells relative to the general population [1]. There’s also all kinds of work studying general flexibility of the brain in response to novel stimulus like the visual cortex of blind people being dedicated to auditory processing [2 is a broad review]. I guess you could argue that the ability to be flexible is intelligence but the timescales over which a brain functionally changes is longer than a general day to day flexibility. Maybe some brains come into an initial state that’s more predisposed to the set of properties that we deem as “intelligence” but development is so stimulus dependent that I think this definition of a fixed intelligence is functionally meaningless. There are definitely differences in what you can learn as you age but anyone stating we have any causal measure of innate intelligence is claiming far more than we actually have evidence for. We have far more evidence to suggest that we can train at least the appearance and usage of “intelligence”. After all no one is born capable of formal logical reasoning and it must be taught [3,4 kind of weak citations foe this claim but there’s a lot to suggest this that I don’t feel like digging up]
[1] https://pubmed.ncbi.nlm.nih.gov/17024677/ [2] https://www.annualreviews.org/content/journals/10.1146/annur... [3] https://psychologyfor.com/wason-selection-task-what-it-is-an... [4] https://www.tandfonline.com/doi/full/10.1080/14794802.2021.1...
Would you also say that you cannot "train" intelligence?
I would agree that generally, purely acquiring knowledge does not increase intelligence. But I would also argue that intelligence (ie your raw "processing power") can be trained, a bit like a muscle. And acquiring and processing new knowledge is one of the main ways we train that "muscle".
There are lots of examples where your definition of intelligence (intelligence == raw processing power) either doesn't make sense, or is so narrow that it becomes a meaningless concept. Let's consider feral children (i.e., humans growing up among animals with no human contact). Apparently they are unable to learn, or have trouble learning, a human language. There's a theory that there's a critical period after which we are unable to learn certain things. Wouldn't the "ability to learn a language" be considered intelligence? Would you therefore consider a young child more intelligent than any adult?
And to answer your question, whether learning about atoms makes you more intelligent: Yes, probably. It will create some kind of connections in your brain that didn't exist before. It's a piece of knowledge that can be drawn upon for all of your thinking and it's a piece of knowledge that most humans would not figure out on their own. By basically any sensible definition of intelligence, yes it does improve your intelligence.
Separating knowledge from intelligence is not a given.
You can give an intelligent being knowledge, but you can't give a book intelligence. So I think it's easy to separate knowledge from intelligence.
The claim that books know things seems suspicious to me. I consider the act of knowing to be embodied, it is something a person has learned to do and has control over.
Is that how you approach PDF files? Do you feel it in your bones that these flows of bytes are knowing?
> The claim that books know things seems suspicious to me
I didn't say the book knows things, but everyone can agree that books have knowledge in them. Hence something possessing knowledge doesn't make it intelligent.
For example, when ancient libraries were burnt those civilizations lost a lot of knowledge. Those books possessed knowledge, it isn't a hard concept to understand. Those civilizations didn't lose intelligence, the smart humans were still there, they just lost knowledge.
Would you consider taking a dump and then butchering an animal and then eating without washing your hands first, to be an issue of intelligence or knowledge?
The whole thing about washing hands comes from (some approximation of) germ theory of illness, and in practice, it actually just boils down to stories of other people practicing hygiene. So if one's answer here isn't "knowledge", it needs some serious justification.
Expanding that: can you think of things that are "intelligence" that cannot be reduced like this to knowledge (or combination of knowledge + social expectations)?
I think in some sense, separating knowledge and intelligence is as dumb a confusion of ideas as separating "code" and "data" (which doesn't stop half the industry from believing them to be distinct things). But I'm willing to agree that hardware-wise, humans today and those from 10,000 years ago are roughly the same, so if you teleported an infant from 8000 BC to this day, they'd learn to function in our times without a problem. Adults are another thing; brains aren't CPUs, and the distinction between software and hardware isn't as clear in vivo as it is in silico, due to properties of the computational medium.
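(On the code/data point, the whole distinction is contextual; the same bytes are inert data until something executes them:)

    # A string is "data" until you run it; then it's "code".
    source = "def double(x):\n    return 2 * x\n"
    namespace = {}
    exec(source, namespace)
    print(namespace["double"](21))  # 42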
hygiene is a set of rules that one learns - it is knowledge
your brain hearing, comprehending, and following those rules - that is intelligence
why do you keep confusing CPU speed/ISA with the contents of an SSD, and arguing that they're the same thing?