There are no new ideas in AI, only new datasets
blog.jxmo.io · 283 points by bilsbie 10 hours ago
What John Carmack is exploring is pretty revealing. Train models to play 2D video games to a superhuman level, then ask them to play a level they have not seen before or another 2D video game they have not seen before. The transfer function is negative. So, in my definition, no intelligence has been developed, only expertise in a narrow set of tasks.
It’s apparently much easier to scare the masses with visions of ASI than to build a general intelligence that can pick up a new 2D video game faster than a human being.
He is not using appropriate models to support this conclusion, nor is he using state-of-the-art models in this research; moreover, he doesn't have an expensive foundation model to build upon for 2D games. It's just a fun project.
A serious attempt at video/vision would involve some probabilistic latent space that can be noised in ways that make sense for games in general. I think veo3 proves that ai can generalize 2d and even 3d games; generating a video under prompt constraints is basically playing a game. I think you could prompt veo3 to play any game for a few seconds and it will generally make sense even though it is not fine-tuned.
Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt out-of-distribution video content (i.e. stuff that you are unlikely to find on YouTube). It's extremely good at creating photorealistic surfaces and lighting. It even has some reasonably solid understanding of fluid dynamics for simulating water. But for complex human behaviour (in particular certain motions) it simply lacks the training data. That's not really a fault of the model, though, and I'm pretty sure there will be a way to overcome this as well. Maybe some kind of physics-based simulation as supplementary training data.
Is any model currently known to succeed in the scenario that Carmack’s inappropriate model failed?
No monolithic models, but using hybrid approaches we've been able to beat humans for some time now.
To confirm: hybrid approaches can demonstrate competence at newly-created video games within a short period of exposure, so long as similar game mechanics from other games were incorporated into their training set?
What you're thinking of is much more like the Genie model from DeepMind [0]. That one is like Veo, but interactive (though not publicly available).
[0] https://deepmind.google/discover/blog/genie-2-a-large-scale-...
I think we need a spatial/physics model handling movement and tactics, watched over by a high-level strategy model (maybe an LLM).
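A rough sketch of what that split could look like, with every name (strategist, controller, propose_subgoal, act) invented purely for illustration and a Gymnasium-style environment assumed:

    # Hypothetical sketch of the proposed split: a slow high-level strategy
    # model (e.g. an LLM) picks subgoals, while a fast low-level
    # spatial/physics policy executes them. All names are invented for
    # illustration; a Gymnasium-style env API is assumed.
    def run_episode(env, strategist, controller, replan_every=100):
        obs, info = env.reset()
        subgoal = strategist.propose_subgoal(obs)  # e.g. "reach the door on the right"
        done, step = False, 0
        while not done:
            action = controller.act(obs, subgoal)  # reactive, learned from pixels/physics
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            step += 1
            if step % replan_every == 0:           # strategist replans at a slower cadence
                subgoal = strategist.propose_subgoal(obs)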
> generating a video under prompt constraints is basically playing a game
Besides static puzzles (like a maze or jigsaw), I don't believe this analogy holds. A model working with prompt constraints that aren't evolving or being added over the course of "navigating" the generation of its output needs to process zero new information that it didn't come up with itself. Playing a game is different from other kinds of generation because it's primarily about reacting to input whose precise timing and spatial details you didn't know in advance, even if you can learn that they fall within a known set of higher-order rules. Obviously, the more finite/deterministic/predictably probabilistic the video game's solution space, the more it can be inferred from the initial state (i.e. it reduces to the same type of problem as generating a video from a prompt), which is why models are still able to play video games. But as GP pointed out, the transfer function is negative in such cases: the overarching rules are not predictable enough across disparate genres.
> I think you could prompt veo3 to play any game for a few seconds
I'm curious what your threshold is for what constitutes "play any game" in this claim. If I wrote a script that maps button combinations to the average pixel color of a portion of the screen buffer, by what metric(s) would veo3 be "playing" the game more or better than that script "for a few seconds"?
edit: removing knee-jerk reaction language
It's not ideal, but you can prompt it with an image of a game frame, explain the objects and physics in text and let it generate a few frames of gameplay as a substitute for controller input as well as what it expects as an outcome. I am not talking about real interactive gameplay.
I am just saying we have proof that it can understand complex worlds and sets of rules, and then abide by them. It doesn't know how to use a controller and it doesn't know how to explore the game physics on its own, but those steps are much easier to implement based on how coding agents are able to iterate and explore solutions.
[flagged]
fair, and I edited my choice of words, but if you're reading that much aggression from my initial comment (which contains topical discussion) to say what you did, you must find the internet a far more savage place than it really is :/
> I think veo3 proves that ai can generalize 2d and even 3d games
It doesn't. And you said it yourself:
> generating a video under prompt constraints is basically playing a game.
No. It's neither generating a game (that people can play) nor is it playing a game (it's generating a video).
Since it's not a model of the world in any sense of the word, there are issues with even the most basic object permanence. E.g. here's veo3 generating a GTA-style video. Oh look, the car spins 360 and ends up on a completely different street than the one it was driving down previously: https://www.youtube.com/watch?v=ja2PVllZcsI
It is still doing a great job for a few frames; you could keep it more anchored to the state of the game if you prompted it, much like you can prompt coding agents to keep a log of all decisions previously made. Permanence is excellent; it slips often, but that's mostly because it is not grounded to a specific game state by the prompt or by a decision log.
I don't get why people are so invested in framing it this way. I'm sure there are ways to achieve the stated objective. John Carmack isn't even an AI guy; why is he suddenly the standard?
Keen includes researchers like Richard Sutton, Joseph Modayil, etc. Also, John has been doing this full time for almost 5 years now, so given his background and aptitude for learning, I would imagine that by this time he is more of an AI guy than a fairly large percentage of AI PhDs.
Who is an "AI guy"? The field as we know it is fairly new. Sure, neural nets are old hat, but a lot has happened in the last few years.
John Carmack founded Keen Technologies in 2022 and has been working seriously on AI since 2019. From his experience in the video game industry, he knows a thing or two about linear algebra and GPUs, that is, the underlying maths and the underlying hardware.
So, for all intents and purposes, he is an "AI guy" now.
But the logic seems flawed.
He has built an AI system that fails to do X.
That does not mean there isn't an AI system that can do X. Especially considering that a lot is happening in AI, as you say.
Anyway, Carmack knows a lot about optimizing computations on modern hardware. In practice, that happens to be also necessary for AI. However, it is not __sufficient__ for AI.
Ah some No True Scotsman
Not sure why justanotherjoe is a credible resource on who is and isn't an expert in some new dialectic and euphemism for machine state management. You're that nobody to me :shrug:
Yann LeCun is an AI guy and has simplified it as “not much more than physical statistics.”
A whole lot of AI is decades-old info theory books applied to modern computers.
Either a mem value is or isn’t what’s expected. Either an entire matrix of values is or isn’t what’s expected. Store the results of some such rules. There’s your model.
The words are made up and arbitrary because human existence is arbitrary. You’re being sold on a bridge to nowhere.
Names >> all, and increasingly so.
One phenomenon that laid this bare for me, in a substantive way, was noticing an increasing # of reverent comments re: Geohot in odd places here, which are just as quickly replied to by people with a sense of how he works, as opposed to the keywords he associates himself with. But that only happens here AFAIK.
Yapping, or inducing people to yap about me, is unfortunately much more salient to my expected mindshare than the work I do.
It's getting claustrophobic intellectually, as a result.
An example from the last week is the phrase "context engineering": Shopify's CEO says he likes it better than prompt engineering, Karpathy QTs to affirm, SimonW writes it up as fait accompli. Now I have to rework my site to not use "prompt engineering" and have a Take™ on "context engineering". Because of a couple tweets + a blog reverberating over 2-3 days.
Nothing against Carmack, or anyone else named, at all; i.e. in the context engineering case, they're just sharing their thoughts in real time. (I.e. I don't wanna get rolled up into a downvote brigade because it seems like I'm affirming the loose assertion that Carmack is "not an AI guy", or that I'm criticizing anyone's conduct at all.)
EDIT: The context engineering example was not in reference to another post at the time of writing; now one is at the top of the front page.
> Now I have to rework my site to not use "prompt engineering" and have a Take™ on "context engineering". Because of a couple tweets + a blog reverberating over 2-3 days.
The difference here is that your example shows a trivial statement and a change period of 3 days, whereas what Carmack is doing is taking years.
Right. Nothing against Carmack. Grew up on the guy. I haven't looked at all into any of the disputed stuff, and should actively proclaim I'm a yuge Carmack fanboy.
I wonder if this is a case of overfitting from allowing the model to grow too large, and if you might cajole it into learning more generic heuristics by putting some constraints on it.
It sounds like the "best" AI without constraint would just be something like a replay of a record speedrun rather than a smaller set of heuristics for getting through a game, though the latter is clearly much more important with unseen content.
The subject you are referring to is most likely Meta-Reinforcement Learning [1]. It is great that John Carmack is looking into this, but it is not a new field of research.
[1] https://instadeep.com/2021/10/a-simple-introduction-to-meta-...
These questions of whether the model is “really intelligent” or whatever might be of interest to academics theorizing about AGI, but to the vast swaths of people getting useful stuff out of LLMs, it doesn’t really matter. We don’t care if the current path leads to AGI. If the line stopped at Claude 4 I’d still keep using it.
And like I get it, it’s fun to complain about the obnoxious and irrational AGI people. But the discussion about how people are using these things in their everyday lives is way more interesting.
Just sounds like an example of overfitting. This is all machine learning at its root.
Can you please explain "the transfer function is negative"?
I'm wondering whether anyone has tested the same model in two situations:
1) Bring it to superhuman level in game A and then present game B, which is similar to A, to it.
2) Present B to it without presenting A.
If 1) is not significantly better than 2) then maybe it is not carrying much "knowledge", or maybe we simply did not program it correctly.
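For what it's worth, a minimal sketch of that 1) vs 2) comparison, assuming hypothetical train/evaluate helpers and environment constructors (none of these names come from Carmack's setup or any specific library):

    # Hypothetical sketch of the comparison above. All names are
    # illustrative placeholders, not a real API.
    def measure_transfer(make_game_a, make_game_b, train, evaluate, budget=1_000_000):
        # 1) Bring the agent to a high level on game A, then train it on game B.
        agent_a = train(agent=None, env=make_game_a(), steps=budget)
        agent_transfer = train(agent=agent_a, env=make_game_b(), steps=budget)

        # 2) Train on game B from scratch with the same budget.
        agent_scratch = train(agent=None, env=make_game_b(), steps=budget)

        # Positive value: knowledge from A carried over to B.
        # Negative value: the pretraining actually hurt ("negative transfer").
        return evaluate(agent_transfer, make_game_b()) - evaluate(agent_scratch, make_game_b())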
I think the problem is we train models to pattern match, not to learn or reason about world models
I think this is clearly a case of over fitting and failure to generalize, which are really well understood concepts. We don't have to philosophize about what pattern matching really means.
In other words, they learn the game, not how to play games.
They memorize the answers, not the process to arrive at the answers.
They learn the value of specific actions in specific contexts based on the rewards they received during their play time. Specific actions and specific contexts are not transferable for various reasons. John noted that varying frame rates and variable latency between action and effect really confuse the models.
Okay, so fuzz the frame rate and latency? That feels very easy to fix.
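As a sketch of what that fuzzing could look like, here is a Gymnasium-style wrapper that randomizes frame skip and input latency each episode; the wrapper and its parameters are assumptions for illustration, not anything from Carmack's or Keen's actual setup:

    import random
    from collections import deque

    import gymnasium as gym

    class TimingJitter(gym.Wrapper):
        """Randomize frame skip and input latency each episode so a policy
        can't overfit to one exact timing. Illustrative sketch only."""

        def __init__(self, env, max_skip=4, max_delay=3):
            super().__init__(env)
            self.max_skip = max_skip
            self.max_delay = max_delay
            self.buffer = deque()
            self.delay = 0

        def reset(self, **kwargs):
            self.buffer.clear()
            self.delay = random.randint(0, self.max_delay)  # latency for this episode
            return self.env.reset(**kwargs)

        def step(self, action):
            # Apply the action issued `delay` steps ago; while the buffer is
            # still filling at episode start, the earliest action repeats.
            self.buffer.append(action)
            applied = self.buffer[0]
            if len(self.buffer) > self.delay:
                self.buffer.popleft()

            # Repeat the applied action for a random number of frames.
            total_reward, terminated, truncated = 0.0, False, False
            for _ in range(random.randint(1, self.max_skip)):
                obs, reward, terminated, truncated, info = self.env.step(applied)
                total_reward += reward
                if terminated or truncated:
                    break
            return obs, total_reward, terminated, truncated, info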
Good point, you should write to John Carmack and let him know you've figured out the problem.
This has been disproven so many times... They clearly do both. You can trivially prove this yourself.
> You can trivially prove this yourself.
Given the long list of dead philosophers of mind, if you have a trivial proof, would you mind providing a link?
It’s really easy: go to Claude and ask it a novel question. It will generally reason its way to a perfectly good answer even if there is no direct example of it in the training data.
When LLMs come up with answers to questions that aren't directly exampled in the training data, that's not proof at all that they reasoned their way there; they can very much still be pattern matching, without any insight into the process that actually generated the answer.
If we were taking a walk and you asked me for an explanation for a mathematical concept I have not actually studied, I am fully capable of hazarding a casual guess based on the other topics I have studied within seconds. This is the default approach of an LLM, except with much greater breadth and recall of studied topics than I, as a human, have.
This would be very different from if we sat down in a library and I applied the various concepts and theorems I already knew to make inferences, built upon them, and then derived an understanding by reasoning through the steps I took (often after backtracking from several dead ends) before providing the explanation.
If you ask an LLM to explain its reasoning, it's unclear whether it just guessed the explanation and reasoning too, or whether that was actually the set of steps it took to get to the first answer it gave you. This is why LLMs are able to correct themselves after claiming "strawberry" has 2 r's, but when providing (guessing again) their explanations, they make more "relevant" guesses.