Yann LeCun raises $1B to build AI that understands the physical world
wired.com
459 points by helloplanets a day ago
https://web.archive.org/web/20260310153721/https://www.wired...
https://www.ft.com/content/e5245ec3-1a58-4eff-ab58-480b6259a... (https://archive.md/5eZWq)
Justifiable. There are a lot more degrees of freedom in world models. LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions. A well-funded and well-run startup building physical world models (grounded in spatiotemporal understanding, not just language patterns) would be attacking what I see as the actual bottleneck to AGI. Even if they succeed only partially, they may unlock the kind of generalization and creative spark that current LLMs structurally can't reach.

I don't understand this view. The way I see it, the fundamental bottlenecks to AGI are continual learning and backpropagation: models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation. World models don't solve any of these problems; they are fundamentally the same kind of deep learning architectures we are used to working with. Heck, if you think learning from the world itself is the bottleneck, you can just put a vision-action LLM on a reinforcement learning loop in a robotic/simulated body.

> I don't understand this view. The way I see it, the fundamental bottlenecks to AGI are continual learning and backpropagation. Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation.

Even with continuous backpropagation and "learning" that enriches the training data, so-called online learning, the limitations will not disappear. The LLMs will not be able to conclude things about the world based on fact and deduction. They only consider what is likely from their training data. They will not foresee/anticipate events that are unlikely or non-existent in their training data but are bound to happen due to real-world circumstances. They are not intelligent in that way. Whether humans always apply that much effort to conclude these things is another question. The point is that humans fundamentally are capable of doing that, while LLMs structurally are not. The problems are structural/architectural. I think it will take another 2-3 major leaps in architectures before these AI models reach human-level general intelligence, if they ever reach it. So far they can "merely" often "fake it" when things are statistically common in their training data.

Humans are notoriously bad at formal logic. The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it's dressed up in familiar social context, like catching cheaters. That looks a lot more like pattern matching than rule application. Kahneman's whole framework points the same direction. Most of what people call "reasoning" is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they're often confabulating a logical-sounding justification for a conclusion they already reached by other means. So maybe the honest answer is: the gap between what LLMs do and what most humans do most of the time might be smaller than people assume. The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it's accurate. Where I'd still flag a possible difference is something like adaptability.
A person can learn a totally new formal system and start applying its rules, even if clumsily. Whether LLMs can genuinely do that outside their training distribution or just interpolate convincingly is still an open question. But then again, how often do humans actually reason outside their own "training distribution"? Most human insight happens within well-practiced domains.

> The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it's dressed up in familiar social context, like catching cheaters.

I've never heard about the Wason selection task, looked it up, and could tell the right answer right away. But I can also tell you why: because I have some familiarity with formal logic and can, in your words, pattern-match the gotcha that "if x then y" is distinct from "if not x then not y". In contrast to you, this doesn't make me believe that people are bad at logic or don't really think. It tells me that people are unfamiliar with "gotcha" formalities introduced by logicians that don't match the everyday use of language. If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly. Mind you, I'm not arguing that human thinking is necessarily more profound than what LLMs could ever do. However, judging from the output, LLMs have a tenuous grasp on reality, so I don't think that reductionist arguments along the lines of "humans are just as dumb" are fair. There's a difference that we don't really know how to overcome.

As they say, "think about how smart the average person is, then realize half the population is below that". There are far more haikus than opuses walking this planet.

We keep benchmarking models against the best humans and the best human institutions - then when someone points out that swarms, branching, or scale could close the gap, we dismiss it as "cheating". But that framing smuggles in an assumption that intelligence only counts if it works the way ours does. Nobody calls a calculator a cheat for not understanding multiplication - it just multiplies better than you, and that's what matters. LLMs are a different shape of intelligence. Superhuman on some axes, subpar on others. The interesting question isn't "can they replicate every aspect of human cognition" - it's whether the axes they're strong on are sufficient to produce better-than-human outcomes in domains that matter. Calculators settled that question for arithmetic. LLMs are settling it for an increasingly wide range of cognitive work. The fact that neither can flip a burger is irrelevant. Humans don't have a monopoly on intelligence. We just had a monopoly on generality, and that moat is shrinking fast.

The "God of the gaps" theory is a theological and philosophical viewpoint where gaps in scientific knowledge are cited as evidence for the existence and direct intervention of a divine creator. It asserts that phenomena currently unexplained by science—such as the origin of life or consciousness—are caused by God. We are doing an inversion of God of the gaps into "LLM of the gaps", where gaps in LLM capabilities are considered inherently negative and limiting.

Quoting the Wikipedia article's formulation of the task for clarity:

> You are shown a set of four cards placed on a table, each of which has a number on one side and a color on the other. The visible faces of the cards show 3, 8, blue and red.
> Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?

Confusion over the meaning of 'if' can only explain why people select the Blue card; it can't explain why people fail to select the Red card. If 'if' meant 'if and only if', then it would still be necessary to check that the Red card didn't have an even number. But according to Wason[0], "only a minority" of participants select (the study's equivalent of) the Red card.

[0] https://web.mit.edu/curhan/www/docs/Articles/biases/20_Quart...

> If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.

Agreed. More broadly, classical logic isn't the only logic out there. Many logics will differ on the meaning of the implication "if x then y". There are multiple ways for x to imply y, and those additional meanings do show up in natural language all the time, and we actually do have logical systems to describe them; they are just lesser known. Mapping natural language into logic often requires context that lies outside the words that were written or spoken. We need to represent in formulas what people actually meant, rather than just what they wrote. Indeed, the same sentence can sometimes be ambiguous, and a logical formula never is. As an aside, I wanna say that material implication (that is, the "if x then y" of classical logic) deeply sucks, or rather, an implication in natural language very rarely maps cleanly onto material implication. Having the implication "if x then y" be vacuously true when x is false is something usually associated with people who smirk at clever wordplay, rather than something people actually mean when they say "if x then y".

Agree with much of your comment. Though note that, as GP said, on the Wason selection task people famously do much better when it's framed in a social context. That at least partially undermines your theory that it's a lack of familiarity with the terminology of formal logic.

I for the life of me could not solve the <18 example from Wikipedia, but the number/color one is super easy.

Your response contains a performative contradiction: you are asserting that humans are naturally logical while simultaneously committing several logical errors to defend that claim.

This comment would be a lot more useful with an enumeration of those logical errors.

The commenter's specific claim—that adding a note about the definition of "if" would solve the problem—is a moving-the-goalposts fallacy and a tautology. The comment also suffers from hasty generalization (in their experience the test isn't hard) and special pleading (a double standard for LLMs and humans).

When someone tells you "you can have this if you pay me", they don't mean "you can also have it if you don't pay". They are implicitly but clearly indicating you gotta pay. It's as simple as that. In common use, "if x then y" frequently implies "if not x then not y". Pretending that it's some sort of cognitive defect to interpret it this way is silly.

In the original studies, most people made an error that can't be explained by that misunderstanding: they failed to select the card showing 'not y'.

> Kahneman's whole framework points the same direction. Most of what people call "reasoning" is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can.
> And even when they do engage it, they're often confabulating a logical-sounding justification for a conclusion they already reached by other means.

Some references on that:

https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

https://thedecisionlab.com/reference-guide/philosophy/system...

System 1 really looks like an LLM (indeed, completing a phrase is an example of what it can do, like "you either die a hero, or you live long enough to become the _"). It's largely unconscious and runs all the time, pattern matching on random stuff. System 2 is something else and looks like a supervisor system, a higher-level thing that can be consciously directed through your own will. But the two systems run at the same time and reinforce each other.

Brilliant insight. The success of LLM reasoning, i.e. "telling yourself a story", has greatly increased my belief that humans are actually much less impressive than they seem. I do think it's mostly pattern matching and a bunch of interacting streams analogous to LLM tokens. Obviously the implementations are different, because nature has to be robust and learn online, but I do not think we are as different from these machines as most people assume. There's a reason Hofstadter et al. reacted as they did even to the earlier models.

> The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it's accurate.

Your point rings true with most human reasoning most of the time. Still, at least some humans do have the capability to run that deductive engine, and it seems to be a key part (though not the only part) of scientific and mathematical reasoning. Even informal experimentation and iteration rest on deductive feedback loops.

> Even with continuous backpropagation and "learning"

That's what I said. Backpropagation cannot be enough; that's not how neurons work in the slightest. When you put biological neurons in a Pong environment they learn to play not through some kind of loss or reward function; they self-organize to avoid unpredictable stimulation. As far as I know, no architecture learns in such an unsupervised way. https://www.sciencedirect.com/science/article/pii/S089662732...

Forgive me for being ignorant - but 'loss' in a supervised-learning ML context encodes how unlikely (high loss) or likely (low loss) the network was in predicting the output based on the input. This sounds very similar to me to what those neurons do (avoid unpredictable stimulation).

So, I have been thinking about this for a little while. Imagine a model f that takes a world x and makes a prediction y. At a high level, a traditional supervised model is trained like this:

f(x)=y' => loss(y',y) => how good was my prediction? Train f through backprop with that error.

A model trained with reinforcement learning is more like this, where m(y) is the resulting world state of taking an action y the model predicted:

f(x)=y' => m(y')=z => reward(z) => how good was the state I ended up in based on my actions? Train f with an algorithm like REINFORCE on the reward, as the world m is a non-differentiable black box.

A group of neurons is more like predicting the resulting world state of taking my action, g(x,y'), and trying to learn by both tuning g and the action taken f(x):

f(x)=y' => m(y')=z => g(x,y')=z' => loss(z,z') => how predictable were the results of my actions? Train g normally with backprop, and train f with an algorithm like REINFORCE with negative surprise as the reward.
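To make those three setups concrete, here is a minimal PyTorch sketch of the same loops. The toy environment, the linear models, and the exact REINFORCE wiring are illustrative assumptions for the sake of the example, not a claim about how biological neurons or any published system actually work.

```python
# Three toy training loops: supervised, reward-driven (REINFORCE), and
# prediction-driven ("minimize surprise"). Everything here is illustrative.
import torch

def env(action):
    # m(y'): a black-box, non-differentiable "world" that reacts to the action.
    return (action.detach() * 2.0 + 0.1).clamp(-1, 1)

x = torch.randn(8, 4)                      # a batch of world states
f = torch.nn.Linear(4, 4)                  # policy / predictor f(x) -> y'
g = torch.nn.Linear(8, 4)                  # forward model g(x, y') -> z'
opt_f = torch.optim.SGD(f.parameters(), lr=0.01)
opt_g = torch.optim.SGD(g.parameters(), lr=0.01)

# 1. Supervised: compare the prediction with a known target y, backprop the loss.
y_target = torch.randn(8, 4)
loss = ((f(x) - y_target) ** 2).mean()
opt_f.zero_grad(); loss.backward(); opt_f.step()

# 2. Reinforcement learning: the world only hands back a reward, so use a
#    REINFORCE-style estimator (log-prob of the sampled action times reward).
dist = torch.distributions.Normal(f(x), 1.0)
action = dist.sample()
reward = -((env(action) - 0.5) ** 2).mean(dim=1)         # arbitrary reward signal
loss = -(dist.log_prob(action).sum(dim=1) * reward).mean()
opt_f.zero_grad(); loss.backward(); opt_f.step()

# 3. Predictive / "avoid surprise": g learns (by backprop) to predict the
#    consequence of the action, while f is reinforced with negative surprise.
dist = torch.distributions.Normal(f(x), 1.0)
action = dist.sample()
z = env(action)                                           # what actually happened
z_pred = g(torch.cat([x, action], dim=1))                 # what g expected
surprise = ((z_pred - z) ** 2).mean(dim=1)
opt_g.zero_grad(); surprise.mean().backward(); opt_g.step()
loss_f = -(dist.log_prob(action).sum(dim=1) * (-surprise.detach())).mean()
opt_f.zero_grad(); loss_f.backward(); opt_f.step()
```

The only structural difference between loop 2 and loop 3 is where the scalar feedback comes from: an external reward versus the error of the system's own forward model.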
After talking with GPT5.2 for a little while, it seems like Curiosity-driven Exploration by Self-supervised Prediction[1] might be an architecture similar to the one I described for neurons? But with the twist that f is rewarded for making the prediction error bigger (not smaller!) as a proxy for "curiosity".

So can't you just use how real neurons learn as training data to learn how to learn the same way?

I think people MOSTLY foresee and anticipate events in OUR training data, which mostly comprises information collected by our senses. Our training data is a lot more diverse than an LLM's. We also leverage our senses as a carrier for communicating abstract ideas using audio and visual channels that may or may not be grounded in reality. We have TV shows, video games, programming languages and all sorts of rich and interesting things we can engage with that do not reflect our fundamental reality. Like LLMs, we can hallucinate while we sleep or we can delude ourselves with untethered ideas, but UNLIKE LLMs, we can steer our own learning corpus. We can train ourselves with our own untethered "hallucinations" or we can render them in art and share them with others so they can include it in their training corpus. Our hallucinations are often just erroneous models of the world. When we render them into something that has aesthetic appeal, we might call it art. If the hallucination helps us understand some aspect of something, we call it a conjecture or hypothesis. We live in a rich world filled with rich training data. We don't magically anticipate events not in our training data, but we're also not devoid of creativity ("hallucinations") either. Most of us are stochastic parrots most of the time. We've only gotten this far because there are so many of us and we've been on this earth for many generations. Most of us are dazzled and instinctively driven to mimic the ideas that a small minority of people "hallucinate". There is no shame in mimicking or being a stochastic parrot. These are critical features that helped our ancestors survive.

> We can steer our own learning corpus

This is critical. We have some degree of attentional autonomy. And we have a complex tapestry of algorithms running in thalamocortical circuits that generate "Nows". Truncation commands produce sequences of acts (token-like products).

> They will not foresee/anticipate events that are unlikely or non-existent in their training data but are bound to happen due to real-world circumstances. They are not intelligent in that way.

Can you be a bit more specific at all? Maybe via an example?

The main difference is that humans are learning all the time, while models learn batch-wise and forget whatever happened in a previous session unless someone makes it part of the training data, so there is a massive lag. Whoever cracks the continuous, customized (per user, for instance) learning problem without just extending the context window is going to be making a big splash. And I don't mean cheats and shortcuts, I mean actually tuning the model based on received feedback.

Why not just provide more compute for, say, a 1-billion-token context for each user to mimic continuous learning? Then retrain the model in the background to include the learnings. The user wouldn't know whether the continuous learning came from the context or from the retrained model. It wouldn't matter. Continuous learning seems to be a compute and engineering problem.

Because that re-training is not strong enough to hold, or so it seems.
The same dumb factual errors keep coming up on different generations of the same models. I've yet to see proof that something 'stuck' from model to model. They get better in a general sense but not in the specific sense that what was corrected stays put, not from session to session and not from one generation to the next. My solution is to have this massive 'boot up' prompt but it becomes extremely tedious to maintain.

> Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation.

While I suspect the latter is a real problem (because all mammal brains* are much more example-efficient than all ML), the former is more about productisation than a fundamental thing: the models can be continuously updated already, but that makes it hard to deal with regressions. You kinda want an artefact with a version stamp that doesn't change itself before you release the update, especially as this isn't like normal software where specific features can be toggled on or off in isolation of everything else.

* I think. Also, I'm saying "mammal" because of an absence of evidence (to my *totally amateur* skill level), not evidence of absence.

They can be continuously updated, assuming you re-run representative samples of the training set through them continuously. Unlike a mammal brain, which preserves the function of neurons unless they activate in a situation which causes a training signal, deep nets have catastrophic forgetting because signals get scattered everywhere. You couldn't have a model continuously learning about you in your pocket without tons of cycles spent "remembering" old examples. In fact, this is a major stumbling block in standard training: sampling is a huge problem. If you just iterate through the training corpus, you'll have forgotten most of the English stuff by the time you finish with Chinese or Spanish. You have to constantly mix and balance training info due to this limitation. The fundamental difference is that physical neurons have a discrete on/off activation, while digital "neurons" in a network are merely continuous differentiable operations. They also don't have a notion of "spike-timing dependency" to avoid overwriting activations that weren't related to an outcome. There are things like reward decay over time, but this applies to the signal at a very coarse level; updates are still scattered to almost the entire system with every training example.

Yes, those are bottlenecks that world models don't solve. But the promise of world models is that, unlike LLMs, they might be able to learn things about the world that humans haven't written down. For example, we still don't fully know how insects fly. A world model could be trained on thousands of videos of insects and make a novel observation about insect trajectories. The premise is that despite being here for millennia, humans have only observed a tiny fraction of the world. So I do buy his idea. But I disagree that you need world models to get to human-level capabilities. IMO there's no fundamental reason why models can't develop human understanding based on the known human observations.

You could have continual learning on text and still be stuck in the same "remixing baseline human communications" trap. It's a nasty one, very hard to avoid, possibly even structurally unavoidable. As for the "just put a vision LLM in a robot body" suggestion: people are trying this (e.g. Physical Intelligence) and it looks like it's extraordinarily hard!
The results so far suggest that bolting perception and embodiment onto a language-model core doesn't produce any kind of causal understanding. The architecture behind the integration of sensory streams, persistent object representations, and modeling of time and causality is critically important... and that's where world models come in.

The fact that models aren't continually updating seems more like a feature. I want to know the model is exactly the same as it was the last time I used it. Any new information it needs can be stored in its context window or stored in a file to read the next time it needs to access it.

> The fact that models aren't continually updating seems more like a feature.

I think this is true to some extent: we like our tools to be predictable. But we've already made one jump by going from deterministic programs to stochastic models. I am sure the moment a self-evolving AI shows up that clears the "useful enough" threshold, we'll make that jump as well.

Stochasticity and unpredictability aren't exactly the same. I would claim current LLMs are generally predictable, even if not as predictable as a deterministic program.

No, but my point is that to some extent we value determinism. By making the jump to stochastic models we already moved away from the status quo; further jumps are entirely possible. Depending on the use case we can accept more uncertainty if it comes with benefits. I also don't think there is a reason to believe that self-learning models must be unpredictable.

Persistent memory through text in the context window is a hack/workaround. And generally:

> I want to know the model is exactly the same as it was the last time I used it.

What exactly does that gain you, when the overall behavior is still stochastic? But still, if it's important to you, you can get the same behavior by taking a model snapshot once we crack continuous learning.

It's a feature of a good tool, but a sentient intelligence is more than just a tool.

Unless you run your own local models, you don't even know when OpenAI or Anthropic tweaked the model, a little or a lot. One week it's version x, next week it's version y. Just like your operating system is continuously evolving, from smaller patches of specific apps to a whole new kernel version and new OS release.

There is still a huge gap between a model continuously updating itself and weekly patches by a specialist team. The former would make things unpredictable.

It's pretty simple... the word "circle" and what you can correlate to it via English-language description has somewhat less to do with reality than a physical 3D model of a circle and what it would do in an environment. You can't just add more linguistic description via training data to change that. It doesn't really matter that you can keep backpropagating, because what you are backpropagating over is fundamentally and qualitatively less rich. If your model is poor, no amount of learning can fix it. If you don't think your model architecture is limited, you aren't looking hard enough.

I don't understand your view. The reality is that we need some way to encode the rules of the world in a more definitive way. If we want models to be able to make assertive claims about important information and be correct, it's very fair to theorize they might need a more deterministic approach than just training them more. But it's just a theory that this will actually solve the problem. Ultimately, we still have a lot to learn and a lot of experiments to do.
It's frankly unscientific to suggest any approaches are off the table, unless the data & research truly prove that. Why shouldn't we take this awesome LLM technology and bring in more techniques to make it better?

A really, really basic example is chess. Current top AI models still don't know how to play it (https://www.software7.com/blog/ai_chess_vs_1983_atari/). The models are surely trained on source material that includes chess rules, and even high-level chess games. But the models are not learning how to play chess correctly. They don't have a model to understand how chess actually works — they only have a non-deterministic prediction based on what they've seen, even after being trained on more data about the topic than any chess novice has ever seen. And this is probably one of the easiest things for AI to simulate: very clear/brief rules, a small problem space, no hidden information. But it can't handle the massive decision space, because its prediction isn't based on the actual rules, just on "things that look similar". (And yeah, I'm sure someone could build a specific LLM or agent system that can handle chess, but the point is that the powerful general-purpose models can't do it out of the box after training.) Maybe more training & self-learning can solve this, but it's clearly still unsolved. So we should definitely be experimenting with more techniques.

> Reality is that we need some way to encode the rules of the world in a more definitive way

I mean, sure. But do world models the way LeCun proposes them solve this? I don't think so. JEPAs are just an unsupervised machine learning model at the end of the day; they might end up being better than just autoregressive pretraining on text+images+video, but they are not magic. For example, if you train a JEPA model on data of orbital mechanics, will it learn actually sensible algorithms to predict the planets' motions, or will it just learn a mix of heuristics?

I don't understand why online learning is that necessary. If you took Einstein at 40 and surgically removed his hippocampus so he can't learn anything he didn't already know (meaning no online learning), that's still a very useful AGI. A hippocampus is a nice upgrade to that, but not super obviously on the critical path.

> If you took Einstein at 40 and surgically removed his hippocampus so he can't learn anything he didn't already know (meaning no online learning), that's still a very useful AGI.

I like how people are accepting this dubious assertion that Einstein would be "useful" if you surgically removed his hippocampus and engaging with it. It also calls this Einstein an AGI rather than a disabled human???

He basically said that himself: "Reading, after a certain age, diverts the mind too much from its creative pursuits. Any man who reads too much and uses his own brain too little falls into lazy habits of thinking." -- Albert Einstein

I guess the sheer amount and variety of information you would need to pre-encode to get an Einstein at 40 is huge: an everyday stream of high-resolution video feed, plus the actions and consequences and thoughts and ideas he had until the age of 40, for every single moment. That includes social interactions, like a conversation and the mimicry of the other person in combination with what was said and background knowledge about the other person. Even a single conversation's data is a huge amount of data. But one might say that the brain is not lossless ... True, good point. But in what way is it lossy?
Can that be simulated well enough to learn an Einstein? What gives events significance is very subjective. Kinda a moot point in my eyes, because I very much doubt you can arrive at the same result without the same learning process.

That's true. Though would that hippocampus-less Einstein be able to keep making novel complex discoveries from that point forward? Seems difficult. He would rapidly reach the limits of his short-term memory (the same way current models rapidly reach the limits of their context windows).

IIRC LeCun talks about a self-organizing hierarchy of real-world objects, and IMO this is exactly how the human brain actually learns.

Who knows? Perhaps attention really is all you need. Maybe our context window is really large. Or our compression is really effective. Perhaps adding external factors might be able to indirectly teach the models to act more in line with social expectations, such as being embarrassed to repeat the same mistake, unlocking the final piece of the puzzle. We are still stumbling in the dark for answers.

The reason LLMs fail today is that there's no meaning inherent to the tokens they produce other than the one captured by co-occurrence within text. Efforts like these are necessary because so much of "general intelligence" is convention defined by embodied human experience, for example arrows implying directionality, and even directionality itself.

Agents do have a form of continual learning. Putting stuff you have learned into a markdown file is a very "shallow" version of continual learning. It can remember facts, yes, but I doubt a model can master new out-of-distribution tasks this way. If anything, I think that Google's Titans[1] and Hope[2] architectures are more aligned with true continual learning (without being actual continual learning still, which is why they call it "test-time memorization").

I have had it master tasks by doing this. The first time it tries to solve an issue it may take a long time, but it documents its findings and how it was able to do it, and then it applies that knowledge the next time the task comes up.

The sum of human knowledge is more than enough to come up with innovative ideas, and not every field is working directly with the physical world. Still, I would say there's enough information in written history to create a virtual simulation of a 3D world with all physical laws applying (to a certain degree, because computation is limited). What current LLMs lack is the inner motivation to create something on their own without being prompted. To think in their free time (whatever that means for batch, on-demand processing), to reflect and learn, eventually to self-modify. I have a simple brain, limited knowledge, limited attention span, limited context memory. Yet I create stuff based on what I see and read online. Nothing special, sometimes more based on someone else's project, sometimes on my own ideas, which I have no doubt aren't that unique among 8 billion other people. Yet consulting with AI provides me with more ideas applicable to my current vision of what I want to achieve. Sure, it's mostly based on generally known (not always known to me) good practices. But my thoughts are the same way, only more limited by what I have slowly learned so far in my life.

> virtual simulation of 3d world

Virtual simulations are not substitutable for the physical world. They are fundamentally different theory problems that have almost no overlap in applicability.
You could in principle create a simulation with the same mathematical properties as the physical world, but no one has ever done that. I'm not sure if we even know how. Physical-world dynamics are metastable and non-linear at every resolution. The models we do build are created from sparse, irregular samples with large error rates; you often have to do complex inference to know if a piece of data even represents something real. All of this largely breaks the assumptions of our tidy sampling theorems in mathematics. The problem of physical-world inference has been studied for a couple of decades in the defense and mapping industries; we already have a pretty good understanding of why LLM-style AI is uniquely bad at inference in this domain, and it mostly comes down to the architectural inability to represent it. Grounded estimates of the minimum quantity of training data required to build a reliable model of physical-world dynamics, given the above properties, run to many exabytes. This data exists, so that is not a problem. The models will be orders of magnitude larger than current LLMs. Even if you solve the computer science and theory problems around representation so that learning and inference are efficient, few people are prepared for the scale of it. (source: many years doing frontier R&D on these problems)

> You could in principle create a simulation with the same mathematical properties as the physical world but no one has ever done that. I'm not sure if we even know how.

What do you mean by that? Simulating physics is a rich field, which incidentally was one of the main drivers of parallel/super computing before AI came along.

The mapping of the physical world onto a computer representation introduces idiosyncratic measurement issues for every data point. The idiosyncratic bias, errors, and non-repeatability change dynamically at every point in space and time, so they can be modeled neither globally nor statically. Some idiosyncratic bias exhibits coupling across space and time. Reconstructing ground truth from these measurements, which is what you really want to train on, is a difficult open inference problem. The idiosyncratic effects induce large changes in the relationships learnable from the data model. Many measurements map to things that aren't real. How badly that non-reality can break your inference is context-dependent. Because the samples are sparse and irregular, you have to constantly model the noise floor to make sure there is actually some signal in the synthesized "ground truth". In simulated physics, there are no idiosyncratic measurement issues. Every data point is deterministic, repeatable, and well-behaved. There is also much less algorithmic information, so learning is simpler. It is a trivial problem by comparison. Using simulations to train physical world models is skipping over all the hard parts.

I've worked in HPC, including physics models. Taking a standard physics simulation and introducing representative idiosyncratic measurement effects seems difficult. I don't think we've ever built a physics simulation with remotely the quantity and complexity of fine structure this would require.

I'm probably missing most of your point, but wouldn't the fact that we have inverse problems being applied in real-world situations somewhat contradict your qualms? In those cases too, we have to deal with noisy real-world information.
I'll admit I'm not very familiar with that type of work - I'm in the forward-solve business - but if assumptions are made on the sensor noise distribution, couldn't those be inferred by more generic models? I realize I'm talking about adding a loop on top of an inverse-problem loop, which is two steps away (just stuffing a forward solve in a loop is already not very common due to cost and engineering difficulty). Or better yet, one could probably "primal-adjoint" this and just solve at once for physical parameters and noise model, too. They're but two differentiable things in the way of a loss function.

Is this like some scale-independent version of Heisenberg's uncertainty principle?

I guess you need two things to make that happen. First, more specialization among models and an ability to evolve, else you get all instances thinking roughly the same thing, or deer in the headlights where they don't know which of the millions of options they should think about. Second, fewer guardrails; there's only so much you can do by pure thought. The problem is, idk if we're ready to have millions of distinct, evolving, self-executing models running wild without guardrails. It seems like a contradiction: you can't achieve true cognition from a machine while artificially restricting its boundaries, and you can't lift the boundaries without impacting safety.

Thank you for not saying "language", but "text". It's true, but it's also true that text is very expressive. Programming languages (huge, formalized expressiveness), math and other formal notation, SQL, HTML, SVG, JSON/YAML, CSV, domain-specific encodings, e.g. for DNA/protein sequences or for music, Verilog/VHDL for hardware, DOT/Graphviz/Mermaid, OBJ for 3D, Terraform/Nix, Dockerfiles, git diffs/patches, URLs, etc. The scope is very wide and covers enough to be called generic, especially if you include the multiple modalities that are already being blended in (images, videos, sound). I'm cheering for Yann, hope he's right, and I really like his approach to openness (hope he'll carry it over to his new company). At the same time, current architectures do exist now and do work, by far exceeding his or anybody else's expectations, and continue doing so. It may also be true they're here to stay for long on text and other supported modalities, as they're cheaper to train.

It's just not true that LLMs are limited to "static text". Data is data. Sensory input is still just data, and multimodal models have been a thing for a while. Ongoing learning and more extensive short-term memory are a challenge, and so I am all for research in alternative architectures, but so much of the discourse about the limitations of LLMs acts as if they have limitations they do not have.

I'm gonna be a cynic and say this is money following money, and Yann LeCun is an excellent salesman. I 100% guarantee that he will not be holding the bag when this fails. Society will be protecting him. On that proviso I have zero respect for this guy.

Um, why would anyone be "holding the bag" and who needs protecting by society? He's not taking out a loan, he's getting capital investment in a startup. People are gambling that he will do well and make money for them. If they gamble wrong, that's on them. Society won't be doing anything either way, because investors in startups that fail don't get anything.

Okay, but most modern LLMs are multimodal, and it's fairly easy to make an LLM multimodal. Also there is no evidence that novel discoveries are more than remixes.
This is heavily debated, but from what we've seen so far I'm not sure I would bet against remix.

World models are great for specific kinds of RL or MPC. Yann is betting heavily on MPC; I'm not sure I agree with this, as it's currently computationally intractable at scale.

Agree. LLMs operate in the domain of language and symbols, but the universe contains much more than that. Humans also learn a great deal from direct phenomenological experience of the world, even without putting those experiences into words.
I remember a talk by Yann LeCun where he pointed out that in just the first couple of years of life, a human baby is exposed to orders of magnitude more sensory data (vision, sound, etc.) than what current LLMs are typically trained on. This seems like a major limitation of purely language-based models.

I have a pet peeve with the concept of "a genuinely novel discovery or invention": what do you imagine this to be? Can you point me towards a discovery or invention that was "genuinely novel", ever? I don't think it makes sense conceptually unless you're literally referring to discovering new physical things like elements or something. Humans are remixers of ideas. That's all we do all the time. Our thoughts and actions are dictated by our environment and memories; everything must necessarily be built up from pre-existing parts.

W. Brian Arthur's book "The Nature of Technology" provides a framework for classifying new technology as elemental vs innovative that I find helpful. For example, the Hunt-McIlroy diff operates on the phenomenon that ordered correspondence survives editing. That was an invention (discovery of a natural phenomenon and a means to harness it). Myers diff improves the performance by exploiting the fact that text changes are sparse. That's innovation. A Python app using libdiff, that's engineering.
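For concreteness, a minimal illustrative sketch of that "ordered correspondence" idea: a longest-common-subsequence line diff in Python. This is in the spirit of the Hunt-McIlroy approach, not the actual 1976 algorithm; the function name and test data are made up for the example.

```python
# Minimal LCS-based diff: lines that form an ordered correspondence between the
# two versions are kept; everything outside it is reported as removed or added.
def lcs_diff(a: list[str], b: list[str]) -> list[str]:
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table to emit a line-by-line diff.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append("  " + a[i]); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            out.append("- " + a[i]); i += 1
        else:
            out.append("+ " + b[j]); j += 1
    out += ["- " + line for line in a[i:]]
    out += ["+ " + line for line in b[j:]]
    return out

print("\n".join(lcs_diff(["a", "b", "c"], ["a", "x", "c"])))
```

The "invention" in Arthur's sense is the observation that unchanged lines survive as an ordered subsequence; the dynamic program above is just one straightforward way to exploit it, and Myers-style diffs are the innovation that exploits sparsity to do the same job faster.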
And then you might say in terms of "descendants": invention > innovation > engineering. But it's just a perspective. Novel things can be incremental.

I don't think LLMs can do that either; at least I've never seen one do it. Suno is transformer-based; in a way it's a heavily modified LLM. You can't get Suno to do anything that's not in its training data. It is physically incapable of inventing a new musical genre. No matter how detailed the instructions you give it, and even if you cheat and provide it with actual MP3 examples of what you want it to create, it is impossible. The same goes for LLMs and invention generally, which is why they've made no important scientific discoveries. You can learn a lot by playing with Suno.

I don't see how this is an architectural problem though. The problem is that music datasets are highly multimodal, and the training process relies almost entirely on this dataset instead of incorporating basic musical knowledge that would allow it to explore a bit further. That's what happens when computer scientists aim to "upset" a field without consulting experts in said field.

Genuinely novel discovery or invention? Einstein's theory of relativity springs to mind, which is deeply counter-intuitive and relies on the interaction of forces unknowable to our basic Newtonian senses. There's an argument that it's all turtles (someone told him about universes, he read about gravity, etc.), but there are novel maths, and novel types of math, that arise around and for such theories, which would indicate an objective positive expansion of understanding and concept volume.

Einstein was heavily inspired by Mach: https://en.wikipedia.org/wiki/Mach%27s_principle

Nah - Poincare & Lorentz did quite a bit of groundwork on relativity and its implications before Einstein put it all together.

You're right that world models are the bottleneck, but people underestimate the staggering complexity gap between modeling the physical world and modeling a one-dimensional stream of text. Not only is the real world high-dimensional, continuous, noisy, and vastly more information-dense, it's also not something for which there is an abundance of training data.

A few years ago I made this simple thought experiment to convince myself that LLMs won't achieve superhuman level (in the sense of being better than all human experts): imagine that we made an LLM out of all dolphin songs ever recorded; would such an LLM ever reach human-level intelligence? Obviously and intuitively the answer is NO. Your comment actually extended this observation for me, sparking hope that systems consuming the natural world as input might actually avoid this trap, but then I realized that tool use & learning can in fact be all that's needed for singularity, while consuming raw data streams most of the time might actually be counterproductive.

> Imagine that we made an LLM out of all dolphin songs ever recorded, would such an LLM ever reach human level intelligence?

It could potentially reach super-dolphin level intelligence.

I mean no offense here, but I really don't like this attitude of "I thought for a bit and came up with something that debunks all of the experts!". It's the same stuff you see with climate denialism, but it seems to be considered okay when it comes to AI. As if the people that spend all day, every day, for decades have not thought of this.
Dataset limitations have been well understood since the dawn of statistics-based AI, which is why these models are trained on data and RL tasks that are as wide as possible, and are assessed by generalization performance. Most of the experts in ML, even the mathematically trained ones, have within the last few years acknowledged that superintelligence (under a more rigorous definition than the one here) is quite possible, even with only the current architectures. This is true even though no senior researcher in the field really wants superintelligence to be possible, hence the dozens of efforts to disprove its potential existence. Gotta say, good luck with that effort.

Lenat started Cyc 42 years ago, and after a while it seemed to disappear. 'Understanding' the 'physical world' is something that a few -may- start to approach intuitively after a decade or five of experience. (Einstein, Maxwell, et al.) But the idea of feeding a machine facts and equations ... and dependence on human observations ... seems unlikely to lead to 'mastering the physical world'. Let alone for $1 billion.

Was AlphaGo's move 37 original? In the last step of training LLMs, reinforcement learning from verified rewards, LLMs are trained to maximize the probability of solving problems using their own output, depending on a reward signal akin to winning in Go. It's not just imitating human-written text. Fwiw, I agree that world models and some kind of learning from interacting with physical reality, rather than massive amounts of digitized gym environments, are likely necessary for a breakthrough toward AGI.

The term LLM is confusing your point, because VLMs belong to the same bin according to Yann. Using the term autoregressive models instead might help. Whether it is text or an image, it is just bits for a computer. A token can represent anything.

Sure, but don't conflate the representation format with the structure of what's being represented. Everything is bits to a computer, but text training data captures the flattened, after-the-fact residue of baseline human thought: someone's written description of how something works. (At best!) A world model would need to capture the underlying causal, spatial, and temporal structure of reality itself -- the thing itself, that which generates those descriptions. You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...

It's worth noting how our human relationship to, and understanding of, our world model changed as our tools to inspect and describe our world advanced. So when we think about capturing any underlying structure of reality itself, we are constrained by the tools at hand. The capability of the tool forms the description, which grants the level of understanding.

Why can't LLMs (transformers trained on multimodal token sequences, potentially containing spatiotemporal information) be a world model?

I really hate the world-model terminology, but the actual low-level gripe between LeCun and autoregressive LLMs as they stand now is the fact that the loss function needs to reconstruct the entirety of the input. Anything less than pixel-perfect reconstruction on images is penalized. Token-by-token reconstruction is also biased towards that same level of granularity. The density of information in the spatiotemporal world is very, very great, and a technique is needed to compress that down effectively.
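To make that gripe concrete, here's a rough toy sketch of the two kinds of objective. This is an illustrative assumption for the sake of the comparison, not LeCun's actual models: the "encoder" and "predictor" are toy linear layers, and the stop-gradient is just one common trick against representation collapse.

```python
# Pixel-space reconstruction vs. representation-space prediction (toy example).
import torch
import torch.nn as nn

frame_t  = torch.randn(16, 3 * 32 * 32)   # flattened current frames (toy batch)
frame_t1 = torch.randn(16, 3 * 32 * 32)   # flattened next frames

# Generative / reconstruction objective: penalized for every pixel it gets
# wrong, including irrelevant detail (sensor noise, texture, leaves moving).
pixel_predictor = nn.Linear(3 * 32 * 32, 3 * 32 * 32)
pixel_loss = ((pixel_predictor(frame_t) - frame_t1) ** 2).mean()

# Representation-space objective: encode both frames into a small latent and
# only predict the next frame's latent, so the loss ignores pixel-level detail.
encoder   = nn.Linear(3 * 32 * 32, 128)
predictor = nn.Linear(128, 128)
z_pred   = predictor(encoder(frame_t))
z_target = encoder(frame_t1).detach()      # stop-gradient on the target branch
latent_loss = ((z_pred - z_target) ** 2).mean()

print(pixel_loss.item(), latent_loss.item())
```

The contrast is only about where the error is measured: in the first case the model must account for every pixel of the future frame; in the second it only has to predict a compressed summary of it.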
JEPAs are a promising technique in that direction, but if you're not reconstructing text or images, it's a bit harder for humans to immediately grok whether the model is learning something effectively. I think that very soon we will see JEPA-based language models, but their key domain may very well be robotics, where machines really need to experience and reason about the physical world differently than in a purely text-based world.

Isn't the Sora video model a ViT with spatiotemporal inputs (so they've found a way to compress that down), but at the same time LeCun wouldn't consider that a world model?

VideoGen models have to have decoder output heads that reproduce pixel-level frames. The loss function involves producing plausible image frames, which requires a lot of detailed reconstruction. I assume that when you get out of bed in the morning, the first thing you do isn't to paint 1000 1080p pictures of what your breakfast looks like. LeCun's models predict purely in representation space and output no pixel-scale detailed frames. Instead you train a model to generate a lower-dimensional representation of the same thing from different views, penalizing it if the representations differ when it is looking at the same thing. https://medium.com/state-of-the-art-technology/world-models-...

> One major critique LeCun raises is that LLMs operate only in the realm of language, which is a simple, discrete space compared to the continuous, complex physical world we live in. LLMs can solve math problems or answer trivia because such tasks reduce to pattern completion on text, but they lack any meaningful grounding in physical reality. LeCun points out a striking paradox: we now have language models that can pass the bar exam, solve equations, and compute integrals, yet "where is our domestic robot? Where is a robot that's as good as a cat in the physical world?" Even a house cat effortlessly navigates the 3D world and manipulates objects — abilities that current AI notably lacks. As LeCun observes, "We don't think the tasks that a cat can accomplish are smart, but in fact, they are."

But they don't only operate on language? They operate on token sequences, which can be images, coordinates, time, language, etc.

It's an interesting observation, but I think you have it backwards. The examples you give are all using discrete symbols to represent something real and communicating this description to other entities. I would argue that all your examples are languages.

What's the first L stand for? That's not just vestigial; their model of the world is formed almost exclusively from language, rather than from a range of things contributing significantly, like for humans. The biggest thing that's missing is actual feedback on their decisions. They have no "idea" of that, because transformers and embeddings don't model that yet. And language descriptions and image representations of feedback aren't enough. They are too disjointed. It needs more.

How is a linear stream of symbols able to capture the relationships of a real world? It's like the people who are so hyped up about voice-controlled computers. Like, you get that a linear stream of symbols is a huge downgrade in signals, right? I don't want computer interaction to be yet more simplified and worsened. Compare with domain experts who do real, complicated work with computers, like animators, 3D modelers, CAD, etc.
A mouse with six degrees of freedom, and a strong training in hotkeys to command actions and modes, and a good mental model of how everything is working: these people are dramatically more productive at manipulating data than anyone else. Imagine trying to talk a computer through nudging a bunch of vertexes through 3D space while flexibly managing modes of "drag" on connected vertexes. It would be terrible. And no, you would not replace that with a sentence of "Bot, I want you to nudge out the elbow of that model", because that does NOT do the same thing at all. An expert being able to fluidly make their idea reality in real time is just not even remotely close to the "project manager / mediocre implementer" relationship you get prompting any sort of generative model. The models aren't even built to contain a specific "style", so they certainly won't be opinionated enough to have artistic vision, a strong understanding of what does and does not work in the right context, or a sense of how to navigate "My boss wants something stupid that doesn't work and he's a dumb person, so how do I convince him to stop the dumb idea and make him think that was his idea?"

> We don't think the tasks that a cat can accomplish are smart, but in fact, they are.

https://en.wikipedia.org/wiki/Moravec%27s_paradox

All the things we look at as "smart" seem to be the things we struggle with, not what is objectively difficult, if that can even be defined.

There will be no "unlocking of AGI" until we develop a new science capable of artificial comprehension. Comprehension is the cornucopia that produces everything we are: given raw stimulus, an entire communicating Universe is generated, with a plethora of highly advanced predator/prey characters in an infinitely complex dynamic, and human science and technology have no idea how to artificially make sense of that in a simultaneous unifying whole. That's comprehension.

> There are a lot more degrees of freedom in world models.

Perhaps for the current implementations this is true. But the reason the current versions keep failing is that world dynamics has multiple orders of magnitude fewer degrees of freedom than the models that are tasked to learn them. We waste so much compute learning to approximate the constraints that are inherent in the world, and LeCun has been pressing the point the past few years that the models he intends to design will obviate the excess degrees of freedom to stabilize training (and constrain inference to physically plausible states). If my assumption is true, then expect Max Tegmark to be intimately involved in this new direction.

> LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions.

No hate, but this is just your opinion. The definition of "text" here is extremely broad – an SVG is text, but it's also an image format. It's not incomprehensible to imagine how an AI model trained on lots of SVG "text" might build internal models to help it "visualise" SVGs in the same way you might visualise objects in your mind when you read a description of them. The human brain only has electrical signals for IO, yet we can learn and reason about the world just fine. I don't see why the same wouldn't be possible with textual IO.

Yeah, I don't even think you'd need to train it.
You could probably just explain how SVG works (or just tell it to emit coordinates of lines it wants to draw), and tell it to draw a horse, and I have to imagine it would be able to do so, even if it had never been trained on images, SVG, or even Cartesian coordinates. I think there's enough world model in there that you could simply explain Cartesian coordinates in the context, it'd figure out how those map to its understanding of a horse's composition, and it'd output something roughly correct. It'd be an interesting experiment anyway. But yeah, I can't imagine that LLMs don't already have a world model in there. They have to. The internet's corpus of text may not contain enough detail to allow an LLM to differentiate between similar-looking celebrities, but it's plenty of information to allow it to create a world model of how we perceive the world. And it's a vastly more information-dense means of doing so.

Really? As if not everyone told him this for the last 10 years, especially Gary Marcus, whom he ridiculed on Twitter at every occasion, and now he silently, like a dog returning home, switches to Gary's position. As if anyone was waiting for this; even 5 years ago this was old news, and Tenenbaum has been building world models for a long time. People in pop venture-capital culture don't seem to know what is going on in research. Makes them easier to milk.

Honestly, how do people who know so little have this much confidence to post here? I had lunch with Yann last August, about a week after Alex Wang became his "boss." I asked him how he felt about that, and at the time he told me he would give it a month or two and see how it goes, and then figure out if he should stay or find employment elsewhere. I told him he ought to just create his own company if he decides to leave Meta to chase his own dream, rather than work on the dreams of others. That said, while I 100% agree with him that LLMs won't lead to human-like intelligence (I think AGI is now an overloaded term, but Yann uses it in its original definition), I'm not fully on board with his world-model strategy as the path forward.

> I'm not fully on board with his world model strategy as the path forward

Can you please elaborate on your strategy as the path forward?

You have to understand the strategy of all the other players: build attention-grabbing, monetizable models that subsidize (at least in part) the run-up to AGI. Nobody is trying to one-shot AGI. They're grinding and leveling up while (1) developing core competencies around every aspect of the problem domain and (2) winning users. I don't know if Meta is doing a good job of this, but Google, Anthropic, and OpenAI are. Trying to go straight for the goal is risky. If the first results aren't economically viable or extremely exciting, the lab risks falling apart. This is the exact point that Musk was publicly attacking Yann on, and it's likely the same one that Zuck pressed.

> Trying to go straight for the goal is risky.

That's the point of it. You need to take more risk for a different approach. Same as what OpenAI did initially.

> But this is not an applied AI company.

There is absolutely no doubt about Yann's impact on AI/ML, but he had access to many more resources in Meta, and we didn't see anything. It could be a management issue, though, and I sincerely wish we will see more competition, but from what I quoted above, it does not seem like it.
Understanding the world through videos (mentioned in the article) is just what video models have already done, and they are getting pretty good (see Seedance, Kling, Sora, etc.). So I'm not quite sure how what he proposed would work. "And we didn't see anything" is not justified at all. Meta absolutely has (or at least had) a world-class industry AI lab and has published a ton of great work and open source models (granted, their LLM open source stuff failed to keep up with Chinese models in 2024/2025; their other open source work on things like segmentation doesn't get enough credit, though). Yann's main role was Chief AI Scientist, not any sort of product role, and as far as I can tell he did a great job building up and leading a research group within Meta. He deserves a lot of credit for pushing Meta to be very open about publishing research and open sourcing models trained on large-scale data. Just as one example, Meta (together with NYU) just published "Beyond Language Modeling: An Exploration of Multimodal Pretraining" (https://arxiv.org/pdf/2603.03276), which has a ton of large-experiment-backed insights. Yann did seem to end up with a bit of an inflated ego, but I still consider him a great research lead. Context: I did a PhD focused on AI, and Meta's group had a similar pedigree to Google AI/DeepMind as a place to do an internship or to go to after graduation. For instance, under Yann's direction Meta FAIR produced the ESM protein sequence model, which is less hyped than AlphaFold but has been incredibly influential. They achieved great performance without using multiple alignments as an input/inductive bias. This is incredibly important for large classes of proteins where multiple alignments are pretty much noise. I wasn't criticising his scientific contribution at all; that's why I started my comment by praising what he did. Creating a startup has to be about a product. When you raise $1B, investors are expecting returns, not papers. > Creating a startup has to be about a product. When you raise $1B, investors are expecting returns, not papers. Speaking of returns: Apple absolutely fucked Meta ads with the privacy controls, which trashed ad performance, revenue, and share price. Meta turned things around using AI, with Yann as the lead researcher. Are you willing to give him credit for that? Revenue is now greater than pre-Apple-data-lockdown levels. How much of Meta's increased revenue is attributed to AI? I think Meta "turned things around" by bypassing privacy controls [1]. [1] https://9to5mac.com/2025/08/21/meta-allegedly-bypassed-apple... > I think Meta "turned things around" by bypassing privacy controls Why would Apple be complicit in this for years? >> but he had access to many more resources in Meta, and we didn't see anything > I wasn't criticising his scientific contribution at all, that's why I started my comment by praising what he did. You were criticising his output at Facebook, though, but he was in the research group at Facebook, not a product group, so it seems like we did actually see lots of things? They are not expecting returns at $1B+, just for someone to pay more than they paid six months ago. > There is absolutely no doubt about Yann's impact on AI/ML, but he had access to many more resources in Meta, and we didn't see anything. That's true for 99% of scientists, but dismissing their opinion based on them not having done world-shattering/groundbreaking research is probably not the way to go.
> I sincerely wish we will see more competition I really wish we don't; science isn't a market. > Understanding world through videos The word "understanding" is doing a lot of heavy lifting here. I find myself prompting again and again for corrections on an image or a summary, and "it" still does not "understand" and keeps doing the same thing over and over again. Do not keep bad results in context. You have to purge them to prevent them from affecting the next output. LLMs are deceptively capable, but they don’t respond like a person. You can’t count on implicit context. You can’t count on parts of the implicit context having more weight than others. Most folks get paid a lot more in a corporate job than tinkering at home; using the 'follow the money' logic, it would make sense that they would produce their most inspired work as 9-5 full-stack engineers. But passion and freedom to explore are often more important than resources. In an interview, Yann mentioned that one reason he left Meta was that they were very focused on LLMs and he no longer believed LLMs were the path forward to reaching AGI. Llama models pushed the envelope for a while, and having them "open-weight" allowed a lot of tinkering. I would say that most fine-tuned models evolved from work on top of Llama models. Llama wasn’t Yann LeCun’s work and he was openly critical of LLMs, so it’s not very relevant in this context. Source: himself https://x.com/ylecun/status/1993840625142436160 (“I never worked on any Llama.”) and a million previous reports and tweets from him. He founded FAIR and the team in Paris that ultimately worked on the early Llama versions. FAIR was founded in 2013 and Llama's first release was in 2023. Musk co-founded OpenAI in 2015, but no reasonable person credits ChatGPT in 2022 to him. > My only contribution was to push for Llama 2 to be open sourced. Quite a big contribution in practice. Sure, but I don't think that's relevant in a startup with $1B of VC money either. Meta can afford to (attempt to) commoditize their complement. That's such a terrible take. For a hot minute Meta had a top-3 LLM and open sourced the whole thing, even with LeCun's reservations about the technology. At the same time Meta spat out huge breakthroughs in:
- 3D model generation
- Self-supervised label-free training (DINO). Remember, Alexandr Wang built a multibillion-dollar company just around having people in third-world countries label data, so this is a huge breakthrough.
- A whole new class of world modeling techniques (JEPAs)
- SAM (Segment Anything)
> Self-supervised label-free training (DINO). Remember, Alexandr Wang built a multibillion-dollar company just around having people in third-world countries label data, so this is a huge breakthrough. If it was a breakthrough, why did Meta acquire Wang and his company? I'm genuinely curious. Wang fits the profile of a possible successor CEO for Meta.
Young, hit it big early, hit the AI boom early straight out of college. Obviously not woke (just look at his public statements). Unfortunately the dude knows very little about AI or ML research. He's just another wealthy grifter. At this point decision making at Meta is based on Zuckerberg's vibes, and I suspect the emperor has no clothes. > we didn't see anything. Is it a troll? Even if we just ignore Llama, Meta invented and released so much foundational research and open source code. I would say that the computer vision field would be years behind if Meta didn't publish some core research like DETR or MAE. You should ignore Llama because by his own admission, >My only contribution was to push for Llama 2 to be open sourced. He founded the team that worked on fastText, Llama, and other similarly impactful projects. I can’t reconcile this dichotomy: most of the landmark deep learning papers were developed with what, by today’s standards, were almost ridiculously small training budgets — from Transformers to dropout, and so on. So I keep wondering: if his idea is really that good — and I genuinely hope it is — why hasn’t it led to anything truly groundbreaking yet? It can’t just be a matter of needing more data or more researchers. You tell me :-D It's a matter of needing more time, which is a resource even SV VCs are scared to throw around. Look at the timeline of all these advancements and how long each step took:
LeCun introduced backprop for deep learning back in 1989
Hinton published about contrastive divergence in next token prediction in 2002
AlexNet was 2012
Word2vec was 2013
Seq2seq was 2014
AIAYN ("Attention Is All You Need") was 2017
UnicornAI was 2019
InstructGPT was 2022. This makes a lot of people think that things are just accelerating and they can be along for the ride. But it's the years and years of foundational research that allows this to be done. That toll has to be paid for the successors of LLMs to be able to reason properly and operate in the world the way humans do. That sowing won't happen as fast as the reaping did. LeCun wants to plant those seeds; the others, who only want to eat the fruit, don't get that they have to wait. If his ideas had real substance, we would have seen substantial results by now.
He introduced I-JEPA in 2023, so almost three years ago at this point. If he still hasn’t produced anything truly meaningful after all these years at Meta, when is that supposed to happen? Yann LeCun has been at Facebook/Meta since December 2013. Your chronological sequence is interesting, but it refers to a time when the number of researchers and the amount of compute available were a tiny fraction of what they are today. Yann LeCun seeks a $5B+ valuation for his world model startup AMI (Amilabs). He has hired LeBrun to take the helm as CEO. AMI has also hired LeFunde as CFO and LeTune as head of post-training. They’re also considering hiring LeMune as Head of Growth and LePrune to lead inference efficiency. https://techcrunch.com/2025/12/19/yann-lecun-confirms-his-ne... Why didn't they just call it LeLabs? I was thinking the same: is everyone he hires a LeSomething, like the workers at Bolson Construction all having -son as a suffix? First grinding Leetcode, now having to have 'Le' in the name? I have no chance in the AI industry... The guy overseeing the funds is called LeFunde and the guy doing the fine-tuning LeTune?? This couldn't have come soon enough, for two reasons. 1) The world has become a bit too focused on LLMs (although I agree that the benefits and new horizons that LLMs bring are real). We need research on other types of models to continue. 2) I almost wrote "Europe needs some aces". Although I'm European, my attitude is not at all one of competition. This is not a card game. What Europe DOES need is an ATTRACTIVE WORKPLACE, so that talent that is useful for AI can also find a place to work here, not only overseas! So it is a startup? I expected it, in fact, from his reply to my concern. In my opinion, to explore the unknown, an institute like Mila, led by Yoshua Bengio, would have been more fitting. But Yann LeCun's career and his reply to my rant[1] speak for themselves. I wonder how he is going to make money. Aside from all my concerns, I wish him the best. > You're absolutely right. Only large and profitable companies can afford to do actual research. All the historically impactful industry labs (AT&T Bell Labs, IBM Research, Xerox PARC, MSR, etc.) were with companies that didn't have to worry about their survival. They stopped funding ambitious research when they started losing their dominant market position. Regardless of your opinion of Yann or his views on whether autoregressive models are "sufficient" for what most would describe as AGI or ASI, this is probably a good thing for Europe. We need more well-capitalized labs that aren't US- or China-centric, and while I do like Mistral, they just haven't been keeping up on the frontier of model performance and seem like they've sort of pivoted into being integration specialists and consultants for EU corporations. That's fine and they've got to make money, but fully ceding the research front is not a good way to keep the EU competitive. LeCun's technical approach with AMI will likely be based on JEPA, which is also a very different approach than most US-based or Chinese AI labs are taking. If you're looking to learn about JEPA, LeCun's vision document "A Path Towards Autonomous Machine Intelligence" is long but sketches out a very comprehensive vision of AI research:
https://openreview.net/pdf?id=BZ5a1r-kVsf Training JEPA models is within reach, even for startups (a toy sketch of the basic objective appears a few comments below). For example, we're a 3-person startup who trained a health timeseries JEPA. There are JEPA models for computer vision and (even) for LLMs. You don't need a $1B seed round to do interesting things here. We need more interesting, orthogonal ideas in AI. So I think it's good we're going to have a heavyweight lab in Europe alongside the US and China. Have you published anything about your health time series model? Sounds interesting! Sure! Here’s a description: https://www.empirical.health/blog/wearable-foundation-model-... Thanks! This is very neat. BTW, I went to your website looking for this, but didn't find your blog. I do now see that it's linked in the footer, but I was looking for it in the hamburger menu. Thanks! We need to redo the top navigation / hamburger menu -- we've added a bunch of new things in the past few months, and it badly needs to be reorganized. Very interesting. I am keenly interested in this space and coincidentally had my blood drawn this morning. That said, have you considered that “Measure 100+ biomarkers with a single blood draw” combined with "heart health is a solved problem” reads a lot like Theranos? FWIW, the single blood draw is 6-8 vials -- so we're not claiming to get 100 biomarkers from a single drop. The point of that is mostly that it just takes one appointment / is convenient. This is very cool work! I have a quick follow-up: in the biomarker prediction task, what horizon (i.e. how far into the future) did you set for the predictions? Prediction is hard beyond an hour, so it'd be impressive if your model handles that. The prediction task is set up as predicting the next measured biomarkers based on a week of wearable data. So it's not necessarily predicting into the future, but predicting dataset Y given dataset X. The specific biomarkers being predicted are the ones most relevant to heart health, like cholesterol or HbA1c. These tend to be more stable from hour to hour -- they may vary on a timescale of weeks as you modify your diet or take medications. Oh nice, I actually used you guys for some labs a few months ago. Glad you're competing with Function & Superpower. Appreciate your work! Healthcare is a regulated industry. Everything (research, proposals, FDA submissions, compliance docs, accreditation standards, etc.) is documented and follows a process, which means there's a lot of paperwork. You can't sneak in anything unverified or unreliable. Why does healthcare need a JEPA/world model? Regulation is quickly catching up to modern AI techniques; for the most part, the approach is to verify outputs rather than process. For example, Utah's pilot to let AI prescribe medications has doctors check the first N prescriptions of each medication. Medicare is starting to pay for AI-enabled care, but it is tying payment to whether objective biomarkers like cholesterol or blood pressure actually get better. I've been working to understand the potential uses for JEPA. Outside of video, has anyone made a list of any type (geared towards dummies like me)? There seem to be other news articles mentioning that they are setting up in Singapore as their base. https://www.straitstimes.com/business/ai-godfather-raises-1-... Hm, Singapore looks more like "one of their bases"; they will have offices in Paris, Montréal, Singapore, and New York (according to both this article and the interview Yann LeCun did this morning on France Inter, the most listened-to radio station in France).
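To give a concrete feel for what "training a JEPA" means in the comments above: the core idea is to encode a context window and a masked target window, then train a predictor to match the target's embedding rather than the raw signal. The following is a minimal, illustrative PyTorch sketch with made-up sizes and random data; it is not the architecture from LeCun's paper or anyone's production model.

    # Minimal, illustrative JEPA-style objective for a toy timeseries.
    # All module names, sizes, and the data are made up for illustration.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, dim))
        def forward(self, x):                 # x: (batch, timesteps, 1)
            return self.net(x).mean(dim=1)    # pool over time -> (batch, dim)

    context_enc = Encoder()
    target_enc = Encoder()                    # slow-moving copy; receives no gradients
    target_enc.load_state_dict(context_enc.state_dict())
    for p in target_enc.parameters():
        p.requires_grad_(False)

    predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    opt = torch.optim.Adam(list(context_enc.parameters()) + list(predictor.parameters()), lr=1e-3)

    for step in range(100):
        series = torch.randn(32, 48, 1)                     # stand-in for wearable data
        context, target = series[:, :32], series[:, 32:]    # mask off the last chunk

        pred = predictor(context_enc(context))              # predict in embedding space...
        with torch.no_grad():
            tgt = target_enc(target)                        # ...not in raw input space
        loss = nn.functional.mse_loss(pred, tgt)

        opt.zero_grad()
        loss.backward()
        opt.step()

        # Slowly move the target encoder toward the context encoder (EMA),
        # one common trick to keep the embedding targets from collapsing.
        with torch.no_grad():
            for pt, pc in zip(target_enc.parameters(), context_enc.parameters()):
                pt.mul_(0.99).add_(0.01 * pc)

The essential design choice is the loss in embedding space, so the model does not have to reconstruct every noisy detail of the input; the EMA target encoder shown here is just one of several anti-collapse tricks, and the published JEPA variants add further regularizers.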
Of course, each relevant newspaper in those areas highlights that it's coming to their place, but it really seems to be distributed. Probably just a satellite office. Might be to stay close to some of Yann's collaborators, like Xavier Bresson at NUS. That's a Singaporean newspaper, though; not sure if it's objectively their main base, or just one of them. Which would be a good idea, as a European. I'd hate to see the investment go to waste on taxes that are spent on stupid shit anyway. It should go into R&D, not fighting bureaucracy. "Show me the incentive and I will show you the outcome." Almost certainly the IP will be held in Singapore for tax reasons. > they are setting up in Singapore as their base Europe in general has been tightening up its rules / taxes / laws around startups and companies, especially tech and remote. It's been less friendly these days. Yann LeCun literally said this morning on the radio in France that it is headquartered in Paris and will pay taxes in France. Go figure… No, he said something like “well yes, only for the part of the profits made in France.” Why would it be any other way? French people have this pipe dream that other French people will pay 75% of what they produce worldwide to fund their pensions, hospitals, useless school system, and all their “comités Théodule”. For such companies, France also offers generous R&D tax credits (Crédit Impôt Recherche): companies can recover roughly 30% of eligible R&D expenses incurred in France as a tax credit, which can eventually be refunded (in cash) if the company has no taxable profit. Is that alongside 100% of R&D expenses amortized in taxes when a company has taxable profit covering them? This is a Singaporean news article from a Singaporean company[0] (had to look it up). As such, they are more likely to talk about Singapore news and exaggerate the claims. Singapore isn't the key location. From what I am seeing online, France is the major location. Singapore is just one of the more satellite-like offices. They have many offices around the world, it seems. > Europe in general has been tightening up its rules / taxes / laws around startups and companies, especially tech and remote. Like? Care to provide any specific examples? "Europe" is a continent composed of various countries, most of which have been doing a lot to make it easier for startups and companies in general. While I’d love there to be a European frontier model, I do very much enjoy Mistral. For the price and speed, it outperforms any other model for my use cases (language-learning-related formatting, non-code, non-research). I'm a partner in a fund that wrote a small check into this (I have no private knowledge of the deal). While I agree that one’s opinion on autoregressive models doesn’t matter, I think the fact of whether or not autoregressive models work matters a lot, and particularly so in LeCun’s case. What’s different about investing in this versus investing in, say, a young researcher’s startup, or Ilya’s superintelligence? In both those cases, if a model architecture isn’t working out, I believe they will pivot. In YL’s case, I’m not sure that is true. In that light, this bet is a bet on YL’s current view of the world. If his view is accurate, this is very good for Europe. If inaccurate, then this is sort of a nothing-burger; the company will likely exit for roughly the investment amount, that money would not have gone to smaller European startups anyway, and it’s a wash.
FWIW, I don’t think the original complaint about auto-regression, “errors exist, errors always multiply under sequential token choice, ergo errors are endemic and this architecture sucks”, is intellectually that compelling (the arithmetic usually behind that claim is sketched a bit further down). Here: “world model errors exist, world model errors will always multiply under sequential token choice, ergo world model errors are endemic and this architecture sucks.” See what I did there? On the other hand, we have a lot of unused training tokens in videos, I’d like very much to talk to a model with excellent ‘world’ knowledge and frontier textual capabilities, and I hope this goes well. Either way, as you say, Europe needs a frontier model company and this could be it. I don't think it's "regardless": your opinion on LeCun being right should be highly correlated with your opinion on whether this is good for Europe. If you think that LLMs are sufficient and RSI (recursive self-improvement) is imminent (<1 year), this is horrible for Europe. It is a distracting boondoggle exactly at the wrong time. It's sufficient to think that there is a chance that they will not be, however, for there to be non-zero value in funding other approaches. And even if you think the chance is zero, unless you also think there is a zero chance they will be capable of pivoting quickly, it might still be beneficial. I think his views are largely flawed, but chances are there will still be lots of useful science coming out of it as well. Even if current architectures can achieve AGI, it does not mean there can't also be better, cheaper, more effective ways of doing the same things, and so exploring the space more broadly can still be of significant value. I think LeCun has been so consistently wrong and boneheaded for basically all of the AI boom that this is much, much more likely to be bad than good for Europe. Of the people in the field who could even raise that much money, he is probably one of the worst to give it to. LeCun was stubbornly 'wrong and boneheaded' in the 80s, but turned out to be right. His contention now is that LLMs don't truly understand the physical world; I don't think we know enough yet to say whether he is wrong. Could you please elaborate on what he was wrong about? He said that LLMs wouldn't have common sense about how the real world physically works, because it's so obvious to humans that we don't bother putting it into text. This seems pretty foolish, honestly, given the scale of internet data, and even at the time LLMs could handle the example he said they couldn't. I also believe he didn't think that reasoning/CoT would work well or scale like it has. Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease. Many respectable medical professionals were convinced this was true, and they viewed the entire world through this lens. They interpreted data in ways that aligned with a miasmatic view. Of course now we know this was delusional and it seems almost funny in retrospect. I feel the same way when I hear that 'just scale language models' suddenly created something that's true AGI, indistinguishable from human intelligence. > Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease. Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high quality data; the models are good as they are now.
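For readers who haven't seen it spelled out, the compounding-error claim referenced a few comments up is usually just the following back-of-the-envelope calculation; the per-token error rate and, crucially, the independence assumption are exactly the parts being disputed.

    # The usual "errors multiply under sequential token choice" arithmetic.
    # Assumes a fixed, independent per-token error rate, which is the premise
    # critics of the argument reject; numbers here are illustrative only.
    per_token_error = 0.01
    for n in (100, 1_000, 10_000):
        p_no_error = (1 - per_token_error) ** n
        print(f"{n:>6} tokens: P(no error anywhere) ~ {p_no_error:.2e}")
    # prints roughly 3.7e-01, 4.3e-05, and 2.2e-44

Whether that multiplication happens in practice depends on whether errors really are independent and unrecoverable, which is the substance of the disagreement above.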
Of course you can still improve the models, but you get much more upside from data, or even better, from interactive environments. The path to AGI is not based on pure thinking; it's based on scaling interaction. To stay within the miasma-theory-of-disease analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed half of Europe, and no one could think of the germ theory of disease. Think about it: it was as desperate a situation as it gets, and no one had the simple spark to keep up hygiene. The fact is we are also not smart from the brain alone; we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model; they act like human-in-the-loop curators of LLM work. If I'm understanding you, it seems like you're struck by hindsight bias. No one knew the miasma theory was wrong... it could have been right! Only with hindsight can we say it was wrong. Seems like we're in the same situation with LLMs and AGI. The miasma theory of disease was "not even wrong" in the sense that it was formulated before we even had the modern scientific method to define the criteria for a theory in the first place. And it was sort of accidentally correct in that some non-infectious diseases are caused by airborne toxins. Plenty of scientific authorities believed in it through the 19th century, and they didn't blindly believe it: it had good arguments for it, and intelligent people weighed the pros and cons of it and often ended up on the side of miasma over contagionism. William Farr was no idiot, and he had sophisticated statistical arguments for it. And, as evidence that it was a scientific theory, it was abandoned by its proponents once contagionism had more evidence on its side. It's only with hindsight that we think contagionism is obviously correct. > Only with hindsight can we say it was wrong It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer. It's unintuitive to me that architecture doesn't matter: deep learning models, for all their impressive capabilities, are still deficient compared to human learners as far as generalisation, online learning, representational simplicity and data efficiency are concerned. Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant; it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'. If model architecture doesn't matter much, how come transformers changed everything? Luck. RNNs can do it just as well, as can Mamba, S4, etc., for a given budget of compute and data. The larger the model, the less architecture makes a difference. It will learn in any of the 10,000 variations that have been tried, and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture gains mostly show up as efficiency: one method can be 10x more efficient than another. That's not how I read the transformer stuff around the time it was coming out: they had concrete hypotheses that made sense, not just random attempts at striking it lucky. In other words, they called their shots in advance.
I'm not aware that we had notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked into' being contemporaneous with? Also, why are we seeing diminishing returns if only the data matters? Are we running out of data? The premise is wrong; we are not seeing diminishing returns. By basically any metric that has a ratio scale, AI progress is accelerating, not slowing down.