John Carmack talk at Upper Bound 2025
twitter.com | 398 points by tosh 13 hours ago
It's always a treat to watch a Carmack lecture or read anything he writes, and his notes here are no exception. He writes as an engineer, for engineers, and documents all his thought processes and missteps in the exact detailed yet concise way you'd want from a colleague handing off some work.
One question I would have about the research direction is the emphasis on realtime. If I understand correctly, he's doing online learning in realtime. Obviously it makes for a cool demo and pulls on his optimisation background, and no doubt some great innovations will be required to make this work. But I guess the bitter lesson and recent history also tell us that some solutions may only emerge at compute levels beyond what is currently possible for realtime inference, let alone learning. And the only example we have of entities solving Atari games is the human brain, of which we don't have a clear understanding of the compute capacity. In which case, why wouldn't it be better to focus purely on learning efficiency and relax the realtime requirement for now?
That's a genuine question by the way, definitely not an expert here and I'm sure there's a bunch of value to working within these constraints. I mean, jumping spiders solve reasonably complex problems with 100k neurons, so who knows.
I'm sure there were offline rendering and 3D graphics workstation people saying the same about the comparatively crude work he was doing in the early 90s...
Obviously both Carmack and the rest of the world have changed since then, but it seems to me his main strength has always been in doing more with less (early id/Oculus, AA). When he's working in bigger orgs and/or with more established tech his output seems to suffer, at least in my view (possibly in his as well, since he quit both Bethesda-id and Meta).
I don't know Carmack and can't claim to be anywhere close to his level, but as someone also mainly interested in realtime stuff I can imagine he also feels a slight disdain for the throw-more-compute-at-it approach of the current AI boom. I'm certainly glad he's not running around asking for investor money to train an LLM.
Best case scenario he teams up with some people who complement his skillset (akin to the game designers and artists at id back in the day) and comes up with a way to help bring some of the cutting edge to the masses, like with 3D graphics.
The thing about Carmack in the 90s... There was a lot of research going on around 3d graphics. Companies like SGI and Pixar were building specialized workstations for doing vector operations for 3d rendering. 3d was a thing. Game consoles with specialized 3d hardware would launch in 1994 with the Sega Saturn and the Sony PlayStation (Japan only for the first year).
What Carmack did was basically get a 3d game running on existing COMMODITY hardware. The 386 chip that most people used for their excel spreadsheets did not do floating point operations well, so Carmack figured out how to do everything using integers.
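The usual trick for this (a made-up 16.16 fixed-point sketch for illustration, not id's actual code) is to store every number as an integer scaled by 2^16 and handle the shifts by hand, so multiplication and division never touch the FPU:

    /* Generic 16.16 fixed-point sketch: values are int32_t scaled by 2^16,
       so products need a 64-bit intermediate and a shift back down. */
    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t fixed;              /* 16 integer bits . 16 fractional bits */
    #define FIX_SHIFT 16
    #define FIX_ONE   (1 << FIX_SHIFT)

    static fixed fix_from_int(int n)     { return (fixed)n << FIX_SHIFT; }
    static double fix_to_double(fixed f) { return (double)f / FIX_ONE; }  /* printing only */

    /* widen to 64 bits for the intermediate result, then shift back */
    static fixed fix_mul(fixed a, fixed b) { return (fixed)(((int64_t)a * b) >> FIX_SHIFT); }
    static fixed fix_div(fixed a, fixed b) { return (fixed)(((int64_t)a << FIX_SHIFT) / b); }

    int main(void) {
        fixed x = fix_from_int(3) + FIX_ONE / 2;   /* 3.5 */
        fixed y = fix_from_int(2);                 /* 2.0 */
        printf("3.5 * 2.0 = %f\n", fix_to_double(fix_mul(x, y)));
        printf("3.5 / 2.0 = %f\n", fix_to_double(fix_div(x, y)));
        return 0;
    }

(Doom's released source does something in the same spirit with its fixed_t type, but the helpers above are just a sketch.)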
May 1992 -> Wolfenstein 3D releases
December 1993 -> Doom releases
December 1994 -> Sony PlayStation launches in Japan
June 1996 -> Quake releases
So Wolfenstein and Doom were actually not really 3d games, but rather 2.5D games (you can't have rooms below other rooms). The first true 3d game here is actually Quake, which also eventually got hardware acceleration support.
Carmack was the master of doing the seemingly impossible on super constrained hardware on virtually impossible timelines. If DOOM released in 1994 or 1995, would we still remember it in the same way?
> If DOOM released in 1994 or 1995, would we still remember it in the same way?
Maybe. One aspect of Wolfenstein and Doom's popularity is that they were years ahead of everyone else technically on PC hardware. The other aspect is that they were genre-defining titles that set the standards for gameplay design. I think Doom deathmatch would have caught on in 1995, as there really were very few (just Command and Conquer?) standout PC network multiplayer games released between 1993 and 1995.
I guess the thing about rapid change is... it's hard to imagine what kind of games would exist in a DOOMless world in an alternate 1995.
The first 3d console games started to come out that year, like Rayman. Star Wars Dark Forces with its own custom 3d engine also came out. Of course Dark Forces was, however, an overt clone of DOOM.
It's a bit ironic, but I think the gameplay innovation of DOOM tends to hold up better than the actual technical innovation. Things like BSP for level partitioning have slowly been phased out of game engines now that we have ample floating point compute power and hardware acceleration, but even developers of the more recent DOOM games have started to realize that they should return to the original formula of "blast zombies in the face at high speed, and keep plot as window dressing".
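For anyone who hasn't run into it, the BSP idea is small: each node splits space with a plane, and at render time you recurse into the side the camera is on first, which guarantees nothing drawn later can occlude something drawn earlier. A minimal sketch with made-up structs (not Doom's actual data layout):

    /* Minimal BSP traversal sketch: each node splits 2D space with a line
       (point + normal); visiting the camera's side first gives a draw order
       in which no later wall can occlude an earlier one. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct { double x, y; } Vec2;

    typedef struct BspNode {
        Vec2 point, normal;            /* the splitting line */
        struct BspNode *front, *back;  /* children; NULL = empty leaf */
        const char *wall;              /* wall segment lying on the split */
    } BspNode;

    static double side_of(const BspNode *n, Vec2 p) {
        return (p.x - n->point.x) * n->normal.x + (p.y - n->point.y) * n->normal.y;
    }

    static void render(const BspNode *n, Vec2 camera) {
        if (!n) return;
        if (side_of(n, camera) >= 0) {          /* camera on the front side */
            render(n->front, camera);
            printf("draw %s\n", n->wall);
            render(n->back, camera);
        } else {                                /* camera on the back side */
            render(n->back, camera);
            printf("draw %s\n", n->wall);
            render(n->front, camera);
        }
    }

    int main(void) {
        BspNode west = { {-4, 0}, {1, 0}, NULL, NULL, "west wall" };
        BspNode east = { { 4, 0}, {1, 0}, NULL, NULL, "east wall" };
        BspNode root = { { 0, 0}, {1, 0}, &east, &west, "central wall" };
        render(&root, (Vec2){ 1.0, 0.0 });
        return 0;
    }

The win back then was that the tree is built offline when the level is compiled, so the renderer never has to sort polygons at runtime.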
> If DOOM released in 1994 or 1995, would we still remember it in the same way?
I think so, because the thing about DOOM is, it was an insanely good game. Yes, it pioneered fullscreen real-time perspective rendering on commodity hardware, instantly realigning the direction of much of the game industry, yadda yadda yadda, but at the end of the day it was a good-enough game for people to remember and respect even without considering the tech.
Minecraft would be a similar example. Minecraft looked like total ass, and games with similar rendering technology could have been (and were) made years earlier, but Minecraft was also good. And that was enough.
From the notes:
"A reality check for people that think full embodied AGI is right around the corner is to ask your dancing humanoid robot to pick up a joystick and learn how to play an obscure video game."
We don't really need AGI. We need better specialized AIs. Throw in a few specialized AIs and they will leave some impact on society. That might not be that far away.
Saying we don't "need" AGI is like saying we don't need electricity. Sure, life existed before we had that capability, but it would be very transformative. Of course we can make specialized tools in the meantime.
The error in this argument is that electricity is real.
Indeed, and I'd go even further. At least electricity is defined, which helps greatly in determining its existence. Neither unicorns nor AGI currently exist but at least unicorns are well enough defined to determine whether an equine animal is or isn't one.
Can you give an example of how it would be transformative compared to specialized AI?
AGI is transformative in that it lets us replace knowledge workers completely, specialized AI requires knowledge workers to train them for new tasks while AGI doesn't.
Because it could very well exceed our capabilities beyond our wildest imaginations.
Because we evolved to get where we are, humans have all sorts of messy behaviours that aren't really compatible with a utopian society. Theft, violence, crime, greed - it's all completely unnecessary and yet most of us can't bring ourselves to solve these problems. And plenty are happy to live apathetically while billionaires become trillionaires...for what exactly? There's a whole industry of hyper-luxury goods now, because they make so much money even regular luxury is too cheap.
If we can produce AGI that exceeds the capabilities of our species, then my hope is that rather than the typical outcome of "they kill us all", that they will simply keep us in line. They will babysit us. They will force us all to get along, to ensure that we treat each other fairly.
As a parent teaches children to share by forcing them to break the cookie in half, perhaps AI will do the same for us.
Oh great, can't wait for our AI overlords to control us more! That's definitely compatible with a "utopian society"*.
Funnily enough, I still think some of the most interesting semi-recent writing on utopia was done ~15 years ago by... Eliezer Yudkowsky. You might be interested in the article on "Amputation of Destiny."
Link: https://www.lesswrong.com/posts/K4aGvLnHvYgX9pZHS/the-fun-th...
Why not just hire like 100 of the smartest people across domains and give them SOTA AI, to keep the AI as accurate as possible?
Each of those 100 can hire teams or colleagues to make their domain better, so there’s always human expertise keeping the model updated.
"just"
They’re spending 10s of billions. Yes, just.
200 million to have dedicated top experts on hand is reasonable.
Specialized AIs have been making an impact on society since at least the 1960s. AI has long suffered from the pattern that every time the field comes up with something new, it gets renamed and becomes important (where it makes sense) without AI getting the credit.
From what I can tell, most in AI are currently hoping LLMs reach that point quickly, because the hype is not helping AI at all.
Yesterday my dad, in his late 70s, used Gemini with a video stream to program the thermostat. He then called me to tell me this, rather than calling me to come stop by and program the thermostat.
You can call this hype, maybe it is all hype until LLMs can work on 10M LOC codebases, but recognize that LLMs are a shift that is totally incomparable to any previous AI advancement.
Yeah. As a mediocre programmer I'm really scared about this. I don't think we are very far from AI replacing the mediocre programmers. Maybe a decade, at most.
I'd definitely like to improve my skills, but to be realistic, most of the programmers are not top-notch.
That is what OpenAI's non-profit economic research arm has claimed. LLMs will fundamentally change how we interact with the world, like the Internet did. It will take time, like the Internet, and a couple of hype cycle pops, but it will change the way we do things.
It will help a single human do more in a white collar world.
> He then called me to tell me this, rather than calling me to come stop by and program the thermostat.
Sounds like AI robbed you of an opportunity to spend some time with your Dad, to me
Or maybe instead of spending time with your dad on a bs menial task, you could spend time fishing with him…
It's nice to think that but life and relationships are also composed of the little moments, which sometimes happen when someone asks you over to help with a "bs menial task"
It takes five minutes to program the thermostat, then you can have a beer on the patio if that's your speed and catch up for a bit
Life is little moments, not always the big commitments like taking a day to go fishing
That's the point of automating all of ourselves out of work, right? So we have more time to enjoy spending time with the people we love?
So isn't it kind of sad if we wind up automating those moments out of our lives instead?
There are clearly a lot of useful things about LLMs. However there is a lot of hype as well. It will take time to separate the two.
Bitter lesson applies here as well though. Generalized models will beat specialized models given enough time and compute. How much bespoke NLP is there anymore? Generalized foundational models will subsume all of it eventually.
You misunderstand the bitter lesson.
It's not about specialized vs generalized models - it's about how models are trained. The chess engine that beat Kasparov is a specialized model (it only plays chess), yet it's the bitter lesson's example for the smarter way to do AI.
Chess engines are better at chess than LLMs. It's not close. Perhaps eventually a superintelligence will surpass the engines, but that's far from assured.
Specialized AI are hardly obsolete and may never be. This hypothetical superintelligence may even decide not to waste resources trying to surpass the chess AI and instead use it as a tool.
Yeah, I agree with it. There is a lot of hype, but there is some potential there.
Yeah “AI” tools (such a loose term but largely applicable) have been involved in audio production for a very long time. They have actually made huge strides with noise removal/voice isolation, auto transcription/captioning, and “enhancement” in the last five years in particular.
I hate Adobe, I don’t like to give them credit for anything. But their audio enhance tool is actual sorcery. Every competitor isn’t even close. You can take garbage zoom audio and make it sound like it was borderline recorded in a treated room/studio. I’ve been in production for almost 15 years and it would take me half a day or more of tweaking a voice track with multiple tools that cost me hundreds of dollars to get it 50% as good as what they accomplish in a minute with the click of a button.
What if AGI is just a bunch of specialized AIs put together?
It would seem our own generalized intelligence is an emergent property of many, _many_ specialized processes
I wonder if AI is the same
> It would seem our own generalized intelligence is an emergent property of many, _many_ specialized processes
You can say that about other animals, but about humans it is not so sure. No animal can be taught as general a set of skills as a human can; they might have some better specialized skills, but clearly there is something special that makes humans so much more versatile.
So it seems there is some simple little thing humans have that makes them general, while for example our very close relatives the monkeys are not.
Humans are the ceiling at the moment yes, but that doesn't mean the ceiling isn't higher.
Science is full of theories that are correct per our current knowledge and then subsequently disproven when research/methods/etc improves.
Humans aren't special, we are made from blood & bone, not magic. We will eventually build AGI if we keep at it. However unlike VCs with no real skills except having a lot of money™, I couldn't say whether this is gonna happen in 2 years or 2000.
The question was whether cobbling together enough special intelligence creates general intelligence. Monkeys have a lot of special intelligence that our current AI models can't come close to, but still aren't seen as having general intelligence like humans, so there is some little bit humans have that isn't just another special intelligence.
It may be a property of (not only of?) humans that we can generate specialized inner processes. The hardcoded ones stay, the emergent ones come and go. Intelligence itself might be the ability to breed new specialized mental processes on demand.
This debate is exhausting because there's no coherent definition of AGI that people agree on.
I made a Google Form question for collecting AGI definitions because I don't see anyone else doing it, and I find the range of definitions for this concept infinitely frustrating:
https://docs.google.com/forms/d/e/1FAIpQLScDF5_CMSjHZDDexHkc...
My concern is that people never get focused enough to care to define it - seems like the most likely case.
Are you looking for the definition of AGI in random hacker news comments from non experts?
The Wikipedia article on AGI explains it well enough.
Google has proposed an AGI classification scheme with multiple levels of AGI. There are different opinions in the research community.
It doesn't really seem like there's much utility in defining it. It's like defining "heaven."
It's an ideal that some people believe in, and we're perpetually marching towards it
No, it’s never going to be precise but it’s important to have a good rough definition.
Can we just use Morris et al and move on with our lives?
Position: Levels of AGI for Operationalizing Progress on the Path to AGI: https://arxiv.org/html/2311.02462v4
There are generational policy and societal shifts that need to be addressed somewhere around true Competent AGI (50% of knowledge work tasks automatable). Just like climate change, we need a shared lexicon to refer to this continuum. You can argue for different values of X but the crucial point is if X% of knowledge work is automated within a decade, then there are obvious risks we need to think about.
So much of the discourse is stuck at "we will never get to X=99" when we could agree to disagree on that and move on to considering the X=25 case. Or predict our timelines for X and then actually be held accountable for our falsifiable predictions, instead of the current vibe-based discussions.
It is a marketing term. That's it. Trying to exhaustively define what AGI is or could be is like trying to explain what a Happy Meal is. At its core, the Happy Meal was not invented to revolutionize food eating. It puts an attractive label on some mediocre food, a title that exists for the purpose of advertisement.
There is no point collecting definitions for AGI, it was not conceived as a description for something novel or provably existent. It is "Happy Meal marketing" but aimed for adults.
The name AGI (i.e. generalist AI) was originally intended to contrast with narrow AI which is only capable of one, or a few, specific narrow skills. A narrow AI might be able to play chess, or distinguish 20 breeds of dog, but wouldn't be able to play tic tac toe because it wasn't built for that. AGI would be able to learn to do anything, within reason.
The term AGI is obviously used very loosely, with little agreement on its precise definition, but I think a lot of people take it to mean not only generality, but specifically human-level generality, and human-level ability to learn from experience and solve problems.
A large part of the problem with AGI being poorly defined is that intelligence itself is poorly defined. Even if we choose to define AGI as meaning human-level intelligence, what does THAT mean? I think there is a simple reductionist definition of intelligence (as the word is used to refer to human/animal intelligence), but ultimately the meaning of words are derived from their usage, and the word "intelligence" is used in 100 different ways ...
That’s historically inaccurate
My masters thesis advisor Ben Goertzel popularized the term and has been hosting the AGI conference since 2008:
https://goertzel.org/agiri06/%5B1%5D%20Introduction_Nov15_PW...
I had lunch with Yoshua Bengio at AGI 2014 and it was most of the conversation that day
Is this supposed to be a gotcha? We know these systems are typically trained using RL and they are exceedingly good at learning games...
No it is not a “gotcha” and I don’t understand how you got that impression.
Carmack believes AGI systems should be able to learn new tasks in realtime alongside humans in the real world.
This sounds like a problem that could be solved soon, with a caveat.
Games generally are solvable for AI because they have feedback loops and clear success or failure criteria. If the "picking up a joystick" part is the limiting factor, sure. But why would we want robots to use an interface (especially a modern controller) heavily optimized for human hands? That seems like the definition of a horseless carriage.
I'm sure if you compared a monkey's and a dolphin's performance using a joystick you'd get results that aren't really correlated with their intelligence. I would guess that if you gave robots an R2D2-like port to jack into and play a game, that problem could be solved relatively quickly.
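To make the "feedback loops and clear success criteria" point concrete, here's a toy sketch (entirely made up, nothing to do with Carmack's actual setup): tabular Q-learning on a tiny corridor "game" where the only signal is +1 for reaching the right end, and that alone is enough to learn a policy:

    /* Toy Q-learning on a 1-D corridor: states 0..7, reaching cell 7 pays +1.
       The point is just that a scalar score plus trial and error is enough. */
    #include <stdio.h>
    #include <stdlib.h>

    #define STATES  8
    #define ACTIONS 2              /* 0 = step left, 1 = step right */

    int main(void) {
        double q[STATES][ACTIONS] = {{0}};
        const double alpha = 0.1, gamma = 0.95, epsilon = 0.1;
        srand(42);

        for (int episode = 0; episode < 500; episode++) {
            int s = 0;
            while (s != STATES - 1) {
                /* epsilon-greedy: mostly exploit, sometimes explore (ties random) */
                int a;
                if ((double)rand() / RAND_MAX < epsilon || q[s][0] == q[s][1])
                    a = rand() % ACTIONS;
                else
                    a = q[s][1] > q[s][0];

                int next = (a == 1) ? s + 1 : (s > 0 ? s - 1 : 0);
                double reward = (next == STATES - 1) ? 1.0 : 0.0;   /* the "score" */

                /* feedback loop: nudge Q toward reward + discounted future value */
                double best_next = (q[next][0] > q[next][1]) ? q[next][0] : q[next][1];
                q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
                s = next;
            }
        }

        for (int s = 0; s < STATES - 1; s++)
            printf("state %d: prefer %s\n", s, q[s][1] > q[s][0] ? "right" : "left");
        return 0;
    }

Real game-playing agents replace the table with a neural network and the corridor with pixels, but the structure of the loop is the same.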
Just like OpenAI early on promised us an AGI and showed us how it "solved" Dota 2.
They also claimed it "learned" to play by playing itself only; however, it was clear that most of the advanced techniques were borrowed from existing AI and from observing humans.
No surprise they gave up on that project completely and I doubt they'll ever engage in anything like that again.
Money better spent on different marketing platforms.
It also wasn't even remotely close to learning Dota 2 proper. They ran a massively simplified version of the game where the AI and humans alternated between playing one of two pre-defined team compositions, meaning >90% of the game's characters and >99.999999% of the possible compositions and matchups weren't even on the table, plus other standard mechanics were also changed or disabled altogether for the sake of the AI team.
Saying you've solved Dota after stripping out nearly all of its complexity is like saying you've solved Chess, but on a version where the back row is all Bishops.
Exactly. What I find surprising in this story, though, is not OpenAI. It's investors not seeing through these blatant... let's call them exaggerations of reality, and still trusting the company with their money. I know I wouldn't have. But then again, maybe that's why I'm poor.
In their hearts, startup investors are like Agent Mulder: they Want To Believe. Especially after they’ve already invested a little. They are willing to overlook obvious exaggerations up to and including fraud, because the alternative is admitting their judgment is not sound.
Look at how long Theranos went on! Miraculous product. Attractive young founder with all the right pedigree, credentials, and contacts, dressed in black turtlenecks. Hell, she even talked like Steve Jobs! Investors never had a chance.
They already have 400 million daily users and a billion people using the product, with billions in consumer subscription revenue, faster than any company ever. They are also aggregating R&D talent at a density never before seen in Silicon Valley.
That is what investors see. You seem to treat this as a purity contest where you define purity.
I agree that restricting the hero pool is a huge simplification. But they did play full 5v5 standard dota with just a restricted hero pool of 17 heroes and no illusions/control units according to theverge (https://www.theverge.com/2019/4/13/18309459/openai-five-dota...). It destroyed the professionals.
As an ex dota player, I don't think this is that far off from having full-on, all-heroes dota. Certainly not as far off as you are making it sound.
And dota is one of the most complex games, I expect for example that an AI would instantly solve CS since aim is such a large part of the game.
> It destroyed the professionals.
Only the first time; later, when it played better players, it always lost. Players learned the faults of the AI after some time in game, and the AI had a very bad late game, so they always won later.
Another issue with the approach is that the model had direct access to game data, that is simply an unfair competitive advantage in dota, and it is obvious why that advantage would be unfair in CS.
It is certainly possible, but i won't be impressed by anything "playing CS" that isn't running a vision model on a display and moving a mouse, because that is the game. The game is not abstractly reacting to enemy positions and relocating the cursor, it's looking at a screen, seeing where the baddy is and then using this interface (the mouse) to get the cursor there as quickly as possible.
It would be like letting an AI plot its position on the field and what action it's taking during a football match and then saying "Look, the AI would have scored dozens of times in this simulation, it is the greatest soccer player in the world!" No, sorry, the game actually requires you to locomote; abstractly describing your position may be fun, but it's not the game.
Did you read the paper? It had access to the Dota 2 bot API, which is some gamestate but very far from all gamestate. It also had an artificially limited reaction time of something like 220ms, worse than professional gamers.
But then again, that is precisely the point. A chess bot also has access to gigabytes of perfect working memory. I don't see people complaining about that. It's perfectly valid to judge the best an AI can do vs the best a human can do. It's not really fair to take away exactly what a computer is good at from an AI and then say: "Look, but the AI is now worse". Else you would also have to do it the other way around: how well could a human play dota if they only had access to the bot API? I don't think they would do well at all.
It was 6 years ago. I'm sure there'd be no contest now if OpenAI dedicated resources to it, which it won't, because it's busy with solving the entirety of human language before others eat their lunch.
Funnily enough, even dota2 has grown much more complex than it was 6 years ago, so it's a harder problem to solve today than it was back then
What do you base your certainty on? Were there any significant enough breakthroughs in the AGI?
ARC-AGI, while imagined as super hard for AI, was beaten enough that they had to come up with ARC-AGI-2.
"AI tend to be brittle and optimized for specific tasks, so we made a new specific task and then someone optimized for it" isn't some kind of gotcha. Once ARC puzzles became a benchmark they ceased to be meaningful WRT "AGI".