The Waymo World Model
waymo.com577 points by xnx 6 hours ago
577 points by xnx 6 hours ago
Suddenly all this focus on world models by Deep mind starts to make sense. I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
Google/Alphabet are so vertically integrated for AI when you think about it. Compare what they're doing - their own power generation , their own silicon, their own data centers, search Gmail YouTube Gemini workspace wallet, billions and billions of Android and Chromebook users, their ads everywhere, their browser everywhere, waymo, probably buy back Boston dynamics soon enough (they're recently partnered together), fusion research, drugs discovery.... and then look at ChatGPT's chatbot or grok's porn. Pales in comparison.
Google has been doing more R&D and internal deployment of AI and less trying to sell it as a product. IMHO that difference in focus makes a huge difference. I used to think their early work on self-driving cars was primarily to support Street View in thier maps.
There was a point in time when basically every well known AI researcher worked at Google. They have been at the forefront of AI research and investing heavily for longer than anybody.
It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
But they are in full gear now that there is real competition, and it’ll be cool to see what they release over the next few years.
I also think the presence of Sergey Brin has been making a difference in this.
Please, Google was terrible about using the tech the had long before Sundar, back when Brin was in charge.
Google Reader is a simple example: Googl had by far the most popular RSS reader, and they just threw it away. A single intern could have kept the whole thing running, and Google has literal billions, but they couldn't see the value in it.
I mean, it's not like being able to see what a good portion of America is reading every day could have any value for an AI company, right?
Google has always been terrible about turning tech into (viable, maintained) products.
Is there an equivalent to Godwin's law wrt threads about Google and Google Reader?
See also: any programming thread and Rust.
I never get the moaning about killing Reader. It was never about popularity or user experience.
Reader had to be killed because it [was seen as] a suboptimal ad monetization engine. Page views were superior.
Was Google going to support minimizing ads in any way?
How is this relevant? At best it’s tangentially related and low effort
Took a while but I got to the google reader post. Self host tt-rss, it's much better
Ex-googler: I doubt it, but am curious for rationale (i know there was a round of PR re: him “coming back to help with AI.” but just between you and me, the word on him internally, over years and multiple projects, was having him around caused chaos b/c he was a tourist flitting between teams, just spitting out ideas, but now you have unclear direction and multiple teams hearing the same “you should” and doing it)
That makes sense. A "secret shopper" might be a better way to avoid that but wouldn't give him the strokes of being the god in the room.
Oh ffs, we have an external investor who behaves like that. Literally set us back a year on pet nonsense projects and ideas.
> It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
I always thought they deliberately tried to contain the genie in the bottle as long as they could
It has always felt to me that the LLM chatbots were a surprise to Google, not LLMs, or machine learning in general.
Not true at all. I interacted with Meena[1] while I was there, and the publication was almost three years before the release of ChatGPT. It was an unsettling experience, felt very science fiction.
[1]: https://research.google/blog/towards-a-conversational-agent-...
The surprise was not that they existed: There were chatbots in Google way before ChatGPT. What surprised them was the demand, despite all the problems the chatbots have. The pig problem with LLMs was not that they could do nothing, but how to turn them into products that made good money. Even people in openAI were surprised about what happened.
In many ways, turning tech into products that are useful, good, and don't make life hell is a more interesting issue of our times than the core research itself. We probably want to avoid the valuing capturing platform problem, as otherwise we'll end up seeing governments using ham fisted tools to punish winners in ways that aren't helpful either
The uptake forced the bigger companies to act. With image diffusion models too - no corporate lawyer would let a big company release a product that allowed the customer to create any image...but when stable diffusion et al started to grow like they did...there was a specific price of not acting...and it was high enough to change boardroom decisions
Well, I must say ChatGPT felt much more stable than Meena when I first tried it. But, as you said, it was a few years before ChatGPT was publicly announced :)
Google and OpenAI are both taking very big gambles with AI, with an eye towards 2036 not 2026. As are many others, but them in particular.
It'll be interesting to see which pays off and which becomes Quibi
> Suddenly all this focus on world models by Deep mind starts to make sense
Google's been thinking about world models since at least 2018: https://arxiv.org/abs/1803.10122
FWIW I understood GP to mean that it suddenly makes sense to them, not that there’s been a sudden focus shift at google.
>I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
So for the record, that's 3+ years behind Tesla.Aren't they still using safety drivers or safety follow cars and in fewer cities? Seems Tesla is pretty far behind.
What do you think I said that you're contradicting?
IMO the presence of safety drivers is just a sensible "as low as reasonably achievable" measure during the early rollout. I'm not sure that can be (fairly) used as a point against them.
I'm comfortably with Tesla "sparing no expense" for safety, since I think we all (including Tesla) understand that this isn't the ultimate implementation. In fact, I think it would be a scandal if Tesla failed to do exactly that. Damned if you do and damned if you don't, apparently.
I don't know if Tesla claiming they're doing something carries weight anymore.
Tesla built something like this for FSD training, they presented many years ago. I never understood why they did productize it. It would have made a brilliant Maps alternative, which country automatically update from Tesla cars on the road. Could live update with speed cameras and road conditions. Like many things they've fallen behind
No Lidar anymore on the 2026 Volvo models ES60 and EX60. See for example: https://www.jalopnik.com/2032555/volvo-ends-luminar-lidar-20...
I love Volvo, am considering buying one in a couple weeks actually, but they're doing nothing interesting in terms of ADAS, as far as I can tell. It seems like they're limited to adaptive cruise control and lane keeping, both of which have been solved problems for more than a decade.
It sounds like they removed Lidar due to supplier issues and availability, not because they're trying to build self-driving cars and have determined they don't need it anymore.
Is lane keeping really a solved problem? Just last year one of my brand new rented cars tried to kill me a few times when I tried it again, and so far not even the simple lane leaving detection mechanism worked properly in any of the tried cars when it was raining.
I’d suggest doing some research on software quality. Two years back I was all for buying one (I was considering an EX40), but I got myself into some Facebook groups for owners and was shocked at the dreadful reports of quality of the software and it completely put me off. I got an ID4 instead. Reports about the EX90 have been dreadful. I was very interested, and I still admire their look and build when they drive by - but it killed my enthusiasm to buy one for a few years until they get it right.
Without Lidar + the terrible quality of tesla onboard cameras.. street view would look terrible. The biggest L of elon's career is the weird commitment to no-lidar. If you've ever driven a Tesla, it gives daily messages "the left side camera is blocked" etc.. cameras+weather don't mix either.
At first I gave him the benefit of the doubt, like that weird decision of Steve Jobs banning Adobe Flash, which ran most of the fun parts of the Internet back then, that ended up spreading HTML5. Now I just think he refused LIDAR on purely aesthetic reasons. The cost is not even that significant compared to the overall cost of a Tesla.
That one was motivated by the need of controlling the app distribution channel, just like they keep the web as a second class citizen in their ecosystem nowadays.
he didn't refuse it. MobileEye or whoever cut Tesla off because they were using the lidar sensors in a way he didn't approve. From there he got mad and said "no more lidar!"
False. Mobileye never used lidar. Lmao where do you all come up with this
I think Elon announced Tesla was ditching LIDAR in 2019.[0] This was before Mobileye offered LIDAR. Mobileye has used LIDAR from Luminar Technologies around 2022-2025. [1][2] They were developing their own lidar, but cancelled it. [3] They chose Innoviz Technologies as their LIDAR partner going forward for future product lines. [4]
0: https://techcrunch.com/2019/04/22/anyone-relying-on-lidar-is...
1: https://static.mobileye.com/website/corporate/media/radar-li...
2: https://www.luminartech.com/updates/luminar-accelerates-comm...
3: https://www.youtube.com/watch?v=Vvg9heQObyQ&t=48s
4: https://ir.innoviz.tech/news-events/press-releases/detail/13...
The original Mobileye EyeQ3 devices that Tesla began installing in their cars in 2013 had only a single forward facing camera. They were very simple devices, only intended to be used for lane keeping. Tesla hacked the devices and pushed them beyond their safe design constraints.
Then that guy got decapitated when his Model S drove under a semi-truck that was crossing the highway and Mobileye terminated the contract. Weirdly, the same fatal edge case occurred 2 more times at least on Tesla's newer hardware.
https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...
https://www.mobileye.com/news/mobileye-to-end-internal-lidar...
Um, yes they did.
No idea if it had any relation to Tesla though.
His stated reason was that he wanted the team focused on the driving problem, not sensor fusion "now you have two problems" problems. People assumed cost was the real reason, but it seems unfair to blame him for what people assumed. Don't get me wrong, I don't like him either, but that's not due to his autonomous driving leadership decisions, it's because of shitting up twitter, shitting up US elections with handouts, shitting up the US government with DOGE, seeking Epstein's "wildest party," DARVO every day, and so much more.
Sensor fusion is an issue, one that is solvable over time and investment in the driving model, but sensor-can't-see-anything is a show stopper.
Having a self-driving solution that can be totally turned off with a speck of mud, heavy rain, morning dew, bright sunlight at dawn and dusk.. you can't engineer your way out of sensor-blindness.
I don't want a solution that is available to use 98% of the time, I want a solution that is always-available and can't be blinded by a bad lighting condition.
I think he did it because his solution always used the crutch of "FSD Not Available, Right hand Camera is Blocked" messaging and "Driver Supervision" as the backstop to any failure anywhere in the stack. Waymo had no choice but to solve the expensive problem of "Always Available and Safe" and work backwards on price.
LIDAR is notoriously easy to blind, what are you on about? Bonus meme: LIDAR blinds you(r iPhone camera)!
Yeah its absurd. As a Tesla driver, I have to say the autopilot model really does feel like what someone who's never driven a car before thinks it's like.
Using vision only is so ignorant of what driving is all about: sound, vibration, vision, heat, cold...these are all clues on road condition. If the car isn't feeling all these things as part of the model, you're handicapping it. In a brilliant way Lidar is the missing piece of information a car needs without relying on multiple sensors, it's probably superior to what a human can do, where as vision only is clearly inferior.
The inputs to FSD are:
7 cameras x 36fps x 5Mpx x 30s
48kHz audio
Nav maps and route for next few miles
100Hz kinematics (speed, IMU, odometry, etc)
Source: https://youtu.be/LFh9GAzHg1c?t=571So if they’re already “fusioning” all these things, why would LIDAR be any different?
Tesla went nothing-but-nets (making fusion easy) and Chinese LIDAR became cheap around 2023, but monocular depth estimation was spectacularly good by 2021. By the time unit cost and integration effort came down, LIDAR had very little to offer a vision stack that no longer struggled to perceive the 3D world around it.
Also, integration effort went down but it never disappeared. Meanwhile, opportunity cost skyrocketed when vision started working. Which layers would you carve resources away from to make room? How far back would you be willing to send the training + validation schedule to accommodate the change? If you saw your vision-only stack take off and blow past human performance on the march of 9s, would you land the plane just because red paint became available and you wanted to paint it red?
I wouldn't completely discount ego either, but IMO there's more ego in the "LIDAR is necessary" case than the "LIDAR isn't necessary" at this point. FWIW, I used to be an outspoken LIDAR-head before 2021 when monocular depth estimation became a solved problem. It was funny watching everyone around me convert in the opposite direction at around the same time, probably driven by politics. I get it, I hate Elon's politics too, I just try very hard to keep his shitty behavior from influencing my opinions on machine learning.
> but monocular depth estimation was spectacularly good by 2021
It's still rather weak and true monocular depth estimation really wasn't spectacularly anything in 2021. It's fundamentally ill posed and any priors you use to get around that will come to bite you in the long tail of things some driver will encounter on the road.
The way it got good is by using camera overlap in space and over time while in motion to figure out metric depth over the entire image. Which is, humorously enough, sensor fusion.
It was spectacularly good before 2021, 2021 is just when I noticed that it had become spectacularly good. 7.5 billion miles later, this appears to have been the correct call.
depth estimation is but one part of the problem— atmospheric and other conditions which blind optical visible spectrum sensors, lack of ambient (sunlight) and more. lidar simply outperforms (performs at all?) in these conditions. and provides hardware back distance maps, not software calculated estimation
Lidar fails worse than cameras in nearly all those conditions. There are plenty of videos of Tesla's vision-only approach seeing obstacles far before a human possibly could in all those conditions on real customer cars. Many are on the old hardware with far worse cameras
Always thought the case was for sensor redundancy and data variety - the stuff that throws off monocular depth estimation might not throw off a lidar or radar.
Monocular depth estimation can be fooled by adversarial images, or just scenes outside of its distribution. It's a validation nightmare and a joke for high reliability.
It isn't monocular though. A Tesla has 2 front-facing cameras, narrow and wide-angle. Beyond that, it is only neural nets at this point, so depth estimation isn't directly used; it is likely part of the neural net, but only the useful distilled elements.
Better than I expected. So this was 3 days ago, is this for all previously models or is there a cut off date here?
Fog, heavy rain, heavy snow, people running between cars or from an obstructed view…
None of these technologies can ever be 100%, so we’re basically accepting a level of needless death.
Musk has even shrugged off FSD related deaths as, “progress”.
Humans: 70 deaths in 7 billion miles
FSD: 2 deaths in 7 billion miles
Looks like FSD saves lives by a margin so fat it can probably survive most statistical games.
Isn't there a great deal of gaming going on with the car disengaging FSD milliseconds before crashing? Voila, no "full" "self" driving accident; just another human failing [*]!
[*] Failing to solve the impossible situation FSD dropped them into, that is.
Nope. NHTSA's criteria for reporting is active-within-30-seconds.
https://www.nhtsa.gov/laws-regulations/standing-general-orde...
If there's gamesmanship going on, I'd expect the antifan site linked below to have different numbers, but it agrees with the 2 deaths figure for FSD.
Is that the official Tesla stat? I've heard of way more Tesla fatalities than that..
This is absolutely a Musk defender. FSD and Tesla related deaths are much higher.