Project Genie: Experimenting with infinite, interactive worlds

blog.google

531 points by meetpateltech 13 hours ago


jlhawn - 9 hours ago

Now I can't stop thinking about _The Experience Machine_ by Andy Clark. It theorizes that this is how humans navigate and experience the real world: Our brains generate what we think the world around us is like, and our senses don't so much directly process visual information as act like a kind of loss function for our internal simulations. Then we use that error to update our internal model of the world.

In this view, we are essentially living inside a high-fidelity generative model. Our brains are constantly 'hallucinating' a predicted reality based on past experience and current goals. The data from our senses isn't the source of the image; it's the error signal used to calibrate that internal model. Much like Genie 3 uses latent actions and frames to predict the next state of a world, our brains use 'Active Inference' to minimize the gap between what we expect and what we experience.

It suggests that our sense of 'reality' isn't a direct recording of the world, but a highly optimized, interactive simulation that is continuously 'regularized' by the photons hitting our retinas.
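
To make the "senses as an error signal" idea concrete, here is a toy sketch. It is only an illustration of error-driven updating (a simple delta rule), not the model from Clark's book and not how Genie 3 works; the names `update_belief` and `track` and the learning rate are assumptions made up for this example.

```python
def update_belief(belief, observation, learning_rate=0.1):
    """Nudge the internal estimate toward the observation.

    The observation is not copied into the belief; only the prediction
    error (observation - belief) is used to correct it.
    """
    prediction_error = observation - belief
    return belief + learning_rate * prediction_error


def track(observations, initial_belief=0.0, learning_rate=0.1):
    """Run the update over a stream of observations, returning every belief."""
    belief = initial_belief
    history = []
    for observation in observations:
        belief = update_belief(belief, observation, learning_rate)
        history.append(belief)
    return history


if __name__ == "__main__":
    # The "world" keeps reporting 1.0; the internal model starts at 0.0
    # and is gradually pulled toward it by the error signal.
    beliefs = track([1.0] * 50, initial_belief=0.0, learning_rate=0.2)
    print(beliefs[0], beliefs[-1])
```

With a constant observation, the belief converges toward it, and the error shrinks on every step.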

in-silico - 11 hours ago

Everyone here seems too caught up in the idea that Genie is the product, and that its purpose is to be a video game, movie, or VR environment.

That is not the goal.

The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.
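
A minimal sketch of what "simulating the outcomes of potential actions" can look like, assuming a toy one-dimensional world. Everything here is hypothetical: `toy_world_model` stands in for a learned model like Genie, and the exhaustive search in `plan` is just one simple planning strategy, not anything DeepMind has described.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # step left, stay, step right


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model: predict the next state."""
    return state + action


def imagined_cost(state, actions, goal):
    """Roll the model forward in imagination and sum the distance to the goal."""
    cost = 0
    for action in actions:
        state = toy_world_model(state, action)
        cost += abs(goal - state)
    return cost


def plan(state, goal, horizon=4):
    """Imagine every action sequence of length `horizon`; return the first
    action of the cheapest one."""
    best = min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: imagined_cost(state, seq, goal),
    )
    return best[0]


def run(state, goal, steps=10):
    """Act in the 'real' world (here identical to the model), replanning each step."""
    for _ in range(steps):
        state = state + plan(state, goal)
    return state


if __name__ == "__main__":
    print(plan(0, 3), run(0, 3))
```

The agent never tries a bad action for real: it only pays for mistakes inside the model, then commits to the first step of the best imagined plan and replans.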

ollin - 12 hours ago

Really great to see this released! Some interesting videos from early-access users:

- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities

- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim

- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts

- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse

- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko inspired airport

WarmWash - 12 hours ago

The actual breakthrough with Genie is being able to turn around and look back, and seeing the same scene that was there before. A few other labs have similar world simulators, but they all struggle badly with keeping coherence of things not in view. Hence why they always walk forwards and never look around.

krunck - 12 hours ago

The more of this I see the more I want to spend time away from screens and doing those things I love to do in the real world.

sy26 - 12 hours ago

I have been confused for a long time about why FB is not motivated enough to invest in world models; it IS the key to unblocking their "metaverse" vision. And instead they let Yann LeCun go.

montebicyclelo - 13 hours ago

Reminds me of this [1] HN post from 9 months ago, where the author trained a neural network to do world emulation from video recordings of their local park — you can walk around in their interactive demo [2].

I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.

(I don't know the exact lineage of these ideas, but a general observation is that it's a shame that it's the norm for blog posts / indie demos to not get cited.)

[1] https://news.ycombinator.com/item?id=43798757

[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/

phailhaus - 12 hours ago

I have no idea why Google is wasting their time with this. Trying to hallucinate an entire world is a dead-end. There will never be enough predictability in the output for it to be cohesive in any meaningful way, by design. Why are they not training models to help write games instead? You wouldn't have to worry about permanence and consistency at all, since they would be enforced by the code, like all games today.

Look at how much prompting it takes to vibe code a prototype. And they want us to think we'll be able to prompt a whole world?

0xcb0 - 13 hours ago

I keep on repeating myself, but it feels like I'm living in the future. Can't wait to hook this up to my old Oculus glasses and let Genie create a fully realistic sailing simulator for me, where I can train sailing with realistic conditions. On boats I'd love to sail.

If making games out of these simulations works, it'd be the end for a lot of big studios, and might be the renaissance for small to one-person game studios.

consumer451 - 6 hours ago

Related:

> Diego Rivas, Shlomi Fruchter, and Jack Parker-Holder from the Project Genie team join host Logan Kilpatrick for an in-depth discussion on Google DeepMind’s latest breakthrough in world models. Project Genie is an experimental research prototype that allows users to generate, explore, and interact with infinitely diverse, photorealistic worlds in real-time. Learn more about the shift from passive video generation to interactive media, the technical challenges of maintaining world consistency and memory, and how these models serve as an essential training ground for AI agents.

https://www.youtube.com/watch?v=Ow0W3WlJxRY

ofrzeta - 12 hours ago

I don't know ... it's impressive and all but the result always looks kind of dead.

jacquesm - 7 hours ago

Isn't that more or less the theme of the movie 'The Thirteenth Floor'?

https://www.youtube.com/watch?v=Cbjhr-H2nxQ

pedalpete - 9 hours ago

This is what we were building in 2018 with Ayvri, starting from 3D tiles with the aim of building a real-world view by using AI to re-paint and add detail to what was essentially a higher-resolution and faster-loading Google Earth (for outside cities; we didn't have building data).

We saw a very diverse group of users. The most common were paragliders, gliders, and pilots who wanted to view their own or other people's flights. Others used it for ultramarathons, mountain bike races, and some road races, where it provided an interactive way to visualize the course from any angle and distance, and for transportation infrastructure, to display train routes to be built. The list goes on.

meetpateltech - 13 hours ago

Google Deepmind Page: https://deepmind.google/models/genie/

Try it in Google Labs: https://labs.google/projectgenie

(Project Genie is available to Google AI Ultra subscribers in the US 18+.)

artisin - 9 hours ago

Best case, Google DeepMind cracks AGI by letting agents learn for themselves inside simulated worlds. Worst case, they've invented the greatest, most expensive screensaver generator in human history.

nickandbro - 13 hours ago

This could be the future of film. Instead of prompting where you don't know what the model will produce, you could use fine-grained motion controls to get the shot you are looking for. If you want to adjust the shot after, you could just checkpoint the model there, by taking a screenshot, and rerun. Crazy.

reneberlin - 3 hours ago

The subscribers to simulations from the pr0n-industry and the billions of lonely humanoids will suffocate in their VR-headsets, if we don't think about sensors to watch their oxygen-levels.

speak_on - 10 hours ago

Compared to DeepMind's Genie 3 demo, this appears to have more morphing issues and less user interactivity with environmental consistency. Is this a stripped down version?

mosquitobiten - 13 hours ago

Every character goes forward only, permanence is still out of reach apparently.

ge96 - 12 hours ago

Damn that was crazy the picture of the tabletop setup/cardboard robot and it becomes 3D interactive.

lurker616 - 2 hours ago

Are there any open source projects like this?

0x1ceb00da - 6 hours ago

Someone please create a world with this: https://giphy.com/gifs/6pUjuQQX9kEfSe604w

bpiche - 10 hours ago

This is the plot of The Peripheral, right? Love the way the second half of that book turned out. Never finished Agency..

Havoc - 8 hours ago

Are world models from the perspective of an observer in the world or zoomed out?

Or in gaming terms do these models think FPS or RTS?

Text models and pixel-grid vision models are easy, but I'm struggling to wrap my head around what a world model "sees", so to speak.

binsquare - 10 hours ago

Its ability to keep simulated physics intact is actually a huge breakthrough.

I can't even fathom what it would be like for the future of simulation and physical world when it gets far more accurate and realistic.

RivieraKid - 12 hours ago

This would be really cool if polished and integrated with VR.

forrestthewoods - 2 hours ago

SPIN THE CAMERA 1080 DEGREES YOU COWARDS

The only test I ever want to see with these frame-gen models is a full 1080 degree camera spin. Miss me with that 30 degree back and forth crap. I want multiple full turns. Some jitter and a little back-and-forth wobble is fine. But I want multiple full spins.

I’m pretty sure I know why they don’t do this :)

hn_user_9876 - 7 hours ago

This is a very interesting development. The implications for interactive world-building are quite significant.

bigblind - 9 hours ago

Anyone else trying it and just getting a 404 page?

dangoodmanUT - 6 hours ago

this will go crazy for kids - being able to run around as a doll or action figure in their room

srameshc - 12 hours ago

What’s the endgame here? For a small gaming studio, what are the actual implications?

user_hn_827 - 8 hours ago

This is a fascinating project. The idea of infinite interactive worlds is a huge leap for gaming and simulation.

artur_makly - 8 hours ago

let's reboot Leisure Suit Larry ;-)

dominick-cc - 9 hours ago

Finally all my anime figurines will come to life

sebasv_ - 8 hours ago

I am stumped. Am I misreading, or are the folks at Google deliberately conflating two interpretations of "world model"? Don't get me wrong, this is really cool, and it will undoubtedly have its use. But what I am seeing is an LLM that can generate textures to be fed into a human-coded 3D engine (the "world model" that is demonstrated), and I fail to see how that brings us closer to AGI. For AGI we need "world models" as in "belief systems". The AI model must be able to reason about (learned) dynamics, which I don't see reflected in the text or video.

spullara - 7 hours ago

has the person who designed the movement control ever played a video game?

lysace - 5 hours ago

Prediction: In true Google fashion they will never spend enough time on this really cool tech demo to make it really useful in any way.

In 6-12 months they will announce another really cool tech demo. And so on.

They have been doing this for decades. To us this seems like the starting point of something really cool. To them it's a delivery; finally time to move on to something else.

adventured - 11 hours ago

This is as good of a place to mark it as any.

Humanity goes into the box and it never comes back out. It's better in there than it is out there for 99% of the population.

kittikitti - 6 hours ago

This is really great and to me, it feels like another ChatGPT moment. Thank you Google! This product can easily leap them over the competition. I had originally dismissed Yann LeCun's take on world models and I now feel foolish.

cloudflare728 - 12 hours ago

We will probably see Ready Player One in a few decades. Hoping to stay alive till then.

throwaway314155 - 7 hours ago

The "How we're building responsibly" section has nothing to do with acting responsibly. It should be called "Limitations" instead. Section reads LLM generated honestly.

almosthere - 8 hours ago

So what is it doing in the real world, microwaving an elephant on high with 80 kW every second and pouring out all the water in a sub-Saharan African well every 4 minutes?

anxtyinmgmt - 12 hours ago

Demis stays cooking

moohaad - 12 hours ago

everyone will make his own game now

cyrusradfar - 7 hours ago

Now let's cross this with the game of life with a lot more processing and see what happens.

gambiting - 10 hours ago

>>How we’re building responsibly

How are you justifying the enormous energy cost this toy is using, exactly?

I don't find anything "responsible" about this. And it doesn't even seem like something that has any actual use - it's literally just a toy.

JaiRathore - 12 hours ago

I now believe we live in a simulation

mupuff1234 - 10 hours ago

If only Google had the technology for game streaming... Oh wait

RIP Stadia.

analog8374 - 11 hours ago

If creating an infinite world is so trivially easy (relatively speaking) then occam suggests that this world is generated.