Are we close to the holodeck? Google unveils Genie — an AI model creating playable virtual worlds from a single image

Google researchers have published a new artificial intelligence model that can take a text prompt, sketch or idea and turn it into a virtual world you can interact with and play.

Named Genie, the virtual world model was trained on gameplay and other videos found online and is currently only a research preview. The games are more 2D platformer than full VR.

While this might still be some way off from a true holodeck like the ones in Star Trek, it does give an indication that it could be possible to one day walk into a room and create a fully interactive adventure from nothing more than a few words.

What is Google Genie?

In the AI world, people talk about opening Pandora’s Box or letting the genie out of the lamp to describe being able to create content with relatively little effort. The reality is that, much like a human spends years learning a skill, AI models require extensive training.

You can’t just rub a lamp and hope a genie will come out; first you have to fill the lamp with knowledge and ability. In the case of Genie, that came from a “large dataset of publicly available Internet videos” and a lot of effort from engineers to create code and weights for the model.

Google DeepMind team lead for Genie, Tim Rocktäschel, wrote on X that the team focused on scale, using a dataset made up of more than 200,000 hours of video from 2D platformers.

It was trained without supervision on unlabelled videos. This allowed it to learn a diverse range of character motions, controls and actions, and to do so in a consistent way. As a result, "our model can convert any image into a playable 2D world," explained Rocktäschel.

What does this really mean?

There are numerous tools on the market that can take a graphic designer’s mock-up of a website or app and turn it into code.

It isn’t always the best code but it creates a functional prototype that can be used. AI tools also exist to make a website from a text prompt.

With Genie, you can give it a sketch on a piece of paper, a perfectly crafted piece of digital art or even an AI-generated depiction of a 2D world, and Genie does the rest.

I am really excited to reveal what @GoogleDeepMind's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts. pic.twitter.com/TnQ8uv81wc (February 26, 2024)

It generates the images and other assets needed to turn your sketch into a fully realized open world, then predicts the next frame based on the actions the player provides.

The creators used a tokenizer that compresses the video into discrete tokens. Those tokens are then passed to an action model, which encodes the transition between two frames as one of eight latent actions. A third model then predicts future frames.
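To make those three stages more concrete, here is a deliberately tiny Python sketch. It is not Genie's code: the real components are large neural networks, while everything here (the `CODEBOOK` values, the function names, and the idea that an action is a simple rotation of a row of tokens) is a hypothetical stand-in chosen only to show how a tokenizer, a latent action model and a frame predictor fit together. The only detail taken from the description above is that there are eight latent actions.

```python
from typing import List

# The article states that transitions are encoded as one of eight latent actions.
NUM_LATENT_ACTIONS = 8

# Hypothetical toy codebook of pixel intensities (an assumption for illustration).
CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0]


def tokenize(frame: List[float]) -> List[int]:
    """Stage 1: map each pixel value to the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - pixel))
        for pixel in frame
    ]


def predict_next(tokens: List[int], action: int) -> List[int]:
    """Stage 3 (toy dynamics): action k rotates the token row k places to the right."""
    k = action % len(tokens)
    return tokens[-k:] + tokens[:-k] if k else list(tokens)


def infer_latent_action(prev_tokens: List[int], next_tokens: List[int]) -> int:
    """Stage 2: pick the action id that best explains the observed transition."""
    def mismatches(action: int) -> int:
        predicted = predict_next(prev_tokens, action)
        return sum(a != b for a, b in zip(predicted, next_tokens))

    return min(range(NUM_LATENT_ACTIONS), key=mismatches)


if __name__ == "__main__":
    frame = [0.0, 0.1, 0.9, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    tokens = tokenize(frame)                      # [0, 0, 4, 2, 0, 0, 0, 0, 0, 0]
    observed = predict_next(tokens, 3)            # pretend this came from a video
    print(infer_latent_action(tokens, observed))  # 3
```

In this toy, no action labels are ever supplied: the action is recovered purely by comparing two consecutive frames, which is a loose analogy for how a model could learn controls from unlabelled video.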

The solution to bringing it all together was the same as the breakthrough OpenAI had with Sora — lots of data and just as much compute power.

What happens next with Genie?

Genie doesn’t have a release date and, as a research project, it’s unclear if it will ever become a real product. There is a chance that one day you’ll be able to pick up one of the best Android phones and ask Assistant to make you a game about dodging vampires, but not for a few years.

What's more important is the underlying technology and the new approaches to content generation developed during its creation, including learning from unlabelled video to produce open worlds.

Rocktäschel called out Sora on X, specifically the idea that it is a “world model”. He said that while it is impressive and visually stunning, “a world model needs ‘actions’,” adding that “Genie is an action-controllable world model, but trained fully unsupervised from videos.”

The other big breakthrough that came with Genie is a deeper understanding of real-world physics, which could be used to train robots to navigate environments more effectively or to complete tasks not covered in their training.

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?