Google’s new VLOGGER AI lets you create a lifelike avatar from just a photo — and control it with your voice

Google VLOGGER video
(Image credit: Google)

Google's researchers have been working overtime recently, publishing a flurry of new models and ideas. The latest, coming hot on the heels of a game-playing AI agent, is a way to take a still image and turn it into a controllable avatar.

VLOGGER isn't currently available to try, but the demo suggests it could let you make an avatar and control it using your voice, and the results look surprisingly realistic.

You can already do similar things to some extent with tools like Pika Labs' lip sync, HeyGen's video translation service and Synthesia, but this seems to be a simpler, lower-bandwidth option.

What is VLOGGER?

Currently, VLOGGER is nothing more than a research project with a couple of fun demo videos, but if it is ever turned into a product, it could become a new way to communicate in Teams or Slack.

It's an AI model able to create an animated avatar from a still image and maintain the photorealistic look of the person in the photo in every frame of the final video.

The model also takes in an audio file of the person speaking and generates body and lip movements that reflect how that person might naturally move if they were saying the words.

This includes head motion, facial expressions, eye gaze, blinking, hand gestures and upper-body movement, all generated without any reference beyond the image and audio.

How does VLOGGER work?

Google Vlogger AI video

(Image credit: Google)

The model is built on the diffusion architecture that powers text-to-image and video generators such as Midjourney and Runway, as well as some 3D models, but it adds extra control mechanisms.

VLOGGER goes through multiple steps to generate the avatar. First, it takes the audio and image as input and runs them through a 3D motion generation process. Next, a "temporal diffusion" model determines timings and movement. Finally, the result is upscaled and turned into the final output.
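To make the three stages above more concrete, here is a minimal Python sketch of how such a staged pipeline could be wired together. VLOGGER's code and interfaces are not public, so every name here (`MotionParams`, `audio_to_motion`, `render_frames`, `upscale`, `vlogger_like_pipeline`) and the default frame rate, sample rate and upscale factor are hypothetical assumptions. Each stage is a trivial stand-in rather than a neural network; only the data flow (audio plus image, to per-frame motion, to frames, to upscaled frames) mirrors the description.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # a tiny grayscale image stored as rows of pixel values


@dataclass
class MotionParams:
    """Hypothetical per-frame controls a motion stage might predict (illustrative fields only)."""
    head_pose: float
    expression: float
    gaze: float


def audio_to_motion(audio: List[float], fps: int, sample_rate: int) -> List[MotionParams]:
    """Stage 1 stand-in: one set of motion controls per video frame.

    Uses the average loudness of each frame's audio window; the real model
    uses a learned generative network instead.
    """
    samples_per_frame = sample_rate // fps
    motion = []
    for start in range(0, len(audio), samples_per_frame):
        window = audio[start:start + samples_per_frame]
        loudness = sum(abs(s) for s in window) / len(window)
        motion.append(MotionParams(head_pose=0.1 * loudness, expression=loudness, gaze=0.0))
    return motion


def render_frames(image: Frame, motion: List[MotionParams]) -> List[Frame]:
    """Stage 2 stand-in: one frame per motion step, anchored to the still image.

    A temporal diffusion model would generate frames consistently over time;
    this stub simply brightens the reference image by each step's 'expression' value.
    """
    return [[[px + m.expression for px in row] for row in image] for m in motion]


def upscale(frame: Frame, factor: int) -> Frame:
    """Stage 3 stand-in: nearest-neighbour upscaling in place of a learned super-resolution model."""
    return [[px for px in row for _ in range(factor)] for row in frame for _ in range(factor)]


def vlogger_like_pipeline(image: Frame, audio: List[float], fps: int = 25,
                          sample_rate: int = 16000, factor: int = 2) -> List[Frame]:
    """Chain the three hypothetical stages: audio -> motion -> frames -> upscaled frames."""
    motion = audio_to_motion(audio, fps, sample_rate)
    low_res = render_frames(image, motion)
    return [upscale(f, factor) for f in low_res]
```

For example, one second of 16 kHz audio at 25 fps yields 25 frames, and a 2×2 test image comes out as 4×4 frames after 2× upscaling. In a real system each stand-in would be replaced by a trained model, but keeping the stages separate like this shows why the motion, timing and resolution steps can be developed and improved independently.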

Essentially, it uses a neural network to predict motion for the face, body, pose, gaze and expressions over time, using the still image as the first frame and the audio as the guide.

Training the model required a large multimedia dataset called MENTOR. It has 800,000 videos of different people talking with each part of their face and body labelled at every moment.

What are the limitations of VLOGGER?

This is a research preview rather than an actual product, and while it can generate realistic-looking motion, the video may not always match the way the person really moves. It is still a diffusion model at its core, and such models can be prone to unusual behavior.

The team says it also struggles with particularly large motions and diverse environments, and it can only handle relatively short videos.

What are the use cases for VLOGGER?

Apple Vision Pro Persona

(Image credit: Future)

According to Google's researchers, one of the primary use cases is video translation: for example, taking an existing video in one language and editing the lip and face movements to match new, translated audio.

Other potential use cases include creating animated avatars for virtual assistants, chatbots, or virtual characters that look and move realistically in a game environment.

There are already tools that do something similar, including Synthesia, where users can go into the company's offices to create their own virtual avatar for giving presentations, but this new model seems to make the process much easier.

One potential use is low-bandwidth video communication. A future version of the model could allow video chats driven by audio alone, by animating the still-image avatar.

This could prove particularly useful in VR environments on headsets like the Meta Quest or the Apple Vision Pro, operating independently of the platforms' own avatar models.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?
