Generating Expressive 3D Facial Animations From Audio

Researchers from NVIDIA and the independent game developer Remedy Entertainment have developed an automated, real-time deep learning technique that creates 3D facial animations from audio with low latency.
Using a TITAN Xp GPU and the cuDNN-accelerated Theano deep learning framework, the researchers trained their neural network on nearly ten minutes of high-quality audio and expression data obtained from two human actors. The actors spoke one to three pangrams (sentences designed to contain as many different phonemes as possible) in several different emotional tones, which gives good coverage of the range of expression. They also performed in-character material (a preliminary version of the script), which takes advantage of the fact that an actor’s performance of a character is often heavily biased in emotional and expressive range for various dramatic and narrative reasons.
As the researchers explain in their paper: “Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone. During inference, the latent code can be used as an intuitive control for the emotional state of the face puppet.”
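
The idea in the quote, learning a latent code jointly with the audio-to-vertex mapping, can be illustrated with a deliberately tiny sketch. It is not the researchers' network. As assumptions: a linear model stands in for the deep network, a 2-value vector stands in for the audio features, a 3-value vector stands in for the face model's vertex coordinates, and each training frame is given its own trainable latent code that gradient descent updates alongside the weights. The two training frames have identical audio but different targets, so only the latent code can account for the difference; at inference the code becomes a free control input. All names (`AUDIO`, `TARGETS`, `predict`, `train`, `mse`) are hypothetical.

```python
"""Toy sketch: jointly learning per-frame latent codes with a regression model.

Hypothetical illustration only. A linear model stands in for the deep network,
2-value vectors stand in for audio features, and 3-value vectors stand in for
3D vertex coordinates of a face mesh.
"""

# Two training frames with IDENTICAL audio features but DIFFERENT target
# vertex positions, so the audio alone cannot explain the difference.
AUDIO = [[1.0, 0.5], [1.0, 0.5]]
TARGETS = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
N, A, V = len(AUDIO), 2, 3  # frames, audio-feature size, output size
E = 1                       # latent-code size (hypothetical choice)


def predict(W, b, audio, code):
    """Linear map from [audio features; latent code] to vertex coordinates."""
    x = audio + code  # list concatenation, length A + E
    return [sum(W[k][j] * x[j] for j in range(A + E)) + b[k] for k in range(V)]


def mse(W, b, codes):
    """Mean over frames of the summed squared vertex error."""
    return sum(
        (p - t) ** 2
        for i in range(N)
        for p, t in zip(predict(W, b, AUDIO[i], codes[i]), TARGETS[i])
    ) / N


def train(steps=3000, lr=0.05):
    W = [[0.0] * (A + E) for _ in range(V)]
    b = [0.0] * V
    codes = [[0.1], [-0.1]]  # one trainable code per frame, distinct initial values
    for _ in range(steps):
        gW = [[0.0] * (A + E) for _ in range(V)]
        gb = [0.0] * V
        gcodes = [[0.0] * E for _ in range(N)]
        for i in range(N):
            x = AUDIO[i] + codes[i]
            pred = predict(W, b, AUDIO[i], codes[i])
            for k in range(V):
                r = 2.0 * (pred[k] - TARGETS[i][k]) / N  # d(loss)/d(pred[k])
                gb[k] += r
                for j in range(A + E):
                    gW[k][j] += r * x[j]
                for j in range(E):
                    gcodes[i][j] += r * W[k][A + j]  # backprop into the code
        # One simultaneous gradient step on weights, biases, AND latent codes.
        for k in range(V):
            b[k] -= lr * gb[k]
            for j in range(A + E):
                W[k][j] -= lr * gW[k][j]
        for i in range(N):
            for j in range(E):
                codes[i][j] -= lr * gcodes[i][j]
    return W, b, codes


if __name__ == "__main__":
    W, b, codes = train()
    print("training loss:", mse(W, b, codes))
    # Inference: the same audio frame with a user-chosen code. The midpoint
    # of the two learned codes gives an in-between expression.
    mid = [(codes[0][0] + codes[1][0]) / 2]
    print("blended face:", predict(W, b, AUDIO[0], mid))
```

Because this toy model is linear, feeding it the midpoint of the two learned codes yields a face halfway between the two training expressions, a crude analogue of using the latent code as a control for the face's emotional state.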

The study’s primary goal was to model the speaking style of a single actor, but the technique yields reasonable results even for speakers of a different gender or with different accents or languages. Potential applications include in-game dialogue, low-cost localization, virtual reality avatars, and telepresence.
The team from NVIDIA Research and Remedy Entertainment will present their paper next week at SIGGRAPH in Los Angeles.