EmoSpaceTime: Decoupling Emotion and Content through Contrastive Learning for Expressive 3D Speech Animation

Nov 21, 2024 · P. Witzig, S. Solenthaler, M. Gross, Dr. Rafael Wampfler
Abstract
We present EmoSpaceTime, a method for generating expressive 3D speech animation by explicitly decoupling emotion and semantic content through contrastive learning. Existing speech animation approaches entangle emotional style with phonetic content in their learned representations, limiting the ability to control expressive output independently of the spoken words. EmoSpaceTime learns a factorized latent space where emotion and content are disentangled, enabling fine-grained control over emotional expressivity at inference time. A contrastive training objective ensures that representations from the same emotional register are pulled together while those from different emotions are pushed apart, independent of semantic content. We demonstrate that EmoSpaceTime generates animations that are simultaneously emotionally consistent and semantically coherent, with user studies validating the quality and controllability of the expressive output.
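The abstract describes a contrastive objective that pulls together representations sharing an emotional register and pushes apart those from different emotions, regardless of spoken content. As an illustration only, the sketch below implements a generic supervised contrastive loss over emotion labels in numpy; the function name, temperature value, and batch construction are assumptions for the example, not the paper's actual formulation.

```python
import numpy as np

def emotion_contrastive_loss(embeddings, emotion_labels, temperature=0.1):
    """Illustrative supervised contrastive loss over emotion labels.

    Clips that share an emotion label are treated as positives regardless
    of semantic content; all other clips in the batch act as negatives.
    (Hypothetical sketch -- not the loss from the paper.)
    """
    # L2-normalize so pairwise similarity is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                    # scaled similarity matrix

    labels = np.asarray(emotion_labels)
    mask = labels[:, None] == labels[None, :]      # positives share a label
    np.fill_diagonal(mask, False)                  # exclude self-pairs

    # Row-wise log-softmax with the diagonal (self-similarity) excluded.
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    np.fill_diagonal(exp, 0.0)
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))

    # Average log-probability of positives, for anchors with >= 1 positive.
    pos_counts = mask.sum(axis=1)
    valid = pos_counts > 0
    loss = -(log_prob * mask).sum(axis=1)[valid] / pos_counts[valid]
    return loss.mean()
```

With a batch where same-emotion clips already cluster in embedding space, the loss is low; shuffling the emotion labels so that positives point at dissimilar clips drives it up, which is the behavior the training objective exploits.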
Type
Publication
In Proceedings of the 17th ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG ’24), Arlington, USA
Authors
Dr. Rafael Wampfler
Senior Researcher & Lecturer

I am a Senior Researcher & Lecturer at the Computer Graphics Laboratory of ETH Zurich, and a Research Consultant at Disney Research. I lead the Digital Character AI projects at CGL. My research interests include conversational digital characters, affective computing, human-computer interaction, and applied machine learning.

My vision is to create intelligent digital humans that can naturally communicate, understand, and support people across domains such as education and mental health. My research focuses on multimodal artificial intelligence for interactive digital humans, developing models that combine large language models, affective computing, and data-driven animation to create embodied conversational agents endowed with autonomous agency, consistent values, and beliefs.

My work bridges machine learning, human–computer interaction, and computer graphics to enable AI systems such as Digital Einstein and interactive patient avatars for psychotherapy training and health education.