PhonemeNet: A Transformer Pipeline for Text-Driven Facial Animation

Dec 3, 2025
P. Witzig, B. Solenthaler, M. Gross, Dr. Rafael Wampfler
Abstract
We present PhonemeNet, a transformer-based pipeline for text-driven 3D facial animation that leverages the phoneme-level structure of speech. Rather than operating on raw waveforms or frame-level acoustic features, PhonemeNet encodes text through explicit phoneme sequences — the fundamental units of speech that determine lip shape — and maps them to 3D blendshape animations synchronized with speech audio. This problem-specific inductive bias yields both improved accuracy in lip synchronization and computational efficiency, enabling real-time performance on standard hardware. PhonemeNet is validated on standard benchmarks and deployed within the Digital Einstein interactive character system.
Publication
In Proceedings of the 18th ACM SIGGRAPH Conference on Motion, Interaction, and Games (MIG '25), Zurich, Switzerland

Best Paper Honorable Mention — 18th ACM SIGGRAPH Conference on Motion, Interaction, and Games (MIG 2025)

Authors
Dr. Rafael Wampfler
Senior Researcher & Lecturer

I am a Senior Researcher & Lecturer at the Computer Graphics Laboratory (CGL) of ETH Zurich and a Research Consultant at Disney Research. I lead the Digital Character AI projects at CGL. My research interests include conversational digital characters, affective computing, human-computer interaction, and applied machine learning.

My vision is to create intelligent digital humans that can naturally communicate, understand, and support people across domains such as education and mental health. My research focuses on multimodal artificial intelligence for interactive digital humans, developing models that combine large language models, affective computing, and data-driven animation to create embodied conversational agents endowed with autonomous agency, consistent values, and beliefs.

My work bridges machine learning, human–computer interaction, and computer graphics to enable AI systems such as Digital Einstein and interactive patient avatars for psychotherapy training and health education.