Multimodal Dialog Act Classification for Conversations With Digital Characters

Jul 8, 2024
P. Witzig, R. Constantin, N. Kovačević, Dr. Rafael Wampfler
Abstract
We present a multimodal dialog act classification system for conversations with embodied digital characters. Dialog act classification — categorizing utterances by their communicative intent (e.g., question, statement, greeting, clarification) — is a critical component for enabling appropriate agent responses. Unlike standard dialog act classification, conversations with digital characters involve spontaneous, often fragmented speech and require real-time classification for interactive use. We develop a multimodal classifier integrating lexical features from transformer text encoders with acoustic prosodic features, evaluated on a dataset of naturalistic conversations with the Digital Einstein system. Multimodal fusion substantially outperforms text-only classification, and we achieve latency compatible with real-time interactive deployment.
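As a rough illustration of the fusion idea described in the abstract, the following minimal sketch concatenates a text embedding with a few prosodic features and scores the result with a linear softmax layer. It is not the architecture from the paper: the concatenation strategy, the `LinearClassifier` class, the feature dimensions, the random weights, and the placeholder input values are all assumptions made for illustration. Only the four dialog act labels come from the examples in the abstract. In practice the text embedding would come from a transformer encoder and the prosodic features from an audio front end.

```python
import math
import random

# Labels taken from the examples in the abstract; the real label set may differ.
DIALOG_ACTS = ["question", "statement", "greeting", "clarification"]


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_embedding, prosody_features):
    """Feature-level fusion (an assumption): concatenate both vectors."""
    return list(text_embedding) + list(prosody_features)


class LinearClassifier:
    """Hypothetical stand-in for a trained classification head."""

    def __init__(self, input_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Random weights stand in for parameters learned during training.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
            for _ in range(num_classes)
        ]
        self.biases = [0.0] * num_classes

    def predict_proba(self, features):
        scores = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        return softmax(scores)


def classify(text_embedding, prosody_features, classifier):
    """Return the most probable dialog act and the full distribution."""
    probs = classifier.predict_proba(fuse(text_embedding, prosody_features))
    best = max(range(len(probs)), key=probs.__getitem__)
    return DIALOG_ACTS[best], probs


# Placeholder inputs: a 4-dimensional "text embedding" and three prosodic
# values (for example normalized pitch, energy, and duration).
text_embedding = [0.2, -0.1, 0.4, 0.05]
prosody_features = [0.8, 0.3, 0.6]

clf = LinearClassifier(input_dim=7, num_classes=len(DIALOG_ACTS))
label, probs = classify(text_embedding, prosody_features, clf)
print(label, [round(p, 3) for p in probs])
```

Concatenation is only one of several possible fusion strategies; a text-only baseline in this sketch would simply pass an empty list of prosodic features to a classifier with a matching input dimension.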
Type
Publication
In Proceedings of the 6th International Conference on Conversational User Interfaces (CUI), Luxembourg
Authors
Dr. Rafael Wampfler
Senior Researcher & Lecturer

I am a Senior Researcher & Lecturer at the Computer Graphics Laboratory (CGL) of ETH Zurich and a Research Consultant at Disney Research. I lead the Digital Character AI projects at CGL. My research interests include conversational digital characters, affective computing, human–computer interaction, and applied machine learning.

My vision is to create intelligent digital humans that can naturally communicate, understand, and support people across domains such as education and mental health. My research focuses on multimodal artificial intelligence for interactive digital humans, developing models that combine large language models, affective computing, and data-driven animation to create embodied conversational agents endowed with autonomous agency, consistent values, and beliefs.

My work bridges machine learning, human–computer interaction, and computer graphics to enable AI systems such as Digital Einstein and interactive patient avatars for psychotherapy training and health education.