<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Natural Language Processing | Dr. Rafael Wampfler</title><link>https://rafael-wampfler.github.io/tags/natural-language-processing/</link><atom:link href="https://rafael-wampfler.github.io/tags/natural-language-processing/index.xml" rel="self" type="application/rss+xml"/><description>Natural Language Processing</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 2024 00:00:00 +0000</lastBuildDate><image><url>https://rafael-wampfler.github.io/media/icon_hu_d100f07c298b9e73.png</url><title>Natural Language Processing</title><link>https://rafael-wampfler.github.io/tags/natural-language-processing/</link></image><item><title>Dialog Act Classification</title><link>https://rafael-wampfler.github.io/projects/dialog-act/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://rafael-wampfler.github.io/projects/dialog-act/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;For a conversational agent to respond appropriately, it must understand not just &lt;em&gt;what&lt;/em&gt; a user says, but &lt;em&gt;why&lt;/em&gt; they say it: the communicative intent behind the utterance. Dialog Act (DA) classification is the task of categorizing utterances by their function in conversation (e.g., question, assertion, greeting, request, clarification). This project develops multimodal dialog act classifiers tailored to interactions with digital characters.&lt;/p&gt;
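&lt;p&gt;As a concrete illustration, the following sketch pairs a few utterances with plausible act labels. Both the example utterances and the label set are hypothetical and do not come from a specific published taxonomy:&lt;/p&gt;

```python
# Illustrative only: utterances paired with plausible dialog act labels.
# The label set here is hypothetical, not a specific published DA taxonomy.
examples = [
    ("Hi Einstein!", "greeting"),
    ("What is relativity?", "question"),
    ("Light always travels at the same speed.", "assertion"),
    ("Could you repeat that?", "request"),
    ("Do you mean special or general relativity?", "clarification"),
]

for utterance, act in examples:
    print(f"{act:14} {utterance}")
```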
&lt;h2 id="motivation"&gt;Motivation&lt;/h2&gt;
&lt;p&gt;Standard dialog act classification systems are trained on text transcriptions alone. In real-world interactions with embodied agents, however, users communicate through a rich combination of speech prosody, gaze, gesture, and lexical content. A question delivered with rising intonation carries a different meaning than the same words spoken flatly; a greeting accompanied by eye contact differs from one delivered distractedly.&lt;/p&gt;
&lt;p&gt;For digital characters that must respond naturally in real time, dialog act classification must therefore be multimodal, integrating acoustic, linguistic, and, where available, visual signals, and it must operate with low latency to support interactive response times.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;p&gt;Our multimodal dialog act classifier integrates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lexical features&lt;/strong&gt;: Encoded via transformer-based text encoders fine-tuned on dialog corpora&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Acoustic features&lt;/strong&gt;: Prosodic signals including pitch, energy, and speech rate, extracted from the raw audio signal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temporal context&lt;/strong&gt;: Conversation history modeling to resolve ambiguous acts through discourse-level context&lt;/li&gt;
&lt;/ul&gt;
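&lt;p&gt;The acoustic component above can be illustrated with a minimal sketch. The following is hypothetical code, not the project's extractor: it estimates mean pitch via autocorrelation, mean RMS energy, and a crude speech-rate proxy using plain NumPy:&lt;/p&gt;

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_len=400, hop=160):
    """Sketch of utterance-level prosody: mean pitch, mean energy, and a
    crude speech-rate proxy. Hypothetical code, not the project's extractor."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    # Energy: root-mean-square amplitude per frame.
    energy = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    # Pitch: autocorrelation peak within a plausible F0 range (50 to 400 Hz).
    lo, hi = sr // 400, sr // 50
    pitches = []
    for f in frames:
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]  # non-negative lags
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitches.append(sr / lag)
    # Speech-rate proxy: onsets of above-average energy per second,
    # a rough stand-in for syllable nuclei.
    voiced = np.greater(energy, energy.mean())
    onsets = np.sum(np.logical_and(voiced[1:], np.logical_not(voiced[:-1])))
    return {"pitch_mean": float(np.mean(pitches)),
            "energy_mean": float(np.mean(energy)),
            "rate_proxy": float(onsets) * sr / len(signal)}

# Toy usage: one second of a 220 Hz tone sampled at 16 kHz.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
feats = prosodic_features(np.sin(2 * np.pi * 220 * t))
print(feats)
```

&lt;p&gt;In practice a dedicated audio library would replace the hand-rolled autocorrelation, but the three quantities shown map directly onto the pitch, energy, and speech-rate features listed above.&lt;/p&gt;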
&lt;p&gt;The system is evaluated on naturalistic conversations with digital characters, a challenging setting because users frequently produce fragmented, spontaneous speech rather than complete, grammatical sentences. The classifier is optimized for both accuracy and latency, enabling real-time use within the Digital Einstein pipeline.&lt;/p&gt;
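&lt;p&gt;One common way to combine such signals is fusion over concatenated features, with the previous dialog act supplying discourse context. The sketch below is purely illustrative: the project's actual architecture, dimensions, and label set are not specified here, and the weights are random rather than trained:&lt;/p&gt;

```python
import numpy as np

ACTS = ["question", "assertion", "greeting", "request", "clarification"]
D_TEXT, D_PROS = 32, 3                  # illustrative feature dimensions
rng = np.random.default_rng(0)

# Untrained, randomly initialized linear layer standing in for the classifier.
W = rng.normal(size=(len(ACTS), D_TEXT + D_PROS + len(ACTS)))
b = np.zeros(len(ACTS))

def classify(text_emb, prosody, prev_act):
    """Concatenation-based fusion: text embedding, normalized prosody
    vector, and a one-hot of the previous dialog act as discourse context."""
    ctx = np.zeros(len(ACTS))
    ctx[ACTS.index(prev_act)] = 1.0
    x = np.concatenate([text_emb, prosody, ctx])
    logits = W @ x + b
    probs = np.exp(logits - logits.max())     # stable softmax
    probs = probs / probs.sum()
    return ACTS[int(np.argmax(probs))], probs

# Toy call: a random stand-in "embedding", z-scored prosody values,
# and the prior turn's act as context.
act, probs = classify(rng.normal(size=D_TEXT),
                      np.array([0.4, -0.1, 1.2]),
                      prev_act="greeting")
print(act, probs.round(3))
```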
&lt;h2 id="key-results"&gt;Key Results&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Demonstrated that multimodal integration (text + acoustic features) significantly outperforms text-only baselines for dialog act classification in digital character conversations&lt;/li&gt;
&lt;li&gt;Achieved real-time classification latency compatible with interactive agent deployment&lt;/li&gt;
&lt;li&gt;Provided insights into which dialog acts are most frequently misclassified in human-agent interaction, informing future system design&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="publication"&gt;Publication&lt;/h2&gt;
&lt;p&gt;P. Witzig, R. Constantin, N. Kovačević and &lt;strong&gt;R. Wampfler&lt;/strong&gt; (2024). &lt;em&gt;Multimodal Dialog Act Classification for Conversations With Digital Characters&lt;/em&gt;. Proceedings of the 6th International Conference on Conversational User Interfaces (CUI), Luxembourg, Luxembourg, July 08–10, 2024, pp. 1–14.&lt;/p&gt;</description></item></channel></rss>