Multimodal Hidden Markov Models for Persistent Emotional State Tracking
Quick Take
Proposes a lightweight framework for tracking emotional states in conversations using multimodal data.
Key Points
- Models emotional dynamics with sticky factorial HDP-HMMs.
- Evaluates regime prediction quality with LLM-as-a-Judge, geometric, and temporal consistency metrics.
- Enhances LLM responses in clinical contexts through emotional phase recovery.
Abstract: Tracking a conversation's interpretable emotional arc, derived from the sentiment of individual utterances processed as a whole, is central to both understanding and guiding communication in applied, especially clinical, conversational contexts. Existing approaches to emotion recognition operate at the utterance level, obscuring the persistent phases that characterize real conversational dynamics. We propose a lightweight framework that models conversational emotion as a sequence of latent emotional regimes, using sticky factorial HDP-HMMs over multimodal valence-arousal representations derived from simultaneous video, audio, and textual input. We evaluate regime prediction quality using LLM-as-a-Judge, geometric, and temporal consistency metrics, demonstrating that the sticky HDP-HMM produces more interpretable regime sequences than a baseline Gaussian HMM at a fraction of the computational cost of LLM-based dialogue state tracking methods. In addition, question-answer experiments on a clinical dataset suggest that meaningful emotional phases can be reliably recovered from multimodal valence-arousal trajectories and used, via context augmentation, to improve the quality of LLM responses in unstable affective regimes. This framework thus opens a path toward interpretable, lightweight, and actionable analysis of conversational emotion dynamics at scale.
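The key modeling idea, persistent regimes encouraged by a self-transition ("sticky") bias, can be illustrated with a much simpler stand-in: a fixed-parameter Gaussian-emission HMM decoded with Viterbi over a synthetic 2-D valence-arousal trajectory. This is a minimal sketch, not the paper's sticky factorial HDP-HMM; the regime means, noise level, and `kappa` bias below are illustrative assumptions.

```python
import numpy as np

def sticky_transitions(K, kappa, alpha=1.0):
    # Row-normalized transition matrix with extra mass kappa on the diagonal,
    # mimicking the expected effect of a sticky self-transition prior.
    A = np.full((K, K), alpha)
    A[np.diag_indices(K)] += kappa
    return A / A.sum(axis=1, keepdims=True)

def viterbi(obs, means, A, var=0.25):
    # Most likely regime sequence under isotropic Gaussian emissions.
    T, K = len(obs), len(means)
    logA = np.log(A)
    ll = -0.5 * ((obs[:, None, :] - means[None]) ** 2).sum(-1) / var
    delta = np.zeros((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = ll[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: from i to j
        psi[t] = scores.argmax(0)
        delta[t] = scores.max(0) + ll[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

# Synthetic trajectory: three persistent emotional phases in valence-arousal
# space (the means are made up for illustration, not taken from the paper).
rng = np.random.default_rng(0)
means = np.array([[-0.5, 0.5], [0.5, 0.5], [0.0, -0.5]])
true = np.repeat([0, 1, 2], 40)
obs = means[true] + 0.3 * rng.standard_normal((120, 2))

sticky = viterbi(obs, means, sticky_transitions(3, kappa=50.0))
loose = viterbi(obs, means, sticky_transitions(3, kappa=0.0))

def switches(s):
    return int((np.diff(s) != 0).sum())

print(switches(sticky), switches(loose))
```

With a strong diagonal bias the decoded sequence tends to stay in a regime through momentary emission noise, so the sticky decoding typically switches regimes far less often than the uniform-transition one, which is the interpretability property the paper's evaluation targets.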
Comments: 8 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.12838 [cs.AI] (or arXiv:2605.12838v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.12838
Submission history
From: Aneesh Jonelagadda
[v1] Wed, 13 May 2026 00:16:05 UTC (2,717 KB)
— Originally published at arxiv.org