PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue
Quick Answer
PRISM is a novel multi-agent framework for empathetic spoken dialogue that separates speech perception, response generation, and synthesis.
Quick Take
PRISM translates prosodic cues into language that a large language model can reason over and calls external knowledge tools on demand. The authors report consistent improvements in empathy, prosodic appropriateness, and text response quality across objective and subjective metrics.
Key Points
- PRISM decouples speech perception, response generation, and synthesis into coordinated components.
- Introduces a prosody-to-language translation mechanism for improved emotional alignment.
- Enables on-demand use of external knowledge tools for empathetic dialogue.
- Experimental results show consistent improvements in empathy and response quality.
- Code available at: https://github.com/Bxzfrm/PRISM.
Article Excerpt
arXiv:2606.12902v1. Abstract: Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration.
To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation.
Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.
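To make the prosody-to-language idea from the abstract concrete, here is a minimal Python sketch. It is not taken from the PRISM code base: the `ProsodyFeatures` fields, the thresholds, and the prompt wording are all hypothetical, chosen only to illustrate how numeric acoustic cues might be turned into text that sits alongside the transcript.

```python
from dataclasses import dataclass


@dataclass
class ProsodyFeatures:
    """Hypothetical acoustic summary of one utterance (not from the paper)."""
    mean_pitch_hz: float
    energy_db: float
    speech_rate_wps: float  # words per second


def prosody_to_language(p: ProsodyFeatures) -> str:
    """Translate numeric prosody into a short text description.

    The thresholds below are illustrative assumptions, not PRISM values.
    """
    pitch = "high-pitched" if p.mean_pitch_hz > 220 else "low-pitched"
    loudness = "loud" if p.energy_db > -20 else "quiet"
    pace = "fast" if p.speech_rate_wps > 3.0 else "slow"
    return f"The speaker sounds {pitch}, {loudness}, and {pace}."


def build_prompt(transcript: str, p: ProsodyFeatures) -> str:
    """Combine transcript and prosody description into text for a language model."""
    return (
        f"User said: {transcript}\n"
        f"Vocal delivery: {prosody_to_language(p)}\n"
        "Reply empathetically."
    )


features = ProsodyFeatures(mean_pitch_hz=180.0, energy_db=-30.0, speech_rate_wps=2.0)
print(build_prompt("I failed my exam again.", features))
```

In this sketch the perception step only produces text, so the generation step can stay a plain text-in, text-out component. That mirrors the decoupling the abstract describes, though the actual interfaces in PRISM may differ.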