PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

arXiv cs.CL·Wen Zhang, Xiaocui Yang, Zhuoyue Gao, Shi Feng, Daling Wang, Yifei Zhang

6/12/2026

Quick Answer

PRISM is a novel multi-agent framework for empathetic spoken dialogue that separates speech perception, response generation, and synthesis.

Quick Take

PRISM pairs prosodic expression with on-demand external knowledge tools. The authors report consistent improvements in empathy, prosodic appropriateness, and text response quality across objective and subjective metrics.

Key Points

PRISM decouples speech perception, response generation, and synthesis into coordinated components.
Introduces a prosody-to-language translation mechanism to stabilize large language model reasoning.
Enables on-demand use of external knowledge tools for empathetic dialogue.
Experimental results show consistent improvements in empathy and response quality.
Code available at: https://github.com/Bxzfrm/PRISM.

Article Excerpt

From source RSS / original summary

arXiv:2606.12902v1. Abstract: Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration.

To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation.

Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.
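
To make the decoupled design in the abstract more concrete, here is a minimal Python sketch of one dialogue turn: perception output is turned into a text description of prosody, a knowledge tool is called only when needed, and the reply is handed to a separate synthesis step. This is not the authors' implementation. Every name (`Perception`, `prosody_to_language`, `run_turn`), every threshold, and the stubbed model, tool, and synthesizer are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Perception:
    """Hypothetical output of a speech perception stage."""
    transcript: str
    pitch_hz: float    # mean fundamental frequency
    energy_db: float   # mean loudness
    rate_wps: float    # speaking rate in words per second


def prosody_to_language(p: Perception) -> str:
    """Describe numeric prosody in words so a text-only LLM can read it.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    pitch = "high" if p.pitch_hz > 220 else "low" if p.pitch_hz < 140 else "mid"
    energy = "loud" if p.energy_db > -15 else "quiet" if p.energy_db < -30 else "moderate"
    rate = "fast" if p.rate_wps > 3.5 else "slow" if p.rate_wps < 2.0 else "steady"
    return f"The speaker sounds {pitch}-pitched, {energy}, and {rate}-paced."


def run_turn(
    p: Perception,
    generate: Callable[[str], str],            # stand-in for an LLM call
    synthesize: Callable[[str], bytes],        # stand-in for a TTS call
    tools: Dict[str, Callable[[str], str]],    # hypothetical knowledge tools
    pick_tool: Callable[[str], Optional[str]],
) -> dict:
    """Run perception -> generation -> synthesis as separate, swappable steps."""
    lines = [f"User said: {p.transcript}", f"Prosody: {prosody_to_language(p)}"]

    # On-demand tool use: a tool is called only if pick_tool names one.
    tool_name = pick_tool(p.transcript)
    if tool_name is not None:
        lines.append(f"Knowledge: {tools[tool_name](p.transcript)}")

    prompt = "\n".join(lines)
    reply = generate(prompt)
    return {"prompt": prompt, "reply": reply, "tool": tool_name, "audio": synthesize(reply)}


# Usage with stubs in place of real models and tools.
turn = run_turn(
    Perception("I failed my exam again.", pitch_hz=120.0, energy_db=-35.0, rate_wps=1.5),
    generate=lambda prompt: "That sounds really discouraging. Do you want to talk it through?",
    synthesize=lambda text: text.encode("utf-8"),
    tools={"coping_tips": lambda query: "Short breaks and a study plan can help."},
    pick_tool=lambda text: "coping_tips" if "exam" in text else None,
)
print(turn["prompt"])
```

Because each stage is passed in as a plain callable, any one of them can be replaced or inspected on its own, which is the kind of interpretable control the abstract says end-to-end speech models lack.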

