DiPS: Dialogue Policy Selection for High-Stakes Persuasion Agents
Quick Answer
This paper shows that the DiPS framework uses Q-learning to dynamically select tailored persuasion strategies in high-stakes scenarios, achieving higher evacuation success than zero-shot LLMs and generic RAG-augmented methods.
Quick Take
DiPS uses Q-learning to select a persuasion strategy at each dialogue turn, and it achieves higher evacuation success than a zero-shot LLM and a generic RAG-augmented approach. It was evaluated in a fire-rescue scenario, in both simulated and real human interactions, where it adapts its strategy to each resident's recent utterances.
Key Points
- DiPS employs Q-learning for dynamic persuasion strategy selection.
- Evaluated in a fire-rescue scenario, in both simulated and real human interactions.
- Outperforms zero-shot LLMs and generic RAG-augmented approaches.
- A critic, trained to maximize the chance of evacuation success, selects the strategy at each turn.
- Adapts to individual resident responses for better outcomes.
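For readers unfamiliar with Q-learning, the textbook update rule is shown below. This is the standard form, not necessarily the exact objective used in the paper, which this summary does not give. Reading it in DiPS terms is an assumption: $s_t$ would be the dialogue context at turn $t$, $a_t$ the chosen persuasion strategy, and $r_{t+1}$ a reward tied to evacuation success. The learning rate $\alpha$ and discount factor $\gamma$ are generic hyperparameters.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

At each turn the agent then picks the strategy with the highest estimated value, $a_t = \arg\max_a Q(s_t, a)$.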
Article Excerpt
From source RSS / original summary
arXiv:2607.01557v1 Announce Type: new
Abstract: Large Language Models (LLMs) often struggle with persuasion in high-stakes scenarios. People's individual personalities and concerns require tailored strategies rather than a one-size-fits-all approach.
To address this challenge, we focus on a high-stakes persuasion domain, a fire-rescue scenario in which an operator must persuade a resident to evacuate, and propose Dialogue Policy Selection (DiPS), a Q-learning framework that dynamically selects persuasion strategies adapted to the evolving conversational context. Specifically, we train a critic to maximize the chance of evacuation success and use it to select a persuasion policy at each turn based on the resident's recent utterances.
We then evaluate DiPS against multiple baselines in both simulated and real human interactions. We find that DiPS achieves higher evacuation success than a zero-shot LLM and a generic RAG-augmented approach.
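The turn-level selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps in a tabular Q-function where the paper's critic is presumably a learned model, and the strategy names, the `featurize` keyword rules, and the reward convention are all hypothetical placeholders.

```python
import random
from collections import defaultdict

# Hypothetical strategy names; the excerpt does not list the paper's policies.
STRATEGIES = ["empathize", "give_facts", "urge_action"]


def featurize(recent_utterances):
    """Map the resident's latest utterance to a coarse state label (toy assumption)."""
    last = recent_utterances[-1].lower() if recent_utterances else ""
    if "pet" in last or "dog" in last:
        return "worried_about_pet"
    if "fine" in last or "safe" in last:
        return "downplays_risk"
    return "other"


class TabularCritic:
    """Tabular stand-in for a critic that scores (state, strategy) pairs."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, strategy) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def select(self, state, explore=True):
        """Epsilon-greedy choice of a persuasion strategy for this turn."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        """Standard one-step Q-learning update."""
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in STRATEGIES
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage: reward 1.0 stands for a successful evacuation (assumed convention).
critic = TabularCritic(epsilon=0.0)
state = featurize(["I'm fine, the fire is far away"])
critic.update(state, "give_facts", reward=1.0, next_state="other", done=True)
print(critic.select(state, explore=False))
```

After a single rewarded episode, the greedy choice for a resident who downplays the risk becomes the strategy that was rewarded. A real system would need a much richer state representation than keyword matching.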