DiPS: Dialogue Policy Selection for High-Stakes Persuasion Agents

arXiv cs.CL·Tianyi Zhang, Mousumi Das, Abrar Anwar, Jesse Thomason, David Traum

7/3/2026

Quick Answer

This paper shows that the DiPS framework uses Q-learning to dynamically select tailored persuasion strategies in high-stakes scenarios, achieving higher evacuation success rates than zero-shot LLMs and generic RAG-augmented methods.

Quick Take

The DiPS framework uses Q-learning to dynamically select tailored persuasion strategies in high-stakes scenarios, achieving higher evacuation success rates than zero-shot LLMs and generic RAG-augmented methods. Evaluated in a fire-rescue setting with both simulated and real human interactions, DiPS chooses a strategy at each turn based on the resident's recent utterances.

Key Points

DiPS employs Q-learning for dynamic persuasion strategy selection.
Framework tested in fire-rescue scenarios for evacuation success.
Outperforms zero-shot LLMs and generic RAG-augmented approaches.
A critic, trained to maximize the chance of evacuation success, selects the persuasion policy at each turn.
Adapts to individual resident responses for better outcomes.
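
The per-turn selection idea in the points above can be illustrated with a small tabular Q-learning sketch. This is not the authors' implementation: the paper trains a critic on dialogue context, while the toy below swaps in a lookup table. The strategy names, the concern labels, the simulated resident, and the +1/-1 reward are all assumptions made up for illustration, and every episode is a single turn.

```python
import random
from collections import defaultdict

# Hypothetical strategy and concern labels, invented for this sketch.
STRATEGIES = ["empathize", "give_facts", "urge_action"]
WINNING = {"worried_about_pet": "empathize", "doubts_danger": "give_facts"}


def simulate_turn(state, strategy):
    """Toy resident: +1 if the strategy fits the concern (evacuates), else -1."""
    return 1.0 if WINNING[state] == strategy else -1.0


class StrategySelector:
    """Tabular Q-learning over (dialogue state, strategy) pairs."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # Q[(state, strategy)] -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount for multi-turn dialogues
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def best(self, state):
        """Greedy choice: the strategy with the highest Q-value in this state."""
        return max(STRATEGIES, key=lambda a: self.q[(state, a)])

    def select(self, state):
        """Epsilon-greedy choice used while learning."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        return self.best(state)

    def update(self, state, action, reward, next_state=None):
        """Standard Q-learning update; next_state=None marks a terminal turn."""
        future = 0.0
        if next_state is not None:
            future = max(self.q[(next_state, a)] for a in STRATEGIES)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def train(episodes=200):
    selector = StrategySelector()
    states = list(WINNING)
    for episode in range(episodes):
        state = states[episode % len(states)]  # alternate between concerns
        strategy = selector.select(state)
        reward = simulate_turn(state, strategy)
        selector.update(state, strategy, reward)  # one-turn episode, so terminal
    return selector


selector = train()
for state in WINNING:
    print(state, "->", selector.best(state))
```

In this toy setup the table ends up preferring a different strategy for each concern, which is the sense in which a learned selector can adapt where a single fixed prompt cannot. A realistic version would replace the hand-written state labels with a representation of the resident's recent utterances.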

Article Excerpt

From source RSS / original summary

arXiv:2607.01557v1 Announce Type: new Abstract: Large Language Models (LLMs) often struggle with persuasion in high-stakes scenarios. People's individual personalities and concerns require tailored strategies rather than a one-size-fits-all approach.

To address this challenge, we focus on a fire-rescue scenario in which an operator must persuade a resident to evacuate as a high-stakes persuasion domain and propose Dialogue Policy Selection (DiPS), a Q-learning framework to dynamically select persuasion strategies adapted to the evolving conversational context. Specifically, we train a critic, trained to maximize the chance of evacuation success, to select a persuasion policy at each turn based on the resident's recent utterances.

We then evaluate DiPS against multiple baselines in both simulated and real human interactions. We find that DiPS achieves higher evacuation success than a zero-shot LLM and a generic RAG-augmented approach.
