Hallucination Detection-Guided Preference Optimization for Clinical Summarization

arXiv cs.CL·Shamanth Kuthpadi Seethakantha, Dung Ngoc Thai, Vara Prasad Gudi, Simran Tiwari, Rami Matar, Avijit Mitra, Wenlong Zhao, Wael Salloum, Andrew McCallum

5/29/2026

Quick Take

The study introduces two methods that use hallucination detectors to reduce hallucinations in clinical summarization: an inference-time iterative refinement method and a preference-learning variant built on it. In Llama-3.1-8B-Instruct, the refinement method reduces hallucinations by 24% and the preference-learning variant by 48%, while preserving summary fluency and coherence. The results point to an automated approach for improving factual accuracy in healthcare applications.

Key Points

Introduces an inference-time method that uses hallucination detectors to guide iterative summary revisions.
The inference-time method reduces hallucinations by 24% in Llama-3.1-8B-Instruct.
A preference-learning variant, finetuned on detector-guided refinement trajectories, reduces hallucinations by 48%.
Both methods preserve summary fluency, coherence, and relevance according to human expert and LLM-Jury evaluations.
Demonstrates automated solutions for improving clinical summarization reliability.

Article Excerpt

From source RSS / original summary

arXiv:2605.28910v1. Abstract: Large language models (LLMs) have shown promise on summarization tasks, but they often produce hallucinations, which are unsupported or incorrect statements that limit their reliability in specialized healthcare applications. We introduce an inference-time method that leverages hallucination detectors to guide iterative summary revisions toward factual corrections.
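The detector-guided revision loop described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `refine`, `toy_detect`, `toy_revise`, and the `max_rounds` parameter are all hypothetical names, and the toy detector and reviser are simple word-overlap stand-ins for what would in practice be a trained hallucination detector and an LLM-based reviser.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   detector(source, summary) -> list of unsupported sentences
#   reviser(source, summary, flagged) -> revised summary
Detector = Callable[[str, str], List[str]]
Reviser = Callable[[str, str, List[str]], str]


def refine(source: str, draft: str, detect: Detector, revise: Reviser,
           max_rounds: int = 3) -> Tuple[str, List[Tuple[str, int]]]:
    """Revise a summary until the detector flags nothing or rounds run out.

    Returns the final summary and the trajectory of
    (summary, number_of_flagged_sentences) pairs, one per version.
    """
    summary = draft
    flagged = detect(source, summary)
    trajectory = [(summary, len(flagged))]
    for _ in range(max_rounds):
        if not flagged:
            break
        summary = revise(source, summary, flagged)
        flagged = detect(source, summary)
        trajectory.append((summary, len(flagged)))
    return summary, trajectory


def _sentences(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_detect(source: str, summary: str) -> List[str]:
    """Toy stand-in: flag sentences containing a word absent from the source."""
    vocab = set(re.findall(r"[a-z0-9]+", source.lower()))
    return [s for s in _sentences(summary)
            if any(w not in vocab for w in re.findall(r"[a-z0-9]+", s.lower()))]


def toy_revise(source: str, summary: str, flagged: List[str]) -> str:
    """Toy stand-in: drop the first flagged sentence in each round."""
    kept = [s for s in _sentences(summary) if s != flagged[0]]
    return ". ".join(kept) + "." if kept else ""
```

Returning the whole trajectory, rather than only the final summary, keeps every intermediate version and its detector count available for later use.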

Building on this, we propose a preference-learning extension of this method, which converts detector-guided refinement trajectories into preference pairs for model finetuning. Extensive experiments show that our methods substantially reduce hallucinations for Llama and Gemma models in summarizing real-world clinical notes from MIMIC-IV. For example, the inference-time method reduces hallucinations by 24% and the preference-learning extension by 48% in Llama-3.1-8B-Instruct.
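One way to picture the conversion of refinement trajectories into preference pairs is sketched below. The abstract does not state the pairing rule, so this is an assumption: any later version with fewer detector-flagged statements is treated as "chosen" over an earlier version with more. The function name `trajectory_to_pairs`, the `min_gap` parameter, and the `prompt`/`chosen`/`rejected` record layout are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def trajectory_to_pairs(prompt: str,
                        trajectory: List[Tuple[str, int]],
                        min_gap: int = 1) -> List[Dict[str, str]]:
    """Turn a refinement trajectory into preference records.

    trajectory: (summary, flagged_count) per version, in revision order.
    A later version is "chosen" over an earlier one when it has at least
    `min_gap` fewer flagged statements (an assumed rule, not the paper's).
    """
    pairs = []
    for i, (worse, worse_count) in enumerate(trajectory):
        for better, better_count in trajectory[i + 1:]:
            if worse_count - better_count >= min_gap:
                pairs.append({"prompt": prompt,
                              "chosen": better,
                              "rejected": worse})
    return pairs
```

Raising `min_gap` keeps only pairs where the detector sees a larger difference, which trades the number of pairs for a clearer preference signal.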

Importantly, both methods preserve summary fluency, coherence, and relevance according to human expert and LLM-Jury evaluations. Together, these results demonstrate that detection-informed refinement and preference learning offer an automated solution for improving factual faithfulness in clinical summarization.
