Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating
Quick Take
A new training-free method reduces hallucinations in large vision-language models by selectively suppressing the text route in prior-dominant attention heads while leaving the visual route intact.
Key Points
- Hallucinations limit LVLM reliability in real-world applications.
- Proposed method uses causal route gating to mitigate hallucinations.
- Largely preserves overall multimodal performance at a modest inference-time overhead.
Article Excerpt
Source: arXiv:2605.24024v1 (Announce Type: new). Abstract: Large vision-language models (LVLMs) often hallucinate content that is fluent yet unsupported by the image, limiting their reliability in real-world deployment. We show that a key failure mode arises from route competition: even when visual tokens receive attention, the final token decision can be dominated by the textual pathway, causing the decoder to follow linguistic priors over visual evidence.
To mitigate this, we propose a training-free, decision-aligned intervention that decomposes each attention head into a visual route and a text route, and estimates their token-level effects using an efficient one-forward/one-gradient approximation. These estimates reveal route conflict within heads and identify prior-dominant ones, enabling selective suppression of only the text route while keeping the visual route intact.
Across five benchmarks spanning discriminative and generative settings, our method consistently reduces hallucination-related errors across models with limited impact on overall multimodal performance, while incurring a modest inference-time overhead.
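The abstract gives no implementation details, so the following is only a minimal sketch of the general idea for a single attention head, under stated assumptions. It assumes the head output is the attention-weighted sum of value vectors, so it splits exactly into a part from visual tokens and a part from text tokens, and that each route's effect on the decision can be approximated to first order by the dot product of that part with the gradient of the decision logit with respect to the head output. The function names, the `text_scale` parameter, and the rule used to flag a "prior-dominant" head are all hypothetical and are not taken from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_outputs(attn, values, is_visual):
    """Split one head's output sum_j attn[j] * values[j] into visual and text parts."""
    dim = len(values[0])
    visual = [0.0] * dim
    text = [0.0] * dim
    for a, v, vis in zip(attn, values, is_visual):
        target = visual if vis else text
        for k in range(dim):
            target[k] += a * v[k]
    return visual, text


def gate_head(attn, values, is_visual, grad, text_scale=0.0):
    """Return the (possibly gated) head output plus the two route-effect estimates.

    grad is the gradient of the decision logit w.r.t. this head's output, so
    dot(grad, route) is a first-order estimate of that route's effect.
    """
    visual, text = route_outputs(attn, values, is_visual)
    e_vis = dot(grad, visual)
    e_txt = dot(grad, text)
    # Hypothetical rule (an assumption, not the paper's criterion): flag the head
    # when the text route has the larger effect and the two routes disagree in sign.
    prior_dominant = abs(e_txt) > abs(e_vis) and e_txt * e_vis < 0
    scale = text_scale if prior_dominant else 1.0
    # Only the text route is scaled; the visual route is always kept intact.
    out = [v + scale * t for v, t in zip(visual, text)]
    return out, e_vis, e_txt, prior_dominant


# Toy head: two visual tokens followed by two text tokens, uniform attention.
attn = [0.25, 0.25, 0.25, 0.25]
is_visual = [True, True, False, False]
grad = [1.0, 1.0]

# Text pushes the decision logit up (+2.0) while the visual route opposes it (-0.5).
conflict_values = [[-1.0, 0.0], [-1.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
gated = gate_head(attn, conflict_values, is_visual, grad)

# The visual route dominates here, so the head is left unchanged.
benign_values = [[4.0, 0.0], [4.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
kept = gate_head(attn, benign_values, is_visual, grad)
```

In the first toy case the text route is removed and only the visual contribution remains; in the second the head output is returned unchanged. In a real model the gradient would come from one backward pass through the decoder, and the gating would be applied per head and per generated token.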