Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations
Quick Take
This study finds that hallucinations in large language models (LLMs) reasoning over structured knowledge arise from systematic internal dynamics rather than random noise. Attention concentrates on shortcut-like structural cues, and feed-forward layers fail to ground the provided knowledge, so the model falls back on parametric memory. The grounding failure is the more consistent signal; attention allocation varies more from task to task.
Key Points
- LLMs often produce hallucinated outputs despite access to structured knowledge.
- Attention concentrates on shortcut-like structural cues instead of spreading across the full context.
- Hallucination is consistently associated with failures of semantic grounding in feed-forward layers, while attention allocation is more task-dependent.
- Findings apply to both single-hop graphs and multi-hop/tabular settings.
- These mechanistic patterns enable effective hallucination detection across structured knowledge formats.
Article Content
From source RSS / original summary
arXiv:2605.26362v1 Announce Type: new
Abstract: In many reasoning tasks, large language models (LLMs) rely on structured external knowledge, such as graphs and tables, which is typically linearized into sequential token representations. However, even when sufficient knowledge is available, LLMs can still produce hallucinated outputs, and the underlying mechanisms behind such failures remain poorly understood.
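To make "linearized into sequential token representations" concrete, here is a minimal sketch of how a small graph and a small table might be flattened into text before being placed in a prompt. The bracketed triple format, the pipe-separated table format, and the function names `linearize_triples` and `linearize_table` are illustrative assumptions; the abstract does not specify the serialization the authors use.

```python
from typing import Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def linearize_triples(triples: Iterable[Triple]) -> str:
    """Flatten knowledge-graph triples into a single text sequence.

    Hypothetical format: "(head, relation, tail)" items joined by spaces.
    """
    return " ".join(f"({h}, {r}, {t})" for h, r, t in triples)


def linearize_table(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Flatten a table into one line per row with pipe-separated cells.

    Hypothetical format: the header comes first, then each row in order.
    """
    lines = [" | ".join(header)]
    lines.extend(" | ".join(str(cell) for cell in row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    graph = [("Paris", "capital_of", "France"), ("France", "located_in", "Europe")]
    print(linearize_triples(graph))
    print(linearize_table(["city", "country"], [["Paris", "France"], ["Rome", "Italy"]]))
```

Once flattened this way, the graph or table structure survives only as punctuation and token order, which is the kind of structural cue the abstract goes on to discuss.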
We investigate these mechanisms and find that hallucinations arise from systematic internal dynamics rather than random noise. First, attention concentrates disproportionately on shortcut-like structural cues rather than distributing across the full context. Second, feed-forward representations fail to ground the provided knowledge, causing the model to revert to parametric memory.
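One way to quantify whether attention "concentrates" or "distributes" is to measure the entropy of the attention weights over context positions, together with the share of attention mass landing on a chosen set of cue positions. The sketch below assumes the attention weights for one query position have already been extracted as a plain list of non-negative numbers; the function names `attention_entropy` and `cue_mass` and the choice of these two metrics are illustrative assumptions, not the measurements reported by the authors.

```python
import math
from typing import Iterable, Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Normalized Shannon entropy of attention weights, in [0, 1].

    1.0 means attention is spread uniformly over the context;
    values near 0.0 mean it is concentrated on very few positions.
    """
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return abs(entropy / math.log(len(probs)))  # abs() avoids returning -0.0


def cue_mass(weights: Sequence[float], cue_positions: Iterable[int]) -> float:
    """Fraction of total attention mass falling on the given positions.

    `cue_positions` is a hypothetical index set, e.g. the positions of
    brackets or separators in a linearized graph.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    return sum(weights[i] for i in set(cue_positions)) / total


if __name__ == "__main__":
    spread = [0.25, 0.25, 0.25, 0.25]
    peaked = [0.85, 0.05, 0.05, 0.05]
    print(attention_entropy(spread), attention_entropy(peaked))
    print(cue_mass(peaked, [0]))
```

Under these assumptions, a low entropy combined with a high cue mass would correspond to the shortcut-focused pattern described above, while a high entropy would indicate attention distributed across the full context.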
Moreover, our results indicate that hallucination is consistently associated with failures in semantic grounding within feed-forward layers, while attention allocation exhibits greater task-dependent variability. Finally, we show that these mechanistic patterns generalize beyond single-hop graphs to multi-hop and tabular settings, enabling effective hallucination detection across structured knowledge formats.
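The abstract does not say how detection quality is measured. A common choice for scoring a detector that outputs a per-example hallucination score is the area under the ROC curve (AUROC), sketched below as an assumption rather than as the authors' evaluation protocol. The function name `auroc` and the example scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    `labels` uses 1 for hallucinated outputs and 0 for faithful ones.
    The result is the probability that a random hallucinated example
    receives a higher score than a random faithful one (ties count 0.5).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical detector scores: higher means "more likely hallucinated".
    scores = [0.9, 0.8, 0.2, 0.1]
    labels = [1, 1, 0, 0]
    print(auroc(scores, labels))
```

An AUROC of 0.5 corresponds to a detector no better than chance, and 1.0 to one that ranks every hallucinated example above every faithful one. The pairwise form is quadratic in the number of examples, which is acceptable for a sketch but not for large evaluation sets.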