Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

arXiv cs.CL·Shanghao Li, Jinda Han, Yibo Wang, Yuanjie Zhu, Zihe Song, Langzhou He, Kenan Kamel A Alghythee, Philip S. Yu

5/27/2026

·~1 min·5/27/2026·en·2

Quick Answer

This study reveals that hallucinations in large language models (LLMs) stem from systematic internal dynamics, particularly attention focusing on shortcut cues and inadequate grounding of knowledge.

Quick Take

This study reveals that hallucinations in large language models (LLMs) stem from systematic internal dynamics, particularly attention focusing on shortcut cues and inadequate grounding of knowledge. These findings suggest that hallucinations are linked to failures in semantic grounding within feed-forward layers, affecting reasoning tasks that utilize structured knowledge.

Key Points

LLMs often produce hallucinated outputs despite access to structured knowledge.
Attention mechanisms favor shortcut cues over comprehensive context distribution.
Failures in semantic grounding within feed-forward layers lead to hallucinations.
Findings apply to both single-hop graphs and multi-hop/tabular settings.
Effective hallucination detection can be achieved across various structured knowledge formats.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2605. 26362v1 Announce Type: new Abstract: In many reasoning tasks, large language models (LLMs) rely on structured external knowledge, such as graphs and tables, which is typically linearized into sequential token representations. However, even when sufficient knowledge is available, LLMs can still produce hallucinated outputs, and the underlying mechanisms behind such failures remain poorly understood.

We investigate these mechanisms and find that hallucinations arise from systematic internal dynamics rather than random noise. First, attention disproportionately concentrates toward shortcut-like structural cues rather than distributing across the full context. Second, feed-forward representations fail to ground the provided knowledge, causing the model to revert to parametric memory.

Moreover, our results indicate that hallucination is consistently associated with failures in semantic grounding within feed-forward layers, while attention allocation exhibits greater task-dependent variability. Finally, we show that these mechanistic patterns generalize beyond single-hop graphs to multi-hop and tabular settings, enabling effective hallucination detection across structured knowledge formats.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CL

See more →

arXiv cs.CL·Miguel Arana-Catania, Catherine Conisbee, Matthew Kidd

1d ago

FeaturedOriginal

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

AI Summary

The study evaluates three NLP approaches—Named Entity Recognition, Keyword Extraction, and Topic Modelling—using the Their Finest Hour Online Archive to automate keyword extraction from crowdsourced WWII collections. Findings suggest that while NLP methods show promise, no single approach is sufficient, and ethical considerations in automated keyword extraction are crucial for responsible stewardship.

#AI Coding #Inference #Open Source #Policy

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

Quick Answer

Quick Take

Key Points

Paper Resources

Article Content

Want this in your inbox every morning?

More from arXiv cs.CL

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

Quantifying Prior Dominance in Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

Quick Answer

Quick Take

Key Points

Paper Resources

Article Content

Want this in your inbox every morning?

More from arXiv cs.CL

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

Quantifying Prior Dominance in RAG Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

Quantifying Prior Dominance in Systems