The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
Quick Answer
The study identifies a 'structural attention tax' in retrieval-augmented generation systems, where knowledge graph triples capture 2-3x more attention than equivalent natural text, compressing demonstration attention by up to 42%.
Quick Take
The study identifies a 'structural attention tax' in retrieval-augmented generation (RAG) systems, where knowledge graph triples capture 2-3x more attention per token than equivalent natural text, compressing demonstration attention by up to 42%. The effect is independent of content relevance, so the authors treat retrieval quality and format-driven attention capture as two separate axes for improvement. In experiments on Mistral-7B and LLaMA-3-8B, source-task alignment dominates: task-matched BM25 retrieval beats ConceptNet by more than 30 percentage points on HotpotQA.
Key Points
- Knowledge graph triples capture 2-3x more attention than natural language text.
- Demonstration attention can be compressed by up to 42% due to structural factors.
- Task-matched BM25 retrieval outperforms ConceptNet by over 30 percentage points on HotpotQA (58-62% vs. 25-27%).
- Five structure-aware mitigation strategies were derived, ranging from zero-cost prompt modifications to training-time regularisation.
- Format flattening (S3) was validated by both accuracy and attention-level evidence from a verbalized-triple control.
Article Content
From source RSS / original summary: arXiv:2606.11198v1. Announce Type: new. Abstract: Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content -- distinct from its semantic relevance -- can independently distort the model's attention distribution.
We identify and formalise a phenomenon we term the structural attention tax: knowledge graph (KG) triples, due to their relational delimiters and repeated slot patterns, capture 2-3x more attention per token than semantically equivalent natural-language text ($\hat{o}(\mathrm{KG}) \approx 0.70$ vs. $\hat{o}(\mathrm{neutral}) \approx 0.25$), compressing demonstration attention by up to 42% -- regardless of whether the triples are relevant or noise.
We develop a formal framework decomposing attention scores into semantic and structural components (Eq. 2), derive a compression bound (Proposition 1) connecting token-level format bias to demonstration attention loss, and show that the structural term governs how much attention is diverted while the semantic term governs whether this helps or hurts.
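To make the compression intuition concrete, here is a toy calculation. It is not the paper's Eq. 2 or Proposition 1, which this summary does not reproduce. It assumes attention mass is shared in proportion to token count times a per-token weight, gives demonstration tokens an assumed weight of 1.0, and uses illustrative injected-token weights of 0.70 and 0.25 chosen only to mirror the ratio of the reported values. The function names and token counts are hypothetical.

```python
def demo_attention_share(n_demo, n_injected, w_injected, w_demo=1.0):
    """Fraction of attention mass on demonstration tokens in a toy
    proportional model (an assumption, not the paper's formulation)."""
    demo_mass = n_demo * w_demo
    injected_mass = n_injected * w_injected
    return demo_mass / (demo_mass + injected_mass)


def compression(n_demo, n_injected, w_structured, w_neutral):
    """Relative loss of demonstration attention when the injected content
    is structured rather than neutral text of the same length."""
    base = demo_attention_share(n_demo, n_injected, w_neutral)
    taxed = demo_attention_share(n_demo, n_injected, w_structured)
    return 1.0 - taxed / base


if __name__ == "__main__":
    # Hypothetical prompt: 200 demonstration tokens, 100 injected tokens.
    loss = compression(n_demo=200, n_injected=100,
                       w_structured=0.70, w_neutral=0.25)
    print(f"demonstration attention lost to format: {loss:.1%}")
```

Nothing in this calculation depends on whether the injected tokens are relevant: only the per-token weight and the token counts enter, which is the sense in which the structural term sets how much attention is diverted.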
This decoupling reveals two orthogonal axes for improving retrieval-augmented ICL: optimising retrieval quality (semantic axis) and reducing format-driven attention capture (structural axis). Empirically, across two model families (Mistral-7B, LLaMA-3-8B) and three QA benchmarks, we observe that source-task alignment dominates: task-matched BM25 retrieval achieves 58-62% on HotpotQA vs. ConceptNet's 25-27%, a >30 pp gap that dwarfs all gating strategies ($\leq$2 pp).
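For readers unfamiliar with BM25, the following is a minimal sketch of Okapi BM25 ranking over a small in-memory corpus. It is a generic illustration, not the authors' retrieval pipeline: the whitespace tokeniser, the parameter defaults `k1=1.5` and `b=0.75`, and the function names are assumptions, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.
    Tokenisation is plain lowercased whitespace splitting (an assumption)."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(t) for t in tokenised) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation penalises longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(query, docs, k=2):
    """Return the k highest-scoring documents, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]
```

In a task-matched setting, `docs` would be drawn from the same task distribution as the question, for example passages from the benchmark's own training pool, rather than from a general-purpose knowledge graph.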
We derive five structure-aware mitigation strategies from the framework, ranging from zero-cost prompt modifications to training-time regularisation; format flattening (S3) is validated by both accuracy and attention-level evidence from a verbalized-triple control, while structural dispersal (S1) yields mixed results that illuminate the challenges of format-level intervention.
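Format flattening can be pictured as rewriting each triple into an ordinary sentence before it enters the prompt, so the relational delimiters and repeated slot patterns disappear. The sketch below is one plausible way to do this, not the paper's S3 implementation: the templates, the fallback sentence, and the function names are hypothetical, and the relation labels are used only as examples.

```python
# Hypothetical sentence templates keyed by relation label.
TEMPLATES = {
    "IsA": "{head} is a kind of {tail}.",
    "UsedFor": "{head} is used for {tail}.",
    "AtLocation": "{head} is found at {tail}.",
}
# Generic wording for relations without a template (an assumption).
FALLBACK = "{head} is related to {tail}."


def verbalize(triple):
    """Turn a (head, relation, tail) triple into a plain sentence."""
    head, relation, tail = triple
    template = TEMPLATES.get(relation, FALLBACK)
    sentence = template.format(
        head=head.replace("_", " "),
        tail=tail.replace("_", " "),
    )
    # Capitalise only the first character, leaving the rest untouched.
    return sentence[0].upper() + sentence[1:]


def flatten(triples):
    """Join verbalized triples into one passage of natural-language text."""
    return " ".join(verbalize(t) for t in triples)


if __name__ == "__main__":
    retrieved = [("knife", "UsedFor", "cutting_food"), ("dog", "IsA", "animal")]
    print(flatten(retrieved))
```

The semantic content of each triple is preserved, so any change in attention between the raw and flattened versions can be attributed to format rather than meaning, which is the role the verbalized-triple control plays in the abstract.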