Hallucinations as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Contextual Orthogonalization
Quick Take
The study introduces Dynamic Contextual Orthogonalization (DCO), an inference-time intervention for mitigating hallucinations in LLMs, evaluated on Llama-3-8B and 70B. DCO attenuates outlier components of attention head outputs that are orthogonal to the context, improving contextual faithfulness over intervention baselines on benchmarks such as XSum, NQ-Swap, and IFEval while maintaining performance on knowledge-intensive tasks such as TriviaQA and TruthfulQA.
Key Points
- DCO uses dynamic context anchors for orthogonal decomposition of attention outputs.
- Achieves higher contextual faithfulness than state-of-the-art intervention baselines on XSum, NQ-Swap, and IFEval.
- Maintains high performance on knowledge-intensive tasks such as TriviaQA and TruthfulQA.
- Employs layer-wise Z-score suppression to filter out divergent noise.
- Validates the geometric interpretation of hallucinations in LLMs.
Article Content
From source RSS / original summary: arXiv:2606.03022v1
Abstract: Hallucination in Large Language Models (LLMs), characterized by the generation of content inconsistent with contextual facts or logical constraints, remains a persistent challenge for reliable deployment. In this work, we address this issue through a geometric framework rooted in the linear representation hypothesis. We propose that hallucinations manifest as orthogonal noise relative to the semantic manifold of the residual stream.
Specifically, we hypothesize that while attention heads ideally propagate information congruent with the context subspace, hallucinations arise when specific heads introduce components orthogonal to this subspace, disrupting the coherence of the latent representation. Based on this formulation, we introduce Dynamic Contextual Orthogonalization (DCO), an inference-time intervention method.
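The hypothesis above can be written as a standard orthogonal decomposition. This is only a sketch: the symbols $h$ (one attention head's output), $\mathcal{C}$ (the context subspace), and $P_{\mathcal{C}}$ (the orthogonal projector onto it) are notation assumed here for illustration, since the abstract does not define any.

```latex
\[
  h \;=\; \underbrace{P_{\mathcal{C}}\, h}_{\text{context-aligned}}
    \;+\; \underbrace{(I - P_{\mathcal{C}})\, h}_{\text{orthogonal}},
  \qquad
  \lVert h \rVert^{2}
    \;=\; \lVert P_{\mathcal{C}}\, h \rVert^{2}
    \;+\; \lVert (I - P_{\mathcal{C}})\, h \rVert^{2}.
\]
```

Under this reading, a head that stays congruent with the context has a small second term, and the hypothesized hallucination signal is an unusually large one.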
DCO utilizes the input residual stream as a dynamic context anchor to perform orthogonal decomposition on attention head outputs. To distinguish between context-aligned semantic updates and divergent noise, DCO employs a layer-wise Z-score suppression mechanism that selectively attenuates outlier orthogonal components based on statistical distributions.
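The two steps described above (orthogonal decomposition against a context anchor, then layer-wise Z-score suppression) might be sketched as follows. This is not the authors' implementation. It assumes the anchor is a single residual-stream vector, that Z-scores are taken over the orthogonal-component norms of the heads within one layer, and that outliers are rescaled by a fixed factor. The names `decompose`, `suppress_layer`, `z_threshold`, and `scale` are hypothetical and do not come from the abstract or the linked repository.

```python
import math
from statistics import mean, pstdev


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(head_out, anchor):
    """Split head_out into parts parallel and orthogonal to anchor.

    Assumption: the context anchor is one vector (a rank-1 subspace).
    """
    denom = dot(anchor, anchor)
    if denom == 0.0:
        # No usable anchor: treat the whole output as orthogonal.
        return [0.0] * len(head_out), list(head_out)
    coeff = dot(head_out, anchor) / denom
    parallel = [coeff * a for a in anchor]
    orthogonal = [h - p for h, p in zip(head_out, parallel)]
    return parallel, orthogonal


def suppress_layer(head_outputs, anchor, z_threshold=1.5, scale=0.0):
    """Attenuate orthogonal components whose norm is a layer-wise outlier.

    z_threshold and scale are illustrative values, not published settings.
    """
    parts = [decompose(h, anchor) for h in head_outputs]
    norms = [math.sqrt(dot(orth, orth)) for _, orth in parts]
    mu, sigma = mean(norms), pstdev(norms)
    adjusted = []
    for (par, orth), norm in zip(parts, norms):
        z = (norm - mu) / sigma if sigma > 0 else 0.0
        # Only heads with unusually large orthogonal norm are rescaled.
        factor = scale if z > z_threshold else 1.0
        adjusted.append([p + factor * o for p, o in zip(par, orth)])
    return adjusted


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    heads = [[1.0, 0.1], [0.5, 0.1], [0.8, 0.1], [0.9, 0.1], [1.0, 5.0]]
    for before, after in zip(heads, suppress_layer(heads, anchor)):
        print(before, "->", after)
```

In this toy layer only the last head has an outlying orthogonal norm, so only its orthogonal part is removed and the other heads pass through unchanged. A real intervention would operate on batched tensors inside the model's forward pass and would need the anchor definition and threshold from the paper or its code.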
Evaluations on Llama-3-8B and 70B across benchmarks such as XSum, NQ-Swap, and IFEval demonstrate that DCO achieves superior contextual faithfulness compared to state-of-the-art intervention baselines. Furthermore, DCO maintains high performance on knowledge-intensive tasks like TriviaQA and TruthfulQA, effectively mitigating the trade-off between hallucination suppression and parametric knowledge retention often observed in existing methods.
Our findings validate the geometric interpretation of hallucinations and establish DCO as a computationally efficient approach for enforcing manifold alignment. Our code is available at https://github.com/Harry-Miral/DCO