CORTEX: Token-Level Hallucination Detection in RAG via Comparative Internal Representations
Quick Answer
CORTEX is a token-level hallucination detection method for Retrieval-Augmented Generation (RAG) that improves localization of ungrounded content by comparing internal representations of LLMs with and without retrieved documents.
Quick Take
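CORTEX is a token-level hallucination detection method for Retrieval-Augmented Generation (RAG) that improves localization of ungrounded content by comparing internal representations of LLMs with and without retrieved documents. Experiments on two RAG benchmarks and three LLMs show substantial gains in token-level hallucination detection. Two further components target specific failure modes: tracking document-grounded information through preceding tokens reduces false positives, and a post-processing smoothing step encourages span-consistent predictions.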
Key Points
- CORTEX identifies hallucinations at the token level for improved localization.
- It compares LLM internal representations with and without retrieved documents.
- It reduces false positives by tracking how document-grounded information propagates through preceding tokens.
- Post-processing smoothing enhances span-consistent predictions.
- Experiments on two RAG benchmarks and three LLMs show substantial improvements, with each component contributing.
Article Content
From source RSS / original summary
arXiv:2606.31033v1 Announce Type: new
Abstract: In this paper, we propose CORTEX, a token-level hallucination detection method for Retrieval-Augmented Generation (RAG). In long-form RAG outputs, hallucinations often arise in localized spans rather than throughout an entire response. CORTEX therefore identifies ungrounded content at the token level, enabling fine-grained localization of hallucinations.
The key intuition behind CORTEX is that tokens grounded in retrieved documents should be more strongly influenced by those documents than hallucinated tokens. To capture this document-induced effect, CORTEX compares internal representations of a large language model (LLM) under two conditions: with and without the retrieved documents.
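The two-condition comparison can be illustrated with a minimal sketch. The abstract does not say which layer is read or how the two representations are compared, so the cosine distance used here, the function names `cosine_distance` and `document_effect`, and the toy two-dimensional vectors are all assumptions made for illustration rather than the paper's method.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector counts as "no measurable shift"
    return 1.0 - dot / (norm_u * norm_v)

def document_effect(hidden_with_docs, hidden_without_docs):
    """Per-token shift between the two conditions.

    Each argument holds one hidden-state vector per response token,
    taken from the same (hypothetical) layer of the same model.
    """
    if len(hidden_with_docs) != len(hidden_without_docs):
        raise ValueError("both conditions must cover the same response tokens")
    return [
        cosine_distance(h, g)
        for h, g in zip(hidden_with_docs, hidden_without_docs)
    ]

# Toy hidden states for a three-token response (made-up numbers).
with_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
without_docs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

effects = document_effect(with_docs, without_docs)
for position, effect in enumerate(effects):
    print(f"token {position}: document effect = {effect:.3f}")
```

Under the stated intuition, a token whose representation barely moves when the documents are removed (tokens 0 and 2 in the toy data) is a candidate for being ungrounded, while a large shift (token 1) suggests the documents influenced it. On its own this signal would flag grounded tokens whose evidence already sits in the preceding context, which is the false-positive case the propagation component is meant to address.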
Instead of relying solely on each token's immediate sensitivity to the retrieved documents, CORTEX also leverages the propagation of document-grounded information through preceding tokens, reducing false positives for tokens whose evidence has already been absorbed into the context. Finally, CORTEX applies a post-processing smoothing step that models the tendency of hallucination labels to persist over contiguous spans, reducing local noise and encouraging span-consistent predictions.
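One way to picture the smoothing step is a two-state chain whose labels prefer to stay the same from one token to the next, decoded with the Viterbi algorithm. The abstract does not name the smoothing model, so this is a stand-in technique chosen for illustration; the function name `viterbi_smooth`, the `stay_prob` parameter, and the example probabilities are hypothetical.

```python
import math

def viterbi_smooth(p_halluc, stay_prob=0.9):
    """Most likely 0/1 label sequence under a sticky two-state chain.

    p_halluc[t] is a per-token hallucination probability from some
    upstream detector; it is used as the emission score for state 1,
    and 1 - p_halluc[t] for state 0. stay_prob is the (assumed)
    probability that a label repeats at the next token.
    """
    if not p_halluc:
        return []
    eps = 1e-9
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    def emit(p):
        p = min(max(p, eps), 1.0 - eps)  # keep the logs finite
        return (math.log(1.0 - p), math.log(p))

    score = list(emit(p_halluc[0]))  # uniform prior over the first label
    back = []
    for p in p_halluc[1:]:
        e = emit(p)
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                new_score.append(stay + e[s])
                pointers.append(s)
            else:
                new_score.append(switch + e[s])
                pointers.append(1 - s)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the better final state.
    state = 0 if score[0] >= score[1] else 1
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    labels.reverse()
    return labels

# An isolated spike is suppressed; a dip inside a confident span is filled in.
spike = viterbi_smooth([0.1, 0.1, 0.6, 0.1, 0.1])
span = viterbi_smooth([0.1, 0.1, 0.95, 0.4, 0.95, 0.1, 0.1])
print(spike)
print(span)
```

With plain thresholding at 0.5, the first sequence would flag a single token and the second would split one span into two. The sticky transition makes a label change costly, so the decoder only opens or closes a span when the surrounding evidence is strong enough to pay for it.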
Experiments on two RAG benchmarks and three LLMs show that CORTEX substantially improves token-level hallucination detection, with each component consistently contributing to performance gains.