Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning
Quick Answer
The paper introduces LUCID, a novel hallucination detection method for LLM-based knowledge graph reasoning, which integrates LLM attention scores, KG semantics, and structural information.
Quick Take
Experiments on nine datasets show LUCID outperforming 15 baselines. LLMs can produce incorrect outputs even when relevant KG facts are retrieved. Detecting these hallucinations therefore addresses a source of misinformation and unreliable decisions in existing LLM-based KG reasoning frameworks.
Key Points
- LUCID is the first method specifically designed for hallucination detection in KG reasoning.
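- It extracts node and edge features from LLM attention scores and semantic similarities, then combines them with the KG's structure using a graph neural network.
- The authors construct manually annotated benchmark datasets. Across nine datasets, LUCID achieves state-of-the-art results against 15 baselines.
- Existing detectors rely either on LLM internal states or on consistency with retrieved contexts. Both approaches overlook KG structural information, which LUCID exploits.
- Reliable hallucination detection matters for applications built on LLM-based KG reasoning, such as question answering, recommendation, and decision support.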
Article Content
From source RSS / original summary
arXiv:2606.19351v1 Announce Type: new
Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG reasoning frameworks have become increasingly popular by leveraging retrieved KG information. However, hallucinations in LLMs remain a critical issue.
Even when relevant KG knowledge is incorporated, models may still generate incorrect outputs, leading to misinformation and unreliable decisions. Existing hallucination detection methods either focus on LLM internal states or verify consistency with retrieved contexts, but both overlook the structural information in KGs, resulting in suboptimal performance. To address this gap, we propose LUCID, the first halLUcination deteCtIon method for LLM-based knowleDge graph reasoning frameworks.
LUCID jointly leverages LLM attention scores, KG semantics, and structural information. Specifically, it extracts node and edge features from attention scores and semantic similarities, and integrates them with KG structure using a graph neural network. We also construct manually annotated benchmark datasets for evaluation. Experiments on nine datasets show that LUCID achieves state-of-the-art performance compared to 15 baselines.
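The summary does not specify how LUCID defines its features or its graph neural network. The abstract describes a general pattern: node and edge features derived from attention and semantic similarity, combined with KG structure through message passing. Below is a minimal pure-Python sketch of that pattern. Everything in it is an assumption for illustration only. The `Triple` fields, the two-dimensional node features, the sigmoid edge gate, the single message-passing layer, the mean pooling, and the hand-set `PARAMS` are all hypothetical and untrained, and none of them describe the paper's actual architecture.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: attention/similarity features + GNN-style message passing.
# NOT LUCID's actual architecture; all names and weights are illustrative.


@dataclass(frozen=True)
class Triple:
    """A retrieved KG fact plus two assumed edge features."""
    head: str
    relation: str
    tail: str
    attention: float   # assumed: LLM attention mass on this triple's tokens
    similarity: float  # assumed: semantic similarity of the triple to the answer


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def hallucination_score(triples: list[Triple],
                        node_feats: dict[str, list[float]],
                        params: dict) -> float:
    """Return a score in (0, 1); higher means 'more likely hallucinated'.

    node_feats maps entity -> [attention, similarity]; missing entities get zeros.
    """
    nodes = sorted({t.head for t in triples} | {t.tail for t in triples})
    if not nodes:
        raise ValueError("need at least one retrieved triple")
    h = {n: list(node_feats.get(n, [0.0, 0.0])) for n in nodes}

    # Edge features -> scalar gate in (0, 1); the KG is treated as undirected.
    neighbours = {n: [] for n in nodes}
    for t in triples:
        gate = sigmoid(params["w_edge"][0] * t.attention
                       + params["w_edge"][1] * t.similarity)
        neighbours[t.head].append((t.tail, gate))
        neighbours[t.tail].append((t.head, gate))

    # One message-passing layer: h_v' = ReLU(W_self h_v + sum_u gate_uv * W_nbr h_u)
    new_h = {}
    for v in nodes:
        out = matvec(params["W_self"], h[v])
        for u, gate in neighbours[v]:
            msg = matvec(params["W_nbr"], h[u])
            out = [o + gate * m for o, m in zip(out, msg)]
        new_h[v] = [max(0.0, o) for o in out]

    # Mean-pool node states into one graph vector, then a logistic readout.
    dim = len(params["w_out"])
    pooled = [sum(new_h[v][i] for v in nodes) / len(nodes) for i in range(dim)]
    logit = sum(w * x for w, x in zip(params["w_out"], pooled)) + params["b_out"]
    return sigmoid(logit)


# Hand-set, untrained weights: stronger grounding signals lower the score.
PARAMS = {
    "w_edge": [2.0, 2.0],
    "W_self": [[1.0, 0.0], [0.0, 1.0]],
    "W_nbr": [[0.5, 0.0], [0.0, 0.5]],
    "w_out": [-1.5, -1.5],
    "b_out": 1.0,
}

if __name__ == "__main__":
    facts = [
        Triple("Paris", "capital_of", "France", attention=0.8, similarity=0.9),
        Triple("France", "located_in", "Europe", attention=0.6, similarity=0.7),
    ]
    feats = {"Paris": [0.7, 0.8], "France": [0.6, 0.7], "Europe": [0.2, 0.3]}
    print(f"hallucination score: {hallucination_score(facts, feats, PARAMS):.3f}")
```

Running the script prints one score for the toy subgraph. In a trained detector, the weights would be learned from labeled examples, such as the manually annotated benchmarks the authors describe. The hand-set values here only make the intended direction visible: higher attention and similarity on retrieved facts push the score down.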