Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation
Quick Take
The SGR framework couples large language models with external knowledge graphs by generating query-relevant subgraphs, improving reasoning accuracy and Hits@1 on CWQ, WebQSP, GrailQA, and KQA Pro over standard prompting and several knowledge-enhanced baselines. Ablation studies indicate that schema guidance and Neo4j-based retrieval are both crucial to its effectiveness.
Key Points
- SGR improves reasoning accuracy and Hits@1 performance on CWQ, WebQSP, GrailQA, and KQA Pro.
- The framework constructs structured schemas from input questions to guide reasoning.
- Subgraphs provide explicit relational evidence for step-by-step reasoning.
- Collaborative reasoning validates and aggregates candidate answers from multiple reasoning paths, weighing model confidence and graph consistency.
- Ablation studies confirm the importance of schema guidance and Neo4j-based retrieval.
Article Content
From source RSS / original summary
arXiv:2606.04454v1 (Announce Type: new)
Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step reasoning. To address these limitations, this paper proposes SGR, a stepwise reasoning enhancement framework that integrates large language models with external knowledge graphs through query-relevant subgraph generation.
Given an input question, SGR first extracts key entities, relations, and constraints to construct a structured schema, then retrieves compact subgraphs from a knowledge graph using schema-guided querying. The generated subgraphs provide explicit relational evidence that guides the language model through step-by-step reasoning.
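To make the schema-guided querying step concrete, here is a minimal sketch of how a structured schema might be turned into a parameterized Cypher query. The abstract does not specify SGR's schema format or query templates, so the `Schema` class, its field names, the `name` node property, and the `schema_to_cypher` helper are all hypothetical and shown for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical schema extracted from a question (not SGR's actual format)."""
    topic_entity: str                      # key entity the question starts from
    relations: list[str]                   # relation chain to follow, in order
    constraints: dict[str, str] = field(default_factory=dict)  # filters on the answer node


def _check_identifier(name: str) -> None:
    # Relationship types and property keys cannot be passed as Cypher
    # parameters, so they are validated before being placed in the query text.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe identifier: {name!r}")


def schema_to_cypher(schema: Schema, limit: int = 50) -> tuple[str, dict]:
    """Build a Cypher query that retrieves paths matching the relation chain."""
    query = "MATCH p = (n0 {name: $topic})"   # assumes nodes carry a `name` property
    for i, rel in enumerate(schema.relations, start=1):
        _check_identifier(rel)
        query += f"-[:`{rel}`]->(n{i})"

    params = {"topic": schema.topic_entity}
    answer_node = f"n{len(schema.relations)}"
    filters = []
    for j, (prop, value) in enumerate(schema.constraints.items()):
        _check_identifier(prop)
        filters.append(f"{answer_node}.`{prop}` = $c{j}")
        params[f"c{j}"] = value            # values go in as parameters, not query text

    if filters:
        query += " WHERE " + " AND ".join(filters)
    query += f" RETURN p LIMIT {int(limit)}"
    return query, params


if __name__ == "__main__":
    example = Schema("Barack Obama", ["born_in", "located_in"], {"type": "country"})
    cypher, cypher_params = schema_to_cypher(example)
    print(cypher)
    print(cypher_params)
```

The returned query string and parameter dictionary could then be passed to a Neo4j client; the `LIMIT` clause is one simple way to keep the retrieved subgraph compact.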
In addition, SGR combines direct Cypher-based reasoning with collaborative reasoning integration, allowing candidate answers from multiple reasoning paths to be validated and aggregated according to both model confidence and graph consistency. Experiments on benchmark datasets including CWQ, WebQSP, GrailQA, and KQA Pro demonstrate that SGR improves reasoning accuracy and Hits@1 performance over standard prompting and several knowledge-enhanced baselines.
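The aggregation step can be sketched as a weighted vote over candidate answers. The abstract says only that candidates are validated and aggregated according to model confidence and graph consistency; the linear weighting, the `alpha` parameter, and the function names below are assumptions rather than the paper's scoring rule. `hits_at_1` follows the usual definition of the metric: the fraction of questions whose top-ranked prediction is among the gold answers.

```python
from collections import defaultdict


def aggregate_candidates(candidates, alpha=0.5):
    """Rank answers proposed by several reasoning paths.

    candidates: iterable of (answer, model_confidence, graph_consistency),
                with both scores assumed to lie in [0, 1].
    alpha:      hypothetical weight on model confidence versus graph consistency.
    """
    totals = defaultdict(float)
    for answer, confidence, consistency in candidates:
        # Each path contributes a blended score; paths that agree on an
        # answer reinforce it by summing their contributions.
        totals[answer] += alpha * confidence + (1 - alpha) * consistency
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def hits_at_1(top_predictions, gold_answers):
    """Fraction of questions whose top prediction is in the gold answer set."""
    if not top_predictions:
        return 0.0
    hits = sum(1 for pred, gold in zip(top_predictions, gold_answers) if pred in gold)
    return hits / len(top_predictions)


if __name__ == "__main__":
    paths = [
        ("Paris", 0.9, 1.0),   # e.g. direct Cypher-based path
        ("Lyon", 0.8, 0.0),    # confident but unsupported by the subgraph
        ("Paris", 0.6, 1.0),   # second path agreeing with the first
    ]
    ranking = aggregate_candidates(paths)
    print(ranking[0][0])
    print(hits_at_1([ranking[0][0]], [{"Paris"}]))
```

Summing across paths rewards agreement between independent reasoning paths, while the consistency term penalizes answers the retrieved subgraph does not support; other combination rules would fit the same description.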
Ablation studies further show that schema guidance and Neo4j-based retrieval are both crucial to the effectiveness of the framework. These results indicate that dynamically generated external subgraphs can improve the accuracy, robustness, and interpretability of LLM-based reasoning.