Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction
Quick Take
This paper introduces a neuro-symbolic framework for constructing knowledge graphs (KGs) that applies ontology-grounded corrections after extraction rather than during it. Deferring correction avoids repeated LLM calls, which reduces token usage while improving KG consistency and preserving downstream QA quality. The authors also show that the resulting KGs are well suited to symbolic querying, which matters for complex multi-hop questions.
Key Points
- Proposes a neuro-symbolic framework for ontology-grounded knowledge graph construction.
- Reduces token usage by deferring corrections to a post-extraction stage.
- Uses targeted LLM-based correction of ontology violations, avoiding repeated LLM calls.
- Improves KG consistency while preserving downstream question answering quality.
- Shows the extracted KGs suit symbolic querying by measuring the occurrence of SPARQL graph patterns.
Article Content
From source RSS / original summary
arXiv:2605.29168v1 Announce Type: new
Abstract: Question answering (QA) is a core challenge in AI, particularly for complex queries requiring multi-hop reasoning across documents, or symbolic operations like aggregation or exhaustive listing. Retrieval-augmented generation has become the dominant approach to QA, with recent graph-based variants addressing part of these issues by organizing knowledge to better support compositional questions.
However, most textual graph-based RAG methods still lack the structure needed for symbolic operations useful to answer complex questions reliably. This motivates symbolic graph-based approaches, which extract knowledge graphs (KGs) whose relations are logic predicates that enable SQL-like querying. Yet these pipelines typically use LLMs for KG extraction, which can introduce consistency issues, where extracted facts may violate commonsense ontology constraints.
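To make "violating an ontology constraint" concrete, here is a minimal sketch that checks extracted triples against domain and range constraints. The toy ontology, the entity names, and the `find_violations` helper are all assumptions made for illustration; the abstract does not say which constraints the authors check or how.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical entity types, as an extractor might have assigned them.
ENTITY_TYPES = {
    "Marie Curie": "Person",
    "Warsaw": "City",
    "Sorbonne": "Organization",
}

# Hypothetical constraints: predicate -> (expected subject type, expected object type).
CONSTRAINTS = {
    "bornIn": ("Person", "City"),
    "worksFor": ("Person", "Organization"),
}


def find_violations(triples):
    """Return (triple, reason) pairs for triples that break a domain/range constraint."""
    violations = []
    for t in triples:
        if t.predicate not in CONSTRAINTS:
            continue  # no constraint is known for this predicate
        domain, range_ = CONSTRAINTS[t.predicate]
        s_type = ENTITY_TYPES.get(t.subject)
        o_type = ENTITY_TYPES.get(t.obj)
        if s_type != domain:
            violations.append((t, f"subject type {s_type} is not {domain}"))
        elif o_type != range_:
            violations.append((t, f"object type {o_type} is not {range_}"))
    return violations


triples = [
    Triple("Marie Curie", "bornIn", "Warsaw"),    # consistent
    Triple("Marie Curie", "bornIn", "Sorbonne"),  # object is not a City
    Triple("Warsaw", "worksFor", "Sorbonne"),     # subject is not a Person
]
violations = find_violations(triples)
for triple, reason in violations:
    print(triple, "->", reason)
```

Only the flagged triples would need further attention; the consistent one passes through untouched.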
We propose a neuro-symbolic framework for ontology-grounded KG construction combining open-domain extraction, embedding-based canonicalization of types and predicates, and targeted LLM-based correction of ontology violations. By deferring corrections to a post-extraction stage, our method avoids repeated LLM calls, substantially reducing token usage while improving KG consistency and preserving downstream QA quality.
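The canonicalization step can be pictured with a small sketch: predicate labels whose embeddings are close enough are mapped to one representative label. This is an illustration under stated assumptions, not the authors' procedure. The three-dimensional vectors are hand-made stand-ins for real embeddings, and the greedy strategy, the `canonicalize` helper, and the 0.9 threshold are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def canonicalize(embeddings, threshold=0.9):
    """Greedily map each label to the first representative within the threshold."""
    representatives = []  # labels chosen as canonical forms so far
    mapping = {}
    for label, vec in embeddings.items():
        for rep in representatives:
            if cosine(vec, embeddings[rep]) >= threshold:
                mapping[label] = rep
                break
        else:
            # No close representative found: this label becomes a new canonical form.
            representatives.append(label)
            mapping[label] = label
    return mapping


# Toy vectors standing in for the output of a real embedding model (assumption).
TOY_EMBEDDINGS = {
    "born_in": [1.0, 0.1, 0.0],
    "birthplace": [0.9, 0.2, 0.0],
    "works_for": [0.0, 0.1, 1.0],
    "employed_by": [0.1, 0.0, 0.9],
}

mapping = canonicalize(TOY_EMBEDDINGS)
print(mapping)
```

With these toy vectors, `birthplace` collapses onto `born_in` and `employed_by` onto `works_for`. A greedy pass depends on the order in which labels arrive, so a production system would more likely use a proper clustering method.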
Finally, we show that the extracted KGs are well suited for symbolic querying by measuring the occurrence of SPARQL graph patterns.
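For readers unfamiliar with graph patterns, the sketch below evaluates a two-hop pattern plus a count over a toy KG. It is a hand-rolled stand-in for a SPARQL engine, written only to show the kind of multi-hop and aggregation query a symbolic KG supports. The data, the predicate names, and the `match_pattern` helper are assumptions and do not come from the paper.

```python
def match_pattern(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern in `patterns`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_pattern(triples, rest, candidate)


# Hypothetical toy knowledge graph.
KG = [
    ("Alice", "worksFor", "Acme"),
    ("Bob", "worksFor", "Acme"),
    ("Carol", "worksFor", "Globex"),
    ("Acme", "locatedIn", "Paris"),
    ("Globex", "locatedIn", "Berlin"),
]

# Roughly: SELECT ?person WHERE { ?person :worksFor ?org . ?org :locatedIn :Paris }
query = [("?person", "worksFor", "?org"), ("?org", "locatedIn", "Paris")]

people = sorted(b["?person"] for b in match_pattern(KG, query))
count = len(people)  # aggregation, in the spirit of COUNT(?person)
print(people, count)
```

Answering "who works for an organization located in Paris, and how many such people are there" needs a join on `?org` and an exhaustive listing, which are the operations the abstract says textual graph-based RAG methods struggle with.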