TCAR-Gen: Temporal Graph Retrieval with Evidence Fusion for Knowledge-Grounded Generation
Quick Take
TCAR-Gen is a framework for knowledge-grounded generation that reaches 0.3738 Recall@5 on the Victorian Crime Diaries benchmark, outperforming Vanilla RAG, Temporal RAG, GraphRAG-C, and GraphRAG-T. It combines query-conditioned graph neural networks, temporal evidence fusion, and chain-of-trees reasoning to ground answers to temporally complex queries in retrieved evidence.
Key Points
- TCAR-Gen combines graph neural networks and temporal evidence fusion for improved reasoning.
- Achieves 0.3738 Recall@5 on the Victorian Crime Diaries benchmark, outperforming Vanilla RAG, Temporal RAG, GraphRAG-C, and GraphRAG-T across seven query types.
- Critical components include context graph, temporal penalty mechanism, and query conditioning.
- Maintains robust retrieval coverage across five language models (GPT-OSS 20B down to TinyLlama 1.1B), though generation quality degrades substantially at smaller scales.
- Explicit temporal modeling is essential for accurate, reasoning-intensive question answering.
Article Content
From source RSS / original summary
arXiv:2606.00029v1 (Announce Type: new)
Abstract: Retrieval-augmented generation systems struggle with temporal reasoning and evidence fusion when answering complex questions over historical criminal case narratives. Existing approaches either retrieve independently of query semantics or fail to integrate multiple evidence sources coherently.
We propose Temporal Context Augmented Retrieval Generation (TCAR-Gen), a framework that combines query-conditioned graph neural networks, temporal evidence fusion, and chain-of-trees reasoning to ground answer generation in retrieved evidence. On the Victorian Crime Diaries benchmark, TCAR-Gen achieves 0.3738 Recall@5, outperforming Vanilla RAG, Temporal RAG, GraphRAG-C, and GraphRAG-T across seven query types including multi-hop reasoning and counterfactual questions.
Ablation studies reveal that the context graph, temporal penalty mechanism, and query conditioning are critical components. Cross-model evaluation across five language models (GPT-OSS 20B to TinyLlama 1.1B) demonstrates that TCAR-Gen maintains robust retrieval coverage at smaller model scales, though generation quality degrades substantially with reduced model capacity.
Our work shows that explicit temporal modelling and multi-branch evidence fusion are essential for faithful, reasoning-intensive question answering over knowledge-grounded corpora.
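For readers unfamiliar with the headline metric, the following is a minimal sketch of one common way to compute Recall@5: the fraction of a query's relevant passages that appear among the top five retrieved, averaged over queries. The abstract does not say which Recall@k variant the authors use, so this definition, the function names, and the toy passage IDs are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of relevant items found in the top-k of a ranked list.

    Assumed definition; the paper may use a different variant
    (for example, a hit-rate style "any relevant item in top-k").
    """
    if not relevant_ids:
        # No gold passages for this query: score it as 0.0 by convention.
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> float:
    """Average per-query Recall@k over (ranked_ids, relevant_ids) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: passage IDs are invented for this example.
toy_runs = [
    (["p1", "p7", "p3", "p9", "p4", "p2"], {"p1", "p2"}),  # 1 of 2 in top 5
    (["p5", "p6", "p8", "p0", "p1"], {"p8"}),              # 1 of 1 in top 5
]
score = mean_recall_at_k(toy_runs, k=5)
```

Under this definition, a reported score of 0.3738 would mean that, on average, a little over a third of each query's relevant evidence appears in the top five results.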