Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning
Quick Take
The paper introduces a strikingness-aware evaluation framework for more rigorously assessing Temporal Knowledge Graph Reasoning (TKGR) models.
Key Points
- Current TKGR evaluations overestimate reasoning abilities.
- Strikingness-aware metrics emphasize rare, complex events.
- Experiments show varied model performance based on event strikingness.
Abstract
Temporal Knowledge Graph Reasoning (TKGR) aims to infer missing (especially future) events from historical data. Current TKGR evaluation weights all events uniformly, ignoring that most are trivial repetitions, which overestimates true reasoning ability. The rare outstanding events, whose prediction demands deeper reasoning, should therefore be distinguished and emphasized. To this end, we propose a strikingness-aware evaluation framework, which introduces a rule-based strikingness measuring framework (RSMF) to quantify an event's strikingness by comparing its expected occurrence with that of peer events derived from temporal rules. Strikingness is then integrated as a weighting factor into metrics such as weighted MRR and Hits@k. Experiments on four TKG benchmarks reveal that: 1) all representative models perform worse as event strikingness increases; 2) path-based methods excel on low-strikingness events, while representation-based ones excel on high-strikingness events; 3) the gains of an ensemble method we design stem from fitting trivial events rather than from improved reasoning. Our framework provides a more rigorous evaluation, refocusing the field on predicting outstanding events.
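The strikingness-weighted metrics described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each test event has a precomputed strikingness weight (here arbitrary positive floats), and the function names are ours.

```python
def weighted_mrr(ranks, weights):
    """Strikingness-weighted MRR: each event's reciprocal rank is
    scaled by its strikingness weight, then normalized by total weight."""
    assert len(ranks) == len(weights)
    num = sum(w / r for r, w in zip(ranks, weights))
    return num / sum(weights)

def weighted_hits_at_k(ranks, weights, k):
    """Strikingness-weighted Hits@k: weighted fraction of events whose
    correct entity is ranked within the top k."""
    hit = sum(w for r, w in zip(ranks, weights) if r <= k)
    return hit / sum(weights)

# Example: one high-strikingness event (weight 2.0) ranked poorly drags
# the scores down more than two trivial events (weight 1.0) ranked well.
ranks = [1, 2, 10]
weights = [1.0, 1.0, 2.0]
print(round(weighted_mrr(ranks, weights), 3))      # 0.425
print(weighted_hits_at_k(ranks, weights, k=3))     # 0.5
```

With uniform weights these reduce to the standard MRR and Hits@k, which is why uniform evaluation lets repeated trivial events dominate the score.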
| Comments: | Accepted to IJCAI-ECAI 2026 |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.13153 [cs.AI] (or arXiv:2605.13153v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.13153 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Rikui Huang
[v1] Wed, 13 May 2026 08:17:54 UTC (755 KB)
— Originally published at arxiv.org