Zero-Shot Goal Recognition with Large Language Models
Quick Take
The paper evaluates large language models for zero-shot goal recognition, revealing uneven competence across models.
Key Points
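- Frontier LLMs are evaluated zero-shot as goal recognisers on classical PDDL benchmarks, and their accuracy varies considerably from model to model.
- Some models improve as observations accumulate and approach landmark-based accuracy at full observations; others stay anchored to world-knowledge priors however much evidence they see.
- The authors position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.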
Abstract: Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in evidence integration rather than domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.
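To make the task in the abstract concrete, here is a minimal sketch of how a zero-shot goal recognition evaluation could be scored at different amounts of evidence. It is an illustration under stated assumptions, not the paper's protocol: the names `Problem`, `observe`, `build_prompt`, `parse_choice`, and `accuracy_by_observability` are hypothetical, the prompt wording is invented, partial evidence is modelled as a prefix of the observed actions, and `recogniser` stands in for whatever function sends a prompt to a model and returns its text answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Problem:
    """One goal recognition instance (hypothetical structure)."""
    candidate_goals: List[str]  # hypotheses the recogniser chooses between
    observations: List[str]     # observed actions, in order
    true_goal: str              # ground truth, used only for scoring


def observe(problem: Problem, fraction: float) -> List[str]:
    """Return a prefix covering `fraction` of the observations.

    Assumption: partial observability is a prefix of the trace.
    """
    count = max(1, round(len(problem.observations) * fraction))
    return problem.observations[:count]


def build_prompt(problem: Problem, fraction: float) -> str:
    """Format a zero-shot prompt: goals and evidence, no worked examples."""
    goals = "\n".join(
        f"{i}. {goal}" for i, goal in enumerate(problem.candidate_goals, 1)
    )
    actions = "\n".join(observe(problem, fraction))
    return (
        "Candidate goals:\n" + goals + "\n\n"
        "Observed actions:\n" + actions + "\n\n"
        "Which candidate goal is the agent most likely pursuing? "
        "Answer with the goal number only."
    )


def parse_choice(answer: str, goals: Sequence[str]) -> Optional[str]:
    """Map a numeric answer to a goal; return None if it is not a valid choice."""
    try:
        index = int(answer.strip()) - 1
    except ValueError:
        return None
    return goals[index] if 0 <= index < len(goals) else None


def accuracy_by_observability(
    problems: Sequence[Problem],
    recogniser: Callable[[str], str],
    fractions: Sequence[float],
) -> Dict[float, float]:
    """Score the recogniser separately at each observation fraction."""
    results: Dict[float, float] = {}
    for fraction in fractions:
        correct = 0
        for problem in problems:
            answer = recogniser(build_prompt(problem, fraction))
            if parse_choice(answer, problem.candidate_goals) == problem.true_goal:
                correct += 1
        results[fraction] = correct / len(problems)
    return results
```

Under this framing, a recogniser that integrates evidence should see its accuracy rise as the fraction approaches 1.0, whereas one anchored to priors would return roughly the same accuracy at every fraction.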
| Comments: | 9 pages, 1 figure, 1 table; appendix with 8 figures and 2 code listings (29 pages total); submitted to NeurIPS 2026 |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.15333 [cs.AI] (or arXiv:2605.15333v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15333 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Felipe Meneguzzi
[v1] Thu, 14 May 2026 18:56:06 UTC (103 KB)
— Originally published at arxiv.org