Zero-Shot Goal Recognition with Large Language Models

arXiv cs.AI · Kin Max Piamolini Gusmão, Nathan Gavenski, Nir Oren, Felipe Meneguzzi

5/18/2026

Quick Take

The paper evaluates frontier large language models as zero-shot goal recognisers on classical PDDL benchmarks and finds that competence is uneven across models.

Key Points

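LLM accuracy on goal recognition varies widely from model to model.
Some models improve as observations accumulate and approach landmark-based accuracy at full observations; others stay anchored to world-knowledge priors however much evidence arrives.
The authors position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.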

[Submitted on 14 May 2026]

Abstract: Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in evidence integration rather than domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.
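
To make the task in the abstract concrete, here is a minimal Python sketch of goal recognition at increasing observation levels. It is not the paper's protocol: the function names (truncate, recognise, overlap_scorer), the Blocksworld-style actions, and the two candidate goals are all hypothetical, and the scorer is a toy object-overlap heuristic standing in for where an LLM's consistency judgement would go.

```python
from typing import Callable, List, Sequence

# A scorer rates how consistent an observation prefix is with a candidate goal.
# In an LLM-based setup this would wrap a model call; here it is a toy heuristic.
Scorer = Callable[[Sequence[str], str], float]


def tokens(expr: str) -> List[str]:
    """Split a PDDL-style expression such as '(stack a b)' into its tokens."""
    return expr.strip("()").split()


def truncate(observations: Sequence[str], ratio: float) -> List[str]:
    """Keep the first `ratio` fraction of the observations (at least one)."""
    keep = max(1, round(len(observations) * ratio))
    return list(observations[:keep])


def overlap_scorer(observations: Sequence[str], goal: str) -> float:
    """Toy stand-in for an LLM: count observed actions touching a goal object."""
    goal_objects = set(tokens(goal)[1:])
    return float(sum(1 for obs in observations
                     if goal_objects & set(tokens(obs)[1:])))


def recognise(observations: Sequence[str],
              candidate_goals: Sequence[str],
              scorer: Scorer) -> str:
    """Return the candidate goal the scorer finds most consistent."""
    return max(candidate_goals, key=lambda goal: scorer(observations, goal))


# Hypothetical example: the agent's true goal is '(on a b)'.
observations = ["(unstack c a)", "(put-down c)", "(pick-up a)", "(stack a b)"]
candidate_goals = ["(on a b)", "(on c d)"]

for ratio in (0.5, 1.0):
    prefix = truncate(observations, ratio)
    print(ratio, recognise(prefix, candidate_goals, overlap_scorer))
```

With half of the observations, this toy scorer picks (on c d), because the early actions only move block c out of the way; with the full sequence it switches to (on a b). That is the sense in which a recogniser can "scale with evidence". A recogniser anchored to a prior would return the same goal at both levels.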

Comments:	9 pages, 1 figure, 1 table; appendix with 8 figures and 2 code listings (29 pages total); submitted to NeurIPS 2026
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.15333 [cs.AI]
	(or arXiv:2605.15333v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.15333 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Felipe Meneguzzi
[v1] Thu, 14 May 2026 18:56:06 UTC (103 KB)

Originally published at arxiv.org
