PrologMCP: A Standardized Prolog Tool Interface for LLM Agents
Quick Answer
PrologMCP is an open-source server that exposes Prolog as a standardized, stateful tool over MCP, so LLM agents can delegate deductive inference to a solver instead of reasoning through it in natural language.
Quick Take
On the PARARULE-Plus benchmark, a formalizer agent using PrologMCP reached accuracy 1.00 on the general-purpose sample, matching or exceeding reasoning LLMs (1.00 / 0.998) and far ahead of the standard model GPT-4.1 (0.762). On the challenging subset it stayed near-perfect (1.00 / 0.99) while reasoning LLMs dropped to 0.95 / 0.94, which suggests that delegating inference to Prolog is a robust alternative to extended natural-language reasoning.
Key Points
- PrologMCP is an open-source, task-agnostic server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP).
- The formalizer agent reached accuracy 1.00 on the general-purpose PARARULE-Plus sample, matching or exceeding reasoning LLMs (1.00 / 0.998).
- The largest gains were over standard models: GPT-4.1 scored 0.762 on the same sample.
- On the challenging subset, the formalizer stayed near-perfect (1.00 / 0.99) while reasoning LLMs dropped to 0.95 / 0.94.
- Delegating inference to Prolog via MCP offers a robust and inspectable alternative to extended natural-language reasoning.
Article Content
From source RSS / original summary (arXiv:2606.14935v1). Abstract: Frontier reasoning-tuned language models still fail on deductive tasks at depth, and the cost of improved performance through extended internal reasoning scales poorly. Symbolic delegation offers a complementary route: a language model translates the problem, while a solver performs the inference. However, current autoformalization pipelines for logic programming are typically bespoke integrations tied to particular tasks or agents.
We introduce PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP). Its compact tool interface, structured error reporting, and per-session isolation make the translate-run-inspect-repair loop a reusable primitive for MCP-capable agents. We evaluate a formalizer agent enhanced with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) on two subsets of PARARULE-Plus: a general-purpose sample and a more challenging one targeting a specific failure mode of natural-language reasoning. On the general sample, the formalizer matches or exceeds reasoning LLMs (accuracy 1.00 vs. 1.00 / 0.998), with the largest gains over standard models (0.762 for GPT-4.1). On the challenging subset, the formalizer remains near-perfect (1.00 / 0.99) while reasoning LLMs drop to 0.95 / 0.94.
These results suggest that delegating inference to Prolog via MCP is a robust and inspectable alternative to extended natural-language reasoning.
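To make the translate-run-inspect-repair loop concrete, here is a minimal Python sketch. It is not PrologMCP's interface and does not run real Prolog: the `Session` class, its `consult` and `query` methods, and the `solve` helper are hypothetical names invented for illustration, the "solver" is a tiny forward-chaining engine over ground clauses written in Prolog-like syntax, and the model's successive translations are simulated by a fixed list of candidate programs.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for one isolated solver session (not PrologMCP's API)."""
    facts: set = field(default_factory=set)
    rules: list = field(default_factory=list)  # list of (body_atoms, head_atom)

    def consult(self, program: str) -> dict:
        """Load ground clauses; return a structured report instead of raising."""
        facts, rules = set(), []
        for lineno, raw in enumerate(program.strip().splitlines(), start=1):
            line = raw.strip()
            if not line:
                continue
            if not line.endswith("."):
                return {"ok": False, "line": lineno, "error": "clause must end with '.'"}
            head, sep, body = line[:-1].partition(":-")
            if sep:
                # Assumption: atoms are ground and unary, so ',' only separates goals.
                rules.append(([a.strip() for a in body.split(",")], head.strip()))
            else:
                facts.add(head.strip())
        # Commit only when the whole program parsed cleanly.
        self.facts |= facts
        self.rules += rules
        return {"ok": True, "clauses": len(facts) + len(rules)}

    def query(self, goal: str) -> bool:
        """Forward-chain to a fixed point, then check whether the goal was derived."""
        known = set(self.facts)
        changed = True
        while changed:
            changed = False
            for body, head in self.rules:
                if head not in known and all(atom in known for atom in body):
                    known.add(head)
                    changed = True
        return goal in known


def solve(candidates, goal):
    """Run each candidate translation in a fresh session until one loads cleanly."""
    for attempt, program in enumerate(candidates, start=1):
        session = Session()  # fresh session per attempt, so no state leaks across tries
        report = session.consult(program)
        if report["ok"]:
            return {"answer": session.query(goal), "attempts": attempt}
        # A real agent would pass `report` back to the model to guide the repair.
    return {"answer": None, "attempts": len(candidates)}


# Simulated translations: the first is missing a terminating '.', the second is repaired.
broken = "big(alan)\nheavy(alan).\nstrong(alan) :- big(alan), heavy(alan)."
fixed = "big(alan).\nheavy(alan).\nstrong(alan) :- big(alan), heavy(alan)."
result = solve([broken, fixed], "strong(alan)")
print(result)  # {'answer': True, 'attempts': 2}
```

The point of the sketch is the division of labor: the translator only has to produce a loadable program, the structured report tells it where a translation failed, and the answer comes from the solver's derivation rather than from a chain of natural-language steps.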