OPINE-World: Programmatic World Modeling with Ontology-error-Prioritized Interactive Exploration
Quick Answer
OPINE-World is an LLM agent that learns object-centric programmatic world models through interaction, achieving an action-efficiency score of 78.4 on the ARC-AGI-3 benchmark, solving 20 out of 25 games without per-game training.
Key Points
- OPINE-World uses a loop of hypothesis and testing with two cooperating agents.
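- Exploration is steered by ontology error, a Bayesian measure of object-type adequacy.
- Program-synthesized world models are data-efficient and reusable, whereas world models learned with deep networks are data-hungry and transfer poorly beyond their training distribution.
- ARC-AGI-3 is a benchmark for skill-acquisition efficiency in which the object vocabulary, the goal, and the action semantics are withheld.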
- OPINE-World reaches an action-efficiency score of 78.4, measured against the human baseline.
Article Content
From source RSS / original summary (arXiv:2607.01531v1)
Abstract: Learning how an environment behaves from interaction is central to building agents that adapt to unfamiliar tasks. World models learned with deep networks are flexible but data-hungry and transfer poorly beyond their training distribution.
Program-synthesized world models, written as source code by LLMs and refined through counterexample-guided inductive synthesis (CEGIS), are instead data-efficient and reusable, yet they have been demonstrated mainly on structured-state worlds with a given object vocabulary, and a single program search does not scale to pixel-rendered environments whose object structure must be hypothesized flexibly.
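The counterexample-guided refinement mentioned above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 5x5 grid, the actions "a" and "b", the helper names (`true_env`, `make_model`, `verify`, `synthesize`, `cegis`), and the enumerated candidate list, which stands in for programs an LLM would propose.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]                 # (x, y) of one object on a 5x5 grid
Transition = Tuple[State, str, State]   # (state, action, next_state)
Params = Tuple[int, int, int, int, bool]


def true_env(state: State, action: str) -> State:
    """Hidden dynamics: 'a' moves right, 'b' moves up, clipped to the grid."""
    x, y = state
    if action == "a":
        x = min(x + 1, 4)
    elif action == "b":
        y = min(y + 1, 4)
    return (x, y)


def make_model(dx_a: int, dy_a: int, dx_b: int, dy_b: int,
               clip: bool) -> Callable[[State, str], State]:
    """Build one candidate world-model program from its parameters."""
    def model(state: State, action: str) -> State:
        x, y = state
        if action == "a":
            x, y = x + dx_a, y + dy_a
        elif action == "b":
            x, y = x + dx_b, y + dy_b
        if clip:
            x, y = max(0, min(x, 4)), max(0, min(y, 4))
        return (x, y)
    return model


def verify(model, log: List[Transition]) -> Optional[Transition]:
    """Replay every logged transition; return the first mismatch, if any."""
    for s, a, s_next in log:
        if model(s, a) != s_next:
            return (s, a, s_next)
    return None


def synthesize(candidates: List[Params],
               counterexamples: List[Transition]) -> Optional[Params]:
    """Return the first candidate consistent with all counterexamples so far."""
    for params in candidates:
        model = make_model(*params)
        if all(model(s, a) == s_next for s, a, s_next in counterexamples):
            return params
    return None


def cegis(log: List[Transition], candidates: List[Params]):
    """Alternate synthesis and replay verification until the log is explained."""
    counterexamples: List[Transition] = []
    while True:
        params = synthesize(candidates, counterexamples)
        if params is None:
            return None, counterexamples      # no candidate fits
        cex = verify(make_model(*params), log)
        if cex is None:
            return params, counterexamples    # model replays the whole log
        counterexamples.append(cex)           # rules out the current candidate


# Collect an interaction log by acting in the hidden environment.
log: List[Transition] = []
state: State = (0, 0)
for action in "babaaaabbb":
    next_state = true_env(state, action)
    log.append((state, action, next_state))
    state = next_state

DELTAS = (-1, 0, 1)
CANDIDATES: List[Params] = [(*d, clip)
                            for d in product(DELTAS, repeat=4)
                            for clip in (False, True)]

params, counterexamples = cegis(log, CANDIDATES)
print(params)  # (1, 0, 0, 1, True) for this toy log
```

The synthesizer only ever sees the accumulated counterexamples, not the full log, which is what keeps each synthesis step cheap. In this sketch the search space is a small parameter grid; the abstract's point is that a single search of this kind does not scale once the object structure itself has to be hypothesized from pixels.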
We introduce OPINE-World, an LLM agent that learns an object-centric programmatic world model online from interaction. OPINE-World couples two cooperating agents in a loop of hypothesis and test, one acting in the environment and one synthesizing the model in code with replay verification and model-based planning, and it steers exploration with a Bayesian measure of object-type adequacy we call ontology error.
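The abstract does not define ontology error beyond calling it a Bayesian measure of object-type adequacy. Purely as a hypothetical illustration of how such a measure could prioritize exploration, the sketch below keeps a Beta posterior per hypothesized object type over the chance that the current model mispredicts objects of that type, and explores the type with the highest posterior mean plus one standard deviation. The class `TypeBelief`, the scoring rule, the uniform Beta(1, 1) prior, and the type names are all assumptions, not the paper's formulation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict


@dataclass
class TypeBelief:
    """Counts of model predictions that failed or held for one object type."""
    failures: int = 0
    successes: int = 0

    def priority(self, alpha: float = 1.0, beta: float = 1.0) -> float:
        # Beta(alpha + failures, beta + successes) posterior over error rate.
        a = self.failures + alpha
        b = self.successes + beta
        n = a + b
        mean = a / n
        variance = (a * b) / (n * n * (n + 1.0))
        # Assumed score: expected error plus an uncertainty bonus.
        return mean + sqrt(variance)


def pick_type_to_explore(beliefs: Dict[str, TypeBelief]) -> str:
    """Choose the object type whose current definition looks least adequate."""
    return max(beliefs, key=lambda name: beliefs[name].priority())


# Hypothetical object types and replay statistics.
beliefs = {
    "wall": TypeBelief(failures=0, successes=20),
    "key": TypeBelief(failures=1, successes=5),
    "unknown_blob": TypeBelief(failures=0, successes=0),
}
next_type = pick_type_to_explore(beliefs)
```

Under this assumed score, a type with no evidence yet outranks a type that has been predicted correctly many times, so the acting agent would be sent to interact with the least understood objects first.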
We evaluate OPINE-World on ARC-AGI-3, a benchmark for skill-acquisition efficiency in which the object vocabulary, the goal, and the action semantics are withheld. OPINE-World solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.