OPINE-World: Programmatic World Modeling with Ontology-error-Prioritized Interactive Exploration

arXiv cs.AI·David Courtis, Wenhao Li, Scott Sanner


7/3/2026

Quick Answer

OPINE-World is an LLM agent that learns object-centric programmatic world models online from interaction. On the ARC-AGI-3 benchmark it solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.

Key Points

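OPINE-World couples two cooperating agents in a loop of hypothesis and test: one acts in the environment, the other synthesizes the world model in code.
Exploration is steered by ontology error, a Bayesian measure of object-type adequacy.
Program-synthesized world models are data-efficient and reusable, whereas world models learned with deep networks are data-hungry and transfer poorly beyond their training distribution.
ARC-AGI-3 is a benchmark for skill-acquisition efficiency in which the object vocabulary, the goal, and the action semantics are withheld.
OPINE-World solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.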

Article Content

From source RSS / original summary

arXiv:2607.01531v1 Announce Type: new Abstract: Learning how an environment behaves from interaction is central to building agents that adapt to unfamiliar tasks. World models learned with deep networks are flexible but data-hungry and transfer poorly beyond their training distribution.

Program-synthesized world models, written as source code by LLMs and refined through counterexample-guided inductive synthesis (CEGIS), are instead data-efficient and reusable, yet they have been demonstrated mainly on structured-state worlds with a given object vocabulary, and a single program search does not scale to pixel-rendered environments whose object structure must be hypothesized flexibly.
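To make the CEGIS idea concrete, here is a minimal toy sketch, not the paper's implementation. It assumes a tiny fixed pool of candidate transition functions standing in for LLM-written programs; the names CANDIDATES, find_counterexample, cegis, and LOG are all hypothetical. A verifier replays logged transitions against each candidate, and any mismatch becomes a counterexample that cheaply rules out later candidates.

```python
from typing import Callable, List, Optional, Tuple

State = int
Action = str
Transition = Tuple[State, Action, State]
Model = Callable[[State, Action], State]

# Hypothetical candidate pool; in a real system an LLM would write these programs.
CANDIDATES: List[Tuple[str, Model]] = [
    ("ignore_action", lambda s, a: s),
    ("always_increment", lambda s, a: s + 1),
    ("left_right", lambda s, a: s - 1 if a == "left" else s + 1),
]


def find_counterexample(model: Model, log: List[Transition]) -> Optional[Transition]:
    """Verifier: replay the log and return the first transition the model mispredicts."""
    for s, a, s_next in log:
        if model(s, a) != s_next:
            return (s, a, s_next)
    return None


def cegis(log: List[Transition]) -> Optional[str]:
    """Return the name of the first candidate consistent with the whole log, if any."""
    counterexamples: List[Transition] = []
    for name, model in CANDIDATES:
        # Synthesizer step: discard candidates already refuted by known counterexamples.
        if find_counterexample(model, counterexamples) is not None:
            continue
        # Verifier step: check the surviving candidate against every logged transition.
        cex = find_counterexample(model, log)
        if cex is None:
            return name
        counterexamples.append(cex)
    return None


# Assumed toy interaction log: (state, action, next_state).
LOG: List[Transition] = [(0, "right", 1), (1, "right", 2), (2, "left", 1)]
print(cegis(LOG))
```

In this toy setting the counterexample set is just a cheap filter. The refinement the abstract describes would instead feed counterexamples back to the LLM so that it can rewrite the program.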

We introduce OPINE-World, an LLM agent that learns an object-centric programmatic world model online from interaction. OPINE-World couples two cooperating agents in a loop of hypothesis and test, one acting in the environment and one synthesizing the model in code with replay verification and model-based planning, and it steers exploration with a Bayesian measure of object-type adequacy we call ontology error.
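The abstract does not define ontology error, so the following is only one hypothetical reading and should not be taken as the paper's formula. The sketch assumes a Beta-Bernoulli model of how often the current world model mispredicts objects of each hypothesized type, and it directs exploration toward the type with the highest posterior mean error. TypeStats, pick_type_to_explore, and the type names are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TypeStats:
    """Prediction record for one hypothesized object type (illustrative only)."""
    hits: int = 0    # transitions the model predicted correctly
    misses: int = 0  # transitions the model predicted incorrectly

    def expected_error(self, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
        # Posterior mean of a Beta(prior_a + misses, prior_b + hits) error rate.
        return (prior_a + self.misses) / (prior_a + prior_b + self.hits + self.misses)


def pick_type_to_explore(stats: Dict[str, TypeStats]) -> str:
    """Return the object type whose predictions currently look least reliable."""
    return max(stats, key=lambda name: stats[name].expected_error())


# Assumed example counts; the type names are hypothetical.
stats = {
    "wall": TypeStats(hits=18, misses=0),
    "key": TypeStats(hits=3, misses=2),
    "door": TypeStats(hits=0, misses=0),
}
print(pick_type_to_explore(stats))
```

Under the uniform prior assumed here, a type that has never been tested has an expected error of 0.5, so it is prioritized over types with some supporting evidence.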

We evaluate OPINE-World on ARC-AGI-3, a benchmark for skill-acquisition efficiency in which the object vocabulary, the goal, and the action semantics are withheld. OPINE-World solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.
