MindZero: Learning Online Mental Reasoning With Zero Annotations

arXiv cs.AI·Shunchi Zhang, Jin Lu, Chuanyang Jin, Yichao Zhou, Zhining Zhang, Tianmin Shu

6/2/2026

Quick Take

MindZero is a self-supervised reinforcement learning framework that trains multimodal large language models (MLLMs) for online mental reasoning without mental state annotations, strengthening their Theory of Mind (ToM) ability. On mental reasoning and AI assistance tasks in gridworld and household domains, it outperforms model-based methods in both accuracy and efficiency, showing that mental reasoning can be learned as a self-supervised skill.

Key Points

MindZero trains MLLMs for efficient and robust online mental reasoning without explicit mental state annotations.
During training, the model is rewarded for generating mental state hypotheses that maximize the likelihood of observed actions, as estimated by a planner.
After training, MindZero performs fast single-pass inference instead of running model-based reasoning at test time.
LLMs alone are insufficient; model-based methods improve accuracy but are slow, costly, and limited by backbone MLLM capacity.
On gridworld and household tasks, MindZero outperforms model-based methods in both accuracy and efficiency.

Article Content

From source RSS / original summary

arXiv:2606.00240v1. Abstract: Effective real-world assistance requires AI agents with robust Theory of Mind (ToM): inferring human mental states from their behavior. Despite recent advances, several key challenges remain, including (1) online inference with robust uncertainty updates over multiple hypotheses; (2) efficient reasoning suitable for real-time assistance; and (3) the lack of ground-truth mental state annotations in real-world domains.
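To make challenge (1) concrete, here is a minimal sketch of an online uncertainty update over multiple hypotheses, written as a standard Bayesian belief update. It is an illustration only and is not taken from the paper: the goal names, the likelihood values, and the `update_belief` helper are all hypothetical.

```python
def update_belief(belief, likelihoods):
    """One Bayesian step: posterior(h) is proportional to prior(h) * P(action | h)."""
    unnormalized = {h: belief[h] * likelihoods[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical example: two candidate goals with a uniform prior.
belief = {"fetch_cup": 0.5, "fetch_book": 0.5}

# Assumed likelihood of each observed action under each hypothesis
# (made-up numbers, one dict per time step).
observed_step_likelihoods = [
    {"fetch_cup": 0.8, "fetch_book": 0.2},
    {"fetch_cup": 0.6, "fetch_book": 0.4},
]

# Online inference: refine the belief as each new action is observed.
for step_likelihoods in observed_step_likelihoods:
    belief = update_belief(belief, step_likelihoods)
```

Each observed action shifts probability mass toward the hypotheses that explain it best, while the alternatives keep nonzero weight, so the belief can still recover if later actions contradict the current favorite.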

We address these challenges by introducing MindZero, a self-supervised reinforcement learning framework that trains multimodal large language models (MLLMs) for efficient and robust online mental reasoning. During training, the model is rewarded for generating mental state hypotheses that maximize the likelihood of observed actions estimated by a planner, similar to model-based ToM reasoning. This method thus eliminates the need for explicit mental state annotations.
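The reward described above can be sketched on a toy problem. The abstract does not specify the planner or the exact reward form, so everything below is an assumption for illustration: a one-dimensional corridor, a Boltzmann-rational planner with a hypothetical temperature `beta`, and a reward equal to the log-likelihood of the observed actions under a hypothesized goal.

```python
import math

# Hypothetical toy setup: a 1-D corridor; the agent moves left (-1) or right (+1).
ACTIONS = (-1, +1)

def action_log_probs(state, goal, beta=2.0):
    """Assumed Boltzmann-rational planner: favors actions that reduce distance to the goal."""
    # Score each action by the negative distance to the goal after taking it.
    scores = [-beta * abs((state + a) - goal) for a in ACTIONS]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return {a: s - log_z for a, s in zip(ACTIONS, scores)}

def hypothesis_reward(goal_hypothesis, trajectory):
    """Reward = log-likelihood of the observed (state, action) pairs under the planner."""
    return sum(action_log_probs(s, goal_hypothesis)[a] for s, a in trajectory)

# Observed behavior: the agent moves right twice, starting from cell 2.
observed = [(2, +1), (3, +1)]

reward_right = hypothesis_reward(4, observed)  # hypothesis: goal is cell 4
reward_left = hypothesis_reward(0, observed)   # hypothesis: goal is cell 0
```

In this sketch the hypothesis "goal is cell 4" receives a higher reward than "goal is cell 0" because it better explains the rightward moves. No ground-truth goal label is needed to compute either reward, which is the property the abstract highlights.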

After training, MindZero internalizes model-based reasoning into fast single-pass inference. We evaluate MindZero against baselines across challenging mental reasoning and AI assistance tasks in gridworld and household domains. We found that LLMs alone are insufficient; model-based methods improve accuracy but are slow, costly, and limited by backbone MLLM capacity.

In contrast, MindZero enhances MLLMs' intrinsic ToM ability and significantly outperforms model-based methods in both accuracy and efficiency, showing that mental reasoning can be effectively learned as a self-supervised skill.
