OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

arXiv cs.AI·Sharmin Sultana Srishty, Kazi Mahathir Rahman, Malaika Parizat Sakkhi, Samia Shahid Prianna, Shaikhul Islam Sinat

5/22/2026

Quick Take

OSCToM enhances Theory of Mind reasoning in LLMs through RL-guided adversarial generation.

Key Points

Models nested belief conflicts in LLM tasks.
Achieves 76% accuracy on the FANToM benchmark.
Code available on GitHub for further research.


More from arXiv cs.AI

arXiv cs.AI·Angelos Angelopoulos, James F. Cahoon, Ron Alterovitz

3d ago

From Prompts to Protocols: An AI Agent for Laboratory Automation

AI Summary

An AI agent integrates large language models for automating laboratory protocols, enhancing efficiency and accuracy.

#LLM #Agent #AI Coding #Enterprise AI

arXiv cs.AI·Yihan Xia, Panpan You, Taotao Wang, Fang Liu, Han Qi, Xiaoxiao Wu, Shengli Zhang

2d ago

Agentic Trading: When LLM Agents Meet Financial Markets

AI Summary

The paper reviews LLM-based trading agents, highlighting protocol incomparability and reproducibility challenges.

#LLM #Agent #AI Startup #Enterprise AI

arXiv cs.AI·Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan (Emily) Xue

15h ago

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

AI Summary

The Insights Generator automates corpus-level diagnostics for LLM agents, enhancing performance through evidence-backed insights.

#LLM #Agent #Inference

Related in this space

arXiv cs.CL·Leyao Wang, Yanan He, Peng Chen, Asaf Yehudai, Yixin Liu, Rex Ying, Michal Shmueli-Scheuer, Arman Cohan

2d ago

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

AI Summary

The reliability of LLM judges for evaluating deep research agents is critically assessed using the REFLECT benchmark.

#LLM #Agent #Inference #Policy

33

Business impact · 20% · 0

Novelty (recency) · 10% · 99

≥75 high · 50–74 medium · <50 low

Why Featured

OSCToM's RL-guided adversarial generation improves LLMs' Theory of Mind capabilities. This signals progress in how AI models human-like reasoning, which matters to developers and product managers building more intuitive applications.