Capability Self-Assessment: Teaching LLMs to Know Their Limits

arXiv cs.AI·Haoyan Yang, Reza Shirkavand, Yukai Jin, Jiawei Zhou, Shangqian Gao, Heng Huang

6/2/2026

Quick Take

Modern large language models (LLMs) systematically overestimate their own competence and attempt queries they cannot solve. This study formulates Capability Self-Assessment (CSA) as a policy-learning problem and shows that reinforcement learning teaches it significantly better than supervised fine-tuning. Reinforcement learning preserves the model's original capabilities, whereas supervised fine-tuning severely degrades them. Learned CSA improves local-cloud decision-making at inference time and provides a signal for targeted data selection during training.

Key Points

LLMs consistently overestimate their competence and misjudge problem-solving capabilities.
Reinforcement learning outperforms supervised fine-tuning in enhancing CSA.
CSA shows strong generalization beyond training data distributions.
Improved CSA aids in local-cloud decision-making during inference.
CSA provides valuable signals for targeted data selection in training.

Article Excerpt

From source RSS / original summary

arXiv:2606.00251v1. Abstract: The ability to recognize one's own limitations and decide whether to solve a problem or delegate is fundamental for reliable intelligent systems. Yet we show that modern large language models systematically lack this ability: across diverse model families and scales, they overestimate their competence and attempt queries they cannot solve.

We refer to this ability as Capability Self-Assessment (CSA) and formulate it as a policy-learning problem, aiming to improve self-assessment while preserving the model's original capabilities. Our results show that reinforcement learning teaches CSA effectively, significantly outperforming supervised fine-tuning while preserving original capabilities. In contrast, supervised fine-tuning severely degrades the capabilities the model is meant to assess.
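To make the policy-learning framing concrete, here is a minimal sketch of a solve-or-delegate reward. The abstract does not give the paper's reward design, so the action names, the reward values, and the `delegate_cost` parameter below are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Sequence

SOLVE = "solve"        # hypothetical action label: the model answers itself
DELEGATE = "delegate"  # hypothetical action label: the model hands the query off


@dataclass(frozen=True)
class Episode:
    action: str      # SOLVE or DELEGATE, as chosen by the policy
    can_solve: bool  # whether the model's own answer would be correct


def csa_reward(ep: Episode, delegate_cost: float = 0.2) -> float:
    """Illustrative reward (an assumption, not taken from the paper).

    Solving is rewarded when the model is actually capable and penalized
    otherwise. Delegating always pays a small cost, so the policy cannot
    win by delegating everything.
    """
    if ep.action == SOLVE:
        return 1.0 if ep.can_solve else -1.0
    if ep.action == DELEGATE:
        base = 0.0 if ep.can_solve else 1.0
        return base - delegate_cost
    raise ValueError(f"unknown action: {ep.action!r}")


def overestimation_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of unsolvable queries that the policy attempted anyway."""
    unsolvable = [e for e in episodes if not e.can_solve]
    if not unsolvable:
        return 0.0
    return sum(e.action == SOLVE for e in unsolvable) / len(unsolvable)
```

Under this assumed reward, a policy that always attempts to solve is penalized on every query it cannot handle, which is the overestimation behavior the abstract describes.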

Moreover, learned self-assessment behavior generalizes well out of distribution, suggesting that CSA is a transferable model trait. Finally, CSA is practically useful: it improves local-cloud decision making at inference time and provides a signal for targeted data selection during training.
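The local-cloud use case can be pictured as a simple router. This is a sketch under stated assumptions: `self_assess`, `local_model`, `cloud_model`, and the 0.5 threshold are hypothetical placeholders, and the abstract does not say how the paper's system exposes or thresholds the self-assessment signal.

```python
from typing import Callable, Tuple


def route(
    query: str,
    self_assess: Callable[[str], float],  # hypothetical: estimated chance the local model can solve the query
    local_model: Callable[[str], str],    # hypothetical local model call
    cloud_model: Callable[[str], str],    # hypothetical cloud model call
    threshold: float = 0.5,               # assumed cutoff, not from the paper
) -> Tuple[str, str]:
    """Answer locally when self-assessed capability is high enough, else delegate."""
    confidence = self_assess(query)
    if confidence >= threshold:
        return "local", local_model(query)
    return "cloud", cloud_model(query)


# Usage with stub models standing in for real ones.
if __name__ == "__main__":
    assess = lambda q: 0.9 if len(q) < 20 else 0.1
    local = lambda q: f"local answer to: {q}"
    cloud = lambda q: f"cloud answer to: {q}"
    print(route("2 + 2 = ?", assess, local, cloud))
    print(route("Prove this long and difficult theorem", assess, local, cloud))
```

A better-calibrated `self_assess` sends fewer unsolvable queries to the local model and fewer solvable ones to the cloud, which is the kind of inference-time benefit the abstract points to.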
