Capability Self-Assessment: Teaching LLMs to Know Their Limits
Quick Take
Modern large language models (LLMs) are poor at judging what they can and cannot solve, and they tend to overestimate their own competence. This study names that ability Capability Self-Assessment (CSA) and formulates it as a policy-learning problem. Reinforcement learning improves CSA significantly more than supervised fine-tuning and preserves the model's original capabilities, whereas supervised fine-tuning severely degrades them. The findings suggest CSA can improve local-cloud decision-making at inference time and data selection during training.
Key Points
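- Across diverse model families and scales, LLMs overestimate their competence and attempt queries they cannot solve.
- Reinforcement learning teaches CSA significantly better than supervised fine-tuning, which severely degrades the capabilities the model is meant to assess.
- Learned self-assessment generalizes well out of distribution, suggesting CSA is a transferable model trait.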
- Improved CSA aids in local-cloud decision-making during inference.
- CSA provides valuable signals for targeted data selection in training.
Article Excerpt
From source RSS / original summary
arXiv:2606.00251v1 (Announce Type: new)
Abstract: The ability to recognize one's own limitations and decide whether to solve a problem or delegate is fundamental for reliable intelligent systems. Yet we show that modern large language models systematically lack this ability: across diverse model families and scales, they overestimate their competence and attempt queries they cannot solve.
We refer to this ability as Capability Self-Assessment (CSA) and formulate it as a policy-learning problem, aiming to improve self-assessment while preserving the model's original capabilities. Our results show that reinforcement learning teaches CSA effectively, significantly outperforming supervised fine-tuning while preserving original capabilities. In contrast, supervised fine-tuning severely degrades the capabilities the model is meant to assess.
Moreover, learned self-assessment behavior generalizes well out of distribution, suggesting that CSA is a transferable model trait. Finally, CSA is practically useful: it improves local-cloud decision making at inference time and provides a signal for targeted data selection during training.
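The local-cloud decision mentioned in the abstract can be pictured with a small routing sketch. This is an illustration under stated assumptions, not the paper's method: the abstract does not say how the self-assessment signal is produced or thresholded, so `route_query`, the `threshold` parameter, and the `toy_*` stand-ins below are all hypothetical names invented for this example.

```python
from typing import Callable, Tuple


def route_query(
    query: str,
    self_assess: Callable[[str], float],
    solve_locally: Callable[[str], str],
    solve_in_cloud: Callable[[str], str],
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Solve locally when the self-assessed confidence is high enough,
    otherwise delegate to the cloud model.

    Assumption: self_assess returns a score in [0, 1] estimating whether
    the local model can solve the query. The paper may use a different
    signal (for example, a discrete solve/delegate action).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    confidence = self_assess(query)
    if confidence >= threshold:
        return "local", solve_locally(query)
    return "cloud", solve_in_cloud(query)


# Hypothetical stand-ins so the sketch runs without any real model.
def toy_self_assess(query: str) -> float:
    # Pretend short queries are easy for the local model.
    return 0.9 if len(query.split()) <= 5 else 0.1


def toy_local(query: str) -> str:
    return f"local answer to: {query}"


def toy_cloud(query: str) -> str:
    return f"cloud answer to: {query}"


if __name__ == "__main__":
    for q in ["What is 2 + 2?",
              "Prove that every bounded monotone sequence of real numbers converges"]:
        print(route_query(q, toy_self_assess, toy_local, toy_cloud))
```

In a setup like this, a model that overestimates its competence would return high scores too often and keep hard queries local, which is the failure the abstract describes. Raising `threshold` trades local compute savings for fewer wrong local answers.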