Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models
Quick Take
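Research shows that Large Reasoning Models (LRMs) often keep reasoning after they have already reached the correct answer, and that stopping at the first correct prefix improves accuracy over standard reasoning by up to 21%. The study introduces a prefix-level evaluation protocol that distinguishes verbose (harmless) overthinking from harmful overthinking, and finds that many tasks considered reasoning-intensive require little reasoning. Early stopping strategies reduce verbose overthinking by up to 50% but fail to mitigate harmful overthinking, which is mainly driven by logical drift and visual reinterpretation.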
Key Points
- Many reasoning-intensive tasks require surprisingly little reasoning from LRMs.
- Stopping at the first correct prefix improves accuracy over standard reasoning by up to 21%.
- Early stopping reduces verbose overthinking by up to 50% but fails to mitigate harmful overthinking.
- Correctness deviations are mainly due to logical drift and visual reinterpretation.
- Findings generalize to language-only reasoning benchmarks, indicating broader reliability risks.
Article Content
From source RSS / original summary
arXiv:2606.02835v1
Abstract: Large Reasoning Models (LRMs) improve performance by generating explicit intermediate reasoning traces through increased test-time compute, yet the assumption that longer reasoning is consistently beneficial remains under-examined. While recent evidence shows that additional reasoning can lead models to overthink, we ask: "Once a model has reached the correct answer, does further reasoning refine the solution, or deviate from it?"
To study the dynamics after correctness, we introduce a prefix-level trajectory evaluation protocol grounded in reasoning sufficiency, defining the minimum reasoning budget required for a model to first generate the correct answer. This allows us to disentangle verbose overthinking, where additional reasoning is redundant but harmless, from harmful overthinking, where continued reasoning destabilizes an already-correct trajectory.
Starting from multimodal benchmarks, we find that many instances considered reasoning-intensive require surprisingly little reasoning. Moreover, stopping at the first correct prefix improves accuracy over standard reasoning up to 21%, revealing that current models are limited not only by their ability to reason, but also by their inability to stop at the right time.
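The prefix-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each reasoning trace has been truncated at increasing budgets, the model was forced to answer at each truncation, and the outcome was recorded as a list of booleans whose last entry corresponds to the full trace. The function names, the label strings, and the rule that a trajectory counts as harmful when its final answer is wrong after an earlier correct prefix are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def first_correct_index(prefix_correct: List[bool]) -> Optional[int]:
    """Return the index of the earliest prefix with a correct answer, or None."""
    for i, ok in enumerate(prefix_correct):
        if ok:
            return i
    return None


def classify_trajectory(prefix_correct: List[bool]) -> str:
    """Label one trajectory (hypothetical labels, assumed decision rule)."""
    if not prefix_correct:
        raise ValueError("need at least one prefix")
    k = first_correct_index(prefix_correct)
    if k is None:
        return "never_correct"
    if not prefix_correct[-1]:
        # Correct at some prefix, wrong at the end of the full trace.
        return "harmful_overthinking"
    if k < len(prefix_correct) - 1:
        # Correct early and still correct at the end: extra reasoning was redundant.
        return "verbose_overthinking"
    return "sufficient"


def compare_accuracy(trajectories: List[List[bool]]) -> Dict[str, float]:
    """Compare accuracy of the full trace with stopping at the first correct prefix."""
    if not trajectories or any(not t for t in trajectories):
        raise ValueError("need non-empty trajectories")
    n = len(trajectories)
    standard = sum(t[-1] for t in trajectories) / n
    first_correct_stop = sum(any(t) for t in trajectories) / n
    return {
        "standard": standard,
        "first_correct_stop": first_correct_stop,
        "gap": first_correct_stop - standard,
    }


toy = [
    [False, True, True],    # verbose overthinking
    [True, False, False],   # harmful overthinking
    [False, False, False],  # never correct
    [False, True, False],   # harmful overthinking
]
print([classify_trajectory(t) for t in toy])
print(compare_accuracy(toy))
```

Under these assumptions the gap equals the fraction of trajectories labeled harmful, since those are exactly the cases that were correct at some prefix and wrong at the end. Stopping at the first correct prefix requires knowing the answer, so this measures an upper bound rather than a deployable stopping rule.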
Furthermore, while common efficiency strategies like early stopping substantially reduce verbose overthinking (up to 50%), they fail to mitigate harmful overthinking. Failure analysis reveals that correctness deviations are mainly driven by logical drift and visual reinterpretation. Finally, we show that our findings generalize to language-only reasoning benchmarks, highlighting harmful overthinking as a broader reliability risk. Code available at https://simonecaldarella.github.io/thinking-past-the-answer.
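To make the distinction between the two effects of early stopping concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the metric used in the paper: `prefix_tokens` is a hypothetical list of cumulative token counts per prefix, `prefix_correct` records whether the forced answer at each prefix is correct, and `stop_at` is the prefix index chosen by some early-stopping policy.

```python
from typing import List, Optional


def redundant_tokens(
    prefix_tokens: List[int],
    prefix_correct: List[bool],
    stop_at: Optional[int] = None,
) -> int:
    """Tokens generated after the first correct prefix, up to the stopping point.

    With stop_at=None the full trace is used. Returns 0 if the trajectory is
    never correct or if the stop happens at or before the first correct prefix.
    """
    if not prefix_tokens or len(prefix_tokens) != len(prefix_correct):
        raise ValueError("inputs must be non-empty and of equal length")
    end = len(prefix_tokens) - 1 if stop_at is None else stop_at
    if not 0 <= end < len(prefix_tokens):
        raise ValueError("stop_at out of range")
    # Index of the earliest correct prefix, if any.
    k = next((i for i, ok in enumerate(prefix_correct) if ok), None)
    if k is None or end <= k:
        return 0
    return prefix_tokens[end] - prefix_tokens[k]


tokens = [100, 200, 300, 400]

# Verbose case: early stopping at prefix 2 halves the redundant reasoning.
verbose = [False, True, True, True]
print(redundant_tokens(tokens, verbose), redundant_tokens(tokens, verbose, stop_at=2))

# Harmful case: the same stop saves tokens but still lands on a wrong answer.
harmful = [False, True, False, False]
print(redundant_tokens(tokens, harmful, stop_at=2), harmful[2])
```

In the toy data, the same stopping point trims redundant tokens in both trajectories, yet only the verbose one ends with a correct answer, which mirrors the abstract's claim that efficiency-oriented stopping does not by itself repair a trajectory that has already drifted.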