How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures
Quick Answer
This study identifies two distinct reasoning failure processes in language models, committed failure and persistent uncertainty, with empirical signatures that reproduce across 23 model-dataset configurations.
Quick Take
The two failure modes call for different detection windows. In committed failure, the model locks onto an incorrect path early, and tokens beyond the commitment point hurt rather than help failure detection. In persistent uncertainty, uncertainty accumulates throughout, so the full trace is needed to separate failing from successful completions. The framework also indicates when uncertainty signals complement self-consistency and when self-consistency can be selectively skipped.
Key Points
- Committed failure occurs when models lock onto incorrect reasoning paths early in the trace.
- Persistent uncertainty accumulates throughout the reasoning process, requiring full trace analysis.
- The framework's predictions were validated in 20 out of 23 model-dataset configurations.
- Identifying failure signatures can enhance detection strategies for language model reasoning.
- Results indicate when uncertainty signals can complement or replace self-consistency checks.
Article Content
arXiv:2606.06635v1. Abstract: Failures in language model reasoning emerge through distinct processes that leave identifiable signatures in the reasoning trace. We characterize these failures using token-level uncertainty signals, finding they arise through two empirically distinguishable processes. The first is committed failure, in which a model locks onto an incorrect reasoning path early in its trace.
A central diagnostic signature is the commitment point, beyond which considering additional tokens hurts rather than helps failure detection. In the second, persistent uncertainty, uncertainty instead accumulates throughout, and the full trace is needed to best distinguish failing from successful completions. These signatures reproduce across 23 model-dataset configurations, with the framework's falsifiable predictions holding in 20 of 23 cases, well above chance across both failure modes.
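One way to picture the commitment point is a sweep over prefix lengths: score each trace by the uncertainty of its first k tokens and check how well that score separates failing from successful completions. The sketch below is an illustration under stated assumptions, not the paper's procedure. The abstract does not name the uncertainty signal or the detection metric, so mean per-token uncertainty and AUROC are stand-ins, and the function names (`auroc`, `prefix_score`, `best_prefix`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random failing trace outscores a random successful one.

    labels: 1 = failing completion, 0 = successful completion. Ties count half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one failing and one successful trace")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def prefix_score(uncertainty: Sequence[float], k: int) -> float:
    """Mean token-level uncertainty over the first k tokens (assumed signal)."""
    prefix = uncertainty[:k]
    return sum(prefix) / len(prefix)


def best_prefix(
    traces: Sequence[Sequence[float]],
    labels: Sequence[int],
    prefix_lengths: Sequence[int],
) -> Tuple[int, List[float]]:
    """Return the earliest prefix length with peak detection AUROC, plus the curve."""
    curve = [
        auroc([prefix_score(t, k) for t in traces], labels)
        for k in prefix_lengths
    ]
    best = max(range(len(curve)), key=curve.__getitem__)
    return prefix_lengths[best], curve
```

Under this reading, a committed-failure configuration would show the curve peaking at a short prefix and falling as more tokens are included, while a persistent-uncertainty configuration would peak at or near the full trace length.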
Finally, we demonstrate that our failure mode framework has direct implications for self-consistency, identifying when uncertainty signals complement it and when it can be selectively skipped. These results offer a foundation for understanding when LLM reasoning failures become detectable and for adapting detection strategies accordingly.
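Selectively skipping self-consistency could look roughly like the following: draw one completion, and only pay for extra samples and a majority vote when its uncertainty score is high. This is a minimal sketch assuming a simple threshold rule; the abstract does not state the actual decision criterion, and `sample`, `uncertainty_threshold`, and `answer_with_selective_sc` are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Callable, Tuple


def answer_with_selective_sc(
    sample: Callable[[], Tuple[str, float]],
    uncertainty_threshold: float,
    n_samples: int = 5,
) -> Tuple[str, int]:
    """Return (answer, number of completions drawn).

    sample() is assumed to return one completion's final answer together with
    an uncertainty score for its trace. The threshold rule is an assumption.
    """
    first_answer, first_uncertainty = sample()
    if first_uncertainty < uncertainty_threshold:
        # Low uncertainty: accept the single answer and skip self-consistency.
        return first_answer, 1

    # High uncertainty: fall back to majority voting over n_samples completions.
    answers = [first_answer] + [sample()[0] for _ in range(n_samples - 1)]
    majority_answer = Counter(answers).most_common(1)[0][0]
    return majority_answer, n_samples
```

The second return value makes the cost visible: a call that skips voting uses one completion instead of `n_samples`.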