How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

arXiv cs.CL·Tanvi Thoria, Kiana Jafari, Marc R. Schlichting, Mykel J. Kochenderfer

~1 min · 6/8/2026 · en

Quick Answer

This study identifies two distinct reasoning failure processes in language models, committed failure and persistent uncertainty, whose token-level signatures reproduce across 23 model-dataset configurations.

Quick Take

The paper separates reasoning failures into two processes with different detection profiles. In committed failure, the model locks onto an incorrect path early, and tokens past the commitment point hurt rather than help failure detection. In persistent uncertainty, uncertainty accumulates throughout the trace, so the full trace is needed to best distinguish failing from successful completions. The framework also indicates when uncertainty signals complement self-consistency and when self-consistency can be selectively skipped.

Key Points

Committed failure occurs when models lock onto incorrect reasoning paths early in the trace.
Persistent uncertainty accumulates throughout the reasoning process, requiring full trace analysis.
The framework's predictions were validated in 20 out of 23 model-dataset configurations.
Knowing which failure mode applies indicates whether a detector should score an early prefix or the full reasoning trace.
The results indicate when uncertainty signals complement self-consistency and when self-consistency can be selectively skipped.

Article Content

From source RSS / original summary

arXiv:2606.06635v1. Abstract: Failures in language model reasoning emerge through distinct processes that leave identifiable signatures in the reasoning trace. We characterize these failures using token-level uncertainty signals, finding they arise through two empirically distinguishable processes. The first is committed failure, in which a model locks onto an incorrect reasoning path early in its trace.

A central diagnostic signature is the commitment point, beyond which considering additional tokens hurts rather than helps failure detection. In the second, persistent uncertainty, uncertainty instead accumulates throughout, and the full trace is needed to best distinguish failing from successful completions. These signatures reproduce across 23 model-dataset configurations, with the framework's falsifiable predictions holding in 20 of 23 cases, well above chance across both failure modes.
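One way to picture the commitment point is a prefix sweep: score each trace by the uncertainty of its first k tokens, measure how well that score separates failing from successful traces, and see where adding tokens stops helping. The sketch below is illustrative only and is not the authors' procedure. The mean-uncertainty score, the AUROC criterion, the function names, and the toy traces are all assumptions made for this example; the abstract does not specify which uncertainty signal or detector the paper uses.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random failing trace (label 1) outscores a random
    successful trace (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one failing and one successful trace")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def prefix_score(uncertainties: Sequence[float], k: int) -> float:
    """Mean per-token uncertainty over the first k tokens (assumed score)."""
    prefix = uncertainties[:k]
    if not prefix:
        raise ValueError("trace must be non-empty and k must be >= 1")
    return sum(prefix) / len(prefix)


def best_prefix_length(
    traces: Sequence[Sequence[float]], labels: Sequence[int], max_len: int
) -> Tuple[int, List[float]]:
    """Sweep k = 1..max_len and return the longest prefix that reaches the
    best AUROC, together with the AUROC at every k."""
    aurocs = [
        auroc([prefix_score(t, k) for t in traces], labels)
        for k in range(1, max_len + 1)
    ]
    best_k = max(range(1, max_len + 1), key=lambda k: (aurocs[k - 1], k))
    return best_k, aurocs


# Toy per-token uncertainties, invented for illustration (not paper data).
labels = [1, 1, 0, 0]  # 1 = failing trace, 0 = successful trace

# Committed-style: failing traces are uncertain early, then confidently wrong.
committed = [
    [0.9, 0.8, 0.1, 0.1, 0.1, 0.1],
    [0.8, 0.9, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.5, 0.5, 0.5, 0.5],
    [0.2, 0.4, 0.4, 0.6, 0.6, 0.6],
]

# Persistent-style: failing traces only pull ahead once the whole trace is seen.
persistent = [
    [0.3, 0.4, 0.4, 0.5, 0.4, 0.6],
    [0.4, 0.4, 0.4, 0.4, 0.4, 0.7],
    [0.6, 0.6, 0.3, 0.3, 0.3, 0.3],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
]

committed_k, committed_aurocs = best_prefix_length(committed, labels, max_len=6)
persistent_k, persistent_aurocs = best_prefix_length(persistent, labels, max_len=6)
print(committed_k, committed_aurocs)
print(persistent_k, persistent_aurocs)
```

On the committed-style toy set, separation is perfect up to a four-token prefix and collapses once later tokens are averaged in; on the persistent-style set, only the full six-token trace separates the classes. Real traces vary in length and noise, so any practical version would need a held-out split and a more careful choice of score.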

Finally, we demonstrate our failure mode framework has direct implications for self-consistency, identifying when uncertainty signals complement it and when it can be selectively skipped. These results offer a foundation for understanding when LLM reasoning failures become detectable and for adapting detection strategies accordingly.
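The idea of selectively skipping self-consistency can be sketched as a simple gate: draw one sample, and only pay for extra samples and a majority vote when that sample looks uncertain. This is a hypothetical policy written for illustration, not the paper's method. The threshold, the mean-uncertainty score, and the `sample_fn` interface are assumptions; the abstract does not state how the skip decision is made.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical sampler interface: each call returns one completion's final
# answer together with its per-token uncertainties.
SampleFn = Callable[[], Tuple[str, List[float]]]


def majority_vote(answers: Sequence[str]) -> str:
    """Most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def gated_self_consistency(
    sample_fn: SampleFn, threshold: float, n_samples: int = 5
) -> Tuple[str, int]:
    """Return (answer, number of samples drawn).

    If the first sample's mean token uncertainty is at or below the
    threshold (an assumed criterion), accept it and skip self-consistency.
    Otherwise draw the remaining samples and take a majority vote.
    """
    answer, uncertainties = sample_fn()
    score = sum(uncertainties) / len(uncertainties)
    if score <= threshold:
        return answer, 1
    answers = [answer] + [sample_fn()[0] for _ in range(n_samples - 1)]
    return majority_vote(answers), n_samples
```

A gate like this trades accuracy for compute depending on the threshold. Given the abstract's description of committed failure, a confidently wrong trace could pass a low-uncertainty gate, which is one reason the choice of signal and prefix matters.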

