Reasoning Can Be Restored by Correcting a Few Decision Tokens

arXiv cs.AI · Changshuo Shen, Leheng Sheng, Yuxin Chen, An Zhang, Xiang Wang

Quick Take

A base LLM can recover much of a reasoning model's advantage when the reasoning model corrects only a few high-disagreement decision tokens during generation.

Key Points

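Base models fail mainly at early, planning-related decision tokens, where their uncertainty is high.
Disagreement-guided token intervention hands single tokens to a reasoning model only at high-disagreement positions.
With a small intervention budget, sparse corrections recover and can even surpass the performance of a same-size reasoning model.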

[Submitted on 16 May 2026]

Abstract: Large reasoning models (LRMs) substantially outperform their base LLM counterparts on challenging reasoning benchmarks, yet it remains poorly understood where base models go wrong during token-by-token generation and how to narrow this gap efficiently. We study the base-reasoning gap by quantifying token-level distributional disagreement between a base model and a stronger reasoning model using likelihood-based divergences. Across benchmarks, we find that the reasoning advantage is highly sparse and concentrates on a small set of early, planning-related decision tokens. For instance, on Qwen3-0.6B, only ~8% of generated tokens account for the salient disagreement, and these tokens concentrate early in the response, are strongly enriched in planning-related decisions (17x), and coincide with high base-model uncertainty -- suggesting that base models fail mainly at early planning points that steer the subsequent reasoning trajectory. Building on these findings, we propose disagreement-guided token intervention, a simple inference-time delegation scheme that performs a one-token takeover by the reasoning model only at high-disagreement positions and immediately switches back to the base model. With a small intervention budget, this sparse delegation substantially recovers and can even surpass the performance of a same-size reasoning model on challenging reasoning tasks. Code is available at this https URL.
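
The delegation scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes KL divergence as the likelihood-based disagreement measure, greedy decoding, a shared vocabulary, and a fixed threshold and budget, none of which the abstract specifies. The abstract also does not say how high-disagreement positions are detected at inference time, so the sketch simply queries both models at every step. All names (kl_divergence, delegated_decode, toy_base, toy_reasoner) and the two toy next-token models are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]              # next-token distribution: token -> probability
Model = Callable[[List[str]], Dist]  # hypothetical interface: prefix -> distribution


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q); tokens missing from q are floored at eps (an assumption)."""
    return sum(pv * math.log(pv / max(q.get(tok, 0.0), eps))
               for tok, pv in p.items() if pv > 0.0)


def greedy(dist: Dist) -> str:
    """Pick the most likely token."""
    return max(dist, key=dist.get)


def delegated_decode(base: Model, reasoner: Model, prompt: List[str],
                     threshold: float, budget: int,
                     max_new_tokens: int = 16,
                     eos: str = "<eos>") -> Tuple[List[str], List[int]]:
    """Decode with the base model; hand over single tokens on high disagreement."""
    tokens = list(prompt)
    takeovers: List[int] = []  # positions where the reasoner chose the token
    for _ in range(max_new_tokens):
        p_base, p_reason = base(tokens), reasoner(tokens)
        gap = kl_divergence(p_reason, p_base)
        if gap > threshold and len(takeovers) < budget:
            takeovers.append(len(tokens))
            next_tok = greedy(p_reason)  # one-token takeover
        else:
            next_tok = greedy(p_base)    # control stays with the base model
        tokens.append(next_tok)
        if next_tok == eos:
            break
    return tokens, takeovers


# Toy models: they differ only at the first (planning) decision.
_CONTINUE = {
    "plan": {"solve": 0.9, "<eos>": 0.1},
    "solve": {"<eos>": 1.0},
    "guess": {"wrong": 0.9, "<eos>": 0.1},
    "wrong": {"<eos>": 1.0},
}


def toy_base(prefix: List[str]) -> Dist:
    if len(prefix) == 1:
        return {"guess": 0.6, "plan": 0.4}
    return _CONTINUE[prefix[-1]]


def toy_reasoner(prefix: List[str]) -> Dist:
    if len(prefix) == 1:
        return {"plan": 0.95, "guess": 0.05}
    return _CONTINUE[prefix[-1]]


with_help = delegated_decode(toy_base, toy_reasoner, ["Q"], threshold=0.5, budget=1)
without_help = delegated_decode(toy_base, toy_reasoner, ["Q"], threshold=0.5, budget=0)
print(with_help)
print(without_help)
```

In this toy setup the two models disagree only on the first generated token, so a budget of one takeover is enough to move the whole trajectory from the "guess" branch to the "plan" branch, which mirrors the abstract's point that a few early decisions steer the rest of the response.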

Comments:	Accepted at ICML 2026
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.16874 [cs.AI]
	(or arXiv:2605.16874v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.16874 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Changshuo Shen [view email]
[v1] Sat, 16 May 2026 08:33:31 UTC (3,170 KB)

— Originally published at arxiv.org
