Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

arXiv cs.CL · Shuyu Wei, Jian Sun, Delai Qiu, Yining Wang, Shengping Liu, Jiaen Liang, Ying Fu, Wei Huang, Jitao Sang

5/20/2026 · ~2 min read · en

Quick Take

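Conditional Entropy Shaping (CES) controls token-level entropy during training so that an LLM answers simple problems concisely and explores more deeply on hard ones, improving the trade-off between response length and accuracy.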

Key Points

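Introduces CES, a framework built on DAPO that dynamically controls token-level response entropy.
Improves average accuracy while reducing response length relative to DAPO on 12 mathematical benchmarks.
Penalizes high-entropy "forking point" tokens on correct reasoning paths for conciseness, and rewards them on incorrect paths to encourage exploration and error correction.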

[Submitted on 19 May 2026]

Abstract: Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12 mathematical benchmarks. CES consistently improves average accuracy while reducing response length relative to DAPO, and supplementary experiments show similar trends on a smaller 1.5B backbone and on out-of-domain benchmarks.
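
The conditional bidirectional policy described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the fixed entropy threshold used to pick "forking point" tokens, and the coefficient are all assumptions made for illustration. The abstract also does not say how the signal enters the DAPO objective (as a reward adjustment, an advantage adjustment, or an entropy regularizer), so the sketch stops at a per-token signed term.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_shaping_terms(entropies, is_correct, threshold=1.0, coef=0.1):
    """Per-token signed shaping term (hypothetical helper, not from the paper).

    Tokens whose entropy exceeds `threshold` are treated as "forking points".
    On a correct response they get a negative term (a penalty); on an
    incorrect response they get a positive term (a reward). All other
    tokens get 0.0. Both `threshold` and `coef` are assumed values.
    """
    sign = -1.0 if is_correct else 1.0
    return [sign * coef * h if h > threshold else 0.0 for h in entropies]


# Two toy next-token distributions over a 4-word vocabulary:
# a confident token and a maximally uncertain (uniform) one.
distributions = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
entropies = [token_entropy(d) for d in distributions]

print("entropies:       ", entropies)
print("correct path:    ", entropy_shaping_terms(entropies, is_correct=True))
print("incorrect path:  ", entropy_shaping_terms(entropies, is_correct=False))
```

In this toy setup only the uniform token crosses the threshold, so it alone receives a nonzero term, and the sign of that term flips with the correctness of the response. A real training setup would compute the entropies from the model's logits and might select forking tokens differently, for example by a percentile rather than a fixed threshold.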

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.19358 [cs.CL]
	(or arXiv:2605.19358v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.19358 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shuyu Wei
[v1] Tue, 19 May 2026 04:41:51 UTC (249 KB)

Originally published at arxiv.org
