When to Think Deeply: Inhibitory Deliberation for LLM Reasoning

arXiv cs.CL·Zhixuan He, Yue Feng

6/8/2026

Quick Take

The IDPR framework generates a fast answer first and invokes slow deliberation only when an inhibition controller suppresses that answer. On a held-out 5,000-example math test set, it raises accuracy from 47.90% to 48.92% while calling slow reasoning on only 8.20% of examples. Under the same slow-call budget, it outperforms both random routing and the strongest confidence-based baseline.

Key Points

IDPR uses an inhibition controller to decide whether a fast answer is released or suppressed in favor of slow reasoning.
IDPR invoked slow reasoning on only 8.20% of examples, limiting computational cost.
Accuracy improved from 47.90% to 48.92% on a held-out 5,000-example math test set.
Under the same slow-call budget, random routing decreased accuracy to 46.76%, and the strongest confidence-based baseline reached 48.22%.
IDPR achieved the highest corrective precision, meaning it better identifies fast answers that benefit from slow reasoning.
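
To put the reported percentages in concrete terms, the short calculation below converts them into example counts on the 5,000-example test set. It is only arithmetic on the figures quoted above and assumes the percentages are exact over 5,000 examples; the variable names are illustrative.

```python
# Convert the reported percentages into example counts (5,000-example test set).
# Percentages are given in hundredths of a percent to keep the arithmetic exact.
N = 5000

def count(pct_hundredths: int) -> int:
    """Number of examples corresponding to a percentage like 47.90 -> 4790."""
    return N * pct_hundredths // 10_000

slow_calls = count(820)        # examples routed to slow reasoning (8.20%)
fast_only_correct = count(4790)  # correct answers with fast reasoning only
idpr_correct = count(4892)       # correct answers with IDPR
random_correct = count(4676)     # correct answers with random routing

net_gain_idpr = idpr_correct - fast_only_correct
net_change_random = random_correct - fast_only_correct

print(slow_calls, net_gain_idpr, net_change_random)
```

These are net changes: a slow call can fix a wrong fast answer but may also replace a correct one, which would explain how random routing at the same budget ends up below the fast-only accuracy.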

Article Content

From source RSS / original summary

arXiv:2606.06745v1

Abstract: Reasoning Large Language Models can improve problem-solving performance through deliberative inference, but invoking slow reasoning for every input is computationally expensive and often unnecessary. We propose IDPR, a framework for response-conditioned inhibitory deliberation. IDPR first generates a concise intuitive answer and then uses an inhibition controller to decide whether that specific response should be released or suppressed in favor of slow reasoning.
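
The fast-then-inhibit flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `FastResult`, `answer_with_inhibition`, the solver callables, and the toy stubs are all hypothetical names, and the scoring function here is a stand-in for whatever the trained controller computes.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class FastResult:
    answer: str
    confidence: float  # hypothetical fast-side signal in [0, 1]

def answer_with_inhibition(
    question: str,
    fast_solver: Callable[[str], FastResult],
    slow_solver: Callable[[str], str],
    inhibition_score: Callable[[str, FastResult], float],
    threshold: float,
) -> Tuple[str, bool]:
    """Return (final_answer, used_slow_reasoning)."""
    fast = fast_solver(question)            # always produce the fast answer first
    score = inhibition_score(question, fast)  # controller sees the response itself
    if score >= threshold:
        # Suppress the fast answer and fall back to slow reasoning.
        return slow_solver(question), True
    # Release the fast answer as-is.
    return fast.answer, False

# Toy stand-ins (assumptions for illustration only).
def toy_fast(q: str) -> FastResult:
    return FastResult(answer="4", confidence=0.9 if q == "2+2" else 0.2)

def toy_slow(q: str) -> str:
    return "slow:" + q

def toy_score(q: str, fast: FastResult) -> float:
    return 1.0 - fast.confidence

easy = answer_with_inhibition("2+2", toy_fast, toy_slow, toy_score, threshold=0.5)
hard = answer_with_inhibition("17*23", toy_fast, toy_slow, toy_score, threshold=0.5)
```

The point of the structure is that the decision happens after the fast answer exists, so the controller can condition on that specific response and not only on the input.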

Unlike input-only routers, the inhibition controller conditions on the fast answer and fast-side evidence, including confidence, logit margin, parseability, and generation cost. We train the controller from paired fast-slow outcomes and select the inhibition threshold on a held-out validation set under an accuracy-first slow-call budget. On a held-out 5,000-example mathematical reasoning test set, IDPR invokes slow reasoning on only 8.20% of examples and improves accuracy from 47.90% to 48.92%.
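
One plausible reading of threshold selection "under an accuracy-first slow-call budget" is sketched below: among thresholds whose slow-call rate stays within the budget, pick the one with the highest validation accuracy, breaking ties in favor of fewer slow calls. The abstract does not spell out the procedure, so this interpretation, the function `select_threshold`, and the toy data are assumptions. The controller scores are taken as given (for example, produced from confidence, logit margin, parseability, and generation cost).

```python
from typing import List, Tuple

def select_threshold(
    scores: List[float],        # controller score per validation example
    fast_correct: List[bool],   # was the fast answer correct?
    slow_correct: List[bool],   # would the slow answer be correct?
    budget: float,              # maximum allowed fraction of slow calls
) -> Tuple[float, float, float]:
    """Return (threshold, accuracy, slow_rate) under the assumed rule."""
    n = len(scores)
    # Candidate thresholds: every observed score, plus "never call slow".
    candidates = sorted(set(scores)) + [float("inf")]
    best_key, best = None, None
    for t in candidates:
        routed = [s >= t for s in scores]
        slow_rate = sum(routed) / n
        if slow_rate > budget:
            continue  # violates the slow-call budget
        correct = sum(
            sc if r else fc
            for r, fc, sc in zip(routed, fast_correct, slow_correct)
        )
        accuracy = correct / n
        key = (accuracy, -slow_rate)  # accuracy first, then fewer slow calls
        if best_key is None or key > best_key:
            best_key, best = key, (t, accuracy, slow_rate)
    return best

# Toy paired fast-slow outcomes (illustrative, not from the paper).
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
fast_ok = [False, True, True, True, False]
slow_ok = [True, True, True, True, True]
threshold, accuracy, slow_rate = select_threshold(scores, fast_ok, slow_ok, budget=0.4)
```

In this toy data, routing the second example to slow reasoning gains nothing because its fast answer is already correct, so the tie-break prefers the higher threshold that spends fewer slow calls.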

Under the same slow-call budget, random routing decreases accuracy to 46.76%, while the strongest confidence-based baseline reaches 48.22%. IDPR also achieves the highest corrective precision, showing that response-conditioned inhibition better identifies fast answers that benefit from slow reasoning.
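
The abstract does not define corrective precision. One natural reading, assumed here, is the fraction of slow calls that actually turn a wrong fast answer into a correct one. The sketch below computes that quantity; the function name and toy data are hypothetical.

```python
from typing import List

def corrective_precision(
    routed: List[bool],        # was the example sent to slow reasoning?
    fast_correct: List[bool],  # was the fast answer correct?
    slow_correct: List[bool],  # was the slow answer correct?
) -> float:
    """Assumed definition: share of slow calls that fix a wrong fast answer."""
    n_routed = sum(routed)
    if n_routed == 0:
        return 0.0  # no slow calls, so nothing was corrected
    corrected = sum(
        1
        for r, fc, sc in zip(routed, fast_correct, slow_correct)
        if r and not fc and sc
    )
    return corrected / n_routed

# Toy example: three slow calls, only the first fixes a wrong fast answer.
routed = [True, True, False, True]
fast_ok = [False, True, False, False]
slow_ok = [True, True, True, False]
cp = corrective_precision(routed, fast_ok, slow_ok)
```

Under this reading, a router with high corrective precision wastes few slow calls on answers that were already right or that slow reasoning cannot fix.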
