When to Think Deeply: Inhibitory Deliberation for LLM Reasoning
Quick Answer
The IDPR framework improves Large Language Model reasoning by selectively invoking slow deliberation. On a held-out 5,000-example mathematical reasoning test set, it raises accuracy from 47.90% to 48.92% while invoking slow reasoning on only 8.20% of examples.
Quick Take
Under the same slow-call budget, IDPR outperforms both random routing (46.76% accuracy) and the strongest confidence-based baseline (48.22%) at identifying which fast answers should be replaced by deeper reasoning.
Key Points
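- IDPR first generates a concise fast answer, then uses an inhibition controller to decide whether to release it or suppress it in favor of slow reasoning.
- Slow reasoning was invoked on only 8.20% of examples, limiting computational cost.
- Accuracy improved from 47.90% to 48.92% on a held-out 5,000-example mathematical reasoning test set.
- Under the same slow-call budget, random routing decreased accuracy to 46.76%, and the strongest confidence-based baseline reached 48.22%.
- IDPR achieved the highest corrective precision, meaning it better identifies fast answers that benefit from slow reasoning.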
Article Content
From source RSS / original summary (arXiv:2606.06745v1). Abstract: Reasoning Large Language Models can improve problem-solving performance through deliberative inference, but invoking slow reasoning for every input is computationally expensive and often unnecessary. We propose IDPR, a framework for response-conditioned inhibitory deliberation. IDPR first generates a concise intuitive answer and then uses an inhibition controller to decide whether that specific response should be released or suppressed in favor of slow reasoning.
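The release-or-suppress step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `idpr_route`, `fast_answer`, `slow_answer`, `inhibition_score`, the threshold value, and the toy stand-ins are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    answer: str
    used_slow: bool


def idpr_route(
    question: str,
    fast_answer: Callable[[str], str],
    slow_answer: Callable[[str], str],
    inhibition_score: Callable[[str, str], float],
    threshold: float,
) -> Decision:
    """Generate a fast answer, then release it or suppress it for slow reasoning."""
    fast = fast_answer(question)
    # The controller sees the question AND the fast response,
    # which is what makes the decision response-conditioned.
    score = inhibition_score(question, fast)
    if score >= threshold:
        # Suppress the fast answer and pay for slow reasoning.
        return Decision(answer=slow_answer(question), used_slow=True)
    # Release the fast answer unchanged.
    return Decision(answer=fast, used_slow=False)


# Toy stand-ins (assumptions for illustration only).
def toy_fast(question: str) -> str:
    return "4" if question == "2+2" else "?"


def toy_slow(question: str) -> str:
    return "56"


def toy_score(question: str, answer: str) -> float:
    # Pretend an unparseable fast answer is a strong signal to inhibit.
    return 0.9 if answer == "?" else 0.1


easy = idpr_route("2+2", toy_fast, toy_slow, toy_score, threshold=0.5)
hard = idpr_route("7*8", toy_fast, toy_slow, toy_score, threshold=0.5)
```

In this toy run the first question keeps its fast answer and the second is routed to the slow path. A real controller would be a trained model, not a hand-written rule.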
Unlike input-only routers, the inhibition controller conditions on the fast answer and fast-side evidence, including confidence, logit margin, parseability, and generation cost. We train the controller from paired fast-slow outcomes and select the inhibition threshold on a held-out validation set under an accuracy-first slow-call budget. On a held-out 5,000-example mathematical reasoning test set, IDPR invokes slow reasoning on only 8.20% of examples and improves accuracy from 47.90% to 48.92%.
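One plausible reading of threshold selection under a slow-call budget is sketched below. The abstract does not spell out the procedure, so this is an assumption: among thresholds whose validation slow-call rate stays within the budget, pick the one with the highest validation accuracy, preferring fewer slow calls on ties. The function name `select_threshold` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def select_threshold(
    scores: Sequence[float],
    fast_correct: Sequence[bool],
    slow_correct: Sequence[bool],
    budget: float,
) -> Tuple[float, float, float]:
    """Return (threshold, accuracy, slow_rate) measured on validation data.

    An example is routed to slow reasoning when its score >= threshold.
    """
    n = len(scores)
    # Baseline: never inhibit, so every fast answer is released.
    best = (float("inf"), sum(fast_correct) / n, 0.0)
    # Scan thresholds from high to low; each step routes more examples.
    for t in sorted(set(scores), reverse=True):
        routed = [s >= t for s in scores]
        slow_rate = sum(routed) / n
        if slow_rate > budget:
            break  # lower thresholds would only exceed the budget further
        correct = sum(
            sc if r else fc
            for r, fc, sc in zip(routed, fast_correct, slow_correct)
        )
        acc = correct / n
        # Strict '>' keeps the higher threshold (fewer slow calls) on ties.
        if acc > best[1]:
            best = (t, acc, slow_rate)
    return best


# Toy paired fast/slow outcomes (made up for illustration).
val_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
val_fast_ok = [False, True, False, True, True]
val_slow_ok = [True, False, True, True, True]

chosen = select_threshold(val_scores, val_fast_ok, val_slow_ok, budget=0.4)
```

On the toy data, routing only the highest-scoring example fixes one fast error, while lowering the threshold further would replace a correct fast answer with a wrong slow one. The sketch therefore settles on the higher threshold.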
Under the same slow-call budget, random routing decreases accuracy to 46.76%, while the strongest confidence-based baseline reaches 48.22%. IDPR also achieves the highest corrective precision, showing that response-conditioned inhibition better identifies fast answers that benefit from slow reasoning.
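To make the reported percentages concrete, the short calculation below converts them into example counts on the 5,000-example test set. It uses only figures quoted in the abstract; the variable names are illustrative, and the per-example corrective precision cannot be recovered from these aggregates.

```python
N = 5000  # size of the held-out test set


def count(pct: float) -> int:
    """Convert a percentage of the test set into a whole number of examples."""
    return round(N * pct / 100)


fast_only = count(47.90)        # all fast answers released
idpr = count(48.92)             # IDPR routing
random_routing = count(46.76)   # random routing, same slow-call budget
confidence = count(48.22)       # strongest confidence-based baseline
slow_calls = count(8.20)        # examples sent to slow reasoning by IDPR

# Net change in correct answers relative to releasing every fast answer.
net_idpr = idpr - fast_only
net_random = random_routing - fast_only
net_confidence = confidence - fast_only

print(slow_calls, net_idpr, net_random, net_confidence)
```

Read this way, IDPR's 410 slow calls yield a net gain of 51 correct answers, the confidence baseline nets 16, and random routing loses 57. A net figure is corrections minus regressions, so it does not by itself say how many individual fast answers were fixed.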