LoRi: Low-Rank Distillation for Implicit Reasoning
Quick Answer
LoRi introduces a low-rank distillation framework for implicit reasoning in large language models like LLaMA and Qwen, enhancing performance on multi-step tasks.
Quick Take
LoRi aligns teacher and student reasoning trajectories in a shared low-rank tensor subspace. On mathematical reasoning benchmarks, it comes close to the accuracy of explicit chain-of-thought prompting and outperforms previous implicit chain-of-thought (iCoT) distillation methods.
Key Points
- LoRi aligns teacher and student reasoning trajectories in a shared low-rank tensor subspace.
- The framework captures global reasoning structure while enabling a compact latent process.
- Evaluated on LLaMA and Qwen, it shows consistent improvement on mathematical reasoning tasks.
- Performance approaches explicit chain-of-thought accuracy, especially on challenging multi-step tasks.
- Outperforms previous implicit chain-of-thought distillation methods across multiple model families.
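To make the idea of a "shared low-rank subspace" concrete, the sketch below builds one basis from several hidden-state trajectories with a truncated SVD and reports how much of their variance that basis captures. It is a minimal illustration under stated assumptions, not the authors' procedure: the summary describes a tensor subspace, whereas this uses a plain matrix factorization, and the function name `shared_basis`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np

def shared_basis(trajectories, rank):
    """Return a (d, rank) orthonormal basis and the variance fraction it captures.

    trajectories: list of (steps, d) arrays of hidden states (hypothetical layout).
    The basis is the top-`rank` right singular vectors of the stacked,
    mean-centered states. This is a matrix simplification, assumed for illustration.
    """
    stacked = np.concatenate(trajectories, axis=0)
    centered = stacked - stacked.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    energy = float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))
    return vt[:rank].T, energy

# Synthetic stand-in for hidden states: 2 latent factors mixed into 32 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 32))
teacher_states = latent[:30] @ mixing   # longer "teacher" trajectory
student_states = latent[30:] @ mixing   # shorter "student" trajectory
basis, energy = shared_basis([teacher_states, student_states], rank=2)
```

If real hidden states are approximately low-rank, as the paper reports, `energy` stays close to 1 for a `rank` much smaller than the hidden dimension, which is what makes a compact shared subspace usable for both models.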
Article Excerpt
From source RSS / original summary: arXiv:2606.05315v1. Announce Type: new. Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure. Motivated by this observation, we propose a low-rank distillation framework that transfers reasoning by aligning teacher and student trajectories in a shared low-rank tensor subspace using first- and second-order statistics.
The resulting formulation captures the global structure of reasoning while supporting a compact latent reasoning process. We evaluate the method across multiple model families, including LLaMA and Qwen, at different scales on mathematical reasoning benchmarks. Our approach consistently improves performance, especially on challenging multi-step tasks, approaching explicit CoT accuracy and outperforming prior iCoT distillation methods.
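The abstract does not spell out the loss, so the following is only a sketch of what aligning trajectories "using first- and second-order statistics" could look like: both trajectories are projected onto the same low-rank basis, and the gap between their means and covariances is penalized. The function name `moment_alignment_loss`, the equal weighting of the two terms, and the matrix (rather than tensor) projection are assumptions made for illustration.

```python
import numpy as np

def moment_alignment_loss(teacher, student, basis):
    """Squared gap between projected means plus squared gap between covariances.

    teacher: (T_t, d) hidden states, one row per explicit reasoning step
    student: (T_s, d) hidden states, one row per latent step (T_s may differ)
    basis:   (d, r) matrix with orthonormal columns, r much smaller than d
    All names and shapes are hypothetical; each trajectory needs >= 2 steps.
    """
    zt = teacher @ basis                      # (T_t, r) projected teacher states
    zs = student @ basis                      # (T_s, r) projected student states
    mean_gap = zt.mean(axis=0) - zs.mean(axis=0)
    cov_t = np.atleast_2d(np.cov(zt, rowvar=False))
    cov_s = np.atleast_2d(np.cov(zs, rowvar=False))
    return float(mean_gap @ mean_gap + np.sum((cov_t - cov_s) ** 2))

# Random stand-ins for hidden states and an orthonormal basis from a QR factorization.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(16, 3)))
teacher = rng.normal(size=(12, 16))
student = rng.normal(size=(4, 16))            # fewer steps than the teacher
loss = moment_alignment_loss(teacher, student, basis)
```

Because the statistics are pooled over steps, the teacher and student do not need the same number of steps, which is one way a loss of this kind could accommodate a more compact latent reasoning process.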