Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models
Quick Take
This study examines how chain-of-thought (CoT) reasoning transfers across models and finds that the underlying mechanism varies by benchmark and receiver. In force-answer mode, AIME transfer is driven largely by explicit answer availability, whereas MMLU-Pro depends more on receiver competence and ZebraLogic on partial structured-answer information. In free-generation mode, partial CoTs improve performance across benchmarks. Cross-model CoT transfer can therefore reflect answer extraction, reasoning scaffolding, or receiver-dependent competence.
Key Points
- Full CoT traces often transfer successfully across models and tasks.
- In force-answer mode, AIME transfer is driven largely by explicit answer availability.
- Receiver competence plays a larger role on the MMLU-Pro benchmark.
- Partial CoTs improve performance in free-generation mode across benchmarks.
- Answer agreement among receivers can signal when to stop provider reasoning.
Article Content
Original abstract (arXiv:2605.28913v1): Large reasoning models (LRMs) often generate extensive chain-of-thought (CoT) traces before producing a final answer. As explicit textual artifacts, these traces can be passed to other models to solve the same task, enabling cross-model reasoning transfer. Yet successful transfer alone does not reveal how the provided CoT contributes to another model's answer.
We study this question with a controlled provider-receiver framework, where a provider generates a reasoning trace and a receiver solves the same problem from increasingly long trace prefixes. We compare force-answer, where the receiver answers directly from the prefix, with free-generation, where it may continue reasoning before answering. Across models and benchmarks, full traces often transfer successfully, but prefix trajectories reveal distinct mechanisms.
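The prefix-based setup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenization, the prompt wording, the exact-match scoring, and the names `trace_prefixes`, `build_prompt`, and `prefix_trajectory` are all assumptions made for illustration, and `receiver` stands in for any function that maps a prompt string to an answer string.

```python
import math
from typing import Callable, List


def trace_prefixes(trace: str, num_steps: int = 5) -> List[str]:
    """Return increasingly long prefixes of a provider trace.

    Assumption: the trace is cut on whitespace tokens at evenly spaced
    fractions of its length; the paper may segment traces differently.
    """
    tokens = trace.split()
    if not tokens or num_steps < 1:
        return []
    prefixes = []
    for step in range(1, num_steps + 1):
        cut = math.ceil(len(tokens) * step / num_steps)
        prefixes.append(" ".join(tokens[:cut]))
    return prefixes


def build_prompt(question: str, prefix: str, mode: str) -> str:
    """Build a receiver prompt for one of the two modes (hypothetical wording)."""
    if mode == "force-answer":
        instruction = "Give only the final answer now."
    elif mode == "free-generation":
        instruction = "Continue the reasoning if needed, then give the final answer."
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"Question: {question}\nPartial reasoning: {prefix}\n{instruction}"


def prefix_trajectory(
    question: str,
    trace: str,
    gold: str,
    receiver: Callable[[str], str],
    mode: str,
    num_steps: int = 5,
) -> List[bool]:
    """Score the receiver (exact match against gold) at each prefix length."""
    return [
        receiver(build_prompt(question, prefix, mode)).strip() == gold
        for prefix in trace_prefixes(trace, num_steps)
    ]
```

Plotting the resulting list of correct/incorrect flags against prefix length gives one prefix trajectory; comparing trajectories between the two modes is what lets answer extraction be told apart from reasoning scaffolding.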
In force-answer mode, AIME transfer is largely driven by explicit answer availability. MMLU-Pro instead reflects a larger role for receiver competence, while ZebraLogic depends on partial structured-answer information rather than complete-answer leakage alone. In free-generation mode, partial CoTs improve performance across benchmarks, indicating that prefixes can guide continued reasoning. Finally, answer agreement among receivers provides a gold-free signal for stopping provider reasoning early.
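The gold-free stopping signal mentioned above can be sketched as follows. The abstract does not specify how agreement is measured, so this example assumes a simple majority-fraction rule with a hypothetical `threshold` parameter; the function names are illustrative only.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def agreement(answers: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the most common answer and the fraction of receivers giving it."""
    if not answers:
        return None, 0.0
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


def first_stop_index(
    answers_per_prefix: Sequence[Sequence[str]],
    threshold: float = 1.0,
) -> Optional[int]:
    """Return the index of the first prefix where receivers agree enough.

    Assumption: stop once the majority fraction reaches `threshold`.
    No gold label is consulted, only the receivers' own answers.
    """
    for index, answers in enumerate(answers_per_prefix):
        _, fraction = agreement(answers)
        if fraction >= threshold:
            return index
    return None
```

For example, with three receivers answering `["3", "7", "12"]`, then `["7", "7", "12"]`, then `["7", "7", "7"]` at successive prefixes, a threshold of 1.0 stops at the third prefix, while a looser threshold of 0.6 stops at the second.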
Overall, cross-model CoT transfer is not a single phenomenon: it can reflect answer extraction, reasoning scaffolding, or receiver-dependent competence.