Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

arXiv cs.CL·Xinyuan Cheng, Beiduo Chen, Philipp Mondorf, Barbara Plank

5/29/2026

Quick Answer

This study explores how chain-of-thought (CoT) reasoning can be transferred across models, revealing that transfer mechanisms vary by model and task.

Quick Take

In force-answer mode, where the receiver must answer directly from a trace prefix, the driver of transfer depends on the benchmark: on AIME it is largely explicit answer availability. In free-generation mode, where the receiver may keep reasoning before answering, partial CoTs improve performance across benchmarks. Cross-model CoT transfer therefore reflects several distinct processes: answer extraction, reasoning scaffolding, and receiver-dependent competence.

Key Points

Full CoT traces often transfer successfully across models and tasks.
In force-answer mode, AIME transfer is driven largely by explicit answer availability in the trace.
On other benchmarks, receiver competence or partial structured-answer information (ZebraLogic) plays a larger role.
Partial CoTs improve performance in free-generation mode across benchmarks.
Answer agreement among receivers can signal when to stop provider reasoning.

Article Content

From source RSS / original summary

arXiv:2605.28913v1. Abstract: Large reasoning models (LRMs) often generate extensive chain-of-thought (CoT) traces before producing a final answer. As explicit textual artifacts, these traces can be passed to other models to solve the same task, enabling cross-model reasoning transfer. Yet successful transfer alone does not reveal how the provided CoT contributes to another model's answer.

We study this question with a controlled provider-receiver framework, where a provider generates a reasoning trace and a receiver solves the same problem from increasingly longer trace prefixes. We compare force-answer, where the receiver answers directly from the prefix, with free-generation, where it may continue reasoning before answering. Across models and benchmarks, full traces often transfer successfully, but prefix trajectories reveal distinct mechanisms.
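The prefix setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the whitespace-based prefix cuts, and the prompt wording are all assumptions made for the example, and the paper may segment traces and phrase prompts differently.

```python
from typing import List


def trace_prefixes(trace: str, num_steps: int = 4) -> List[str]:
    """Return increasingly longer prefixes of a provider trace.

    Assumption: prefixes are cut at whitespace-token boundaries in
    equal fractions; the last prefix covers every token of the trace.
    """
    tokens = trace.split()
    prefixes = []
    for step in range(1, num_steps + 1):
        cut = len(tokens) * step // num_steps
        prefixes.append(" ".join(tokens[:cut]))
    return prefixes


def build_receiver_prompt(question: str, prefix: str, mode: str) -> str:
    """Build a receiver prompt for one of the two modes (hypothetical wording)."""
    if mode == "force-answer":
        # The receiver must answer directly from the prefix.
        instruction = "Give only the final answer now, with no further reasoning."
    elif mode == "free-generation":
        # The receiver may continue reasoning before answering.
        instruction = "Continue the reasoning if needed, then give the final answer."
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return (
        f"Question: {question}\n\n"
        f"Partial reasoning from another model:\n{prefix}\n\n"
        f"{instruction}"
    )
```

Running each prefix through both modes gives a per-prefix accuracy trajectory for each receiver, which is the kind of curve the abstract uses to separate answer extraction from reasoning scaffolding.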

In force-answer mode, AIME transfer is largely driven by explicit answer availability. [Benchmark name missing in source] instead reflects a larger role for receiver competence, while ZebraLogic depends on partial structured-answer information rather than complete-answer leakage alone. In free-generation mode, partial CoTs improve performance across benchmarks, indicating that prefixes can guide continued reasoning. Finally, answer agreement among receivers provides a gold-free signal for stopping provider reasoning early.
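One way to picture the gold-free stopping signal is a simple agreement check over receiver answers at each prefix. The sketch below is an assumption-laden illustration: the majority-fraction measure, the `threshold` value, and both function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def agreement(answers: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the most common answer and the fraction of receivers giving it."""
    if not answers:
        return None, 0.0
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def first_stop_index(
    answers_per_prefix: Sequence[Sequence[str]],
    threshold: float = 0.8,  # hypothetical cutoff, not a value from the paper
) -> Optional[int]:
    """Return the index of the first prefix where agreement reaches the threshold.

    No gold answer is consulted: the decision uses only how strongly the
    receivers agree with each other. Returns None if the threshold is
    never reached.
    """
    for index, answers in enumerate(answers_per_prefix):
        _, fraction = agreement(answers)
        if fraction >= threshold:
            return index
    return None
```

With three receivers, for example, a prefix where all three return the same answer would trigger a stop at the default threshold, while a two-to-one split would not.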

Overall, cross-model CoT transfer is not a single phenomenon: it can reflect answer extraction, reasoning scaffolding, or receiver-dependent competence.
