Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

arXiv cs.CL·Miao Li, Irina Saparina, Alexander Gurung, Mirella Lapata

5/21/2026


Quick Answer

ProxyCoT is a novel training framework that enhances long-context reasoning in large language models by transferring capabilities from short proxy contexts.

Quick Take

ProxyCoT is a two-stage training framework for long-context reasoning in large language models. It first obtains high-quality chain-of-thought traces on short proxy contexts (subsets of the input that are sufficient to solve the task) through reinforcement learning or distillation from a larger teacher model, and then grounds those traces in the full long contexts with supervised fine-tuning. Across datasets, models trained with ProxyCoT consistently outperform strong baselines with reduced computational overhead, and they generalize their long-context reasoning to out-of-domain tasks.

Key Points

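Recent large language models accept inputs of up to 10 million tokens yet still reason poorly over long contexts; ProxyCoT targets this gap.
The framework obtains high-quality reasoning traces on short proxy contexts and then fine-tunes the model on those traces paired with the full long contexts.
ProxyCoT improves long-context performance while reducing computational overhead.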
Experiments demonstrate consistent outperformance against strong baselines across datasets.
ProxyCoT enables generalization of reasoning capabilities to out-of-domain tasks.



[Submitted on 6 Apr 2026]


Abstract: Recent large language models support inputs of up to 10 million tokens, yet they perform poorly on long-context tasks that require complex reasoning. Such tasks can be solved using only a subset of the input -- a proxy context -- rather than the full sequence. Despite sharing the same underlying reasoning process, models exhibit a significant performance disparity between proxy and full contexts. To improve long-context reasoning, we propose ProxyCoT, a novel training framework that transfers reasoning capabilities from short proxy contexts to full long contexts. Specifically, we first obtain high-quality chain-of-thought reasoning traces on proxy contexts through reinforcement learning or distillation from a larger teacher model, and then ground the generated traces in full long contexts with supervised fine-tuning. Experiments across different datasets demonstrate that ProxyCoT consistently outperforms strong baselines with reduced computational overhead. Furthermore, models trained with ProxyCoT generalize their long-context reasoning capabilities to out-of-domain tasks.
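The abstract describes a two-stage recipe: reasoning traces are produced on a short proxy context, then reused as supervised targets with the full long context as input. The Python sketch below illustrates only that data-construction step, under loudly stated assumptions: the `Example` fields, the `supporting_ids` annotation used to build the proxy context, the prompt format, and the answer-correctness filter are hypothetical choices for illustration, not the paper's implementation. `generate_trace` is a stand-in for either the RL-tuned model or the larger teacher model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Example:
    question: str
    documents: List[str]       # full long context, split into documents
    supporting_ids: List[int]  # hypothetical annotation: documents needed to answer
    answer: str


def format_prompt(docs: List[Tuple[int, str]], question: str) -> str:
    # Keep each document's original index so a trace that cites "[Doc 3]"
    # on the proxy context still points at the same document in the full context.
    context = "\n\n".join(f"[Doc {i}] {text}" for i, text in docs)
    return f"{context}\n\nQuestion: {question}\nReasoning:"


def proxy_docs(ex: Example) -> List[Tuple[int, str]]:
    # Assumed proxy construction: only the supporting documents, in original order.
    return [(i, ex.documents[i]) for i in sorted(set(ex.supporting_ids))]


def full_docs(ex: Example) -> List[Tuple[int, str]]:
    return list(enumerate(ex.documents))


def build_sft_pairs(
    examples: List[Example],
    generate_trace: Callable[[str], str],  # stand-in for the RL-tuned model or teacher
    extract_answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    pairs = []
    for ex in examples:
        # Stage 1: generate a chain-of-thought trace on the short proxy context.
        trace = generate_trace(format_prompt(proxy_docs(ex), ex.question))
        # Assumed quality filter: keep only traces that reach the gold answer.
        if extract_answer(trace).strip() != ex.answer.strip():
            continue
        # Stage 2: pair the same trace with the full long context as an SFT example.
        pairs.append({"prompt": format_prompt(full_docs(ex), ex.question),
                      "target": trace})
    return pairs


def toy_model(prompt: str) -> str:
    # Hypothetical stub that ignores the prompt and returns a fixed trace.
    return "[Doc 1] names Ana; [Doc 3] says Lisbon. Answer: Lisbon"


def parse_answer(trace: str) -> str:
    # Hypothetical parser: the answer is whatever follows the last "Answer:".
    return trace.rsplit("Answer:", 1)[-1]


if __name__ == "__main__":
    ex = Example(
        question="Where was the author of Book X born?",
        documents=["Filler text.", "Book X was written by Ana.",
                   "More filler.", "Ana was born in Lisbon."],
        supporting_ids=[3, 1],
        answer="Lisbon",
    )
    pairs = build_sft_pairs([ex], toy_model, parse_answer)
    print(len(pairs))                         # 1
    print(pairs[0]["prompt"].count("[Doc "))  # 4
```

Keeping the original document indices in both prompts is a deliberate choice in this sketch: a trace that cites a document while reasoning over the proxy context stays valid once the remaining documents are restored in the full context.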

Comments:	Long, ACL 2026 (Main conference)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2605.20201 [cs.CL]
	(or arXiv:2605.20201v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.20201 arXiv-issued DOI via DataCite

Submission history

From: Miao Li [view email]
[v1] Mon, 6 Apr 2026 16:44:17 UTC (887 KB)

Originally published at arxiv.org


