Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

arXiv cs.AI·Kai Liu, Peijie Dong, Xinchen Xie, Jianfei Gao, Qipeng Guo, Xiaowen Chu, Shaoting Zhang, Kai Chen

6/11/2026

·~1 min·6/11/2026·en·2

Quick Answer

This paper shows that SWARR (Sliding-Window Attention with Reinforced Adaptation for Math Reasoning) enhances mathematical reasoning by adapting self-attention models through supervised fine-tuning and reinforcement learning, significantly narrowing the performance gap between sliding-window and self-attention models.

Quick Take

Experiments show that SWARR recovers accuracy lost during conversion while maintaining linear-complexity efficiency.

Key Points

SWARR consists of supervised fine-tuning and reinforcement learning stages.
SWA models initially underperform compared to self-attention models after fine-tuning.
On-policy reinforcement learning optimizes trajectories to better fit SWA constraints.
Experiments on benchmarks demonstrate improved accuracy for SWARR over traditional methods.
SWARR retains efficiency benefits with linear-complexity attention.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Source Excerpt

arXiv:2606. 11634v1 Announce Type: new Abstract: The rapid progress of reasoning and agentic (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window Attention with Reinforced Adaptation for Math Reasoning), a practical recipe for adapting SWA models to mathematical reasoning.

SWARR has two stages: (1) efficient conversion from a pretrained SA model to SWA with supervised fine-tuning (SFT), which avoids pretraining a new base model, and (2) policy adaptation with reinforcement learning (RL). …

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·Vinil Pasupuleti, Shyalendar Reddy Allala, Siva Rama Krishna Varma Bayyavarapu, Shrey Tyagi, Srinivasateja Songa

4d ago

FeaturedOriginal

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

AI Summary

AINTMA, an autonomous test management architecture utilizing six specialized AI agents, achieves 88.4% test prioritization accuracy and reduces defect escape rates from 8.3% to 2.1%. The system demonstrates a 340% ROI within nine months, showcasing the potential of agentic AI in enhancing software quality management in cloud environments.

#Agent #AI Coding #Security #Enterprise AI

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Powered Agentic System

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for LLM Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Powered Agentic System