From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

arXiv cs.CL·Xiangyu Ma, Teng Xiao, Zuchao Li, Lefei Zhang

5/28/2026


Quick Answer

This paper introduces FLUID, a framework that adapts autoregressive (AR) models to the diffusion paradigm, allowing initialization from GPT-style checkpoints and significantly reducing training costs.

Quick Take

FLUID adapts pre-trained autoregressive backbones to diffusion-style text generation. Strictly Causal Alignment lets the model initialize from standard GPT-style checkpoints, and Elastic Horizons, an entropy-driven mechanism, adjusts denoising strides according to local information density. The authors report state-of-the-art performance at a much lower training cost, reconciling established AR foundations with efficient parallel generation.

Key Points

FLUID enables efficient adaptation of autoregressive models to diffusion frameworks.
Strictly Causal Alignment allows initialization from existing GPT-style checkpoints.
Elastic Horizons dynamically adjusts denoising strides based on local information density.
The authors report that FLUID reduces training costs by orders of magnitude by avoiding diffusion pre-training from scratch.
Experiments show FLUID achieves state-of-the-art performance in text generation.
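
The summary describes Elastic Horizons only at a high level, so the following is a minimal sketch of what an entropy-driven stride rule could look like, not the paper's actual algorithm. The function names, the entropy threshold, and the stride cap are all assumptions made for illustration: positions are committed together as long as the predicted distribution at each one has low entropy, and the stride stops at the first uncertain position.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(position_probs, threshold=0.5, max_stride=8):
    """Hypothetical stride rule: count leading low-entropy positions.

    position_probs: list of per-position probability vectors, in order.
    threshold and max_stride are illustrative values, not from the paper.
    Always returns at least 1 so that decoding makes progress.
    """
    stride = 0
    for probs in position_probs[:max_stride]:
        if token_entropy(probs) > threshold:
            break  # first uncertain position ends the stride
        stride += 1
    return max(stride, 1)


confident = [0.97, 0.01, 0.01, 0.01]  # entropy is about 0.17 nats
uncertain = [0.25, 0.25, 0.25, 0.25]  # entropy is ln(4), about 1.39 nats

# Two confident positions, then an uncertain one: stride is 2.
print(elastic_stride([confident, confident, uncertain, confident]))  # prints 2
```

Under this rule, predictable spans are denoised in larger steps while information-dense spans fall back to a stride of 1, which is the qualitative behavior the key points attribute to Elastic Horizons.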


Article Excerpt

From source RSS / original summary

arXiv:2605.27387v1. Abstract: Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors, necessitating prohibitive pre-training from scratch. To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm.

By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training. Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules.

Experiments demonstrate that FLUID achieves state-of-the-art performance while reducing training costs by orders of magnitude, effectively reconciling established AR foundations with efficient parallel generation. Our code is available at https://github.com/Oli-lab-nun/FLUID/tree/main.
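
The structural mismatch the abstract mentions comes down to the attention mask. The sketch below shows the two generic mask shapes only; it is not FLUID's Strictly Causal Alignment procedure, which the abstract does not detail, and the helper names are hypothetical.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def show(mask):
    """Render a mask as rows of 1s and 0s for quick inspection."""
    return "\n".join(" ".join("1" if cell else "0" for cell in row) for row in mask)


print("causal (GPT-style):")
print(show(causal_mask(4)))
print("bidirectional (typical diffusion LM):")
print(show(bidirectional_mask(4)))
```

A GPT-style checkpoint is trained under the lower-triangular mask, so a diffusion model that keeps this mask can reuse the checkpoint's weights directly, whereas a bidirectional one exposes the model to attention patterns it never saw during pre-training.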


