From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons
Quick Take
The FLUID framework adapts pre-trained autoregressive (AR) language models to the diffusion paradigm, so training can start from GPT-style checkpoints instead of from scratch. It combines two components, Strictly Causal Alignment and Elastic Horizons, and the authors report state-of-the-art text generation performance with training costs reduced by orders of magnitude.
Key Points
- FLUID adapts autoregressive models to diffusion without pre-training a diffusion model from scratch.
- Introduces Strictly Causal Alignment for seamless initialization from GPT checkpoints.
- Elastic Horizons dynamically adjusts denoising strides based on local information density.
- Achieves state-of-the-art performance while reducing training costs by orders of magnitude.
- Code available at https://github.com/Oli-lab-nun/FLUID/tree/main.
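As a rough illustration of the Elastic Horizons point above, the sketch below maps the entropy of a model's predictive distributions to a denoising stride: confident (low-entropy) regions get a larger stride, uncertain ones a smaller stride. This is a minimal sketch under stated assumptions, not the FLUID implementation. The function names, the linear entropy-to-stride mapping, and the `min_stride` / `max_stride` parameters are hypothetical and do not come from the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(position_probs, min_stride=1, max_stride=8):
    """Pick a stride from the mean entropy of a window of positions.

    Assumption (hypothetical, not from the paper): the stride grows
    linearly with confidence, where confidence is one minus the mean
    entropy normalized by the maximum possible entropy, log(vocab size).
    Requires a vocabulary of at least two tokens.
    """
    vocab_size = len(position_probs[0])
    max_entropy = math.log(vocab_size)
    mean_entropy = sum(token_entropy(p) for p in position_probs) / len(position_probs)
    confidence = 1.0 - mean_entropy / max_entropy
    stride = min_stride + round(confidence * (max_stride - min_stride))
    # Clamp to guard against floating-point drift at the extremes.
    return max(min_stride, min(max_stride, stride))


# A fully uncertain window (uniform distribution) gets the smallest stride;
# a fully confident window (one-hot distribution) gets the largest.
uncertain = elastic_stride([[0.25, 0.25, 0.25, 0.25]])
confident = elastic_stride([[1.0, 0.0, 0.0, 0.0]])
```

A fixed schedule would return the same stride for both windows. The entropy-driven version spends more denoising steps where the model is unsure.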
Article Excerpt
From source RSS / original summary
arXiv:2605.27387v1 Announce Type: new
Abstract: Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors, necessitating prohibitive pre-training from scratch. To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm.
By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training. Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules.
Experiments demonstrate that FLUID achieves state-of-the-art performance while reducing training costs by orders of magnitude, effectively reconciling established AR foundations with efficient parallel generation. Our code is available at https://github.com/Oli-lab-nun/FLUID/tree/main.
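The structural mismatch the abstract describes can be made concrete by comparing attention masks. The sketch below builds the standard causal mask used by GPT-style decoders and the full mask used by bidirectional models, then counts the query-key pairs that differ. It illustrates the general mismatch only. How FLUID defines "Strictly Causal Alignment" is not specified in this excerpt, and the function names here are hypothetical.

```python
def causal_mask(n):
    """Standard decoder mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Full mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def mismatched_pairs(n):
    """Count (query, key) pairs allowed bidirectionally but not causally."""
    causal = causal_mask(n)
    full = bidirectional_mask(n)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if full[i][j] and not causal[i][j]
    )


# For a sequence of length 4, the 6 pairs above the diagonal are visible
# to a bidirectional model but never seen by an AR checkpoint in training.
pairs = mismatched_pairs(4)
```

A model that keeps the causal mask during diffusion training avoids these unseen pairs, which is one plausible reason a GPT-style checkpoint could serve directly as initialization.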