AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO
Quick Answer
AdaGRPO is a capability-aware RL algorithm that enhances flow-based GRPO by adaptively selecting prompts and fusing intra-group advantages with global advantages, which improves both performance and training stability.
Quick Take
AdaGRPO is designed as a lightweight, plug-and-play module that can be integrated with existing frameworks such as Flow-GRPO, DanceGRPO, and Flow-CPS. The authors report consistent performance gains and significantly more stable GRPO training for flow models.
Key Points
- AdaGRPO features an Online Curriculum Filtering Strategy for adaptive prompt selection.
- Cross-Level Advantage Fusion integrates intra-group and global advantages for unbiased evaluation.
- Extensive experiments show consistent performance gains in flow models using AdaGRPO.
- AdaGRPO stabilizes GRPO training, addressing critical blind spots in current methods.
- The module is lightweight and easily integrates with existing frameworks.
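To make the first key point concrete, here is a minimal sketch of what capability-aware prompt filtering could look like. It is not the authors' implementation: the `CurriculumFilter` class, the exponential moving average, the `low`/`high` band, and the assumption that rewards are scaled to [0, 1] are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CurriculumFilter:
    """Hypothetical proficiency tracker (not from the paper).

    Keeps an exponential moving average (EMA) of the mean reward per prompt
    and selects prompts whose EMA falls inside a "learning boundary" band.
    Rewards are assumed to lie in [0, 1].
    """
    low: float = 0.2        # below this, the prompt is treated as too hard
    high: float = 0.8       # above this, the prompt is treated as already mastered
    momentum: float = 0.9   # EMA weight given to the previous estimate
    scores: Dict[str, float] = field(default_factory=dict)

    def update(self, prompt: str, rewards: List[float]) -> None:
        """Fold the mean reward of one sampled group into the prompt's EMA."""
        mean_r = sum(rewards) / len(rewards)
        prev = self.scores.get(prompt)
        if prev is None:
            self.scores[prompt] = mean_r
        else:
            self.scores[prompt] = self.momentum * prev + (1 - self.momentum) * mean_r

    def select(self, prompts: List[str]) -> List[str]:
        """Keep unseen prompts plus prompts whose EMA lies in [low, high]."""
        return [
            p for p in prompts
            if p not in self.scores or self.low <= self.scores[p] <= self.high
        ]
```

In this sketch, prompts the model already solves reliably and prompts it cannot yet solve are both skipped, while unseen prompts are always kept so that they can be scored at least once.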
Article Content
arXiv:2606.06828v1
Abstract: Group Relative Policy Optimization (GRPO) has demonstrated remarkable success in aligning text-to-image (T2I) flow models with human preferences.
However, we have identified that the learning loop of current flow-based GRPO is fundamentally decoupled from the learner's current capability, and suffers from critical blind spots in both prompt selection and advantage estimation: (i) Existing methods sample prompts randomly, overlooking the substantial impact of data selection on reinforcement learning (RL) efficacy, a factor proven crucial in GRPO for large language models; (ii) They evaluate sample quality relying solely on intra-group statistics, lacking a global perspective to accurately measure true policy improvement.
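The second blind spot can be illustrated with the usual group-normalized advantage used in GRPO, where each reward is standardized against its own group. The sketch below uses the population standard deviation and a small `eps`; both are assumptions of this example, and the function name is hypothetical.

```python
import statistics
from typing import List


def intra_group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Two groups with very different absolute quality...
weak_group = intra_group_advantages([0.1, 0.2, 0.3])
strong_group = intra_group_advantages([0.7, 0.8, 0.9])
# ...receive (numerically) the same advantages, because only
# within-group differences survive the normalization.
```

Because the group mean is subtracted out, a sample from a uniformly strong group and a sample from a uniformly weak group can receive the same advantage, which is the missing "global perspective" the abstract refers to.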
To address these issues, we propose Adaptive GRPO (AdaGRPO), a novel capability-aware RL algorithm tailored for flow models.
Specifically, AdaGRPO consists of two principal components: (i) Online Curriculum Filtering Strategy: Dynamically tracks the model's proficiency and adaptively selects prompts that best match its current learning boundary; (ii) Cross-Level Advantage Fusion: Synergistically integrates fine-grained intra-group advantages with macro-level global advantages, providing a comprehensive and unbiased policy evaluation.
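The abstract does not give the fusion formula, so the following is only a sketch of one plausible reading of Cross-Level Advantage Fusion: a convex combination of a within-group standardized advantage and an advantage standardized against all rewards in the batch. The function name, the mixing weight `lam`, and the use of batch-level statistics as the "global" reference are assumptions, not details from the paper.

```python
import statistics
from typing import List


def fused_advantages(
    groups: List[List[float]], lam: float = 0.5, eps: float = 1e-6
) -> List[List[float]]:
    """Hypothetical fusion: (1 - lam) * intra-group + lam * global advantage.

    `groups` holds one list of rewards per prompt. The global statistics are
    computed over every reward in the batch (an assumption of this sketch).
    """
    all_rewards = [r for g in groups for r in g]
    g_mean = statistics.fmean(all_rewards)
    g_std = statistics.pstdev(all_rewards)

    fused = []
    for g in groups:
        mean = statistics.fmean(g)
        std = statistics.pstdev(g)
        fused.append([
            (1 - lam) * (r - mean) / (std + eps)      # fine-grained, within group
            + lam * (r - g_mean) / (g_std + eps)      # macro-level, across groups
            for r in g
        ])
    return fused
```

With `lam=0.0` this reduces to purely intra-group advantages. With `lam > 0`, the best sample of a strong group scores higher than the best sample of a weak group, so absolute quality is no longer discarded.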
As a lightweight, plug-and-play module, AdaGRPO can be seamlessly integrated with existing frameworks such as Flow-GRPO, DanceGRPO, and Flow-CPS. Extensive experiments demonstrate that AdaGRPO consistently drives performance gains while significantly stabilizing GRPO training for flow models.