GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Quick Take
GRACE curates reasoning data for post-training by scoring individual reasoning steps with gradient signals, matching full-data performance from a small fraction of the data.
Key Points
- Scores reasoning steps based on alignment with gradients.
- Achieves high performance with significantly reduced data.
- Relies only on the model's internal optimization signals, made scalable by a single-forward-pass gradient proxy.
Abstract: Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, and selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals and no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reaches 108.8% of the full-data performance with 20% of the data and retains 100.2% with only 5%, with subsets that transfer effectively across model backbones.
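The scoring scheme in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes each step already has a gradient estimate as a vector (the paper obtains these via a representation-level proxy), and the names `step_scores`, `sample_value`, `select_subset`, the weighting `beta`, and the mean aggregation are all illustrative choices.

```python
import numpy as np

def step_scores(step_grads, answer_grad, beta=0.5):
    """Score each step by (a) cosine alignment with the answer-oriented
    gradient and (b) consistency with the sum of preceding step gradients.
    step_grads: (num_steps, dim) per-step gradient estimates.
    answer_grad: (dim,) answer-oriented gradient direction.
    beta: illustrative weight between the two signals.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    scores = []
    running = np.zeros_like(answer_grad)  # accumulated trajectory so far
    for t, g in enumerate(step_grads):
        align = cos(g, answer_grad)
        # First step has no preceding trajectory; fall back to alignment.
        consist = cos(g, running) if t > 0 else align
        scores.append(beta * align + (1 - beta) * consist)
        running = running + g
    return scores

def sample_value(step_grads, answer_grad):
    # Aggregate step-level scores into one sample-level value (mean here).
    return float(np.mean(step_scores(step_grads, answer_grad)))

def select_subset(samples, answer_grads, fraction=0.2):
    # Rank samples by value and keep the top fraction for post-training.
    vals = [sample_value(s, g) for s, g in zip(samples, answer_grads)]
    k = max(1, int(len(samples) * fraction))
    order = np.argsort(vals)[::-1]
    return sorted(order[:k].tolist())
```

In this sketch a trace whose steps point along the answer gradient and along each other scores highly, while off-direction or self-contradicting steps drag the sample value down, so subset selection keeps the former.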
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.13130 [cs.AI] |
| (or arXiv:2605.13130v1 [cs.AI] for this version) | |
| DOI: | https://doi.org/10.48550/arXiv.2605.13130 (arXiv-issued via DataCite, registration pending) |
Submission history
From: Junjie Li [view email]
[v1] Wed, 13 May 2026 07:55:39 UTC (708 KB)
— Originally published at arxiv.org