SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning
Quick Take
SLAP is a batch-aware data selection framework for instruction tuning large language models. On LLaMA and ChatGLM, the authors report that it matches or exceeds full-dataset training while using 20-40% less training data. This cuts compute costs across tasks such as multi-turn dialogue, multilingual translation, and question answering.
Key Points
- SLAP uses distribution-aware stratified sampling for comprehensive data coverage.
- It maximizes intra-batch diversity through relative distance optimization.
- Evaluated on LLaMA and ChatGLM across multi-turn dialogue, multilingual translation, and question answering.
- Outperforms existing state-of-the-art methods by selecting batches dynamically with Hessian-approximated gradient information.
- Uses 20-40% less training data than full-dataset training while maintaining or improving performance, reducing computational cost.
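The "stratified loss-based pruning" in the title can be illustrated with a minimal sketch. The summary does not say how strata are formed, so we assume they are equal-size buckets of per-example training loss. We also assume the same fraction is kept from each bucket, so the pruned set preserves the overall loss distribution. `stratify_by_loss` and `stratified_prune` are hypothetical helpers, not SLAP's code.

```python
import random
from typing import List, Sequence


def stratify_by_loss(losses: Sequence[float], num_strata: int) -> List[List[int]]:
    """Split example indices into `num_strata` equal-size buckets ordered by loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    strata: List[List[int]] = []
    for s in range(num_strata):
        lo = s * len(order) // num_strata
        hi = (s + 1) * len(order) // num_strata
        strata.append(order[lo:hi])
    return strata


def stratified_prune(losses: Sequence[float], keep_fraction: float,
                     num_strata: int = 4, seed: int = 0) -> List[int]:
    """Keep roughly `keep_fraction` of every loss stratum (hypothetical helper).

    Sampling the same fraction from low-, mid-, and high-loss buckets keeps
    the pruned subset's loss distribution close to the full dataset's.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)
    kept: List[int] = []
    for stratum in stratify_by_loss(losses, num_strata):
        # At least one example per non-empty stratum, so no region is dropped.
        k = max(1, round(keep_fraction * len(stratum))) if stratum else 0
        kept.extend(rng.sample(stratum, k))
    return sorted(kept)
```

For example, with 20 examples, four strata and `keep_fraction=0.6`, three examples are kept from each stratum (12 in total). A uniform random 60% sample could over- or under-represent the high-loss tail.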
Article Content
arXiv:2605.23969v1 (Announce Type: new)
Abstract: Instruction tuning has improved the specialized capabilities of large language models (LLMs), but it often requires extensive datasets and long training times. The challenge is to build specific capabilities by identifying useful data and fine-tuning on it efficiently. High-quality, diverse pruned data can help models reach the same performance as full-data training at a lower cost.
In this paper, we propose SLAP, a novel batch-aware data selection framework that evaluates the learnability of entire batch compositions rather than of individual samples. SLAP ensures comprehensive coverage of the data distribution through distribution-aware stratified sampling, while maximizing intra-batch diversity through relative distance optimization.
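The abstract does not define "relative distance optimization." As a stand-in, the sketch below uses greedy farthest-point (k-center) selection over example embeddings. This is a standard way to compose a batch whose members are far apart from one another. The embeddings, the Euclidean metric, and `greedy_diverse_batch` are all assumptions, not SLAP's formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_diverse_batch(embeddings: Sequence[Vector], batch_size: int,
                         start: int = 0) -> List[int]:
    """Greedy farthest-point selection (hypothetical helper).

    Repeatedly adds the candidate whose nearest already-chosen example is
    farthest away, which spreads the batch across embedding space.
    """
    n = len(embeddings)
    if not 0 < batch_size <= n:
        raise ValueError("batch_size must be in [1, len(embeddings)]")
    chosen = [start]
    # min_dist[i]: distance from example i to its closest chosen example.
    min_dist = [euclidean(embeddings[i], embeddings[start]) for i in range(n)]
    while len(chosen) < batch_size:
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Refresh nearest-chosen distances now that `nxt` is in the batch.
        for i in range(n):
            min_dist[i] = min(min_dist[i], euclidean(embeddings[i], embeddings[nxt]))
    return chosen
```

Consider two tight pairs of points, `(0, 0)`/`(0.1, 0)` and `(10, 0)`/`(10.1, 0)`, plus an outlier at `(0, 10)`. A batch of three picks one point from each pair and the outlier (indices `[0, 3, 4]`), never both members of a near-duplicate pair.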
By leveraging Hessian-approximated gradient information for dynamic batch selection, SLAP significantly outperforms existing state-of-the-art methods across multiple model architectures (LLaMA, ChatGLM) and diverse downstream tasks including multi-turn dialogue, multilingual translation, and question answering.
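One common way to use second-order information for batch selection is a Taylor estimate. It predicts how much a single SGD step on a candidate batch B would change a target loss: change ≈ -lr · (g_target · g_B) + (lr²/2) · Σ_i h_i g_B,i², where h is a diagonal Hessian approximation. The sketch below assumes that h comes from the empirical Fisher (the mean of squared per-example gradients). It also assumes that g_target is a gradient on some reference set. The abstract does not specify SLAP's actual approximation or scoring rule, and every function name here is hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def diag_fisher(per_example_grads: Sequence[Vector]) -> List[float]:
    """Empirical-Fisher diagonal: a cheap stand-in for the Hessian diagonal."""
    return mean_vector([[g * g for g in grad] for grad in per_example_grads])


def predicted_loss_change(batch_grad: Vector, target_grad: Vector,
                          hess_diag: Vector, lr: float) -> float:
    """Second-order Taylor estimate of the target-loss change after one
    SGD step theta <- theta - lr * batch_grad, using a diagonal Hessian."""
    first = -lr * sum(t * b for t, b in zip(target_grad, batch_grad))
    second = 0.5 * lr ** 2 * sum(h * b * b for h, b in zip(hess_diag, batch_grad))
    return first + second


def select_batch(candidates: Sequence[Sequence[Vector]], target_grad: Vector,
                 hess_diag: Vector, lr: float) -> int:
    """Index of the candidate batch (a list of per-example gradients) with
    the most negative predicted loss change."""
    scores = [predicted_loss_change(mean_vector(b), target_grad, hess_diag, lr)
              for b in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)
```

Because gradients change as training proceeds, "dynamic" selection in this sketch means recomputing the scores at every step. The curvature term matters at larger step sizes. With `lr=0.1`, a batch aligned with `g_target` and gradient `[1, 0]` beats one with gradient `[0.5, 0]`. With `lr=1.5`, the larger step is predicted to overshoot, so the smaller-gradient batch wins.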
Most notably, SLAP achieves superior performance with 20-40% less training data than full-dataset training, substantially reducing computational costs while maintaining or improving model capabilities. These results establish SLAP as a powerful approach for efficient and effective instruction tuning of large language models.
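To make the 20-40% figure concrete, the sketch below computes the kept examples and optimizer steps per epoch. It uses a hypothetical 52,000-example dataset and batch size 128; neither number comes from the paper. It also assumes that one epoch costs one optimizer step per batch.

```python
import math
from typing import Tuple


def pruned_budget(full_size: int, reduction: float, batch_size: int) -> Tuple[int, int]:
    """Return (examples kept, optimizer steps per epoch) after removing
    `reduction` of the data, e.g. 0.2 for 20% (hypothetical helper)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    kept = round(full_size * (1.0 - reduction))
    # A final partial batch still costs one step.
    steps = math.ceil(kept / batch_size)
    return kept, steps
```

Under these assumptions, `pruned_budget(52000, 0.2, 128)` gives `(41600, 325)` and `pruned_budget(52000, 0.4, 128)` gives `(31200, 244)`. The full dataset needs 407 steps per epoch.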