Visual-Advantage On-Policy Distillation for Vision-Language Models
Quick Take
Visual-Advantage On-Policy Distillation (VA-OPD) improves distillation for vision-language models (VLMs) by focusing on critical visual tokens.
Key Points
- Introduces visual advantage (VA) for token-level analysis.
- Proposes VA-OPD for improved distillation in VLMs.
- Demonstrates consistent gains across various benchmarks.