Visual-Advantage On-Policy Distillation for Vision-Language Models

arXiv cs.CV · Ruiqi Liu, Xiaolei Lv, Gengsheng Li, Ximo Zhu, Zhiheng Wang, Zhengbo Zhang, Junkai Chen, Zhiheng Li, Bo Li, Jun Gao, Shu Wu

5/22/2026

Quick Take

Visual-Advantage On-Policy Distillation (VA-OPD) enhances vision-language models (VLMs) by focusing on critical visual tokens.

Key Points

Introduces visual advantage (VA) for token-level analysis.
Proposes VA-OPD for improved distillation in VLMs.
Demonstrates consistent gains across various benchmarks.





Why Featured

This advance in vision-language models points to more effective use of visual information. That could benefit developers building multimodal AI applications, help PMs plan projects around these models, and attract investment in new solutions.