Curvature-Guided Mixing for MLLM Adaptation

arXiv cs.CV·Jinglong Yang, Jiaxuan He, Wenjian Huang, Zhan Zhuang, Jianguo Zhang

~1 min · 6/25/2026 · en

Quick Answer

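This paper proposes Curvature-Guided Mixing (CGM), which merges a pre-trained and a fine-tuned MLLM using a closed-form mixing ratio derived from a second-order (Hessian) approximation of their loss landscapes, improving the trade-off between task specialization and general knowledge retention.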

Quick Take

Fine-tuning an MLLM on a specialized task often causes catastrophic forgetting of its general capabilities. CGM counters this by blending pre-trained and fine-tuned parameters according to their relative task-specific curvatures, while its variant CGM† instead performs sparse, curvature-guided parameter selection. In experiments on LLaVA-1.5 and Qwen2.5VL across multiple downstream tasks, both consistently improve the trade-off between task specialization and general knowledge retention over existing merging methods.

Key Points

CGM uses a second-order (Hessian) approximation of the loss landscapes to derive a closed-form "soft mixing" ratio for model merging.
CGM† is a robust "hard mixing" variant that selects a sparse set of parameters using a curvature-aware score.
Experiments on LLaVA-1.5 and Qwen2.5VL show a consistently better trade-off between task specialization and general knowledge retention than existing methods.
Code is available at github.com/zzsyjl/CGM-ECCV-2026.
Results are reported across multiple downstream tasks.
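
To make the "soft mixing" idea concrete, here is a minimal Python sketch, not the paper's actual formula. It assumes each loss is approximated by a diagonal quadratic around its own model, so that minimizing their sum gives a per-parameter closed-form weight. The function name `soft_mix` and the inputs `h_pre` and `h_ft` (diagonal curvature estimates) are hypothetical and introduced only for illustration.

```python
import numpy as np

def soft_mix(theta_pre, theta_ft, h_pre, h_ft, eps=1e-12):
    """Blend two parameter vectors with curvature-based weights.

    Assumption (not from the paper): per coordinate,
      L_pre(t) ~ 0.5 * h_pre * (t - theta_pre)**2
      L_ft(t)  ~ 0.5 * h_ft  * (t - theta_ft)**2
    Minimizing L_pre + L_ft gives
      t* = (h_pre * theta_pre + h_ft * theta_ft) / (h_pre + h_ft).
    """
    theta_pre = np.asarray(theta_pre, dtype=float)
    theta_ft = np.asarray(theta_ft, dtype=float)
    h_pre = np.asarray(h_pre, dtype=float)
    h_ft = np.asarray(h_ft, dtype=float)

    # Weight on the fine-tuned value; eps guards against 0/0 when both
    # curvature estimates vanish.
    alpha = h_ft / (h_pre + h_ft + eps)
    merged = alpha * theta_ft + (1.0 - alpha) * theta_pre
    return merged, alpha

# Toy example: higher pre-training curvature pulls the result toward theta_pre.
merged, alpha = soft_mix([0.0, 0.0], [1.0, 1.0], h_pre=[1.0, 3.0], h_ft=[1.0, 1.0])
print(alpha)
```

Under these assumptions, a parameter that the general-capability loss is sensitive to (large `h_pre`) stays close to its pre-trained value, while a parameter that matters mainly for the downstream task moves toward its fine-tuned value.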

Article Excerpt

From source RSS / original summary

arXiv:2606.24963v1. Abstract: Fine-tuning Multimodal Large Language Models (MLLMs) on specialized tasks often leads to catastrophic forgetting of their general capabilities. Existing model merging methods to combat this are often heuristic or use sub-optimal objectives. We propose Curvature-Guided Mixing (CGM), a theoretically grounded framework that merges pre-trained and fine-tuned models.

CGM formulates a joint optimization objective and uses a second-order (Hessian) approximation of the loss landscapes to analytically derive an optimal, closed-form "soft mixing" ratio. This ratio intelligently blends parameters based on their relative task-specific curvatures. We also introduce CGM†, a robust "hard mixing" variant that performs sparse parameter selection guided by a novel, curvature-aware score. Experiments on LLaVA-1.5 and Qwen2.5VL across multiple downstream tasks show that CGM and CGM† consistently improve the trade-off between task specialization and general knowledge retention over existing methods. Code is available at github.com/zzsyjl/CGM-ECCV-2026.
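
The "hard mixing" variant described in the abstract can be illustrated with a small Python sketch. The abstract does not give the curvature-aware score, so the one used here is a stand-in assumption: a second-order estimate of how much the task loss would rise if a fine-tuned parameter were reverted to its pre-trained value. The names `hard_mix`, `h_ft`, and `keep_fraction` are hypothetical.

```python
import numpy as np

def hard_mix(theta_pre, theta_ft, h_ft, keep_fraction=0.2):
    """Keep only the highest-scoring fine-tuned parameters; revert the rest.

    The score 0.5 * h_ft * delta**2 is an illustrative assumption,
    not the score defined in the paper.
    """
    theta_pre = np.asarray(theta_pre, dtype=float)
    theta_ft = np.asarray(theta_ft, dtype=float)
    h_ft = np.asarray(h_ft, dtype=float)

    delta = theta_ft - theta_pre
    score = 0.5 * h_ft * delta ** 2

    # Number of parameters that keep their fine-tuned value (at least one).
    k = max(1, int(round(keep_fraction * score.size)))
    keep_idx = np.argsort(score.ravel())[-k:]

    mask = np.zeros(score.size, dtype=bool)
    mask[keep_idx] = True
    mask = mask.reshape(score.shape)

    # Sparse selection: fine-tuned value where mask is True, else pre-trained.
    merged = np.where(mask, theta_ft, theta_pre)
    return merged, mask

merged, mask = hard_mix(np.zeros(4), [1.0, 2.0, 0.1, 1.0],
                        h_ft=[1.0, 1.0, 10.0, 3.0], keep_fraction=0.5)
print(mask)
```

Unlike a soft blend, every parameter here ends up exactly equal to either its pre-trained or its fine-tuned value, which is what makes the selection sparse.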
