Curvature-Guided Mixing for MLLM Adaptation
Quick Answer
Curvature-Guided Mixing (CGM) counters catastrophic forgetting in fine-tuned MLLMs by merging the pre-trained and fine-tuned models with a closed-form mixing ratio derived from a second-order (Hessian) approximation of the loss landscapes.
Quick Take
Fine-tuning an MLLM on a specialized task often erodes its general capabilities. CGM addresses this by blending pre-trained and fine-tuned parameters according to their relative task-specific curvatures, while its variant CGM† selects a sparse subset of parameters instead. In experiments on LLaVA-1.5 and Qwen2.5VL across multiple downstream tasks, both variants consistently improve the trade-off between task specialization and general knowledge retention over existing merging methods.
Key Points
- CGM uses a second-order (Hessian) approximation of the loss landscapes to derive a closed-form "soft mixing" ratio for merging the pre-trained and fine-tuned models.
- CGM† is a "hard mixing" variant that performs sparse parameter selection guided by a curvature-aware score.
- On LLaVA-1.5 and Qwen2.5VL, both variants consistently improve the trade-off between task specialization and general knowledge retention over existing methods.
- Code is available at github.com/zzsyjl/CGM-ECCV-2026.
- The evaluation covers multiple downstream tasks rather than a single benchmark.
Article Excerpt
From source RSS / original summary: arXiv:2606.24963v1, announce type: new.
Abstract: Fine-tuning Multimodal Large Language Models (MLLMs) on specialized tasks often leads to catastrophic forgetting of their general capabilities. Existing model merging methods to combat this are often heuristic or use sub-optimal objectives. We propose Curvature-Guided Mixing (CGM), a theoretically grounded framework that merges pre-trained and fine-tuned models.
CGM formulates a joint optimization objective and uses a second-order (Hessian) approximation of the loss landscapes to analytically derive an optimal, closed-form "soft mixing" ratio. This ratio intelligently blends parameters based on their relative task-specific curvatures. We also introduce CGM†, a robust "hard mixing" variant that performs sparse parameter selection guided by a novel, curvature-aware score. Experiments on LLaVA-1.5 and Qwen2.5VL across multiple downstream tasks show that CGM and CGM† consistently improve the trade-off between task specialization and general knowledge retention over existing methods. Code is available at github.com/zzsyjl/CGM-ECCV-2026.
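To make the soft and hard mixing ideas concrete, here is a minimal NumPy sketch under simplifying assumptions that are not taken from the paper: each loss is modeled as a diagonal quadratic around its own optimum (the general loss around the pre-trained weights, the task loss around the fine-tuned weights), and the curvature arrays `h_gen` and `h_task` are already available. Under that model, the per-parameter minimizer of the summed loss is alpha = h_task / (h_gen + h_task). The function names, the `keep_ratio` parameter, and the hard-mixing score are hypothetical illustrations; the paper's actual objective and score may differ.

```python
import numpy as np


def soft_mix(theta_pre, theta_ft, h_gen, h_task, eps=1e-12):
    """Blend each parameter by relative curvature (illustrative only).

    Assumed model, not the paper's exact objective:
      L_gen(theta)  ~ 0.5 * h_gen  * (theta - theta_pre)**2
      L_task(theta) ~ 0.5 * h_task * (theta - theta_ft)**2
    Setting the derivative of L_gen + L_task to zero gives
      alpha = h_task / (h_gen + h_task).
    """
    alpha = h_task / (h_gen + h_task + eps)  # eps avoids division by zero
    merged = theta_pre + alpha * (theta_ft - theta_pre)
    return merged, alpha


def hard_mix(theta_pre, theta_ft, h_gen, h_task, keep_ratio=0.5):
    """Keep fine-tuned values only for the highest-scoring parameters.

    The score is a hypothetical stand-in: the net loss change, under the
    same quadratic model, of switching one parameter from its pre-trained
    value to its fine-tuned value.
    """
    delta = theta_ft - theta_pre
    # Task loss saved minus general loss incurred by taking theta_ft.
    score = 0.5 * (h_task - h_gen) * delta**2
    k = int(round(keep_ratio * score.size))
    flat_mask = np.zeros(score.size, dtype=bool)
    if k > 0:
        top = np.argsort(score, axis=None)[-k:]  # indices of the k largest scores
        flat_mask[top] = True
    mask = flat_mask.reshape(score.shape)
    return np.where(mask, theta_ft, theta_pre), mask


if __name__ == "__main__":
    theta_pre = np.zeros(3)
    theta_ft = np.ones(3)
    h_gen = np.array([1.0, 1.0, 3.0])   # curvature of the general loss
    h_task = np.array([3.0, 1.0, 1.0])  # curvature of the task loss
    print(soft_mix(theta_pre, theta_ft, h_gen, h_task))
    print(hard_mix(theta_pre, theta_ft, h_gen, h_task, keep_ratio=1 / 3))
```

In this sketch, a parameter whose task loss is sharply curved (large `h_task`) moves toward the fine-tuned value, while one whose general loss is sharply curved stays near the pre-trained value. The hard variant makes the same comparison but commits each parameter fully to one model, which yields a sparse set of fine-tuned parameters.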