MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

arXiv cs.CV·Yue Wu, Changyuan Wang, Zixuan Wang, Shilin Ma, Yansong Tang

3h ago

·~1 min·6/4/2026·en·0

Quick Take

MorphoQuant is a modality-aware post-training quantization (PTQ) framework for Omni-modal Large Language Models (OLLMs). Its W4A4 model reaches 76.63% on ScienceQA, outperforming SOTA W4A4 methods and surpassing the W4A16 baseline. The framework combines Distribution-Aware Bias Compensation (DABC), which absorbs long-tailed outliers into channel-wise biases, with Morphology-Directed Quantization Function Optimization (MDQFO), which co-optimizes the quantization grid with the bias mask.

Key Points

MorphoQuant addresses challenges in 4-bit quantization for Omni-modal Large Language Models.
Introduces Distribution-Aware Bias Compensation (DABC), which selectively absorbs long-tailed outliers into channel-wise biases.
Evaluated on Qwen2.5-Omni across benchmarks such as MMMU and Video-MME.
W4A4 model reaches 76.63% on ScienceQA, outperforming SOTA W4A4 methods and surpassing the W4A16 baseline.
Co-optimizes the quantization grid with the bias mask through Morphology-Directed Quantization Function Optimization (MDQFO).

Article Content

From source RSS / original summary

arXiv:2606.04349v1 Announce Type: new Abstract: Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language Models (OLLMs) due to the extreme distribution heterogeneity and disparate outlier patterns across modalities. To address this, we propose MorphoQuant, a modality-aware PTQ framework engineered to preserve cross-modal morphology and mitigate outlier loss.
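To make the outlier problem concrete, the following is a minimal sketch of plain uniform 4-bit quantization, not code from the paper. It assumes a single symmetric scale shared by all values; the synthetic data, the outlier value of 60.0, and the helper names are illustrative assumptions. One large value stretches the grid so far that the dense inliers collapse onto very few levels.

```python
import random

def quantize_symmetric(values, bits=4):
    """Uniform symmetric quantization with one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = max(-qmax - 1, min(qmax, round(v / scale)))  # integer in -8..7
        out.append(q * scale)                     # dequantized value
    return out

def mean_sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
inliers = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # dense inliers
with_outlier = inliers + [60.0]                        # one long-tailed outlier (assumed value)

# Error on the inliers when they set the scale themselves.
err_clean = mean_sq_error(inliers, quantize_symmetric(inliers))

# Error on the same inliers when the outlier sets the scale.
deq = quantize_symmetric(with_outlier)
err_inliers_with_outlier = mean_sq_error(inliers, deq[:len(inliers)])

print(f"inlier MSE without outlier: {err_clean:.4f}")
print(f"inlier MSE with outlier:    {err_inliers_with_outlier:.4f}")
```

In this toy setup the outlier itself is represented well, but the inliers lose almost all resolution, which is the trade-off the abstract attributes to conventional PTQ at 4 bits.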

Specifically, we introduce Distribution-Aware Bias Compensation (DABC), which selectively absorbs long-tailed outliers into channel-wise biases. This mechanism safeguards outlier magnitudes while maintaining high-precision discretization for dense inliers, thereby preserving accurate discretization across diverse modal distributions.
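The abstract does not say how DABC computes its biases or its mask, so the sketch below is only one plausible reading, not the authors' method. It assumes that a channel is "absorbed" when the magnitude of its mean exceeds a hypothetical threshold, that the bias is that channel mean, and that the bias is kept in full precision and added back after dequantization. All names and numbers are illustrative.

```python
import random

def quantize(values, bits=4):
    """Uniform symmetric quantization with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def quantize_matrix(rows, bits=4):
    """Quantize a tokens-by-channels matrix with a single scale."""
    width = len(rows[0])
    deq = quantize([v for row in rows for v in row], bits)
    return [deq[i * width:(i + 1) * width] for i in range(len(rows))]

def channel_means(rows):
    n = len(rows)
    return [sum(row[c] for row in rows) / n for c in range(len(rows[0]))]

def compensate_and_quantize(rows, threshold, bits=4):
    """Subtract a per-channel bias on masked channels, quantize, add it back."""
    means = channel_means(rows)
    # Assumed mask rule: absorb only channels with a large mean offset.
    bias = [m if abs(m) > threshold else 0.0 for m in means]
    shifted = [[v - b for v, b in zip(row, bias)] for row in rows]
    deq = quantize_matrix(shifted, bits)
    return [[v + b for v, b in zip(row, bias)] for row in deq]

def mse(a, b):
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

rng = random.Random(0)
acts = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
for row in acts:
    row[3] += 40.0            # channel 3 is a synthetic outlier channel

err_plain = mse(acts, quantize_matrix(acts))
err_comp = mse(acts, compensate_and_quantize(acts, threshold=5.0))
print(f"plain W4 MSE:       {err_plain:.4f}")
print(f"compensated W4 MSE: {err_comp:.4f}")
```

For a linear layer y = Wx, a bias b subtracted from x can be restored exactly by adding the precomputed term Wb to the output, which is why absorbing offsets into a bias need not add a separate pass at inference time. Whether MorphoQuant folds its biases this way is not stated in the abstract.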

Complementing this, we propose Morphology-Directed Quantization Function Optimization (MDQFO) to co-optimize the quantization grid with the bias mask, ensuring fine-grained alignment across modalities. Extensive evaluations on Qwen2.5-Omni across benchmarks like MMMU and Video-MME demonstrate our approach's superiority. Notably, our W4A4 model achieves 76.63% on ScienceQA, significantly outperforming SOTA W4A4 methods and surprisingly surpassing the W4A16 baseline, which fully demonstrates the exceptional accuracy-efficiency trade-off of our framework.
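The idea of co-optimizing the quantization grid with the bias mask can be illustrated with a small joint search. This is a hedged sketch under stated assumptions, not MDQFO itself: the abstract gives no objective or search procedure, so the example swaps in an exhaustive grid search over a hypothetical mask threshold and a clipping ratio, scored by reconstruction MSE on synthetic activations.

```python
import random

def quantize_clipped(values, clip, bits=4):
    """Uniform symmetric quantization with step size clip / qmax."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def reconstruct(rows, threshold, clip_ratio, bits=4):
    """Apply a thresholded channel-bias mask, then quantize on a clipped grid."""
    n, width = len(rows), len(rows[0])
    means = [sum(row[c] for row in rows) / n for c in range(width)]
    bias = [m if abs(m) > threshold else 0.0 for m in means]   # assumed mask rule
    flat = [v - bias[c] for row in rows for c, v in enumerate(row)]
    max_abs = max(abs(v) for v in flat) or 1.0
    deq = quantize_clipped(flat, clip_ratio * max_abs, bits)
    return [[deq[r * width + c] + bias[c] for c in range(width)] for r in range(n)]

def mse(a, b):
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def joint_search(rows, thresholds, clip_ratios, bits=4):
    """Pick the (threshold, clip_ratio) pair with the lowest reconstruction MSE."""
    best = None
    for t in thresholds:
        for r in clip_ratios:
            err = mse(rows, reconstruct(rows, t, r, bits))
            if best is None or err < best[0]:
                best = (err, t, r)
    return best

rng = random.Random(0)
acts = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
for row in acts:
    row[3] += 40.0                        # synthetic outlier channel

thresholds = [float("inf"), 20.0, 5.0]    # inf disables the bias mask
clip_ratios = [1.0, 0.8, 0.6]
best_err, best_t, best_r = joint_search(acts, thresholds, clip_ratios)
baseline_err = mse(acts, reconstruct(acts, float("inf"), 1.0))
print(f"baseline MSE: {baseline_err:.4f}")
print(f"best MSE:     {best_err:.4f} (threshold={best_t}, clip_ratio={best_r})")
```

Searching the two settings together matters because the best clipping range depends on which channels the mask has already absorbed: once the outlier channel is removed, a much tighter grid becomes viable.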
