ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

arXiv cs.CV·Arash Akbari, Arman Akbari, Masih Eskandar, Qitao Tan, Yixiao Chen, Jingwu Luo, Bertha Pangaribuan, Liyun Zhang, Jennifer Dy, Geng Yuan, Xue Lin, Gaowen Liu, Stratis Ioannidis, Yanzhi Wang

·~2 min·5/26/2026·en·1

Quick Take

ActQuant is an action-guided, mixed-precision post-training quantization framework for Vision-Language-Action models. On the LIBERO benchmark it retains 95.0% on OpenVLA-OFT at or below 3 bits-per-weight. At 2.5 bits-per-weight it compresses the OpenVLA-OFT backbone from 14.3 GB to 2.7 GB, and on a physical UR3 arm a quantized π0.5 retains the baseline's success rate.

Key Points

ActQuant is a two-stage mixed-precision PTQ framework: an inter-tensor bit allocator followed by an intra-tensor scale optimizer.
At 2.5 bits-per-weight, it retains 90.1% on OpenVLA-OFT.
OmniModel.cpp ports model architectures into a native C/C++ runtime with low-bit kernels for on-device deployment.
On the physical UR3 arm, quantized π0.5 retains the baseline's success rate with a 2.5× smaller memory footprint.
On LIBERO, ActQuant is the only evaluated method that operates at or below 3 bits-per-weight.

Article Content

From source RSS / original summary

arXiv:2605.24011v1. Announce Type: new. Abstract: Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute requirements make deployment on edge platforms impractical. Aggressive, sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime.

To address this, we introduce ActQuant, an action-guided mixed-precision PTQ framework that operates in two stages: (1) an inter-tensor bit allocator that assigns each weight matrix a single bit-width based on how much it contributes to predicting the agent's actions; (2) an intra-tensor scale optimizer that tunes per-block quantization scales using action-aware curvature, so that dynamic range is concentrated on the weights most influential for control.
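The two stages can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the greedy budgeted allocation, the grid search over scales, and every name here (`allocate_bits`, `tune_scale`, `sensitivity`, `importance`) are assumptions made for illustration. The `sensitivity` scores stand in for each tensor's contribution to action prediction, and the per-weight `importance` values stand in for action-aware curvature.

```python
from typing import Dict, List, Sequence


def allocate_bits(sizes: Dict[str, int], sensitivity: Dict[str, float],
                  budget_bpw: float, choices: Sequence[int] = (2, 3, 4)) -> Dict[str, int]:
    """Stage 1 (hypothetical): one bit-width per tensor under an average budget.

    Every tensor starts at the lowest width; the most sensitive tensors are
    upgraded one level at a time while the total bit count stays in budget.
    """
    levels = sorted(choices)
    bits = {name: levels[0] for name in sizes}
    budget = budget_bpw * sum(sizes.values())
    used = sum(bits[name] * sizes[name] for name in sizes)
    ordered = sorted(sizes, key=lambda name: sensitivity[name], reverse=True)
    for lo, hi in zip(levels, levels[1:]):
        for name in ordered:
            extra = (hi - lo) * sizes[name]
            if bits[name] == lo and used + extra <= budget:
                bits[name] = hi
                used += extra
    return bits


def quantize_block(block: List[float], bits: int, scale: float) -> List[float]:
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return [scale * max(qmin, min(qmax, round(w / scale))) for w in block]


def weighted_error(block: List[float], importance: List[float],
                   bits: int, scale: float) -> float:
    """Importance-weighted squared error between a block and its quantized copy."""
    deq = quantize_block(block, bits, scale)
    return sum(h * (w - q) ** 2 for h, w, q in zip(importance, block, deq))


def tune_scale(block: List[float], importance: List[float],
               bits: int, grid: int = 32) -> float:
    """Stage 2 (hypothetical): grid-search a per-block scale.

    Candidates are fractions of the max-abs scale; the one with the lowest
    importance-weighted error wins, so precision goes to important weights.
    """
    qmax = 2 ** (bits - 1) - 1
    base = max(abs(w) for w in block) / qmax or 1.0  # guard all-zero blocks
    best_scale, best_err = base, float("inf")
    for i in range(1, grid + 1):
        scale = base * (i / grid)
        err = weighted_error(block, importance, bits, scale)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


# Toy inputs (made up): tensor sizes in weights, and sensitivity scores.
sizes = {"attn": 100, "mlp": 200, "embed": 100}
sens = {"attn": 0.9, "mlp": 0.5, "embed": 0.1}
bits = allocate_bits(sizes, sens, budget_bpw=2.75)

# One block with a large but unimportant outlier weight.
block = [0.05, -0.02, 0.9, 0.01]
importance = [10.0, 10.0, 0.1, 10.0]
scale = tune_scale(block, importance, bits=3)
```

In this toy setup the least sensitive tensor stays at the lowest bit-width, and the scale search is free to clip the unimportant outlier when doing so lowers the weighted error on the remaining weights.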

To deliver the on-device benefits of our aggressive quantization, we further introduce OmniModel.cpp, an agentic conversion pipeline that ports architectures into a native C/C++ runtime with efficient low-bit kernels. We evaluate ActQuant both in simulation and on a real-world 6-DoF UR3 arm, with all models deployed through OmniModel.cpp. On the LIBERO benchmark, ActQuant is the only method that operates at or below 3 bits-per-weight, retaining 95.0% on OpenVLA-OFT and 94.8% on $\pi_{0.5}$.

Pushed further, ActQuant reaches 2.5 bpw at 90.1% on OpenVLA-OFT, compressing the backbone from 14.3 GB to 2.7 GB (5.3$\times$). On the physical UR3 arm, $\pi_{0.5}$ quantized with ActQuant retains the baseline's success rate while reducing the memory footprint by 2.5$\times$.
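The reported compression factor can be checked with a few lines of Python. The second calculation is only a rough reference point: it assumes the 14.3 GB baseline stores every weight at 16 bits, which the abstract does not state, and it ignores per-block scales and any components left unquantized.

```python
# Figures quoted in the abstract for the OpenVLA-OFT backbone.
baseline_gb = 14.3
quantized_gb = 2.7

# Compression factor implied by the two reported sizes.
ratio = baseline_gb / quantized_gb
print(f"compression: {ratio:.1f}x")

# ASSUMPTION (not from the abstract): a uniform 16-bit baseline.
# Under that assumption, an idealized weight-only size at 2.5 bpw would be:
assumed_baseline_bits = 16
ideal_gb = baseline_gb * 2.5 / assumed_baseline_bits
print(f"idealized weight-only size at 2.5 bpw: {ideal_gb:.2f} GB")
```

The ratio of the reported sizes rounds to the 5.3× stated in the abstract. The idealized figure comes out somewhat below the reported 2.7 GB, so it should be read as a lower bound under the stated assumption, not as a reconstruction of the paper's accounting.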
