OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

arXiv cs.AI·Guangzhi Sun, Yixuan Li, Yudong Yang, Chao Zhang

6/9/2026


Quick Answer

OmniMem is a memory-efficient streaming framework for audio-visual LLMs that improves long-video inference accuracy by 2-4% over strong training-free baselines.

Quick Take

OmniMem combines modality-aware memory allocation, perturbation-aware memory selection, and budget-aware fine-tuning, and improves performance on benchmarks such as VideoMME Long and LVBench. Together, these components address the token imbalance between the visual and audio streams and preserve informative KV states, with gains for models such as video-SALMONN 2+ and Qwen-2.5-Omni.

Key Points

OmniMem improves long-video inference accuracy by 2-4% over strong training-free baselines.
Introduces modality-aware memory allocation to manage visual and audio contexts separately.
Employs perturbation-aware memory selection to preserve informative KV states.
Budget-aware fine-tuning consolidates useful information into retained memory.
Demonstrated effectiveness on benchmarks like VideoMME Long, LVBench, and LVOmniBench.
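The summary does not say how OmniMem scores KV entries, so the following is only a minimal sketch of one generic perturbation-based criterion, assumed for illustration. It scores each cached key/value pair by how far a single-head attention output moves when that entry is dropped (a leave-one-out perturbation), then keeps the top `budget` entries. The function names (`attention_output`, `perturbation_scores`, `select_kv`) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def attention_output(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Single-head scaled dot-product attention of one query over cached KV entries."""
    scale = math.sqrt(len(q))
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def perturbation_scores(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Hypothetical leave-one-out score: distance the output moves when entry i is dropped.

    Requires at least two cached entries so the perturbed cache is never empty.
    """
    full = attention_output(q, keys, values)
    scores = []
    for i in range(len(keys)):
        perturbed = attention_output(q, keys[:i] + keys[i + 1:], values[:i] + values[i + 1:])
        scores.append(math.dist(full, perturbed))
    return scores


def select_kv(q: Vector, keys: List[Vector], values: List[Vector], budget: int) -> List[int]:
    """Return indices (in original order) of the `budget` most perturbation-sensitive entries."""
    n = len(keys)
    if budget >= n:
        return list(range(n))  # everything fits, so nothing is evicted
    if budget <= 0:
        return []
    scores = perturbation_scores(q, keys, values)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
    values = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
    print(select_kv(q, keys, values, budget=1))  # → [2]
```

With identical keys the attention weights are uniform, so dropping the one distinctive value shifts the output most, and that entry is retained. Leave-one-out scoring costs one extra attention pass per cached entry; a practical system would likely approximate it, and this version favors readability over speed.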

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Source Excerpt

arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-efficient streaming framework designed specifically for audio-visual LLMs.
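To make the linear growth concrete, the sketch below applies the standard KV-cache size formula: one key and one value tensor per layer, per KV head, per head dimension, per token. All configuration numbers (layer count, KV heads, head dimension, frame rate, tokens per frame, audio token rate) are hypothetical placeholders, not the settings of video-SALMONN 2+ or Qwen-2.5-Omni.

```python
"""KV-cache growth for a streaming audio-visual model (hypothetical config)."""
from typing import Tuple


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor 2 = one key tensor plus one value tensor per layer.
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def stream_tokens(seconds: int, fps: int, visual_tokens_per_frame: int,
                  audio_tokens_per_second: int) -> Tuple[int, int]:
    # Returns (visual_tokens, audio_tokens) accumulated after `seconds` of stream.
    visual = seconds * fps * visual_tokens_per_frame
    audio = seconds * audio_tokens_per_second
    return visual, audio


if __name__ == "__main__":
    # Hypothetical placeholder config, not taken from any specific model.
    cfg = dict(num_layers=28, num_kv_heads=4, head_dim=128)
    for minutes in (1, 10, 60):
        v, a = stream_tokens(minutes * 60, fps=2, visual_tokens_per_frame=256,
                             audio_tokens_per_second=25)
        gib = kv_cache_bytes(v + a, **cfg) / 2**30
        print(f"{minutes:>3} min: {v:>9} visual + {a:>7} audio tokens "
              f"-> {gib:6.2f} GiB KV cache")
```

Because every term in `kv_cache_bytes` is a constant except `num_tokens`, doubling the stream length doubles the cache. With these placeholder rates, a one-minute clip yields 30,720 visual tokens but only 1,500 audio tokens, which also illustrates the kind of modality imbalance the abstract describes.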

Unlike existing compression methods that treat all tokens uniformly, OmniMem introduces a modality-aware memory allocation strategy that separately manages visual and audio contexts, addressing the severe token imbalance between the two modalities. …
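The excerpt states the goal (separate budgets for visual and audio context) but not the allocation rule. As a minimal sketch under that gap, the code below contrasts uniform top-k selection over a merged token pool with a hypothetical modality-aware variant that reserves a fixed share of the budget for audio (the `audio_share` parameter and its default are assumptions) and lets either modality absorb the other's unused slots. Importance scores are taken as given inputs; how OmniMem computes them is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical token record: (modality, importance score). Modality is "visual" or "audio".
Token = Tuple[str, float]


def _top_indices(indices: List[int], tokens: List[Token], k: int) -> List[int]:
    """Highest-scoring k indices from `indices`."""
    return sorted(indices, key=lambda i: tokens[i][1], reverse=True)[:k]


def uniform_topk(tokens: List[Token], budget: int) -> List[int]:
    """Baseline: one top-k over the merged pool, ignoring modality."""
    return sorted(_top_indices(list(range(len(tokens))), tokens, budget))


def modality_aware_topk(tokens: List[Token], budget: int,
                        audio_share: float = 0.25) -> List[int]:
    """Hypothetical split: reserve a share for audio, give the rest to visual,
    and let either modality absorb slots the other cannot fill."""
    by_mod: Dict[str, List[int]] = {"visual": [], "audio": []}
    for i, (modality, _) in enumerate(tokens):
        by_mod[modality].append(i)
    n_vis, n_aud = len(by_mod["visual"]), len(by_mod["audio"])
    audio_budget = min(n_aud, round(budget * audio_share))
    visual_budget = min(n_vis, budget - audio_budget)
    # If visual tokens cannot fill their share, pass the leftover back to audio.
    audio_budget = min(n_aud, budget - visual_budget)
    keep = _top_indices(by_mod["visual"], tokens, visual_budget)
    keep += _top_indices(by_mod["audio"], tokens, audio_budget)
    return sorted(keep)


if __name__ == "__main__":
    # Toy pool: many high-scoring visual tokens, few audio tokens (illustrative scores).
    tokens = [("visual", s) for s in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4)]
    tokens += [("audio", 0.3), ("audio", 0.2)]
    print(uniform_topk(tokens, 4))         # → [0, 1, 2, 3]
    print(modality_aware_topk(tokens, 4))  # → [0, 1, 2, 6]
```

In the toy example, uniform top-k spends the whole budget of 4 on visual tokens, whereas the modality-aware split still retains the highest-scoring audio token.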

Read on arxiv.org
