COPRA: Conditional Parameter Adaptation with Reinforcement Learning for Video Anomaly Detection

arXiv cs.CV·Darryl Cherian Jacob, Xinyu Liu, Kai Wang, Pan He

5/18/2026


Quick Answer

COPRA is a conditional parameter adaptation framework for video anomaly detection (VAD). It improves the performance of vision-language models (VLMs) by generating input-specific parameter updates that adapt a frozen VLM to each video segment during both training and inference.

Quick Take

COPRA introduces a conditional parameter adaptation framework for video anomaly detection, enhancing VLMs' performance by dynamically adjusting parameters during training and inference. This approach outperforms static baselines on standard VAD benchmarks in both in-domain and cross-domain settings, and it generalizes to unseen tasks such as multiple-choice Video Question Answering and Dense Captioning.

Key Points

COPRA dynamically adapts frozen VLMs for each video segment during training and inference.
It addresses the mismatch between training and inference, in both data distribution and model configuration, found in existing VLM-based VAD methods.
Experiments show COPRA consistently outperforms static baselines in both in-domain and cross-domain settings.
The framework also generalizes to unseen tasks like multiple-choice Video Question Answering.
COPRA is positioned as a scalable and context-aware solution for video understanding.

[Submitted on 14 May 2026]

Abstract: Vision-language models (VLMs) have shown strong performance in video anomaly detection (VAD) while providing interpretable predictions. However, existing VLM-based VAD methods suffer from a fundamental mismatch between training and inference in both data distribution and model configuration. First, most approaches rely on static post-training adaptation, limiting generalization under distribution shifts such as unseen environments or anomaly types. Second, they train VLMs on sparse frames from long videos, but perform inference on densely sampled short segments, creating inconsistencies between training and testing. To address these limitations, we propose COPRA, a conditional parameter adaptation framework for VLM-based VAD. Instead of fixed prompts or shared parameter updates, COPRA generates input-specific parameter updates to dynamically adapt a frozen VLM for each video segment during both training and inference. Experiments show strong performance on standard VAD benchmarks, consistently outperforming static baselines in both in-domain and cross-domain settings. Moreover, COPRA generalizes beyond VAD to unseen tasks such as multiple-choice Video Question Answering and Dense Captioning. These results highlight COPRA as an effective weight-space generation framework for scalable, adaptive, and context-aware video understanding. The code will be released at this https URL
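The abstract does not say how COPRA produces its input-specific parameter updates, so the following is only a minimal sketch of the general idea of conditional weight generation, not the authors' method. It assumes a low-rank (LoRA-style) update produced by a small linear generator from a per-segment embedding. The class name ConditionalLowRankAdapter, the helper functions, and all dimensions are hypothetical and chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ConditionalLowRankAdapter:
    """Hypothetical sketch: a generator maps a segment embedding to a
    rank-r update (B @ A) that is added on top of a frozen weight matrix."""

    def __init__(self, d_in, d_out, d_cond, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Stand-in for one frozen layer of the backbone; never modified.
        self.frozen_w = rand_matrix(d_out, d_in, rng)
        # Generator weights: the only parts that would be trained here.
        self.gen_a = rand_matrix(rank * d_in, d_cond, rng)
        self.gen_b = rand_matrix(d_out * rank, d_cond, rng)

    def generate_update(self, cond):
        """Produce low-rank factors A (rank x d_in) and B (d_out x rank)."""
        flat_a = matvec(self.gen_a, cond)
        flat_b = matvec(self.gen_b, cond)
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in]
             for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank]
             for i in range(self.d_out)]
        return a, b

    def forward(self, x, cond):
        """Compute W x + B (A x), with A and B conditioned on the segment."""
        a, b = self.generate_update(cond)
        base = matvec(self.frozen_w, x)
        delta = matvec(b, matvec(a, x))
        return [p + q for p, q in zip(base, delta)]


adapter = ConditionalLowRankAdapter(d_in=4, d_out=3, d_cond=2, rank=1)
x = [1.0, 0.5, -0.5, 2.0]                        # toy input activation
y_seg1 = adapter.forward(x, cond=[1.0, 0.0])     # embedding of segment 1
y_seg2 = adapter.forward(x, cond=[0.0, 1.0])     # embedding of segment 2
```

In this sketch the same input x yields different outputs for the two segment embeddings, while frozen_w itself is never changed. Because the same generator runs at training and at inference time, the adaptation path is identical in both phases, which is the property the abstract emphasizes.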

Comments:	Manuscript currently under review for publication
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2605.15325 [cs.CV]
	(or arXiv:2605.15325v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.15325 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Pan He
[v1] Thu, 14 May 2026 18:39:40 UTC (14,154 KB)

— Originally published at arxiv.org
