VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

6/16/2026

Quick Answer

VigilFormer is a framework for video anomaly detection that combines deformable spatio-temporal attention with causal risk inference.

Quick Take

VigilFormer reports AUC scores of 87.83% on UCF-Crime, 97.21% on ShanghaiTech, and 89.74% on CUHK Avenue while running at 41.5 FPS on a single GPU, outperforming recent methods in both accuracy and speed.
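
For readers unfamiliar with the metric, the following is a minimal sketch of how an ROC AUC can be computed from anomaly scores. It assumes binary per-frame labels (1 for anomalous, 0 for normal), which this summary does not spell out, and the function name `roc_auc` is illustrative rather than taken from the paper. AUC here is the probability that a randomly chosen anomalous frame receives a higher score than a randomly chosen normal frame, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (rank-based) ROC AUC for binary labels.

    Assumption: labels are 1 for anomalous frames and 0 for normal frames.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous frame correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not results from the paper.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of frames, so it suits small illustrations; evaluation on full benchmarks would normally use a sorted, rank-based implementation.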

Key Points

VigilFormer employs a Deformable Spatio-Temporal Encoder to optimize attention across frames.
A Causal Anomaly Classifier uses dilated causal convolutions for snippet-level feature analysis.
An Adaptive Confidence Scheduler reduces computation by skipping low-information frames during inference.
Achieves state-of-the-art AUC scores on multiple benchmarks while maintaining real-time performance.
Outperforms recent weakly-supervised approaches in both speed and detection accuracy.
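
The dilated causal convolution mentioned above can be sketched in a few lines. This is a generic one-dimensional illustration under stated assumptions, not the paper's classifier: the function names, the single-channel input, and the zero padding on the left are all choices made here for clarity. The key property is that the output at snippet t depends only on snippets at or before t, and the dilation widens the temporal context without adding weights.

```python
from typing import List, Sequence


def dilated_causal_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Indices before the start of the sequence are treated as zero,
    so y[t] never reads a value from the future.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past-and-present steps seen by a stack of such layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Toy snippet-level signal with a single spike at t = 2.
    signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(dilated_causal_conv1d(signal, kernel=[0.5, 0.5], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Stacking layers with growing dilations (for example 1, 2, 4) lets the receptive field grow quickly with depth, which is a common reason to prefer this design for streaming inputs.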

Paper Resources

Read Paper: arxiv.org | View PDF: arxiv.org

Source Excerpt

arXiv:2606.14724v1 Announce Type: new Abstract: Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. …
