When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

arXiv cs.CV·Tao Yu, Yujia Yang, Shenghua Chai, Zhang Jinshuai, Haopeng Jin, Hao Wang, Minghui Zhang, Zhongtian Luo, Yuchen Long, Xinlong Chen, Jiabing Yang, Zhaolu Kang, Yuxuan Zhou, Zhengyu Man, Xinming Wang, Hongzhu Yi, Zheqi He, Xi Yang, Yan Huang, Liang Wang

6/4/2026

Quick Take

EVID-Bench is a benchmark for search-grounded video misinformation detection: a system must search the open web for related videos and identify what is false through cross-video comparison. Of the nine frontier multimodal models evaluated, the best achieved only 61.43% point-level accuracy and 43.24% video-level accuracy, with AI-generated manipulations proving especially challenging.

Key Points

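EVID-Bench consists of 222 videos spanning 9 manipulation types in 3 categories: AI generation, single-source editing, and multi-source editing.
AI-generated manipulations remain especially challenging for the evaluated models.
The best-performing system achieved only 61.43% point-level accuracy and 43.24% video-level accuracy.
Error analysis shows that models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and stop searching before fully explaining the manipulation.
Verification requires cross-video comparison, because the decisive evidence lies outside the input video.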

Article Content

From source RSS / original summary

arXiv:2606.04098v1

Abstract: Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself.

We introduce EVID-Bench, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone.

We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.
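The abstract reports two scores but does not define them. The Python sketch below shows one plausible reading, offered only as an illustration: point-level accuracy pools every individually checked claim, while video-level accuracy counts a video as correct only when all of its points are correct. That all-or-nothing rule is an assumption, not the paper's stated definition, though it would explain why the video-level figure is lower than the point-level one. The `PointJudgment` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointJudgment:
    """One checked claim ("point") inside a video. Hypothetical structure."""
    video_id: str
    correct: bool  # True if the model's verdict for this point matched the label


def point_level_accuracy(judgments):
    """Fraction of individual points judged correctly, pooled over all videos."""
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(j.correct for j in judgments) / len(judgments)


def video_level_accuracy(judgments):
    """Fraction of videos whose points are ALL correct (assumed rule)."""
    if not judgments:
        raise ValueError("no judgments to score")
    per_video = {}
    for j in judgments:
        # A single wrong point marks the whole video as wrong.
        per_video[j.video_id] = per_video.get(j.video_id, True) and j.correct
    return sum(per_video.values()) / len(per_video)


# Toy data, not taken from the benchmark.
judgments = [
    PointJudgment("v1", True), PointJudgment("v1", True),
    PointJudgment("v2", True), PointJudgment("v2", False),
    PointJudgment("v3", False),
]
print(point_level_accuracy(judgments))  # 3 of 5 points correct
print(video_level_accuracy(judgments))  # 1 of 3 videos fully correct
```

Under this reading, a model can get most points right and still score poorly at the video level, since one missed point is enough to fail the whole video.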


