Segmentation-Guided Spatial Indexing for Generalizable and Explainable Deepfake Detection

arXiv cs.CV·Izaldein Al-Zyoud, Abdulmotaleb El Saddik

6/2/2026·~1 min·en

Quick Take

Segmentation-guided spatial indexing selects semantically meaningful facial patch tokens before pooling, rather than pooling the whole face and classifying afterward. On Celeb-DF v2, the mouth-indexed probe reaches an AUC of 0.905, outperforming LipForensics (+8.1 pp) and Xception (+16.9 pp) with no DINOv3 or FaRL fine-tuning and no target-domain data. Ablations indicate that both DINOv3's patch-level spatial consistency and the regional selection are needed.

Key Points

The mouth-indexed probe achieves an AUC of 0.905 on Celeb-DF v2, outperforming LipForensics by 8.1 pp and Xception by 16.9 pp.
A frozen FaRL parser assigns a semantic label to each DINOv3 ViT-L/16 patch token.
Non-target tokens are discarded, and a linear probe classifies only the retained region.
Replacing regional selection with DINOv3's CLS token drops Celeb-DF v2 AUC by 26.4 pp.
Replacing DINOv3 with FaRL features drops AUC by 20.9 pp, so both the DINOv3 representation and the spatial index are necessary.

Article Content

From source RSS / original summary

arXiv:2606.00098v1 Announce Type: new Abstract: We introduce segmentation-guided spatial indexing for generalizable and explainable deepfake detection. The key idea reverses the standard design order: rather than pooling all facial tokens and classifying afterward, we first select semantically meaningful patch tokens, then pool only those. A frozen FaRL parser assigns each DINOv3 ViT-L/16 patch token a semantic label; non-target tokens are discarded; a linear probe classifies the retained region.
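The select-then-pool order described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the parser labels have already been mapped to one integer per patch, that pooling is a plain mean (the abstract says only "pool"), and that the probe is a single linear score. The names `MOUTH`, `pool_region`, and `linear_probe`, along with the toy features and weights, are hypothetical.

```python
from statistics import fmean

MOUTH = 3  # hypothetical label id; a real parser's label ids will differ


def pool_region(tokens, labels, target):
    """Mean-pool only the patch tokens whose region label equals `target`.

    tokens: list of N feature vectors (one per patch), each a list of floats.
    labels: list of N integer region labels, one per patch.
    Returns the pooled vector and the indices of the patches that were used.
    """
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per patch token")
    kept = [i for i, lab in enumerate(labels) if lab == target]
    if not kept:
        raise ValueError("no patch carries the target label")
    dim = len(tokens[0])
    # Assumption: mean pooling over the retained tokens.
    pooled = [fmean(tokens[i][d] for i in kept) for d in range(dim)]
    return pooled, kept


def linear_probe(pooled, weights, bias):
    """Linear score on the pooled region vector; > 0 is read as 'fake' here."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias


# Toy input: 4 patches with 2-dimensional features (real tokens are far wider).
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
labels = [0, MOUTH, MOUTH, 1]

pooled, used = pool_region(tokens, labels, MOUTH)
score = linear_probe(pooled, weights=[0.5, -0.5], bias=0.0)
```

Because the index list `used` is fixed before the probe sees anything, the score can depend only on those patches. Discarded tokens never reach the classifier.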

This spatial indexing exploits DINOv3's patch-level spatial consistency, the same property that enables emergent segmentation, to present the probe with a purer regional subspace where manipulation-relevant evidence is less diluted by whole-face cues. Region attribution is structural: when the mouth model predicts fake, the decision used only mouth tokens, not an overlaid saliency map. On Celeb-DF v2, the mouth-indexed probe achieves AUC 0.905, outperforming LipForensics (+8.1 pp) and Xception (+16.9 pp), with no DINOv3 or FaRL fine-tuning and no target-domain data.

Ablations isolate the mechanism: replacing regional selection with DINOv3's CLS token drops Celeb-DF v2 AUC by 26.4 pp; replacing DINOv3 with FaRL features drops it by 20.9 pp. Both DINOv3 representation and the spatial index are independently necessary; neither alone approaches the full system.
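The abstract gives the gaps in percentage points rather than the comparison AUCs themselves. As a rough sanity check, the short sketch below converts each gap into an implied AUC. It assumes every gap is an absolute AUC difference against the 0.905 figure on Celeb-DF v2. The resulting values are derived here, not taken from the paper, and the dictionary labels are only descriptive.

```python
FULL_AUC = 0.905  # mouth-indexed probe on Celeb-DF v2, as stated in the abstract

# Gaps from the abstract, in percentage points (1 pp = 0.01 AUC).
gaps_pp = {
    "LipForensics": 8.1,
    "Xception": 16.9,
    "DINOv3 CLS token (ablation)": 26.4,
    "FaRL features (ablation)": 20.9,
}

# Implied AUC = full-system AUC minus the gap; rounding removes float noise.
implied_auc = {
    name: round(FULL_AUC - gap / 100, 3) for name, gap in gaps_pp.items()
}

for name, auc in implied_auc.items():
    print(f"{name}: implied AUC {auc:.3f}")
```

Under that assumption, both ablations land below the implied LipForensics and Xception figures, which fits the abstract's claim that neither component alone approaches the full system.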


