CoGeoAD: Hierarchical Color-Geometric Fusion with Multi-View Attention for Zero-Shot 3D Anomaly Detection

arXiv cs.CV·Ke Xu, Xinle Wang, Yanning Hou, Xueliang Ma, Juan Xie, Jianfeng Qiu


Quick Answer

CoGeoAD is a unified CLIP-based framework for zero-shot 3D anomaly detection that fuses 2D color images with 3D geometric structures.

Quick Take

CoGeoAD's Data-Driven Multi-View Attention (MVA) mechanism adaptively aggregates 3D features, and its Multi-Stage Color-Geometric Fusion (MS-CGF) module integrates multi-level features from both modalities. The authors report state-of-the-art performance on the MVTec3D-AD and Eyecandies benchmarks, targeting industrial quality inspection where labeled anomaly samples are scarce.

Key Points

CoGeoAD fuses 2D color and 3D geometric features for anomaly detection.
Utilizes Data-Driven Multi-View Attention for adaptive 3D feature aggregation.
Achieves state-of-the-art results on MVTec3D-AD and Eyecandies benchmarks.
Addresses the scarcity of labeled anomaly samples in industrial settings.
Source code available at https://github.com/kingdomShu/CoGeoAD.
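
The key points do not say how a CLIP-based framework turns features into anomaly scores without anomaly labels. One common pattern in CLIP-based zero-shot anomaly detection, shown here as a minimal sketch and not as CoGeoAD's confirmed method, compares each patch feature against text embeddings for a "normal" prompt and an "anomalous" prompt. The function names, the toy 2-D embeddings, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(patch_feature, normal_text, anomalous_text, temperature=0.07):
    """Two-way softmax over similarities to the two prompts.

    Returns the probability mass assigned to the "anomalous" prompt.
    All three inputs are hypothetical embeddings in a shared space;
    the temperature value is an assumption, not taken from the paper.
    """
    s_normal = cosine(patch_feature, normal_text) / temperature
    s_anomalous = cosine(patch_feature, anomalous_text) / temperature
    shift = max(s_normal, s_anomalous)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - shift)
    e_anomalous = math.exp(s_anomalous - shift)
    return e_anomalous / (e_normal + e_anomalous)


def anomaly_map(patch_features, normal_text, anomalous_text):
    """Score every patch; the result is a flat per-patch anomaly map."""
    return [anomaly_score(p, normal_text, anomalous_text) for p in patch_features]


if __name__ == "__main__":
    normal_text = [1.0, 0.0]      # toy embedding for a "normal" prompt
    anomalous_text = [0.0, 1.0]   # toy embedding for an "anomalous" prompt
    patches = [[0.9, 0.1], [1.0, 1.0], [0.1, 0.9]]
    print(anomaly_map(patches, normal_text, anomalous_text))
```

Because the score is a softmax over only two prompts, a patch that is equally similar to both receives a score of 0.5, and scores move toward 1 as the patch aligns with the "anomalous" embedding.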


[Submitted on 24 Jun 2026]


Abstract: Zero-shot 3D anomaly detection is essential for industrial quality inspection, where labeled anomaly samples are scarce. Meanwhile, existing methods lack an effective mechanism to fuse complementary 2D color images with 3D geometric structures, limiting their ability to detect both surface and structural defects in a unified framework. To address these issues, we propose CoGeoAD, a unified CLIP-based framework that fuses color and geometric features by constructing pixel-aligned paired multi-view images. The framework introduces a Data-Driven Multi-View Attention (MVA) mechanism to adaptively aggregate 3D features and a Multi-Stage Color-Geometric Fusion (MS-CGF) module to hierarchically integrate multi-level features from both modalities. Extensive experiments on the MVTec3D-AD and Eyecandies benchmarks demonstrate that CoGeoAD achieves state-of-the-art performance, effectively capturing both structural and textural anomalies in complex industrial scenarios. Our source code is available at https://github.com/kingdomShu/CoGeoAD.
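
As a rough illustration of the two ideas named in the abstract, the sketch below weights per-view feature vectors with a softmax attention score and then blends color and geometric features stage by stage. The abstract gives no equations, so the scaled dot-product scoring, the query vector, the blend weight alpha, and every function name here are assumptions, not the authors' MVA or MS-CGF implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_features, query):
    """Attention-weighted average of per-view feature vectors.

    view_features: list of V vectors (D floats each), one per rendered view.
    query: D-dimensional vector, a hypothetical stand-in for learned
           attention parameters.
    Returns the fused D-dimensional vector and the V attention weights.
    """
    dim = len(query)
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(dim)
        for feat in view_features
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, view_features))
        for d in range(dim)
    ]
    return fused, weights


def fuse_stages(color_stages, geom_stages, alpha=0.5):
    """Blend color and geometric features at each stage, then average stages.

    color_stages, geom_stages: lists of S vectors (D floats each), one per
    encoder stage. alpha is an assumed fixed blend weight; a real model
    would more likely learn this mixing.
    """
    per_stage = [
        [alpha * c + (1.0 - alpha) * g for c, g in zip(cs, gs)]
        for cs, gs in zip(color_stages, geom_stages)
    ]
    dim = len(per_stage[0])
    return [sum(stage[d] for stage in per_stage) / len(per_stage) for d in range(dim)]


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    fused, weights = aggregate_views(views, query=[2.0, 0.0])
    print("attention weights:", weights)
    print("fused geometry feature:", fused)
    print("fused color-geometry:", fuse_stages([[1.0, 1.0]], [fused]))
```

A zero query gives every view the same weight, so the aggregation reduces to a plain mean; a query aligned with one view's feature shifts most of the weight to that view, which is the sense in which such a scheme is adaptive.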

Comments:	ICML 2026
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.25273 [cs.CV]
	(or arXiv:2606.25273v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.25273 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ke Xu
[v1] Wed, 24 Jun 2026 01:12:22 UTC (4,521 KB)

— Originally published at arxiv.org

