YOLO-AMC: An Improved YOLO Architecture with Attention Mechanisms for Building Crack Detection

arXiv cs.CV·Ching-Yu Tsai, Chia-Min Lin, Chih-Hsiang Yang, Yung-Che Wang, Jen-Shiun Chiang

6/12/2026

Quick Answer

YOLO-AMC adds attention mechanisms to the Neck of YOLOv11 to improve crack detection in infrastructure. Its best variant, which uses the Global Attention Mechanism (GAM), reaches mAP@0.5 of 0.9917 on the test dataset, ahead of YOLOv11 (0.9833) and YOLOv8 (0.9707).

Quick Take

Built on YOLOv11, YOLO-AMC removes the original C2PSA module and inserts attention modules into the Neck's multi-scale feature fusion layers. At 7.6 GFLOPs, it runs at 110.95 FPS on an NVIDIA RTX 4090 and approximately 5 FPS on a Raspberry Pi 5, which the authors describe as a favorable trade-off between accuracy and deployment efficiency.
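The reported frame rates translate directly into per-frame latency budgets. A minimal sketch of that arithmetic follows, using only the two FPS figures from the summary; the function name `frame_latency_ms` is illustrative and not taken from the paper's code.

```python
def frame_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS figures as reported for YOLO-AMC in the summary.
reported_fps = {"NVIDIA RTX 4090": 110.95, "Raspberry Pi 5": 5.0}

for device, fps in reported_fps.items():
    print(f"{device}: {frame_latency_ms(fps):.2f} ms per frame")
```

This works out to roughly 9 ms per frame on the GPU and about 200 ms per frame on the Raspberry Pi 5, so the edge figure may be better suited to periodic inspection than to live video.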

Key Points

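YOLO-AMC removes YOLOv11's C2PSA module and adds attention modules (Global Attention Mechanism, Res-CBAM, and Shuffle Attention) to the Neck for stronger cross-scale feature integration.
The GAM variant achieves mAP@0.5 of 0.9917 on the test dataset, surpassing YOLOv11's 0.9833 and YOLOv8's 0.9707.
At 7.6 GFLOPs, the model runs at 110.95 FPS on an NVIDIA RTX 4090 and approximately 5 FPS on a Raspberry Pi 5.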
Implementation code is available on GitHub for further research and application.
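To put the mAP@0.5 differences in perspective, the sketch below computes the absolute gain over each baseline and the share of the baseline's remaining gap to a perfect score that the gain closes. The numbers are the ones reported in the summary; the helper names are illustrative.

```python
# mAP@0.5 values as reported in the summary.
reported_map50 = {"YOLO-AMC (GAM)": 0.9917, "YOLOv11": 0.9833, "YOLOv8": 0.9707}


def absolute_gain(model: str, baseline: str) -> float:
    """Difference in mAP@0.5 between a model and a baseline."""
    return reported_map50[model] - reported_map50[baseline]


def gap_closed(model: str, baseline: str) -> float:
    """Fraction of the baseline's remaining gap (1 - mAP) closed by the model."""
    baseline_gap = 1.0 - reported_map50[baseline]
    return absolute_gain(model, baseline) / baseline_gap


for baseline in ("YOLOv11", "YOLOv8"):
    gain = absolute_gain("YOLO-AMC (GAM)", baseline)
    share = gap_closed("YOLO-AMC (GAM)", baseline)
    print(f"vs {baseline}: +{gain:.4f} mAP@0.5, {share:.0%} of remaining gap closed")
```

An absolute gain of 0.0084 over YOLOv11 looks small, but it closes roughly half of the gap that YOLOv11 leaves to a perfect mAP@0.5.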


Article Content

From source RSS / original summary

arXiv:2606.12958v1. Abstract: Crack detection plays an important role in infrastructure inspection and Structural Health Monitoring (SHM). However, cracks typically appear as thin, low-contrast structures and are easily affected by background noise, posing challenges for existing object detection models.

This study proposes an improved YOLO-based architecture with integrated attention mechanisms, termed YOLO-AMC (YOLO with Attention Mechanisms for Crack Detection), to enhance automated crack detection performance.

Based on YOLOv11, the original C2PSA module is removed, and multiple attention mechanisms, including Global Attention Mechanism (GAM), Residual Convolutional Block Attention Module (Res-CBAM), and Shuffle Attention (SA), are introduced into the multi-scale feature fusion layers of the Neck to strengthen cross-scale feature integration. Experimental results demonstrate that YOLO-AMC consistently outperforms baseline models YOLOv11n and YOLOv8n across multiple evaluation metrics.
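Of the three modules, Shuffle Attention is commonly described as splitting channels into groups, applying attention within each group, and then mixing information across groups with a channel shuffle. The sketch below illustrates only that final shuffle step on a plain list of channel labels. It assumes YOLO-AMC's SA follows this usual design and is not taken from the authors' repository; the function name is illustrative.

```python
def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups.

    Conceptually: view the channel axis as (groups, channels_per_group),
    transpose it to (channels_per_group, groups), then flatten again.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Six channels in two groups: [a0, a1, a2] and [b0, b1, b2].
labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
print(channel_shuffle(labels, groups=2))
# → ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, neighboring channels come from different groups, so a later layer that looks at adjacent channels sees features from every group.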

Among the evaluated attention modules, GAM achieves the best detection performance, obtaining mAP@0.5 = 0.9917 and mAP@0.5:0.95 = 0.9506 on the test dataset, which are higher than those of YOLOv11 (0.9833 / 0.9112) and YOLOv8 (0.9707 / 0.8921). Furthermore, while maintaining a computational complexity of 7.6 GFLOPs, the proposed model achieves 110.95 FPS on an NVIDIA RTX 4090 platform and approximately 5 FPS on a Raspberry Pi 5 edge device, demonstrating a favorable trade-off between accuracy and deployment efficiency. The implementation code for this study is available on GitHub at https://github.com/CY-Tsai24/YOLO-AMC.
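The two accuracy metrics differ in how strictly a predicted box must overlap the ground truth. mAP@0.5 counts a detection as a match at an Intersection over Union (IoU) of at least 0.5, while mAP@0.5:0.95 follows the COCO convention of averaging over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows why the stricter metric is harder for thin objects such as cracks; the boxes are hypothetical and the helper names are illustrative.

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_thresholds():
    """The ten IoU thresholds averaged by mAP@0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(box_pred, box_true) -> int:
    """Count how many of the ten thresholds this prediction satisfies."""
    overlap = iou(box_pred, box_true)
    return sum(overlap >= t for t in coco_thresholds())


# Hypothetical thin crack: 100 px wide, 5 px tall, prediction shifted by 1 px.
truth = (0, 0, 100, 5)
pred = (0, 1, 100, 6)
print(f"IoU = {iou(pred, truth):.3f}, thresholds passed = {thresholds_passed(pred, truth)}/10")
```

In this example a one-pixel vertical shift already drops the IoU to about 0.667, which still counts at the 0.5 threshold but fails six of the ten thresholds used by mAP@0.5:0.95.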
