YOLO-AMC: An Improved YOLO Architecture with Attention Mechanisms for Building Crack Detection
Quick Answer
YOLO-AMC adds attention mechanisms to the neck of YOLOv11 to improve crack detection in infrastructure. Its best variant, using the Global Attention Mechanism, reaches mAP@0.5 of 0.9917 on the test dataset, ahead of YOLOv11 (0.9833) and YOLOv8 (0.9707).
Quick Take
At 7.6 GFLOPs, YOLO-AMC runs at 110.95 FPS on an NVIDIA RTX 4090 and approximately 5 FPS on a Raspberry Pi 5, which the authors present as a favorable trade-off between accuracy and deployment efficiency.
Key Points
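- YOLO-AMC removes YOLOv11's C2PSA module and introduces Global Attention Mechanism (GAM), Res-CBAM, and Shuffle Attention (SA) into the multi-scale feature fusion layers of the neck.
- With GAM, the model reaches mAP@0.5 of 0.9917 and mAP@0.5:0.95 of 0.9506, surpassing YOLOv11 (0.9833 / 0.9112) and YOLOv8 (0.9707 / 0.8921).
- At a computational complexity of 7.6 GFLOPs, it runs at 110.95 FPS on an NVIDIA RTX 4090 and approximately 5 FPS on a Raspberry Pi 5.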
- Implementation code is available on GitHub for further research and application.
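For readers less familiar with the metric quoted in the key points, mAP@0.5 counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages the result over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below covers only that matching criterion, not the full precision-recall computation. The function names and the example boxes are hypothetical and are not taken from the paper or its repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, truth_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred_box, truth_box) >= threshold


# The ten IoU thresholds averaged by mAP@0.5:0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical thin crack: a 100 x 10 pixel box, with a prediction shifted down by 2 pixels.
truth = (0, 0, 100, 10)
pred = (0, 2, 100, 12)
print(is_match(pred, truth, 0.5))   # True
print(is_match(pred, truth, 0.75))  # False
```

A two-pixel offset on a ten-pixel-tall box already lowers the IoU to about 0.67, which illustrates why thin structures such as cracks are harder to score well on at the stricter thresholds.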
Article Content
From the original arXiv abstract (arXiv:2606.12958v1): Crack detection plays an important role in infrastructure inspection and Structural Health Monitoring (SHM). However, cracks typically appear as thin, low-contrast structures and are easily affected by background noise, posing challenges for existing object detection models.
This study proposes an improved YOLO-based architecture with integrated attention mechanisms, termed YOLO-AMC (YOLO with Attention Mechanisms for Crack Detection), to enhance automated crack detection performance.
Based on YOLOv11, the original C2PSA module is removed, and multiple attention mechanisms, including Global Attention Mechanism (GAM), Residual Convolutional Block Attention Module (Res-CBAM), and Shuffle Attention (SA), are introduced into the multi-scale feature fusion layers of the Neck to strengthen cross-scale feature integration. Experimental results demonstrate that YOLO-AMC consistently outperforms baseline models YOLOv11n and YOLOv8n across multiple evaluation metrics.
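The abstract does not describe the internals of the attention modules. As general background, Shuffle Attention in its original formulation splits the channels into groups, applies attention within each group, and then uses a channel shuffle so that information can flow between groups. The sketch below illustrates only that shuffle step, on a plain list of channel labels rather than a feature tensor. The function name is hypothetical, and this is not the YOLO-AMC implementation.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = n // groups
    # Treat the flat list as a (groups, per_group) grid and read it column by column,
    # so consecutive output channels come from different groups.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

On a real feature map the same reordering is applied along the channel axis, leaving the spatial dimensions untouched.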
Among the evaluated attention modules, GAM achieves the best detection performance, obtaining mAP@0.5 = 0.9917 and mAP@0.5:0.95 = 0.9506 on the test dataset, which are higher than those of YOLOv11 (0.9833 / 0.9112) and YOLOv8 (0.9707 / 0.8921). Furthermore, while maintaining a computational complexity of 7.6 GFLOPs, the proposed model achieves 110.95 FPS on an NVIDIA RTX 4090 platform and approximately 5 FPS on a Raspberry Pi 5 edge device, demonstrating a favorable trade-off between accuracy and deployment efficiency. The implementation code for this study is available on GitHub at https://github.com/CY-Tsai24/YOLO-AMC.
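To put the reported figures in perspective, the following sketch computes the absolute mAP gains over each baseline and converts the frame rates into per-frame latency. The numbers are copied from the abstract. The dictionary layout and function names are hypothetical, chosen only for this illustration.

```python
# (mAP@0.5, mAP@0.5:0.95) on the test dataset, as reported in the abstract.
REPORTED = {
    "YOLO-AMC (GAM)": (0.9917, 0.9506),
    "YOLOv11": (0.9833, 0.9112),
    "YOLOv8": (0.9707, 0.8921),
}


def gains_over(baseline, model="YOLO-AMC (GAM)"):
    """Absolute gain of `model` over `baseline` for both mAP metrics."""
    m50, m5095 = REPORTED[model]
    b50, b5095 = REPORTED[baseline]
    return round(m50 - b50, 4), round(m5095 - b5095, 4)


def frame_latency_ms(fps):
    """Average time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


print(gains_over("YOLOv11"))              # (0.0084, 0.0394)
print(gains_over("YOLOv8"))               # (0.021, 0.0585)
print(f"{frame_latency_ms(110.95):.2f}")  # 9.01 (RTX 4090)
print(f"{frame_latency_ms(5):.2f}")       # 200.00 (Raspberry Pi 5, approximate)
```

The gain is larger on mAP@0.5:0.95 than on mAP@0.5 for both baselines, so most of the reported improvement shows up at the stricter localization thresholds.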