AI for Maritime Security: Comparative Evaluation of CNN and Vision Transformer Architectures for Maritime Object Detection
Quick Take
This study evaluates six deep learning models for maritime object detection: a baseline CNN, four transfer learning models, and a Vision Transformer. The Vision Transformer reportedly reached 100% accuracy with the lowest error rates and the fastest video processing time, highlighting its potential for maritime security applications.
Key Points
- Used a dataset of 6,468 maritime images covering cloudy, foggy, rainy, and sunny conditions.
- Evaluated six architectures: a baseline CNN, Xception, VGG16, MobileNetV2, EfficientNetV2L, and a Vision Transformer (ViT).
- The ViT outperformed the other models, reaching 100% accuracy with the lowest Type I and Type II error rates and the fastest video processing time.
- Model performance varies with computational constraints and deployment conditions.
- Lightweight models are suitable for resource-limited devices.
Article Content
From the source RSS / original summary (arXiv:2606.14720v1). Abstract: This study aims to enhance maritime security by using advanced Artificial Intelligence (AI) and Computer Vision (CV) techniques. For this purpose, intelligent object detection systems were designed and assessed that can detect the presence of ships on the sea surface in different real-time environments.
To achieve this goal, a maritime image dataset with 6,468 images was used, covering cloudy, foggy, rainy, and sunny weather conditions. Six deep learning architectures were evaluated, including a base Convolutional Neural Network (CNN) model, four transfer learning models (Xception, VGG16, MobileNetV2, and EfficientNetV2L), and a Vision Transformer (ViT) model.
The models were compared using multiple performance indicators, including accuracy, Type I and Type II errors, model size, and video processing time. The results show that model performance varies depending on computational constraints and deployment conditions. While lightweight architectures are suitable for resource-limited devices, the ViT achieved the best overall performance, reaching 100% accuracy with the lowest error rates and the fastest video processing time.
The findings highlight the potential of AI-driven computer vision systems for maritime surveillance, border protection, and autonomous navigation.
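The abstract names accuracy and Type I and Type II errors as performance indicators but does not say how they were computed. As a minimal sketch, assuming the task is treated as binary classification (1 = ship present, 0 = no ship), these metrics could be derived from predictions as shown here. The `BinaryMetrics` class, the `evaluate` function, and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    type_i_rate: float   # false positives / actual negatives (false alarms)
    type_ii_rate: float  # false negatives / actual positives (missed ships)


def evaluate(y_true, y_pred):
    """Compute accuracy and Type I / Type II error rates.

    Assumes labels are 1 (ship present) or 0 (no ship); this encoding is
    an illustrative assumption, not the paper's documented setup.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II error

    negatives = tn + fp
    positives = tp + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / len(pairs),
        type_i_rate=fp / negatives if negatives else 0.0,
        type_ii_rate=fn / positives if positives else 0.0,
    )


# Toy labels for illustration only; these are not results from the study.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(evaluate(toy_true, toy_pred))
```

In a surveillance setting the two error types carry different costs: a Type I error raises a false alarm, while a Type II error misses a vessel, which is one reason to report them separately alongside accuracy.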