Learning Dynamic Structural Specialization for Underwater Salient Object Detection
Quick Take
DSS-USOD targets underwater salient object detection by decomposing a shared representation into boundary-sensitive and region-coherent features and dynamically weighting the two according to local structural context.
Key Points
- Introduces dynamic structural specialization for USOD.
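- Balances boundary precision and region coherence under degraded underwater conditions, using a spatial coordination module and cooperative structural supervision.
- Reports superior performance on benchmark datasets and validates the method through real-world deployment on an underwater robot for object inspection.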
Abstract: Underwater salient object detection (USOD) has attracted increasing attention for underwater visual scene understanding and vision-guided robotic applications. However, existing USOD methods still struggle with underwater image degradations, which often lead to inaccurate object localization, fragmented salient regions, and coarse boundary prediction. To address these challenges, this paper proposes DSS-USOD, a novel RGB-based USOD method built upon dynamic structural specialization. DSS-USOD extracts a shared base representation from a single underwater image, decomposes it into boundary-sensitive and region-coherent structural features, and dynamically coordinates their contributions according to local structural context. Specifically, the extracted shared base representation is decomposed into a boundary-sensitive branch for modeling fine-grained boundary details and a region-coherent branch for capturing region-level structural consistency. A spatial coordination module is then introduced to adaptively regulate the relative contributions of the two branches according to local structural context. Moreover, cooperative structural supervision is introduced to promote branch specialization and stabilize spatial coordination, enabling DSS-USOD to better balance boundary precision and region coherence under degraded underwater conditions. Extensive experiments show that DSS-USOD achieves superior performance on benchmark datasets. Finally, real-world deployment on an underwater robot validates the practical effectiveness of DSS-USOD for underwater object inspection.
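The abstract gives no formulas for the spatial coordination module, so the following is only a toy sketch of the general idea it describes: two branch outputs are blended per pixel, with the blend weight driven by local structural context. Everything here is an assumption made for illustration, not the paper's method: the function names (`local_contrast`, `coordinate`), the use of 4-neighbour contrast as the "context", the sigmoid gate, and the `sharpness` and `threshold` parameters are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def local_contrast(feat, i, j):
    """Hypothetical stand-in for 'local structural context':
    the largest absolute difference to the 4-connected neighbours."""
    h, w = len(feat), len(feat[0])
    diffs = [
        abs(feat[i][j] - feat[ni][nj])
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        if 0 <= ni < h and 0 <= nj < w
    ]
    return max(diffs) if diffs else 0.0


def coordinate(base, boundary, region, sharpness=8.0, threshold=0.25):
    """Blend two branch maps per pixel (assumed convex combination).

    A gate near 1 favours the boundary branch; a gate near 0 favours
    the region branch. The gate is computed from the shared base map.
    """
    h, w = len(base), len(base[0])
    fused = [[0.0] * w for _ in range(h)]
    gate = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            g = sigmoid(sharpness * (local_contrast(base, i, j) - threshold))
            gate[i][j] = g
            fused[i][j] = g * boundary[i][j] + (1.0 - g) * region[i][j]
    return fused, gate


# Toy input: a 4x4 base map with a vertical edge between columns 1 and 2.
base = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
boundary = [[1.0] * 4 for _ in range(4)]  # placeholder boundary-branch output
region = [[0.0] * 4 for _ in range(4)]    # placeholder region-branch output

fused, gate = coordinate(base, boundary, region)
for row in gate:
    print(" ".join(f"{g:.2f}" for g in row))
```

In this toy setup the gate is close to 1 in the two columns that straddle the edge and stays low in the flat outer columns, so the boundary branch dominates near the edge and the region branch dominates elsewhere. In a learned model the gate would presumably be predicted by trainable layers rather than a fixed contrast rule.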
| Comments: | 15 pages |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.15535 [cs.CV] (or arXiv:2605.15535v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15535 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Lin Hong
[v1] Fri, 15 May 2026 02:14:10 UTC (48,069 KB)
— Originally published at arxiv.org