DCSNet: Multiscale Feature Aggregation for Small Medical Object Segmentation with Detection-guided Hierarchical Cropping
Quick Take
DCSNet introduces a novel approach for small medical object segmentation, utilizing Detection-guided Hierarchical Cropping and Multiscale Feature Aggregation to enhance boundary precision. Extensive experiments show DCSNet significantly outperforms existing methods across three medical datasets, addressing class imbalance and edge degradation effectively.
Key Points
- DCSNet transforms global dense prediction into localized refinement for better segmentation.
- Detection-guided Hierarchical Cropping filters background interference, improving class balance.
- Multiscale Feature Aggregation captures semantic context and fine details for sharp boundaries.
- Extensive tests show DCSNet outperforms state-of-the-art methods in boundary precision.
- Robust solution for clinical micro-lesion segmentation across diverse medical datasets.
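The cropping idea in the points above can be illustrated with a toy example. This is a minimal sketch, not the paper's DGHC module: the helper names (`expand_box`, `crop`, `foreground_ratio`), the fixed pixel margin, and the hard-coded `proposal` box standing in for a detector output are all assumptions made for illustration. It shows only why restricting the segmenter to a detected region raises the share of foreground pixels.

```python
# Illustrative sketch only: names, margin, and the proposal box are hypothetical.

def expand_box(box, margin, height, width):
    """Grow an (x0, y0, x1, y1) box by `margin` pixels, clipped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))

def crop(grid, box):
    """Return the sub-grid covered by an (x0, y0, x1, y1) box (end-exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in grid[y0:y1]]

def foreground_ratio(mask):
    """Fraction of pixels labelled 1 in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

# Toy 64x64 ground-truth mask with a 4x4 lesion at rows 30..33, columns 20..23.
H = W = 64
mask = [[1 if 30 <= y < 34 and 20 <= x < 24 else 0 for x in range(W)]
        for y in range(H)]

proposal = (20, 30, 24, 34)  # assumed detector output, (x0, y0, x1, y1)
roi = expand_box(proposal, margin=4, height=H, width=W)
roi_mask = crop(mask, roi)

global_ratio = foreground_ratio(mask)      # 16 foreground pixels out of 4096
local_ratio = foreground_ratio(roi_mask)   # 16 foreground pixels out of 144
print(f"global: {global_ratio:.4f}, cropped: {local_ratio:.4f}")
```

In this toy case the foreground share rises from under 0.4% of the full image to about 11% of the cropped region, which is the class-balance effect the summary attributes to cropping. The paper describes extracting object-centric features from region proposals, so the real module likely operates on feature maps rather than raw masks.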
Abstract: Small object segmentation in medical imaging is primarily hindered by class imbalance and inherent boundary complexity. Consequently, conventional global networks frequently fail to detect sparse targets or suffer from severe edge degradation. To overcome these limitations, we propose the Detection-guided Cropping Segmentation Network (DCSNet), an end-to-end framework that transforms global dense prediction into a localized refinement process. This framework integrates two core components, namely Detection-guided Hierarchical Cropping (DGHC) and Multiscale Feature Aggregation (MSFA). The DGHC module leverages region proposals to dynamically extract object-centric features, effectively filtering out massive background interference to mitigate class imbalance. Subsequently, the MSFA module operates strictly within these purified regions, synergizing a Transformer encoder with a pixel-adaptive fusion strategy. This mechanism dynamically aggregates multiscale features to capture both semantic context and fine-grained details for sharp boundary delineation. Extensive experiments across three diverse medical datasets demonstrate that DCSNet significantly outperforms existing state-of-the-art methods, yielding substantial improvements in boundary precision and offering a highly robust solution for clinical micro-lesion segmentation.
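The abstract does not specify how the pixel-adaptive fusion is computed. One common way to fuse multiscale features per pixel is a softmax gate over scales, and the sketch below assumes that form purely for illustration. The function names, the use of single-channel grids, and the gating logits supplied as inputs are all hypothetical, and the Transformer encoder is omitted entirely.

```python
import math

# Illustrative sketch only: a per-pixel softmax gate over scales is an assumed
# stand-in for the paper's fusion strategy, which the abstract does not detail.

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_adaptive_fuse(features, logits):
    """Fuse per-scale feature grids with per-pixel weights.

    features, logits: lists (one entry per scale) of HxW grids that have
    already been resized to a common resolution. At each pixel the logits
    across scales are turned into weights that sum to 1.
    """
    n_scales = len(features)
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = softmax([logits[s][y][x] for s in range(n_scales)])
            fused[y][x] = sum(weights[s] * features[s][y][x]
                              for s in range(n_scales))
    return fused

# Two scales on a 1x2 grid: a coarse map (value 1.0) and a fine map (value 3.0).
features = [[[1.0, 1.0]], [[3.0, 3.0]]]
# Pixel 0 weights both scales equally; pixel 1 strongly prefers the fine scale.
logits = [[[0.0, 0.0]], [[0.0, 10.0]]]
fused = pixel_adaptive_fuse(features, logits)
print(fused)
```

With equal logits the first pixel is the plain average of the two scales, while the second pixel is dominated by the fine-scale feature. That per-location choice between context and detail is what "pixel-adaptive" suggests here.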
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2606.28402 [cs.CV] (or arXiv:2606.28402v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.28402 (arXiv-issued DOI via DataCite) |
Submission history
From: Tao He
[v1] Wed, 24 Jun 2026 12:46:58 UTC (381 KB)
Originally published at arxiv.org