DCSNet: Multiscale Feature Aggregation for Small Medical Object Segmentation with Detection-guided Hierarchical Cropping

arXiv cs.CV·Shanfeng Zhang, Bo Gou, Yue Cao, Lei Zhang, Zhang Yi, Tao He

1d ago

·~2 min·6/30/2026·en·0

Quick Answer

DCSNet introduces a novel approach for small medical object segmentation, utilizing Detection-guided Hierarchical Cropping and Multiscale Feature Aggregation to enhance boundary precision.

Quick Take

DCSNet introduces a novel approach for small medical object segmentation, utilizing Detection-guided Hierarchical Cropping and Multiscale Feature Aggregation to enhance boundary precision. Extensive experiments show DCSNet significantly outperforms existing methods across three medical datasets, addressing class imbalance and edge degradation effectively.

Key Points

DCSNet transforms global dense prediction into localized refinement for better segmentation.
Detection-guided Hierarchical Cropping filters background interference, improving class balance.
Multiscale Feature Aggregation captures semantic context and fine details for sharp boundaries.
Extensive tests show DCSNet outperforms state-of-the-art methods in boundary precision.
Robust solution for clinical micro-lesion segmentation across diverse medical datasets.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

📖 Reader Mode

~2 min read

[Submitted on 24 Jun 2026]

View PDF HTML (experimental)

Abstract:Small object segmentation in medical imaging is primarily hindered by class imbalance and inherent boundary complexity. Consequently, conventional global networks frequently fail to detect sparse targets or suffer from severe edge degradation. To overcome these limitations, we propose the Detection-guided Cropping Segmentation Network (DCSNet), an end-to-end framework that transforms global dense prediction into a localized refinement process. This framework integrates two core components, namely Detection-guided Hierarchical Cropping (DGHC) and Multiscale Feature Aggregation (MSFA). The DGHC module leverages region proposals to dynamically extract object-centric features, effdataectively filtering out massive background interference to mitigate class imbalance. Subsequently, the MSFA module operates strictly within these purified regions, synergizing a Transformer encoder with a pixel-adaptive fusion strategy. This mechanism dynamically aggregates multiscale features to capture both semantic context and fine-grained details for sharp boundary delineation. Extensive experiments across three diverse medical datasets demonstrate that DCSNet significantly outperforms existing state-of-the-art methods, yielding substantial improvements in boundary precision and offering a highly robust solution for clinical micro-lesion segmentation.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.28402 [cs.CV]
	(or arXiv:2606.28402v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.28402 arXiv-issued DOI via DataCite

Submission history

From: Tao He [view email]
[v1] Wed, 24 Jun 2026 12:46:58 UTC (381 KB)

— Originally published at arxiv.org

Continue reading on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CV

See more →

arXiv cs.CV·Shahrzad Esmat, Chaunte W. Lacewell, Sameh Gobriel, Nilesh Jain, Ali Jannesari

3w ago

FeaturedOriginal

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

AI Summary

A phase-aware LLM agent optimizes human-object interaction retrieval, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on the HICO-DET benchmark. This method enhances throughput by 15.3x over UniIR and demonstrates strong transferability across vector database management systems.

#LLM #Agent #Inference #AI Startup