Cross-Source Supervision for Bone Infection Segmentation in Dual-Modality PET-CT
Quick Take
This study presents a dual-modality PET-CT framework for segmenting bone infections when expert annotations disagree.
Key Points
- Integrates PET and CT for enhanced diagnostic accuracy.
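- Employs a decoupled dual-source learning framework that trains parallel models on independent expert annotations.
- Evaluates multimodal fusion with patient-level 3D volumetric metrics and cross-validation rather than 2D slice-level scores.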
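As a rough illustration of what early fusion of PET and CT can look like at the input level, the sketch below stacks a normalized PET volume and a bone-windowed CT volume as two channels of one array. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the min-max PET normalization, and the window center and width are all hypothetical choices made for the example.

```python
import numpy as np

def window_ct(ct_hu, center=300.0, width=1500.0):
    """Clip CT Hounsfield units to a window and scale to [0, 1].

    center and width are illustrative bone-window values (assumed, not from the paper).
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(ct_hu, lo, hi) - lo) / (hi - lo)

def normalize_pet(pet, eps=1e-8):
    """Min-max scale a PET volume to [0, 1] (an assumed normalization)."""
    pet = pet.astype(np.float32)
    return (pet - pet.min()) / (pet.max() - pet.min() + eps)

def early_fuse(pet, ct_hu):
    """Stack PET and windowed CT as channels, giving shape (2, D, H, W)."""
    if pet.shape != ct_hu.shape:
        raise ValueError("PET and CT must be resampled to the same grid")
    fused = np.stack([normalize_pet(pet), window_ct(ct_hu)], axis=0)
    return fused.astype(np.float32)
```

A segmentation network would then take the two-channel array as a single input, so both modalities are mixed from the first layer onward.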
Abstract: Early and accurate diagnosis and lesion localization of bone infections are crucial for clinical treatment. PET-CT integrates anatomical information from CT with metabolic information from PET, making it an important imaging modality for diagnosing bone infections. However, accurate lesion segmentation remains challenging due to indistinct lesion boundaries and inconsistencies in annotations generated by different experts or automated systems. In this work, we investigate multimodal segmentation of bone infections under annotation discrepancy. We develop a bimodal end-to-end segmentation framework that integrates PET metabolic signals and CT bone-window anatomy through early multimodal fusion. To mitigate performance inflation caused by inter-slice correlation in small datasets, this study discards traditional two-dimensional evaluation methods and implements a rigorous patient-level 3D volumetric evaluation and cross-validation. Furthermore, instead of forcing a singular consensus, we propose a decoupled dual-source learning framework where parallel models are trained on independent expert annotations driven by high-sensitivity and high-specificity clinical intents. Experimental results objectively report performance variations at the patient level (Mean + SD and Mean - SD), demonstrating the effectiveness of multimodal PET-CT fusion. The cross-evaluation matrix quantitatively reveals how models successfully internalize distinct expert diagnostic philosophies, providing a robust, diversity-preserving paradigm for clinical AI deployment in bone infection segmentation.
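The evaluation ideas in the abstract can be pictured with a short sketch: one Dice score per patient computed over the whole 3D volume, folds split by patient so that correlated slices never straddle train and test, and a model-by-annotator score matrix. This is an illustration under assumptions, not the paper's code: the abstract does not name its metric, so Dice is assumed here, and every function name and the empty-mask convention are hypothetical.

```python
import numpy as np

def dice_3d(pred, target):
    """Dice over a whole 3D volume, giving one score per patient."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: counted as agreement (an assumed convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def patient_level_summary(preds, targets):
    """preds/targets map patient_id -> 3D binary volume. Returns (mean, sd)."""
    scores = np.array([dice_3d(preds[p], targets[p]) for p in sorted(preds)])
    sd = float(scores.std(ddof=1)) if len(scores) > 1 else 0.0
    return float(scores.mean()), sd

def patient_folds(patient_ids, k=5, seed=0):
    """Split unique patient IDs into k folds; all slices of a patient share a fold."""
    ids = np.array(sorted(set(patient_ids)))
    np.random.default_rng(seed).shuffle(ids)
    return [fold.tolist() for fold in np.array_split(ids, k)]

def cross_evaluation(preds_by_model, labels_by_source):
    """Mean patient-level Dice of each model (rows) against each annotation source (columns)."""
    models, sources = sorted(preds_by_model), sorted(labels_by_source)
    matrix = np.zeros((len(models), len(sources)))
    for i, m in enumerate(models):
        for j, s in enumerate(sources):
            matrix[i, j], _ = patient_level_summary(preds_by_model[m], labels_by_source[s])
    return models, sources, matrix
```

If each model has absorbed its own annotator's intent, the diagonal of the matrix (model scored against the source it was trained on) would be expected to exceed the off-diagonal entries.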
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.16373 [cs.CV] |
| (or arXiv:2605.16373v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2605.16373 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Zonglin Yang
[v1]
Sun, 10 May 2026 17:22:42 UTC (766 KB)
— Originally published at arxiv.org