Structured Visual Evidence Decomposition for Evidence-Grounded Multimodal Screening of Obstructive Sleep Apnea-Hypopnea Syndrome
Quick Take
EviOSAHS is a multimodal framework for screening obstructive sleep apnea-hypopnea syndrome (OSAHS) that reached 88.47% accuracy and 94.86% sensitivity. It decomposes each frontal facial image into seven anatomical queries, records the answers as structured evidence cards, and combines them with clinical data only at a final adjudication step. The reported false-negative rate is 5.14%. The authors position the system as a triage assistant rather than a diagnostic tool, and it requires prospective and external validation before clinical use.
Key Points
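- EviOSAHS achieved an F1-score of 93.74% and a false-negative rate of 5.14% on a 642-subject cohort.
- The framework separates image-only anatomical evidence acquisition from final clinical adjudication, addressing the unstable outputs of directly prompting multimodal models for yes/no decisions.
- Ablations showed that the seven-question visual decomposition and balanced final adjudication were both critical to the high-sensitivity operating point.
- A question-level audit of 4,494 visual outputs showed a 100% structured parse rate and a 93.88% high-visibility rate.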
- EviOSAHS is intended as a triage assistant, not a standalone diagnostic system.
Article Content
arXiv:2606.00087v1 (Announce Type: new)
Abstract: Effective pre-polysomnography screening for obstructive sleep apnea-hypopnea syndrome (OSAHS) requires combining clinical risk factors with visible craniofacial and neck cues. Directly prompting general-purpose multimodal foundation models for medical yes/no decisions can yield unstable, poorly calibrated outputs.
We propose EviOSAHS, an evidence-grounded multimodal reasoning framework that separates image-only anatomical evidence acquisition from final clinical adjudication. Each frontal facial image is decomposed into seven fixed anatomical queries covering the neck, chin, mouth, face/neck fat, lower jaw, midface, and nose. Visual responses are converted into structured evidence cards recording target anatomy, visibility, risk direction, evidence strength, confidence, and a concise summary.
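A minimal Python sketch of the evidence-card structure described above, assuming the visual model is instructed to answer each query as a JSON object. The six fields mirror the attributes the abstract lists, but the JSON format, the allowed values, the prompt wording, and the helper names (`EvidenceCard`, `parse_evidence_card`, `build_adjudication_prompt`) are hypothetical and are not the authors' implementation. The clinical profile enters only in the final prompt builder, reflecting the separation of image-only evidence from clinical adjudication.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Seven fixed anatomical targets named in the abstract. The actual question
# wording used by EviOSAHS is not given, so only the targets are listed.
ANATOMICAL_TARGETS = (
    "neck", "chin", "mouth", "face/neck fat", "lower jaw", "midface", "nose",
)


@dataclass(frozen=True)
class EvidenceCard:
    """One structured card per visual query (hypothetical schema)."""
    target_anatomy: str
    visibility: str         # assumed values, e.g. "high", "medium", "low"
    risk_direction: str     # assumed values, e.g. "increase", "decrease", "neutral"
    evidence_strength: str  # assumed values, e.g. "strong", "moderate", "weak"
    confidence: float       # assumed to lie in [0, 1]
    summary: str


def parse_evidence_card(raw: str) -> Optional[EvidenceCard]:
    """Parse one visual-model reply, assumed to be a JSON object.

    Returns None if the reply is not valid JSON, lacks a field, names an
    unknown anatomy, or has an out-of-range confidence.
    """
    try:
        data = json.loads(raw)
        card = EvidenceCard(
            target_anatomy=str(data["target_anatomy"]),
            visibility=str(data["visibility"]),
            risk_direction=str(data["risk_direction"]),
            evidence_strength=str(data["evidence_strength"]),
            confidence=float(data["confidence"]),
            summary=str(data["summary"]),
        )
    except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
        return None
    if card.target_anatomy not in ANATOMICAL_TARGETS:
        return None
    if not 0.0 <= card.confidence <= 1.0:
        return None
    return card


def build_adjudication_prompt(
    cards: Sequence[EvidenceCard], clinical_profile: Mapping[str, object]
) -> str:
    """Final-stage input: clinical data is joined with the cards only here."""
    lines = ["Visual evidence cards:"]
    for c in cards:
        lines.append(
            f"- {c.target_anatomy}: visibility={c.visibility}, "
            f"risk={c.risk_direction}, strength={c.evidence_strength}, "
            f"confidence={c.confidence:.2f}; {c.summary}"
        )
    lines.append("Clinical profile:")
    for key in sorted(clinical_profile):
        lines.append(f"- {key}: {clinical_profile[key]}")
    lines.append("Answer 'positive' or 'negative' for OSAHS screening.")
    return "\n".join(lines)
```

Returning None instead of raising lets a caller simply count unparseable replies, which is the kind of tally a structured parse rate is computed from.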
These cards are combined with a cleaned clinical profile only in the final stage, where a large language model performs balanced binary screening adjudication. We evaluated EviOSAHS on a 642-subject cohort, mapping normal subjects to screening-negative and mild, moderate, or severe OSAHS subjects to screening-positive. EviOSAHS achieved 88.47% accuracy, 94.86% sensitivity, a 93.74% F1-score, and a 5.14% false-negative rate, outperforming clinical-only prompting, direct multimodal prompting, and naive two-stage pipelines under a unified protocol. Ablations showed that seven-question visual decomposition and balanced final adjudication were critical to the high-sensitivity operating point. A question-level audit of 4,494 visual outputs showed a 100% structured parse rate and a 93.88% high-visibility rate.
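The binary label mapping and the four reported metrics can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the helper names `to_screening_label` and `screening_metrics` are hypothetical, and the example labels are toy values, not cohort data.

```python
from typing import Dict, Sequence

# Binary mapping stated in the abstract: normal -> screening-negative (0);
# mild, moderate, or severe OSAHS -> screening-positive (1).
SEVERITY_TO_LABEL = {"normal": 0, "mild": 1, "moderate": 1, "severe": 1}


def to_screening_label(severity: str) -> int:
    """Map a severity grade to a binary screening label (hypothetical helper)."""
    return SEVERITY_TO_LABEL[severity.strip().lower()]


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, F1 and false-negative rate, with 1 = positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    positives = tp + fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity (recall) = TP / (TP + FN)
        "sensitivity": tp / positives if positives else 0.0,
        # F1 = 2TP / (2TP + FP + FN)
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        # FNR = FN / (TP + FN) = 1 - sensitivity
        "false_negative_rate": fn / positives if positives else 0.0,
    }


# Toy example (not cohort data): TP=2, TN=1, FP=1, FN=1.
y_true = [to_screening_label(s) for s in ["normal", "mild", "severe", "moderate", "normal"]]
metrics = screening_metrics(y_true, [1, 1, 1, 0, 0])
# accuracy 0.6, sensitivity and F1 both 2/3, false-negative rate 1/3
```

Two consistency checks on the reported figures follow from these definitions: 642 subjects × 7 queries = 4,494 visual outputs, the size of the audited set, and because FNR = 1 − sensitivity, 100% − 94.86% = 5.14%, matching the reported false-negative rate.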
EviOSAHS provides an auditable, high-sensitivity workflow for binary pre-polysomnography OSAHS screening, but should be viewed as a triage assistant rather than a diagnostic system. Prospective validation, external testing, and calibrated operating-point control are needed before clinical deployment.