MedFM-Robust: Benchmarking Robustness of Medical Foundation Models
Quick Take
MedFM-Robust benchmarks the robustness of medical foundation models, a prerequisite for their reliable use in clinical applications.
Key Points
- Categorizes models into Med-VLMs and segmentation models.
- Cites LLaVA-Med, MedGemma, GPT-4o, and Gemini as Med-VLM examples, and SAM-Med2D and MedSAM as SAM-based segmentation models.
- Emphasizes the need for rigorous real-world evaluations.
Abstract: Medical foundation models (MedFMs) have emerged as transformative tools in healthcare, demonstrating capabilities across diverse clinical applications. These models can be broadly categorized into two paradigms: Medical Vision-Language Models (Med-VLMs) and segmentation foundation models. Med-VLMs range from medical-specialized models such as LLaVA-Med and MedGemma, to general-purpose models like GPT-4o and Gemini, all capable of medical image understanding tasks including visual question answering (VQA), report generation, and visual grounding. Concurrently, the Segment Anything Model (SAM) has catalyzed a new generation of medical segmentation models, with adaptations like SAM-Med2D and MedSAM. The widespread clinical deployment of these models thus necessitates rigorous evaluation of their reliability under real-world conditions.
| Comments: | MICCAI2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.19027 [cs.CV] (or arXiv:2605.19027v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.19027 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yifang Wang
[v1] Mon, 18 May 2026 18:50:56 UTC (5,625 KB)
— Originally published at arxiv.org