MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

arXiv cs.CV·Xiangxiang Cui, Tianjin Huang, Yifang Wang, Lijie Hu, Lu Yin

5/20/2026

Quick Answer

MedFM-Robust is a benchmark for the robustness of medical foundation models (MedFMs), including LLaVA-Med and SAM-Med2D, aimed at assessing their reliability in clinical use.

Quick Take

MedFM-Robust benchmarks the robustness of medical foundation models (MedFMs) across two families: Medical Vision-Language Models such as LLaVA-Med and segmentation foundation models such as SAM-Med2D. The authors argue that rigorous evaluation is needed as these models are increasingly deployed in real-world healthcare applications.

Key Points

Medical Vision-Language Models (Med-VLMs) range from specialized models like LLaVA-Med and MedGemma to general-purpose models like GPT-4o and Gemini.
Med-VLMs handle medical image understanding tasks such as visual question answering, report generation, and visual grounding.
The Segment Anything Model (SAM) has spurred new medical segmentation models like SAM-Med2D.
Rigorous evaluation is essential for the reliable deployment of these models in clinical settings.

[Submitted on 18 May 2026]

Abstract: Medical foundation models (MedFMs) have emerged as transformative tools in healthcare, demonstrating capabilities across diverse clinical applications. These models can be broadly categorized into two paradigms: Medical Vision-Language Models (Med-VLMs) and segmentation foundation models. Med-VLMs range from medical-specialized models such as LLaVA-Med and MedGemma, to general-purpose models like GPT-4o and Gemini, all capable of medical image understanding tasks including visual question answering (VQA), report generation, and visual grounding. Concurrently, the Segment Anything Model (SAM) has catalyzed a new generation of medical segmentation models, with adaptations like SAM-Med2D and MedSAM. The widespread clinical deployment of these models thus necessitates rigorous evaluation of their reliability under real-world conditions.

Comments:	MICCAI2026
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2605.19027 [cs.CV]
	(or arXiv:2605.19027v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.19027 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yifang Wang
[v1] Mon, 18 May 2026 18:50:56 UTC (5,625 KB)

— Originally published at arxiv.org
