Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness
Quick Take
Immuno-VLM is a bio-inspired framework for making large vision-language models more trustworthy in open-world settings. It uses LLM-generated 'Semantic Antibodies' to mitigate the 'Hubris of Semantics', the tendency to force-fit unknown inputs into known categories with high confidence. The authors report state-of-the-art results on ImageNet-1K and four OOD benchmarks.
Key Points
- Immuno-VLM adapts immunological principles for improved model robustness in open-world scenarios.
- Generative reasoning is used to create 'Semantic Antibodies' for identifying near-distribution outliers.
- Experiments on ImageNet-1K and four OOD benchmarks are reported to set a new state of the art.
- The framework addresses the critical vulnerability of high-confidence misclassifications in unknown categories.
Article Content
From source RSS / original summary
arXiv:2605.30745v1 Announce Type: new
Abstract: Large Vision-Language Models have achieved unprecedented success in zero-shot recognition by aligning visual features with broad semantic concepts. However, this semantic abstraction creates a critical vulnerability in open-world deployment: the "Hubris of Semantics", where models force-fit unknown anomalies into known categories with high confidence due to the lack of explicit negative knowledge.
To address this Open-World Trustworthiness Paradox, we propose Immuno-VLM, a bio-inspired framework that adapts the biological principle of Immunological Negative Selection to high-dimensional latent spaces.
Departing from traditional Open-Set Recognition methods that rely on passive density estimation or inefficient pixel-space outlier generation, Immuno-VLM leverages the generative reasoning of Large Language Models to actively hallucinate "Semantic Antibodies", textual descriptions of near-distribution outliers (e.g., look-alikes, contextual anomalies) that effectively bound the decision space of known classes.
Extensive experiments on ImageNet-1K and four challenging OOD benchmarks reveal that Immuno-VLM establishes a new state-of-the-art.
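The abstract does not spell out how the antibodies enter the decision rule, so the following is only a minimal sketch of one plausible reading: antibody descriptions are embedded alongside the known class prompts, and the softmax mass that an image places on the antibodies is treated as an out-of-distribution score. The function names, the temperature value, and the toy 3-dimensional embeddings are all assumptions made for illustration; in practice the vectors would come from a CLIP-style image and text encoder, and the paper's actual scoring may differ.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def antibody_ood_score(image_emb, class_embs, antibody_embs, temperature=0.01):
    """Hypothetical scoring rule (an assumption, not the paper's method).

    Returns (predicted_known_class_index, ood_score), where ood_score is the
    softmax probability mass assigned to the antibody prompts.
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    class_embs = l2_normalize(np.asarray(class_embs, dtype=float))
    antibody_embs = l2_normalize(np.asarray(antibody_embs, dtype=float))

    # Cosine similarity of the image to every known class and every antibody.
    sims = np.concatenate([class_embs @ image_emb, antibody_embs @ image_emb])
    logits = sims / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()

    n_known = class_embs.shape[0]
    predicted = int(np.argmax(probs[:n_known]))
    ood_score = float(probs[n_known:].sum())
    return predicted, ood_score


# Toy stand-ins for encoder outputs (assumed values, not real embeddings).
class_embs = [[1.0, 0.0, 0.0],      # known class 0
              [0.0, 1.0, 0.0]]      # known class 1
antibody_embs = [[0.7, 0.0, 0.7]]   # a "look-alike" of class 0

in_dist_image = [1.0, 0.0, 0.1]     # close to class 0
look_alike_image = [0.6, 0.0, 0.8]  # closer to the antibody than to class 0

pred_in, score_in = antibody_ood_score(in_dist_image, class_embs, antibody_embs)
pred_out, score_out = antibody_ood_score(look_alike_image, class_embs, antibody_embs)
```

In this toy setup the in-distribution image keeps almost all of its probability mass on class 0, while the look-alike image puts most of its mass on the antibody and would be rejected by any reasonable threshold on `ood_score`. Without the antibody row, the same look-alike image would simply be assigned to class 0, which is the high-confidence force-fitting the abstract describes.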