Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness

arXiv cs.CV·Xiang Fang, Wanlong Fang, Wei Ji

6/1/2026

Quick Take

Immuno-VLM is a framework for improving the trustworthiness of large vision-language models in open-world settings. It uses LLM-generated 'Semantic Antibodies' (textual descriptions of near-distribution outliers) to mitigate the 'Hubris of Semantics', the tendency to force-fit unknown inputs into known categories with high confidence. The authors report state-of-the-art results on ImageNet-1K and four OOD benchmarks.

Key Points

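Immuno-VLM adapts the biological principle of Immunological Negative Selection to high-dimensional latent spaces, targeting robustness in open-world scenarios.
The generative reasoning of large language models is used to produce 'Semantic Antibodies': textual descriptions of near-distribution outliers such as look-alikes and contextual anomalies.
According to the abstract, experiments on ImageNet-1K and four OOD benchmarks establish a new state of the art.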
The framework addresses the critical vulnerability of high-confidence misclassifications in unknown categories.

Article Content

From source RSS / original summary

arXiv:2605.30745v1 (Announce Type: new)

Abstract: Large Vision-Language Models have achieved unprecedented success in zero-shot recognition by aligning visual features with broad semantic concepts. However, this semantic abstraction creates a critical vulnerability in open-world deployment: the "Hubris of Semantics", where models force-fit unknown anomalies into known categories with high confidence due to the lack of explicit negative knowledge.
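The force-fitting effect can be illustrated with a small numeric sketch. This is not taken from the paper: the similarity values and the temperature are assumed for illustration, with the temperature chosen to be similar in scale to those commonly used in CLIP-style models. It shows how a softmax over known classes alone can turn a small similarity gap into near-certain confidence, even when the image belongs to none of the classes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between one UNKNOWN image and the text
# prompts of three known classes. All are low; none is a real match.
similarities = [0.24, 0.19, 0.18]

# Assumed temperature (illustrative, not from the source).
temperature = 0.01

probs = softmax([s / temperature for s in similarities])
top_confidence = max(probs)

# The closed-set softmax has no "none of the above" option, so the
# probability mass collapses onto the least-bad known class.
print(f"top-1 confidence: {top_confidence:.4f}")
```

With only known classes in the softmax, the model has no way to express that the input matches nothing well, which is the gap that explicit negative knowledge is meant to fill.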

To address this Open-World Trustworthiness Paradox, we propose Immuno-VLM, a bio-inspired framework that adapts the biological principle of Immunological Negative Selection to high-dimensional latent spaces.

Departing from traditional Open-Set Recognition methods that rely on passive density estimation or inefficient pixel-space outlier generation, Immuno-VLM leverages the generative reasoning of Large Language Models to actively hallucinate "Semantic Antibodies", textual descriptions of near-distribution outliers (e.g., look-alikes, contextual anomalies) that effectively bound the decision space of known classes.
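The abstract does not specify how the antibodies enter the scoring rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the antibody descriptions are embedded with the same text encoder as the class prompts and added as extra negative entries in the softmax, with the probability mass on known classes used as an in-distribution score. The function name `known_mass`, the toy 3-D embeddings, and the temperature are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def known_mass(image_emb, class_embs, antibody_embs, temperature=0.01):
    """Softmax probability mass assigned to known classes when antibody
    (negative) text embeddings compete in the same softmax.
    Hypothetical scoring rule; the temperature is an assumed value."""
    sims = [cosine(image_emb, e) for e in class_embs + antibody_embs]
    logits = [s / temperature for s in sims]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return sum(exps[:len(class_embs)]) / sum(exps)

# Toy stand-ins for text embeddings of two known class prompts.
class_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Toy stand-in for one LLM-written look-alike description.
antibody_embs = [[0.8, 0.0, 0.6]]

in_dist_image = [0.95, 0.05, 0.10]     # close to the first known class
look_alike_image = [0.75, 0.00, 0.65]  # closer to the antibody

score_in = known_mass(in_dist_image, class_embs, antibody_embs)
score_out = known_mass(look_alike_image, class_embs, antibody_embs)
score_out_no_antibody = known_mass(look_alike_image, class_embs, [])

print(f"in-distribution score:         {score_in:.4f}")
print(f"look-alike score:              {score_out:.4f}")
print(f"look-alike without antibodies: {score_out_no_antibody:.4f}")
```

In this toy setup the look-alike image keeps all of its probability mass on known classes when no antibodies are present, and loses almost all of it once a nearby antibody competes. A deployment would reject inputs whose score falls below a threshold chosen on validation data.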

Extensive experiments on ImageNet-1K and four challenging OOD benchmarks reveal that Immuno-VLM establishes a new state-of-the-art.

