Multi-Label Test-Time Adaptation with Bayesian Conditional Priors

arXiv cs.CV·Qiru Li, Ao Zhou, Zhiwei Jiang, Zifeng Cheng, Cong Wang, Yafeng Yin, Qing Gu

6/12/2026

Quick Answer

The paper introduces Bayesian Conditional Priors (BCP), a gradient-free test-time adaptation method for multi-label recognition with frozen Vision-Language Models. It improves average mAP on standard multi-label benchmarks with both the RN50 and ViT-B/16 CLIP backbones.

Quick Take

BCP injects label dependency into frozen Vision-Language Models at test time without tuning the backbone. Without target annotations, it raises average mAP from 57.31 to 69.22 with the CLIP RN50 backbone and from 62.61 to 71.79 with ViT-B/16.

Key Points

BCP adapts multi-label recognition without tuning the backbone model.
It uses zero-shot logits as proxies for marginal posteriors.
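It consistently outperforms strong TTA baselines across standard multi-label benchmarks and multiple CLIP backbones.
Average mAP rises from 57.31 to 69.22 with RN50 and from 62.61 to 71.79 with ViT-B/16.
It estimates its priors from the unlabeled test stream, adding negligible overhead beyond a single forward pass.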

Article Content

From source RSS / original summary

arXiv:2606.12925v1

Abstract: Multi-label recognition with frozen Vision-Language Models (VLMs) is brittle under distribution shift: standard zero-shot inference scores labels independently, ignoring co-occurrence structure and producing incoherent label sets where dominant concepts suppress weaker but compatible labels. We introduce Bayesian Conditional Priors (BCP) Estimation, a gradient-free test-time adaptation method that injects label dependency without tuning the backbone.

BCP views zero-shot logits as a proxy for marginal posteriors under a fixed image-text likelihood and attributes shift-induced errors mainly to a mismatched label prior. For each test image, it selects a high-confidence anchor label and applies an anchor-conditioned Bayesian refinement. This update is closed-form in logit space and admits a pointwise mutual information (PMI) interpretation, explicitly promoting compatible labels and suppressing incompatible ones.
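The abstract does not give the exact update rule, so the following is only a minimal sketch of what an anchor-conditioned, PMI-style correction in logit space could look like. It assumes the correction is additive, that the anchor is simply the highest-scoring label, and that the marginal and conditional priors are already available. The function name `refine_logits`, the `strength` weight, and the `eps` constant are hypothetical and do not come from the paper.

```python
import math

def refine_logits(logits, marginal, conditional, strength=1.0, eps=1e-8):
    """Illustrative anchor-conditioned prior correction (not the paper's exact rule).

    logits:      zero-shot logits, one per label
    marginal:    marginal[j] is an estimate of p(label j)
    conditional: conditional[a][j] is an estimate of p(label j | anchor a)
    strength:    hypothetical weight on the correction term
    """
    # Assumption: the anchor is the most confident label under the zero-shot scores.
    anchor = max(range(len(logits)), key=lambda j: logits[j])
    refined = []
    for j, z in enumerate(logits):
        if j == anchor:
            refined.append(z)  # the anchor's own score is left unchanged
            continue
        # PMI(j; anchor) = log p(j | anchor) - log p(j).
        # Positive for compatible labels, negative for incompatible ones.
        pmi = math.log(conditional[anchor][j] + eps) - math.log(marginal[j] + eps)
        refined.append(z + strength * pmi)
    return anchor, refined

# Toy example with three labels: person, bicycle, fish.
example_logits = [2.0, -0.5, -0.4]
example_marginal = [0.5, 0.2, 0.2]
example_conditional = [
    [1.0, 0.4, 0.05],   # priors given anchor "person"
    [0.9, 1.0, 0.01],   # priors given anchor "bicycle"
    [0.1, 0.01, 1.0],   # priors given anchor "fish"
]
example_anchor, example_refined = refine_logits(
    example_logits, example_marginal, example_conditional
)
```

In this toy setup "person" is the anchor, so "bicycle" (more likely given a person than on average) is pushed up while "fish" is pushed down, which mirrors the promote/suppress behavior the abstract describes.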

BCP operates without target annotations by estimating anchor-conditioned priors online from the unlabeled test stream via lightweight second-order co-occurrence statistics, adding negligible overhead beyond a single forward pass. Across standard multi-label benchmarks and multiple CLIP backbones, BCP consistently outperforms strong TTA baselines, e.g., improving RN50 average mAP from 57.31 to 69.22 and ViT-B/16 from 62.61 to 71.79.
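One plausible way to keep second-order co-occurrence statistics over an unlabeled stream is sketched below. It is an assumption-laden illustration rather than the paper's estimator: it uses hard pseudo-labels from a fixed probability threshold and add-one smoothing, neither of which the abstract specifies. The class name `OnlineCooccurrence` and its methods are hypothetical.

```python
class OnlineCooccurrence:
    """Running label and label-pair counts from an unlabeled test stream (illustrative)."""

    def __init__(self, num_labels, smoothing=1.0):
        self.smoothing = smoothing  # assumed add-one style smoothing
        self.n = 0                  # number of test images seen so far
        self.count = [0.0] * num_labels
        self.cooc = [[0.0] * num_labels for _ in range(num_labels)]

    def update(self, probs, threshold=0.5):
        # Assumption: a label counts as present when its predicted
        # probability reaches the threshold (hard pseudo-labels).
        present = [j for j, p in enumerate(probs) if p >= threshold]
        self.n += 1
        for i in present:
            self.count[i] += 1.0
            for j in present:
                self.cooc[i][j] += 1.0

    def marginal(self, j):
        # Smoothed estimate of p(label j).
        return (self.count[j] + self.smoothing) / (self.n + 2.0 * self.smoothing)

    def conditional(self, j, anchor):
        # Smoothed estimate of p(label j | anchor present).
        return (self.cooc[anchor][j] + self.smoothing) / (
            self.count[anchor] + 2.0 * self.smoothing
        )

# Toy stream of three images over labels (person, bicycle, fish).
stats = OnlineCooccurrence(num_labels=3)
for probs in ([0.9, 0.8, 0.1], [0.9, 0.7, 0.2], [0.1, 0.1, 0.9]):
    stats.update(probs)
```

Each update touches at most a K-by-K table of counts for K labels and needs no gradients, which is consistent with the "negligible overhead" claim, although the actual estimator may differ.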
