Improved Belief-Attention in Vision Task
Quick Take
The paper introduces Belief2-Attention, an extension of the original Belief-Attention that uses both the perpendicular and the projected components to improve token correlation in Transformers. The authors argue that the new module is more expressive than standard attention, and they verify its effectiveness on image classification and segmentation tasks.
Key Points
- Belief2-Attention enhances token correlation using both perpendicular and projected components.
- The projected component is processed through an activation function and linear mapping.
- Belief2-Attention is validated for image classification and segmentation tasks.
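- The new module adds an inner-product matrix $ZZ^T$ to $QK^T$, with the aim of capturing richer token correlation than standard attention.
- An ablation study shows that the projected component also carries information about token correlation and should not be ignored.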
Article Content
From source RSS / original summary
arXiv:2606.00077v1 Announce Type: new
Abstract: Recently, Belief-Attention \cite{Guoqiang25BeliefAttention} was proposed to improve Transformer performance. It first performs an orthogonal projection of the softmax-based weighted summation of $V$ vectors with respect to the original $V$ vectors, and then takes the perpendicular component as the residual signal.
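The projection step described above can be illustrated with a minimal sketch in plain Python. It assumes that the projection is done per token, with each attention output projected onto that token's own value vector; the abstract does not spell out this detail, and the names `attention_outputs` and `split_components` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_outputs(Q, K, V):
    """Softmax-weighted sums of the value vectors, one per query token."""
    scale = 1.0 / math.sqrt(len(K[0]))
    outputs = []
    for q in Q:
        weights = softmax([scale * dot(q, k) for k in K])
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

def split_components(o, v, eps=1e-12):
    """Split o into a part parallel to v and a part perpendicular to v.

    Assumption: projection onto the single vector v (per-token projection).
    """
    coeff = dot(o, v) / (dot(v, v) + eps)
    projected = [coeff * x for x in v]
    perpendicular = [a - b for a, b in zip(o, projected)]
    return projected, perpendicular

# Toy inputs: 3 tokens, dimension 2 (illustrative values only).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
V = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]

O = attention_outputs(Q, K, V)
parts = [split_components(o, v) for o, v in zip(O, V)]
```

In this sketch the perpendicular part is what the original Belief-Attention is described as using for the residual signal, while the projected part is the piece the new paper proposes to keep as well.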
In this paper, we first conduct an ablation study showing that the projected component also carries information about the token correlation, which should not be ignored. We then propose to extend Belief-Attention by making use of both the perpendicular and projected components. In particular, the projected component goes through an activation function and then a linear mapping before merging with the considered token.
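One way to picture the path of the projected component is the short sketch below. The abstract does not name the activation function, the shape of the linear mapping, or the merge rule, so ReLU, the matrix `W`, and the additive merge in `merge_token` are all assumptions made for illustration only.

```python
def relu(vec):
    # Assumed activation; the abstract only says "activation function".
    return [max(0.0, x) for x in vec]

def linear(W, vec):
    # Plain matrix-vector product, W given as a list of rows.
    return [sum(w * x for w, x in zip(row, vec)) for row in W]

def merge_token(x, perpendicular, projected, W):
    """Hypothetical merge: add both branches to the token as residuals."""
    mapped = linear(W, relu(projected))
    return [a + b + c for a, b, c in zip(x, perpendicular, mapped)]

# Toy values for a single token of dimension 2.
x = [0.5, -0.5]
perpendicular = [0.2, -0.1]
projected = [1.0, -2.0]
W = [[1.0, 0.0], [0.5, 1.0]]

merged = merge_token(x, perpendicular, projected, W)
```

Here the negative entry of `projected` is zeroed by the activation before the linear mapping, so only the first coordinate contributes to the mapped branch.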
Conceptually, the neural block for the projected component can be viewed as a two-layer feedforward network (FFN) within the new attention block. We also note that standard attention captures the token correlation via the inner-product matrix $QK^T$. We propose to add an additional inner-product matrix $ZZ^T$ to $QK^T$ to capture richer token correlation. We refer to the new module as Belief2-Attention. It can be easily shown that Belief2-Attention is more expressive than standard attention.
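The augmented score matrix can be sketched as follows. The abstract does not define $Z$ or say how the sum is scaled, so the sketch assumes $Z$ is a per-token feature matrix with one row per token and that the sum $QK^T + ZZ^T$ is scaled jointly by $1/\sqrt{d}$; the function name `augmented_weights` is hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def augmented_weights(Q, K, Z):
    """Row-wise softmax over (Q K^T + Z Z^T) / sqrt(d).

    Assumptions: Z has one row per token, and both terms share one scale.
    """
    scale = 1.0 / math.sqrt(len(K[0]))
    n = len(Q)
    rows = []
    for i in range(n):
        scores = [scale * (dot(Q[i], K[j]) + dot(Z[i], Z[j]))
                  for j in range(n)]
        rows.append(softmax(scores))
    return rows

# Toy inputs: 3 tokens, Q/K of dimension 2, Z of dimension 1.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
Z = [[0.5], [-1.0], [2.0]]          # illustrative values only
Z_zero = [[0.0], [0.0], [0.0]]      # switches the extra term off

standard = augmented_weights(Q, K, Z_zero)
augmented = augmented_weights(Q, K, Z)
```

With `Z_zero` the extra term vanishes and the weights reduce to those of standard scaled dot-product attention, which is one simple way to see why the augmented form can represent at least what standard attention can.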
We then verify the effectiveness of Belief2-Attention for vision tasks of image classification and segmentation.