A Mechanistic Analysis of Adversarial Fine-tuning of Vision Transformers
Quick Answer
This study investigates adversarial fine-tuning of Vision Transformers (ViTs) to enhance robustness against image perturbations.
Quick Take
Adversarial fine-tuning improves ViT performance and certainty on the corruption types seen during training, but the gains do not generalize to unseen corruption types. The mechanistic analysis finds changes in visual attention and in knowledge evolution across layers, but no fundamental change in the sparse representations the ViT learns.
Key Points
- Adversarial fine-tuning improves ViT performance on new instances of the corruption types seen during training.
- Improvements do not transfer to unseen types of image perturbations.
- Changes in attention mechanisms were observed during the analysis.
- No fundamental changes in sparse representations of ViTs were found.
- Study emphasizes the need for robust models in high-risk applications.
Article Content
From source RSS / original summary
arXiv:2606.07593v1 Announce Type: new
Abstract: The widespread use of image classification models in high-risk, real-world situations necessitates making these models robust to slight perturbations of the input images, such as blurring or sharpening. While vision transformers (ViTs) play an integral role in many modern multi-modal models, such as Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models, they have received little attention in the setting of robustness.
In this work, we analyze the effects of adversarial fine-tuning, a popular method for improving model robustness to image perturbations, on a ViT's performance on perturbed and regular images through a mechanistic lens. We adversarially train a ViT on low-frequency and high-frequency image corruptions, and attempt to explain changes in downstream model performance through an examination of the model's attention mechanisms, internal representations, and knowledge evolution.
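To make the kind of perturbation concrete, here is a minimal sketch of the two corruptions the abstract names, blurring and sharpening, applied to a toy grayscale image. Everything in it is an illustrative assumption rather than the paper's setup: the helper names `box_blur`, `sharpen`, and `detail_energy` are hypothetical, the 3x3 mean filter and unsharp mask are stand-ins for whatever corruption suite the authors used, and a real pipeline would operate on tensors rather than nested lists. How the paper assigns corruptions to its low-frequency and high-frequency classes is not stated in this summary.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values nominally in [0, 1]


def box_blur(img: Image) -> Image:
    """3x3 mean filter with edge replication: attenuates fine detail."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def sharpen(img: Image, amount: float = 1.0) -> Image:
    """Unsharp mask: img + amount * (img - blur(img)), clipped to [0, 1]."""
    blurred = box_blur(img)
    return [
        [min(max(p + amount * (p - b), 0.0), 1.0) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


def detail_energy(img: Image) -> float:
    """Mean squared difference between horizontally adjacent pixels."""
    diffs = [
        (row[x + 1] - row[x]) ** 2 for row in img for x in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)


# Toy 6x6 image (hypothetical data): a vertical step edge between 0.25 and 0.75.
image = [[0.25] * 3 + [0.75] * 3 for _ in range(6)]
blurred = box_blur(image)
sharpened = sharpen(image)
```

On this toy image the blurred copy has lower `detail_energy` than the original and the sharpened copy has higher, which is the sense in which the two corruptions push the image's fine detail in opposite directions. In an adversarial fine-tuning setup, corrupted copies like these would be fed to the model alongside or instead of the clean images.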
Overall, our results suggest that, while fine-tuning on inputs with common corruptions improves model performance and certainty on new instances of corrupted data, these improvements do not transfer to other classes of corruptions not seen during training. Additionally, despite observing changes in visual attention and knowledge evolution across layers, we found that adversarial training did not lead to fundamental changes in the sparse representations learned by ViTs.
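The "performance and certainty" comparison between seen and unseen corruption classes could be measured along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: it treats certainty as the maximum softmax probability and the predictive entropy, which the summary does not confirm; `evaluate` is a hypothetical helper; and the logits are made-up numbers chosen only to show the mechanics.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: Sequence[float]) -> float:
    """Assumed certainty proxy: probability of the predicted class."""
    return max(softmax(logits))


def entropy(logits: Sequence[float]) -> float:
    """Predictive entropy in nats; lower means a more certain prediction."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def evaluate(
    batch_logits: List[List[float]], labels: List[int]
) -> Dict[str, float]:
    """Accuracy plus mean confidence and mean entropy over a batch."""
    n = len(labels)
    correct = sum(
        1
        for logits, y in zip(batch_logits, labels)
        if max(range(len(logits)), key=logits.__getitem__) == y
    )
    return {
        "accuracy": correct / n,
        "mean_confidence": sum(confidence(l) for l in batch_logits) / n,
        "mean_entropy": sum(entropy(l) for l in batch_logits) / n,
    }


# Made-up logits purely for illustration; these are NOT results from the paper.
labels = [0, 1, 2]
seen_corruption_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3], [0.1, 0.4, 3.8]]
unseen_corruption_logits = [[1.0, 0.9, 0.8], [0.7, 0.6, 0.9], [0.5, 0.8, 0.9]]

seen = evaluate(seen_corruption_logits, labels)
unseen = evaluate(unseen_corruption_logits, labels)

# Positive gap: the model does better on corruption classes it was tuned on.
transfer_gap = seen["accuracy"] - unseen["accuracy"]
```

Reporting the gap alongside the confidence and entropy figures separates two questions: whether the fine-tuned model is more often right on a corruption class, and whether it is more sure of itself there.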
