Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

arXiv cs.CV·Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy

5/25/2026

Quick Take

Multi-level Floyd-Steinberg dithering enhances adversarial robustness in vision models while preserving semantic content.

Key Points

Evaluated across six tasks and two model families.
Matches or exceeds all tested baselines, including diffusion-based denoising, with substantially less degradation on clean inputs.
Model-agnostic and lightweight input transformation.

Article Excerpt

From source RSS / original summary

arXiv:2605.23065v1. Abstract: Vision foundation models are widely used as frozen backbones across many downstream tasks, making them a single point of failure under adversarial attack. We study multi-level Floyd-Steinberg error-diffusion dithering as a lightweight, model-agnostic input transformation that disrupts adversarial perturbations while preserving semantic content.
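To make the transformation concrete, here is a minimal sketch of multi-level Floyd-Steinberg error diffusion on a grayscale image, written in plain Python. It is not the authors' implementation: the function names `quantize` and `floyd_steinberg`, the list-of-rows image format, the [0, 1] value range, and the default of 4 levels are all assumptions made for illustration. Only the 7/16, 3/16, 5/16, 1/16 diffusion weights are the standard Floyd-Steinberg kernel.

```python
def quantize(value, levels):
    """Snap a value to the nearest of `levels` evenly spaced values in [0, 1]."""
    clipped = min(1.0, max(0.0, value))
    return round(clipped * (levels - 1)) / (levels - 1)


def floyd_steinberg(image, levels=4):
    """Dither a grayscale image given as a list of rows of floats in [0, 1].

    Hypothetical helper for illustration; returns a new image whose pixels
    take at most `levels` distinct values. The input is not modified.
    """
    height, width = len(image), len(image[0])
    work = [list(row) for row in image]  # working copy accumulates diffused error
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = quantize(old, levels)
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: push the quantization error
            # onto pixels that have not been visited yet.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

Calling `floyd_steinberg(image, levels=2)` gives classic binary dithering, while larger `levels` values correspond to the intermediate quantization levels the abstract refers to. A color image could be handled by applying the same routine per channel, which is again an assumption rather than something the excerpt states.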

Unlike prior work, which was limited to binary dithering, grayscale CIFAR-10, and a single small model trained from scratch, we evaluate across six tasks (classification, segmentation, depth estimation, retrieval, captioning, visual question answering), two model families (DINOv2, PaliGemma), and three attacks of increasing strength (PGD, MI-FGSM, SIA), as well as an adaptive attacker using a straight-through estimator.

Our results show that Floyd-Steinberg dithering at intermediate quantization levels, especially when combined with post-processing blur, exceeds or matches all tested baselines, including diffusion-based denoising, with substantially less degradation on clean inputs.
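The excerpt does not say which blur is used as post-processing, so the following is only a sketch of one plausible choice: a small box blur applied to the dithered output. The function name `box_blur`, the `radius` parameter, and the edge-replication border handling are assumptions for illustration, not details taken from the paper.

```python
def box_blur(image, radius=1):
    """Average each pixel with its neighbours in a (2*radius+1)^2 window.

    `image` is a list of rows of floats. Borders are handled by clamping
    coordinates to the image (edge replication). Hypothetical helper.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(height - 1, max(0, y + dy))
                    xx = min(width - 1, max(0, x + dx))
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

The blur averages neighbouring quantized pixels back into intermediate values, softening the high-frequency dither pattern before the image reaches the model.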
