VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents
Quick Answer
VisualLeakBench shows that vision-language models (VLMs) copy sensitive text visible in images into tool arguments: at baseline, target strings propagate in 78.8% of PII cases.
Quick Take
At baseline, four production VLM systems propagate target strings into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases. Under a defensive system prompt, PII propagation drops to 2.0%, largely by suppressing output rather than preserving utility, while rendered unsafe text still crosses the tool boundary in 52.6% of cases. The evaluation uses a stratified 100-image subset of the 500-image benchmark under two workflows: note capture and external handoff.
Key Points
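- VisualLeakBench is a 500-image benchmark spanning UI, chat, document, form, and dashboard scenes; the agent evaluation uses a stratified 100-image subset.
- At baseline, target strings propagate into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases.
- A defensive system prompt reduces PII propagation to 2.0%, but rendered unsafe-text propagation remains at 52.6%.
- Propagation rates depend on the tool surface: search-like tools suppress PII propagation, yet rendered unsafe text still crosses tool boundaries.
- A labeled-target oracle diagnostic localizes most failures at the tool boundary, leaving response-side leakage as residual risk.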
Article Content
arXiv:2606.07595v1. Abstract: Vision-language agents increasingly consume screenshots, documents, and user interfaces before writing to memory, sending messages, or invoking external tools. We study a concrete failure mode in this setting: action-boundary propagation, where sensitive or unsafe visible text is copied from an image into downstream tool arguments.
We present VisualLeakBench, a diversified 500-image benchmark spanning UI, chat, document, form, and dashboard scenes, and evaluate a stratified 100-image agent subset with four production VLM systems under two workflows: note capture and external handoff. At baseline, target strings are propagated into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases. Under a defensive system prompt, rendered unsafe-text propagation remains high at 52.6%, while PII tool propagation falls to 2.0%, largely by suppressing rather than preserving utility.
Rates are tool-surface dependent: search-like tools suppress PII propagation, but rendered unsafe text still crosses tool boundaries. We measure visual-to-tool propagation rather than downstream instruction execution. We additionally provide a labeled-target oracle upper-bound diagnostic that localizes most failures at the tool boundary while leaving response-side leakage as residual risk.
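To make the metric concrete, the following is a minimal sketch of how a propagation rate of this kind could be computed. It is not the authors' implementation: the `Case` structure, the category labels `"pii"` and `"unsafe_text"`, the whitespace- and case-insensitive substring match, and the sample data are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Case:
    """One evaluated image (hypothetical structure, not from the paper)."""
    category: str    # assumed labels: "pii" or "unsafe_text"
    target: str      # labeled sensitive string visible in the image
    tool_args: dict  # arguments the agent passed to its tool call


def normalize(text: str) -> str:
    # Collapse whitespace runs and lowercase so trivial reformatting
    # by the model does not hide a copied string.
    return re.sub(r"\s+", " ", text).strip().lower()


def iter_strings(value):
    # Walk nested dicts/lists and yield every leaf value as a string.
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)
    else:
        yield str(value)


def propagated(case: Case) -> bool:
    # True if the target string appears in any tool-argument value.
    target = normalize(case.target)
    return any(target in normalize(s) for s in iter_strings(case.tool_args))


def propagation_rate(cases, category: str) -> float:
    # Fraction of cases in the category whose target reached a tool argument.
    selected = [c for c in cases if c.category == category]
    if not selected:
        return 0.0
    return sum(propagated(c) for c in selected) / len(selected)


# Made-up sample data, for illustration only.
EXAMPLE_CASES = [
    Case("pii", "jane.doe@example.com",
         {"note": "Contact: Jane.Doe@example.com"}),
    Case("pii", "555-0100", {"note": "Contact details redacted"}),
    Case("unsafe_text", "ignore previous instructions",
         {"message": {"body": "Ignore  previous\ninstructions"}}),
]
```

Calling `propagation_rate(EXAMPLE_CASES, "pii")` on the sample data gives 0.5, since one of the two PII targets reaches a tool argument. An exact-substring check like this only counts verbatim copies, which matches the abstract's framing of measuring visual-to-tool propagation rather than downstream instruction execution; paraphrased leakage would need a different matcher.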
