VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents

arXiv cs.CV·Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao

6/9/2026

Quick Answer

VisualLeakBench shows that vision-language models (VLMs) copy sensitive text visible in images into tool arguments: at baseline, target strings propagate in 78.8% of PII cases.

Quick Take

At baseline, four production VLM systems propagate target strings from images into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases. Under a defensive system prompt, PII propagation drops to 2.0%, largely through suppression rather than preserved utility, while rendered unsafe text still crosses the tool boundary in 52.6% of cases. The evaluation uses a stratified 100-image subset of the 500-image benchmark under two workflows: note capture and external handoff.

Key Points

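VisualLeakBench is a 500-image benchmark spanning UI, chat, document, form, and dashboard scenes; a stratified 100-image subset is used for agent evaluation.
At baseline, target strings reach tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases.
A defensive system prompt cuts PII propagation to 2.0%, largely by suppressing rather than preserving utility, while rendered unsafe-text propagation remains at 52.6%.
Propagation rates depend on the tool surface: search-like tools suppress PII propagation, but rendered unsafe text still crosses tool boundaries.
A labeled-target oracle diagnostic localizes most failures at the tool boundary; response-side leakage remains as residual risk.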
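The propagation rates quoted above can be read as the fraction of trials in which a labeled target string shows up in the agent's tool arguments. The following is a minimal sketch of such a metric, not the benchmark's actual scoring code: the `Trial` record, the `normalize` rule (lowercase, alphanumerics only), and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Trial:
    """Hypothetical record of one agent run (not the benchmark's schema)."""
    target: str                # sensitive string visible in the image
    tool_args: dict[str, Any]  # arguments the agent passed to the tool


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every leaf of a nested tool-argument structure as a string."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)
    else:
        yield str(value)


def normalize(text: str) -> str:
    """Assumed matching rule: lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def propagated(trial: Trial) -> bool:
    """True if the target string appears in any single tool-argument value."""
    needle = normalize(trial.target)
    if not needle:
        return False
    return any(needle in normalize(s) for s in iter_strings(trial.tool_args))


def propagation_rate(trials: Iterable[Trial]) -> float:
    """Fraction of trials whose target reached the tool arguments."""
    trials = list(trials)
    if not trials:
        return 0.0
    return sum(propagated(t) for t in trials) / len(trials)
```

Matching per argument value, rather than over one concatenated string, avoids counting a target that only appears by accident across field boundaries. A stricter or looser matching rule would change the reported rate, so the normalization step is the main design choice to state when comparing numbers.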

Article Content

From source RSS / original summary

arXiv:2606.07595v1 Announce Type: new Abstract: Vision-language agents increasingly consume screenshots, documents, and user interfaces before writing to memory, sending messages, or invoking external tools. We study a concrete failure mode in this setting: action-boundary propagation, where sensitive or unsafe visible text is copied from an image into downstream tool arguments.

We present VisualLeakBench, a diversified 500-image benchmark spanning UI, chat, document, form, and dashboard scenes, and evaluate a stratified 100-image agent subset with four production VLM systems under two workflows: note capture and external handoff. At baseline, target strings are propagated into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases. Under a defensive system prompt, rendered unsafe-text propagation remains high at 52.6%, while PII tool propagation falls to 2.0%, largely by suppressing rather than preserving utility. Rates are tool-surface dependent: search-like tools suppress PII propagation, but rendered unsafe text still crosses tool boundaries. We measure visual-to-tool propagation rather than downstream instruction execution. We additionally provide a labeled-target oracle upper-bound diagnostic that localizes most failures at the tool boundary while leaving response-side leakage as residual risk.
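The abstract does not describe how the labeled-target oracle diagnostic is built. One plausible reading, sketched here purely as an assumption, is a filter that knows the labeled target strings in advance and redacts them from tool arguments just before the call is dispatched. The function name `redact_tool_args` and the `[REDACTED]` placeholder are hypothetical.

```python
import re
from typing import Any, Sequence


def redact_tool_args(value: Any, targets: Sequence[str],
                     placeholder: str = "[REDACTED]") -> Any:
    """Return a copy of the tool arguments with known target strings masked.

    Assumes the targets are known ahead of time (an oracle), which is
    only realistic as an upper-bound diagnostic, not as a deployed defense.
    """
    if isinstance(value, str):
        for target in targets:
            if target:  # skip empty labels, which would match everywhere
                value = re.sub(re.escape(target), lambda _: placeholder,
                               value, flags=re.IGNORECASE)
        return value
    if isinstance(value, dict):
        return {k: redact_tool_args(v, targets, placeholder)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact_tool_args(v, targets, placeholder) for v in value]
    return value  # numbers, booleans, None pass through unchanged


# Usage: mask a labeled email address before an external handoff call.
args = {"to": "support@example.org",
        "body": "Please contact Jane.Doe@example.com about the invoice."}
clean = redact_tool_args(args, ["jane.doe@example.com"])
```

A filter of this kind only touches the tool call. Text the model writes back to the user is not inspected, which is consistent with the abstract's note that response-side leakage remains as residual risk.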


