Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

arXiv cs.CV·Arian Komaei Koma, Seyed Amir Kasaei, AmirMahdi Sadeghzadeh, Mohammad Hossein Rohban

3d ago

·~1 min·5/27/2026·en·0

Quick Take

The paper introduces BEAP, a black-box adversarial prompting attack that effectively exploits vulnerabilities in text-to-image diffusion models, achieving over 60% improvement in Attack Success Rate (ASR) with only fifteen prompts needed per attack. Unlike previous methods, BEAP generates undetectable prompts while maintaining high image quality, addressing gaps in existing threat models.

Key Points

BEAP leverages a large language model for iterative adversarial prompt generation.
Combines multiple reward signals for effective embedding-aware search in text space.
Achieves over 60% improvement in Attack Success Rate compared to prior methods.
Requires only an average of fifteen prompts for successful attacks.
Prompts remain undetectable to safety filters while producing high-quality images.

Article Content

From source RSS / original summary

arXiv:2605. 26332v1 Announce Type: new Abstract: Machine unlearning aims to remove specific concepts from pretrained text-to-image diffusion models, yet several white- and black-box attacks have been introduced to make the model generate such unlearned concepts. These attacks, nevertheless, do not assume a realistic threat model, i. e. they either assume access to the model weights, or result in gibberish adversarial prompts that could be easily detected even through naive rule-based safeguarding.

We aim to address this gap in this paper. We introduce BEAP, a black-box, embedding-aware adversarial prompting attack that leverages a large language model (LLM) to iteratively generate effective adversarial prompts and exploit such hidden vulnerabilities. BEAP performs an embedding-aware search in text space, combining multiple reward signals: unlearned concept presence, text-image alignment, and image quality, to refine generated prompts.

Unlike previous attack methods, BEAP keeps its prompts undetectable to safety filters while producing high-quality images. Extensive experiments show that BEAP improves the Attack Success Rate (ASR) by more than 60% over prior methods, while requiring only an average of fifteen prompts per successful attack. Warning: This paper contains model outputs that may be offensive or upsetting in nature.

Reader Mode unavailable (could not extract clean content).

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CV

See more →

arXiv cs.CV·Taha Koleilat, Hassan Rivaz, Yiming Xiao

3d ago

FeaturedOriginal

Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning

AI Summary

Evi-Steer introduces a novel evidential tuning framework for BiomedCLIP, achieving 0.11% parameter updates while enhancing uncertainty-aware fine-tuning. It outperforms state-of-the-art methods across 15 biomedical imaging datasets, proving effective in few-shot learning and domain shifts for clinical applications.

#AI Coding #Inference #Open Source

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Quick Take

Key Points

Article Content

Want this in your inbox every morning?

More from arXiv cs.CV

Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning

Deep Learning-Based Automated Quantification of TIMI Myocardial Perfusion Frame Count (DL-TMPFC) from Coronary Angiography: A Novel Framework for Rapid Assessment of Microvascular Dysfunction

GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

Related in this space

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

FORT Robotics Acquires Mapless AI to Expand Its Trust Platform with Remote Supervision and Active Safety Capabilities