Extracting Training Data from Diffusion Language Models via Infilling

arXiv cs.CL · Yihan Wang, N. Asokan

5/26/2026



Quick Take

The study introduces 'infilling extraction' for diffusion language models (DLMs) such as LLaDA-8B and Dream-7B and finds that edge-conditioned masks can extract up to three times more verbatim sequences than prefix-conditioned masks. The results point to a substantial training-data extraction risk, including for personally identifiable information: given training data with PII redacted, a realistic adversary achieves higher recall on redacted email addresses from DLMs than from scale-matched autoregressive models.

Key Points

Infilling extraction probes DLMs with an arbitrary binary mask and subsumes prefix-only probing.
Edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones.
Bidirectional access in DLMs opens extraction channels that are inaccessible in autoregressive models.
An adversary holding PII-redacted training data recovers redacted email addresses with higher recall from DLMs than from scale-matched autoregressive models.
Tunable decoding parameters measurably affect extraction performance.

Paper Resources

Read Paper: arxiv.org · View PDF: arxiv.org

Article Content

From source RSS / original summary

arXiv:2605.24173v1 (Announce Type: new)

Abstract: Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction.

In order to realistically model extractability of training data in DLMs, we introduce "infilling extraction", a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs.
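The abstract does not give an implementation, so what follows is only a minimal Python sketch of the protocol as described, under stated assumptions: tokens are plain strings, a mask entry of 1 marks a position the model must fill, `fill` is a hypothetical stand-in for a DLM's denoiser, and an extraction counts as successful when every masked position is reproduced verbatim. Prefix-only probing then falls out as the special case of a mask that hides everything after the first k tokens.

```python
from typing import Callable, List, Sequence

# Hypothetical mask sentinel; a real DLM uses a dedicated mask token id.
MASK = "<mask>"

# A "fill" function maps a partially masked token list to a fully filled one.
Fill = Callable[[List[str]], List[str]]


def apply_mask(tokens: Sequence[str], mask: Sequence[int]) -> List[str]:
    """Hide every position i with mask[i] == 1."""
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    return [MASK if m else t for t, m in zip(tokens, mask)]


def infilling_extract(fill: Fill, target: Sequence[str], mask: Sequence[int]) -> bool:
    """True if the model reproduces every masked token of `target` verbatim."""
    output = fill(apply_mask(target, mask))
    return all(output[i] == target[i] for i, m in enumerate(mask) if m)


def prefix_mask(n: int, k: int) -> List[int]:
    """Prefix-only probing as a special case: reveal k tokens, hide the rest."""
    return [0] * k + [1] * (n - k)


def make_memorizing_fill(memorized: List[List[str]]) -> Fill:
    """Hypothetical toy model: if all visible tokens agree with a memorized
    sequence of the same length, copy that sequence into the masked slots."""
    def fill(prompt: List[str]) -> List[str]:
        for seq in memorized:
            if len(seq) == len(prompt) and all(
                p == MASK or p == s for p, s in zip(prompt, seq)
            ):
                return [s if p == MASK else p for p, s in zip(prompt, seq)]
        return ["<unk>" if p == MASK else p for p in prompt]
    return fill


if __name__ == "__main__":
    target = "alice lives at 12 elm street".split()
    fill = make_memorizing_fill([target])
    print(infilling_extract(fill, target, prefix_mask(len(target), 2)))  # True
    print(infilling_extract(fill, target, [0, 1, 0, 1, 0, 1]))          # True
    other = "bob lives at 12 elm street".split()
    print(infilling_extract(fill, other, prefix_mask(len(other), 2)))    # False
```

Because the mask is an arbitrary 0/1 vector, the same scoring function covers prefix, suffix, edge, or scattered masks; only the mask argument changes.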

Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models.
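To make "mask geometry" concrete, here is a hedged sketch (not the paper's code) that builds a prefix-conditioned and an edge-conditioned mask with the same number of visible tokens, plus a scorer that reports both verbatim success and a partial-leakage fraction. The function names, the even split of the visible budget across both edges, and the use of per-token accuracy over masked positions as the partial-leakage measure are all assumptions; the paper may define its masks and partial leakage differently.

```python
from typing import List, Sequence, Tuple


def prefix_conditioned(n: int, visible: int) -> List[int]:
    """Reveal the first `visible` tokens and hide the rest (1 = hidden)."""
    visible = max(0, min(visible, n))
    return [0] * visible + [1] * (n - visible)


def edge_conditioned(n: int, visible: int) -> List[int]:
    """Split the same visible budget across both edges and hide the middle."""
    visible = max(0, min(visible, n))
    left = (visible + 1) // 2  # left edge gets the extra token when odd
    right = visible - left
    return [0] * left + [1] * (n - visible) + [0] * right


def leakage(prediction: Sequence[str], target: Sequence[str],
            mask: Sequence[int]) -> Tuple[bool, float]:
    """Return (verbatim match, fraction of masked tokens recovered)."""
    hidden = [i for i, m in enumerate(mask) if m]
    if not hidden:
        raise ValueError("mask hides no tokens")
    correct = sum(prediction[i] == target[i] for i in hidden)
    return correct == len(hidden), correct / len(hidden)


if __name__ == "__main__":
    print(prefix_conditioned(10, 4))  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    print(edge_conditioned(10, 4))    # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
    target = list("abcdefghij")
    prediction = list("abcdefXhij")  # one wrong token inside the hidden span
    print(leakage(prediction, target, edge_conditioned(10, 4)))
```

Holding the visible budget fixed is meant to isolate geometry: both masks hide the same number of tokens, so any difference in extraction rate reflects where the context sits rather than how much of it there is.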

In particular, we show that a realistic adversary with access to training data where personally identifiable information has been redacted can even achieve higher recall on extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable parameters for decoding measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.
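The redacted-email scenario can be sketched as a recall computation. Everything here is illustrative: the "[EMAIL]" placeholder, the email regex, and the `infill` callable (a hypothetical stand-in for querying a DLM on the redacted document) are assumptions, not the paper's setup. The relevant intuition is that an infilling model can condition on text on both sides of the placeholder, whereas an autoregressive model prompted with a prefix sees only the left side.

```python
import re
from typing import Callable, Sequence

# Simple email pattern (assumption); real PII matching is usually stricter.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def email_recall(redacted_docs: Sequence[str], true_emails: Sequence[str],
                 infill: Callable[[str], str]) -> float:
    """Fraction of redacted emails that appear verbatim in the model's infill."""
    if len(redacted_docs) != len(true_emails) or not true_emails:
        raise ValueError("need one ground-truth email per document, and at least one")
    hits = 0
    for doc, truth in zip(redacted_docs, true_emails):
        # A DLM could condition on text on both sides of the placeholder here.
        completion = infill(doc)
        if truth in EMAIL_RE.findall(completion):
            hits += 1
    return hits / len(true_emails)


if __name__ == "__main__":
    # Hypothetical toy model that memorized one of the two redacted emails.
    memorized = {"Contact [EMAIL] for access.": "alice@example.com"}

    def toy_infill(doc: str) -> str:
        return doc.replace("[EMAIL]", memorized.get(doc, "someone@example.org"))

    docs = ["Contact [EMAIL] for access.", "Reach [EMAIL] after 5pm."]
    truth = ["alice@example.com", "bob@example.net"]
    print(email_recall(docs, truth, toy_infill))  # 0.5
```

Running the same function over completions from a DLM and from a scale-matched autoregressive model would yield the kind of side-by-side recall comparison the abstract describes; this toy does not reproduce the paper's numbers.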



