Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts
Quick Answer
DETECT-REMASK-REPAIR uses masked diffusion language models to update outdated spans in an existing summary while preserving supported content. With one-step repair, the repair cost drops to under half a second.
Quick Take
Rather than regenerating a summary from scratch whenever the context changes, the framework identifies unsupported spans, remasks them, and repairs only those regions. Evaluated on DialogSum and the new StreamSum benchmark, localized repair offers a controllable alternative to full rewriting, with trade-offs between faithfulness, speed, and preservation of the original draft.
Key Points
- Localized faithfulness repair updates summaries without full regeneration.
- One-step repair reduces repair cost to under half a second.
- StreamSum benchmark introduced for evaluating evolving-context summarization.
- Faithfulness-steered repair improves early drafts.
- Framework allows trade-offs between faithfulness, speed, and preservation.
Article Content
arXiv:2606.12807v1
Abstract: Summaries of real-world events can become outdated as contexts evolve and new information arrives. A common response is to generate a new summary from the updated context, but full regeneration discards the previous draft, can obscure what changed, and may be unnecessary when only a few claims are unsupported. We study localized faithfulness repair: updating outdated spans in an existing summary while preserving supported content.
We propose DETECT-REMASK-REPAIR, a diffusion-based framework that identifies, remasks, and repairs outdated regions with masked diffusion language models. To evaluate evolving-context summarization, we introduce StreamSum, a benchmark of synthetic event timelines.
Experiments on DialogSum and StreamSum show that localized diffusion repair provides a controllable alternative to full rewriting: faithfulness-steered repair improves early drafts, one-step repair reduces repair cost to under half a second, with the framework enabling faithfulness-speed-preservation tradeoffs across datasets. We also find that the framework can provide a post-hoc correction step that improves faithfulness for autoregressive systems.
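The detect, remask, repair loop described in the abstract can be illustrated with a toy Python sketch. This is not the paper's implementation: the abstract does not specify the detector or the diffusion model, so the token-overlap detector, the 0.6 threshold, the sentence-level masking, and the `denoise` callable (a stand-in for a masked diffusion language model) are all assumptions made for illustration.

```python
import re
from typing import Callable, List

MASK = "[MASK]"


def words(text: str) -> List[str]:
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def detect(sentences: List[str], context: str, threshold: float = 0.6) -> List[int]:
    """Flag sentences whose token overlap with the context is below threshold.

    Assumption: token overlap is only a crude proxy for faithfulness.
    """
    vocab = set(words(context))
    flagged = []
    for i, sentence in enumerate(sentences):
        toks = words(sentence)
        score = sum(t in vocab for t in toks) / max(len(toks), 1)
        if score < threshold:
            flagged.append(i)
    return flagged


def remask(sentences: List[str], flagged: List[int]) -> List[str]:
    """Replace every token of a flagged sentence with MASK; keep the rest."""
    return [
        " ".join([MASK] * len(s.split())) if i in flagged else s
        for i, s in enumerate(sentences)
    ]


def repair(
    masked: List[str], context: str, denoise: Callable[[str, str], str]
) -> List[str]:
    """Fill masked sentences with one call each to `denoise` (hypothetical).

    Supported sentences pass through untouched, which is the preservation
    property the abstract emphasizes.
    """
    return [denoise(s, context) if MASK in s else s for s in masked]
```

For instance, with the updated context "The launch was moved to Friday. The team confirmed the budget." and a draft containing "The launch is scheduled for Monday.", only that sentence would be flagged and handed to `denoise`, while "The team confirmed the budget." is kept verbatim.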