Self-Generated Error Training for Token Editing in Diffusion Language Models
Quick Answer
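This paper shows that self-generated token-to-token (T2T) editing generally improves LLaDA2.1's accuracy while reducing T2T edit intensity. It does this by training the editor on the model's own draft errors rather than on random corruptions, which addresses a training-inference mismatch.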
Quick Take
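T2T editing lets LLaDA2.1 revise tokens it has already committed. However, the released recipe trains the editor on random vocabulary corruptions, while at inference the editor must fix the model's own fluent, high-confidence mistakes. Self-generated T2T closes this gap with two passes. A no-gradient draft pass fills the masked positions with the model's predicted tokens, and a second pass then supervises recovery from these self-generated corruptions. In generated outputs, this leads to fewer final-digit transcription errors and less excessive self-correction.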
Key Points
- Introduces self-generated T2T editing for LLaDA2.1, which generally improves accuracy.
- Addresses the training-inference mismatch by training on model-generated corruptions instead of random vocabulary corruptions.
- Reduces T2T edit intensity and mitigates final-digit transcription errors that follow otherwise correct reasoning.
- Implemented as a short LoRA continued-pretraining pass on LLaDA2.1-mini.
- Evaluated on several benchmarks under the official Q-Mode T2T procedure, with inference parameters unchanged.
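The LoRA continued-pretraining pass keeps the base weights frozen and trains only a low-rank update. The standard LoRA formulation is shown here for reference. The summary does not state the rank r or the scale α used for LLaDA2.1-mini, so treat the symbols as generic. A frozen weight matrix W_0 of shape d × k is adapted as:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only A and B are updated, so a short pass like this touches a small fraction of the parameters.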
Article Excerpt
arXiv:2606.17175v1 (Announce Type: new)
Abstract: Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent, high-confidence draft errors instead.
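To make the mismatch concrete, random vocabulary corruption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not LLaDA2.1's training code. The function name `random_vocab_corrupt`, the per-position corruption `rate`, and the list-of-token-IDs representation are all hypothetical.

```python
import random


def random_vocab_corrupt(tokens, vocab_size, rate, seed=0):
    """Hypothetical sketch of random vocabulary corruption.

    Each position is independently replaced, with probability `rate`,
    by a token ID drawn uniformly from the vocabulary. Returns
    (corrupted_tokens, targets); the targets are the original tokens
    the editor should recover.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    for i in range(len(corrupted)):
        if rng.random() < rate:
            # A uniform draw is usually an implausible token, unlike the
            # fluent, high-confidence errors the editor meets at inference.
            corrupted[i] = rng.randrange(vocab_size)
    return corrupted, list(tokens)


clean = [12, 7, 7, 3, 99, 41]
noisy, targets = random_vocab_corrupt(clean, vocab_size=50_000, rate=0.3, seed=1)
print(noisy)
print(targets == clean)  # True
```

Because the replacements are drawn uniformly, an editor trained only on this data mostly learns to spot obviously out-of-place tokens.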
We study this training-inference mismatch and propose self-generated T2T, which performs a no-gradient draft pass, fills masked positions with predicted tokens, and supervises recovery in a second pass under these self-generated corruptions. We implement the update as a short LoRA continued-pretraining pass on LLaDA2.1-mini and evaluate on several benchmarks under the official Q-Mode T2T procedure with unchanged inference parameters.
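The data-construction side of the two-pass procedure can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The `MASK` ID, the function names, and `toy_draft_model` are all hypothetical. `toy_draft_model` is a stand-in for LLaDA2.1 that guesses "previous token + 1" at masked slots. In a PyTorch implementation the draft pass would typically run under `torch.no_grad()`, and the second pass would compute the recovery loss.

```python
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical mask-token ID


def self_generated_corruption(
    tokens: Sequence[int],
    mask_positions: Sequence[int],
    draft_model: Callable[[List[int]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Build one self-generated T2T training example (hypothetical sketch).

    1. Mask the chosen positions.
    2. Draft pass: run the model once (no gradients in a real setup)
       and read its predicted token at every position.
    3. Fill each masked position with the model's own prediction.
    Returns (editor_input, recovery_target) for the supervised second pass.
    """
    masked = list(tokens)
    for i in mask_positions:
        masked[i] = MASK
    predictions = draft_model(masked)  # draft pass
    corrupted = list(tokens)
    for i in mask_positions:
        corrupted[i] = predictions[i]  # self-generated fill
    return corrupted, list(tokens)


def toy_draft_model(seq: List[int]) -> List[int]:
    """Stand-in draft model: copies unmasked tokens and predicts
    "previous output + 1" at masked slots (plausible but sometimes wrong)."""
    out: List[int] = []
    for i, t in enumerate(seq):
        if t == MASK:
            prev = out[i - 1] if i > 0 else 0
            out.append(prev + 1)
        else:
            out.append(t)
    return out


corrupted, target = self_generated_corruption([5, 6, 7, 9], [1, 3], toy_draft_model)
print(corrupted)  # [5, 6, 7, 8]
print(target)     # [5, 6, 7, 9]
```

In this toy run the draft fills position 1 correctly but writes 8 instead of 9 at position 3. That is a fluent, plausible mistake, much like a final-digit slip. The editor therefore has to learn both to keep correct fills and to repair confident wrong ones, which uniformly random replacements rarely exercise.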
The method generally improves accuracy while reducing T2T edit intensity, mitigating failure modes such as final-digit transcription errors after otherwise correct reasoning and excessive self-correction before short factual answers.
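The abstract does not define "edit intensity". One plausible reading is the number of T2T edits that actually change a token, normalized by the number of committed tokens. The sketch below assumes that definition. `Edit` and `edit_intensity` are hypothetical names, and the paper's own metric may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    """One hypothetical T2T edit event logged during decoding."""
    position: int
    old_token: int
    new_token: int


def edit_intensity(edits: List[Edit], num_committed: int) -> float:
    """Assumed metric: token-changing T2T edits per committed token.

    Edits that rewrite a token to the same value are ignored.
    """
    if num_committed == 0:
        return 0.0
    changed = sum(1 for e in edits if e.old_token != e.new_token)
    return changed / num_committed


log = [Edit(3, 8, 9), Edit(10, 4, 4), Edit(17, 2, 3)]
print(edit_intensity(log, num_committed=20))  # 0.1
```

Under this reading, lower intensity with equal or better accuracy means the editor intervenes less often. This matches the reported reduction in late rewrites such as changing a correct final digit.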