Discrete Diffusion Language Models for Interactive Radiology Report Drafting
Quick Answer
This paper shows that the DiffusionGemma-26B model matches or exceeds its autoregressive counterpart Gemma-4-26B on medical visual question answering, decodes 3.5-4.4x faster, and adds an any-order infill capability for report drafting.
Quick Take
Finetuned under the same LoRA recipe as Gemma-4-26B, DiffusionGemma-26B matches or exceeds the autoregressive model on medical visual question answering while decoding faster. Because it denoises the text bidirectionally, a radiologist can fix report fragments and have the model fill in the text between them, which suits real reports that are often terse or inconsistent across clinicians and institutions.
Key Points
- DiffusionGemma-26B matches or exceeds AR performance on all tested datasets.
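- Decoding with the finetuned diffusion model is 3.5-4.4x faster.
- Bidirectional denoising enables any-order infill: a radiologist fixes report fragments and the model fills in the text between them.
- Medical foundation models remain almost entirely autoregressive, even though diffusion language models have become competitive with AR generation.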
- Results are evaluated by a verbosity-robust LLM judge.
Article Excerpt
From source RSS / original summary
arXiv:2607.01436v1
Abstract: Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive.
We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on all of them, and the finetuned model (3.8B active) is competitive with frontier ; its decoding is also 3.5-4.4x faster.
Beyond this parity, the diffusion model offers a drafting capability AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can fix report fragments and have the model fill the text between them, an operation inherent to diffusion but not to autoregression, which is subpar at it. This suits real reports, which are often terse or inconsistent across clinicians and institutions.
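To make any-order infill concrete, here is a toy sketch in Python. It is not the paper's model or decoding schedule. The bigram "denoiser", the two-sentence corpus, and the names `propose` and `infill` are all hypothetical stand-ins for a trained diffusion language model. What it does show is the general shape of the operation: pinned fragments stay fixed, every masked slot is scored using both its left and right neighbours, and the most confident slot is committed first, so tokens are not necessarily filled left to right.

```python
from collections import Counter

MASK = "<mask>"

# Hypothetical two-sentence corpus standing in for what a trained model has learned.
CORPUS = [
    "no focal consolidation in the right lower lobe",
    "small pleural effusion in the left lower lobe",
]


def bigram_counts(sentences):
    """Count adjacent token pairs across the corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


BIGRAMS = bigram_counts(CORPUS)
VOCAB = sorted({tok for sentence in CORPUS for tok in sentence.split()})


def propose(canvas, i):
    """Toy denoiser: pick the token that best fits BOTH neighbours of slot i.

    Returns (token, score). Masked or missing neighbours contribute zero.
    """
    left = canvas[i - 1] if i > 0 else None
    right = canvas[i + 1] if i + 1 < len(canvas) else None
    best_tok, best_score = VOCAB[0], -1
    for tok in VOCAB:
        score = BIGRAMS[(left, tok)] + BIGRAMS[(tok, right)]
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok, best_score


def infill(canvas):
    """Fill masked slots one at a time, most confident slot first.

    Unmasked (pinned) tokens are never modified. Returns the filled
    canvas and the order in which slots were committed.
    """
    canvas = list(canvas)
    order = []
    while MASK in canvas:
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        proposals = {i: propose(canvas, i) for i in masked}
        # max() keeps the first maximum, so ties go to the leftmost slot.
        i = max(masked, key=lambda j: proposals[j][1])
        canvas[i] = proposals[i][0]
        order.append(i)
    return canvas, order


# The radiologist pins "small", "effusion", "right", and "lobe".
draft = ["small", MASK, "effusion", MASK, MASK, "right", MASK, "lobe"]
filled, order = infill(draft)
print(" ".join(filled))  # small pleural effusion in the right lower lobe
print(order)             # [6, 1, 3, 4]
```

The commit order `[6, 1, 3, 4]` starts near the end of the sentence, because the slot between "right" and "lobe" has the strongest evidence on both sides. A strictly left-to-right decoder could not use the pinned tokens to the right of a gap in this way without extra machinery.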