Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics
Quick Answer
This paper shows that query position is a first-order variable in diffusion large language models (dLLMs): where the query sits in the prompt affects generation quality about as much as the semantic quality of the in-context examples.
Quick Take
The paper introduces Average Confidence ($\overline{C}$), a metric that tracks confidence across the iterative decoding process rather than at a single step, and proposes Auto-ICL, a training-free adaptive routing strategy that optimizes query placement and approaches oracle performance across heterogeneous reasoning and perception tasks.
Key Points
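- Positional variance in dLLMs affects generation quality on par with the semantic quality of the in-context examples.
- Traditional single-step confidence ($C_{decoded}$) fails as a label-free quality signal in dLLMs.
- Average Confidence ($\overline{C}$) tracks the whole iterative decoding process instead of a single step.
- Auto-ICL is training-free and dynamically optimizes query placement, approaching oracle performance.
- Because dLLMs use bidirectional attention rather than causal masking, the query can be placed anywhere in the prompt, yet current practice still inherits AR-style trailing-query templates.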
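To make the contrast between the two confidence signals concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the summary: a decoding run is recorded as a list of steps, each holding the probabilities of the tokens committed at that step; `final_step_confidence` is a hypothetical stand-in for a single-step score, and `average_confidence` is a hypothetical trajectory-level score in the spirit of $\overline{C}$. The paper's exact definitions may differ.

```python
from statistics import mean

def final_step_confidence(trace):
    """Single-step score: mean probability of the tokens committed at the last step only."""
    return mean(trace[-1])

def average_confidence(trace):
    """Trajectory-level score: mean over all steps of the per-step mean probability."""
    return mean(mean(step) for step in trace)

# Hypothetical traces (made-up numbers, not results from the paper).
# Each inner list holds the probabilities of tokens committed at one decoding step.
trace_a = [[0.9, 0.8], [0.85], [0.6]]   # confident early, weaker at the end
trace_b = [[0.4, 0.5], [0.45], [0.7]]   # uncertain throughout, stronger at the end

# The two scores can rank the same pair of runs in opposite order.
print(final_step_confidence(trace_a) < final_step_confidence(trace_b))
print(average_confidence(trace_a) > average_confidence(trace_b))
```

In this toy case the last-step score prefers `trace_b` while the trajectory-level score prefers `trace_a`, which illustrates why a metric that looks at only one step can disagree with one that follows the whole decoding process.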
Article Content
From source RSS / original summary
arXiv:2606.19349v1 Announce Type: new
Abstract: While In-Context Learning (ICL) is extensively studied in Autoregressive (AR) LLMs, its mechanism within Diffusion Large Language Models (dLLMs) remains largely unexplored. Unlike AR models restricted by unidirectional causal masking, dLLMs intrinsically utilize bidirectional attention, offering extensive spatial flexibility for query placement.
Unfortunately, current practices conventionally inherit AR-style trailing-query templates, often overlooking the structural paradigm shift. This paper presents a comprehensive analysis unveiling that query position is actually a first-order variable in dLLMs. Through empirical decoupling, we demonstrate that positional variance impacts generation quality on par with example semantic quality.
Internally, this positional sensitivity stems from a spatial "Recency Effect" in attention flow and task-dependent shifts in decoding trajectories. To mitigate this instability without ground-truth labels, we reveal that traditional single-step confidence ($C_{decoded}$) fails in dLLMs. Instead, we propose Average Confidence ($\overline{C}$), a novel metric tracking the iterative decoding process.
By establishing the foundational spatial ICL baselines, we introduce Auto-ICL, a training-free adaptive routing strategy that dynamically optimizes query placement, robustly approaching oracle performance across heterogeneous reasoning and perception tasks.
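The routing idea described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that routing means trying each candidate query position and keeping the one whose decoding run has the highest average confidence. The names `build_prompt`, `route_query`, and `make_toy_decoder` are hypothetical, and the toy decoder only mimics a model so that the example runs on its own.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Trace = List[List[float]]  # per-step probabilities of newly committed tokens (assumed format)

def build_prompt(examples: Sequence[str], query: str, position: int) -> str:
    """Place the query before examples[position]; position == len(examples)
    gives the AR-style trailing-query template."""
    parts = list(examples)
    parts.insert(position, query)
    return "\n\n".join(parts)

def average_confidence(trace: Trace) -> float:
    """Mean over decoding steps of the per-step mean probability (assumed definition)."""
    return mean(mean(step) for step in trace if step)

def route_query(
    examples: Sequence[str],
    query: str,
    decode: Callable[[str], Tuple[str, Trace]],
) -> Tuple[float, int, str]:
    """Try every query position and return (score, position, answer) for the best one.
    No ground-truth label is used, only the decoder's own confidence."""
    best = None
    for position in range(len(examples) + 1):
        answer, trace = decode(build_prompt(examples, query, position))
        score = average_confidence(trace)
        if best is None or score > best[0]:
            best = (score, position, answer)
    return best

def make_toy_decoder(query: str) -> Callable[[str], Tuple[str, Trace]]:
    """Hypothetical stand-in for a dLLM: confidence drops the later the query appears."""
    def decode(prompt: str) -> Tuple[str, Trace]:
        index = prompt.split("\n\n").index(query)
        conf = 0.9 - 0.2 * index
        return f"answer@{index}", [[conf, conf], [conf]]
    return decode

examples = ["Q: 1+1=? A: 2", "Q: 3+1=? A: 4"]
query = "Q: 2+2=?"
score, position, answer = route_query(examples, query, make_toy_decoder(query))
print(position, answer)
```

With a real model, `decode` would wrap one full iterative decoding run, so this scheme costs one run per candidate position. Whether Auto-ICL evaluates every position or uses a cheaper selection rule is not stated in the summary.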