Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval
Quick Take
The DOPA framework aims to make Large Language Models (LLMs) more robust on Out-of-Distribution (OOD) tasks by using an OOD proxy to guide demonstration retrieval when the target domain is inaccessible. It also applies a Mahalanobis distance-based global diversity constraint to the retrieved demonstrations. The authors report improved OOD robustness across multiple LLMs and tasks.
Key Points
- DOPA uses an Out-of-Distribution proxy to approximate inaccessible target domains.
- Incorporates a Mahalanobis distance-based constraint for demonstration diversity.
- Experimental results show improved robustness in OOD settings across multiple LLMs.
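- Targets the setting where the target distribution cannot be evaluated directly, which otherwise degrades the quality of the selected demonstrations.
- Retrieves demonstrations from the available source domain to support LLM inference under distribution shift.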
Article Excerpt
From source RSS / original summary
arXiv:2606.00014v1 Announce Type: new
Abstract: Although studies have demonstrated that Large Language Models (LLMs) can perform well on Out-of-Distribution (OOD) tasks, their advantage tends to diminish as the distribution shift becomes more severe. Consequently, researchers aim to retrieve distributionally similar and informative demonstrations from the available source domain to boost the inference capabilities of LLMs.
However, in practical scenarios where the target domain is inaccessible, evaluating the unknown distribution is challenging, which indirectly impacts the quality of the selected demonstrations. To address this problem, we propose DOPA, a demonstration search framework that incorporates an OOD proxy to approximate the inaccessible target domain and guide the retrieval process.
Building on proxy-based evaluation, DOPA further introduces a Mahalanobis distance-based global diversity constraint to ensure sufficient diversity among the retrieved demonstrations. Experimental results on multiple LLMs and tasks demonstrate that DOPA effectively enhances robustness in OOD settings (code: https://github.com/bort64/ood_code).
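As a rough illustration of the two ideas in the abstract (rank source demonstrations by closeness to an OOD proxy, then enforce diversity with a Mahalanobis distance), here is a minimal Python sketch. It is not the DOPA algorithm. The greedy selection rule, the function names `mahalanobis` and `select_demonstrations`, the `min_dist` threshold, the `ridge` regularizer, and the use of the proxy mean as a stand-in for the target domain are all assumptions made for this example. It also assumes that demonstrations and proxy samples have already been embedded as NumPy arrays.

```python
import numpy as np


def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance sqrt((x - y)^T cov_inv (x - y))."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))


def select_demonstrations(source_emb, proxy_emb, k=4, min_dist=1.0, ridge=1e-3):
    """Illustrative greedy selection (hypothetical, not the paper's method).

    source_emb: (n, d) embeddings of candidate source-domain demonstrations.
    proxy_emb:  (m, d) embeddings of OOD proxy samples (assumed available).
    Returns up to k indices into source_emb.
    """
    dim = source_emb.shape[1]
    # Covariance of the source pool; the ridge term keeps it invertible.
    cov = np.cov(source_emb, rowvar=False) + ridge * np.eye(dim)
    cov_inv = np.linalg.inv(cov)

    # Assumption: the proxy mean summarizes the inaccessible target domain.
    proxy_center = proxy_emb.mean(axis=0)

    # Rank candidates from closest to farthest from the proxy center.
    to_proxy = np.array(
        [mahalanobis(e, proxy_center, cov_inv) for e in source_emb]
    )
    order = np.argsort(to_proxy)

    chosen = []
    for idx in order:
        # Diversity check: keep a candidate only if it is at least
        # min_dist away from every demonstration already chosen.
        if all(
            mahalanobis(source_emb[idx], source_emb[j], cov_inv) >= min_dist
            for j in chosen
        ):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(50, 8))            # synthetic source embeddings
    proxy = rng.normal(loc=0.5, size=(10, 8))    # synthetic proxy embeddings
    print(select_demonstrations(source, proxy, k=4))
```

The threshold-based check is only one way to express a diversity constraint. The abstract describes a global constraint, which may be formulated differently in the paper and its released code.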