Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

arXiv cs.CL·Hao Xu, Rite Bo, Fausto Giunchiglia, Yingji Li, Rui Song

6/2/2026


Quick Answer

The DOPA framework enhances Large Language Models' (LLMs) robustness in Out-of-Distribution (OOD) tasks by utilizing an OOD proxy for demonstration retrieval.

Quick Take

DOPA is a demonstration search framework for in-context learning when the target domain is inaccessible. It uses an OOD proxy to approximate the unknown target distribution and guide retrieval from the source domain, then applies a Mahalanobis distance-based global diversity constraint to the retrieved demonstrations. The authors report improved robustness in OOD settings across multiple LLMs and tasks.

Key Points

DOPA uses an Out-of-Distribution proxy to approximate inaccessible target domains.
Incorporates a Mahalanobis distance-based constraint for demonstration diversity.
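The authors report improved robustness in OOD settings across multiple LLMs and tasks.
Targets the practical case where the target domain is inaccessible, so its distribution cannot be evaluated directly during retrieval.
Aims to boost LLM inference by retrieving distributionally similar and informative demonstrations from the available source domain.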
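
For reference, the standard Mahalanobis distance between two embedding vectors is written out below. The symbols x, y, and Σ (a covariance matrix) are generic notation chosen here, not taken from the paper; this summary does not say how DOPA estimates the covariance or how it turns the distance into a global constraint.

```latex
d_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} \, \Sigma^{-1} \, (\mathbf{x} - \mathbf{y})}
```

When Σ is the identity matrix this reduces to the ordinary Euclidean distance, so the covariance term is what lets the measure account for the scale and correlation of the embedding dimensions.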


[Submitted on 13 Apr 2026]


Abstract: Although studies have demonstrated that Large Language Models (LLMs) can perform well on Out-of-Distribution (OOD) tasks, their advantage tends to diminish as the distribution shift becomes more severe. Consequently, researchers aim to retrieve distributionally similar and informative demonstrations from the available source domain to boost the inference capabilities of LLMs. However, in practical scenarios where the target domain is inaccessible, evaluating the unknown distribution is challenging, which indirectly impacts the quality of the selected demonstrations. To address this problem, we propose DOPA, a demonstration search framework that incorporates an OOD proxy to approximate the inaccessible target domain and guide the retrieval process. Building on proxy-based evaluation, DOPA further introduces a Mahalanobis distance-based global diversity constraint to ensure sufficient diversity among the retrieved demonstrations. Experimental results on multiple LLMs and tasks demonstrate that DOPA effectively enhances robustness in OOD settings (code: this https URL_code).
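
The abstract does not specify how the proxy is built, how candidates are scored, or how the diversity constraint is enforced, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes embeddings are plain lists of floats, ranks source demonstrations by cosine similarity to the centroid of a set of proxy embeddings, and greedily keeps a candidate only if its Mahalanobis distance to every already-kept demonstration meets a threshold. The function names, the `min_dist` parameter, the centroid scoring, and the greedy thresholding are all hypothetical stand-ins.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, ridge=1e-6):
    """Sample covariance matrix, with a small ridge term to keep it invertible."""
    n, d = len(rows), len(rows[0])
    mu = mean_vector(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j]) / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(u, v, cov):
    """Mahalanobis distance sqrt((u - v)^T cov^{-1} (u - v))."""
    diff = [a - b for a, b in zip(u, v)]
    z = solve(cov, diff)  # z = cov^{-1} diff, without forming the inverse
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(source, proxy, k, min_dist):
    """Hypothetical proxy-guided retrieval with a diversity threshold.

    Returns indices into `source`, at most k of them.
    """
    centroid = mean_vector(proxy)   # assumed summary of the proxy set
    cov = covariance(source)        # assumed: covariance estimated on the source pool
    ranked = sorted(range(len(source)),
                    key=lambda i: cosine(source[i], centroid),
                    reverse=True)
    chosen = []
    for i in ranked:
        if all(mahalanobis(source[i], source[j], cov) >= min_dist for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    # Toy 2-D embeddings, invented purely for illustration.
    source = [[1.0, 0.0], [0.99, 0.05], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.2]]
    proxy = [[0.9, 0.1], [1.0, 0.2]]
    print(retrieve(source, proxy, k=3, min_dist=1.0))
```

In this toy setup the threshold causes near-duplicate candidates to be skipped in favor of more distant ones, which is the kind of trade-off a diversity constraint is meant to enforce. A real system would use learned sentence embeddings and whatever proxy construction and constraint the paper actually defines.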

Comments:	Accepted by ACL 2026 main
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2606.00014 [cs.CL]
	(or arXiv:2606.00014v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.00014 arXiv-issued DOI via DataCite

Submission history

From: Rui Song
[v1] Mon, 13 Apr 2026 10:22:52 UTC (15,228 KB)

— Originally published at arxiv.org


