In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

arXiv cs.CL·Mingchen Li, Jiatan Huang, Chuxu Zhang, Liang Zhao, Hong Yu

5/27/2026

Quick Answer

This study presents a novel in-context optimization approach for Retrieval-Augmented Generation (RAG), demonstrating that a single linear self-attention layer can perform a gradient-descent step on a unified RAG objective.

Quick Take

The study treats Retrieval-Augmented Generation (RAG) as an in-context optimization process and shows that a single linear self-attention layer can perform one gradient-descent step on a unified linearized RAG objective. The resulting forward-only method improves on a shared-interface baseline across seven QA benchmarks and approaches test-time gradient adaptation at much lower per-query cost.

Key Points

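One linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective.
The method is evaluated on seven QA benchmarks with two retrievers and two frozen LLM backbones.
It improves on a shared-interface baseline and approaches test-time gradient adaptation at much lower per-query cost.
The retriever and backbone stay fixed; the method predicts a context-conditioned update to a generator-side evidence-use interface.
The correspondence with gradient descent remains stable under controlled linear extensions but becomes feature-distribution dependent under nonlinear architectures.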

Article Content

From source RSS / original summary

arXiv:2605.26356v1 (announce type: new). Abstract: In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process.

First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence.
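The general idea behind this kind of equivalence can be illustrated with the standard linear-regression case from the in-context learning literature. The sketch below is not the paper's unified RAG objective. It assumes a plain least-squares loss, a zero weight initialization, and hypothetical names (`gd_step_prediction`, `linear_attention_prediction`, `lr`). Under those assumptions, one gradient-descent step from zero gives the same prediction as a softmax-free attention readout in which context inputs act as keys, context targets act as values, and the test input acts as the query.

```python
import numpy as np


def gd_step_prediction(X, Y, x_query, lr):
    """Predict with weights obtained by one gradient-descent step from W = 0.

    Assumed loss: L(W) = 1/(2N) * sum_i ||W x_i - y_i||^2.
    """
    n = X.shape[0]
    W0 = np.zeros((Y.shape[1], X.shape[1]))
    grad = (W0 @ X.T - Y.T) @ X / n   # gradient of L at W0, shape (d_out, d_in)
    W1 = W0 - lr * grad               # one gradient-descent step
    return W1 @ x_query


def linear_attention_prediction(X, Y, x_query, lr):
    """Softmax-free attention: keys = x_i, values = y_i, query = x_query."""
    n = X.shape[0]
    scores = X @ x_query              # unnormalized dot-product scores, shape (N,)
    return (lr / n) * (Y.T @ scores)  # score-weighted sum of the values


# Synthetic context: 8 (input, target) pairs and one query vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Y = rng.normal(size=(8, 2))
x_query = rng.normal(size=4)

pred_gd = gd_step_prediction(X, Y, x_query, lr=0.1)
pred_attn = linear_attention_prediction(X, Y, x_query, lr=0.1)
assert np.allclose(pred_gd, pred_attn)
```

The two functions agree because the step from zero yields W1 = (lr/N) * Y^T X, so W1 applied to the query is a sum of targets weighted by key-query dot products. With a nonzero initialization or a softmax over the scores, this exact match no longer holds in this simple form.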

We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface.

Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
