When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval
Quick Answer
The study introduces MASDR-RAG, addressing vector search dilution in retrieval-augmented generation (RAG) by using domain-scoped metadata, improving P@10 from 0.77 to 0.86 across various LLMs and datasets.
Quick Take
MASDR-RAG mitigates the accuracy loss that occurs as document collections scale: in a deployed Wyoming DOT corpus, accuracy dropped from 75% to below 40% when the collection grew from 54 to 1,128 documents. The findings suggest scoping retrieval by domain before making a single synthesis call.
Key Points
- MASDR-RAG improves retrieval accuracy in large document collections by using domain-scoped metadata.
- In a deployed Wyoming DOT corpus, accuracy dropped from 75% to below 40% as the collection grew from 54 to 1,128 documents.
- P@10 improved significantly from 0.77 to 0.86 with domain scoping ($p < 0.05$).
- Multi-agent orchestration proved highly configuration-dependent, producing what the authors call the precision-faithfulness paradox.
- Code and data will be publicly available upon acceptance of the paper.
Article Content
From source RSS / original summary
arXiv:2606.11350v1 Announce Type: new
Abstract: RAG degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power and top-k retrieval increasingly returns semantically similar but contextually incorrect chunks. We refer to this failure mode as vector search dilution.
Even when using hybrid dense+sparse retrieval, we observed this firsthand in a deployed Wyoming Department of Transportation corpus, where scaling from 54 to 1,128 documents (88,907 chunks) reduced accuracy from 75% to below 40%. To address this dilution, we propose MASDR-RAG (Multi-Agent Scoped Domain Retrieval for RAG) and evaluate it on 200 expert-validated queries across five LLM backbones, six corpora, and two index stacks.
Our results indicate that domain scoping using organizational metadata is the key fix, significantly improving P@10 from 0.77 to 0.86 ($p < 0.05$). Furthermore, our investigation of multi-agent orchestration revealed that its results depend heavily on configuration, creating what we call the precision-faithfulness paradox.
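For reference, P@10 is the fraction of the top 10 retrieved chunks that are judged relevant. A minimal Python sketch of the metric follows. The function name, the chunk ids, and the convention of dividing by k even when fewer than k chunks are returned are assumptions made for illustration, not details taken from the paper.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved ids that appear in the relevant set.

    Assumption: the denominator is always k, so a retriever that returns
    fewer than k results is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved_ids)[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    return hits / k


# Hypothetical example: 10 retrieved chunks, of which the first 8 are relevant.
retrieved = [f"c{i}" for i in range(1, 11)]
relevant = {f"c{i}" for i in range(1, 9)}
score = precision_at_k(retrieved, relevant, k=10)
print(score)  # 0.8
```

A score averaged over all evaluation queries gives the corpus-level P@10 figures of the kind quoted above.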
Based on these varied outcomes, our practical recommendation is simple: scope first, then perform a single synthesis call, reserving full multi-agent orchestration for genuinely multi-domain corpora paired with native-tool-call backbones. Code and data will be made public upon acceptance.
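The "scope first" idea can be illustrated with a toy retriever that filters chunks by a domain label before ranking them by similarity. This is a sketch under stated assumptions, not the MASDR-RAG implementation: the `Chunk` type, the `retrieve` function, the domain names, and the two-dimensional vectors are all hypothetical, and a real system would use an embedding model and a vector index rather than in-memory lists.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    domain: str      # hypothetical organizational metadata label
    vector: tuple    # toy embedding; a real system would use a model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k, domain=None):
    """Return the top-k chunks by cosine similarity.

    If `domain` is given, candidates are restricted to that domain
    before ranking, so similar chunks from other domains cannot
    displace in-domain chunks from the top k.
    """
    pool = [c for c in chunks if domain is None or c.domain == domain]
    ranked = sorted(pool, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


# Hypothetical corpus: an off-domain chunk happens to sit closest to the query.
CHUNKS = [
    Chunk("bridge-1", "bridges", (0.9, 0.1)),
    Chunk("bridge-2", "bridges", (0.7, 0.3)),
    Chunk("hr-1", "human_resources", (0.95, 0.05)),
    Chunk("hr-2", "human_resources", (0.2, 0.8)),
]
QUERY = (1.0, 0.0)

unscoped = [c.chunk_id for c in retrieve(QUERY, CHUNKS, k=2)]
scoped = [c.chunk_id for c in retrieve(QUERY, CHUNKS, k=2, domain="bridges")]
print(unscoped)  # ['hr-1', 'bridge-1']
print(scoped)    # ['bridge-1', 'bridge-2']
```

In the unscoped call the off-domain chunk takes the top slot purely on vector similarity; restricting the candidate pool first removes it. The scoped result would then be passed to a single synthesis call, which is outside the scope of this sketch.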