CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation

arXiv cs.AI·Xuedong Hu, Zhiqing Tang, Zhi Yao, Tian Wang, Weijia Jia

6/16/2026

Quick Answer

CONCORD introduces an asynchronous sparse aggregation framework for device-cloud retrieval-augmented generation (RAG) under document isolation, improving throughput by 1.66x and 2.15x on Natural Questions and WikiText-2 benchmarks, respectively.

Quick Take

CONCORD cuts per-token communication by over two orders of magnitude while keeping answer quality comparable to existing methods.

Key Points

CONCORD targets a document-isolated dual-end setting: private documents stay on the device, public knowledge resides in the cloud, and raw documents are not exchanged.
It improves end-to-end throughput by 1.66x on Natural Questions and 2.15x on WikiText-2.
The framework reduces per-token communication by over two orders of magnitude.
Waiting debt control optimizes remote participation during decoding steps.
Answer quality and perplexity remain comparable to existing methods.
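
To give a feel for how a reduction of "over two orders of magnitude" in per-token communication could arise from sparsity, here is a back-of-the-envelope sketch. It is not CONCORD's method: the summary does not say what CONCORD actually transmits or how it encodes it. The function names, the vocabulary size, the number of kept entries k, and the byte widths are all hypothetical values chosen for illustration.

```python
def dense_payload_bytes(vocab_size: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to send one full next-token score vector."""
    return vocab_size * bytes_per_value


def sparse_payload_bytes(
    k: int, bytes_per_index: int = 4, bytes_per_value: int = 4
) -> int:
    """Bytes needed to send only k (token id, score) pairs."""
    return k * (bytes_per_index + bytes_per_value)


def reduction_factor(vocab_size: int, k: int) -> float:
    """Ratio of the dense payload to the sparse payload for one token."""
    return dense_payload_bytes(vocab_size) / sparse_payload_bytes(k)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    print(reduction_factor(vocab_size=32_000, k=32))
```

Under these assumed numbers, the sparse payload is a few hundred bytes per token instead of over a hundred kilobytes. Skipping remote participation on some decoding steps entirely would lower the average further, but the summary gives no figures for that.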

Source Excerpt

arXiv:2606.15179v1. Announce Type: new. Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud.

Privacy and policy constraints often forbid raw document exchange, creating a document-isolated dual-end RAG setting. …
