Decoupled Mixture-of-Experts for Parametric Knowledge Injection

arXiv cs.CL·Baoqing Yue, Weihang Su, Qingyao Ai, Yichen Tang, Changyue Wang, Jiacheng Kang, Jingtao Zhan, Yiqun Liu

6/15/2026

Quick Answer

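This paper proposes Decoupled Mixture-of-Experts (DMoE), a modular architecture for injecting knowledge into large language models. Because DMoE keeps its experts and router separate from the base model, new knowledge does not have to be written into shared parameters, a design meant to avoid the catastrophic forgetting that post-training methods can cause. On knowledge-intensive benchmarks, DMoE improves answer quality over retrieval and adapter-based baselines.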

Quick Take

DMoE turns external knowledge corpora into expert modules that can be updated independently. A lightweight, uncertainty-aware router activates these experts only when the base model lacks the knowledge it needs during generation, and the method outperforms retrieval and adapter-based baselines.

Key Points

DMoE decouples both the experts and the router from the base model, so new knowledge can be integrated flexibly.
Each expert module is built from an external knowledge corpus and can be updated independently of the others.
A lightweight, uncertainty-aware router activates relevant experts only when the base model lacks sufficient knowledge during generation.
DMoE consistently improves answer quality over retrieval and adapter-based baselines on knowledge-intensive benchmarks.
Experts are attached only to the final-layer feed-forward network, which preserves KV-cache reuse and keeps auto-regressive inference efficient.

Article Content

arXiv:2606.14243v1. Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge.

Existing approaches typically face a trade-off between flexibility and integration: retrieval-based methods keep knowledge outside the model but provide only prompt-level augmentation, whereas post-training-based methods encode new knowledge into shared parameters but may introduce catastrophic forgetting, knowledge conflicts, and costly updates.

In this paper, we propose Decoupled Mixture-of-Experts (DMoE), a modular architecture for parametric knowledge injection that decouples both experts and the router from the base model. DMoE converts external knowledge corpora into independently updatable expert modules and uses a lightweight uncertainty-aware router to activate relevant experts only when the base model lacks sufficient knowledge during generation.
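The abstract does not specify how the router measures uncertainty or chooses experts. The Python sketch below illustrates one possible uncertainty-gated router, under assumptions that are not from the paper: uncertainty is taken to be the entropy of the base model's next-token distribution, compared against a hypothetical threshold `tau`, and expert relevance is a dot product between the hidden state and a hypothetical per-expert key vector. The names `route`, `entropy`, `softmax`, `tau`, and `top_k` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route(
    base_logits: Sequence[float],
    hidden: Sequence[float],
    expert_keys: Dict[str, Sequence[float]],
    tau: float = 1.0,
    top_k: int = 1,
) -> Dict[str, float]:
    """Hypothetical uncertainty-aware router (an assumption, not DMoE's exact rule).

    Returns an empty dict (no experts) when the base model is confident,
    i.e. when the entropy of its next-token distribution is below `tau`.
    Otherwise scores each expert by the dot product of its key with the
    hidden state and returns softmax weights over the top-k experts.
    """
    if entropy(softmax(base_logits)) < tau:
        return {}  # base model is confident: skip all experts
    scores = {
        name: sum(h * k for h, k in zip(hidden, key))
        for name, key in expert_keys.items()
    }
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    weights = softmax([scores[name] for name in chosen])
    return dict(zip(chosen, weights))


# Usage: two hypothetical domain experts with 2-dimensional keys.
experts = {"medicine": [1.0, 0.0], "law": [0.0, 1.0]}
print(route([10.0, 0.0, 0.0], [0.2, 0.9], experts))  # {} (confident, peaked logits)
print(route([0.0, 0.0, 0.0], [0.2, 0.9], experts))   # {'law': 1.0} (uniform logits)
```

Gating on the base model's own uncertainty means confident tokens pay no expert cost, which matches the abstract's "only when the base model lacks sufficient knowledge"; the real signal and threshold used in the paper may differ.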

To support efficient auto-regressive inference, DMoE attaches experts only to the final-layer feed-forward network, preserving KV-cache reuse while enabling parameter-level knowledge augmentation. Experiments on knowledge-intensive benchmarks show that DMoE consistently improves answer quality over retrieval and adapter-based baselines.
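Attaching experts only at the final feed-forward network preserves KV-cache reuse because of where the expert output enters the computation. In the hypothetical Python sketch below, the last layer's FFN output is the frozen base FFN output plus router-weighted expert outputs. The two-layer ReLU expert shape, the additive combination, and all names (`make_ffn`, `final_layer_ffn`, `gate`) are assumptions for illustration, not details from the paper. Every attention sublayer, including the last layer's, runs before this FFN, so the keys and values cached for earlier tokens do not depend on which experts fire.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
FFN = Callable[[Sequence[float]], Vector]


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def make_ffn(w_in: Sequence[Sequence[float]], w_out: Sequence[Sequence[float]]) -> FFN:
    """Two-layer ReLU feed-forward block: W_out @ relu(W_in @ x)."""
    def ffn(x: Sequence[float]) -> Vector:
        hidden = [max(0.0, v) for v in matvec(w_in, x)]
        return matvec(w_out, hidden)
    return ffn


def final_layer_ffn(
    h: Sequence[float],
    base_ffn: FFN,
    experts: Dict[str, FFN],
    gate: Dict[str, float],
) -> Vector:
    """Hypothetical expert-augmented FFN for the last transformer layer only.

    `gate` is the router output (expert name -> weight); an empty gate
    reproduces the frozen base FFN exactly. Earlier layers, and the
    attention sublayer of this layer, never see expert outputs, so their
    cached keys/values stay valid whether or not experts fire.
    """
    out = list(base_ffn(h))
    for name, weight in gate.items():
        delta = experts[name](h)
        out = [o + weight * d for o, d in zip(out, delta)]
    return out  # the residual connection and LM head would follow


# Usage: identity-like base FFN and one hypothetical "law" expert.
base = make_ffn([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
law = make_ffn([[0.0, 1.0]], [[0.0], [2.0]])
h = [0.5, 1.0]
print(final_layer_ffn(h, base, {"law": law}, {}))            # [0.5, 1.0]
print(final_layer_ffn(h, base, {"law": law}, {"law": 0.5}))  # [0.5, 2.0]
```

With an empty gate, as when the router judges the base model confident, the output is exactly the base model's, so the frozen model's behavior is untouched; with `{"law": 0.5}` the expert's contribution is added at half weight.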
