Decoupled Mixture-of-Experts for Parametric Knowledge Injection
Quick Answer
The paper proposes Decoupled Mixture-of-Experts (DMoE), an architecture that modularizes knowledge injection for large language models. It reports improved answer quality on knowledge-intensive benchmarks and is designed to avoid the catastrophic forgetting that post-training methods can introduce.
Quick Take
DMoE converts external knowledge corpora into independently updatable expert modules and activates them through a lightweight, uncertainty-aware router only when the base model lacks the needed knowledge. In the reported experiments it outperforms retrieval and adapter-based baselines.
Key Points
- DMoE decouples experts and routers from the base model for flexible knowledge integration.
- The architecture allows independent updates of expert modules from external knowledge corpora.
- A lightweight uncertainty-aware router activates relevant experts only when needed.
- DMoE improves answer quality consistently over retrieval and adapter-based baselines.
- Experts attach only to the final-layer feed-forward network, which preserves KV-cache reuse and keeps auto-regressive inference efficient.
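One way to picture "independently updatable" experts is a registry in which expert weights live apart from the frozen base parameters. The sketch below is illustrative only: `ExpertRegistry`, `upsert`, and `remove` are hypothetical names, and the paper's actual storage and update scheme is not described in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertRegistry:
    """Hypothetical container: frozen base weights plus swappable experts."""

    base_weights: tuple  # immutable stand-in for the frozen base model
    experts: dict = field(default_factory=dict)

    def upsert(self, name, weights):
        # Adding or refreshing one expert never touches base_weights
        # or any other expert.
        self.experts[name] = tuple(weights)

    def remove(self, name):
        # Retiring outdated knowledge is a local operation as well.
        self.experts.pop(name, None)


registry = ExpertRegistry(base_weights=(0.1, 0.2))
registry.upsert("medicine", [1.0, 1.5])
registry.upsert("medicine", [2.0, 2.5])  # refresh with newer knowledge
```

Because each expert is keyed separately, refreshing one corpus leaves the base model and the other experts untouched, which is the property the key points attribute to DMoE.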
Article Content
From source RSS / original summary: arXiv:2606.14243v1 (announce type: new). Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge.
Existing approaches typically face a trade-off between flexibility and integration: retrieval-based augmentation keeps knowledge outside the model but only provides prompt-level augmentation, whereas post-training-based methods encode new knowledge into shared parameters but may introduce catastrophic forgetting, knowledge conflict, and costly updates.
In this paper, we propose Decoupled Mixture-of-Experts (DMoE), a modular architecture for parametric knowledge injection that decouples both experts and the router from the base model. DMoE converts external knowledge corpora into independently updatable expert modules and uses a lightweight uncertainty-aware router to activate relevant experts only when the base model lacks sufficient knowledge during generation.
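A minimal sketch of an uncertainty-aware router may help make this concrete. It assumes that uncertainty is measured as the entropy of the base model's next-token distribution and that experts are picked by dot-product similarity between the hidden state and a per-expert key vector. Both choices, along with the names `route`, `expert_keys`, `threshold`, and `top_k`, are assumptions for illustration; the abstract does not specify the paper's actual uncertainty signal or routing rule.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route(base_logits, hidden, expert_keys, threshold=1.0, top_k=1):
    """Return indices of experts to activate, or [] if the base model
    already looks confident (assumed entropy-based criterion)."""
    if entropy(softmax(base_logits)) < threshold:
        return []  # base model is confident: skip all experts
    # Score each expert by similarity between hidden state and its key.
    scores = [sum(h * k for h, k in zip(hidden, key)) for key in expert_keys]
    ranked = sorted(range(len(expert_keys)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


expert_keys = [[0.0, 1.0], [1.0, 0.0]]
confident = route([10.0, 0.0, 0.0], [1.0, 0.0], expert_keys)
uncertain = route([0.0, 0.0, 0.0], [1.0, 0.0], expert_keys)
```

With a peaked distribution the router activates no expert. With a flat distribution it activates the expert whose key best matches the hidden state.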
To support efficient auto-regressive inference, DMoE attaches experts only to the final-layer feed-forward network, preserving KV-cache reuse while enabling parameter-level knowledge augmentation. Experiments on knowledge-intensive benchmarks show that DMoE consistently improves answer quality over retrieval and adapter-based baselines.
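The final-layer attachment can be sketched as an additive correction to the base feed-forward output. The code below is a toy illustration under stated assumptions, not the paper's implementation: it assumes each expert is a small two-matrix ReLU feed-forward block and that active experts are summed into the base output. The helper names `ffn` and `final_layer_ffn` are hypothetical.

```python
def matvec(weights, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def ffn(x, w_in, w_out):
    # Two-layer feed-forward block: project, apply ReLU, project back.
    return matvec(w_out, relu(matvec(w_in, x)))


def final_layer_ffn(x, base, experts, active):
    """Base FFN output plus the outputs of the active experts.

    Only this last block changes when experts fire, so attention
    keys and values cached by earlier layers stay valid.
    """
    out = ffn(x, *base)
    for i in active:
        delta = ffn(x, *experts[i])
        out = [o + d for o, d in zip(out, delta)]
    return out


base = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
experts = [([[0.0, -1.0]], [[1.0], [1.0]])]
x = [1.0, -2.0]
without_expert = final_layer_ffn(x, base, experts, active=[])
with_expert = final_layer_ffn(x, base, experts, active=[0])
```

Calling the function with an empty `active` list reproduces the base model exactly, which mirrors the idea that experts contribute only when the router selects them.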