Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models
Quick Answer
The TRACE method improves machine unlearning in Mixture-of-Experts (MoE) language models by correcting forget-retain routing mismatches. In experiments on the WMDP and MUSE-BOOKS benchmarks, it achieves a 9% relative utility improvement over the strongest baseline at comparable forgetting quality.
Key Points
- TRACE detects forget-critical experts using offline activation statistics.
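- The method reweights token-level retain losses so that each selected expert's activation frequency on retain data better matches its frequency on forget data.
- Experiments on WMDP and MUSE-BOOKS show an improved forget-utility trade-off across multiple MoE LLMs.
- TRACE achieves the best performance on three of the four MUSE-BOOKS metrics.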
Article Content
arXiv:2606.10338v1 Announce Type: new
Abstract: Machine unlearning is increasingly important for large language models, yet unlearning in Mixture-of-Experts (MoE) architectures remains underexplored. Unlike dense models, MoE architectures employ a router at each layer to assign each token to a sparse subset of experts. In this work, we observe that forget data often activates a small subset of experts disproportionately, while these experts may receive much weaker activation from retain data.
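The routing observation can be illustrated with a toy simulation. The sketch below assumes plain top-k routing (each token goes to the k experts with the largest router logits) and measures each expert's activation frequency on forget and retain tokens. The helper names, the expert count, and the synthetic logits (biased so that forget tokens favor experts 2 and 5) are hypothetical and not taken from the paper.

```python
import numpy as np


def topk_routing_mask(router_logits: np.ndarray, k: int) -> np.ndarray:
    """0/1 matrix (tokens x experts) marking the k experts each token is routed to.

    Hypothetical helper; assumes plain top-k routing on router logits.
    """
    # The first k indices from argpartition on -logits are the k largest logits per row
    top = np.argpartition(-router_logits, k - 1, axis=1)[:, :k]
    mask = np.zeros(router_logits.shape, dtype=np.int8)
    np.put_along_axis(mask, top, 1, axis=1)
    return mask


def activation_frequency(mask: np.ndarray) -> np.ndarray:
    """Fraction of tokens that activate each expert."""
    return mask.mean(axis=0)


# Toy data (synthetic): 8 experts, top-2 routing.
rng = np.random.default_rng(0)
n_experts, k = 8, 2
forget_logits = rng.normal(size=(2000, n_experts))
forget_logits[:, [2, 5]] += 3.0  # bias forget tokens toward experts 2 and 5
retain_logits = rng.normal(size=(5000, n_experts))

forget_freq = activation_frequency(topk_routing_mask(forget_logits, k))
retain_freq = activation_frequency(topk_routing_mask(retain_logits, k))
for e in range(n_experts):
    print(f"expert {e}: forget={forget_freq[e]:.2f} retain={retain_freq[e]:.2f}")
```

Each token activates exactly k experts, so each frequency vector sums to k. In this toy setup experts 2 and 5 fire on most forget tokens but on only about k / n_experts = 25% of retain tokens, which is the kind of mismatch the abstract describes.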
This forget-retain routing mismatch can leave forget-critical experts under-regularized during unlearning. To address this, we propose TRACE, Targeted Routing-Aware Calibration of Experts, for MoE unlearning. TRACE first detects forget-critical experts from offline activation statistics, and then calibrates retain regularization by reweighting token-level retain losses so that each selected expert's retain-side activation frequency better matches its forget-side counterpart.
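A minimal sketch of the two TRACE steps follows, under stated assumptions. The abstract does not give the selection rule or the exact reweighting formula, so this example assumes (i) an expert is forget-critical when its forget-side activation frequency is at least `ratio_threshold` times its retain-side frequency, and (ii) token weights are fitted with iterative proportional fitting (raking), a standard reweighting technique swapped in here, so that each selected expert's weighted retain-side frequency moves toward its forget-side frequency. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def topk_routing_mask(router_logits, k):
    """0/1 (tokens x experts) mask of top-k routed experts (assumed routing)."""
    top = np.argpartition(-router_logits, k - 1, axis=1)[:, :k]
    mask = np.zeros(router_logits.shape, dtype=np.int8)
    np.put_along_axis(mask, top, 1, axis=1)
    return mask


def select_forget_critical(forget_freq, retain_freq, ratio_threshold=2.0, eps=1e-6):
    """Hypothetical rule: experts activated far more often by forget data."""
    ratio = forget_freq / (retain_freq + eps)
    return np.flatnonzero(ratio >= ratio_threshold)


def calibrate_retain_weights(retain_mask, forget_freq, experts,
                             n_iters=20, max_weight=10.0, eps=1e-6):
    """Per-token retain weights via raking (assumed stand-in for TRACE's rule)."""
    w = np.ones(retain_mask.shape[0])
    for _ in range(n_iters):
        for e in experts:
            active = retain_mask[:, e].astype(bool)
            a, b = w[active].sum(), w[~active].sum()
            if a <= 0 or b <= 0:
                continue  # expert never or always active on retain data: skip
            target = np.clip(forget_freq[e], eps, 1 - eps)
            # s solves s*a / (s*a + b) = target for this expert's weighted frequency
            s = target * b / ((1 - target) * a)
            w[active] *= s
        # Renormalize to mean 1, then clip to bound extreme weights
        w = np.clip(w / w.mean(), 1.0 / max_weight, max_weight)
    return w / w.mean()


def weighted_retain_loss(token_losses, weights):
    """Weighted average of per-token retain losses."""
    return float(np.sum(weights * token_losses) / np.sum(weights))


# Toy example with synthetic router logits (8 experts, top-2 routing).
rng = np.random.default_rng(0)
n_experts, k = 8, 2
forget_logits = rng.normal(size=(2000, n_experts))
forget_logits[:, [2, 5]] += 3.0
retain_mask = topk_routing_mask(rng.normal(size=(5000, n_experts)), k)
forget_freq = topk_routing_mask(forget_logits, k).mean(axis=0)
retain_freq = retain_mask.mean(axis=0)

experts = select_forget_critical(forget_freq, retain_freq)
weights = calibrate_retain_weights(retain_mask, forget_freq, experts)
calibrated_freq = (weights[:, None] * retain_mask).sum(axis=0) / weights.sum()
token_losses = rng.uniform(0.5, 2.0, size=retain_mask.shape[0])  # placeholder losses
print("selected experts:", experts.tolist())
print("retain freq before:", retain_freq[experts].round(2),
      "after:", calibrated_freq[experts].round(2))
print("weighted retain loss:", weighted_retain_loss(token_losses, weights))
```

Because a token can activate several selected experts and the weights are clipped, the calibrated frequencies move toward, but need not reach, the forget-side targets. In the setting the abstract describes, such weights would scale the token-level retain losses used for regularization during unlearning.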
Experiments on WMDP and MUSE-BOOKS across multiple MoE LLMs show that TRACE consistently improves the forget-utility trade-off, yielding a 9% relative utility improvement over the strongest baseline under comparable forgetting quality and the best performance on three out of four MUSE-BOOKS metrics.
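To make the headline number concrete: assuming the usual definition of relative improvement (the specific utility metric is not stated in the abstract), a 9% relative gain is measured against the baseline's own utility, not in absolute percentage points. With a hypothetical baseline utility of 0.50:

```latex
\[
\frac{U_{\mathrm{TRACE}} - U_{\mathrm{base}}}{U_{\mathrm{base}}} \approx 0.09
\quad\Longrightarrow\quad
U_{\mathrm{TRACE}} \approx 1.09\,U_{\mathrm{base}} = 1.09 \times 0.50 = 0.545
\]
```

Here U_base is the strongest baseline's utility at comparable forgetting quality, so in this illustration the absolute gain is 0.045 (4.5 points on a 0 to 1 scale), not 9 points.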