DLLG: Dynamic Logit-Level Gating of LLM Experts

arXiv cs.CL·Bingnan Li, Zhaoyang Zhang, Xiaoze Liu, Yantao Shen, Shuli Jiang, Shuo Yang, Wei Xia, Zhuowen Tu, Stefano Soatto

6/4/2026

Quick Take

DLLG (Dynamic Logit-Level Gating) combines multiple specialized LLMs by learning how to fuse their logits at each token, using only response-level supervision rather than token-level labels. On reasoning and code benchmarks, it outperforms routing, heuristic ensembling, and parameter merging.

Key Points

DLLG learns token-level expert fusion from sparse response-level supervision.
It predicts step-wise fusion weights using a lightweight gating module.
DLLG consistently outperforms strong routing, heuristic ensembling, and parameter-merging baselines.
The gains hold across model scales.
Results are reported on diverse reasoning and code benchmarks.

Article Excerpt

arXiv:2606.04378v1

Abstract: Leveraging multiple specialized LLMs can combine complementary strengths, but existing approaches trade adaptability for stability: routing commits prematurely, heuristic ensembling depends on fragile proxies, and parameter merging introduces interference. We propose DLLG (Dynamic Logit-Level Gating), a dynamic logit-level ensembling framework that learns token-level expert fusion from sparse response-level supervision.

A lightweight gating module predicts step-wise fusion weights, linking trajectory-level correctness to generation without token-level labels or expert retraining. Across diverse reasoning and code benchmarks, DLLG consistently outperforms strong routing, heuristic ensembling, and parameter-merging baselines across model scales, highlighting learned logit-level fusion as a robust and scalable paradigm for integrating specialized experts.
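To make the idea of step-wise fusion weights concrete, here is a minimal sketch of a single decoding step. It is not the authors' implementation: the abstract does not specify the gate's architecture or inputs, so the linear gate over a hidden-state vector, the softmax normalization, the weighted sum of logits, the shared vocabulary, and the greedy token choice are all assumptions, and every function name is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(hidden, gate_matrix):
    """Hypothetical linear gate: one score per expert, normalized to sum to 1.

    hidden      -- context features for the current step (assumed input)
    gate_matrix -- one weight row per expert (assumed parameterization)
    """
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in gate_matrix]
    return softmax(scores)


def fuse_logits(expert_logits, weights):
    """Weighted sum of per-expert logits; assumes all experts share a vocabulary."""
    vocab_size = len(expert_logits[0])
    return [
        sum(w * logits[v] for w, logits in zip(weights, expert_logits))
        for v in range(vocab_size)
    ]


def fused_step(hidden, gate_matrix, expert_logits):
    """One decoding step: predict weights, fuse logits, pick a token greedily."""
    weights = gate_weights(hidden, gate_matrix)
    fused = fuse_logits(expert_logits, weights)
    next_token = max(range(len(fused)), key=fused.__getitem__)
    return weights, fused, next_token


if __name__ == "__main__":
    # Toy setup: two experts, a three-token vocabulary, a two-dimensional context.
    gate_matrix = [[2.0, 0.0], [0.0, 2.0]]
    expert_logits = [[3.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
    for hidden in ([1.0, 0.0], [0.0, 1.0]):
        weights, fused, token = fused_step(hidden, gate_matrix, expert_logits)
        print(hidden, [round(w, 3) for w in weights], token)
```

In this toy setup the gate favors the first expert for one context and the second expert for the other, so the chosen token changes from step to step even though neither expert is modified. How such a gate would be trained from response-level correctness is not covered by this sketch.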
