Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection
Quick Take
The study trains sparse autoencoders (SAEs) on multilingual data, which strengthens cross-lingual representations and makes language control more reliable in LLaMA-3.1-8B and Gemma-2-9B. Combined with a principled layer-selection rule, the method stabilizes the trade-off between language identification accuracy and generation quality on machine translation and cross-lingual summarization, evaluated with SpBLEU, ROUGE-L, COMET, and LaSE.
Key Points
- SAEs trained on multilingual data, rather than English-only data, give more reliable language control.
- An a priori layer-selection rule, based on the intersection of multilingual alignment and language separability, predicts effective intervention depths without exhaustive layerwise search.
- Evaluated on LLaMA-3.1-8B and Gemma-2-9B using SpBLEU, ROUGE-L, COMET, and LaSE.
- Stabilizes the trade-off between language identification accuracy and generation quality.
- Applicable to machine translation and cross-lingual summarization tasks.
Article Content
From source RSS / original summary
arXiv:2605.23036v1 Announce Type: new
Abstract: Sparse autoencoders (SAEs) enable feature-level mechanistic interpretability and activation steering in large language models (LLMs), but SAE-based language control remains unreliable in multilingual settings: most SAEs are trained on English-only data, and steering layers are chosen heuristically. We address these limitations by advancing a principled, mechanistic account of multilingual language steering with SAEs.
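For readers unfamiliar with SAE-based activation steering, here is a minimal sketch of one common formulation. The abstract does not specify the paper's steering operation, so everything here is an assumption for illustration: the ReLU encoder, the row-per-feature decoder layout, and the hypothetical helper names `sae_encode` and `steer`.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per SAE feature


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """ReLU encoder: one non-negative activation per SAE feature."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def steer(x: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix,
          feature: int, target: float) -> Vector:
    """Add (target - current activation) times the feature's decoder
    direction to x.

    Only this difference is added, so the part of x that the SAE does
    not reconstruct is left untouched. (Assumed formulation, not taken
    from the source.)
    """
    z = sae_encode(x, w_enc, b_enc)
    delta = target - z[feature]
    return [xi + delta * d for xi, d in zip(x, w_dec[feature])]
```

In a real model, `x` would be a residual-stream activation at the chosen layer and `feature` would index a feature associated with the target language; how such features are identified is not covered by this sketch.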
First, we show that training SAEs on multilingual data consistently strengthens cross-lingual representations and yields more reliable, quality-preserving language control across layers and model families. Second, we introduce an a priori steering layer-selection rule based on the intersection of multilingual alignment and language separability, which predicts effective intervention depths without exhaustive layerwise search. We evaluate our approach on LLaMA-3.1-8B and Gemma-2-9B across machine translation and cross-lingual summarization (CrossSumm), using SpBLEU, ROUGE-L, COMET, and LaSE.
Our results show that multilingual SAEs combined with intersection-selected layers stabilize the trade-off between language identification accuracy and generation quality, providing a principled, predictive, representation-level account of multilingual SAE steering.
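The abstract names the layer-selection rule only as an "intersection of multilingual alignment and language separability" and does not say how either quantity is measured or how the intersection is computed. The sketch below is one hypothetical reading, not the paper's method: it assumes a per-layer score for each quantity is already available, rescales both to [0, 1], and keeps the layers where both scores are high. The function names and the `threshold` parameter are illustrative assumptions.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale per-layer scores to [0, 1] so two metrics are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_layers(alignment: Sequence[float], separability: Sequence[float],
                  threshold: float = 0.5) -> List[int]:
    """Return layer indices where BOTH rescaled scores reach `threshold`.

    If no layer qualifies, fall back to the single layer that maximizes
    min(alignment, separability). (Assumed rule, not from the source.)
    """
    if len(alignment) != len(separability):
        raise ValueError("need one score per layer for each metric")
    a, s = min_max(alignment), min_max(separability)
    both = [min(ai, si) for ai, si in zip(a, s)]
    chosen = [i for i, b in enumerate(both) if b >= threshold]
    return chosen or [max(range(len(both)), key=both.__getitem__)]


# Made-up scores for a six-layer toy model, for illustration only.
toy_alignment = [0.1, 0.3, 0.6, 0.8, 0.9, 0.7]
toy_separability = [0.2, 0.5, 0.8, 0.9, 0.6, 0.3]
print(select_layers(toy_alignment, toy_separability))  # [2, 3, 4]
```

The point of such a rule is that both inputs can be computed from representations alone, before any steering run, which is what allows candidate layers to be chosen without a layer-by-layer sweep of generation quality.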