Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection

arXiv cs.CL·Yusser Al Ghussin, Daniil Gurgurov, Tanja Baeumel, Josef van Genabith, Patrick Schramowski, Simon Ostermann

5/25/2026


Quick Answer

The study shows that training sparse autoencoders (SAEs) on multilingual data strengthens cross-lingual representations and makes SAE-based language control more reliable in large language models such as LLaMA-3.1-8B and Gemma-2-9B.

Quick Take

Most SAEs are trained on English-only data, and the layer at which to steer is usually chosen heuristically. The authors train SAEs on multilingual data and pair them with an a priori layer-selection rule based on the intersection of multilingual alignment and language separability. On machine translation and cross-lingual summarization, evaluated with SpBLEU, ROUGE-L, COMET, and LaSE, this combination stabilizes the trade-off between language identification accuracy and generation quality.

Key Points

Multilingual SAEs make language control more reliable and quality-preserving across layers and model families.
An a priori layer-selection rule, based on the intersection of multilingual alignment and language separability, predicts effective intervention depths without an exhaustive layerwise search.
Evaluated on LLaMA-3.1-8B and Gemma-2-9B with SpBLEU, ROUGE-L, COMET, and LaSE.
Stabilizes the trade-off between language identification accuracy and generation quality.
Tested on machine translation and cross-lingual summarization (CrossSumm).
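The summary does not say how the steered features are chosen. A common heuristic, assumed here and not taken from the paper, is to score each SAE feature by how much more strongly it fires on one language than on any other, then steer with the top-scoring features. The sketch uses plain Python; `language_specificity`, `top_language_features`, and the toy activation values are hypothetical.

```python
from statistics import mean

# Hypothetical data layout: language code -> list of SAE activation vectors
# (one list of floats per token or sentence) taken from a single layer.
example_acts = {
    "en": [[0.9, 0.0, 0.2], [1.1, 0.1, 0.3]],
    "de": [[0.0, 1.2, 0.2], [0.1, 0.8, 0.4]],
    "fr": [[0.1, 0.0, 0.3], [0.0, 0.1, 0.2]],
}


def language_specificity(acts_by_lang, feature, target):
    """Mean activation on `target` minus the highest mean on any other language."""
    target_mean = mean(vec[feature] for vec in acts_by_lang[target])
    other_means = [
        mean(vec[feature] for vec in vecs)
        for lang, vecs in acts_by_lang.items()
        if lang != target
    ]
    return target_mean - max(other_means)


def top_language_features(acts_by_lang, target, k=1):
    """Return the indices of the k features most specific to `target`."""
    n_features = len(next(iter(acts_by_lang.values()))[0])
    ranked = sorted(
        range(n_features),
        key=lambda f: language_specificity(acts_by_lang, f, target),
        reverse=True,
    )
    return ranked[:k]


print(top_language_features(example_acts, "de"))  # [1]
```

With a real model, the activations would come from running monolingual text in each language through the SAE at the layer being steered.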

Article Content

arXiv:2605.23036v1

Abstract: Sparse autoencoders (SAEs) enable feature-level mechanistic interpretability and activation steering in large language models (LLMs), but SAE-based language control remains unreliable in multilingual settings: most SAEs are trained on English-only data, and steering layers are chosen heuristically. We address these limitations by advancing a principled, mechanistic account of multilingual language steering with SAEs.
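To make "activation steering" concrete, here is a minimal sketch of a sparse autoencoder and additive steering. It assumes the common ReLU SAE form (encode with `W_enc`, decode with `W_dec`) and steers by adding a scaled decoder direction to a hidden-state activation. The abstract does not specify the paper's SAE architecture or steering operation (clamping a feature's activation is another common choice), so `TinySAE`, `steer`, and `alpha` are illustrative names only.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinySAE:
    """Toy SAE: f = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec f + b_dec.

    W_enc is (n_features x d_model) and W_dec is (d_model x n_features),
    so column j of W_dec is feature j's direction in activation space.
    """

    def __init__(self, w_enc, b_enc, w_dec, b_dec):
        self.w_enc, self.b_enc = w_enc, b_enc
        self.w_dec, self.b_dec = w_dec, b_dec

    def encode(self, x):
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.w_enc, centered), self.b_enc)]
        return relu(pre)

    def decode(self, f):
        return [y + b for y, b in zip(matvec(self.w_dec, f), self.b_dec)]

    def decoder_direction(self, feature):
        return [row[feature] for row in self.w_dec]


def steer(x, sae, feature, alpha):
    """Additive steering: push activation x along one feature's decoder direction."""
    direction = sae.decoder_direction(feature)
    return [xi + alpha * di for xi, di in zip(x, direction)]


# Identity weights and zero biases keep the arithmetic easy to check by hand.
sae = TinySAE(
    w_enc=[[1.0, 0.0], [0.0, 1.0]], b_enc=[0.0, 0.0],
    w_dec=[[1.0, 0.0], [0.0, 1.0]], b_dec=[0.0, 0.0],
)
x = [0.5, -0.25]
print(sae.encode(x))                  # [0.5, 0.0]: feature 1 is inactive
steered = steer(x, sae, feature=1, alpha=2.0)
print(steered, sae.encode(steered))   # [0.5, 1.75] [0.5, 1.75]
```

In a real model, `x` would be a hidden state at the chosen layer, and larger `alpha` values typically enforce the target language more strongly at some cost to fluency, which is the kind of trade-off between language identification and generation quality the paper targets.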

First, we show that training SAEs on multilingual data consistently strengthens cross-lingual representations and yields more reliable, quality-preserving language control across layers and model families. Second, we introduce an a priori steering layer-selection rule based on the intersection of multilingual alignment and language separability, which predicts effective intervention depths without exhaustive layerwise search.

We evaluate our approach on LLaMA-3.1-8B and Gemma-2-9B across machine translation and cross-lingual summarization (CrossSumm), using SpBLEU, ROUGE-L, COMET, and LaSE. Our results show that multilingual SAEs combined with intersection-selected layers stabilize the trade-off between language identification accuracy and generation quality, providing a principled, predictive, representation-level account of multilingual SAE steering.
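The abstract names the layer-selection rule but not its formulas. The sketch below is one hypothetical reading, not the authors' implementation: alignment is the mean cosine similarity between hidden states of parallel sentences, separability is the accuracy of a nearest-centroid language classifier, and the selected depth is where the two min-max-normalized per-layer curves first cross. All function names and the per-layer scores are made up for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_score(parallel_pairs):
    """Mean cosine similarity between hidden states of translation pairs."""
    return sum(cosine(u, v) for u, v in parallel_pairs) / len(parallel_pairs)


def separability_score(reps_by_lang):
    """Accuracy of a nearest-centroid language classifier.

    Simplification: centroids are fit and scored on the same vectors.
    """
    centroids = {
        lang: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lang, vecs in reps_by_lang.items()
    }
    correct = total = 0
    for lang, vecs in reps_by_lang.items():
        for v in vecs:
            pred = max(centroids, key=lambda cand: cosine(v, centroids[cand]))
            correct += pred == lang
            total += 1
    return correct / total


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def intersection_layer(alignment, separability):
    """Index of the first layer where normalized alignment and separability cross.

    If the curves never cross, return the layer with the smallest gap.
    """
    a, s = min_max(alignment), min_max(separability)
    gap = [ai - si for ai, si in zip(a, s)]
    for layer in range(len(gap) - 1):
        if gap[layer] == 0 or gap[layer] * gap[layer + 1] < 0:
            # Of the two layers bracketing the crossing, keep the closer one.
            return min(layer, layer + 1, key=lambda i: abs(gap[i]))
    return min(range(len(gap)), key=lambda i: abs(gap[i]))


# Made-up per-layer scores for a 5-layer model, for illustration only.
alignment = [0.2, 0.5, 0.8, 0.9, 0.6]
separability = [1.0, 0.9, 0.7, 0.5, 0.8]
print(intersection_layer(alignment, separability))  # 1
```

In this reading, the rule needs only per-layer statistics from unsteered forward passes, so it can be computed once per model before any steering run instead of sweeping every layer.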
