Code-Switching Reveals Language Anchoring in Multilingual LLMs
Quick Answer
This paper shows that Multilingual Large Language Models (MLLMs) perform worse on Code-Switched (CS) inputs than on monolingual ones, and it introduces Anchor Bias, a geometric measure of language anchoring, to diagnose that degradation.
Quick Take
Multilingual Large Language Models (MLLMs) struggle with Code-Switched (CS) inputs: mixing languages degrades performance relative to monolingual inputs. Anchor Bias measures whether a CS hidden state sits closer to its source-language or target-language counterpart. The proposed CANVAS intervention recovers Question Answering (QA) F1 scores across various MLLMs by softly steering target-language hidden states toward a source anchor during prefill.
Key Points
- Anchor Bias quantifies language anchoring in MLLMs, revealing a grammar-frame effect.
- Source-framed CS stays source-anchored, while target-framed CS shifts target-ward and shows greater QA degradation.
- The CANVAS intervention consistently recovers QA F1 scores across diverse MLLMs and CS conditions.
- Internal anchoring signals can mitigate CS inference failures in multilingual models.
Article Content
From source RSS / original summary: arXiv:2606.19668v1. Abstract: Multilingual Large Language Models (MLLMs) are increasingly expected to handle Code-Switched (CS) inputs, yet mixing languages frequently degrades performance relative to source- or target-language monolingual counterparts. To understand this degradation, we use grammar-forced CS as a controlled diagnostic setting for locating CS representations relative to their source and target counterparts.
We introduce Anchor Bias, a geometric measure that quantifies language anchoring: whether a CS hidden state aligns closer to its source or target language counterpart. Across diverse MLLMs, Anchor Bias reveals a consistent grammar-frame effect: source-framed CS stays source-anchored, whereas target-framed CS shifts target-ward and shows larger Question Answering (QA) degradation.
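The summary does not give the formula for Anchor Bias, so the following is only an illustrative sketch of what a geometric anchoring score could look like. It assumes the score is the difference between two cosine similarities (CS state versus source state, minus CS state versus target state); the function name `anchor_bias_sketch`, that formula, and the toy vectors are all assumptions, not the paper's definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_bias_sketch(h_cs, h_src, h_tgt):
    """Hypothetical anchoring score (an assumption, not the paper's formula).

    Positive: the code-switched state is closer to the source counterpart.
    Negative: it is closer to the target counterpart.
    """
    return cosine(h_cs, h_src) - cosine(h_cs, h_tgt)


# Toy 2-D hidden states, invented purely for illustration.
h_src = [1.0, 0.0]
h_tgt = [0.0, 1.0]
source_framed_cs = [0.9, 0.1]  # stays near the source state
target_framed_cs = [0.2, 0.8]  # drifts toward the target state

print(anchor_bias_sketch(source_framed_cs, h_src, h_tgt) > 0)
print(anchor_bias_sketch(target_framed_cs, h_src, h_tgt) < 0)
```

Under this assumed definition, the sign of the score encodes the grammar-frame effect described above: source-framed inputs score positive and target-framed inputs score negative.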
Motivated by this representational pattern, we propose CANVAS (Contextual Anchor-based Neural Vector Alignment Steering), an inference-time intervention that extracts a source-side canvas from the input and softly steers target-language hidden states toward the source anchor during prefill. CANVAS consistently recovers QA F1 across MLLMs and CS conditions, showing that internal anchoring signals provide an actionable target for mitigating CS inference failures.
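The abstract describes CANVAS only at a high level, so the sketch below is a loose illustration of "soft steering toward a source anchor" and not the authors' method. It assumes the anchor is the mean of some source-side hidden states and that steering is a linear interpolation controlled by a strength `alpha`; the helper names `mean_vector` and `steer_toward_anchor`, the value of `alpha`, and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steer_toward_anchor(hidden, anchor, alpha=0.3):
    """Move a hidden state a fraction alpha of the way toward the anchor.

    alpha=0 leaves the state unchanged; alpha=1 replaces it with the anchor.
    Linear interpolation is an assumption made for this sketch.
    """
    return [h + alpha * (a - h) for h, a in zip(hidden, anchor)]


# Toy 2-D states, invented purely for illustration.
source_states = [[1.0, 0.0], [0.8, 0.2]]  # assumed source-side "canvas"
target_states = [[0.0, 1.0], [0.2, 0.8]]  # target-language hidden states

anchor = mean_vector(source_states)
steered = [steer_toward_anchor(h, anchor, alpha=0.5) for h in target_states]
print(steered)
```

In a real model this kind of adjustment would be applied to layer activations during the prefill pass; which layers and token positions CANVAS touches is not stated in the summary.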