Cross-Lingual Steering for Figurative Language Generation
Quick Take
This study uses activation steering to test whether the internal signals behind figurative language generation in multilingual large language models are language-specific or reusable across languages. Directions learned in one language increase figurative generation in others, with German among the most receptive targets, and directions assembled from other languages can match or even surpass a target language's own native direction.
Key Points
- Activation steering estimates a direction for a figurative category from figurative-literal activation differences in one language and applies it during generation.
- Five figurative categories were tested across six languages and four multilingual LLMs.
- Within-language steering was reliable, most robustly for metaphor and simile.
- A direction learned in one language increases the target behavior when applied to another.
- German was identified as one of the most receptive target languages.
Article Excerpt
From source RSS / original summary: arXiv:2605.30443v1. Abstract: Multilingual large language models can generate figurative language, but whether the internal signals driving this behavior are language-specific or reusable across languages is unclear. Using activation steering as a probe, we estimate a direction for a figurative category from figurative-literal activation differences in one language and apply it during generation.
Across five figurative categories, six languages, and four multilingual LLMs, these directions steer reliably within their own language, most robustly for metaphor and simile. More importantly, they transfer across languages: a direction learned in one increases the target behavior when applied to another, with German among the most receptive targets.
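The estimate-then-apply procedure described in the abstract can be illustrated with a minimal sketch. It assumes a difference-of-means estimator, unit normalization, and an additive update scaled by a strength `alpha`; the abstract does not confirm any of these choices, and the function names and toy activation vectors below are hypothetical.

```python
from math import sqrt

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_direction(figurative_acts, literal_acts):
    """Assumed estimator: mean(figurative) - mean(literal), unit-normalized."""
    fig_mean = mean_vector(figurative_acts)
    lit_mean = mean_vector(literal_acts)
    diff = [f - l for f, l in zip(fig_mean, lit_mean)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("figurative and literal means are identical")
    return [d / norm for d in diff]

def steer(hidden, direction, alpha):
    """Assumed intervention: add alpha * direction to one hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy 3-dimensional activations (hypothetical, not taken from the paper).
figurative_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
literal_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

direction = steering_direction(figurative_acts, literal_acts)
steered = steer([0.5, -1.0, 0.25], direction, alpha=4.0)
```

In a real model the vectors would be hidden states read from a chosen layer, and the update would be applied at each generation step; the layer and the value of `alpha` are tuning choices this sketch leaves open. Cross-lingual transfer corresponds to estimating `direction` from one language's sentence pairs and calling `steer` on hidden states produced for another language.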
Going further, directions assembled from other languages can match or even surpass a target language's own native direction, while removing this shared component weakens native steering. Together, these results provide direct evidence of a reusable but target-dependent cross-lingual signal for figurative generation.
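One way to picture "assembling" a direction from other languages and "removing the shared component" is sketched below. It assumes that assembly means averaging unit-normalized per-language directions and that removal means orthogonal projection; the abstract specifies neither, and the language codes, vectors, and function names are hypothetical.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def assemble(directions):
    """Assumed assembly: average the unit directions, then renormalize."""
    units = [normalize(d) for d in directions]
    n = len(units)
    return normalize([sum(col) / n for col in zip(*units)])

def remove_component(vector, shared):
    """Assumed ablation: subtract the projection of `vector` onto `shared`."""
    s = normalize(shared)
    scale = dot(vector, s)
    return [x - scale * y for x, y in zip(vector, s)]

# Hypothetical per-language directions for one figurative category.
other_language_dirs = {"en": [1.0, 0.0, 0.0], "fr": [0.0, 1.0, 0.0]}
native_de = [1.0, 0.0, 1.0]

shared = assemble(list(other_language_dirs.values()))
residual = remove_component(native_de, shared)
```

Under these assumptions, steering with `shared` corresponds to the "assembled from other languages" condition, while steering with `residual` corresponds to the native direction with the shared component removed.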