Cultural Value Alignment Via Latent Activation Steering in Large Language Models

arXiv cs.CL·Trung Duc Anh Dang, Sarah Masud

3d ago

·~1 min·5/27/2026·en·1

Quick Take

This study introduces a framework for cultural value alignment in large language models (LLMs) by utilizing scenario-based behavioral probing, revealing significant latent entanglement across cultural dimensions. The method enhances cultural evaluation without retraining, demonstrating varied adaptability among models and highlighting the complexities of aligning global values.

Key Points

Proposes a framework for cultural evaluation using scenario-based behavioral probing.
Extracts implicit token probabilities from 300 situational dilemmas to map cultural values.
Introduces activation steering to adjust internal alignments without retraining.
Finds significant variability in adaptability across multiple LLMs.
Reveals latent entanglement, where changes in one cultural dimension affect others.

Article Content

From source RSS / original summary

arXiv:2605. 26365v1 Announce Type: new Abstract: Large Language Models (LLMs) often exhibit homogenized cultural perspectives. While the World Values Survey (WVS) provides a gold standard for mapping human values, traditional direct prompting of LLMs on WVS often fails to access the model's latent cultural depth, leading to safety-aligned refusals or neutral responses.

Here, we propose a generalizable framework for cultural evaluation and intervention that transitions from abstract queries to scenario-based behavioral probing. By extracting implicit token probabilities across 300 situational dilemmas, we bypass surface-level alignment to map the latent coordinates of LLMs cultural value. We further introduce activation steering to shift these internal alignments during the forward pass without retraining.

Across multiple LLMs, we find substantial variation in adaptability and uncover a consistent phenomenon of latent entanglement, where interventions along one cultural dimension induce shifts along another. These results suggest that cultural values are encoded as coupled structures, limiting precise alignment. This work establishes a computationally efficient framework for cultural steering, highlighting the structural complexities when navigating global value with LLMs.

Reader Mode unavailable (could not extract clean content).

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

Cultural Value Alignment Via Latent Activation Steering in Large Language Models

Quick Take

Key Points

Article Content

Want this in your inbox every morning?

More from arXiv cs.CL

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective