Cultural Value Alignment Via Latent Activation Steering in Large Language Models
Quick Take
This study proposes a framework for evaluating and steering cultural values in large language models (LLMs). Scenario-based behavioral probing maps each model's latent cultural position, and activation steering shifts that position without retraining. Across models, adaptability varies substantially, and interventions along one cultural dimension also shift others, a latent entanglement that limits precise alignment.
Key Points
- Proposes a framework for cultural evaluation using scenario-based behavioral probing.
- Extracts implicit token probabilities from 300 situational dilemmas to map cultural values.
- Introduces activation steering to adjust internal alignments without retraining.
- Finds substantial variation in adaptability across multiple LLMs.
- Reveals latent entanglement, where changes in one cultural dimension affect others.
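The token-probability point above can be illustrated with a minimal sketch. It assumes each dilemma offers two answer options whose tokens ("A" and "B" here) stand for opposite poles of one cultural dimension, and that the model's logits for those tokens are already available. The function names, the two-option format, and the averaging step are illustrative assumptions, not details confirmed by the source.

```python
import math

def option_probabilities(logits):
    """Softmax restricted to the answer-option tokens (assumed setup)."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def dilemma_score(logits, pole_a="A", pole_b="B"):
    """Signed preference in [-1, 1]; +1 means all probability on pole_a."""
    probs = option_probabilities(logits)
    return probs[pole_a] - probs[pole_b]

def dimension_coordinate(all_logits):
    """Average the signed preference over the dilemmas of one dimension."""
    return sum(dilemma_score(item) for item in all_logits) / len(all_logits)
```

Reading probabilities rather than generated text means a score is available even when the sampled answer would be a refusal or a neutral reply, which is the motivation the abstract gives for probing below the surface.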
Article Content
From source RSS / original summary: arXiv:2605.26365v1. Abstract: Large Language Models (LLMs) often exhibit homogenized cultural perspectives. While the World Values Survey (WVS) provides a gold standard for mapping human values, directly prompting LLMs with WVS questions often fails to access the model's latent cultural depth and instead yields safety-aligned refusals or neutral responses.
Here, we propose a generalizable framework for cultural evaluation and intervention that moves from abstract queries to scenario-based behavioral probing. By extracting implicit token probabilities across 300 situational dilemmas, we bypass surface-level alignment to map the latent coordinates of LLMs' cultural values. We further introduce activation steering to shift these internal alignments during the forward pass without retraining.
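Activation steering of this kind is often implemented by adding a fixed direction to a layer's hidden state at inference time. The sketch below assumes the direction is a difference of mean activations between two contrasting sets of prompts, which is one common choice and not necessarily the paper's. The function names and the strength parameter `alpha` are hypothetical, and plain Python lists stand in for real model activations.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_direction(pos_acts, neg_acts):
    """Unit vector pointing from the mean of neg_acts to the mean of pos_acts."""
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]

def steer(hidden, direction, alpha):
    """Return hidden + alpha * direction; model weights are never modified."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

In a real model the `steer` step would run inside the forward pass, for example in a hook on a chosen layer, so the intervention costs one vector addition per token and needs no gradient updates.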
Across multiple LLMs, we find substantial variation in adaptability and uncover a consistent phenomenon of latent entanglement, where interventions along one cultural dimension induce shifts along another. These results suggest that cultural values are encoded as coupled structures, limiting precise alignment. This work establishes a computationally efficient framework for cultural steering and highlights the structural complexities that arise when navigating global values with LLMs.
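One way to make latent entanglement concrete is to steer along each dimension in turn and record how every measured coordinate moves. The sketch below assumes coordinates are stored as dictionaries keyed by dimension name; the names `dim_1` and `dim_2`, both functions, and the ratio metric are illustrative assumptions, not the paper's definitions.

```python
def shift_matrix(baseline, steered):
    """shifts[s][m] = change in measured dimension m after steering dimension s."""
    dims = list(baseline)
    return {s: {m: steered[s][m] - baseline[m] for m in dims} for s in dims}

def off_target_ratio(shifts, steered_dim):
    """Largest off-target shift divided by the on-target shift (0 = no coupling)."""
    row = shifts[steered_dim]
    on_target = abs(row[steered_dim])
    if on_target == 0.0:
        raise ValueError("steering produced no on-target shift")
    off_target = max(abs(v) for d, v in row.items() if d != steered_dim)
    return off_target / on_target

# Toy numbers for illustration only, not results from the paper.
baseline = {"dim_1": 0.0, "dim_2": 0.0}
steered = {
    "dim_1": {"dim_1": 0.5, "dim_2": 0.25},  # steering dim_1 also moves dim_2
    "dim_2": {"dim_1": 0.0, "dim_2": 1.0},   # steering dim_2 leaves dim_1 alone
}
shifts = shift_matrix(baseline, steered)
```

A shift matrix with large off-diagonal entries corresponds to the coupled encoding the abstract describes, while a purely diagonal matrix would mean each dimension can be adjusted independently.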