What Do People Actually Want From AI? Mapping Preference Plurality
Quick Answer
An analysis of 1,500 open-ended responses from 75 countries finds that preferences for AI vary widely: truthfulness is the only value requested by more than a quarter of respondents (49%), and even those respondents define it differently.
Quick Take
The authors argue that current alignment methods such as RLHF fail to capture this plurality. They aggregate conflicting preferences into a single model, and the persistence of high hallucination rates despite users' clear demands for accuracy suggests that actual preferences are not being identified.
Key Points
- 49% of respondents prioritize 'truthfulness', but definitions vary widely.
- Most values people want from AI are each mentioned by fewer than a quarter of respondents.
- Current alignment practices fail to account for conflicting preferences.
- High hallucination rates persist in models despite clear user demands for accuracy.
- Respondents draw contextual distinctions (what AI should do "by default" versus "if requested") that binary comparisons cannot capture.
Article Content
arXiv:2606.06674v1
Abstract: Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (RLHF) to align with people's preferences and values. However, this method has known limitations: it aggregates conflicting preferences, often relies on unrepresentative samples, and uses only binary comparisons.
Analysing 1,500 open-ended responses from the PRISM dataset across 75 countries, we examine what people actually want from AI systems and reveal concrete failures of current methods. We find that different people want different things: most values are requested by fewer than a quarter of respondents, with truthfulness the sole exception at 49%.
Furthermore, the same words hide divergent meanings: when people describe what they mean by "truthfulness", they reveal distinct, potentially incompatible, epistemological bases, as some ask for sourced claims, some for expert opinions, and some even ask for unpopular views. Certain capabilities, namely how human-like a model behaves, and some features, like AI guardrails, are outright controversial, with some desiring them and others rejecting them.
We additionally find that people often use contextual distinctions (what AI should do "by default" versus "if requested") that binary comparisons cannot capture. These findings expose fundamental problems in current alignment practices. When 49% request truthfulness but define it differently, this is unlikely to be captured by a single reward model.
The persistence of high hallucination rates in well-funded models, despite users' clear demands for accuracy, suggests that current methods fail to identify actual preferences. This paper sheds light on the situated, contested, imperfect signals that are currently being flattened into universal preference models, a practice others have characterised as epistemic violence.
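The aggregation problem the abstract describes can be made concrete with a toy calculation. The sketch below is not the paper's method: it assumes a Bradley-Terry preference model (a common choice for RLHF reward models), and the group names and vote counts are hypothetical, invented purely for illustration. It shows what happens when two groups that both ask for "truthfulness" disagree about which of two responses is the truthful one, and their comparisons are pooled into one reward fit.

```python
import math


def bradley_terry_gap(prefers_a: int, prefers_b: int) -> float:
    """Maximum-likelihood reward gap r_A - r_B for a single response pair.

    Under the Bradley-Terry model, P(A preferred) = sigmoid(r_A - r_B),
    so the fitted gap is the log-odds of the observed choices.
    """
    if prefers_a <= 0 or prefers_b <= 0:
        raise ValueError("both counts must be positive for a finite gap")
    return math.log(prefers_a / prefers_b)


# Hypothetical counts, invented for illustration (not taken from PRISM).
# Response A cites sources; response B defers to expert opinion.
votes = {
    "wants_sourced_claims": (90, 10),  # (prefers A, prefers B)
    "wants_expert_opinion": (10, 90),
}

# Reward gap each group would get from a model fit to its own comparisons.
per_group_gap = {
    group: bradley_terry_gap(a, b) for group, (a, b) in votes.items()
}

# Reward gap from one model fit to everyone's comparisons pooled together.
pooled_a = sum(a for a, _ in votes.values())
pooled_b = sum(b for _, b in votes.values())
pooled_gap = bradley_terry_gap(pooled_a, pooled_b)

for group, gap in per_group_gap.items():
    print(f"{group}: reward gap {gap:+.2f}")
print(f"pooled: reward gap {pooled_gap:+.2f}")
```

With these made-up counts, each group on its own yields a strong preference (a gap of roughly 2.2 in opposite directions), while the pooled fit yields a gap of zero. The single model ends up indifferent between the two responses, which matches neither group's stated preference.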