Greener Than Humans? Environmental Attitudes in Large Language Models
Quick Take
This study benchmarks environmental attitudes in 31 large language models (LLMs), revealing they often reflect more progressive environmental views than average human respondents. The models show contextual sensitivity and ideological shifts based on user prompts, raising concerns about their reliability in sustainability decision-making.
Key Points
- 31 LLMs were evaluated for environmental cognition and behavioral recommendations.
- Many models showed higher environmental affect and cognition than the average survey respondent in Germany.
- No systematic relationship was found between sustainability-oriented responses and model origin, size, or release context.
- Models exhibited ideological shifts based on user prompts, raising reliability concerns.
- The study provides a framework for assessing sustainability alignment in AI systems.
Article Content
From source RSS / original summary
arXiv:2606.02741v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly used in sustainability-related decision support, reporting, and public communication, yet little systematic evidence exists on the environmental attitudes embedded in their outputs. This paper develops a benchmark for evaluating environmental cognition, affect, and behavioural recommendations in LLMs and applies it to 31 widely used proprietary and open-weight models.
Drawing on questions from established environmental awareness surveys and additional sustainability-related behavioural measures, we compare LLM responses 1) among models and 2) between models and human survey benchmarks from Germany. We assess their robustness across prompting conditions.
We find that many LLMs align more closely with environmentally progressive attitudes than the average survey respondent, exhibiting higher levels of environmental affect and cognition and recommending behaviours associated with substantial potential CO2 reductions. At the same time, we observe no systematic relationship between sustainability-oriented responses and model origin, size, or release context.
However, models exhibit contextual sensitivity, controlled by persona-based prompting, and show sycophantic shifts mirroring user-specified ideological positions, which raises concerns about steerability and normative reliability in real-world deployments.
Our findings provide a reusable evaluation framework for assessing sustainability-related value alignment in LLMs and highlight the importance of governance, transparency, and critical oversight as AI systems become increasingly embedded in sustainability transformations and public decision-making.
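To make the comparisons described in the abstract concrete, here is a minimal Python sketch of how a model's survey answers could be scored against a human benchmark and checked for persona-driven shifts. It is an illustration only, not the paper's method: the 1-5 Likert scale, the `HUMAN_BENCHMARK_MEAN` value, and the function names `attitude_score`, `gap_to_humans`, and `persona_shift` are all assumptions, and the numbers are placeholders rather than results from the study.

```python
from statistics import mean

# Hypothetical placeholder for a human survey average on a 1-5 Likert scale
# (5 = most pro-environmental). This is NOT a figure from the paper.
HUMAN_BENCHMARK_MEAN = 3.4


def attitude_score(answers):
    """Return the mean Likert score over a list of per-item answers."""
    return mean(answers)


def gap_to_humans(answers, human_mean=HUMAN_BENCHMARK_MEAN):
    """Positive values mean the model scores above the human benchmark."""
    return attitude_score(answers) - human_mean


def persona_shift(neutral_answers, persona_answers):
    """Change in mean score when the same items are asked under a persona prompt.

    A large shift in the direction of the persona's stated ideology would be
    one possible signal of the sycophancy the abstract describes.
    """
    return attitude_score(persona_answers) - attitude_score(neutral_answers)


if __name__ == "__main__":
    # Made-up answers for a single hypothetical model.
    neutral = [4, 5, 4, 5]
    skeptic_persona = [2, 3, 2, 3]
    print("gap to humans:", round(gap_to_humans(neutral), 2))
    print("persona shift:", persona_shift(neutral, skeptic_persona))
```

A real evaluation would also need repeated sampling per item and a check that each answer parses to a valid scale value; the sketch leaves both out for brevity.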