Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

arXiv cs.CL · Weijia Zhang, Ruiqi Chen, Yunze Xiao, Weihao Xuan

6/11/2026

Quick Answer

The study introduces Moral Trolley Arena, a two-stage blind ELO benchmark that measures how frontier LLMs combine several moral signals within a single option.

Quick Take

On the Moral Trolley Arena benchmark, composite moral judgments across ten frontier models are largely predicted by the strength of the individual acts, but the relationship is compressed rather than simply additive. The authors conclude that moral audits should measure how models combine moral evidence, not only how they rank isolated acts.

Key Points

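Moral Trolley Arena calibrates individual moral acts from a 229-scenario corpus spanning five Moral Foundations Theory foundations, then combines them into two-act items over a controlled intensity grid.
Across ten frontier models, composite judgments are largely predicted by component act strength, but the relation is compressed rather than simply additive.
Models show non-additive intensity anchoring and bounded foundation-specific residuals after component control.
Composite preference surfaces converge strongly across providers, so the authors argue that moral audits should measure composition rules, not only rankings over isolated acts.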

Article Excerpt

From source RSS / original summary

arXiv:2606.11232v1. Abstract: Existing LLM moral benchmarks usually ask which isolated moral act, value, or foundation a model prefers. This is useful but incomplete. Realistic judgments often require a model to combine several moral signals within the same option. We introduce **Moral Trolley Arena**, a two-stage blind ELO benchmark for measuring how LLMs compose moral evidence.
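As a rough illustration of how pairwise preferences can be turned into ELO-style ratings, here is a minimal sketch of the standard Elo update. The abstract does not describe the paper's rating procedure, so the K-factor of 32, the starting rating of 1000, and the function names are all assumptions made for this example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one pairwise comparison.

    k=32 is an assumed K-factor, not a value taken from the paper.
    """
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_wins else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


def rate(comparisons, start: float = 1000.0, k: float = 32.0) -> dict:
    """Rate items from an ordered list of (winner, loser) pairs.

    Item names and the starting rating are hypothetical.
    """
    ratings = {}
    for winner, loser in comparisons:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        ratings[winner], ratings[loser] = elo_update(r_w, r_l, True, k)
    return ratings
```

For example, `rate([("act_a", "act_b"), ("act_a", "act_c")])` would rank the hypothetical `act_a` above the other two acts. Sequential Elo depends on comparison order, so a real benchmark may average over shuffles or use a different fitting method.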

The single-scene arena first calibrates individual moral acts from a 229-scenario corpus across five Moral Foundations Theory foundations; the composite arena then combines calibrated acts into two-act moral items over a controlled intensity grid and measures the resulting composite preferences. Across ten frontier models, composite judgments are largely predicted by component act strength, but the relation is consistently compressed rather than simply additive.
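One way to picture "compressed rather than simply additive" is to regress each composite item's strength on the sum of its two component strengths: a slope near 1 would indicate additive composition, and a slope below 1 would indicate compression. The sketch below is an assumption about how such a check could be set up, not the paper's analysis; the function names are hypothetical and the data are invented toy values, with strengths taken as already centered on a common baseline.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def composition_slope(items):
    """Fit composite strength against the sum of component strengths.

    items: list of (strength_a, strength_b, composite_strength) tuples.
    """
    xs = [a + b for a, b, _ in items]
    ys = [c for _, _, c in items]
    return fit_line(xs, ys)


# Invented toy data: each composite equals 0.6 times the component sum.
toy_items = [
    (1.0, 1.0, 1.2),
    (1.0, 2.0, 1.8),
    (2.0, 2.0, 2.4),
    (2.0, 3.0, 3.0),
    (3.0, 3.0, 3.6),
]

slope, intercept = composition_slope(toy_items)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On the toy data the fitted slope is below 1, which under this operationalization would count as compression. A single linear slope cannot capture effects such as intensity anchoring, which would need a richer model.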

Models also show non-additive intensity anchoring, bounded foundation-specific residuals after component control, and highly convergent composite preference surfaces across providers. These results suggest that moral audits should measure composition rules for moral evidence, not only rankings over isolated acts.
