Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

arXiv cs.AI·Anissa Alloula, Federico Licini, Ava Batchkala, Seraphina Goldfarb-Tarrant

6/9/2026

Quick Answer

The study reveals that LLM-judges struggle to adapt their safety evaluations based on new context or definitions, often sticking to their established priors.

Quick Take

Although LLM-judges are promising for large-scale safety assessment, their rigidity in interpreting safety criteria limits their effectiveness. This raises concerns about the reliability of LLMs in nuanced safety evaluations.

Key Points

The LLM-judges evaluated show limited adaptability to new safety definitions.
Task demonstrations significantly influence LLM-judges' evaluations.
Generalist LLMs struggle with context-specific safety assessments.
Safety evaluation effectiveness is hindered by rigid internal safety priors.
The study highlights the need for better evaluation frameworks for LLM-judges.


Article Excerpt

From source RSS / original summary

arXiv:2606.07874v1 Announce Type: new Abstract: LLMs-as-judges are the only way to evaluate safety at scale. Despite their importance, LLM-judges themselves are rarely evaluated beyond human agreement in simple, static benchmarks. We therefore investigate two under-explored but crucial properties of LLMs-as-judges: their susceptibility to relying on in-context information, and their steerability to differing safety definitions, which may not align with their internal safety priors.

We evaluate the safety judging abilities of many generalist LLMs and safety-specific judges, and investigate the impact of task demonstrations, novel in-context information, and changing safety definitions. We find that while LLM-judges can learn from new information, they are broadly unlikely to adjust their evaluations if the context or safety definition contradicts their prior.

