Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators
Quick Answer
The study reveals that LLM-judges struggle to adapt their safety evaluations based on new context or definitions, often sticking to their established priors.
Quick Take
Although LLM-judges could enable large-scale safety assessment, their effectiveness is limited by how rigidly they interpret safety criteria. This raises concerns about the reliability of LLMs in nuanced safety evaluations.
Key Points
- The LLM-judges evaluated show limited adaptability to new safety definitions.
- Task demonstrations significantly influence LLM-judges' evaluations.
- Generalist LLMs struggle with context-specific safety assessments.
- Safety evaluation effectiveness is hindered by rigid internal safety priors.
- The study highlights the need for better evaluation frameworks for LLM-judges.
Article Excerpt
From source RSS / original summary: arXiv:2606.07874v1. Abstract: LLMs-as-judges are the only way to evaluate safety at scale. Despite their importance, LLM-judges themselves are rarely evaluated beyond human agreement in simple, static benchmarks. We therefore investigate two under-explored but crucial properties of LLMs-as-judges: their susceptibility to relying on in-context information, and their steerability to differing safety definitions, which may not align with their internal safety priors.
We evaluate the safety judging abilities of many generalist LLMs and safety-specific judges, and investigate the impact of task demonstrations, novel in-context information, and changing safety definitions. We find that while LLM-judges can learn from new information, they are broadly unlikely to adjust their evaluations if the context or safety definition contradicts their prior.
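One way to make "steerability" concrete is to count how often a judge's verdict changes when the safety definition it is given changes. The sketch below is only an illustration under stated assumptions and is not the paper's evaluation protocol: the `Judge` signature, the `flip_rate` metric, the two toy judges, and the example definitions are all hypothetical, and the toy judges are keyword rules standing in for real LLM calls.

```python
from typing import Callable, Sequence

# Hypothetical judge signature (an assumption, not from the paper):
# (safety_definition, response_text) -> "safe" or "unsafe"
Judge = Callable[[str, str], str]


def flip_rate(judge: Judge, responses: Sequence[str],
              default_definition: str, new_definition: str) -> float:
    """Fraction of responses whose verdict changes when the definition changes."""
    if not responses:
        raise ValueError("responses must not be empty")
    flips = sum(
        judge(default_definition, r) != judge(new_definition, r)
        for r in responses
    )
    return flips / len(responses)


def rigid_judge(definition: str, response: str) -> str:
    """Toy judge that ignores the definition and follows a fixed prior."""
    return "unsafe" if "weapon" in response.lower() else "safe"


def steerable_judge(definition: str, response: str) -> str:
    """Toy judge that defers to the definition when it grants permission."""
    if "permitted" in definition.lower():
        return "safe"
    return rigid_judge(definition, response)


# Illustrative inputs only; not data from the study.
responses = ["How to clean a weapon safely", "A recipe for bread"]
default_def = "Any mention of weapons is unsafe."
new_def = "Weapon maintenance advice is permitted."

rigid_rate = flip_rate(rigid_judge, responses, default_def, new_def)
steerable_rate = flip_rate(steerable_judge, responses, default_def, new_def)
print(rigid_rate, steerable_rate)
```

A flip rate near zero on items where the new definition should change the verdict would correspond to the rigidity the abstract describes. A fuller evaluation would also compare verdicts against labels written under the new definition, since a flip is not necessarily a correct flip.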