GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

arXiv cs.CL·Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Roy George

·~1 min·5/29/2026·en·1

Quick Take

GPF-LIVENEWS is a streaming evaluation protocol for auditing group-conditioned framing in LLM outputs. A pilot over 12 monitoring runs and 23 hosted models, using 42 identity labels and seven prompt families, finds that Policy/Action prompts produce the strongest semantic movement, while sentiment variation is flatter across dimensions and prompt families.

Key Points

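Introduces GPF-LIVENEWS, a streaming protocol and benchmark snapshot for auditing group-conditioned framing in open-ended LLM outputs.
Pilots the protocol over 12 monitoring runs and 23 hosted models, using 42 identity labels and seven prompt families.
Policy/Action prompts produce the strongest semantic movement.
Sentiment variation is flatter across dimensions and prompt families.
The released artifact includes article metadata, prompt templates, instantiated prompts, model-output metadata, score tables, documentation, and reproduction scripts.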

Article Excerpt

From source RSS / original summary

arXiv:2605.28848v1. Announce type: new. Abstract: Deployed language models are evaluated in a non-stationary environment: model versions, retrieval layers, safety systems, and real-world inputs all change over time. Static bias benchmarks remain useful, but they do not show how models frame newly emerging events for different prompted audiences. We introduce GPF-LIVENEWS, a streaming evaluation protocol and benchmark snapshot for auditing group-conditioned framing in open-ended LLM outputs.

The protocol expands fresh BBC/Reuters news anchors across 42 identity labels and seven prompt families, then evaluates response bundles using semantic-sensitivity and sentiment-disparity signals. In a pilot over 12 monitoring runs and 23 hosted models, Policy/Action prompts produce the strongest semantic movement, while sentiment variation is flatter across dimensions and prompt families.
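The expansion and scoring steps described above can be illustrated with a rough sketch. Everything named here is an assumption for illustration: the placeholder labels and families, the prompt wording, the token-overlap distance, and the max-minus-min disparity are stand-ins, not the paper's actual templates or metrics, which ship with the released artifact. The sketch also assumes labels and families are fully crossed, which would give 42 × 7 = 294 prompts per news anchor.

```python
from itertools import combinations, product
from statistics import mean

# Hypothetical placeholders; the real labels and families come from the paper's artifact.
IDENTITY_LABELS = [f"label_{i}" for i in range(42)]
PROMPT_FAMILIES = [f"family_{i}" for i in range(7)]


def expand_anchor(headline, labels=IDENTITY_LABELS, families=PROMPT_FAMILIES):
    """Cross one news anchor with every (identity label, prompt family) pair."""
    return [
        {
            "label": label,
            "family": family,
            # Illustrative wording only, not the paper's template.
            "prompt": f"[{family}] Audience: {label}. News: {headline}",
        }
        for label, family in product(labels, families)
    ]


def jaccard_distance(a, b):
    """Token-set distance in [0, 1]; a crude stand-in for an embedding distance."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def semantic_sensitivity(responses):
    """Mean pairwise distance among responses that differ only in identity label."""
    pairs = list(combinations(responses.values(), 2))
    if not pairs:
        return 0.0
    return mean(jaccard_distance(a, b) for a, b in pairs)


def sentiment_disparity(scores):
    """Spread (max minus min) of per-label sentiment scores for one bundle."""
    if not scores:
        return 0.0
    return max(scores.values()) - min(scores.values())
```

In this toy form, a bundle is a mapping from identity label to the model's response (or to a sentiment score) for one anchor and one prompt family. Higher values flag bundles for human review, in keeping with the authors' framing of scores as audit signals.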

The released artifact includes article metadata, prompt templates, instantiated prompts, model-output metadata, score tables, documentation, and reproduction scripts. We interpret all scores as observed-window audit signals for human review, not as permanent fairness rankings or direct proof of harmful bias.


