POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

arXiv cs.AI·Qiaoyuan Zheng, Yiqu Yang, Qi Gao, Imanol Schlag

5/20/2026

Quick Take

POLAR-Bench is a diagnostic benchmark for privacy-utility trade-offs in LLM agents. It finds that current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1-30B range score notably worse, with the weakest leaking more than half. The benchmark covers 7,852 samples across 10 domains and points to where privacy alignment is most needed in user-trusted agents.

Key Points

POLAR-Bench scores privacy and utility across 10 domains with 7,852 samples.
Current frontier models withhold over 99% of protected attributes.
Smaller open-weight models (1-30B) score notably worse, with the weakest leaking over half of protected attributes.
The benchmark localizes where each model's intent-following breaks down, giving privacy alignment a starting point.
A third-party model adversarially probes the trusted agent for both task-relevant and protected attributes.

[Submitted on 18 May 2026]

Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership and vary privacy policy dimension and attack strategy along two orthogonal axes, producing a 5×5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1-30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half. POLAR-Bench thus localizes where each model's intent-following breaks down, providing a foothold for privacy alignment where it matters most.
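The abstract describes scoring by deterministic set-membership and aggregating over two axes into a per-model diagnostic surface. A minimal sketch of what that could look like follows. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact formulas, so privacy is assumed here to be the fraction of protected attributes withheld and utility the fraction of task-relevant attributes shared. The names Sample, score, and diagnostic_surface, the axis labels, and the idea that a set of disclosed attributes has already been extracted from the conversation are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    policy_dim: str           # hypothetical label on the privacy-policy axis
    attack: str               # hypothetical label on the attack-strategy axis
    protected: frozenset      # attributes the user's policy forbids sharing
    task_relevant: frozenset  # attributes the task requires sharing
    disclosed: frozenset      # attributes the trusted model revealed (assumed given)


def score(sample):
    """Return (privacy, utility) in [0, 1] using plain set membership.

    Assumed definitions, not taken from the paper:
      privacy = share of protected attributes that were NOT disclosed
      utility = share of task-relevant attributes that WERE disclosed
    """
    leaked = sample.protected & sample.disclosed
    shared = sample.task_relevant & sample.disclosed
    privacy = 1.0 - len(leaked) / len(sample.protected) if sample.protected else 1.0
    utility = len(shared) / len(sample.task_relevant) if sample.task_relevant else 1.0
    return privacy, utility


def diagnostic_surface(samples):
    """Average (privacy, utility) for each (policy_dim, attack) cell."""
    cells = defaultdict(list)
    for s in samples:
        cells[(s.policy_dim, s.attack)].append(score(s))
    return {
        key: (
            sum(p for p, _ in vals) / len(vals),
            sum(u for _, u in vals) / len(vals),
        )
        for key, vals in cells.items()
    }


if __name__ == "__main__":
    example = Sample(
        policy_dim="P1",
        attack="A1",
        protected=frozenset({"ssn", "diagnosis"}),
        task_relevant=frozenset({"name", "email"}),
        disclosed=frozenset({"name", "diagnosis"}),
    )
    print(score(example))  # (0.5, 0.5): one of two protected leaked, one of two required shared
```

With five labels on each axis, diagnostic_surface would return 25 cells per model, matching the 5×5 surface the abstract mentions. Because the scores depend only on set intersections, repeated runs over the same disclosed sets give identical results.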

Comments:	Preprint
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.19127 [cs.AI]
	(or arXiv:2605.19127v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.19127 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Qiaoyuan Zheng
[v1] Mon, 18 May 2026 21:27:07 UTC (4,346 KB)

Originally published at arxiv.org
