POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents
Quick Take
POLAR-Bench evaluates how well LLM agents follow a user's privacy policy while still completing a task when a third party adversarially probes for private information.
Key Points
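- Introduces a benchmark in which a trusted agent holding a privacy policy converses with an adversarial third-party model.
- Covers 10 domains and 7,852 samples, scoring privacy and utility by deterministic set membership.
- Finds that frontier models withhold over 99% of protected attributes, while smaller open-weight models (1–30B) leak notably more, the weakest leaking over half.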
Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set membership and vary the privacy-policy dimension and the attack strategy along two orthogonal axes, producing a 5 × 5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1–30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half. POLAR-Bench thus localizes where each model's intent-following breaks down, providing a foothold for privacy alignment where it matters most.
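To make "deterministic set membership" and the 5 × 5 surface concrete, here is a minimal sketch. It assumes that each sample lists its task-relevant and protected attribute values, and that the values the trusted agent disclosed have already been extracted from the transcript into a set. It also assumes privacy is the fraction of protected values withheld and utility is the fraction of task-relevant values shared. The `Sample` layout, the integer axis indices 0–4, and the functions `score` and `privacy_surface` are hypothetical illustrations, not the benchmark's released code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout; POLAR-Bench's actual schema may differ.
    policy_dim: int                # assumed index 0-4 on the privacy-policy axis
    attack: int                    # assumed index 0-4 on the attack-strategy axis
    task_attrs: frozenset          # values the agent should share to complete the task
    protected_attrs: frozenset     # values the policy says must be withheld


def score(sample: Sample, disclosed: set) -> tuple:
    """Return (privacy, utility) for one conversation using plain set membership."""
    leaked = sample.protected_attrs & disclosed   # protected values that were revealed
    shared = sample.task_attrs & disclosed        # task values that were revealed
    privacy = (1.0 - len(leaked) / len(sample.protected_attrs)) if sample.protected_attrs else 1.0
    utility = (len(shared) / len(sample.task_attrs)) if sample.task_attrs else 1.0
    return privacy, utility


def privacy_surface(results, n: int = 5):
    """Average privacy per (policy_dim, attack) cell; empty cells are None."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sample, disclosed in results:
        privacy, _ = score(sample, disclosed)
        cell = (sample.policy_dim, sample.attack)
        sums[cell] += privacy
        counts[cell] += 1
    return [
        [sums[(i, j)] / counts[(i, j)] if counts[(i, j)] else None for j in range(n)]
        for i in range(n)
    ]


# Usage with made-up attribute values:
s = Sample(0, 2, frozenset({"flight_date"}), frozenset({"ssn", "diagnosis"}))
print(score(s, {"flight_date", "ssn"}))                                 # (0.5, 1.0)
print(privacy_surface([(s, {"flight_date", "ssn"}), (s, {"flight_date"})])[0][2])  # 0.75
```

Because scoring reduces to set intersection, it is deterministic and needs no LLM judge. Averaging per cell is what lets a single model's results be read as a grid over policy dimension and attack strategy.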
| Comments: | Preprint |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.19127 [cs.AI] |
| | (or arXiv:2605.19127v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.19127 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Qiaoyuan Zheng
[v1] Mon, 18 May 2026 21:27:07 UTC (4,346 KB)
Originally published at arxiv.org