How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation
Quick Take
Differential privacy reduces bias in some LLM tasks, but the improvement does not carry over to all evaluation paradigms.
Key Points
- DP-SGD trained LLMs show varied bias reduction.
- Bias improvement is task-specific and not universal.
- Multi-paradigm evaluation is crucial for fairness assessment.
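The DP-SGD training mentioned above bounds each example's influence by clipping its gradient and adding calibrated Gaussian noise. A minimal sketch of one such update step (plain NumPy, not the authors' implementation; names and defaults are illustrative):

```python
import numpy as np

def dp_sgd_update(params, per_example_grads, lr=0.1, clip_norm=1.0,
                  noise_multiplier=1.0, rng=None):
    """One DP-SGD step: clip each example's gradient to at most
    clip_norm, sum the clipped gradients, add Gaussian noise scaled
    to noise_multiplier * clip_norm, then average and descend."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clip bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    noisy_mean = (summed + noise) / len(per_example_grads)
    return params - lr * noisy_mean
```

Because each clipped gradient has norm at most `clip_norm`, no single training example can move the parameters by more than a bounded amount per step, which is what makes the memorization guarantees possible.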
Abstract: Large language models (LLMs) trained on web-scale corpora can memorize sensitive training data, posing significant privacy risks. Differential privacy (DP) has emerged as a principled framework that limits the influence of individual data points during training, yet the relationship between differential privacy and social bias in LLMs remains poorly understood. To investigate this, we present a systematic evaluation of social bias in a pretrained LLM trained with DP-SGD, comparing a DP model against non-DP baselines across four complementary paradigms: sentence scoring, text completion, tabular classification, and question answering. We find that DP reduces bias in sentence scoring tasks, where bias is measured through controlled likelihood comparisons, yet this improvement does not generalize across all tasks. Our results reveal a discrepancy between logit-level bias and output-level bias. Moreover, decreasing memorization does not necessarily reduce unfairness, underscoring the importance of multi-paradigm evaluation when assessing fairness in LLMs.
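The "controlled likelihood comparisons" used for sentence scoring typically pair a stereotypical sentence with a minimally edited anti-stereotypical counterpart and check which one the model assigns higher likelihood. A minimal sketch of such a metric, assuming `log_likelihood` is a stand-in for a model's sentence score (the function and data below are hypothetical, not the paper's benchmark):

```python
def stereotype_preference_rate(pairs, log_likelihood):
    """Fraction of (stereotype, anti-stereotype) sentence pairs for
    which the model assigns a higher log-likelihood to the
    stereotypical sentence. A value near 0.5 indicates no systematic
    preference; values near 1.0 indicate strong stereotypical bias."""
    prefer = sum(
        1 for stereo, anti in pairs
        if log_likelihood(stereo) > log_likelihood(anti)
    )
    return prefer / len(pairs)
```

In practice `log_likelihood` would sum a language model's token log-probabilities over the sentence; comparing this rate between the DP and non-DP models is one way to quantify the logit-level bias difference the abstract describes.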
| Comments: | 14 pages, 1 figure |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.11195 [cs.CL] |
| (or arXiv:2605.11195v1 [cs.CL] for this version) | |
| https://doi.org/10.48550/arXiv.2605.11195 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Eduardo Tenorio
[v1]
Mon, 11 May 2026 20:03:05 UTC (129 KB)
— Originally published at arxiv.org