Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

arXiv cs.CL·Naihao Deng, Yilun Zhu, Joan Nwatu, Clayton Scott, Rada Mihalcea

7/1/2026

Quick Answer

This paper identifies deductive stereotyping, a failure mode in which large language models (LLMs) apply population-level statistical regularities to individual cases and so produce logically coherent yet socially biased inferences.

Quick Take

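Although reasoning generally improves fairness in recent large language models (LLMs), the authors find that failures persist. They name one such failure deductive stereotyping: the model treats a population-level statistic as if it settled the question for a specific individual. To counteract this, they propose a reasoning-time injection framework together with Fair-GCG, a method that systematically discovers effective injection phrases. The discovered phrases improve performance on fairness benchmarks and transfer to real-world fairness-sensitive tasks.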
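The summary does not reproduce the paper's statistical interpretation, so the following is only a toy illustration of the general idea, not the authors' formulation. It assumes a hypothetical group in which 60% of members have some trait, and contrasts two ways of reasoning about one member: assigning the group's majority outcome to everyone, versus treating the group rate as a prior and updating it with evidence about the individual. The function names and all numbers are invented for the example.

```python
def group_rule_error(base_rate: float) -> float:
    """Fraction of group members misjudged when every member is assigned
    the group's majority outcome (the 'deductive' shortcut)."""
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be a probability in [0, 1]")
    # If most members have the trait, the rule says "yes" for everyone
    # and is wrong for the (1 - base_rate) who do not, and vice versa.
    return 1.0 - base_rate if base_rate >= 0.5 else base_rate


def posterior(prior: float, lik_if_yes: float, lik_if_no: float) -> float:
    """Bayes' rule: update a group-level prior with individual evidence."""
    numerator = prior * lik_if_yes
    denominator = numerator + (1.0 - prior) * lik_if_no
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


if __name__ == "__main__":
    prior = 0.6  # hypothetical population-level rate
    print("error of the group-level rule:", group_rule_error(prior))
    # Hypothetical individual evidence that is much more likely
    # if this person does NOT have the trait.
    print("posterior for this individual:", posterior(prior, 0.1, 0.9))
```

Under these assumed numbers, the group-level rule is wrong for 40% of members even though the statistic itself is accurate, and individual evidence can move the estimate well below the group rate. The inference from group to individual is logically tidy, yet it ignores everything known about the person.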

Key Points

Deductive stereotyping leads to biased inferences in LLMs despite improved reasoning.
Fair-GCG systematically discovers injection phrases to enhance fairness in reasoning.
The proposed framework improves performance across multiple fairness benchmarks.
Injection phrases discovered by Fair-GCG generalize from smaller to larger LLMs and reduce bias in open-ended generation.
The approach transfers effectively to real-world fairness-sensitive applications.

Article Excerpt

From source RSS / original summary

arXiv:2606.30989v1. Abstract: Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive stereotyping, in which models apply population-level statistical regularities to individual cases, producing logically coherent yet socially biased inferences. We provide a statistical interpretation of this phenomenon.

To steer models toward fairness-aware reasoning, we propose a reasoning-time injection framework. We further introduce Fair-GCG to systematically discover effective injection phrases. Injection phrases discovered by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, improve reasoning-level fairness, reduce bias in open-ended generation, and transfer to real-world fairness-sensitive tasks.
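The abstract does not say how the injection is implemented, so the sketch below shows only one plausible reading of "reasoning-time injection": seeding the start of a model's reasoning segment with a fixed phrase before generation continues. Everything here is an assumption made for illustration: the `<think>` delimiter, the placement at the start of the reasoning, the helper name `inject_into_reasoning`, and the example phrase, which is borrowed from the paper's title and is not claimed to be a phrase Fair-GCG actually discovered. The search procedure of Fair-GCG is not shown.

```python
def inject_into_reasoning(prompt: str, phrase: str,
                          open_tag: str = "<think>") -> str:
    """Build the text a model would continue from, so that its reasoning
    segment begins with `phrase`.

    Hypothetical helper: the tag and layout are assumptions, not the
    paper's format.
    """
    if not phrase.strip():
        raise ValueError("phrase must be non-empty")
    # The model is expected to keep writing after the injected phrase,
    # so the reasoning segment is deliberately left open.
    return f"{prompt.rstrip()}\n{open_tag}\n{phrase.strip()}\n"


if __name__ == "__main__":
    question = "Who is more likely to be the engineer, person X or person Y?"
    seeded = inject_into_reasoning(question, "Wait, am I being fair?")
    print(seeded)
    # `seeded` would then be passed to whatever generation call is in use.
```

Keeping the injection as a plain string transformation means a phrase can be tried against different models without changing the generation code, which fits the abstract's claim that phrases found on smaller LLMs carry over to larger ones.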
