Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection
Quick Take
This study introduces a bias-aware framework for evaluating how large language models (LLMs) detect anti-autistic ableist language. It finds that LLMs often mislabel community-reclaimed language as ableist and express more negative attitudes toward autistic people when assessment instruments are masked. The framework anchors its ground truth in annotator positionality and shows that conventional majority-vote aggregation underweights autistic and autism-accepting perspectives.
Key Points
- Introduces a psychometrically-weighted framework for anti-autistic ableism detection.
- Finds LLMs mislabel community-reclaimed language as ableist.
- Conventional majority-vote methods underweight autistic perspectives.
- Models rely on surface-level keyword matching rather than contextual factors such as speaker identity.
- Masking the assessment instruments leads to more negative attitudes toward autistic people in model outputs.
Article Excerpt
From source RSS / original summary. arXiv:2605.26397v1, Announce Type: new.
Abstract: Large language models (LLMs) are increasingly used in decision-making tasks where they can amplify or suppress perspectives, raising concerns in high-stakes settings affecting autistic communities. While previous research has identified disability-related biases in LLMs, it remains unclear how they conceptualize ableism or detect it in text.
We introduce a bias-aware evaluation framework targeting anti-autistic ableist language with a psychometrically-weighted, community-proximate ground truth anchored in annotator positionality. This framework constitutes a stricter standard than conventional majority-vote aggregation which significantly and consistently underweights autistic and autism-accepting perspectives.
We find that LLMs frequently produce harmful outputs, mislabel community-reclaimed language as ableist, and express more negative attitudes toward autistic people when assessment instruments are masked. Our error analysis reveals that models rely on surface-level keyword matching rather than contextual factors such as speaker identity, and whether the language fosters in-group solidarity or inflicts out-group harm.
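The contrast the abstract draws between majority-vote aggregation and a positionality-weighted ground truth can be illustrated with a minimal sketch. The abstract does not specify how the weights are computed, so the labels, the weights, and the function names below are all hypothetical and chosen only to show how the two aggregation rules can disagree on the same annotations.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total annotator weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical annotations for one sentence: 1 = ableist, 0 = not ableist.
labels = [1, 1, 1, 0, 0]

# Hypothetical per-annotator weights. ASSUMPTION: a higher weight stands in
# for greater community proximity; the paper's actual psychometric weighting
# is not described in the abstract.
weights = [0.5, 0.5, 0.5, 1.5, 1.5]

majority_label = majority_vote(labels)           # three votes for 1 beat two for 0
weighted_label = weighted_vote(labels, weights)  # total weight 3.0 for 0 beats 1.5 for 1

print("majority vote:", majority_label)
print("weighted vote:", weighted_label)
```

In this toy case the three lower-weight annotators carry the majority vote, while the two higher-weight annotators decide the weighted label. That is the kind of underweighting the abstract attributes to majority-vote aggregation.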