Phonological Perception of Sign Language Models

arXiv cs.CL·Kayo Yin, Jessica Carter, Alex Xijie Lu, Annemarie Kocab

6/30/2026

Quick Answer

This paper evaluates Sign Language Recognition (SLR) models trained on American Sign Language (ASL) and finds that pose-based models are sensitive to handshape contrasts, while pixel-based models better capture location changes.

Quick Take

The study probes SLR models trained on ASL with minimal pairs and compares their representations with human behavioral data. The models show emergent phonological sensitivity, but with an architectural trade-off: pose-based models are sensitive to handshape contrasts, while pixel-based models better capture location changes. The authors conclude that current training paradigms are insufficient to scale the models beyond their architectural inductive biases.

Key Points

SLR models trained on ASL show emergent phonological sensitivity.
Pose-based models are sensitive to handshape contrasts.
Pixel-based models better capture location changes.
Latent representations from pose-based models correlate with human perceptual similarity judgments (r~0.49).
Current training paradigms are insufficient to scale the models beyond their architectural inductive biases.
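
The correlation with human judgments mentioned above can be illustrated with a minimal sketch. It assumes (this is not confirmed by the summary) that alignment is measured as a Pearson correlation between model-side cosine similarities and human similarity ratings over the same sign pairs. The function names, sign labels, embeddings, and ratings below are all hypothetical toy values, not data from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def alignment_score(embeddings, human_ratings):
    """Correlate model similarity with human similarity over the rated pairs.

    embeddings: dict mapping a sign label to its latent vector (hypothetical).
    human_ratings: dict mapping (label_a, label_b) to a similarity rating.
    """
    model_sims = []
    human_sims = []
    for (a, b), rating in human_ratings.items():
        model_sims.append(cosine_similarity(embeddings[a], embeddings[b]))
        human_sims.append(rating)
    return pearson_r(model_sims, human_sims)


# Toy, made-up inputs purely for illustration.
toy_embeddings = {
    "SIGN_A": [1.0, 0.0, 0.0],
    "SIGN_B": [0.9, 0.1, 0.0],
    "SIGN_C": [0.0, 1.0, 0.0],
    "SIGN_D": [0.0, 0.2, 1.0],
}
toy_ratings = {
    ("SIGN_A", "SIGN_B"): 0.9,
    ("SIGN_A", "SIGN_C"): 0.2,
    ("SIGN_C", "SIGN_D"): 0.4,
    ("SIGN_A", "SIGN_D"): 0.1,
}
score = alignment_score(toy_embeddings, toy_ratings)
```

A score near 1 would mean that pairs humans rate as similar are also close in the model's latent space. The paper may use a different similarity measure or correlation statistic.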

[Submitted on 27 Jun 2026]

Abstract: Sign languages are compositional systems where meaning arises by combining sublexical phonological parameters, such as handshape, location, and movement. While deep learning models for Sign Language Recognition (SLR) have achieved increased performance on translation benchmarks, it remains unclear whether these models distinguish abstract phonological features or merely rely on low-level statistical correlations. This work evaluates the phonological perception of SLR models trained on American Sign Language (ASL) by probing phonological sensitivity using minimal pairs and evaluating representational alignment with human behavioral data. Our results reveal that SLR models exhibit emergent phonological sensitivity, but with clear architectural trade-offs: pose-based models are sensitive to handshape contrasts, while pixel-based models better capture location changes. Furthermore, pose-based models learn latent representations that correlate with human perceptual similarity judgments (r~0.49). These findings suggest that while SLR models exhibit emergent phonology, current training paradigms are insufficient to scale them beyond their architectural inductive biases.
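
One way to picture the minimal-pair probing described in the abstract is sketched here. The abstract does not spell out the paper's actual protocol, so this is an assumption: a model counts as sensitive to a contrast when the embedding distance between the two signs of a minimal pair exceeds the average distance between two productions of the same sign. The function names and the toy embeddings are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sensitivity_by_parameter(minimal_pairs, same_sign_pairs):
    """Fraction of minimal pairs per parameter that exceed a same-sign baseline.

    minimal_pairs: list of (parameter, emb_a, emb_b), where the two signs
        differ only in that phonological parameter.
    same_sign_pairs: list of (emb_a, emb_b) for two productions of one sign,
        used to estimate how far apart embeddings sit with no contrast.
    """
    baseline = sum(euclidean(a, b) for a, b in same_sign_pairs) / len(same_sign_pairs)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for parameter, emb_a, emb_b in minimal_pairs:
        totals[parameter] += 1
        if euclidean(emb_a, emb_b) > baseline:
            hits[parameter] += 1
    return {p: hits[p] / totals[p] for p in totals}


# Toy, made-up embeddings purely for illustration.
toy_same_sign = [
    ([0.0, 0.0], [0.1, 0.0]),
    ([1.0, 1.0], [1.0, 1.1]),
]
toy_minimal_pairs = [
    ("handshape", [0.0, 0.0], [1.0, 0.0]),
    ("handshape", [0.0, 0.0], [0.0, 0.5]),
    ("location", [0.0, 0.0], [0.05, 0.0]),
    ("location", [0.0, 0.0], [0.3, 0.0]),
]
result = sensitivity_by_parameter(toy_minimal_pairs, toy_same_sign)
```

With these toy values the hypothetical model separates both handshape pairs but only one of the two location pairs, which is the kind of per-parameter asymmetry the abstract reports for pose-based models.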

Comments:	Accepted to CogSci 2026
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2606.28667 [cs.CL]
	(or arXiv:2606.28667v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.28667 arXiv-issued DOI via DataCite

Submission history

From: Kayo Yin
[v1] Sat, 27 Jun 2026 01:02:35 UTC (7,695 KB)

— Originally published at arxiv.org
