Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

arXiv cs.CL·Hamid Mojarad, Kevin Tang

4h ago

·~1 min·6/24/2026·en·0

Quick Answer

This study probes wav2vec 2.0 and Whisper models to analyze consonant cluster reduction (CCR) in African American English (AAE).

Quick Take

This study probes wav2vec 2.0 and Whisper models to analyze consonant cluster reduction (CCR) in African American English (AAE). Both models accurately differentiate between reduced and canonical forms, revealing that CCR is represented as structured phonological variation rather than mere deletion, impacting automatic speech recognition (ASR) performance.

Key Points

Layer-wise probing of wav2vec2-base and Whisper-small reveals insights into AAE phonology.
Both models achieve high accuracy in detecting segmental reduction and restoration tasks.
Reduced segments maintain cues to underlying stops, indicating structured phonological encoding.
Findings highlight the need for improved ASR systems for African American English.
Study contributes to understanding linguistic representation in modern speech models.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Excerpt

From source RSS / original summary

arXiv:2606. 23948v1 Announce Type: new Abstract: Self-supervised and supervised speech models are increasingly used to investigate which linguistic information their internal representations encode, and at what level of abstraction they encode it. One underexplored phenomenon is consonant cluster reduction (CCR) in African American English (AAE), a widespread phonological process and a source of automatic speech recognition (ASR) disparity.

To examine how CCR is represented, we conduct speaker-independent layer-wise probing of wav2vec2-base and Whisper-small using two tasks: segmental reduction detection and segmental restoration of underlying cluster identity. Both models distinguish reduced and canonical forms with high accuracy. Crucially, reduced segments retain cues to their underlying stops, indicating that CCR is encoded as structured gradient phonological variation rather than simple segmental deletion.

These results demonstrate structured phonological encoding of AAE CCR patterns in modern speech models.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CL

See more →

arXiv cs.CL·Barak Or

4h ago

FeaturedOriginal

Quantifying Prior Dominance in Systems

AI Summary

The study introduces the Normalized Context Utilization (NCU) metric to evaluate Retrieval-Augmented Generation (RAG) systems, revealing that Small Language Models (SLMs) outperform larger models in factual extraction. The findings indicate that traditional scaling laws yield diminishing returns, with a commercial API frequently failing against adversarial evidence due to systemic confidence collapse.

#LLM #AI Coding #Inference #AI Startup

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

Quantifying Prior Dominance in Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

Quantifying Prior Dominance in RAG Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Quantifying Prior Dominance in Systems