JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR

arXiv cs.CL·Shilin Zhou, Zhenghua Li

5/19/2026


Quick Answer

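JSPG is a dynamic dictionary filtering framework for Chinese contextual ASR that jointly uses Semantic, Pinyin, and Glyph features to retrieve a concise subset of relevant keywords. On the Aishell-1 and RWCS-NER datasets it significantly outperforms single-feature baselines, and downstream contextual ASR models guided by it show substantial gains in keyword recognition accuracy.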

Quick Take

Semantic-only retrieval frequently fails in Chinese ASR because base models produce homophonic or near-homophonic errors that keep the phonetic cues of a keyword but distort its meaning. JSPG adds pinyin for phonetic matching and glyph cues to filter out irrelevant homophones, and it outperforms single-feature baselines on Aishell-1 and RWCS-NER.

Key Points

JSPG jointly integrates Semantic, Pinyin, and Glyph features to filter large keyword dictionaries.
It significantly outperforms single-feature baselines on the Aishell-1 and RWCS-NER datasets.
Base ASR models often produce homophonic errors that distort semantic meaning, which makes semantic-only retrievers ineffective.
An extended Smith-Waterman algorithm bridges character-level pinyin/glyph metrics and sequence-level filtering by scoring N-best hypotheses against keywords.
Downstream contextual ASR models guided by JSPG show substantial keyword recognition improvements.
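
The filtering step these points describe can be pictured as ranking dictionary entries by a combined score and keeping a short list for the contextual ASR model. The sketch below is only an illustration under stated assumptions: the weighted sum, the weights, the function name `filter_keywords`, and the example scores are all hypothetical, since the summary does not say how JSPG actually combines the three features.

```python
from typing import Dict, List, Tuple


def filter_keywords(
    semantic: Dict[str, float],
    pinyin: Dict[str, float],
    glyph: Dict[str, float],
    weights: Tuple[float, float, float] = (0.2, 0.5, 0.3),  # assumed, not from the paper
    top_k: int = 2,
) -> List[str]:
    """Rank keywords by an assumed weighted sum of per-feature scores, keep top_k."""
    w_s, w_p, w_g = weights
    candidates = set(semantic) | set(pinyin) | set(glyph)
    fused = {
        kw: w_s * semantic.get(kw, 0.0)
        + w_p * pinyin.get(kw, 0.0)
        + w_g * glyph.get(kw, 0.0)
        for kw in candidates
    }
    # Sort by descending fused score; break ties by keyword for determinism.
    ranked = sorted(fused, key=lambda kw: (-fused[kw], kw))
    return ranked[:top_k]


# Made-up scores for three dictionary entries against one misrecognized hypothesis.
semantic_scores = {"张伟": 0.2, "张薇": 0.2, "李娜": 0.3}
pinyin_scores = {"张伟": 1.0, "张薇": 0.9, "李娜": 0.0}
glyph_scores = {"张伟": 0.9, "张薇": 0.4, "李娜": 0.1}

shortlist = filter_keywords(semantic_scores, pinyin_scores, glyph_scores)
```

In this toy setting the semantic score alone would rank the unrelated entry first, while the phonetic and glyph scores pull the intended keyword to the top of the shortlist.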


[Submitted on 16 May 2026]


Abstract: Contextual Automatic Speech Recognition (ASR) faces challenges with large-scale keyword dictionaries, as excessive irrelevant candidates introduce noise that degrades accuracy. To address this, dynamic filtering typically uses a base ASR model to generate preliminary hypotheses, followed by semantic text retrievers to fetch a concise subset of relevant keywords. However, this approach frequently fails in Chinese ASR. Base models often produce homophonic or near-homophonic errors that preserve the phonetic cues of the target keywords but severely distort their semantic meaning, rendering standard semantic retrievers ineffective. To resolve this, we propose a filtering framework that jointly integrates Semantic, Pinyin, and Glyph features (JSPG). Pinyin effectively retrieves targets based on phonetic similarity, while glyph provides complementary structural cues to filter out numerous irrelevant homophones inherent in Chinese. To bridge the gap between character-level pinyin/glyph metrics and sequence-level filtering, we introduce an extended Smith-Waterman algorithm that computes similarity scores between the N-best hypothesis sequences and keywords. Experiments on the Aishell-1 and RWCS-NER datasets demonstrate that JSPG significantly outperforms single-feature baselines. Furthermore, downstream contextual ASR models guided by JSPG achieve substantial improvements in keyword recognition accuracy.
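
To make the sequence-level scoring idea concrete, here is a minimal Python sketch of Smith-Waterman local alignment in which the exact-match test is replaced by a soft character similarity. It is not the paper's implementation: the toy `PINYIN` table, the `char_sim` values, the gap and mismatch penalties, the length normalization, and taking the maximum over the N-best list are all illustrative assumptions, and the glyph term is left out for brevity.

```python
from typing import Iterable

# Hypothetical toy table of tone-less pinyin; a real system would use a full lexicon.
PINYIN = {
    "张": "zhang", "章": "zhang", "伟": "wei", "为": "wei",
    "我": "wo", "找": "zhao", "李": "li", "娜": "na",
}


def char_sim(a: str, b: str) -> float:
    """Assumed similarity: 1.0 for identical characters, 0.8 for same toy pinyin, else 0.0."""
    if a == b:
        return 1.0
    pa, pb = PINYIN.get(a), PINYIN.get(b)
    if pa is not None and pa == pb:
        return 0.8
    return 0.0


def soft_smith_waterman(hyp: str, keyword: str,
                        gap: float = 0.5, mismatch: float = 0.5) -> float:
    """Best local alignment score of keyword inside hyp, divided by keyword length."""
    if not keyword:
        return 0.0
    rows, cols = len(hyp) + 1, len(keyword) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            s = char_sim(hyp[i - 1], keyword[j - 1])
            step = s if s > 0 else -mismatch  # dissimilar characters are penalized
            H[i][j] = max(
                0.0,                      # restart the local alignment
                H[i - 1][j - 1] + step,   # align the two characters
                H[i - 1][j] - gap,        # skip a hypothesis character
                H[i][j - 1] - gap,        # skip a keyword character
            )
            best = max(best, H[i][j])
    return best / len(keyword)


def nbest_score(hyps: Iterable[str], keyword: str) -> float:
    """Assumed aggregation: keep the best score over the N-best hypotheses."""
    return max((soft_smith_waterman(h, keyword) for h in hyps), default=0.0)


# A homophonic misrecognition still scores high against the intended keyword.
score_homophone = nbest_score(["我找章为"], "张伟")
score_unrelated = nbest_score(["我找章为"], "李娜")
```

Because the alignment is local, the keyword is matched against the best-fitting span of each hypothesis rather than the whole sentence, which is what lets a character-level metric act as a sequence-level filter.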

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.16896 [cs.CL]
	(or arXiv:2605.16896v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.16896 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shilin Zhou
[v1] Sat, 16 May 2026 09:16:09 UTC (72 KB)

— Originally published at arxiv.org
