JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR
Quick Take
JSPG enhances Chinese ASR accuracy by integrating Semantic, Pinyin, and Glyph features for dynamic keyword filtering.
Key Points
- Dynamic filtering reduces irrelevant keyword noise.
- Joint feature integration improves retrieval accuracy.
- An extended Smith-Waterman algorithm turns character-level pinyin/glyph similarity into sequence-level scores between N-best hypotheses and keywords.
Abstract: Contextual Automatic Speech Recognition (ASR) faces challenges with large-scale keyword dictionaries, as excessive irrelevant candidates introduce noise that degrades accuracy. To address this, dynamic filtering typically uses a base ASR model to generate preliminary hypotheses, followed by semantic text retrievers to fetch a concise subset of relevant keywords. However, this approach frequently fails in Chinese ASR. Base models often produce homophonic or near-homophonic errors that preserve the phonetic cues of the target keywords but severely distort their semantic meaning, rendering standard semantic retrievers ineffective. To resolve this, we propose a filtering framework that jointly integrates Semantic, Pinyin, and Glyph features (JSPG). Pinyin effectively retrieves targets based on phonetic similarity, while glyph provides complementary structural cues to filter out numerous irrelevant homophones inherent in Chinese. To bridge the gap between character-level pinyin/glyph metrics and sequence-level filtering, we introduce an extended Smith-Waterman algorithm that computes similarity scores between the N-best hypothesis sequences and keywords. Experiments on the Aishell-1 and RWCS-NER datasets demonstrate that JSPG significantly outperforms single-feature baselines. Furthermore, downstream contextual ASR models guided by JSPG achieve substantial improvements in keyword recognition accuracy.
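To make the sequence-level scoring idea concrete, here is a minimal sketch of classic Smith-Waterman local alignment driven by a character-level similarity function, followed by a top-k keyword filter over N-best hypotheses. It is not the paper's extended algorithm, whose details the abstract does not give. The pinyin table `TOY_PINYIN`, the score values (1.0, 0.8, -1.0), the gap penalty, the length normalisation, and the function names are all illustrative assumptions, and glyph and semantic features are left out entirely.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy pinyin table (toneless). A real system would use a full lexicon.
TOY_PINYIN: Dict[str, str] = {
    "北": "bei", "背": "bei",
    "京": "jing", "景": "jing", "静": "jing",
    "张": "zhang", "章": "zhang",
    "伟": "wei", "为": "wei",
}


def char_sim(a: str, b: str) -> float:
    """Illustrative character-level score (assumed values, not from the paper)."""
    if a == b:
        return 1.0   # exact character match
    pa, pb = TOY_PINYIN.get(a), TOY_PINYIN.get(b)
    if pa is not None and pa == pb:
        return 0.8   # homophone according to the toy table
    return -1.0      # mismatch penalty


def smith_waterman(
    hyp: str,
    kw: str,
    sim: Callable[[str, str], float] = char_sim,
    gap: float = 1.0,
) -> float:
    """Best local alignment score between a hypothesis and a keyword."""
    prev = [0.0] * (len(kw) + 1)
    best = 0.0
    for i in range(1, len(hyp) + 1):
        cur = [0.0] * (len(kw) + 1)
        for j in range(1, len(kw) + 1):
            cur[j] = max(
                0.0,                                      # restart the local alignment
                prev[j - 1] + sim(hyp[i - 1], kw[j - 1]),  # align two characters
                prev[j] - gap,                            # skip a hypothesis character
                cur[j - 1] - gap,                         # skip a keyword character
            )
            best = max(best, cur[j])
        prev = cur
    return best


def keyword_score(nbest: Sequence[str], kw: str) -> float:
    """Max alignment score over the N-best list, normalised by keyword length."""
    if not kw:
        return 0.0
    return max((smith_waterman(h, kw) for h in nbest), default=0.0) / len(kw)


def filter_keywords(
    nbest: Sequence[str], dictionary: Sequence[str], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Keep the top_k dictionary entries by score (ties keep dictionary order)."""
    scored = [(kw, keyword_score(nbest, kw)) for kw in dictionary]
    scored.sort(key=lambda item: -item[1])
    return scored[:top_k]


if __name__ == "__main__":
    # Both hypotheses contain homophone errors for the keyword 北京.
    nbest = ["我在背景工作", "我在北静工作"]
    print(filter_keywords(nbest, ["张伟", "北京", "上海"], top_k=1))
```

In this toy setup the keyword 北京 still ranks first even though neither hypothesis contains it verbatim, because the misrecognised characters share its pinyin. A glyph term could be added to `char_sim` in the same way to separate true targets from unrelated homophones.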
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2605.16896 [cs.CL] (or arXiv:2605.16896v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.16896 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Shilin Zhou
[v1] Sat, 16 May 2026 09:16:09 UTC (72 KB)
— Originally published at arxiv.org