Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction
Quick Answer
This study introduces an error-aware TF-IDF retrieval-augmented generation framework for ASR error correction. It raises the error-aware hit rate from 53.7% to 90.9% on the Persian subset of the FLEURS dataset.
Quick Take
The framework pairs a symmetric text normalization module with an error-aware TF-IDF retriever. On the Persian subset of the FLEURS dataset, it raises the error-aware hit rate from 53.7% to 90.9% and reduces the final word error rate from 23.06% to 18.83%, with near-zero inference latency.
Key Points
- Proposes a purely lexical, error-aware framework for ASR error correction.
- Raises the error-aware hit rate from 53.7% to 90.9% on the Persian subset of the FLEURS dataset.
- Reduces the final word error rate from 23.06% to 18.83%.
- Combines a symmetric text normalization module with a novel error-aware TF-IDF algorithm.
- Targets phonetic and loop hallucinations with near-zero inference latency.
Article Content
Source: arXiv:2606.24915v1 (announce type: new)
Abstract: End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. While retrieval-augmented generation frameworks can mitigate these errors using large language models, current architectures face significant challenges. They either rely on standard sparse retrieval that ignores phonetic misrecognitions or utilize heavyweight cross-modal embeddings that introduce high latency.
This letter proposes a highly efficient, purely lexical error-aware framework designed to explicitly resolve phonetic and loop hallucinations. Our approach integrates a symmetric text normalization module with a novel error-aware term frequency-inverse document frequency algorithm. By constructing a sparse diagonal penalty matrix based on historical errors, the retriever mathematically prioritizes corrective documents containing specific high-risk misrecognitions.
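The abstract does not give the exact form of the penalty matrix, so the following is only a minimal sketch of the general idea under stated assumptions. It treats the diagonal matrix as a per-term weight, stored sparsely as a dictionary, and scores a document as a weighted dot product of TF-IDF vectors. The weight formula `1 + alpha * log(1 + count)`, the function names, the `alpha` parameter, and the English toy data are all hypothetical and are not taken from the paper. The shared `tokenize` function stands in for the symmetric normalization step, on the assumption that "symmetric" means the same normalization is applied to queries and documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Stand-in for symmetric normalization: the same rule is applied
    # to ASR hypotheses (queries) and to corrective documents.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the corrective corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # Sparse TF-IDF vector; terms outside the corpus vocabulary are dropped.
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def error_weights(error_counts, alpha=1.0):
    # Sparse diagonal of the assumed weighting matrix. Terms never seen as
    # misrecognitions are absent and default to a weight of 1 at scoring time.
    return {term: 1.0 + alpha * math.log1p(count) for term, count in error_counts.items()}


def score(query_vec, doc_vec, weights):
    # Weighted dot product q^T W d with diagonal W.
    return sum(
        q * weights.get(term, 1.0) * doc_vec[term]
        for term, q in query_vec.items()
        if term in doc_vec
    )


def retrieve(query, docs, error_counts, alpha=1.0):
    # Return the index of the highest-scoring document.
    idf = build_idf(docs)
    weights = error_weights(error_counts, alpha)
    q = tfidf(query, idf)
    scored = [(score(q, tfidf(doc, idf), weights), i) for i, doc in enumerate(docs)]
    return max(scored)[1]


# Hypothetical toy corpus: document 1 is the corrective entry.
docs = [
    "flight schedule from the airport to the city",
    "tehran is often misheard as tehron",
]
hypothesis = "the flight to tehron"

plain_choice = retrieve(hypothesis, docs, error_counts={})
aware_choice = retrieve(hypothesis, docs, error_counts={"tehron": 50})
```

In this toy setup, plain TF-IDF scoring selects document 0 because it shares more common words with the hypothesis, while the error-weighted scoring selects document 1, the entry that mentions the historically misrecognized token. Vector length normalization is omitted here for brevity; a real retriever would likely include it.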
Evaluated on the Persian subset of the FLEURS dataset, our method increased the error-aware hit rate from 53.7% to 90.9%. In end-to-end evaluations, the integrated framework reduced the final word error rate from 23.06% to 18.83%, achieving significant accuracy gains with near-zero inference latency.
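To put the reported figures in context, the sketch below computes word error rate using its standard definition (word-level edit distance divided by reference length) and the relative reduction implied by the two reported rates. The abstract does not describe the paper's scoring setup, so the whitespace tokenization and the function names here are assumptions for illustration only.

```python
def word_error_rate(reference, hypothesis):
    # WER = (substitutions + deletions + insertions) / reference word count,
    # computed with a word-level Levenshtein distance.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    # Fraction of the original error rate that was removed.
    return (before - after) / before


# Figures reported in the abstract, in percent.
absolute_gain = 23.06 - 18.83
relative_gain = relative_reduction(23.06, 18.83)
```

With the reported numbers, the absolute reduction is 4.23 percentage points, which corresponds to a relative word error rate reduction of roughly 18.3%.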