CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

arXiv cs.CL·Wei Tian, Yuhao Zhou, Man Lan

6/2/2026


Quick Answer

CSRP introduces a three-stage framework for Chinese Grammatical Error Correction, achieving state-of-the-art performance on the NACGEC benchmark with 50.99 F0.5 and 57.17 precision.
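
For readers unfamiliar with the metric, the sketch below shows the standard F-beta definition, which with beta = 0.5 weights precision more heavily than recall. The function name `f_beta` and the toy precision/recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    Inputs and output share the same scale (e.g. 0-100 or 0-1).
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy values (assumed for illustration): the same two numbers, swapped.
high_precision = f_beta(precision=60.0, recall=30.0)
high_recall = f_beta(precision=30.0, recall=60.0)

# With beta = 0.5 the precision-heavy system scores higher.
print(high_precision, high_recall)
```

Because over-correction adds edits that are not in the reference, it lowers precision first, which is why a precision-weighted score is the natural target for this task.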

Quick Take

Beyond its headline NACGEC scores, CSRP mitigates the over-correction bias seen in MLE-trained models and reaches 59.61 F1 on CSCD spelling correction, surpassing GPT-4 by 5.20 points.

Key Points

CSRP utilizes Continual Pre-training on 5.9M samples to enhance domain knowledge.
Chain-of-Thought SFT improves diagnostic transparency through explicit error reasoning.
Group Relative Policy Optimization (GRPO) is trained with an Efficiency-Aware Reward that penalizes unnecessary edits.
The RL alignment stage contributes an 8% relative gain over the SFT baseline, orthogonal to the gain from large-scale CPT.
CSRP raises CSCD spelling correction performance to 59.61 F1.
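
The group-relative part of Group Relative Policy Optimization can be sketched as follows: several candidate corrections are sampled for the same input, each is scored, and each reward is normalized against the group's own mean and standard deviation. This is a minimal sketch of the commonly described formulation, not the authors' training code; the function name, the `eps` term, and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other samples for the same prompt.

    eps (an assumed stabilizer) avoids division by zero when every
    sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical corrections of one sentence, with made-up rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

Samples that beat the group average get a positive advantage and are reinforced; those below it are discouraged, so no separate value model is needed to provide a baseline.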


Article Content

From source RSS / original summary

arXiv:2606.00020v1

Abstract: Large Language Model (LLM) based Chinese Grammatical Error Correction (CGEC) systems face two critical challenges: general-purpose models lack specialized linguistic priors for subtle grammatical distinctions, and Supervised Fine-Tuning (SFT) with Maximum Likelihood Estimation fails to optimize for precision-focused metrics, leading to systematic over-correction.

We propose CSRP, a three-stage framework that progressively builds correction capability through Continual Pre-training (CPT) on 5.9M balanced samples to internalize domain knowledge, Chain-of-Thought SFT with explicit error reasoning for diagnostic transparency, and Group Relative Policy Optimization with a novel Efficiency-Aware Reward that explicitly penalizes unnecessary edits. On the NACGEC benchmark, CSRP achieves state-of-the-art performance with 50.99 $F_{0.5}$ and 57.17 precision, substantially outperforming previous best results while effectively mitigating the over-correction bias inherent in MLE-trained models. Our method also advances CSCD spelling correction to 59.61 F1, surpassing GPT-4 by 5.20 points.

Comprehensive ablation studies demonstrate that the RL alignment stage contributes an 8% relative gain over the SFT baseline, and that this gain is orthogonal to the contribution of large-scale CPT, validating that explicit optimization for edit efficiency is essential for high-quality grammatical error correction. Our code is available at https://github.com/TW-NLP/ChineseErrorCorrector.
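
The abstract does not give the form of the Efficiency-Aware Reward, so the following is only a hypothetical illustration of the general idea: reward an exact match with the reference, and subtract a penalty for every edit beyond what the reference itself required. The function names, the `penalty` weight, and the use of `difflib` edit spans are all assumptions, not the paper's definition.

```python
from difflib import SequenceMatcher


def count_edits(source: str, target: str) -> int:
    """Count non-matching spans (replace/insert/delete) between two strings."""
    ops = SequenceMatcher(None, source, target, autojunk=False).get_opcodes()
    return sum(1 for tag, *_ in ops if tag != "equal")


def efficiency_aware_reward(source: str, hypothesis: str, reference: str,
                            penalty: float = 0.2) -> float:
    """Hypothetical reward: correctness minus a charge for excess edits."""
    correct = 1.0 if hypothesis == reference else 0.0
    # Edits made by the model beyond the number the reference needed.
    excess = max(0, count_edits(source, hypothesis) - count_edits(source, reference))
    return correct - penalty * excess


# An already-correct sentence: the reference equals the source.
source = "我今天去了学校。"
unchanged = efficiency_aware_reward(source, source, source)
over_corrected = efficiency_aware_reward(source, "我今天去过学校。", source)
print(unchanged, over_corrected)
```

In this toy setup, leaving a correct sentence alone earns the full reward, while an unnecessary rewrite is both marked wrong and charged for the extra edit, which is the kind of pressure against over-correction the abstract describes.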
