RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

arXiv cs.CL·Yidong Gan, David D. Nguyen, Yang Lin, Peter Zhong, Thanh Vu, Long Duong, Yuan-Fang Li

5/28/2026


Quick Answer

RAG-Coding improves ICD-10-CM coding over the best LLM-based baseline by 8-13% in micro-F1 and 2-8% in macro-F1 on the MDACE dataset, using four LLM agents grounded in external knowledge.

Quick Take

RAG-Coding orchestrates four LLM agents and grounds their decisions in external knowledge such as the official coding tabular list and guidelines. Compared with PLM-ICD, it achieves 11% higher micro recall but 6% lower micro precision, so the two methods reach comparable F1. The authors also release MDACE-2025, an update of the dataset with expert re-annotations under current clinical standards.

Key Points

RAG-Coding orchestrates four LLM agents for automated ICD-10-CM coding.
Achieves an 8-13% micro-F1 improvement over the best LLM-based baseline on the MDACE dataset.
Outperforms PLM-ICD in micro recall by 11%, while PLM-ICD has 6% higher micro precision, yielding comparable micro- and macro-F1.
Releases MDACE-2025 dataset with expert re-annotations for 2025 ICD-10-CM guidelines.
Demonstrates the importance of integrating external knowledge for coding accuracy.

Article Excerpt

From source RSS / original summary

arXiv:2605.27377v1. Abstract: We present RAG-Coding, an agentic method for automated ICD-10-CM coding. RAG-Coding orchestrates four large language model (LLM) agents and grounds their coding decisions in external knowledge sources (e.g., the official coding tabular list and guidelines). By retrieving and cross-referencing relevant knowledge in these sources, the agents enhance coding accuracy and ensure clinical compliance.
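To make the idea of grounding a coding decision in an external source concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not describe the four agents or how they retrieve. The `TABULAR_LIST` dictionary is a tiny hand-picked excerpt standing in for the official tabular list, and `retrieve_candidates` and `build_prompt` are hypothetical helpers that use simple token overlap instead of a real retriever.

```python
import re

# Toy excerpt standing in for the official ICD-10-CM tabular list (assumption:
# a real system would load the full list and the coding guidelines).
TABULAR_LIST = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "E11.65": "Type 2 diabetes mellitus with hyperglycemia",
    "I10": "Essential (primary) hypertension",
    "J18.9": "Pneumonia, unspecified organism",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_candidates(note, tabular_list, top_k=2):
    """Rank codes by the fraction of description tokens found in the note.

    Hypothetical stand-in for retrieval; token overlap is used only to keep
    the sketch dependency-free.
    """
    note_tokens = tokenize(note)
    scored = []
    for code, description in tabular_list.items():
        desc_tokens = tokenize(description)
        score = len(note_tokens & desc_tokens) / len(desc_tokens)
        scored.append((score, code, description))
    # Highest score first; ties broken alphabetically by code.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(code, desc) for score, code, desc in scored[:top_k] if score > 0]


def build_prompt(note, candidates):
    """Assemble a prompt that shows the retrieved entries next to the note."""
    lines = ["Clinical note:", note, "", "Candidate tabular list entries:"]
    lines += [f"- {code}: {desc}" for code, desc in candidates]
    lines.append("")
    lines.append("Select only codes supported by the note and the entries above.")
    return "\n".join(lines)


note = "Patient with type 2 diabetes mellitus and hyperglycemia on admission."
candidates = retrieve_candidates(note, TABULAR_LIST)
prompt = build_prompt(note, candidates)
print(prompt)
```

The point of the sketch is the shape of the pipeline: the model sees authoritative code descriptions alongside the note rather than relying on its own recall of the code set.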

On the MDACE dataset, RAG-Coding outperforms the best LLM-based baseline by 8-13% in micro-F1 and 2-8% in macro-F1 across multiple LLM backbones. Compared to the state-of-the-art pretrained language model method, PLM-ICD, RAG-Coding exhibits higher micro recall (+11%), while PLM-ICD exhibits higher micro precision (+6%), yielding comparable micro- and macro-F1. Ablations show stepwise gains, highlighting the importance of incorporating external knowledge.
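The reported metrics are easier to read with their definitions in hand. The sketch below computes micro precision, micro recall, micro-F1, and macro-F1 for multi-label code assignment in plain Python. The helper names and the toy gold and predicted sets are illustrative, and averaging macro-F1 over every code that appears in either the gold or predicted labels is an assumption; the paper's exact averaging convention is not stated in the abstract.

```python
from collections import defaultdict


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_macro_scores(gold, pred):
    """Score predictions given per-document sets of gold and predicted codes."""
    counts = defaultdict(lambda: [0, 0, 0])  # code -> [tp, fp, fn]
    for gold_codes, pred_codes in zip(gold, pred):
        for code in pred_codes & gold_codes:
            counts[code][0] += 1
        for code in pred_codes - gold_codes:
            counts[code][1] += 1
        for code in gold_codes - pred_codes:
            counts[code][2] += 1

    # Micro: pool the counts over all codes, then compute the metrics once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro_p, micro_r, micro_f1 = precision_recall_f1(tp, fp, fn)

    # Macro: compute F1 per code, then take the unweighted mean.
    per_code_f1 = [precision_recall_f1(*c)[2] for c in counts.values()]
    macro_f1 = sum(per_code_f1) / len(per_code_f1) if per_code_f1 else 0.0

    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": micro_f1,
        "macro_f1": macro_f1,
    }


# Toy data: two documents, each with a set of gold and predicted codes.
gold = [{"E11.9", "I10"}, {"I10"}]
pred = [{"E11.9"}, {"I10", "J18.9"}]
scores = micro_macro_scores(gold, pred)
print(scores)
```

Micro averaging is dominated by frequent codes, while macro averaging weights every code equally, which is why a method can gain more on one than the other. A system that predicts more codes tends to trade precision for recall, which matches the reported pattern of RAG-Coding leading on micro recall and PLM-ICD on micro precision with similar F1.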

We also release MDACE-2025, which updates the original dataset with expert re-annotations under the latest 2025 ICD-10-CM guidelines. This update features more fine-grained code labels and enables evaluation against current clinical standards.
