Automated Scoring of Arabic Text Using Large Language Models: A Literature Review

arXiv cs.CL·Khaoula Dahimi, Hadda Cherroun, Amel Belabbaci

6/10/2026·~1 min·en

Quick Answer

This literature review explores LLM-based approaches to Automated Text Scoring (ATS) of Arabic texts, focusing on short answer grading and essay scoring.

Quick Take

This literature review explores LLM-based approaches to Automated Text Scoring (ATS) of Arabic texts, focusing on short answer grading and essay scoring. It introduces a five-dimensional taxonomy and uses it to compare the methodologies, datasets, and performance metrics of existing studies. The authors emphasize the need for ongoing research to improve educational quality in Arabic-speaking communities.

Key Points

Automated Text Scoring (ATS) enables scalable evaluation of learner responses without human input.
The study introduces a taxonomy with five dimensions for analyzing ATS methodologies.
Focus areas include short answer grading (ASAG) and essay scoring (AES) for Arabic texts.
Findings stress the importance of pedagogically grounded research in Arabic ATS.
Increased accessibility of LLMs has revitalized interest in Arabic text evaluation.

Article Excerpt

From source RSS / original summary

arXiv:2606.09830v1. Abstract: In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of LLMs and Arabic-specific datasets has sparked renewed interest in this area. In this work, we investigate LLM-based approaches for the automated evaluation of Arabic texts, focusing on both short answer grading (ASAG) and essay scoring (AES).

We further introduce a structured taxonomy comprising five dimensions: application domain, feedback generation capability, LLM architecture deployed, alignment with competency referential frameworks, and prompt engineering strategy. By applying this taxonomy, we conduct a comparative analysis of existing studies, examining their methodological approaches, datasets, evaluation metrics, and reported performance.
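As a rough illustration of how such a taxonomy might be applied, the sketch below records one study per entry along the five dimensions named in the abstract and counts studies per value of a chosen dimension. The class name, field names, label values, and the three sample entries are hypothetical placeholders, not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StudyProfile:
    """One reviewed study, described along the five taxonomy dimensions."""
    application_domain: str    # "ASAG" or "AES"
    feedback_generation: bool  # does the system produce feedback, not only a score?
    llm_architecture: str      # hypothetical label, e.g. "decoder-only"
    competency_framework: str  # hypothetical label; "none" if not aligned
    prompt_strategy: str       # hypothetical label, e.g. "zero-shot"


def tally(studies, dimension):
    """Count how many studies take each value along one dimension."""
    valid = {f.name for f in fields(StudyProfile)}
    if dimension not in valid:
        raise ValueError(f"unknown dimension: {dimension}")
    return Counter(getattr(s, dimension) for s in studies)


# Placeholder entries for illustration only; not studies from the review.
studies = [
    StudyProfile("ASAG", False, "decoder-only", "none", "zero-shot"),
    StudyProfile("AES", True, "decoder-only", "none", "few-shot"),
    StudyProfile("AES", False, "encoder-only", "none", "fine-tuning"),
]

print(tally(studies, "application_domain"))
```

Keeping each dimension as a separate field makes it straightforward to group the same set of studies by any one of the five axes when comparing them.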

The findings highlight the need for sustained and pedagogically grounded research efforts in Arabic ATS, given its significance for improving educational quality across Arabic-speaking communities.
