DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection

arXiv cs.CL·Junchao Wu, Yefeng Liu, Chenyu Zhu, Hao Zhang, Zeyu Wu, Tianqi Shi, Yichao Du, Longyue Wang, Weihua Luo, Jinsong Su, Derek F. Wong

Quick Take

DetectRL-X benchmarks multilingual LLM-generated text detection across diverse real-world scenarios.

Key Points

Evaluates detectors across 8 dimensions in 8 languages.
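Simulates real-world usage with texts from 4 commercial LLMs and AI-assisted writing operations (polishing, expanding, condensing).
Analyzes how domain, generator, attack strategy, text length, and refinement operation affect detection performance in each language.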

[Submitted on 15 May 2026]

Authors: Junchao Wu, Yefeng Liu, Chenyu Zhu, Hao Zhang, Zeyu Wu, Tianqi Shi, Yichao Du, Longyue Wang, Weihua Luo, Jinsong Su, Derek F. Wong

Abstract: The effective detection and governance of Large Language Model (LLM) generated content has become increasingly critical due to the growing risk of misuse. Despite the impressive performance of existing detectors, their reliability and potential in multilingual, real-world scenarios remain largely underexplored. In this study, we introduce DetectRL-X, a comprehensive multilingual benchmark designed to evaluate advanced detectors across 8 dimensions. The benchmark encompasses 8 languages commonly used in commercial contexts and collects human-written texts from 6 domains highly susceptible to LLM misuse. To better align with real-world applications, we create LLM-generated texts using 4 popular commercial LLMs and include typical AI-assisted writing operations such as polishing, expanding, and condensing to capture authentic usage patterns. Furthermore, we develop a multilingual framework for paraphrasing and perturbation attacks to simulate diverse human modifications and writing noise, enabling stress testing of detectors across languages. Experimental results on DetectRL-X reveal the strengths and limitations of current state-of-the-art detectors when applied to diverse linguistic resources. We further analyze how domains, generators, attack strategies, text length, and refinement operations influence performance in different languages, underscoring DetectRL-X as an effective benchmark for strengthening multilingual and language-specific detectors.
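
To make the stress-testing idea in the abstract concrete, here is a minimal Python sketch of a perturbation attack followed by per-language scoring. Everything in it is an illustrative assumption and not taken from the paper: the adjacent-character swap stands in for whatever perturbations DetectRL-X actually applies, AUROC is assumed as the metric, and the names perturb, auroc, and auroc_by_language are hypothetical.

```python
import random
from collections import defaultdict


def perturb(text, rate=0.1, seed=0):
    """Swap adjacent characters at random to mimic typing noise.

    Hypothetical stand-in for a perturbation attack; it works on Unicode
    code points, so it can be applied to any language.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


def auroc(labels, scores):
    """Chance that a random LLM text (label 1) scores above a random
    human text (label 0); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_language(records, detector):
    """records: iterable of (language, text, label) tuples.
    detector: any callable mapping a text to a score (assumed interface)."""
    grouped = defaultdict(lambda: ([], []))
    for lang, text, label in records:
        labels, scores = grouped[lang]
        labels.append(label)
        scores.append(detector(text))
    return {lang: auroc(l, s) for lang, (l, s) in grouped.items()}
```

Running auroc_by_language once on the clean records and once on records whose texts were passed through perturb would give a per-language view of how much a detector degrades under noise, which is the kind of comparison the abstract describes.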

Comments:	ACL 2026 Main
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.15518 [cs.CL]
	(or arXiv:2605.15518v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.15518 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Junchao Wu
[v1] Fri, 15 May 2026 01:29:26 UTC (10,065 KB)
