A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models
Quick Take
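A large-scale study assesses 284 interpretable linguistic features across outputs from 27 LLMs and ten text domains, and finds that classifiers built solely on these features can reliably distinguish AI-generated from human-written text. Many previously proposed indicators are strongly context-dependent, but measures of lexical richness remain robust across model families and domains. The authors present such interpretable features as a way to explain to non-expert users why a text appears machine-generated.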
Key Points
- Study assesses 284 linguistic features across 27 LLMs and ten text domains.
- Classifiers based on linguistic features reliably distinguish AI and human text.
- Lexical richness measures are robust across different models and domains.
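- Many previously proposed indicators prove strongly context-dependent.
- Prior findings were fragmented across feature sets, models, and text domains; the study tests them under cross-model and cross-domain generalization settings.
- Results provide a foundation for more reliable, interpretable analyses of AI-generated language.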
Article Excerpt
From source RSS / original summary (arXiv:2606.04177v1). Abstract: Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragmented across feature sets, models, and text domains. To address this gap, we conduct a large-scale empirical study assessing the robustness of linguistic signals for characterizing AI-generated text.
Our analysis covers 284 interpretable linguistic features across outputs from 27 LLMs and ten text domains under cross-model and cross-domain generalization settings. We show that classifiers based solely on linguistic features can reliably distinguish AI-generated from human-written text. However, many previously proposed indicators prove strongly context-dependent, with the exception of measures of lexical richness, which remain robust signals across model families and text domains.
These results demonstrate which linguistic signals generalize across contexts and provide a foundation for more reliable, interpretable analyses of AI-generated language.
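To make "lexical richness" concrete, the sketch below computes three commonly used measures: type-token ratio, root type-token ratio (Guiraud's index), and the share of words that occur only once. The abstract does not list which of the 284 features fall into this category, so the choice of measures, the function name `lexical_richness`, and the simple regex tokenizer are assumptions made for illustration, not the authors' feature set.

```python
import math
import re
from collections import Counter


def lexical_richness(text: str) -> dict[str, float]:
    """Illustrative lexical richness measures (hypothetical helper, not from the paper)."""
    # Naive tokenizer (assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")

    counts = Counter(tokens)
    n_tokens = len(tokens)   # total words
    n_types = len(counts)    # distinct words
    hapaxes = sum(1 for c in counts.values() if c == 1)  # words seen exactly once

    return {
        # Type-token ratio: distinct words / total words (sensitive to text length).
        "ttr": n_types / n_tokens,
        # Root TTR (Guiraud's index): dampens the length sensitivity of plain TTR.
        "root_ttr": n_types / math.sqrt(n_tokens),
        # Hapax ratio: share of tokens whose word type appears only once.
        "hapax_ratio": hapaxes / n_tokens,
    }


if __name__ == "__main__":
    print(lexical_richness("the cat sat on the mat"))
```

Values like these could serve as inputs to a feature-based classifier of the kind the abstract describes. Plain type-token ratio falls as texts get longer, so comparisons are only meaningful between texts of similar length or with a length-corrected measure.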