Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features
Quick Take
This study shows that a random forest classifier trained on interpretable linguistic features (lexical diversity, readability, and emotion-based measures) detects AI-generated fake news across different prompts, with AUC values between 0.988 and 1.000 over all six train-test combinations. Performance holds even though feature distributions shift between prompts, which suggests the features capture stable properties of AI-generated text.
Key Points
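- A classifier trained on articles from one prompt generalizes to articles from the other prompts, with AUC between 0.988 and 1.000.
- AI-generated texts show higher lexical diversity, lower readability, and substantially lower emotional intensity than the overall dataset.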
- Performance remains strong despite variations in prompting strategies.
- Study utilizes three datasets combining AI-generated and real news articles.
- Feature-based approaches can provide robust detection of AI-generated fake news under prompt variability.
Article Content
arXiv:2606.04199v1. Abstract: The increasing use of large language models has raised concerns about the spread of AI-generated fake news, particularly under varying prompting strategies. Most existing detection models are trained and evaluated under a single generation setting, leaving their ability to generalize across unseen prompts unclear.
In this study, we investigate cross-prompt generalization in fake news detection using three datasets of AI-generated articles produced under distinct prompts, combined with real news articles. We extract interpretable linguistic features capturing lexical diversity, readability, and emotion-based characteristics and evaluate a random forest classifier under a cross-prompt framework, where models trained on one prompt are tested on another.
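To make the three feature families concrete, here is a minimal Python sketch of one simple measure per family. These are illustrative proxies, not the measures used in the study, which the abstract does not specify: type-token ratio stands in for lexical diversity, average sentence length for readability, and the share of words found in a tiny hypothetical lexicon (`EMOTION_WORDS`) for emotional intensity. All function names are assumptions made for this example.

```python
import re

# Hypothetical toy lexicon, for illustration only; a real system would use
# an established emotion lexicon rather than this handful of words.
EMOTION_WORDS = {"shocking", "outrage", "terrifying", "furious", "heartbreaking", "amazing"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Lexical diversity proxy: unique words divided by total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def avg_sentence_length(text):
    """Readability proxy: mean number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(tokenize(text)) / len(sentences) if sentences else 0.0


def emotion_ratio(text):
    """Emotional intensity proxy: fraction of words found in the toy lexicon."""
    tokens = tokenize(text)
    return sum(t in EMOTION_WORDS for t in tokens) / len(tokens) if tokens else 0.0


def extract_features(text):
    """Return one interpretable feature vector per article."""
    return [type_token_ratio(text), avg_sentence_length(text), emotion_ratio(text)]
```

Because each entry in the vector has a direct reading (more varied vocabulary, longer sentences, fewer emotional words), shifts between prompts can be inspected feature by feature rather than inferred from an opaque embedding.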
Across all six train-test combinations, performance remains consistently high, with AUC values ranging from 0.988 to 1.000. Analysis of feature distributions shows that AI-generated text exhibits increased lexical diversity, reduced readability, and substantially lower emotional intensity compared to the overall dataset, with variations across prompts.
Despite these distributional shifts, the classifier maintains strong performance, indicating that these features capture stable properties of AI-generated text that generalize across prompting strategies. These findings suggest that feature-based approaches can provide robust detection of AI-generated fake news under prompt variability.
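The cross-prompt protocol described in the abstract can be sketched as follows. Three prompts give 3 × 2 = 6 ordered train-test pairs; for each pair a model is fit on AI articles from the training prompt plus real articles, then scored by AUC on AI articles from the test prompt plus real articles. Several things here are assumptions rather than details from the paper: the real articles are split into disjoint train and test sets, the data is synthetic, and, to keep the sketch dependency-free, a nearest-centroid scorer stands in for the random forest. Any object with the same `fit(X, y) -> scoring function` shape could be swapped in.

```python
from itertools import permutations
import random


def auc(labels, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Stand-in model (NOT a random forest): higher score means closer to the AI centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c_ai = centroid([x for x, label in zip(X, y) if label == 1])
    c_real = centroid([x for x, label in zip(X, y) if label == 0])
    return lambda x: sq_dist(x, c_real) - sq_dist(x, c_ai)


def cross_prompt_auc(ai_by_prompt, real_train, real_test, fit=fit_centroid_scorer):
    """Train on one prompt, test on another, for every ordered pair of prompts."""
    results = {}
    for train_p, test_p in permutations(ai_by_prompt, 2):
        X_train = ai_by_prompt[train_p] + real_train
        y_train = [1] * len(ai_by_prompt[train_p]) + [0] * len(real_train)
        score = fit(X_train, y_train)

        X_test = ai_by_prompt[test_p] + real_test
        y_test = [1] * len(ai_by_prompt[test_p]) + [0] * len(real_test)
        results[(train_p, test_p)] = auc(y_test, [score(x) for x in X_test])
    return results


# Synthetic two-feature vectors, invented for this demo (not the study's data).
rng = random.Random(0)


def sample(center, n=50):
    return [[rng.gauss(c, 0.05) for c in center] for _ in range(n)]


ai_by_prompt = {
    "prompt_a": sample([0.70, 0.20]),
    "prompt_b": sample([0.75, 0.25]),  # shifted slightly to mimic prompt variation
    "prompt_c": sample([0.65, 0.15]),
}
real_train = sample([0.40, 0.60])
real_test = sample([0.40, 0.60])

results = cross_prompt_auc(ai_by_prompt, real_train, real_test)
for pair, value in sorted(results.items()):
    print(pair, round(value, 3))
```

The point of the design is that the test prompt never contributes AI articles to training, so a high AUC on a pair reflects generalization to an unseen prompt rather than memorization of one generation setting.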
