Hitting a Moving Target: Test-Time Adaptation for AI Text Detection under Continual Distribution Shift
Quick Answer
The proposed test-time adaptation (TTA) approach makes AI text detection more robust to distribution shift: it detects 90.5% of the authors' adversarial AI-generated text, compared with 24.1% for the commercial detector Pangram.
Quick Take
The TTA method exploits inference-time homogeneity, adapting through semi-supervised learning on the unlabeled samples observed at inference time. This targets a weakness of existing supervised detectors, which fail when human or AI-generated writing shifts after deployment. The code is publicly available for further research.
Key Points
- Test-time adaptation (TTA) adapts the detector using unlabeled samples observed at inference time.
- State-of-the-art supervised detectors fail systematically under both adversarial and natural distribution shifts.
- Pangram detects only 24.1% of the authors' adversarial AI-generated text.
- TTA reaches a 90.5% detection rate on the same text.
- Code for TTA is available on GitHub for public use.
Article Content
arXiv:2606.25152v1
Abstract: Deployed approaches for AI text detection often rely on training-time access to labeled datasets of both human-written and AI-generated text. This approach is vulnerable to three types of distribution shifts that occur continually post-deployment, and for which labeled data is often unavailable: adversarial humanization, new LLMs being released, and temporal drift in human writing.
Simultaneously, existing approaches do not leverage a key signal of LLM usage: inference-time homogeneity. We propose a test-time adaptation (TTA) approach, using semi-supervised learning, that adapts to distribution shifts by leveraging homogeneity among unlabeled samples observed at inference time.
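To make the idea of adapting on unlabeled inference-time samples concrete, here is a minimal sketch using nearest-centroid self-training (pseudo-labeling) on a single scalar feature. It is a stand-in chosen for brevity, not the method from the paper: the scalar "AI-likeness" score, the toy values, and the names `fit_centroids`, `predict`, and `adapt` are all assumptions made for illustration.

```python
from statistics import mean

def fit_centroids(scores, labels):
    """Return (human_mean, ai_mean) of a scalar score; labels are 0 (human) or 1 (AI)."""
    human = [s for s, y in zip(scores, labels) if y == 0]
    ai = [s for s, y in zip(scores, labels) if y == 1]
    return mean(human), mean(ai)

def predict(scores, centroids):
    """Label each score 1 (AI) if it is closer to the AI centroid, else 0 (human)."""
    c_human, c_ai = centroids
    return [1 if abs(s - c_ai) < abs(s - c_human) else 0 for s in scores]

def adapt(unlabeled, centroids, rounds=10):
    """Self-training: pseudo-label the unlabeled batch, refit centroids, repeat."""
    for _ in range(rounds):
        pseudo = predict(unlabeled, centroids)
        if len(set(pseudo)) < 2:
            break  # only one class predicted, nothing to adapt toward
        updated = fit_centroids(unlabeled, pseudo)
        if updated == centroids:
            break  # pseudo-labels have stopped changing
        centroids = updated
    return centroids

# Labeled training-time data (hypothetical scores).
source_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
source_labels = [0, 0, 0, 1, 1, 1]
static = fit_centroids(source_scores, source_labels)

# Unlabeled inference-time batch after a shift: the last four items are
# "humanized" AI texts whose scores have drifted toward the human range.
batch = [0.05, 0.10, 0.15, 0.12, 0.40, 0.45, 0.52, 0.55]

before = predict(batch, static)
adapted = adapt(batch, static)
after = predict(batch, adapted)

print(before)  # [0, 0, 0, 0, 0, 0, 1, 1]
print(after)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy batch the static detector flags only the two highest-scoring AI texts, while the adapted centroids separate the two clusters. Self-training of this kind depends on the initial detector getting at least some samples of each class right; if it labels the whole batch as human, the loop stops without adapting.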
Empirically, we find that state-of-the-art supervised detectors systematically fail when they encounter distribution shifts in AI-generated and human writing, both adversarial and natural, while test-time adaptation with semi-supervised learning is largely robust; e.g., the commercial model Pangram detects just 24.1% of our adversarial AI-generated text, compared to 90.5% for our test-time approach. We establish that test-time adaptation is a promising framework for AI text detection in the wild.
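A detection rate like the ones quoted above is usually read as the fraction of AI-generated texts that the detector flags as AI, that is, recall on the AI class. The sketch below computes it under that assumption. The abstract does not state the decision threshold or any false-positive constraint, and the labels and predictions here are toy values, not data from the paper; `detection_rate` is a hypothetical helper name.

```python
def detection_rate(predictions, labels):
    """Fraction of AI-generated texts (label 1) that were flagged as AI (prediction 1)."""
    on_ai_texts = [p for p, y in zip(predictions, labels) if y == 1]
    if not on_ai_texts:
        raise ValueError("no AI-generated texts in the evaluation set")
    return sum(on_ai_texts) / len(on_ai_texts)

# Toy evaluation set: four AI-generated texts followed by two human-written ones.
toy_labels = [1, 1, 1, 1, 0, 0]
toy_static_preds = [1, 0, 0, 0, 0, 0]    # a detector that misses most AI texts
toy_adapted_preds = [1, 1, 1, 1, 0, 0]   # a detector that catches all of them

print(detection_rate(toy_static_preds, toy_labels))   # 0.25
print(detection_rate(toy_adapted_preds, toy_labels))  # 1.0
```

Predictions on human-written texts do not enter this number, so a detection rate should be read alongside a false-positive rate on human text.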
We publicly release our code (which includes code for model training, evaluation, and plots) at https://github.com/kkr36/llm_detection.