AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

arXiv cs.CL·Aria Nourbakhsh, Adelaide Danilov, Christoph Schommer, Salima Lamsiyah

6/2/2026

Quick Take

AEyeDE introduces an attention-based framework for detecting AI-generated text. It extracts attribution matrices from a proxy Transformer model and classifies them with a lightweight CNN. In encoder-decoder translation settings it consistently outperforms a text-only baseline. In decoder-only settings it performs strongly in generator-specific detection, remains competitive on standard benchmarks, and holds up under cross-dataset transfer. The attention maps provide a complementary, interpretable signal for human-AI authorship detection.

Key Points

AEyeDE classifies attention-based attribution matrices from a proxy Transformer with a lightweight CNN.
Consistently outperforms a text-only baseline in encoder-decoder translation settings.
Performs strongly in generator-specific detection in decoder-only settings and stays competitive on standard benchmarks.
Shows robustness under cross-dataset transfer and alternative-spelling perturbations.
Attention maps contain recurring local structures whose relative frequencies differ consistently between human and AI text.

Article Content

From source RSS / original summary

arXiv:2606.00016v1 (Announce Type: new)

Abstract: Detecting AI-generated text is becoming increasingly challenging as modern language models approach human-level fluency and can evade detectors that rely on surface statistics or likelihood-based signals. We propose AEyeDE, an attribution-driven approach to human-AI authorship detection that leverages model attention as a discriminative signal.

Specifically, we extract attention-based attribution matrices for both human- and AI-generated text using a proxy Transformer model with white-box access and train a lightweight convolutional neural network (CNN) to learn representations from these attribution maps. Across encoder-decoder translation settings, our method consistently outperforms a text-only baseline.
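The pipeline described in the abstract has two stages: an attention matrix from a proxy Transformer, then a CNN over that matrix. The following is a minimal sketch of both stages in miniature, under loud assumptions. It computes a single-head scaled dot-product attention matrix from toy query/key vectors (the abstract does not say how AEyeDE aggregates heads and layers into its attribution matrices), and it computes one CNN-style feature (convolution, ReLU, global max pooling) with a hand-picked kernel rather than learned weights. All function names are hypothetical. With a real white-box proxy model, the Hugging Face transformers library returns per-layer, per-head attention tensors when the model is called with output_attentions=True.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    # Hypothetical helper: single-head attention weights softmax(Q K^T / sqrt(d)),
    # normalized row by row. Real proxy models have many heads and layers.
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def conv2d_valid(matrix, kernel):
    # Cross-correlation with no padding ("convolution" in most CNN libraries).
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for i in range(rows - kh + 1):
        out_row = []
        for j in range(cols - kw + 1):
            s = sum(kernel[a][b] * matrix[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            out_row.append(s)
        out.append(out_row)
    return out


def conv_feature(matrix, kernel):
    # One CNN-style feature: convolution, then ReLU, then global max pooling.
    # A trained CNN would learn many kernels; this one is fixed by hand.
    fmap = conv2d_valid(matrix, kernel)
    return max(max(0.0, v) for row in fmap for v in row)


if __name__ == "__main__":
    # Toy 2-dimensional vectors for three tokens, used as both queries and keys.
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    A = attention_matrix(Q, Q)
    for row in A:
        print([round(v, 3) for v in row])
    # A diagonal kernel responds to attention concentrated along diagonal bands.
    print(conv_feature(A, [[1.0, 0.0], [0.0, 1.0]]))
```

Each row of the resulting matrix sums to 1, so every token distributes a unit of attention over the sequence. The fixed diagonal kernel stands in for one of the many filters a trained CNN would learn from labeled human and AI attribution maps.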

In decoder-only settings, it performs strongly in generator-specific detection, remains competitive on standard benchmarks, and shows robustness under cross-dataset transfer and alternative-spelling perturbations. We further show that attention maps exhibit recurring local structures whose relative frequencies differ consistently between human- and AI-generated text across datasets and proxy models.
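The abstract does not define what counts as a recurring local structure. One hypothetical way to make the idea concrete is to binarize an attention map at a threshold, slide a small window over it, and count how often each binary patch pattern occurs. The following sketch does this for 2 x 2 patches; the threshold, patch size, and function names are illustrative assumptions, not AEyeDE's method.

```python
from collections import Counter


def binarize(matrix, threshold):
    # Mark cells whose attention weight meets the (assumed) threshold.
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]


def patch_pattern_frequencies(matrix, threshold=0.5, size=2):
    # Hypothetical helper: relative frequency of each distinct size x size
    # binary patch in the thresholded attention map.
    binary = binarize(matrix, threshold)
    rows, cols = len(binary), len(binary[0])
    counts = Counter()
    for i in range(rows - size + 1):
        for j in range(cols - size + 1):
            patch = tuple(tuple(binary[i + a][j:j + size]) for a in range(size))
            counts[patch] += 1
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for pattern, freq in patch_pattern_frequencies(identity).items():
        print(pattern, freq)
```

For a 3 x 3 identity map, the diagonal pattern ((1, 0), (0, 1)) accounts for half of the four patches. Averaging such frequencies over many maps from human-written and AI-generated texts, then comparing the two distributions, would be one way to check whether pattern frequencies differ consistently between the classes.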

These findings suggest that attention-based attribution maps provide a complementary and interpretable signal for AI-generated text detection. We will make the code publicly available to support future research.

