Perception, Verdict, and Evolution: Hindsight-Driven Self-Refining Forensics Agent for AI-Generated Image Detection

arXiv cs.CV·Yangjun Wu, Keyu Yan, Yu Liu, Jingren Zhou, Fei Huang, Rong Zhang, Zhou Zhao, Fei Wu

Quick Answer

ForeAgent is an agentic forensics framework for AI-generated image detection. It reaches 82.18% accuracy on the Chameleon benchmark, 16.41 percentage points above AIDE.

Quick Take

ForeAgent pairs a Perception-Verdict architecture, which fuses semantic, spatial, and frequency-domain cues, with a Hindsight-Driven Self-Refining strategy for continual self-improvement. In an external evaluation, its reasoning is more consistent and causally grounded than that of GPT-5 and GPT-5-mini.

Key Points

ForeAgent uses a Perception-Verdict architecture that aggregates semantic, spatial, and frequency-domain cues, with an MLLM as the verdict module.
The framework achieves 93.3% mean accuracy on the AIGCDetect-Benchmark across 16 generators.
Guided by ground-truth labels as hindsight, the agent reflects on failure cases and low-quality reasoning trajectories to regenerate better reasoning traces.
A dual-expert quality gating module filters the regenerated samples, and ForeAgent is fine-tuned on those that pass.
The authors report state-of-the-art accuracy on the Chameleon benchmark.
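
A minimal sketch of how a Perception-Verdict pipeline might be wired together, for illustration only. Every name here (Cue, perceive, build_verdict_prompt, detect, the stub tools, and stub_verdict) is hypothetical and not taken from the paper; the real perception modules and the MLLM verdict call are replaced with stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cue:
    view: str      # "semantic", "spatial", or "frequency"
    finding: str   # short textual evidence handed to the verdict module


# Hypothetical type: a perception tool maps an image path to one cue.
PerceptionTool = Callable[[str], Cue]


def perceive(image_path: str, tools: Dict[str, PerceptionTool]) -> List[Cue]:
    """Run every perception tool and collect one cue per view."""
    return [tool(image_path) for tool in tools.values()]


def build_verdict_prompt(cues: List[Cue]) -> str:
    """Serialize the cues into a single prompt for the verdict model."""
    evidence = "\n".join(f"[{c.view}] {c.finding}" for c in cues)
    return f"Evidence:\n{evidence}\nIs this image real or AI-generated? Explain."


def detect(image_path: str, tools: Dict[str, PerceptionTool],
           verdict_model: Callable[[str], str]) -> str:
    """Perception first, then a single fused verdict."""
    return verdict_model(build_verdict_prompt(perceive(image_path, tools)))


# Stub tools and a stub verdict model (assumptions, not the authors' modules).
tools = {
    "semantic": lambda p: Cue("semantic", "six fingers on the left hand"),
    "spatial": lambda p: Cue("spatial", "inconsistent shadow direction"),
    "frequency": lambda p: Cue("frequency", "periodic peaks in the spectrum"),
}


def stub_verdict(prompt: str) -> str:
    # Placeholder for an MLLM call: flags the image if any evidence line exists.
    return "fake" if "[" in prompt else "real"


prompt = build_verdict_prompt(perceive("example.png", tools))
verdict = detect("example.png", tools, stub_verdict)
```

Keeping the cues as text means the verdict model sees all three views in one prompt, which is one plausible reading of "fusing" the signals; the paper may do this differently.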

[Submitted on 25 Jun 2026]

Abstract: The rapid advancement of generative models presents a significant challenge to existing deepfake detection methods, particularly given the widespread dissemination of highly realistic AI-generated images. Although Multimodal Large Language Models (MLLMs) show strong potential for this task, existing approaches suffer from two key limitations: insufficient sensitivity to fine-grained forensic artifacts and reliance on static synthetic supervision from frontier models, leading to limited flexibility and high cost. To address these issues, we propose ForeAgent, an agentic forensics framework for AI-generated image detection with iterative self-evolution. First, ForeAgent adopts a Perception-Verdict architecture that aggregates multi-view cues spanning semantic, spatial, and frequency-domain features, and leverages an MLLM as a verdict module to fuse these signals into a logically grounded verdict. Second, to enable continual self-improvement, we introduce a Hindsight-Driven Self-Refining strategy following a Sampling-Reflection-Evolution paradigm. The agent performs inference rollouts on training instances. Guided by ground-truth labels as hindsight, it reflects on failure cases and low-quality reasoning trajectories to regenerate higher-quality reasoning traces. These synthesized samples are then strictly filtered through a dual-expert quality gating module. ForeAgent continuously evolves via fine-tuning on self-curated high-quality samples. Extensive experiments demonstrate that ForeAgent achieves state-of-the-art performance on the Chameleon benchmark, reaching 82.18% accuracy (+16.41% over AIDE), and achieves 93.3% mean accuracy on AIGCDetect-Benchmark across 16 generators. In addition, external evaluation shows that ForeAgent produces more consistent and causally grounded reasoning than GPT-5 and GPT-5-mini.
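
The Sampling-Reflection-Evolution paradigm described in the abstract can be sketched as one data-curation round. This is a rough illustration under stated assumptions, not the authors' implementation: Trace, self_refine_round, and all callables (rollout, is_low_quality, reflect, expert_a, expert_b) are hypothetical names, the stubs stand in for model calls, and the fine-tuning step itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trace:
    image_id: str
    reasoning: str
    prediction: str  # "real" or "fake"


def self_refine_round(
    dataset: List[Tuple[str, str]],              # (image_id, ground-truth label)
    rollout: Callable[[str], Trace],             # Sampling: agent inference
    is_low_quality: Callable[[Trace], bool],     # hypothetical quality check
    reflect: Callable[[Trace, str], Trace],      # Reflection with hindsight label
    expert_a: Callable[[Trace], bool],           # first quality gate
    expert_b: Callable[[Trace], bool],           # second quality gate
) -> List[Trace]:
    """Return the curated traces that a later fine-tuning step would consume."""
    curated = []
    for image_id, label in dataset:
        trace = rollout(image_id)
        # The label acts as hindsight: failures and weak traces get regenerated.
        if trace.prediction != label or is_low_quality(trace):
            trace = reflect(trace, label)
        # Dual-expert gating: keep a trace only if both experts accept it.
        if trace.prediction == label and expert_a(trace) and expert_b(trace):
            curated.append(trace)
    return curated


# Stubs for illustration only (assumptions, not real models).
def rollout(image_id: str) -> Trace:
    return Trace(image_id, "looks natural", "real")


def reflect(trace: Trace, label: str) -> Trace:
    return Trace(trace.image_id, f"hindsight: evidence indicates {label}", label)


dataset = [("a", "fake"), ("b", "real"), ("c", "fake")]
curated = self_refine_round(
    dataset,
    rollout,
    is_low_quality=lambda t: len(t.reasoning) < 5,
    reflect=reflect,
    expert_a=lambda t: t.reasoning != "",
    expert_b=lambda t: t.image_id != "c",   # stub expert that rejects one sample
)
```

In this toy run, "a" is corrected by reflection and kept, "b" is kept as sampled, and "c" is corrected but rejected by the second stub expert. Evolution would then mean fine-tuning the agent on the curated list and repeating the round.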

Comments:	10 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2606.26552 [cs.CV]
	(or arXiv:2606.26552v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.26552 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yangjun Wu
[v1] Thu, 25 Jun 2026 02:59:33 UTC (534 KB)

— Originally published at arxiv.org
