Inline Critic Steers Image Editing
Quick Take
Inline Critic enhances image editing by refining model predictions during the forward pass.
Key Points
- Refinement occurs where models struggle during image editing.
- Introduces a learnable token for intermediate critique.
- Achieves state-of-the-art results on multiple benchmarks.
Abstract: Instruction-based image editing exhibits heterogeneous difficulty not only across cases but also across regions of an image, motivating refinement approaches that allocate correction where the model struggles. Existing refinement signals arrive late, after a fully generated image or a completed denoising step. We ask whether such a signal can act within an ongoing forward pass. To investigate this, we probe a frozen image-editing model and find that although generation capability emerges only in the last few layers, the error pattern is already set in early layers (rank correlation ρ = 0.83 with the final-layer error map). Based on this, we introduce Inline Critic, a learnable token that critiques a frozen model's predictions at its intermediate layers and steers its hidden states to refine generation during the forward pass. We propose a three-stage training recipe that stabilizes learning, progressing from how to critique to how to steer generation. As a result, we achieve state-of-the-art results on GEdit-Bench (7.89), a +9.4 gain on RISEBench over the same backbone, and the strongest open-source result on KRIS-Bench (81.92, surpassing GPT-4o). We further provide analyses showing that the critic genuinely shapes the model's attention and prediction updates at subsequent layers.
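The steering step described in the abstract can be pictured roughly as follows. This is a minimal NumPy sketch, not the paper's implementation: the function name, the single-head attention form, the `alpha` scale, and all weight shapes are assumptions made for illustration. The idea it captures is that a learnable critic vector attends over a frozen layer's hidden states and writes a correction back into them before the next layer runs.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def critic_steer(hidden, critic, W_q, W_k, W_v, alpha=0.1):
    """Hypothetical inline-critic step (illustrative only).

    hidden: (T, d) intermediate hidden states of a frozen layer
    critic: (d,)   learnable critic token
    Returns steered hidden states of the same shape.
    """
    q = critic @ W_q                          # critic's query, shape (d,)
    k = hidden @ W_k                          # keys over tokens, shape (T, d)
    v = hidden @ W_v                          # values over tokens, shape (T, d)
    attn = softmax(k @ q / np.sqrt(len(q)))   # (T,) per-token critique weights
    delta = attn[:, None] * v                 # per-token steering signal
    return hidden + alpha * delta             # frozen states nudged in place

# Toy usage with random weights standing in for learned parameters.
rng = np.random.default_rng(0)
T, d = 8, 16
hidden = rng.standard_normal((T, d))
critic = rng.standard_normal(d)
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
steered = critic_steer(hidden, critic, W_q, W_k, W_v)
print(steered.shape)
```

In the paper's setting only the critic token (and its small projections) would be trained, while the editing model's weights stay frozen; the sketch mirrors that by treating `hidden` as fixed input and applying a residual update.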
| Comments: | 9 pages |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.12724 [cs.CV] |
| (or arXiv:2605.12724v1 [cs.CV] for this version) | |
| DOI: | https://doi.org/10.48550/arXiv.2605.12724 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Weitai Kang
[v1]
Tue, 12 May 2026 20:29:26 UTC (7,833 KB)
— Originally published at arxiv.org