TaskTok: Delving into Task Tokens for Task-driven Image Restoration
Quick Answer
TaskTok is a framework for Task-Driven Image Restoration (TDIR) that refines only task-relevant latent tokens, improving downstream performance on image classification, semantic segmentation, and object detection at lower computational cost.
Quick Take
Task-relevant cues are unevenly distributed across the latent token sequence, so TaskTok restores only the tokens that matter for the downstream task and leaves the rest untouched. This avoids the cost, and the risk of semantic alteration, that comes with updating every latent token.
Key Points
- TaskTok selectively restores task-relevant tokens for improved performance.
- Refining a subset of tokens avoids the computational cost of updating all latent tokens.
- Extensive experiments validate TaskTok's effectiveness across multiple vision tasks.
- Source code is publicly available; the abstract links to it.
- Task-relevant cues are unevenly distributed across the token sequence and show index-wise specialization, which motivates the selective design.
Abstract: While traditional image restoration focuses on perceptual quality, Task-Driven Image Restoration (TDIR) aims to maximize the performance of downstream high-level vision tasks. Recent approaches leveraging generative priors have shown promise for TDIR; however, they typically suffer from computational inefficiency and potential semantic alteration by indiscriminately updating all latent tokens. In this paper, we posit that not all visual information is equally important for machine perception. Through an analysis of the latent token space, we observe that task-relevant cues are unevenly distributed across the token sequence, exhibiting index-wise specialization. This suggests that selectively refining a subset of tokens can be sufficient for task-driven objectives. Leveraging this insight, we propose TaskTok, a novel framework that selectively restores only task-relevant tokens via a learnable token switch and a lightweight token refinement module. Extensive experiments across image classification, semantic segmentation, and object detection demonstrate that TaskTok significantly enhances task performance with high computational efficiency. The source code is available at this https URL
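The selective-restoration idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the sigmoid-plus-threshold gate, and the residual `tanh` update are all hypothetical stand-ins chosen for illustration, and the gate logits are fixed by hand here, whereas the abstract describes the token switch as learnable. The only point it shows is the control flow: a per-index switch picks a subset of tokens, a lightweight module updates those, and every other token passes through unchanged.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_indices(switch_logits, threshold=0.5):
    """Return the token indices whose gate probability exceeds the threshold.

    Assumption: one logit per token index, mirroring the index-wise
    specialization described in the abstract.
    """
    return [i for i, logit in enumerate(switch_logits) if sigmoid(logit) > threshold]


def refine_token(token, scale=0.5):
    """Hypothetical stand-in for a lightweight refinement module (residual update)."""
    return [v + scale * math.tanh(v) for v in token]


def selective_restore(tokens, switch_logits, threshold=0.5):
    """Refine only the gated tokens; copy every other token through unchanged."""
    selected = set(select_indices(switch_logits, threshold))
    return [refine_token(t) if i in selected else list(t) for i, t in enumerate(tokens)]


# Four toy latent tokens of dimension 2 and hand-picked (not learned) gate logits.
tokens = [[0.2, -0.1], [1.0, 0.5], [-0.3, 0.8], [0.0, 0.4]]
switch_logits = [-2.0, 3.0, -1.5, 2.0]

restored = selective_restore(tokens, switch_logits)
updated = [i for i, (old, new) in enumerate(zip(tokens, restored)) if old != new]
print(updated)  # prints [1, 3]
```

In this sketch only two of the four tokens are touched, so the refinement cost scales with the number of selected indices rather than with the full sequence length.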
| Comments: | ECCV 2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV) |
| Cite as: | arXiv:2606.26615 [cs.CV] (or arXiv:2606.26615v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.26615 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Hongjae Lee
[v1] Thu, 25 Jun 2026 05:20:01 UTC (42,275 KB)
— Originally published at arxiv.org