ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

arXiv cs.CV·Jiangtao Kong, Peijun Zhao, Chun-Fu Chen, Youngwook Do, Shaohan Hu, Tianyi Zhou, Huajie Shao

6/12/2026

Quick Answer

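This paper proposes Efficient Continual Alignment (ECA), an exemplar-free incremental learning method for Open-ended Image-to-Text Generation (OpenITG) that lets a pre-trained vision-language model keep its cross-modal alignment as the predominant category of visual data shifts over time, while mitigating catastrophic forgetting.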

Quick Take

ECA incrementally adapts the alignment module of a pre-trained VLM using three mechanisms: a Mixture of Query (MoQ) module, Fisher Dynamic Expansion (FeDEx), and Dictionary Replay (DR). Together they retain past knowledge without accessing raw data from previous tasks. On four newly constructed IL OpenITG benchmarks, the authors report that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods.

Key Points

ECA introduces continual alignment: incrementally adapting the alignment module of a pre-trained VLM as the predominant category of visual data shifts over time.
It combines three mechanisms: Mixture of Query (MoQ), Fisher Dynamic Expansion (FeDEx), and Dictionary Replay (DR).
The approach is exemplar-free: it mitigates catastrophic forgetting without accessing raw data from previous tasks.
Four new IL OpenITG benchmarks are constructed to better reflect real-world scenarios.
Code and benchmarks are available at https://github.com/Snowball0823/ECA.

Article Content

From source RSS / original summary

arXiv:2606.12633v1

Abstract: Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve.

In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks.

To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios.
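The abstract does not spell out the FIM-based metric that FeDEx uses, so the following is only a minimal sketch of the general idea, not the authors' method. It estimates an empirical diagonal Fisher Information Matrix for a toy logistic-regression model and applies a hypothetical expansion rule. The function names `diagonal_fisher` and `should_expand`, the toy model, and both thresholds are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights, samples):
    """Empirical diagonal Fisher information for logistic regression.

    For a sample (x, y), the gradient of log p(y | x, w) with respect to
    w_i is (y - sigmoid(w . x)) * x_i. The diagonal Fisher estimate is the
    mean of the squared per-sample gradients.
    """
    fisher = [0.0] * len(weights)
    for x, y in samples:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            grad = (y - p) * xi
            fisher[i] += grad * grad
    return [f / len(samples) for f in fisher]


def should_expand(fisher, importance_threshold, max_important_fraction):
    """Hypothetical expansion rule (an assumption, not taken from the paper).

    A parameter counts as important to earlier tasks when its Fisher value
    exceeds importance_threshold. If too large a fraction of parameters is
    important, little capacity is left for a new task, so the rule votes
    for adding new structure.
    """
    important = sum(1 for f in fisher if f > importance_threshold)
    return important / len(fisher) > max_important_fraction


# Toy data: only the first feature ever varies, so only the first weight
# receives a non-zero Fisher value.
toy_samples = [([1.0, 0.0], 1), ([1.0, 0.0], 0)]
toy_fisher = diagonal_fisher([0.0, 0.0], toy_samples)
print(toy_fisher)
print(should_expand(toy_fisher, importance_threshold=0.1, max_important_fraction=0.4))
```

In a real VLM the same quantity would typically be accumulated from squared gradients of the alignment module's parameters; how ECA turns it into a structural expansion decision is described in the paper and the linked repository, not here.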

Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.
