FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling
Quick Answer
FreeStory introduces a training-free framework for visual storytelling that enhances character consistency without structured prompts.
Quick Take
Using entity-grounded feature reuse, FreeStory achieves state-of-the-art results among training-free methods on structured benchmarks and stronger overall consistency than baselines under free-form prompts. The accompanying benchmark, FreeStoryBench, covers both single- and multi-character stories.
Key Points
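- FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks.
- It reformulates character consistency under free-form prompts as entity-grounded feature reuse, combining dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending.
- The framework supports natural storytelling, where a character is introduced once and later referred to by a pronoun or type-based expression, instead of repeating the full character description in every prompt.
- The FreeStoryBench benchmark includes both single- and multi-character stories.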
- Experiments show improved consistency over baselines under free-form prompts.
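The masked feature matching mentioned in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `match_features`, the use of cosine similarity, and the flat token layout are all assumptions made for illustration. It only shows the general idea of pairing each token of a new image with its most similar token inside a reference character mask.

```python
import numpy as np

def match_features(ref_feats, tgt_feats, ref_mask):
    """Pair each target token with its most similar masked reference token.

    ref_feats: (N_ref, D) features from the reference image (hypothetical layout)
    tgt_feats: (N_tgt, D) features from the image being generated
    ref_mask:  (N_ref,) boolean character mask; must contain at least one True
    Returns (indices into ref_feats, cosine similarity of each match).
    """
    # L2-normalise so the dot product below is a cosine similarity.
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-8)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + 1e-8)
    sim = tgt @ ref.T  # shape (N_tgt, N_ref)
    # Reference tokens outside the character mask can never be selected.
    sim = np.where(ref_mask[None, :], sim, -np.inf)
    idx = sim.argmax(axis=1)
    best = sim[np.arange(tgt.shape[0]), idx]
    return idx, best

# Toy example: three reference tokens, the third one lies outside the mask.
ref_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ref_mask = np.array([True, True, False])
tgt_feats = np.array([[2.0, 0.0], [0.0, 3.0], [2.0, 1.0]])
idx, best = match_features(ref_feats, tgt_feats, ref_mask)
```

In the toy example the last target token is closest to the third reference token, but the mask excludes that token, so the match falls back to the first one. Restricting matches to the mask is what keeps background features from being treated as part of the character.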
Abstract: Visual storytelling aims to generate image sequences that are both aligned with narrative prompts and consistent in character appearance across images. Recent training-free methods improve character consistency by reusing attention features, but rely on structured prompts where full character descriptions are repeated in every prompt. This assumption simplifies the task but deviates from natural storytelling, where characters are typically introduced once and later referred to using pronouns or type-based expressions. We propose FreeStory, a training-free framework that reformulates character consistency under free-form prompts as entity-grounded feature reuse. Our method associates reference mentions with their corresponding character descriptions and combines dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending to preserve identity while retaining generation diversity. We also introduce FreeStoryBench, a benchmark for this setting that includes both single- and multi-character stories. Experiments show that FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency over baselines under free-form prompts.
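As a rough illustration of how key-value injection and query blending might fit together in an attention layer, here is a minimal NumPy sketch. It is one plausible reading of the abstract, not the paper's method: the function `attention_with_reuse`, the blending weight `alpha`, the single-head layout, and the choice to blend queries only inside a target character mask are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_reuse(q_tgt, k_tgt, v_tgt, q_ref, k_ref, v_ref,
                         ref_mask, match_idx, tgt_mask, alpha=0.5):
    """Single-head attention for a target image that reuses reference features.

    q_*, k_*, v_*: (N, D) arrays for the target and reference images
    ref_mask:  (N_ref,) bool, reference tokens that belong to the character
    tgt_mask:  (N_tgt,) bool, target tokens that belong to the character
    match_idx: (N_tgt,) int, matched reference token for each target token
    alpha:     hypothetical blending weight (0 keeps the target query as-is)
    """
    # Query blending (assumed form): inside the target character mask,
    # mix each query with the query of its matched reference token.
    q = q_tgt.copy()
    q[tgt_mask] = (1 - alpha) * q_tgt[tgt_mask] + alpha * q_ref[match_idx[tgt_mask]]
    # Key-value injection (assumed form): append the masked reference keys
    # and values so target tokens can attend to the reference character.
    k = np.concatenate([k_tgt, k_ref[ref_mask]], axis=0)
    v = np.concatenate([v_tgt, v_ref[ref_mask]], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Toy usage with random features.
rng = np.random.default_rng(0)
N, M, D = 5, 4, 8
q_t, k_t, v_t = (rng.normal(size=(N, D)) for _ in range(3))
q_r, k_r, v_r = (rng.normal(size=(M, D)) for _ in range(3))
match_idx = np.zeros(N, dtype=int)
tgt_mask = np.array([True, True, False, False, False])
ref_mask = np.array([True, False, True, False])
out = attention_with_reuse(q_t, k_t, v_t, q_r, k_r, v_r, ref_mask, match_idx, tgt_mask)
```

With `alpha=0` and an empty reference mask the function reduces to ordinary self-attention, which makes the two reuse mechanisms easy to ablate separately in this sketch.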
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2606.25079 [cs.CV] (or arXiv:2606.25079v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2606.25079 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Sibo Dong
[v1] Tue, 23 Jun 2026 18:37:31 UTC (16,592 KB)
Originally published at arxiv.org