FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling

arXiv cs.CV·Sibo Dong, Ismail Shaheen, Sarah Adel Bargal

6/25/2026

Quick Take

FreeStory is a training-free framework for visual storytelling that keeps characters consistent across images under free-form prompts, where a character is described once and later referred to by pronouns or type-based expressions. It treats consistency as entity-grounded feature reuse, achieving state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency than baselines under free-form prompts. The accompanying benchmark, FreeStoryBench, covers both single- and multi-character stories.

Key Points

FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks.
It reformulates character consistency under free-form prompts as entity-grounded feature reuse, combining dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending.
The framework supports natural storytelling, where characters are introduced once rather than fully described in every prompt.
The FreeStoryBench benchmark includes both single- and multi-character stories.
Experiments show stronger overall consistency than baselines under free-form prompts.
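
To make the free-form setting concrete, here is a toy sketch of linking later mentions (pronouns or type words) back to a character that was described only once. It is a simple keyword heuristic written for illustration, not the grounding procedure used in the paper; the `Character` class, the `ground_mentions` and `expand_story` helpers, and the example story are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Character:
    description: str  # full description, stated once in the story
    type_words: set   # type-based expressions, e.g. {"girl"}
    pronouns: set     # pronouns that may refer to this character


def ground_mentions(prompt, characters):
    """Return indices of characters whose type words or pronouns appear in the prompt."""
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    return [
        i for i, c in enumerate(characters)
        if tokens & (c.type_words | c.pronouns)
    ]


def expand_story(prompts, characters):
    """Pair each free-form prompt with the descriptions of the characters it mentions."""
    return [
        (p, [characters[i].description for i in ground_mentions(p, characters)])
        for p in prompts
    ]


# Hypothetical two-character story: descriptions appear only in the first prompt.
characters = [
    Character("a young girl with red hair and a green coat", {"girl"}, {"she", "her"}),
    Character("a small brown dog with a blue collar", {"dog"}, {"it"}),
]
prompts = [
    "A young girl with red hair and a green coat walks with a small brown dog with a blue collar.",
    "She reads a book under a tree.",
    "The dog chases a ball.",
    "The girl hugs it.",
]

for prompt, descriptions in expand_story(prompts, characters):
    print(prompt, "->", descriptions)
```

A word-level heuristic like this breaks on ambiguous pronouns (for example "it" in "it is raining"), which is part of why free-form prompts are harder than structured prompts that repeat every description.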

[Submitted on 23 Jun 2026]

Abstract: Visual storytelling aims to generate image sequences that are both aligned with narrative prompts and consistent in character appearance across images. Recent training-free methods improve character consistency by reusing attention features, but rely on structured prompts where full character descriptions are repeated in every prompt. This assumption simplifies the task but deviates from natural storytelling, where characters are typically introduced once and later referred to using pronouns or type-based expressions. We propose FreeStory, a training-free framework that reformulates character consistency under free-form prompts as entity-grounded feature reuse. Our method associates reference mentions with their corresponding character descriptions and combines dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending to preserve identity while retaining generation diversity. We also introduce FreeStoryBench, a benchmark for this setting that includes both single- and multi-character stories. Experiments show that FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency over baselines under free-form prompts.
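
The abstract names four ingredients: character masks, correspondence-aware feature matching, key-value injection, and query blending. The sketch below shows one generic way such pieces can fit around scaled dot-product attention, using plain Python lists. It is an illustration under stated assumptions, not the paper's implementation: the function names, the cosine-similarity matching rule, the fixed `alpha` blend weight, and the hand-written masks are all hypothetical, and the same toy features stand in for queries, keys, and values, which a real model would compute with separate projections.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def match_tokens(target_feats, ref_feats, ref_mask):
    """For each target token, pick the most similar reference token inside the mask."""
    candidates = [i for i, keep in enumerate(ref_mask) if keep]
    if not candidates:
        return [None] * len(target_feats)
    return [max(candidates, key=lambda i: cosine(t, ref_feats[i])) for t in target_feats]


def blend_queries(q_target, q_ref, matches, target_mask, alpha):
    """Inside the target mask, mix each query with its matched reference query."""
    blended = []
    for q, j, inside in zip(q_target, matches, target_mask):
        if inside and j is not None:
            q = [(1 - alpha) * a + alpha * b for a, b in zip(q, q_ref[j])]
        blended.append(list(q))
    return blended


def inject_kv(k_target, v_target, k_ref, v_ref, ref_mask):
    """Append the masked reference keys and values to the target's own."""
    keys = k_target + [k for k, keep in zip(k_ref, ref_mask) if keep]
    values = v_target + [v for v, keep in zip(v_ref, ref_mask) if keep]
    return keys, values


# Toy features: 3 reference tokens and 2 target tokens, 2 dimensions each.
ref = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ref_mask = [True, False, False]   # assumed: only token 0 belongs to the character
tgt = [[0.9, 0.1], [0.1, 0.9]]
tgt_mask = [True, False]          # assumed: only target token 0 shows the character

matches = match_tokens(tgt, ref, ref_mask)
q = blend_queries(tgt, ref, matches, tgt_mask, alpha=0.5)
keys, values = inject_kv(tgt, tgt, ref, ref, ref_mask)
out = attention(q, keys, values)
print(matches, q, out)
```

In this toy setup, the masks restrict reuse to character regions so that background tokens keep their own features, which is one plausible reading of how identity can be preserved without collapsing generation diversity.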

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.25079 [cs.CV]
	(or arXiv:2606.25079v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.25079 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Sibo Dong
[v1] Tue, 23 Jun 2026 18:37:31 UTC (16,592 KB)

— Originally published at arxiv.org
