Navigating User Behavior toward Personalized Multimodal Generation

arXiv cs.AI·Hengji Zhou, Yufeng Liu, Ye Liu, Yong Xu, Lianghao Xia, Liqiang Nie

6/24/2026 · ~2 min read

Quick Take

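NaviGen improves personalized multimodal content generation by turning a user's interaction history into an executable creation instruction for downstream image or video synthesis. It tackles two challenges: encoding behavior in a form a language model can reason over, and teaching the model to write instructions, a skill absent from both pretraining and behavior data. Across product, game, and short-video domains, it improves personalized image and video generation and produces instructions that are more specific, relevant, and visually generatable.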

Key Points

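Represents each item with a dual identifier that couples a collaborative code (behavioral substrate) and a textual code (semantic bridge) in one token stream.
Uses a two-stage SFT+RL pipeline: SFT distills preference reasoning and instruction writing from evolutionarily searched supervision, then RL aligns generation with user intent through hierarchical and self-consistent rewards.
Strengthens next-item prediction across product, game, and short-video domains.
Produces more specific, relevant, and visually generatable instructions.
Code is available on GitHub for further research and development.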
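The summary does not say how the collaborative code is built. A common approach in generative recommendation, assumed here rather than taken from the paper, is residual quantization: a behavioral item embedding is mapped to one codebook index per level, with each level encoding what the previous levels left over. The minimal Python sketch below assumes the codebooks are already trained (for example by k-means); `residual_quantize` and the toy codebook values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Map a collaborative embedding to one code index per codebook level.

    Hypothetical illustration of residual quantization; codebooks are
    assumed to be pre-trained and are not learned here.
    """
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        # Pick the centroid nearest to the current residual.
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        codes.append(idx)
        # Subtract the chosen centroid; the next level encodes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


if __name__ == "__main__":
    # Toy two-level codebooks for 2-D embeddings (hypothetical values).
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]],
        [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],
    ]
    print(residual_quantize([1.4, 1.1], codebooks))  # [1, 1]
```

Coarse levels capture broad behavioral clusters and finer levels refine them, so items that users interact with in similar ways tend to share code prefixes. Whether NaviGen uses this scheme is not stated in the summary.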

Paper Resources

[Submitted on 23 Jun 2026]

Abstract: Modern AIGC pipelines deliver high-fidelity images and videos but presuppose a well-formed creation instruction, while end users rarely articulate visual details, leaving generators misaligned with user demand. We study personalized content generation, which turns a user's interaction history into an executable instruction for downstream synthesis, and identify two obstacles: behavior must be encoded in a form legible to language reasoning, and the model must acquire instruction-writing skill absent from both pretraining and behavior data. We propose NaviGen, which represents each item with a dual identifier coupling a collaborative code and a textual code as a behavioral substrate and a semantic bridge in one token stream. On this representation, a two-stage SFT+RL pipeline first distills preference reasoning and instruction writing from evolutionarily searched supervision, then aligns generation with user intent through hierarchical and self-consistent rewards. Experiments across product, game, and short-video domains show that NaviGen improves personalized image and video generation, strengthens next-item prediction, and yields more specific, relevant, and visually generatable instructions. Our code is anonymously released at: this https URL.
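The abstract describes each item as a dual identifier that places a collaborative code and a textual code in one token stream, but it does not give the token format. The following minimal Python sketch assumes one made-up special token per collaborative code level (such as `<c0_12>`) followed by the item's title wrapped in `<t>` and `</t>` delimiters. `DualID`, `history_to_stream`, and every token name are hypothetical illustrations, not NaviGen's actual vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DualID:
    """Hypothetical dual identifier for one item."""
    collab_code: List[int]  # assumed: discrete indices derived from behavioral data
    text_code: str          # assumed: short readable description, e.g. a title

    def to_tokens(self) -> List[str]:
        # Collaborative code: one made-up special token per code level.
        collab = [f"<c{level}_{idx}>" for level, idx in enumerate(self.collab_code)]
        # Textual code: wrapped in delimiters so it stays legible to a language model.
        return collab + ["<t>", self.text_code, "</t>"]


def history_to_stream(history: List[DualID]) -> str:
    """Serialize a user's interaction history into a single token stream."""
    return " ".join("<item>" + " ".join(item.to_tokens()) + "</item>" for item in history)


if __name__ == "__main__":
    history = [
        DualID([12, 5, 7], "red running shoes"),
        DualID([3, 0, 9], "trail backpack"),
    ]
    print(history_to_stream(history))
```

Keeping both codes inside each `<item>` span reflects how the abstract frames the design: the behavioral tokens and the readable text sit side by side, so a language model can reason over both in the same context when writing a creation instruction.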

Comments:	16 pages, 15 figures, 5 tables. Code is available at this https URL
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2606.24196 [cs.AI]
	(or arXiv:2606.24196v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2606.24196 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hengji Zhou
[v1] Tue, 23 Jun 2026 06:31:21 UTC (3,769 KB)

Originally published at arxiv.org

More from arXiv cs.AI

arXiv cs.AI·Binghai Wang, Chenlong Zhang, Dayiheng Liu, Jiajun Zhang, Jiawei Chen, Mouxiang Chen, Rongyao Fang, Siyuan Zhang, Xuwu Wang, Yuheng Jing, Zeyao Ma, Zeyu Cui

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

AI Summary

As coding agents evolve, verifying solutions becomes more challenging than generating them, necessitating a focus on scalable, faithful, and robust verification methods. The study reveals that no fixed reward function can sustain effectiveness as model capabilities advance, emphasizing the need for verification to evolve alongside solution generation.

#Agent #AI Coding #Inference #Policy