ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement

arXiv cs.CV·Jianping Ye, Michel Wedel


5/29/2026

Quick Take

ViASNet is a deep dynamic saliency prediction model, built on the 3D U-Net, that predicts where and when viewers look in short-form video ads while accounting for audio and scene semantics. Evaluated on 151 ads with eye-tracking data, it uses the frame-by-frame entropy of its predicted saliency maps to flag ads and scenes that fail to engage viewers, which the authors argue can considerably speed up ad design and testing.

Key Points

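ViASNet builds on the 3D U-Net architecture for dynamic saliency prediction and accounts for audio and scene semantics.
The model was evaluated on 151 video ads, each seen by about 20 viewers whose eye movements were tracked.
The frame-by-frame entropy of predicted saliency maps serves as a diagnostic for ads and scenes that fail to engage viewers, illustrated on 15 unseen test ads.
Automated systems built on deep saliency models such as ViASNet can considerably speed up ad design and testing.
Ablation experiments explore the critical factors influencing model performance.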

Article Content

From source RSS / original summary

arXiv:2605.29302v1 Announce Type: new

Abstract: The digital media landscape has seen a pervasive shift toward short-form video advertising on TV, social media and e-commerce platforms. The present study focuses on deep saliency prediction for short-form video advertising. Deep saliency models have been used to generate predictions of human eye fixation patterns with the purpose of enhancing user interaction with digital technology and optimizing its design.

For video ads, dynamic saliency maps capture where and when viewers are looking, revealing why video ads are effective, and how their content should be optimized. We develop and test a new deep dynamic saliency prediction model called ViASNet (Video Ad Saliency Network), which has an architecture founded on the 3D U-Net, and accommodates the influence of audio and the semantic meaning of scenes.
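To make the idea of a dynamic saliency map concrete, the sketch below turns raw eye fixations into a per-frame density volume of shape (frames, height, width). This is only an illustration of the kind of target a model like ViASNet would predict: placing a Gaussian at each fixation is a common convention in saliency research, and the abstract does not say whether the authors do the same. The function name `fixation_density`, the `(frame, x, y)` fixation format, and the `sigma` value are all assumptions made for this example.

```python
import numpy as np


def fixation_density(fixations, n_frames, height, width, sigma=2.0):
    """Build a (T, H, W) fixation density volume.

    fixations: iterable of (frame_index, x, y) tuples (hypothetical format).
    sigma: Gaussian spread in pixels (illustrative value, not from the paper).
    Each frame with at least one fixation is normalized to sum to 1.
    """
    volume = np.zeros((n_frames, height, width), dtype=np.float64)
    ys = np.arange(height)[:, None]  # column vector of row coordinates
    xs = np.arange(width)[None, :]   # row vector of column coordinates

    for t, x, y in fixations:
        t = int(t)
        if not 0 <= t < n_frames:
            continue  # ignore fixations outside the clip
        # Add an isotropic Gaussian centered on the fixation point.
        dist_sq = (xs - x) ** 2 + (ys - y) ** 2
        volume[t] += np.exp(-dist_sq / (2.0 * sigma ** 2))

    # Normalize each non-empty frame into a probability distribution.
    totals = volume.sum(axis=(1, 2), keepdims=True)
    np.divide(volume, totals, out=volume, where=totals > 0)
    return volume


if __name__ == "__main__":
    # Two viewers look at the same spot in frame 0; one looks at a corner in frame 1.
    demo = fixation_density([(0, 3, 2), (0, 3, 2), (1, 0, 0)], 3, 5, 8)
    print(demo.shape)
```

Frames without any fixation stay all-zero here, so downstream code should decide explicitly how to treat them.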

We assess the model's performance on 151 video ads, each seen by about 20 viewers while their eye movements were tracked, and explore the critical factors influencing model performance through ablation experiments. We calculate the entropy of the predicted saliency maps frame-by-frame as a diagnostic tool to identify ads and scenes that fail to engage viewers, and illustrate its use on test data of 15 unseen ads.
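The frame-by-frame entropy diagnostic can be sketched as follows, assuming each predicted saliency map is treated as a probability distribution over pixels and scored with Shannon entropy. The abstract does not give the exact formula, normalization, or any threshold the authors use, so the function name `frame_entropy`, the choice of base-2 logarithm, and the handling of empty frames are assumptions for illustration.

```python
import numpy as np


def frame_entropy(saliency):
    """Shannon entropy (in bits) of each frame of a (T, H, W) saliency volume.

    Each frame is normalized to sum to 1 before the entropy is computed.
    An all-zero frame is treated as uniform (an assumption of this sketch).
    """
    s = np.asarray(saliency, dtype=np.float64)
    if s.ndim != 3:
        raise ValueError("expected a (T, H, W) array")

    flat = np.clip(s, 0.0, None).reshape(s.shape[0], -1)
    n_pixels = flat.shape[1]
    totals = flat.sum(axis=1, keepdims=True)

    # Normalize non-empty frames; fall back to a uniform map for empty ones.
    safe_totals = np.where(totals > 0, totals, 1.0)
    p = np.where(totals > 0, flat / safe_totals, 1.0 / n_pixels)

    # Use the convention 0 * log(0) = 0 by replacing zeros with 1 inside the log.
    log_p = np.log2(np.where(p > 0, p, 1.0))
    return -(p * log_p).sum(axis=1)


if __name__ == "__main__":
    uniform = np.ones((4, 4))   # attention spread evenly over 16 pixels
    focused = np.zeros((4, 4))  # all attention on a single pixel
    focused[1, 2] = 1.0
    print(frame_entropy(np.stack([uniform, focused])))
```

A map spread evenly over the frame yields the maximum entropy (log2 of the pixel count), while a map concentrated on one point yields zero, so the per-frame series gives a simple measure of how dispersed predicted attention is across an ad.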

Our study reveals that ad design and testing can be sped up considerably through automated systems built on deep saliency models such as ViASNet.
