ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement
Quick Take
ViASNet, a deep dynamic saliency prediction model built on a 3D U-Net that also accounts for audio and scene semantics, predicts where and when viewers look in short-form video ads. Evaluated on 151 ads with eye-tracking data, it supports a frame-by-frame entropy diagnostic that flags scenes failing to engage viewers, and automated saliency mapping of this kind can speed up ad design and testing.
Key Points
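- ViASNet builds on the 3D U-Net architecture and incorporates audio and scene semantics for dynamic saliency prediction.
- Model tested on 151 video ads with eye-tracking from approximately 20 viewers each.
- Frame-by-frame entropy of the predicted saliency maps helps identify ads and scenes that fail to engage viewers, illustrated on 15 unseen test ads.
- Automated systems built on deep saliency models such as ViASNet can speed up ad design and testing considerably.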
- Ablation experiments reveal critical factors influencing model performance.
Article Content
arXiv:2605.29302v1 (announce type: new). Abstract: The digital media landscape has seen a pervasive shift toward short-form video advertising on TV, social media and e-commerce platforms. The present study focuses on deep saliency prediction for short-form video advertising. Deep saliency models have been used to predict human eye fixation patterns in order to enhance user interaction with digital technology and optimize its design.
For video ads, dynamic saliency maps capture where and when viewers are looking, revealing why video ads are effective, and how their content should be optimized. We develop and test a new deep dynamic saliency prediction model called ViASNet (Video Ad Saliency Network), which has an architecture founded on the 3D U-Net, and accommodates the influence of audio and the semantic meaning of scenes.
We assess the model's performance on 151 video ads, each seen by about 20 viewers while their eye movements were tracked, and explore the critical factors influencing model performance through ablation experiments. We calculate the entropy of the predicted saliency maps frame by frame as a diagnostic tool to identify ads and scenes that fail to engage viewers, and illustrate its use on a test set of 15 unseen ads.
Our study reveals that ad design and testing can be sped up considerably through automated systems built on deep saliency models such as ViASNet.
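The abstract does not say exactly how the frame-by-frame entropy is computed. The sketch below is a minimal illustration under stated assumptions: each predicted saliency map is normalized to sum to one and treated as a probability distribution over pixels, and its Shannon entropy is taken in bits. Higher entropy means predicted attention is spread across the frame rather than concentrated on a few regions. The function names and the flagging threshold are hypothetical, and reading high entropy as a sign of a scene that fails to engage viewers is an assumption, not a rule stated by the paper.

```python
import math
from typing import List, Sequence

# One predicted saliency map: rows x cols of non-negative scores (assumed layout).
Frame = Sequence[Sequence[float]]


def saliency_entropy(frame: Frame, eps: float = 1e-12) -> float:
    """Shannon entropy in bits of one saliency map, normalized to sum to one."""
    values = [v for row in frame for v in row]
    if any(v < 0 for v in values):
        raise ValueError("saliency values must be non-negative")
    total = sum(values)
    if total <= eps:
        # Assumption: an all-zero map carries no attention mass, so report 0 bits.
        return 0.0
    entropy = 0.0
    for v in values:
        p = v / total
        if p > 0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy


def entropy_per_frame(frames: Sequence[Frame]) -> List[float]:
    """Entropy of each frame's predicted saliency map, in frame order."""
    return [saliency_entropy(f) for f in frames]


def flag_dispersed_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Indices of frames whose entropy exceeds a hypothetical threshold.

    Assumption: high entropy (dispersed predicted gaze) marks a candidate
    scene that may fail to engage viewers.
    """
    return [i for i, h in enumerate(entropy_per_frame(frames)) if h > threshold]
```

For example, a uniform 2×2 map such as `[[1, 1], [1, 1]]` has an entropy of 2 bits, while a map with all its mass on one pixel, `[[1, 0], [0, 0]]`, has 0 bits. Because the maximum entropy grows with the number of pixels (log2 of the pixel count), comparisons across videos of different resolutions would need maps resized to a common grid or entropy divided by that maximum.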