Two-Stage Multimodal Framework for Emotion Mimicry Intensity Prediction
Quick Take
A two-stage multimodal framework predicts emotion mimicry intensity using text, audio, and visual data.
Key Points
- Predicts six emotion intensity dimensions from video clips.
- Combines textual, acoustic, and visual representations.
- Placed third in the EMI challenge with a correlation score of 0.57.
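The challenge result is reported as a correlation between predicted and reference intensities across six emotion dimensions. As a rough sketch of how such a score could be computed, the snippet below averages a per-dimension Pearson correlation. The use of Pearson correlation, the averaging across dimensions, the function name `mean_pearson`, and the synthetic data are all assumptions made for illustration; the summary does not state which correlation was used.

```python
import numpy as np


def mean_pearson(pred, target):
    """Average Pearson correlation over columns (one column per emotion dimension).

    Hypothetical helper: the exact challenge metric is an assumption here.
    Both inputs are arrays of shape (num_clips, num_dimensions).
    A constant column would give a zero denominator and is not handled.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.ndim != 2 or pred.shape != target.shape:
        raise ValueError("pred and target must be 2-D arrays of the same shape")

    # Center each dimension, then apply the Pearson formula column by column.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)
    numerator = (pred_c * target_c).sum(axis=0)
    denominator = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    return float(np.mean(numerator / denominator))


# Synthetic example: 8 clips, 6 intensity dimensions (made-up values).
rng = np.random.default_rng(0)
target = rng.random((8, 6))
pred = target + 0.1 * rng.standard_normal((8, 6))
score = mean_pearson(pred, target)
print(f"mean correlation: {score:.3f}")
```

Averaging per-dimension correlations keeps every emotion equally weighted regardless of how widely its intensities vary, which is one plausible reason to prefer it over a single correlation computed on the flattened arrays.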