Turbulence-Robust Dynamic Object Segmentation with Multi-Signal Priors and SAM2 Refinement
Quick Take
This entry to the CVPR 2026 UG2+ Challenge Track 3, Dynamic Object Segmentation in Turbulence (DOST), segments moving objects in turbulence-degraded video with a training-free pipeline: RAFT for motion, DINOv2 for semantic priors, ViBe for background modeling, and SAM2 for box-prompt mask refinement. It scores 0.425041 mIoU and 0.457206 mDice on the official leaderboard. The system runs entirely in inference mode and combines several cues because turbulence makes any single motion cue unreliable.
Key Points
- Uses a training-free, multi-signal pipeline that runs entirely in inference mode.
- Combines RAFT for dense motion responses, DINOv2 for semantic objectness priors, and ViBe for background modeling.
- Achieved leaderboard scores of 0.425041 mIoU and 0.457206 mDice.
- Employs SAM2 for box-prompt mask refinement without end-to-end training.
- Targets the DOST setting, where severe atmospheric turbulence causes pseudo-motion, blur, and intermittent target visibility.
Article Content
Source: arXiv:2605.29292v1. Abstract: This technical report presents our solution for the CVPR 2026 UG2+ Challenge Track 3: Dynamic Object Segmentation in Turbulence (DOST). We design a training-free multi-signal segmentation pipeline that combines pretrained motion estimation, self-supervised semantic priors, background anomaly modeling, manually calibrated proposal fusion, and SAM2-based mask refinement.
The method uses RAFT for dense motion responses, DINOv2 for semantic objectness priors, ViBe for training-free background modeling, and pretrained SAM2 for box-prompt mask refinement. Instead of optimizing an end-to-end segmentation network, our system operates entirely in inference mode. This design is suitable for the DOST setting, where severe atmospheric turbulence produces pseudo-motion, blur, and intermittent target visibility, making a single motion cue unreliable.
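One way to picture the proposal-fusion step is as a weighted sum of per-pixel cue maps, followed by thresholding and a bounding box that could then serve as a box prompt. The sketch below is a minimal NumPy illustration, not the authors' implementation: the function names `normalize` and `fuse_to_box`, the weights, and the threshold are all hypothetical, and the three input arrays merely stand in for cue maps that would come from RAFT, DINOv2, and ViBe. The abstract does not say how the fusion is actually calibrated.

```python
import numpy as np


def normalize(score):
    """Rescale a score map to [0, 1]; a constant map becomes all zeros."""
    score = np.asarray(score, dtype=np.float64)
    lo, hi = score.min(), score.max()
    if hi - lo < 1e-12:
        return np.zeros_like(score)
    return (score - lo) / (hi - lo)


def fuse_to_box(motion, semantic, background,
                weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Fuse three cue maps and return a tight box around confident pixels.

    The weights and threshold are illustrative assumptions, set by hand.
    Returns (x_min, y_min, x_max, y_max), or None if no pixel passes.
    """
    fused = (weights[0] * normalize(motion)
             + weights[1] * normalize(semantic)
             + weights[2] * normalize(background))
    ys, xs = np.nonzero(fused >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 8x8 cue maps: a 3x3 object seen by all cues,
# plus one isolated pixel of pseudo-motion seen only by the motion cue.
obj = np.zeros((8, 8))
obj[2:5, 3:6] = 1.0
motion = obj.copy()
motion[7, 0] = 1.0
print(fuse_to_box(motion, obj, obj))  # (3, 2, 5, 4)
```

In this toy input the isolated pixel responds only to the motion cue, so its fused score (0.4) stays below the threshold and it is left out of the box. That is the intuition behind combining several signals when turbulence produces pseudo-motion.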
The final submitted masks are evaluated by the official leaderboard, which reports 0.425041 mIoU and 0.457206 mDice. Since no task-specific model training or fine-tuning is performed, stronger learned temporal association, adaptive proposal selection, or task-specific adaptation may further improve the system.
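For readers unfamiliar with the two metrics, the sketch below computes IoU and Dice for binary masks and averages them over a set of frames. It is a minimal illustration under stated assumptions: the function names `iou_and_dice` and `mean_scores` are hypothetical, the averaging is taken per mask, and a pair of empty masks is scored as a perfect match. The leaderboard's exact averaging and empty-mask conventions are not given in the abstract and may differ.

```python
import numpy as np


def iou_and_dice(pred, gt):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    if total == 0:
        # Assumption: both masks empty counts as a perfect match.
        return 1.0, 1.0
    return float(inter / union), float(2 * inter / total)


def mean_scores(preds, gts):
    """Average IoU and Dice over paired masks (assumed per-mask averaging)."""
    scores = [iou_and_dice(p, g) for p, g in zip(preds, gts)]
    ious, dices = zip(*scores)
    return float(np.mean(ious)), float(np.mean(dices))


pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
iou, dice = iou_and_dice(pred, gt)
print(iou, round(dice, 4))  # 0.5 0.6667
```

For a single mask the two are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; the means over many masks need not satisfy that identity exactly.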
