Turbulence-Robust Dynamic Object Segmentation with Multi-Signal Priors and SAM2 Refinement

arXiv cs.CV·Bolian Peng, Ying Tang, Xu Liu, Long Sun, Xiaoqiang Lu

5/29/2026

Quick Take

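This solution to the CVPR 2026 UG2+ Challenge Track 3, Dynamic Object Segmentation in Turbulence (DOST), segments moving objects with a training-free pipeline: RAFT supplies dense motion responses, DINOv2 supplies semantic objectness priors, ViBe models the background, and pretrained SAM2 refines masks from box prompts. It scores 0.425041 mIoU and 0.457206 mDice on the official leaderboard. Combining several signals matters because turbulence produces pseudo-motion, blur, and intermittent target visibility, which make a single motion cue unreliable. No task-specific training or fine-tuning is performed.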

Key Points

Uses a training-free, multi-signal pipeline for dynamic object segmentation.
Combines RAFT for dense motion responses, DINOv2 for semantic objectness priors, and ViBe for background modeling.
Scores 0.425041 mIoU and 0.457206 mDice on the official leaderboard.
Refines masks with pretrained SAM2 using box prompts, with no end-to-end training.
Targets severe atmospheric turbulence, which causes pseudo-motion, blur, and intermittent target visibility.

Article Content

From source RSS / original summary

arXiv:2605.29292v1. Abstract: This technical report presents our solution for the CVPR 2026 UG2+ Challenge Track 3: Dynamic Object Segmentation in Turbulence (DOST). We design a training-free multi-signal segmentation pipeline that combines pretrained motion estimation, self-supervised semantic priors, background anomaly modeling, manually calibrated proposal fusion, and SAM2-based mask refinement.
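The abstract does not spell out how the proposal fusion works, so the following is only a minimal sketch of one plausible reading: each cue is min-max normalized, the cues are combined as a weighted sum, and the result is thresholded into a binary proposal mask. The function names `normalize` and `fuse`, the weights, and the threshold are hypothetical placeholders, not the report's calibrated values. Plain Python lists are used to keep the example dependency-free.

```python
from typing import List, Sequence

Map = List[List[float]]  # a 2-D score map stored as rows of floats


def normalize(score_map: Map) -> Map:
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in score_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in score_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in score_map]


def fuse(motion: Map, semantic: Map, background: Map,
         weights: Sequence[float] = (0.4, 0.3, 0.3),
         threshold: float = 0.5) -> List[List[int]]:
    """Weighted sum of normalized cues, thresholded into a binary mask.

    ASSUMPTION: the weights and threshold are illustrative only; the
    report describes its fusion as manually calibrated but gives no values.
    """
    cues = [normalize(m) for m in (motion, semantic, background)]
    mask = []
    for r in range(len(motion)):
        row = []
        for c in range(len(motion[0])):
            score = sum(w * cue[r][c] for w, cue in zip(weights, cues))
            row.append(1 if score >= threshold else 0)
        mask.append(row)
    return mask


# Two pixels show strong motion, but only one is backed by the other cues.
motion = [[0.0, 5.0], [0.0, 5.0]]
semantic = [[0.0, 1.0], [0.0, 0.0]]
background = [[0.0, 1.0], [0.0, 0.0]]
proposal = fuse(motion, semantic, background)
```

In this toy input, the bottom-right pixel has motion but no semantic or background support, so it falls below the threshold. That mirrors the stated motivation: a motion response alone may be turbulence-induced pseudo-motion.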

The method uses RAFT for dense motion responses, DINOv2 for semantic objectness priors, ViBe for training-free background modeling, and pretrained SAM2 for box-prompt mask refinement. Instead of optimizing an end-to-end segmentation network, our system operates entirely in inference mode. This design is suitable for the DOST setting, where severe atmospheric turbulence produces pseudo-motion, blur, and intermittent target visibility, making a single motion cue unreliable.
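Box-prompt refinement needs a box for each proposal. As a hedged illustration, the sketch below derives a padded bounding box from a binary proposal mask. The helper `mask_to_box`, the `pad` parameter, and the `(x_min, y_min, x_max, y_max)` layout are assumptions made for this example; the source does not describe how its boxes are computed, and no SAM2 call is shown.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: List[List[int]], pad: int = 0) -> Optional[Box]:
    """Return the tight bounding box of nonzero pixels, padded and clipped.

    Returns None for an empty mask, e.g. a frame in which the target is
    momentarily invisible. (Hypothetical helper, not from the report.)
    """
    height, width = len(mask), len(mask[0])
    # Rows and columns that contain at least one foreground pixel.
    ys = [r for r in range(height) if any(mask[r])]
    xs = [c for c in range(width) if any(mask[r][c] for r in range(height))]
    if not ys or not xs:
        return None
    return (max(xs[0] - pad, 0), max(ys[0] - pad, 0),
            min(xs[-1] + pad, width - 1), min(ys[-1] + pad, height - 1))


proposal = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
tight_box = mask_to_box(proposal)          # (1, 1, 2, 2)
padded_box = mask_to_box(proposal, pad=1)  # (0, 0, 3, 3)
```

Padding gives the refinement model some context around a proposal whose edges may be blurred. Returning `None` for an empty mask lets a caller skip frames where no proposal survives.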

The final submitted masks are evaluated by the official leaderboard, which reports 0.425041 mIoU and 0.457206 mDice. Since no task-specific model training or fine-tuning is performed, stronger learned temporal association, adaptive proposal selection, or task-specific adaptation may further improve the system.
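For readers unfamiliar with the two metrics, here is a small sketch of how IoU and Dice are computed for one pair of binary masks and then averaged. The per-mask formulas are standard: IoU is intersection over union, and Dice is twice the intersection over the sum of the two areas. How the official leaderboard aggregates them (per frame, per sequence, or otherwise) and how it treats two empty masks are not stated in the source, so the simple mean and the empty-mask convention below are assumptions.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def iou_and_dice(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for two binary masks of the same shape."""
    inter = pred_area = gt_area = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            pred_area += 1 if p else 0
            gt_area += 1 if g else 0
    union = pred_area + gt_area - inter
    if union == 0:
        # ASSUMPTION: two empty masks count as a perfect match.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_area + gt_area)


def mean_scores(pairs: Sequence[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Average IoU and Dice over (prediction, ground truth) pairs."""
    scores = [iou_and_dice(p, g) for p, g in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
iou, dice = iou_and_dice(pred, gt)  # intersection 1, union 3, areas 2 and 2
```

For any single pair of masks, Dice is at least as large as IoU, which is consistent with the reported mDice exceeding the reported mIoU.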
