WildPose: A Unified Framework for Robust Pose Estimation in the Wild
Quick Take
WildPose is a unified framework for robust pose estimation in dynamic and static environments.
Key Points
- Combines feedforward models with differentiable bundle adjustment.
- Outperforms prior methods on dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.
- Robust in dynamic environments while excelling in static datasets.
Abstract
Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask detector that uses multi-level 3D-aware features from the same backbone. Extensive experiments show WildPose consistently outperforms prior methods across dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.
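The abstract's central mechanism, a motion mask downweighting dynamic pixels inside a differentiable bundle adjustment solve, can be illustrated with a toy weighted least-squares step. This is a minimal sketch of the general idea only: the function name, the scalar-parameter setup, and the mask-to-weight mapping are all hypothetical, not WildPose's actual implementation.

```python
import numpy as np

def masked_ba_step(pred: np.ndarray, obs: np.ndarray,
                   dynamic_prob: np.ndarray) -> float:
    """One toy BA-style update with motion-mask weighting (illustrative only).

    pred:         predicted measurements under the current pose estimate
    obs:          observed measurements (corrupted where the scene moves)
    dynamic_prob: per-pixel probability that the pixel is dynamic,
                  as a motion mask detector might output
    """
    # Downweight residuals by (1 - p_dynamic): pixels flagged as moving
    # contribute little to the pose update, so independently moving
    # objects do not drag the camera-pose estimate.
    w = 1.0 - dynamic_prob
    r = obs - pred
    # Closed-form weighted least-squares update for a single scalar
    # parameter (a stand-in for a pose increment in real BA):
    #   delta = sum(w * r) / sum(w)
    return float(np.sum(w * r) / np.sum(w))

# Three static pixels agree on an offset of 1.0; two dynamic pixels
# suggest 10.0 but are fully masked out, so the update recovers 1.0.
pred = np.zeros(5)
obs = np.array([1.0, 1.0, 1.0, 10.0, 10.0])
dynamic_prob = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
delta = masked_ba_step(pred, obs, dynamic_prob)
```

In real differentiable BA the weights multiply reprojection residuals inside a Gauss-Newton or Levenberg-Marquardt solve over full camera poses, and gradients flow through the solve back into the mask network; the scalar version above only shows why masked residuals leave the update dominated by static scene content.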
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.12774 [cs.CV] (or arXiv:2605.12774v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12774 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Jianhao Zheng
[v1] Tue, 12 May 2026 21:39:44 UTC (25,276 KB)
— Originally published at arxiv.org
