WildPose: A Unified Framework for Robust Pose Estimation in the Wild
Quick Take
WildPose is a unified framework for robust pose estimation in dynamic and static environments.
Key Points
- Combines feedforward models with differentiable bundle adjustment.
- Outperforms prior methods on dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.
- Robust in dynamic environments while excelling in static datasets.
Abstract
Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask detector that uses multi-level 3D-aware features from the same backbone. Extensive experiments show WildPose consistently outperforms prior methods across dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.
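The abstract's central mechanism, a motion mask downweighting dynamic pixels inside a differentiable bundle adjustment solve, can be illustrated with a toy weighted least-squares step. This is a minimal sketch of the general idea only: the function name, the scalar-parameter setup, and the mask-to-weight mapping are all hypothetical, not WildPose's actual implementation.

```python
import numpy as np

def masked_ba_step(pred: np.ndarray, obs: np.ndarray,
                   dynamic_prob: np.ndarray) -> float:
    """One toy BA-style update with motion-mask weighting (illustrative only).

    pred:         predicted measurements under the current pose estimate
    obs:          observed measurements (corrupted where the scene moves)
    dynamic_prob: per-pixel probability that the pixel is dynamic,
                  as a motion mask detector might output
    """
    # Downweight residuals by (1 - p_dynamic): pixels flagged as moving
    # contribute little to the pose update, so independently moving
    # objects do not drag the camera-pose estimate.
    w = 1.0 - dynamic_prob
    r = obs - pred
    # Closed-form weighted least-squares update for a single scalar
    # parameter (a stand-in for a pose increment in real BA):
    #   delta = sum(w * r) / sum(w)
    return float(np.sum(w * r) / np.sum(w))

# Three static pixels agree on an offset of 1.0; two dynamic pixels
# suggest 10.0 but are fully masked out, so the update recovers 1.0.
pred = np.zeros(5)
obs = np.array([1.0, 1.0, 1.0, 10.0, 10.0])
dynamic_prob = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
delta = masked_ba_step(pred, obs, dynamic_prob)
```

In real differentiable BA the weights multiply reprojection residuals inside a Gauss-Newton or Levenberg-Marquardt solve over full camera poses, and gradients flow through the solve back into the mask network; the scalar version above only shows why masked residuals leave the update dominated by static scene content.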
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.12774 [cs.CV] (or arXiv:2605.12774v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12774 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Jianhao Zheng
[v1] Tue, 12 May 2026 21:39:44 UTC (25,276 KB)
— Originally published at arxiv.org
