WristCompass: Kinematic Coupling as a Learnable Visual Concept for Ego-Camera Orientation

arXiv cs.CV·Varun Nair, Vidyut Baradwaj, Jiahang He, Anya Singh, Jai Relan, Cabrel Happi

5h ago

·~1 min·6/1/2026·en·0

Quick Take

WristCompass introduces kinematic coupling dynamics for ego-camera orientation recovery, outperforming traditional scene geometry methods. Trained on tabletop manipulation, it achieves a median geodesic error of 14.3° on Epic Kitchens, approaching the performance of a 1B-parameter model with only 200K GRU parameters.

Key Points

WristCompass leverages kinematic coupling for ego-camera orientation recovery.
Outperforms VGGT model, which scores worse than constant predictors on TACO benchmark.
Achieves 14.3° median geodesic error on Epic Kitchens cooking video.
Utilizes 4D inter-wrist features instead of 126D full hand keypoints.
Zero-shot transfer capability across datasets due to anatomical foundations.

Article Content

From source RSS / original summary

arXiv:2605. 30671v1 Announce Type: new Abstract: Recovering ego-camera orientation from manipulation video is a prerequisite for disentangling hand motion from camera motion, a key step in imitation learning from egocentric demonstrations. The obvious approach, inferring orientation from scene geometry, fails when hands occlude the frame: VGGT, a 1B-parameter scene reconstruction model, scores worse than a constant predictor on the TACO benchmark.

We identify an alternative visual concept that is present precisely when scene geometry is absent: kinematic coupling dynamics, the structured physical relationship between wrist motion and camera orientation imposed by the arm-shoulder-head chain.

We find that this concept is compact (4D inter-wrist features outperform 126D full hand keypoints), temporal (requiring a GRU over short windows rather than per-frame retrieval), and physically grounded (transferring zero-shot across datasets because it is rooted in anatomy rather than scene appearance). Trained only on tabletop manipulation, WristCompass transfers zero-shot to Epic Kitchens cooking video, achieving 14.

3$^\circ$ median geodesic error and approaching the performance of a 1B-parameter scene model at 200K GRU parameters.

Reader Mode unavailable (could not extract clean content).

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CV

See more →

arXiv cs.CV·Taha Koleilat, Hassan Rivaz, Yiming Xiao

5d ago

FeaturedOriginal

Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning

AI Summary

Evi-Steer introduces a novel evidential tuning framework for BiomedCLIP, enabling efficient fine-tuning with only 0.11% parameter updates. It significantly enhances performance in few-shot learning and domain shifts across 15 biomedical imaging datasets, demonstrating robustness for clinical applications.

#AI Coding #Inference #Open Source

WristCompass: Kinematic Coupling as a Learnable Visual Concept for Ego-Camera Orientation

Quick Take

Key Points

Article Content

Want this in your inbox every morning?

More from arXiv cs.CV

Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning

Deep Learning-Based Automated Quantification of TIMI Myocardial Perfusion Frame Count (DL-TMPFC) from Coronary Angiography: A Novel Framework for Rapid Assessment of Microvascular Dysfunction

GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

Related in this space

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

TorqueAGI Announces Collaborations with NVIDIA, John Deere, and Dexterity to Advance Physical AI for Enterprise-Grade Robots

FORT Robotics Acquires Mapless AI to Expand Its Trust Platform with Remote Supervision and Active Safety Capabilities