EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction
Quick Take
EgoTraj is a new egocentric dataset for predicting human trajectories in urban environments.
Key Points
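- Contains 75 sequences of human navigation recorded with Meta Quest Pro in real-world urban environments.
- Includes synchronized RGB video with ground-truth 6DoF head poses, per-frame 3D eye gaze vectors, and scene annotations.
- Benchmarks state-of-the-art egocentric trajectory predictors, targeting AR-based perception, navigation, and assistive systems.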
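The synchronized streams listed above can be pictured as one record per frame. The sketch below is a minimal illustration only. The FrameSample fields, the (w, x, y, z) quaternion order, the head-to-world rotation convention, and the y-up world frame are all assumptions, not the published EgoTraj format. It shows how a head-frame gaze vector could be expressed in world coordinates, and how 6DoF head positions could be reduced to a 2D ground-plane walking trajectory.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm


@dataclass
class FrameSample:
    """Hypothetical per-frame record; field names are illustrative, not the EgoTraj schema."""
    t: float          # timestamp in seconds
    head_pos: Vec3    # head position in the world frame (meters)
    head_rot: Quat    # head orientation as a head-to-world rotation (assumed convention)
    gaze_dir: Vec3    # unit 3D gaze vector expressed in the head frame


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q, i.e. compute q * v * q^-1."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def world_gaze(frame: FrameSample) -> Vec3:
    """Express the head-frame gaze vector in world coordinates."""
    return rotate(frame.head_rot, frame.gaze_dir)


def ground_trajectory(frames: List[FrameSample]) -> List[Tuple[float, float]]:
    """Drop the vertical axis (assumed y-up) to get a 2D walking path on the x-z plane."""
    return [(f.head_pos[0], f.head_pos[2]) for f in frames]
```

Using the head position as the walker's location is itself a simplification made for this sketch; the dataset's own trajectory definition may differ.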
Abstract: Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited by the scarcity of egocentric trajectory datasets collected in real-world environments. To address this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using the Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous, time-synchronized 6-degree-of-freedom (6DoF) head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets in capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at this https URL.
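The abstract does not name the benchmarked methods or the evaluation metrics. As a hedged illustration of what egocentric trajectory prediction involves, the sketch below pairs a constant-velocity extrapolation (a common simple baseline, not one of the paper's methods) with average and final displacement error (ADE/FDE), metrics widely used in trajectory forecasting. Whether EgoTraj reports exactly these metrics is an assumption; inputs are taken to be 2D ground-plane positions sampled at a fixed rate.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def constant_velocity_forecast(history: List[Point], horizon: int) -> List[Point]:
    """Illustrative baseline: repeat the last observed per-frame displacement."""
    if len(history) < 2:
        raise ValueError("need at least two observed positions")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per frame
    return [(x1 + k * vx, y1 + k * vy) for k in range(1, horizon + 1)]


def ade_fde(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Average and final Euclidean displacement error, in the units of the input."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors), errors[-1]
```

For example, forecasting three steps from the history [(0.0, 0.0), (1.0, 0.0)] yields [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]; against a ground truth of [(2.0, 0.0), (3.0, 1.0), (4.0, 2.0)], ade_fde returns (1.0, 2.0). Learned predictors would replace the baseline function while keeping the same evaluation.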
| Comments: | 21 pages, 14 figures. Project page: this https URL |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO) |
| ACM classes: | I.2.10; I.4.8; I.5.4 |
| Cite as: | arXiv:2605.19004 [cs.CV] |
| | (or arXiv:2605.19004v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.19004 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Ahmad Yehia
[v1] Mon, 18 May 2026 18:26:51 UTC (44,620 KB)
— Originally published at arxiv.org