Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models
Quick Take
Social-Mamba is a state-space-model architecture for efficient, socially aware human trajectory forecasting in crowded environments.
Key Points
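- Uses a Cycle Mamba block to enable continuous bidirectional information flow.
- Reports state-of-the-art accuracy on five trajectory forecasting benchmarks, with superior parameter efficiency and computational scalability.
- Can be embedded in a flow-matching framework, which further improves both accuracy and efficiency.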
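The summary does not describe the Cycle Mamba block's internals, so the sketch below only illustrates the general idea it builds on: a linear-time selective state-space scan, run once forward and once over the reversed sequence so that every token receives context from both directions. This forward-plus-reverse pattern is used in bidirectional Mamba variants; it is an assumption here, not the paper's Cycle Mamba design. The functions `selective_scan` and `bidirectional_scan` and all parameter values are hypothetical, and the state is a single scalar for readability.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + e^z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: List[float], A: float = 1.0, B: float = 1.0, C: float = 1.0,
                   w_delta: float = 1.0, b_delta: float = 0.0) -> List[float]:
    """Scalar selective SSM scan in O(len(xs)) time (hypothetical toy, A > 0 assumed).

    The step size delta depends on the current input, which makes the scan
    'selective': a larger delta forgets the past faster and weights the
    current input more heavily.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        a_bar = math.exp(-delta * A)             # discretized decay in (0, 1)
        b_bar = delta * B                        # simplified input map
        h = a_bar * h + b_bar * x                # recurrent state update
        ys.append(C * h)
    return ys


def bidirectional_scan(xs: List[float]) -> List[float]:
    # Forward pass: output at t depends only on tokens 0..t
    fwd = selective_scan(xs)
    # Backward pass: scan the reversed sequence, then restore the original
    # order, so output at t depends only on tokens t..n-1
    bwd = selective_scan(xs[::-1])[::-1]
    # Sum so every position carries context from both directions
    return [f + b for f, b in zip(fwd, bwd)]


print(selective_scan([0.0, 0.0, 1.0]))      # forward: first position sees nothing yet
print(bidirectional_scan([0.0, 0.0, 1.0]))  # bidirectional: first position sees the later input
```

Each scan visits every token once, so cost grows linearly with sequence length, whereas self-attention over n tokens computes n² pairwise scores. In Mamba proper, the state is a vector per channel and B and C are also input-dependent projections; the scalar version above keeps only the recurrence structure.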
Abstract: Human trajectory forecasting is crucial for safe navigation in crowded environments, requiring models that balance accuracy with computational efficiency. Efficiently modeling social interactions is key to performance in dense crowds. Yet, most recent methods rely on attention mechanisms, which are effective at capturing complex dependencies, but incur quadratic computational costs that scale poorly with the growing number of neighbors. Recently, Selective State-Space Models have provided a linear-time alternative; however, their inherently sequential design is misaligned with the unstructured and dynamic nature of social interactions. To address this challenge, we propose Social-Mamba, a forecasting architecture that reformulates social interactions as structured sequential processes. At its core is the Cycle Mamba block, a novel module that enables continuous bidirectional information flow. Social-Mamba organizes agents on an egocentric grid and introduces social triplet factorization, which decomposes interactions into temporal, egocentric, and goal-centric scans. These are dynamically integrated through a learnable social gate and global scan to generate accurate and efficient trajectory predictions. Extensive experiments on five trajectory forecasting benchmarks show that Social-Mamba achieves state-of-the-art accuracy while offering superior parameter efficiency and computational scalability. Furthermore, embedding Social-Mamba into a flow-matching framework further enhances both accuracy and efficiency, establishing it as a flexible and robust foundation for future trajectory forecasting research. The code is publicly available: this https URL
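The abstract states that Social-Mamba turns unordered social interactions into sequences (temporal, egocentric, and goal-centric scans over an egocentric grid) and fuses them with a learnable social gate, but it does not say how agents are ordered or how the gate is parameterized. The toy below is a sketch under those gaps: neighbors are quantized into grid cells around the ego agent and visited ring by ring (egocentric), or sorted by distance to the ego agent's goal (goal-centric); a fixed-decay linear recurrence stands in for a Mamba block; and a softmax over three logits plays the role of the gate. Every name here (`linear_scan`, `egocentric_order`, `goal_centric_order`, `social_gate`, `social_triplet`, `gate_logits`) is hypothetical, the features are scalar distances for brevity, and the paper's global scan is omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def linear_scan(xs: List[float], decay: float = 0.5) -> float:
    # Fixed-decay recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t, O(len(xs)).
    # Stand-in for a learned selective SSM; later tokens weigh more.
    h = 0.0
    for x in xs:
        h = decay * h + (1.0 - decay) * x
    return h


def grid_cell(ego: Point, p: Point, cell: float) -> Tuple[int, int]:
    # Quantize a neighbor's position relative to the ego agent into a grid cell
    return (math.floor((p[0] - ego[0]) / cell), math.floor((p[1] - ego[1]) / cell))


def egocentric_order(ego: Point, neighbors: List[Point], cell: float = 1.0) -> List[Point]:
    # Visit grid cells ring by ring outward from the ego agent
    # (Chebyshev ring index first, then cell coordinates as a tie-break)
    def key(p: Point) -> Tuple[int, int, int]:
        i, j = grid_cell(ego, p, cell)
        return (max(abs(i), abs(j)), i, j)
    return sorted(neighbors, key=key)


def goal_centric_order(goal: Point, neighbors: List[Point]) -> List[Point]:
    # Visit neighbors from closest to farthest relative to the ego agent's goal
    return sorted(neighbors, key=lambda p: math.dist(p, goal))


def social_gate(summaries: List[float], logits: List[float]) -> float:
    # Softmax-weighted fusion of per-scan summaries; in a trained model the
    # logits would be learned (and likely input-dependent)
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, summaries))


def social_triplet(history: List[Point], neighbors: List[Point], goal: Point,
                   gate_logits: List[float]) -> float:
    ego = history[-1]
    # Temporal scan: the ego agent's own step lengths, oldest to newest
    steps = [math.dist(a, b) for a, b in zip(history, history[1:])]
    temporal = linear_scan(steps)
    # Egocentric scan: neighbor distances to the ego, in grid-ring order
    ego_seq = [math.dist(ego, p) for p in egocentric_order(ego, neighbors)]
    egocentric = linear_scan(ego_seq)
    # Goal-centric scan: the same neighbors, reordered by distance to the goal
    goal_seq = [math.dist(ego, p) for p in goal_centric_order(goal, neighbors)]
    goal_centric = linear_scan(goal_seq)
    return social_gate([temporal, egocentric, goal_centric], gate_logits)


history = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbors = [(3.0, 0.0), (0.5, 0.5), (-1.5, 0.2)]
print(social_triplet(history, neighbors, goal=(4.0, 0.0), gate_logits=[0.0, 0.0, 0.0]))
```

Because a recurrent scan weights tokens by position, the egocentric and goal-centric scans produce different summaries even though they read the same set of neighbors. That is the point of imposing several orderings on an unordered crowd before applying a sequential model.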
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.15424 [cs.CV] (or arXiv:2605.15424v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15424 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Po-Chien Luan
[v1] Thu, 14 May 2026 21:16:01 UTC (1,358 KB)
Originally published at arxiv.org.