DiffCrossGait: Trajectory-Level Alignment for 2D-3D Cross-Modal Gait Recognition via Latent Diffusion

arXiv cs.CV·Zhiyang Lu, Ming Cheng

6/2/2026

Quick Take

DiffCrossGait addresses 2D-3D cross-modal gait recognition by aligning the two modalities along their trajectories in a latent diffusion space, rather than aligning only the final embeddings, and reports state-of-the-art results on the SUSTech1K and FreeGait benchmarks. Diffusion serves only as a training objective, so inference runs the discriminative backbone alone and avoids the cost of iterative denoising.

Key Points

DiffCrossGait reformulates cross-modal matching as trajectory-level alignment.
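Drives both modalities with shared Gaussian noise so they stay aligned throughout the generative evolution in latent space.
Introduces a Tri-Phase Alignment Strategy that uses varying noise intensities to enforce identity anchoring, dynamics consistency, and cross-modal structural recoverability.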
Decouples generative alignment from discriminative backbone for efficient inference.
Achieves state-of-the-art performance on SUSTech1K and FreeGait datasets.

Article Content

From source RSS / original summary

arXiv:2606.00153v1. Abstract: Cross-modal 2D-3D gait recognition is impeded by inherent domain discrepancies between 2D silhouette and 3D LiDAR range-view representations. While prior methods align only final embeddings, we propose DiffCrossGait, which reformulates cross-modal matching as trajectory-level alignment in an identity-relevant latent diffusion space, rather than assuming full equivalence between 2D and 3D observations.

By driving both modalities with shared Gaussian noise within a latent space, we enable continuous alignment throughout the generative evolution. We introduce a Tri-Phase Alignment Strategy that exploits varying noise intensities to enforce identity anchoring, dynamics consistency, and cross-modal structural recoverability, thereby constraining both modalities to share denoising dynamics and bottleneck structure, which promotes modality-invariant gait features.
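To make the shared-noise idea concrete, here is a minimal sketch in plain Python. The abstract does not give the forward process or noise schedule, so this assumes a standard DDPM-style forward step with a linear beta schedule; the names `alpha_bar`, `add_noise`, `z_2d`, and `z_3d` are hypothetical and the latents are random placeholders, not real gait features. It shows only one property: when both latents receive the same noise sample at the same step, the noise cancels in their difference, so the gap between them is a scaled copy of the clean gap at every noise level.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction at step t for a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(z0, eps, t):
    """DDPM-style forward step: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a = alpha_bar(t)
    return [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]


rng = random.Random(0)
dim = 8
z_2d = [rng.gauss(0, 1) for _ in range(dim)]  # placeholder silhouette latent
z_3d = [rng.gauss(0, 1) for _ in range(dim)]  # placeholder LiDAR latent
eps = [rng.gauss(0, 1) for _ in range(dim)]   # ONE noise sample shared by both

t = 500
zt_2d = add_noise(z_2d, eps, t)
zt_3d = add_noise(z_3d, eps, t)

# With shared noise, the noised gap equals sqrt(alpha_bar) * clean gap.
gap = [a - b for a, b in zip(zt_2d, zt_3d)]
expected = [math.sqrt(alpha_bar(t)) * (a - b) for a, b in zip(z_2d, z_3d)]
```

With independent noise per modality, the gap would also contain a random term unrelated to identity. Sharing the sample removes that term, which is one plausible reading of why the paper can compare the two modalities at every noise level.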

Crucially, our framework decouples generative alignment from the discriminative backbone; the diffusion mechanism serves exclusively as a training objective, ensuring high inference efficiency by eliminating the computational overhead of iterative denoising. Extensive experiments on the SUSTech1K and FreeGait benchmarks demonstrate that DiffCrossGait achieves state-of-the-art performance.
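The decoupling described above can be illustrated with a toy sketch. Everything here is a stand-in: `ToyEncoder`, `ToyDenoiser`, `training_loss`, and `embed` are hypothetical names, the encoders are fixed random linear maps, and the noise-prediction loss is an assumed placeholder rather than the paper's actual objective. The only point is structural: the denoiser is called while computing the training loss and never on the inference path.

```python
import math
import random


class ToyEncoder:
    """Stand-in for a modality-specific backbone: a fixed random linear map."""

    def __init__(self, in_dim, out_dim, rng):
        self.w = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class ToyDenoiser:
    """Stand-in noise predictor that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, z_t):
        self.calls += 1
        return list(z_t)  # placeholder output, not a trained network


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_loss(enc_2d, enc_3d, denoiser, x_2d, x_3d, rng, a_bar=0.5):
    """Assumed auxiliary loss: noise both latents with shared eps, predict eps."""
    z_2d, z_3d = enc_2d(x_2d), enc_3d(x_3d)
    eps = [rng.gauss(0, 1) for _ in z_2d]

    def noisy(z):
        return [math.sqrt(a_bar) * v + math.sqrt(1 - a_bar) * e for v, e in zip(z, eps)]

    return mse(denoiser(noisy(z_2d)), eps) + mse(denoiser(noisy(z_3d)), eps)


def embed(encoder, x):
    """Inference path: backbone only, no diffusion steps."""
    return encoder(x)


rng = random.Random(0)
enc_2d, enc_3d = ToyEncoder(16, 4, rng), ToyEncoder(16, 4, rng)
denoiser = ToyDenoiser()
x_2d = [rng.gauss(0, 1) for _ in range(16)]
x_3d = [rng.gauss(0, 1) for _ in range(16)]

loss = training_loss(enc_2d, enc_3d, denoiser, x_2d, x_3d, rng)
calls_after_training = denoiser.calls
emb = embed(enc_2d, x_2d)
calls_after_inference = denoiser.calls
```

In this sketch the denoiser call count does not change between `calls_after_training` and `calls_after_inference`, which mirrors the claim that inference cost is that of the backbone alone.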
