Memory-Augmented LSTM Autoencoder for Unsupervised Activity Recognition with IMU Sensor Fusion

arXiv cs.CV·Saeid Arabzadeh, Farshad Almasganj, Mohammad Mahdi Ahmadi

6/30/2026

Quick Take

The proposed memory-augmented LSTM autoencoder framework achieves 96.6% and 98.4% accuracy on the DaLiAc and PAMAP2 datasets, respectively. It outperforms supervised baselines and other unsupervised approaches to human activity recognition (HAR) with IMU sensor fusion. The framework is designed to capture spatiotemporal dependencies from short windows of multi-sensor data despite real-world challenges such as noisy signals, overlapping activities, and missing labels.
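One way to turn label-free features into an accuracy figure is to match the learned groupings to ground-truth activities. The summary does not say whether the authors do this or instead train a downstream classifier on the learned features. The minimal sketch below assumes the common convention of scoring the best one-to-one mapping from clusters to labels. `cluster_accuracy` is a hypothetical helper. Its brute-force search is only practical for a few classes; the Hungarian algorithm is the usual choice at scale.

```python
from collections import Counter
from itertools import permutations


def cluster_accuracy(true_labels, cluster_ids):
    """Best-case accuracy over all one-to-one mappings of clusters to labels.

    Hypothetical helper: brute-forces every assignment, so it only suits a
    handful of classes.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label and cluster sequences must have equal length")
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(labels):
        raise ValueError("more clusters than labels")
    # Count how often each (cluster, label) pair co-occurs.
    pair_counts = Counter(zip(cluster_ids, true_labels))
    best = 0
    # Try every assignment of distinct labels to the observed clusters.
    for assigned in permutations(labels, len(clusters)):
        hits = sum(pair_counts[(c, lab)] for c, lab in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(true_labels)
```

For example, `cluster_accuracy(["walk", "walk", "run", "run", "sit"], [1, 1, 0, 0, 0])` maps cluster 1 to walk and cluster 0 to run, which gives 0.8.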

Key Points

Introduces a fully unsupervised spatiotemporal feature fusion framework.
Utilizes a memory-augmented autoencoder for enhanced activity representation.
Improves feature separability by up to 9% despite using shorter temporal windows.
Evaluated with realistic inter-class window segmentation, which the authors report lowers accuracy by about 7% but better reflects real-world activity transitions.
Surpasses supervised baselines and other unsupervised approaches in accuracy.
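The paper reports that realistic inter-class window segmentation costs accuracy, but the summary does not define the protocol. The minimal sketch below assumes the following definitions. "Inter-class" means keeping windows that straddle an activity transition and labelling each one by majority vote. The more optimistic protocol drops those windows. The hypothetical `segment` helper works on per-sample labels. A real pipeline would slice the multi-sensor IMU signals using the same start indices.

```python
from collections import Counter


def segment(labels, window, step, allow_mixed):
    """Slide a fixed-length window over per-sample activity labels.

    Returns (start_index, window_label) pairs. With allow_mixed=False, only
    windows lying entirely inside one activity are kept. With allow_mixed=True,
    windows that straddle a transition are kept too and take the majority label.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    windows = []
    for start in range(0, len(labels) - window + 1, step):
        chunk = labels[start:start + window]
        counts = Counter(chunk)
        if len(counts) > 1 and not allow_mixed:
            continue  # skip windows that cross an activity boundary
        # most_common(1) breaks ties by first occurrence in the chunk
        label, _ = counts.most_common(1)[0]
        windows.append((start, label))
    return windows
```

Consider six "sit" samples followed by six "walk" samples, with `window=4` and `step=2`. The pure-window protocol keeps four windows. The mixed protocol keeps five, adding the transition window that starts at index 4. Those boundary windows are harder to classify, which is consistent with the accuracy drop the authors report.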

[Submitted on 19 Jun 2026]

Abstract: Human activity recognition (HAR) using Inertial Measurement Unit (IMU) sensors is vital for healthcare monitoring and rehabilitation. Despite deep learning advancements, major challenges remain: reliance on labeled data, multi-sensor fusion complexity, and the limited ability of unsupervised methods to capture spatiotemporal dependencies. These issues are pronounced in real-world scenarios with noisy data, overlapping activities, and missing labels. We propose a fully unsupervised spatiotemporal feature fusion framework using a memory-augmented autoencoder. It enhances activity representations via short temporal windows of multi-sensor IMU data, enabling real-time applications. Our framework extracts hierarchical static features via a Stacked Autoencoder, fusing them within and across sensors. A sequence-to-sequence LSTM Autoencoder then temporally refines these features, incorporating historical motion patterns without labels. We analyze key hyperparameters to identify configurations that maximize feature separability under short-window constraints. Evaluated on DaLiAc and PAMAP2 using realistic inter-class window segmentation, our method achieves 96.6% and 98.4% accuracy, respectively, surpassing supervised baselines and unsupervised approaches. Our method improves feature separability by up to 9% despite shorter temporal windows. While our realistic inter-class segmentation reduces accuracy by ~7%, it was intentionally adopted to better reflect real-world activity transitions and practical relevance.
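The abstract does not specify how the memory component works. One widely used design, assumed here, is the memory module of MemAE (Gong et al., 2019). In that design, a latent vector is re-expressed as an attention-weighted combination of memory slots, and small weights are zeroed by hard shrinkage. The sketch below uses a fixed toy memory and a simplified thresholding rule. In a trained model, the slots would be learned jointly with the encoder. `memory_read` and `shrink` are hypothetical names, not the authors' API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def memory_read(query, memory, shrink=0.0, eps=1e-12):
    """Re-express a latent vector as a weighted sum of memory slots.

    Weights are a softmax over cosine similarities. If shrink > 0, weights at
    or below it are zeroed (simplified hard shrinkage) and the rest are
    renormalised to sum to roughly 1.
    """
    sims = [cosine(query, slot) for slot in memory]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    if shrink > 0.0:
        weights = [w if w > shrink else 0.0 for w in weights]
        norm = sum(weights) + eps
        weights = [w / norm for w in weights]
    dim = len(memory[0])
    # Readout: attention-weighted sum of the slots, one output per dimension.
    readout = [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]
    return weights, readout
```

With `memory = [[1.0, 0.0], [0.0, 1.0]]` and `query = [1.0, 0.0]`, the unshrunk weights are about 0.73 and 0.27. Setting `shrink=0.5` zeroes the second slot, so the readout collapses onto the first. The intended effect of the memory is this pull toward stored prototypes.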

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2606.28377 [cs.CV]
	(or arXiv:2606.28377v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.28377 arXiv-issued DOI via DataCite

Submission history

From: Saeed Arabzadeh
[v1] Fri, 19 Jun 2026 06:28:20 UTC (1,535 KB)

Originally published at arxiv.org
