Interpretable Temporal Facial-Region Motion Analysis for In-the-Wild Parkinson's Disease Video Classification

arXiv cs.CV·Riyadh Almushrafy (Majmaah University, Saudi Arabia)

6/10/2026

Quick Take

In this study, normalized facial-region velocity descriptors paired with a Random Forest classifier reach a balanced accuracy of 0.826 and an AUROC of 0.855 on the held-out test split of the YouTubePD benchmark, the best result among the representations compared. The authors present this as evidence that a lightweight, interpretable representation can work for in-the-wild Parkinson's disease video classification, while framing the study as a benchmark-level analysis rather than a clinical assessment.

Key Points

Normalized velocity descriptors with a Random Forest classifier gave the best classification performance among the compared representations.
On the held-out test split, this combination reached a balanced accuracy of 0.826 and an AUROC of 0.855.
Across 10 random seeds the representation stayed stable, with balanced accuracy of 0.810 ± 0.018 and AUROC of 0.855 ± 0.005.
Each video is represented with geometric descriptors from 14 predefined facial regions.
The study is a benchmark-level analysis and does not claim clinical severity assessment or MDS-UPDRS facial-expression scoring.

Article Content

From source RSS / original summary

arXiv:2606.10088v1 (announce type: new). Abstract: Reduced facial expressivity is a common motor manifestation of Parkinson's disease (PD), often described as hypomimia or facial bradykinesia. This paper examines whether temporal motion descriptors extracted from facial-region keypoints can support in-the-wild PD-related video classification on the YouTubePD benchmark. Each video is represented using geometric descriptors from 14 predefined facial regions.
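The abstract does not spell out how the descriptors are computed, so the following is only a minimal sketch of one plausible reading of "temporal motion descriptors extracted from facial-region keypoints". Everything specific here is an assumption for illustration: the `REGIONS` map (two placeholder regions instead of the paper's 14), the use of the eye keypoints as the normalization reference, and the choice of mean centroid speed as the summary statistic.

```python
import math
from statistics import mean

# Hypothetical region map: region name -> indices into the per-frame keypoint list.
# The paper uses 14 predefined regions; these two are placeholders.
REGIONS = {"mouth": [2, 3], "left_brow": [4]}
LEFT_EYE, RIGHT_EYE = 0, 1  # assumed reference keypoints for normalization


def centroid(points):
    """Mean (x, y) of a list of 2D points."""
    xs, ys = zip(*points)
    return (mean(xs), mean(ys))


def normalized_region_centroids(frame):
    """Region centroids for one frame, expressed relative to the eye midpoint
    and divided by the inter-ocular distance (assumed normalization)."""
    lx, ly = frame[LEFT_EYE]
    rx, ry = frame[RIGHT_EYE]
    scale = math.hypot(rx - lx, ry - ly)
    if scale == 0:
        raise ValueError("eye keypoints coincide; cannot normalize")
    ox, oy = (lx + rx) / 2, (ly + ry) / 2
    out = {}
    for name, idx in REGIONS.items():
        cx, cy = centroid([frame[i] for i in idx])
        out[name] = ((cx - ox) / scale, (cy - oy) / scale)
    return out


def mean_region_speed(frames, fps=30.0):
    """Mean frame-to-frame speed of each normalized region centroid,
    in inter-ocular distances per second."""
    per_frame = [normalized_region_centroids(f) for f in frames]
    speeds = {name: [] for name in REGIONS}
    for prev, curr in zip(per_frame, per_frame[1:]):
        for name in REGIONS:
            dx = curr[name][0] - prev[name][0]
            dy = curr[name][1] - prev[name][1]
            speeds[name].append(math.hypot(dx, dy) * fps)
    return {name: mean(v) if v else 0.0 for name, v in speeds.items()}
```

Dividing by a face-derived scale makes the descriptor insensitive to how large the face appears in the frame, which matters for in-the-wild footage with varying camera distance. A per-video feature vector could then be assembled by concatenating such statistics across regions.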

Static geometry, normalized geometry, velocity-based descriptors, relative-velocity descriptors, and a GRU sequence baseline are compared under the same binary classification protocol. To assess stability and interpretability, the study includes seed-robustness analysis, region-level ablation, and permutation importance. The best result is obtained with normalized velocity descriptors and a Random Forest classifier, reaching a balanced accuracy of 0.826 and an AUROC of 0.855 on the held-out test split.
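For readers less familiar with the two reported metrics, here is a small sketch of their standard definitions for binary labels. It is not the authors' evaluation code; the function names are illustrative, and labels are assumed to be encoded as 0 (negative) and 1 (positive).

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels in {0, 1}."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative,
    counting ties as one half (pairwise definition of the ROC area)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Balanced accuracy weights both classes equally regardless of class sizes, while AUROC evaluates the ranking of scores without committing to a decision threshold, which is why the two numbers can differ for the same classifier.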

Across 10 random seeds, this representation remains stable, with balanced accuracy of 0.810 ± 0.018 and AUROC of 0.855 ± 0.005. Overall, the results suggest that normalized facial-region motion is a lightweight and interpretable representation for YouTubePD video classification. The study is framed as a benchmark-level analysis and does not claim clinical severity assessment or MDS-UPDRS facial-expression scoring.
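A seed-robustness figure of this "mean ± spread" form can be produced by repeating training with different random seeds and aggregating the per-seed scores. The sketch below shows only the aggregation step. The helper name `summarize` is hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize(values, digits=3):
    """Format per-seed scores as 'mean ± sample standard deviation'."""
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Made-up scores for illustration only; these are not the paper's per-seed results.
example_scores = [0.79, 0.81, 0.83]
print(summarize(example_scores))
```

A small spread relative to the mean indicates that the result does not hinge on one fortunate initialization or bootstrap draw.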

