Interpretable Temporal Facial-Region Motion Analysis for In-the-Wild Parkinson's Disease Video Classification
Quick Answer
This study reports that normalized facial-region motion descriptors can classify Parkinson's disease videos on the YouTubePD benchmark, reaching a balanced accuracy of 0.826 and an AUROC of 0.855 on the held-out test split.
Quick Take
Among the compared representations (static geometry, normalized geometry, velocity-based descriptors, relative-velocity descriptors, and a GRU sequence baseline), normalized velocity descriptors with a Random Forest classifier performed best. The result suggests that normalized facial-region motion is a lightweight, interpretable representation for in-the-wild video classification.
Key Points
- Normalized velocity descriptors achieved the best classification performance with a Random Forest classifier.
- On the held-out test split, this configuration reached a balanced accuracy of 0.826 and an AUROC of 0.855.
- Across 10 random seeds the representation remained stable, with balanced accuracy of 0.810 +/- 0.018 and AUROC of 0.855 +/- 0.005.
- The approach utilizes geometric descriptors from 14 predefined facial regions.
- Results suggest a lightweight, interpretable representation for YouTubePD video classification; the study is a benchmark-level analysis and does not claim clinical severity assessment.
Article Content
From source RSS / original summary
arXiv:2606.10088v1
Abstract: Reduced facial expressivity is a common motor manifestation of Parkinson's disease (PD), often described as hypomimia or facial bradykinesia. This paper examines whether temporal motion descriptors extracted from facial-region keypoints can support in-the-wild PD-related video classification on the YouTubePD benchmark. Each video is represented using geometric descriptors from 14 predefined facial regions.
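The abstract does not spell out how the region descriptors are computed. As a minimal sketch of one plausible form, assume each frame provides 2D keypoints, a region is a set of keypoint indices, and motion is normalized by the distance between two reference keypoints. All three are assumptions for illustration, not the paper's definitions. The hypothetical `normalized_region_velocity` below measures how far a region's centroid moves between consecutive frames, in units of face size:

```python
import math


def region_centroid(points):
    """Mean (x, y) position of a region's keypoints."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def normalized_region_velocity(frames, region_indices, scale_pair):
    """Per-step centroid displacement of one region, divided by a face scale.

    frames: list of frames, each a list of (x, y) keypoints.
    region_indices: keypoint indices forming the region (hypothetical layout).
    scale_pair: two keypoint indices whose distance acts as the face scale
                (an assumption; the paper's normalization is not specified).
    """
    speeds = []
    for prev, curr in zip(frames, frames[1:]):
        c_prev = region_centroid([prev[i] for i in region_indices])
        c_curr = region_centroid([curr[i] for i in region_indices])
        scale = math.dist(curr[scale_pair[0]], curr[scale_pair[1]])
        # Guard against a degenerate frame where the reference points coincide.
        speeds.append(math.dist(c_prev, c_curr) / scale if scale > 0 else 0.0)
    return speeds


# Toy clip: keypoints 0 and 1 set the scale, keypoints 2 and 3 form the region.
frames = [
    [(0.0, 0.0), (10.0, 0.0), (4.0, 5.0), (6.0, 5.0)],
    [(0.0, 0.0), (10.0, 0.0), (4.0, 6.0), (6.0, 6.0)],
    [(0.0, 0.0), (10.0, 0.0), (4.0, 8.0), (6.0, 8.0)],
]
speeds = normalized_region_velocity(frames, region_indices=[2, 3], scale_pair=(0, 1))

# The same clip with every coordinate doubled (the face appears twice as large).
zoomed = [[(2 * x, 2 * y) for (x, y) in frame] for frame in frames]
zoomed_speeds = normalized_region_velocity(zoomed, [2, 3], (0, 1))

print(speeds, zoomed_speeds)
```

Dividing by a face-scale term makes the value insensitive to how large the face appears in the frame, which matters for in-the-wild footage recorded at varying camera distances.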
Static geometry, normalized geometry, velocity-based descriptors, relative-velocity descriptors, and a GRU sequence baseline are compared under the same binary classification protocol. To assess stability and interpretability, the study includes seed-robustness analysis, region-level ablation, and permutation importance. The best result is obtained with normalized velocity descriptors and a Random Forest classifier, reaching a balanced accuracy of 0.826 and an AUROC of 0.855 on the held-out test split.
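Both reported metrics have standard definitions. Balanced accuracy is the mean of per-class recall, and AUROC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below implements those definitions in plain Python; the labels and scores are made-up toy values, not data from the paper:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auroc(y_true, scores):
    """AUROC via the pairwise (Mann-Whitney) formulation; O(n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy binary problem: 3 positive and 5 negative examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.7]

print(round(balanced_accuracy(y_true, y_pred), 3))
print(round(auroc(y_true, scores), 3))
```

Because balanced accuracy averages the two per-class recalls, a classifier cannot score well simply by favoring the larger class. AUROC is threshold-free, so it evaluates the ranking of scores rather than one particular decision cutoff.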
Across 10 random seeds, this representation remains stable, with balanced accuracy of 0.810 +/- 0.018 and AUROC of 0.855 +/- 0.005. Overall, the results suggest that normalized facial-region motion is a lightweight and interpretable representation for YouTubePD video classification. The study is framed as a benchmark-level analysis and does not claim clinical severity assessment or MDS-UPDRS facial-expression scoring.
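A summary of the form "mean +/- standard deviation" over seeds can be produced as follows. This is a sketch only: the scores are placeholder values rather than the paper's per-seed results, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def summarize_over_seeds(scores):
    """Return (mean, sample standard deviation) of one metric across seeds."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std (n - 1 denominator); an assumption
    return mean, std


def format_mean_std(mean, std, digits=3):
    """Render a metric in the 'mean +/- std' style used in the abstract."""
    return f"{mean:.{digits}f} +/- {std:.{digits}f}"


# Placeholder balanced-accuracy values for five hypothetical seeds.
placeholder_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, std = summarize_over_seeds(placeholder_scores)
summary = format_mean_std(mean, std)
print(summary)
```

A small standard deviation relative to the mean indicates that the result does not hinge on one favorable random initialization or data shuffle.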