Fine-tuning a multimodal large language model for clinician-grade autism behavioral scoring from short home videos
Quick Answer
This paper shows that fine-tuning Gemini 2.5 Pro on 400 clinician-rated home videos improved zero-shot direct ASD diagnosis F1 by 53%, and that classifiers fed the fine-tuned model's behavioral features achieved 77% accuracy and an AUC of 86%.
Quick Take
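Fine-tuning Gemini 2.5 Pro with low-rank adaptation on 400 clinician-rated home videos improved direct ASD diagnosis F1 by 53% on 99 held-out children. Classifier-assisted pipelines built on the model's behavioral features reached 77% accuracy and an AUC of 86%. Because ASD affects 1 in 31 US children and the median age at diagnosis exceeds four years, scalable video-based scoring like this could support earlier diagnosis.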
Key Points
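- Agreement with clinicians on behavioral features (per-feature weighted Cohen's kappa) improved by 40% (p<0.001).
- 27 of 28 evaluable behavioral features showed improved agreement after fine-tuning.
- Classifier-assisted pipelines using LLM-derived features achieved 77% accuracy (95% CI: 68-85%) and 86% AUC (95% CI: 78-92%), matching clinician-scored inputs.
- Direct ASD diagnosis F1, an emergent zero-shot capability, improved by 53% (p<0.001) and matched or exceeded clinician outcomes.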
- Fine-tuned LLMs can scale behavioral feature extraction for autism assessment.
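Agreement with clinicians is reported as per-feature weighted Cohen's kappa, defined as κ_w = 1 − Σ w_ij·O_ij / Σ w_ij·E_ij, where O is the observed joint distribution of the two raters' scores, E is the product of their marginals, and w_ij penalizes disagreements by distance. Below is a minimal stdlib sketch for illustration only: the function name `weighted_kappa`, the 0-based ordinal coding, and the default of quadratic weights are assumptions, since the summary does not state the weighting scheme or score range the authors used. The toy scores are invented, not taken from the paper.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int, weighting: str = "quadratic") -> float:
    """Cohen's weighted kappa for two raters scoring the same items on an
    ordinal scale coded 0..n_categories-1 (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    if n_categories < 2:
        raise ValueError("need at least two ordinal categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k, n = n_categories, len(rater_a)

    # Observed joint distribution O[i][j] of (rater_a, rater_b) scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if not (0 <= a < k and 0 <= b < k):
            raise ValueError(f"score out of range: {a}, {b}")
        observed[a][b] += 1.0 / n

    # Marginals; chance-expected distribution is E[i][j] = row[i] * col[j].
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for maximally distant scores.
        d = abs(i - j) / (k - 1)
        return d * d if weighting == "quadratic" else d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        return float("nan")  # kappa is undefined when chance disagreement is zero
    return 1.0 - obs_dis / exp_dis


# Hypothetical toy scores for one behavioral feature (not from the paper).
clinician = [0, 1, 2, 2, 1, 0]
model = [0, 1, 2, 1, 1, 0]
print(round(weighted_kappa(clinician, model, n_categories=3), 3))  # prints 0.857
```

Quadratic weights penalize a two-point miss four times as heavily as a one-point miss, which suits ordinal behavioral ratings; linear weights are the other common choice.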
Article Content
arXiv:2606.27484v1
Abstract: Autism spectrum disorder (ASD) affects 1 in 31 US children, yet median age at diagnosis exceeds four years. Artificial intelligence pipelines that provide quantified diagnosis using easy-to-access observational data (e.g., home videos) could help with earlier diagnosis and timely delivery of early treatments. We fine-tuned Gemini 2.5 Pro on 400 clinician-rated home videos with low-rank adaptation, training only on 30 behavioral features previously validated to produce reliable predictions when passed to various ML models. On 99 held-out children (49 ASD, 50 neurotypical), inter-rater reliability with clinicians (per-feature weighted Cohen's kappa) improved by 40% (p<0.001), with 27 of 28 evaluable features improving. As an emergent zero-shot capability, direct ASD diagnosis F1 improved by 53% (p<0.001), matching or exceeding clinician outcomes. Classifier-assisted pipelines using fine-tuned LLM-derived behavioral features matched clinician-scored inputs across all tested pathways and achieved 77% accuracy (95% CI: 68-85%) and an AUC of 86% (95% CI: 78-92%). Fine-tuned multimodal LLMs can serve as scalable behavioral feature extractors for use in autism assessment and diagnosis.
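The classifier-assisted results are reported as accuracy and AUC with 95% confidence intervals. The following is a minimal sketch of how such figures can be computed, assuming binary labels (1 = ASD, 0 = neurotypical), a 0.5 decision threshold, and a percentile bootstrap that resamples children with replacement. The abstract does not say how its intervals were derived, so the bootstrap is an illustrative assumption, and the function names (`accuracy`, `auc`, `bootstrap_ci`) and toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of children whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a random
    positive (ASD) child scores above a random negative one; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric: Callable[[Sequence[int], Sequence[float]], float],
                 y_true: Sequence[int], scores: Sequence[float],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling children with replacement."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes present to bootstrap AUC")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # single-class resample: AUC undefined, so redraw
        stats.append(metric(yt, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical toy data (not from the paper): labels and classifier scores.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
probs = [0.10, 0.40, 0.35, 0.80, 0.55, 0.90, 0.60, 0.20]
print("accuracy", accuracy(labels, probs))  # prints accuracy 0.75
print("AUC", auc(labels, probs))  # prints AUC 0.875
print("AUC 95% CI", bootstrap_ci(auc, labels, probs))
```

The pairwise AUC is O(n_pos × n_neg), which is more than fast enough for a held-out set of 99 children and avoids any dependency beyond the standard library.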