Fine-tuning a multimodal large language model for clinician-grade autism behavioral scoring from short home videos

arXiv cs.CV·Mohammadmahdi Honarmand, Parnian Azizian, Aaron Kline, Kae Nurge, Zerin Nasrin Tumpa, Saimourya Surabhi, Kaitlyn Dunlap, Yang Qian, Ali Kargarandehkordi, Sameer Neupane, Peter Washington, Dennis P. Wall

6/29/2026

Quick Answer

Fine-tuning Gemini 2.5 Pro on 400 clinician-rated home videos improved zero-shot direct ASD diagnosis F1 by 53% (p<0.001). Classifier-assisted pipelines built on the model's behavioral feature scores reached 77% accuracy and an AUC of 86% on 99 held-out children.

Quick Take

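With low-rank adaptation on 400 clinician-rated home videos, Gemini 2.5 Pro's agreement with clinicians on behavioral features rose by 40% (per-feature weighted Cohen's kappa), and its direct ASD diagnosis F1 rose by 53%. ASD affects 1 in 31 US children and the median age at diagnosis exceeds four years, so the authors position such pipelines as a possible route to earlier diagnosis and treatment.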

Key Points

Inter-rater reliability with clinicians (per-feature weighted Cohen's kappa) improved by 40% (p<0.001) on 99 held-out children.
27 of 28 evaluable behavioral features showed improvement after fine-tuning.
Classifier-assisted pipelines achieved 77% accuracy (95% CI: 68-85%) and 86% AUC (95% CI: 78-92%).
Direct ASD diagnosis F1 improved by 53% (p<0.001), matching or exceeding clinician outcomes.
Fine-tuned LLMs can scale behavioral feature extraction for autism assessment.
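
The inter-rater reliability figure refers to weighted Cohen's kappa between the model's scores and a clinician's scores for a given behavioral feature. As a rough illustration of what that statistic measures, here is a minimal sketch in plain Python. The function name, the quadratic weighting default, and the 0-3 rating scale in the example are assumptions for illustration; the summary does not say which weighting scheme or rating scale the authors used.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_levels, weighting="quadratic"):
    """Weighted Cohen's kappa for two raters scoring items on 0..n_levels-1.

    Illustrative only: the weighting scheme is an assumption, not the paper's.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_levels < 2:
        raise ValueError("need at least two rating levels")
    n = len(rater_a)
    power = 2 if weighting == "quadratic" else 1

    def weight(i, j):
        # Disagreement weight: 0 on exact agreement, 1 at maximum distance.
        return (abs(i - j) / (n_levels - 1)) ** power

    observed = Counter(zip(rater_a, rater_b))
    marg_a = Counter(rater_a)
    marg_b = Counter(rater_b)

    # Observed weighted disagreement across the rated items.
    obs_dis = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    # Weighted disagreement expected if the two raters were independent.
    exp_dis = sum(
        weight(i, j) * marg_a[i] * marg_b[j]
        for i in range(n_levels)
        for j in range(n_levels)
    ) / (n * n)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use one identical level")
    return 1.0 - obs_dis / exp_dis


# Hypothetical ratings for one feature on a 0-3 scale (not data from the paper).
clinician = [0, 1, 2, 3]
model = [0, 1, 2, 2]
print(round(weighted_kappa(clinician, model, n_levels=4), 3))  # 0.875
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. The weighting penalizes a miss by two levels more than a miss by one, which suits ordinal behavioral ratings.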


Article Content

From source RSS / original summary

arXiv:2606.27484v1 Announce Type: new

Abstract: Autism spectrum disorder (ASD) affects 1 in 31 US children, yet median age at diagnosis exceeds four years. Artificial intelligence pipelines that provide quantified diagnosis using easy-to-access observational data (e.g., home videos) could help with earlier diagnosis and timely delivery of early treatments. We fine-tuned Gemini 2.5 Pro on 400 clinician-rated home videos with low-rank adaptation, training only on 30 behavioral features previously validated to produce reliable predictions when passed to various ML models. On 99 held-out children (49 ASD, 50 neurotypical), inter-rater reliability with clinicians (per-feature weighted Cohen's kappa) improved by 40% (p<0.001), with 27 of 28 evaluable features improving. As an emergent zero-shot capability, direct ASD diagnosis F1 improved by 53% (p<0.001), matching or exceeding clinician outcomes. Classifier-assisted pipelines using fine-tuned LLM-derived behavioral features matched clinician-scored inputs across all tested pathways and achieved 77% accuracy (95% CI: 68-85%) and an AUC of 86% (95% CI: 78-92%). Fine-tuned multimodal LLMs can serve as scalable behavioral feature extractors for use in autism assessment and diagnosis.
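
To give a sense of why the confidence interval around the accuracy figure is fairly wide with only 99 held-out children, the sketch below computes a Wilson score interval for a binomial proportion. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how its intervals were computed, and the count of 76 correct out of 99 is a hypothetical value chosen only because it rounds to 77%.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical count: 76 of 99 correct (about 77%); not a figure from the paper.
low, high = wilson_interval(76, 99)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval comes out at roughly 68% to 84%, close to the reported 68-85%. The small gap is expected, since a bootstrap or an exact interval would give slightly different bounds.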
