A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research
Quick Take
A scalable tool identifies manner and result verbs in sentence context, using annotations generated by large language models to train a classifier for developmental language research.
Key Points
- Computational approach identifies manner/result verbs in context.
- RoBERTa-based classifier reaches average accuracy of up to 89.6% on three held-out gold-standard datasets.
- Tool supports future verb semantics research in language datasets.
Abstract: Manner and result verbs encode different aspects of event structure and have been discussed in developmental work as a potentially informative distinction for studying early verb learning. However, this distinction remains difficult to measure at scale because large annotated resources for manner and result classification are not currently available. We present a computational approach for identifying manner and result verbs in sentence context. Using linguistically informed prompts, we generate sentence-level annotations with large language models over data drawn from MASC and InterCorp, extending coverage from previously annotated portions of VerbNet to 436 classes. We then train a RoBERTa-based classifier on these annotations and evaluate it on three held-out gold-standard datasets, including previously annotated items and a new expert-annotated set. Across these evaluations, the model shows promising performance, with average accuracy up to 89.6%. We present this work as a scalable measurement tool that can support future research on verb semantics in developmental and other language datasets, while noting that further validation is needed for borderline cases, mixed manner/result verbs, and downstream developmental applications.
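The evaluation described in the abstract (accuracy on each held-out set, then an average across sets) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the average is computed, so an unweighted mean over datasets is assumed here, and the dataset names and labels are hypothetical toy values.

```python
from statistics import mean


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def average_accuracy(datasets):
    """Return per-dataset accuracy and the unweighted mean across datasets.

    `datasets` maps a dataset name to a (gold_labels, predicted_labels) pair.
    Assumption: each dataset counts equally, regardless of its size.
    """
    per_set = {
        name: accuracy(gold, predicted)
        for name, (gold, predicted) in datasets.items()
    }
    return per_set, mean(per_set.values())


# Hypothetical toy data standing in for three held-out gold-standard sets.
toy_sets = {
    "set_a": (
        ["manner", "result", "manner", "result"],
        ["manner", "result", "result", "result"],
    ),
    "set_b": (
        ["result", "result"],
        ["result", "result"],
    ),
    "set_c": (
        ["manner", "manner", "result", "manner"],
        ["manner", "result", "result", "manner"],
    ),
}

per_set, avg = average_accuracy(toy_sets)
for name, score in per_set.items():
    print(f"{name}: {score:.3f}")
print(f"average: {avg:.3f}")
```

An unweighted mean treats a small expert-annotated set the same as a larger one; pooling all items before computing accuracy would instead weight each set by its size, and the two figures can differ.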
| Comments: | 12 pages |
| Subjects: | Computation and Language (cs.CL); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.16654 [cs.CL] |
| | (or arXiv:2605.16654v1 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2605.16654 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Ifeoma Nwogu
[v1] Fri, 15 May 2026 21:48:47 UTC (5,555 KB)
— Originally published at arxiv.org