AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages
Quick Answer
AfriSUD introduces the first large-scale collection of syntactically annotated treebanks for nine African languages, revealing significant syntax gaps in existing NLP models.
Quick Take
Evaluations of part-of-speech tagging and dependency parsing with non-transformer baselines, multilingual pretrained encoders, and LLMs show clear limitations across all nine languages, suggesting that current models do not fully capture the structural diversity of African languages.
Key Points
- AfriSUD includes treebanks for nine diverse African languages from major language families.
- The data are verified by native speakers and capture key typological features such as agglutination and tone.
- Evaluation reveals significant limitations in models across all nine languages.
- Models tested include non-transformer baselines, multilingual pretrained encoders, and LLMs.
- Existing architectures may not adequately represent African-language syntax diversity.
Article Excerpt
From source RSS / original summary (arXiv:2606.12708v1)
Abstract: Despite their linguistic diversity and global significance, African languages remain underrepresented in research and resources to support NLP. We aim to bridge this gap by introducing AfriSUD, the first large-scale collection of syntactically annotated treebanks for nine diverse African languages spanning major language families and regions across Sub-Saharan Africa.
Using the Surface-Syntactic Universal Dependencies (SUD) framework, our community-led effort provides high-quality, native-speaker verified data that capture typological key features such as agglutination and tone. We evaluate a range of models on AfriSUD for part-of-speech tagging and dependency parsing including non-transformer baselines, multilingual pretrained encoders, and LLMs.
Our results reveal a significant syntax gap, where models still show clear limitations across the nine languages, suggesting that existing architectures may not fully capture the structural diversity of African-language syntax.
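To make the evaluation setup more concrete, here is a minimal sketch of how part-of-speech tagging and dependency parsing are conventionally scored on treebank data. The summary does not say which metrics or file format the authors used, so everything here is an assumption: the CoNLL-U column layout, the choice of UPOS accuracy, unlabeled attachment score (UAS), and labeled attachment score (LAS), the helper names `read_conllu`, `score`, and `to_conllu`, and the toy English sentence with SUD-style relation labels, which is not AfriSUD data.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int, str]  # (UPOS tag, head index, dependency relation)


def read_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into sentences of (UPOS, HEAD, DEPREL) tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comment lines
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def score(gold: List[List[Token]], pred: List[List[Token]]) -> Dict[str, float]:
    """Return UPOS accuracy, UAS, and LAS over identically tokenized sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentence counts differ")
    total = pos_ok = uas_ok = las_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted token counts differ")
        for (g_pos, g_head, g_rel), (p_pos, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            pos_ok += g_pos == p_pos
            if g_head == p_head:  # correct head counts toward UAS
                uas_ok += 1
                las_ok += g_rel == p_rel  # LAS also requires the correct label
    if total == 0:
        raise ValueError("no tokens to score")
    return {"upos": pos_ok / total, "uas": uas_ok / total, "las": las_ok / total}


def to_conllu(rows: List[Tuple[str, str, int, str]]) -> str:
    """Build a ten-column CoNLL-U sentence from (form, upos, head, deprel) rows."""
    lines = []
    for i, (form, upos, head, rel) in enumerate(rows, start=1):
        lines.append("\t".join([str(i), form, "_", upos, "_", "_", str(head), rel, "_", "_"]))
    return "\n".join(lines) + "\n\n"


# Toy sentence (hypothetical, not from AfriSUD) with SUD-style labels.
gold_text = to_conllu([
    ("The", "DET", 2, "det"),
    ("dog", "NOUN", 3, "subj"),
    ("sees", "VERB", 0, "root"),
    ("cats", "NOUN", 3, "comp:obj"),
])
# Simulated system output: one wrong tag and one wrong relation label.
pred_text = to_conllu([
    ("The", "PRON", 2, "det"),
    ("dog", "NOUN", 3, "subj"),
    ("sees", "VERB", 0, "root"),
    ("cats", "NOUN", 3, "subj"),
])

result = score(read_conllu(gold_text), read_conllu(pred_text))
print(result)
```

In this toy case every head is attached correctly, so UAS stays at 1.0 while the single wrong label lowers LAS and the single wrong tag lowers UPOS accuracy. A real evaluation would also need to handle tokenization mismatches between gold and predicted files, which this sketch simply rejects.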