The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

arXiv cs.CL·Vukosi Marivate

5/20/2026

Quick Answer

This paper argues that low-resource NLP evaluation faces an Annotation Scarcity Paradox: the technical capacity to scale models vastly outpaces the sociolinguistic expertise and human infrastructure needed to evaluate them authentically.

Quick Take

Progress in low-resource NLP has been driven by cross-lingual transfer, massively multilingual models, and a proliferation of benchmarks, but the expertise needed to evaluate generative systems is strained, inequitably distributed, and structurally marginalised. The authors argue that this disconnect threatens the validity of reported progress and calls for a shift towards community-embedded evaluation practices. They survey emerging responses such as data augmentation, model-based evaluation, and participatory curation, and assess their equity and validity trade-offs.

Key Points

Low-resource NLP has grown rapidly due to cross-lingual transfer and multilingual models.
The Annotation Scarcity Paradox arises from the mismatch between model scaling and evaluation capacity.
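Extractive data pipelines, undercompensated "ghost work", and language data flaring threaten the epistemic validity of reported progress.
Surveyed responses include data augmentation, model-based evaluation, participatory curation, and annotation-efficient approaches via item response theory and active learning.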
A paradigm shift is needed towards relational evaluation rooted in community engagement.

[Submitted on 18 May 2026]

Abstract: Over the past decade, low-resource natural language processing (NLP) has experienced explosive growth, propelled by cross-lingual transfer, massively multilingual models, and the rapid proliferation of benchmarks. Yet this apparent progress masks a critical, insufficiently examined tension: the deep sociolinguistic expertise required to evaluate increasingly complex generative systems is severely strained, inequitably distributed, and structurally marginalised. We present a critical narrative survey of low-resource NLP evaluation (2014 to present), tracing its evolution across three phases: early heuristic optimism, the illusions of top-down benchmark scaling, and the current era of generative bottlenecks. We conceptualise the Annotation Scarcity Paradox, the structural friction arising when the technical capacity to scale models vastly outpaces the sovereign human infrastructure required to authentically evaluate them. By examining extractive data pipelines, undercompensated "ghost work", and language data flaring, we argue that this paradox threatens the epistemic validity of reported progress. We survey emerging responses (including data augmentation, model-based evaluation, participatory curation, and annotation-efficient approaches via item response theory and active learning) and assess their equity and validity trade-offs. We close with a practitioner call to action, arguing that overcoming this bottleneck requires a paradigm shift from transactional data extraction to relational, community-embedded evaluation rooted in epistemic governance, data sovereignty, and shared ownership.

Comments:	Under Review
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.19066 [cs.CL]
	(or arXiv:2605.19066v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.19066 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Vukosi Marivate
[v1] Mon, 18 May 2026 19:48:00 UTC (54 KB)

Originally published at arxiv.org
