Generating in the Limit with Infinitely Many Hallucinations

arXiv cs.CL·Irene Strauss, Alexandra Butoi, Ryan Cotterell

·~2 min·6/30/2026·en·0

Quick Answer

The paper recasts language generation in the limit as a recall-precision trade-off and shows that tolerating infinitely many mistakes of vanishing frequency can strictly increase recall.

Quick Take

The paper introduces a new notion of precision for language generation in the limit and frames the tension between broad coverage and validity as the classic recall-precision trade-off. It allows learners that make infinitely many mistakes, provided the frequency of those mistakes tends to zero so that precision remains one. This relaxation can strictly increase recall when the adversary permanently withholds a large portion of the target language. The goal is a model closer to the settings large language models face, where occasional errors and repetitions are unavoidable but their rates are controlled.

Key Points

Introduces a new notion of precision and recasts the coverage-validity tension as the classic recall-precision trade-off.
Allows infinitely many mistakes, provided their frequency tends to zero so that precision remains one.
Shows that this relaxation can strictly increase recall when the adversary permanently withholds a large portion of the target language.
Studies a continuous relaxation of the novelty constraint that requires only a fixed fraction of outputs to be novel.
Aims at a more realistic model of language generation in which errors and repetitions occur but their rates are controlled.
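
To make the idea of "infinitely many mistakes with vanishing frequency" concrete, here is a minimal Python sketch. It is not the paper's construction: the toy target language (even integers), the mistake schedule (an error at every perfect-square step), and the reading of precision as the limiting fraction of valid outputs are all assumptions chosen for illustration.

```python
import math


def is_valid(x: int) -> bool:
    """Hypothetical target language: the even non-negative integers."""
    return x % 2 == 0


def toy_generator(t: int) -> int:
    """Hypothetical learner output at step t (t >= 1).

    It emits an invalid (odd) output whenever t is a perfect square, so it
    never stops making mistakes, yet only about sqrt(T) of its first T
    outputs are wrong.
    """
    if math.isqrt(t) ** 2 == t:
        return 2 * t + 1  # a "hallucination": not in the target language
    return 2 * t          # a valid string not produced at any earlier step


def running_precision(steps: int) -> float:
    """Fraction of valid outputs among the first `steps` outputs."""
    valid = sum(is_valid(toy_generator(t)) for t in range(1, steps + 1))
    return valid / steps


for steps in (100, 10_000, 1_000_000):
    print(steps, running_precision(steps))
# 100 0.9
# 10000 0.99
# 1000000 0.999
```

The mistakes never stop, but the running fraction of valid outputs is 1 - floor(sqrt(T))/T, which approaches one as T grows. Under the assumed definition, that is a learner with precision one that is not eventually valid.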

[Submitted on 8 Jun 2026]

Abstract: The classic paradigm of language identification in the limit models learning as a game between an adversary, who reveals strings from an unknown target language, and a learner tasked with identifying that language. The recently introduced framework of language generation in the limit shifted the objective to better reflect modern language modeling, requiring the learner to produce valid, unseen strings from the target language. Related work highlighted a fundamental tension: a broad coverage of the target often comes at the cost of validity. We introduce a new notion of precision and recast this problem as the classic recall-precision trade-off. We analyze generation in the limit under varying constraints on enumeration, novelty, and validity, aimed at reflecting settings closer to those encountered by large language models. A key contribution is our analysis of learners that are not eventually valid: we allow infinitely many mistakes, provided their frequency tends to zero so that precision remains one. We show that this relaxation can strictly increase recall when the adversary permanently withholds a large portion of the target language. We also study a continuous relaxation of the novelty constraint that requires only a fixed fraction of outputs to be novel. Taken together, our results move toward a more realistic model of language generation where occasional errors and repetitions are unavoidable, but their rates are controlled.
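
The relaxed novelty constraint mentioned in the abstract can be illustrated with a short Python sketch. This is an assumption-laden reading, not the paper's formal definition: here an output counts as novel if it matches neither a string the adversary has revealed nor an earlier output, the check runs over a finite prefix, and the name `novel_fraction` and the 0.5 threshold are hypothetical.

```python
from typing import Iterable, Sequence


def novel_fraction(outputs: Sequence[str], seen: Iterable[str]) -> float:
    """Fraction of outputs that are novel (hypothetical helper).

    An output is counted as novel if it is neither among the strings already
    revealed (`seen`) nor a repeat of an earlier output.
    """
    known = set(seen)
    novel = 0
    for s in outputs:
        if s not in known:
            novel += 1
            known.add(s)  # later repeats of s no longer count as novel
    return novel / len(outputs) if outputs else 0.0


revealed = ["ab", "aabb"]                          # strings shown by the adversary
outputs = ["ab", "aaabbb", "aaabbb", "aaaabbbb"]   # learner outputs, in order
frac = novel_fraction(outputs, revealed)
print(frac)         # 0.5
print(frac >= 0.5)  # True: an assumed fixed-fraction requirement of 1/2 is met
```

A strict novelty constraint would reject this output sequence because of the two repeats, while a fixed-fraction requirement of one half accepts it.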

Subjects:	Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)
Cite as:	arXiv:2606.28354 [cs.CL]
	(or arXiv:2606.28354v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.28354 arXiv-issued DOI via DataCite

Submission history

From: Irene Strauss
[v1] Mon, 8 Jun 2026 09:58:13 UTC (60 KB)

— Originally published at arxiv.org
