NOVA: Fundamental Limits of Knowledge Discovery Through AI

arXiv cs.AI · Salman Avestimehr, Ken Duffy, Muriel Médard

5/18/2026

Quick Take

The NOVA framework models the limits and costs of knowledge discovery by AI systems that improve themselves iteratively.

Key Points

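Identifies four failure modes in knowledge accumulation: contamination, forgetting, exploration failure, and acceptance failure.
Describes a contamination trap under imperfect verification: as easy discoveries run out, even small false-positive rates can let invalid artifacts enter the knowledge base faster than genuine ones.
Proves that, under a Zipf-tail assumption with exponent α > 1, the cumulative generation cost of D distinct discoveries scales as Θ(c_gen · D^α), i.e. with diminishing returns.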
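The contamination trap comes down to comparing two inflow rates. The sketch below is a toy model chosen for illustration, not the paper's formalism: for each generated candidate, the model puts probability mass m_new on valid artifacts not yet in the knowledge base and m_invalid on invalid ones, and the verifier accepts new valid artifacts with true-positive rate tpr and invalid ones with false-positive rate fpr. The names inflow_rates, trap_threshold, and InflowRates are hypothetical. Under these assumptions, invalid artifacts enter faster than genuine discoveries once m_new · tpr < m_invalid · fpr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflowRates:
    genuine: float  # expected new valid artifacts accepted per candidate
    invalid: float  # expected invalid artifacts accepted per candidate


def inflow_rates(m_new: float, m_invalid: float, fpr: float, tpr: float = 1.0) -> InflowRates:
    """Expected acceptances per generated candidate in a toy generate-verify step.

    m_new:     model mass on valid artifacts not yet in the knowledge base
    m_invalid: model mass on invalid artifacts
    fpr:       verifier false-positive rate (invalid artifact accepted)
    tpr:       verifier true-positive rate (new valid artifact accepted)
    The remaining mass, 1 - m_new - m_invalid, lands on already-known
    artifacts and adds nothing new.
    """
    if m_new < 0 or m_invalid < 0 or m_new + m_invalid > 1:
        raise ValueError("masses must be non-negative and sum to at most 1")
    if not (0 <= fpr <= 1 and 0 <= tpr <= 1):
        raise ValueError("rates must lie in [0, 1]")
    return InflowRates(genuine=m_new * tpr, invalid=m_invalid * fpr)


def trap_threshold(m_invalid: float, fpr: float, tpr: float = 1.0) -> float:
    """Value of m_new below which invalid inflow exceeds genuine inflow."""
    if tpr <= 0:
        return float("inf")  # nothing genuine is ever accepted
    return m_invalid * fpr / tpr


if __name__ == "__main__":
    # Easy discoveries run out: m_new shrinks while the verifier stays the same.
    for m_new in (0.2, 0.05, 0.01, 0.002):
        r = inflow_rates(m_new, m_invalid=0.3, fpr=0.01, tpr=0.9)
        print(f"m_new={m_new:<5} genuine={r.genuine:.4f} "
              f"invalid={r.invalid:.4f} trapped={r.invalid > r.genuine}")
    print(f"threshold m_new = {trap_threshold(0.3, 0.01, 0.9):.4f}")
```

With m_invalid = 0.3, fpr = 0.01, and tpr = 0.9, the threshold is 0.3 × 0.01 / 0.9 ≈ 0.0033: a verifier that wrongly accepts only 1% of invalid candidates still becomes the dominant source of new entries once fewer than about 0.33% of generations land on genuinely new valid artifacts.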

[Submitted on 12 May 2026]

Abstract: Can AI systems discover genuinely new knowledge through iterative self-improvement, and if so, at what cost? We introduce the NOVA framework, which models the common "generate, verify, accumulate, retrain" loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good–Turing estimation is a local batch-diversity diagnostic, not an estimator of the historically undiscovered valid mass that governs long-term discovery. Under a separate tail-equivalence assumption relating the model's effective discovery distribution to a Zipf law with exponent $\alpha>1$, we prove that the cumulative generation cost required to obtain $D$ distinct genuine discoveries satisfies $R_{\mathrm{cum}}(D)=\Theta(c_{\mathrm{gen}}D^\alpha)$, where $c_{\mathrm{gen}}$ is the per-candidate generation cost. This scaling law quantifies asymptotic diminishing returns as the discovery frontier advances. Finally, we formalize human amplification through guidance, generation, and verification, explaining why expert input is most valuable near autonomous exploration barriers.
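The scaling law and the Good–Turing caveat can both be illustrated with a small Monte Carlo sketch. It rests on simplifying assumptions that are ours, not the paper's: every candidate is valid, verification is perfect, the model is never retrained, and the discovery distribution is exactly Zipf (sampled with NumPy's Generator.zipf, p(k) ∝ k^(-α)) rather than merely tail-equivalent to one. The helpers simulate_discovery and good_turing_unseen are hypothetical names. If R_cum(D) = Θ(c_gen D^α) holds, cost / D^α should stay roughly flat as D grows. The last lines compare the Good–Turing singleton fraction N1/n of a fresh batch with the share of that same batch that is actually new relative to everything discovered so far.

```python
import numpy as np
from collections import Counter


def good_turing_unseen(batch):
    """Good-Turing estimate N1/n: share of the batch made of items seen exactly once.

    Only the batch itself is inspected, so this measures batch diversity,
    not how much valid mass remains undiscovered across the whole history.
    """
    counts = Counter(batch)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(batch)


def simulate_discovery(D, alpha, c_gen=1.0, seed=0, chunk=10_000):
    """Draw candidates from Zipf(alpha) until D distinct items have been found.

    Toy assumptions: every draw is a valid artifact, verification is perfect,
    each candidate costs c_gen, and the distribution never changes (no retraining).
    Returns (cumulative generation cost, set of discovered items).
    """
    rng = np.random.default_rng(seed)
    seen, n = set(), 0
    while True:
        for x in rng.zipf(alpha, size=chunk).tolist():
            n += 1
            seen.add(x)
            if len(seen) >= D:
                return c_gen * n, seen


if __name__ == "__main__":
    alpha = 1.5
    for D in (250, 1000, 4000):
        cost, seen = simulate_discovery(D, alpha)
        # If R_cum(D) = Theta(c_gen * D**alpha), this ratio stays roughly constant.
        print(f"D={D:5d}  cost={cost:9.0f}  cost/D^alpha={cost / D**alpha:.3f}")

    # Fresh batch from an independent stream, compared against the history above.
    batch = np.random.default_rng(1).zipf(alpha, size=1_000).tolist()
    new_vs_history = sum(1 for x in batch if x not in seen) / len(batch)
    print(f"Good-Turing N1/n = {good_turing_unseen(batch):.3f}")
    print(f"share of batch new vs. history = {new_vs_history:.3f}")
```

In this toy setting, the singleton fraction of a 1,000-draw batch is typically several times larger than the share of that batch that is new relative to the accumulated history, which is the gap between a local batch-diversity diagnostic and the historically undiscovered mass.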

Subjects:	Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Cite as:	arXiv:2605.15219 [cs.AI]
	(or arXiv:2605.15219v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.15219 arXiv-issued DOI via DataCite

Submission history

From: Salman Avestimehr
[v1] Tue, 12 May 2026 21:37:09 UTC (330 KB)

Originally published at arxiv.org

More from arXiv cs.AI

From Prompts to Protocols: An AI Agent for Laboratory Automation

Agentic Trading: When LLM Agents Meet Financial Markets

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Related in this space

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?