NOVA: Fundamental Limits of Knowledge Discovery Through AI
Quick Take
The NOVA framework models the limits and costs of AI knowledge discovery through iterative self-improvement.
Key Points
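- Identifies four failure modes in knowledge accumulation: contamination, forgetting, exploration failure, and acceptance failure.
- Analyzes the contamination trap that arises under imperfect verification.
- Quantifies cumulative discovery cost and its asymptotic diminishing returns.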
Abstract: Can AI systems discover genuinely new knowledge through iterative self-improvement, and if so, at what cost? We introduce the NOVA framework, which models the common "generate, verify, accumulate, retrain" loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good-Turing estimation is a local batch-diversity diagnostic, not an estimator of the historically undiscovered valid mass that governs long-term discovery. Under a separate tail-equivalence assumption relating the model's effective discovery distribution to a Zipf law with exponent $\alpha>1$, we prove that the cumulative generation cost required to obtain $D$ distinct genuine discoveries satisfies $R_{\mathrm{cum}}(D)=\Theta(c_{\mathrm{gen}}D^\alpha)$, where $c_{\mathrm{gen}}$ is the per-candidate generation cost. This scaling law quantifies asymptotic diminishing returns as the discovery frontier advances. Finally, we formalize human amplification through guidance, generation, and verification, explaining why expert input is most valuable near autonomous exploration barriers.
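To give a feel for the scaling law $R_{\mathrm{cum}}(D)=\Theta(c_{\mathrm{gen}}D^\alpha)$, the sketch below is a toy illustration and not the paper's construction. It assumes candidates are drawn i.i.d. from a fixed Zipf law truncated to a finite set of ranks, ignores verification and retraining entirely, and uses the expected number of distinct items seen after a given number of draws as a stand-in for $D$. The function names, the `support` size, and the `budgets` values are all hypothetical choices made for this example.

```python
import math


def expected_distinct(n_draws, alpha, support=200_000):
    """Expected number of distinct items seen after n_draws i.i.d. samples
    from a Zipf law p_k proportional to k**(-alpha), truncated to `support`
    ranks. (Assumption: a fixed sampling distribution, no verification.)"""
    weights = [k ** (-alpha) for k in range(1, support + 1)]
    total = sum(weights)
    # P(item k seen at least once) = 1 - (1 - p_k)**n, computed stably
    # as -expm1(n * log1p(-p_k)).
    return sum(-math.expm1(n_draws * math.log1p(-w / total)) for w in weights)


def fitted_cost_exponent(alpha, c_gen=1.0, budgets=(1_000, 10_000, 100_000)):
    """Least-squares slope of log(cost) against log(expected discoveries).
    Under the toy model this slope should sit near alpha."""
    xs = [math.log(expected_distinct(n, alpha)) for n in budgets]
    ys = [math.log(c_gen * n) for n in budgets]  # cost = c_gen per candidate
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    alpha = 2.0
    print("fitted exponent:", fitted_cost_exponent(alpha))
    print("distinct after 1e4 draws:", expected_distinct(10_000, alpha))
    print("distinct after 1e5 draws:", expected_distinct(100_000, alpha))
```

With `alpha = 2.0`, the fitted exponent should come out close to `alpha`, and a tenfold increase in draws yields far fewer than ten times as many distinct items, which is the diminishing-returns behaviour the abstract describes. The constant `c_gen` only shifts the log-log line and does not change the fitted slope.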
| Subjects: | Artificial Intelligence (cs.AI); Information Theory (cs.IT) |
| Cite as: | arXiv:2605.15219 [cs.AI] (or arXiv:2605.15219v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15219 (arXiv-issued DOI via DataCite) |
Submission history
From: Salman Avestimehr
[v1] Tue, 12 May 2026 21:37:09 UTC (330 KB)
— Originally published at arxiv.org