Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution
Quick Take
Solvita introduces an agentic evolution framework for large language models, enhancing competitive programming capabilities by enabling continuous learning without weight updates. It outperforms existing multi-agent pipelines and nearly doubles the accuracy of single-pass baselines on benchmarks including CodeContests and Codeforces.
Key Points
- Solvita utilizes four specialized agents: Planner, Solver, Oracle, and Hacker.
- The framework incorporates a closed-loop system for strategy selection and program synthesis.
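- Outcome signals (pass/fail verdicts, test certification quality, and vulnerabilities found by the Hacker) are recast as reinforcement learning updates to each agent's knowledge network, not to the underlying LLM.
- Achieves state-of-the-art results among code-generation agents on CodeContests, APPS, AetherCode, and live Codeforces rounds.
- Nearly doubles the accuracy of single-pass baselines.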
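The closed loop described in the points above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the two hard-coded strategies, and the "maximum of a list" task are hypothetical and are not taken from the paper. The Hacker stub also cheats by comparing against a known-correct reference, which a real system would not have.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for the four agents; not the paper's API.
Program = Callable[[List[int]], int]
Test = Tuple[List[int], int]


def planner(attempt: int) -> str:
    """Strategy selection: pick a strategy label for this attempt."""
    strategies = ["first_element", "full_scan"]
    return strategies[min(attempt, len(strategies) - 1)]


def solver(strategy: str) -> Program:
    """Program synthesis: return a candidate for 'maximum of a list'."""
    if strategy == "first_element":
        return lambda xs: xs[0]  # deliberately buggy candidate
    return lambda xs: max(xs)


def oracle() -> List[Test]:
    """Certified supervision: trusted input/output pairs."""
    return [([3, 1, 2], 3), ([5], 5)]


def hacker(program: Program) -> Optional[List[int]]:
    """Targeted hacking: look for an input that breaks the program.

    Assumption: a brute-force reference (max) is available for checking.
    """
    for xs in ([1, 2, 3], [0, -1], [2, 9, 4]):
        if program(xs) != max(xs):
            return xs
    return None


def solve(max_attempts: int = 3):
    """Run plan -> synthesize -> certify -> hack until a program survives."""
    log = []
    for attempt in range(max_attempts):
        strategy = planner(attempt)
        program = solver(strategy)
        passed = all(program(x) == y for x, y in oracle())
        hack = hacker(program) if passed else None
        log.append((strategy, passed, hack))
        if passed and hack is None:
            return strategy, log
    return None, log


strategy, log = solve()
```

In this toy run the first candidate passes the Oracle's tests but is broken by the Hacker's input `[1, 2, 3]`, so the loop moves on to the second strategy. The point of the sketch is only that certified tests and adversarial inputs play separate roles in the loop.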
Authors: Han Li, Jinyu Tian, Rili Feng, Yuqiao Du, Chong Zheng, Chenyu Wang, Chenchen Liu, Shihao Li, Xinping Lei, Yifan Yao, Weihao Xie, Letian Zhu, Jiaheng Liu
Abstract: Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system operates, outcome signals, such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker, are recast as reinforcement learning updates to these network weights. This allows the agents to dynamically route future queries based on past successes and failures, effectively accumulating transferable reasoning experience over time. Evaluated across CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state-of-the-art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.
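To make the idea of a trainable, graph-structured knowledge network more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the update rule, so this uses a simple moving-average update toward a scalar reward as a stand-in. The class name `KnowledgeGraph`, the node labels, the neutral starting weight of 0.5, and the greedy routing are all hypothetical. The graph is assumed to be acyclic so that routing terminates.

```python
from typing import Dict, List


class KnowledgeGraph:
    """Toy graph memory: edge weights are the only trainable state."""

    def __init__(self, edges: Dict[str, List[str]], lr: float = 0.5):
        self.edges = edges  # adjacency list; assumed acyclic
        self.lr = lr
        # Every edge starts at a neutral weight (an assumption).
        self.w = {(u, v): 0.5 for u, vs in edges.items() for v in vs}

    def route(self, start: str) -> List[str]:
        """Follow the highest-weight outgoing edge until reaching a leaf."""
        path = [start]
        while self.edges.get(path[-1]):
            u = path[-1]
            path.append(max(self.edges[u], key=lambda v: self.w[(u, v)]))
        return path

    def update(self, path: List[str], reward: float) -> None:
        """Move each edge weight on the path toward the observed reward."""
        for u, v in zip(path, path[1:]):
            self.w[(u, v)] += self.lr * (reward - self.w[(u, v)])


# The LLM itself is never touched; only the routing weights change.
graph = KnowledgeGraph({"shortest_path": ["bfs", "dijkstra"]})
first = graph.route("shortest_path")   # tie broken by list order
graph.update(first, reward=0.0)        # e.g. a failed verdict
second = graph.route("shortest_path")  # routing now avoids the failed edge
graph.update(second, reward=1.0)       # e.g. an accepted verdict
```

After a failed verdict on the first route, the second query is sent down the other edge, which is the sense in which past outcomes steer future routing in this sketch.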
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.15301 [cs.AI] (or arXiv:2605.15301v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15301 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Han Li
[v1] Thu, 14 May 2026 18:15:09 UTC (824 KB)
— Originally published at arxiv.org