Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

arXiv cs.AI·Han Li, Jinyu Tian, Rili Feng, Yuqiao Du, Chong Zheng, Chenyu Wang, Chenchen Liu, Shihao Li, Xinping Lei, Yifan Yao, Weihao Xie, Letian Zhu, Jiaheng Liu

5/18/2026

Quick Take

Solvita introduces an agentic evolution framework for large language models, enhancing competitive programming capabilities by enabling continuous learning without weight updates to the underlying LLM. It outperforms existing multi-agent pipelines and nearly doubles the accuracy of single-pass baselines across benchmarks including CodeContests and Codeforces.

Key Points

Solvita utilizes four specialized agents: Planner, Solver, Oracle, and Hacker.
The framework incorporates a closed-loop system for strategy selection and program synthesis.
Outcome signals are recast as reinforcement learning updates to each agent's knowledge network, so agents improve from past successes and failures.
Achieves state-of-the-art results among code-generation agents on CodeContests, APPS, AetherCode, and live Codeforces rounds.
Nearly doubles the accuracy of single-pass baselines.
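
To make the closed loop in the key points concrete, here is a minimal sketch of how strategy selection, program synthesis, certified supervision, and targeted hacking could fit together. It is an illustration under heavy assumptions, not the paper's implementation: the toy task (maximum of a list), the `CANDIDATES` table standing in for LLM output, and every function name are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical toy task: return the maximum of a non-empty list of ints.
Program = Callable[[list[int]], int]

# Stand-ins for LLM-generated candidates, keyed by an illustrative strategy name.
CANDIDATES: dict[str, Program] = {
    "first_element": lambda xs: xs[0],         # wrong in general
    "sorted_last": lambda xs: sorted(xs)[-1],  # correct
}

def planner(tried: set[str]) -> Optional[str]:
    """Strategy selection: pick the next strategy that has not been tried."""
    for name in CANDIDATES:
        if name not in tried:
            return name
    return None

def solver(strategy: str) -> Program:
    """Program synthesis: a table lookup here, an LLM call in a real system."""
    return CANDIDATES[strategy]

def oracle(program: Program) -> bool:
    """Certified supervision: run tests whose expected outputs are trusted."""
    tests = [([3], 3), ([5, 1], 5)]
    return all(program(xs) == want for xs, want in tests)

def hacker(program: Program) -> Optional[list[int]]:
    """Targeted hacking: search small inputs for a counterexample."""
    for a in range(3):
        for b in range(3):
            xs = [a, b]
            if program(xs) != max(xs):
                return xs
    return None

def solve() -> tuple[Optional[str], list[tuple[str, str]]]:
    """Run the loop until a candidate survives both the Oracle and the Hacker."""
    tried: set[str] = set()
    log: list[tuple[str, str]] = []
    while (strategy := planner(tried)) is not None:
        tried.add(strategy)
        program = solver(strategy)
        if not oracle(program):
            log.append((strategy, "failed_oracle"))
            continue
        if hacker(program) is not None:
            log.append((strategy, "hacked"))
            continue
        log.append((strategy, "accepted"))
        return strategy, log
    return None, log

strategy, log = solve()
```

In this toy run the `first_element` candidate passes the Oracle's two tests but is broken by the Hacker on the input `[0, 1]`, so the loop moves on and accepts `sorted_last`. The verdicts collected in `log` are the kind of outcome signal the framework is described as learning from.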

[Submitted on 14 May 2026]

Authors: Han Li, Jinyu Tian, Rili Feng, Yuqiao Du, Chong Zheng, Chenyu Wang, Chenchen Liu, Shihao Li, Xinping Lei, Yifan Yao, Weihao Xie, Letian Zhu, Jiaheng Liu

Abstract: Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system operates, outcome signals, such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker, are recast as reinforcement learning updates to these network weights. This allows the agents to dynamically route future queries based on past successes and failures, effectively accumulating transferable reasoning experience over time. Evaluated across CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state-of-the-art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.
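
The abstract does not spell out how outcome signals become updates to the knowledge network weights, so the following is only a sketch of one simple possibility: each edge weight is moved toward the observed reward, and routing follows the highest-weight edge. The `KnowledgeGraph` class, the learning rate, the reward values, and the node names are all assumptions made for illustration.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy graph whose edge weights are nudged by outcome rewards.

    The update rule is an assumed moving-average step, not the paper's method.
    """

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        # weights[src][dst] holds the learned preference for that edge.
        self.weights: dict[str, dict[str, float]] = defaultdict(dict)

    def add_edge(self, src: str, dst: str) -> None:
        """Register an edge with a neutral starting weight of 0.0."""
        self.weights[src].setdefault(dst, 0.0)

    def route(self, src: str) -> str:
        """Greedy routing: follow the highest-weight outgoing edge.

        Assumes src has at least one outgoing edge.
        """
        edges = self.weights[src]
        return max(edges, key=edges.get)

    def update(self, src: str, dst: str, reward: float) -> None:
        """Move the edge weight a fraction of the way toward the reward."""
        old = self.weights[src][dst]
        self.weights[src][dst] = old + self.learning_rate * (reward - old)

graph = KnowledgeGraph()
graph.add_edge("interval_problem", "greedy")
graph.add_edge("interval_problem", "dynamic_programming")

# Hypothetical outcome signals: -1.0 for a hacked solution, +1.0 for an accepted one.
graph.update("interval_problem", "greedy", reward=-1.0)
graph.update("interval_problem", "dynamic_programming", reward=1.0)

choice = graph.route("interval_problem")
```

After these two updates the `greedy` edge sits at -0.5 and the `dynamic_programming` edge at 0.5, so the next query of this type is routed to `dynamic_programming`. The LLM itself is untouched; only the small graph of weights changes, which is the sense in which the framework is said to learn without weight updates to the underlying model.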

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.15301 [cs.AI]
	(or arXiv:2605.15301v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.15301 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Han Li
[v1] Thu, 14 May 2026 18:15:09 UTC (824 KB)

— Originally published at arxiv.org
