Self-Evolving Agents with Anytime-Valid Certificates
Quick Answer
The SEA architecture enables self-evolving agents to modify behavior while adhering to a fixed error budget, utilizing a versioned harness around a frozen base model.
Quick Take
On a 52-instance Verified subset, a no-op-composite control isolates the contribution of SEA's mechanisms at +4 on GLM 5.2 (24 → 28) and +5 on GPT (29 → 34, the best result at 65%). These results are single-run; confirming run-to-run variance and adapting the per-task algorithm mix are future work.
Key Points
- SEA confines self-modification to a steering adapter and a versioned harness.
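- Each modification is admitted only through an anytime-valid gate that emits an auditable certificate against a fixed error budget; five loop controllers compose published guarantees.
- Against a no-op-composite control, SEA's mechanisms add +4 on GLM 5.2 (24 → 28) and +5 on GPT (29 → 34) on a 52-instance Verified subset.
- Event logs confirm that the mechanisms fire and prevent regressions.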
- Future work will address run-to-run variance in evaluations.
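The headline deltas can be checked with a few lines of arithmetic. This is a minimal sketch that reads the reported numbers as counts of resolved instances out of 52, an assumption that is consistent with 34/52 being about 65%; the names `SUBSET_SIZE`, `resolved`, and `summarize` are illustrative only.

```python
# Counts taken from the abstract: (no-op-composite control, with SEA mechanisms).
# Assumption: the numbers are resolved instances out of a 52-instance subset.
SUBSET_SIZE = 52
resolved = {"GLM 5.2": (24, 28), "GPT": (29, 34)}


def summarize(control: int, suite: int, total: int = SUBSET_SIZE) -> dict:
    """Return the absolute delta and both resolve rates as fractions."""
    return {
        "delta": suite - control,
        "control_rate": control / total,
        "suite_rate": suite / total,
    }


summary = {name: summarize(*counts) for name, counts in resolved.items()}

for name, stats in summary.items():
    print(f"{name}: +{stats['delta']} "
          f"({stats['control_rate']:.1%} to {stats['suite_rate']:.1%})")
```

Under that reading, a delta of 4 or 5 instances on a 52-instance subset is roughly 8 to 10 percentage points, which is why run-to-run variance matters for interpreting a single run.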
Article Content
From source RSS / original summary
arXiv:2607.00871v1 Announce Type: new
Abstract: Self-evolving agents violate the assumption behind most learning-theoretic guarantees: the data, evaluator, components, and hypothesis space are produced by the policy being updated. We present SEA, an architecture that confines self-modification to a small steering adapter and a versioned harness around a frozen base model and admits each modification only through an anytime-valid gate that emits an auditable certificate against a fixed error budget.
Five loop controllers compose published guarantees; because such gates can only select among behaviors the frozen base already produces, five verifier-in-the-loop mechanisms (best-of-N, micro-step search, self-authored reproduction oracles, search-layer control, and self-repair) supply the dense, grader-free signal the gates require, computed from the issue text alone.
On a 52-instance Verified subset across four base models, base capability is the dominant, confound-free effect, and on two strong base models a deliberate no-op-composite control isolates the suite's contribution at +4 and +5 (GLM 5.2: 24 → 28; GPT: 29 → 34, the best at 65%), with event logs confirming that its mechanisms fire and prevent regressions.
Results are single-run on expensive evaluations; confirming run-to-run variance and adapting the per-task algorithm mix are future work.
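The abstract does not spell out how the anytime-valid gate is built, so the following is only a sketch under stated assumptions, not the paper's method. It swaps in a standard betting-style sequential test: each paired comparison between a candidate modification and the incumbent yields +1 (candidate wins), 0 (tie), or -1 (candidate loses), and a wealth process is multiplied by `1 + bet * outcome`. If the candidate is no better than the incumbent, that wealth is a nonnegative supermartingale, so by Ville's inequality the chance that it ever reaches `1 / alpha` is at most `alpha`, no matter when the gate stops. The names `Certificate`, `gate`, and `budget_for`, the outcome encoding, and the geometric budget split are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Certificate:
    """Auditable record of one gate decision (hypothetical format)."""
    modification_id: str
    alpha: float          # share of the error budget spent on this gate
    wealth: float         # value of the betting process when the gate stopped
    n_observations: int   # paired comparisons consumed
    admitted: bool


def gate(modification_id: str, outcomes: Iterable[int],
         alpha: float, bet: float = 0.5) -> Certificate:
    """Admit a modification once the betting wealth reaches 1 / alpha.

    Assumption: each outcome is +1, 0, or -1, and under the null hypothesis
    (candidate no better than incumbent) its conditional mean is <= 0.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must lie in (0, 1) so wealth stays positive")
    wealth, n = 1.0, 0
    for x in outcomes:
        if x not in (-1, 0, 1):
            raise ValueError("outcomes must be -1, 0, or +1")
        wealth *= 1.0 + bet * x
        n += 1
        if wealth >= 1.0 / alpha:  # threshold from Ville's inequality
            return Certificate(modification_id, alpha, wealth, n, True)
    # Evidence ran out before the threshold was reached: do not admit.
    return Certificate(modification_id, alpha, wealth, n, False)


def budget_for(k: int, total_alpha: float) -> float:
    """Budget share for the k-th proposed modification (k >= 1).

    The shares total_alpha * 2**-k sum to at most total_alpha, so a union
    bound keeps the chance of any false admission within the fixed budget.
    """
    if k < 1:
        raise ValueError("k starts at 1")
    return total_alpha * 0.5 ** k
```

A caller might run `gate("mod-1", outcomes, alpha=budget_for(1, 0.05))` and log the returned `Certificate`. The geometric split is one simple way to keep a fixed total budget across an unbounded stream of proposals; it makes later gates progressively harder to pass, which a real system might trade off differently.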