OpenSkill: Open-World Self-Evolution for LLM Agents
Quick Answer
OpenSkill introduces a self-evolving framework for LLM agents that builds skills and verification signals from open-world resources without supervision.
Quick Take
Across three benchmarks and two target agents, OpenSkill attains the best automated pass rate while using no target-task supervision. Its skills transfer across models without model-specific adaptation, and its self-built verifier aligns with ground-truth outcomes it never accessed.
Key Points
- OpenSkill operates without target-task supervision, relying solely on open-world resources.
- The framework synthesizes knowledge from documentation and the web into transferable skills.
- It achieved the best automated pass rate across three benchmarks with two target agents.
- Skills transfer across models without requiring model-specific adaptations.
- The self-built verifier aligns with ground-truth outcomes despite never accessing them.
Article Content
From source RSS / original summary: arXiv:2606.06741v1
Abstract: Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-world self-evolution, where an agent must build both its skills and its own verification signals from scratch, using open-world resources but no target-task supervision.
We propose OpenSkill, a framework that bootstraps this loop: it acquires grounded knowledge and verification anchors from documentation, repositories, and the web, synthesizes them into transferable skills, and refines those skills against self-built virtual tasks grounded in the anchors rather than in target answers. The open world thus supplies both the knowledge to be learned and a supervision-independent practice environment, with target-task supervision reserved for final evaluation.
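The loop described above (acquire anchors, synthesize a skill, refine it against self-built virtual tasks) can be illustrated with a toy sketch. This is not the paper's implementation: the names `Anchor`, `Skill`, `acquire`, `synthesize`, `verify`, and `refine`, the `question => expected` resource format, and the sample strings are all hypothetical, and a real system would use an LLM agent rather than a lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Anchor:
    """Hypothetical verification anchor: a checkable fact taken from a resource."""
    question: str
    expected: str


@dataclass
class Skill:
    """Toy stand-in for a transferable skill: a table of learned facts."""
    notes: Dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        return self.notes.get(question, "unknown")


def acquire(resources: List[str]) -> List[Anchor]:
    # Assumed toy format: each usable resource line is "question => expected".
    anchors = []
    for line in resources:
        question, sep, expected = line.partition("=>")
        if sep:  # skip lines that carry no checkable fact
            anchors.append(Anchor(question.strip(), expected.strip()))
    return anchors


def synthesize(anchors: List[Anchor]) -> Skill:
    # Deliberately incomplete first draft (every other anchor),
    # so that the refinement step has something to fix.
    return Skill({a.question: a.expected for a in anchors[::2]})


def verify(skill: Skill, anchor: Anchor) -> bool:
    # Self-built verifier: checks the skill against an anchor, never a target answer.
    return skill.answer(anchor.question) == anchor.expected


def pass_rate(skill: Skill, anchors: List[Anchor]) -> float:
    return sum(verify(skill, a) for a in anchors) / len(anchors)


def refine(skill: Skill, anchors: List[Anchor], max_rounds: int = 3) -> Skill:
    # Each anchor doubles as a virtual practice task; failures patch the skill.
    for _ in range(max_rounds):
        failures = [a for a in anchors if not verify(skill, a)]
        if not failures:
            break
        for a in failures:
            skill.notes[a.question] = a.expected
    return skill


# Made-up resource lines, for illustration only.
docs = [
    "flag for verbose output => -v",
    "default port => 8080",
    "config file name => app.toml",
    "a line with no checkable fact",
]
anchors = acquire(docs)
draft = synthesize(anchors)
before = pass_rate(draft, anchors)   # measured before refinement mutates the draft
refined = refine(draft, anchors)
after = pass_rate(refined, anchors)
print(f"virtual-task pass rate: {before:.2f} -> {after:.2f}")
```

The point of the sketch is the data flow: no target-task answer appears anywhere in the loop, so every signal the skill is refined against comes from the harvested anchors.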
Across three benchmarks and two target agents, OpenSkill attains the best automated pass rate while satisfying the no-supervision constraint. Analysis shows its skills transfer across models without model-specific adaptation, and its self-built verifier aligns with ground-truth outcomes despite never accessing them.
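One simple way to quantify how well a self-built verifier aligns with ground-truth outcomes is the fraction of tasks on which the two verdicts agree. The sketch below assumes that metric purely for illustration; the summary does not say which alignment measure the paper uses, and the function name `agreement_rate` and the verdict lists are made up.

```python
from typing import Sequence


def agreement_rate(verifier_verdicts: Sequence[bool],
                   ground_truth: Sequence[bool]) -> float:
    """Fraction of tasks where the self-built verifier matches the true outcome."""
    if len(verifier_verdicts) != len(ground_truth):
        raise ValueError("verdict lists must have the same length")
    if not ground_truth:
        raise ValueError("at least one task is required")
    matches = sum(v == g for v, g in zip(verifier_verdicts, ground_truth))
    return matches / len(ground_truth)


# Illustrative, made-up verdicts for five tasks (True = pass).
verdicts = [True, True, False, True, False]   # self-built verifier
outcomes = [True, False, False, True, False]  # ground truth, used only for analysis
rate = agreement_rate(verdicts, outcomes)
print(f"agreement: {rate:.2f}")
```

Ground-truth outcomes enter only this after-the-fact comparison, which matches the constraint that the verifier itself never accesses them.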