SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale
Quick Take
SkillDAG introduces a self-evolving typed skill graph that an LLM agent queries and edits at inference time. With MiniMax-M2.7 on ALFWorld and SkillsBench, it reaches 67.1% success and 27.3% reward, exceeding the strongest reported Graph-of-Skills baseline by +12.8 and +8.6 points. The authors attribute the gains to candidate ranking that stays robust as the skill pool grows 10x and to online edits that enlarge recall without evicting prior hits.
Key Points
- SkillDAG models inter-skill relationships as a typed directed graph.
- Reached 67.1% success and 27.3% reward on ALFWorld and SkillsBench with MiniMax-M2.7.
- Exceeded the strongest reported Graph-of-Skills baseline by +12.8 success points and +8.6 reward points.
- Candidate ranking remains robust even with a 10x increase in skill pool size.
- Set-monotone online edits improve recall without losing prior matches.
Article Content
From source RSS / original summary
arXiv:2606.03056v1 (announce type: new)
Abstract: As LLM agents adopt large skill libraries, selecting the right subset becomes a structural problem rather than a similarity-matching one: skills depend on, conflict with, specialize, or duplicate one another, a structure invisible to both full enumeration and embedding similarity.
We present SkillDAG, which models inter-skill relationships as a typed directed graph and exposes it to an LLM agent as an inference-time, agent-callable structural retrieval interface, queried and evolved during execution rather than baked into a fixed retrieval pipeline: each search returns vector matches, typed-edge neighbors, and conflict signals, and a propose-then-commit protocol lets the agent register execution-backed edges so the graph accumulates structure across episodes.
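The interface described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SkillGraph`, the `EdgeType` names, the toy two-dimensional embeddings, and the `propose`/`commit` signatures are all assumptions made for illustration. The abstract states only that a search returns vector matches, typed-edge neighbors, and conflict signals, and that edges are registered through a propose-then-commit protocol backed by execution.

```python
from dataclasses import dataclass, field
from enum import Enum
import math


class EdgeType(Enum):
    # Hypothetical labels for the four relations named in the abstract.
    DEPENDS_ON = "depends_on"
    CONFLICTS_WITH = "conflicts_with"
    SPECIALIZES = "specializes"
    DUPLICATES = "duplicates"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SkillGraph:
    embeddings: dict = field(default_factory=dict)  # skill name -> vector
    edges: set = field(default_factory=set)         # (src, EdgeType, dst)
    pending: dict = field(default_factory=dict)     # proposal id -> edge
    _next_id: int = 0

    def add_skill(self, name, vector):
        self.embeddings[name] = vector

    def search(self, query_vec, k=2):
        """Return top-k vector matches, their typed neighbors, and conflicts."""
        ranked = sorted(
            self.embeddings,
            key=lambda s: cosine(query_vec, self.embeddings[s]),
            reverse=True,
        )
        matches = ranked[:k]
        # Non-conflict edges leaving any matched skill.
        neighbors = sorted(
            (s, t.value, d)
            for (s, t, d) in self.edges
            if s in matches and t is not EdgeType.CONFLICTS_WITH
        )
        # Conflict edges whose two endpoints were both retrieved.
        conflicts = sorted(
            (s, d)
            for (s, t, d) in self.edges
            if t is EdgeType.CONFLICTS_WITH and s in matches and d in matches
        )
        return {"matches": matches, "neighbors": neighbors, "conflicts": conflicts}

    def propose(self, src, edge_type, dst):
        """Stage an edge without changing the graph; returns a proposal id."""
        proposal_id = self._next_id
        self._next_id += 1
        self.pending[proposal_id] = (src, edge_type, dst)
        return proposal_id

    def commit(self, proposal_id, execution_succeeded):
        """Register the staged edge only if execution backed it up."""
        edge = self.pending.pop(proposal_id)
        if execution_succeeded:
            self.edges.add(edge)
        return execution_succeeded


graph = SkillGraph()
graph.add_skill("heat_object", [1.0, 0.0])
graph.add_skill("cool_object", [0.9, 0.1])
graph.add_skill("open_fridge", [0.0, 1.0])
graph.edges.add(("heat_object", EdgeType.CONFLICTS_WITH, "cool_object"))

pid = graph.propose("cool_object", EdgeType.DEPENDS_ON, "open_fridge")
graph.commit(pid, execution_succeeded=True)
print(graph.search([1.0, 0.0], k=2))
```

In this sketch a proposed edge has no effect on search results until it is committed, which is one plausible reading of "execution-backed": the agent stages a hypothesis, runs the episode, and only then lets the graph change.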
On ALFWorld and SkillsBench with MiniMax-M2.7, SkillDAG reaches 67.1% success and 27.3% reward, exceeding the strongest reported Graph-of-Skills baseline by +12.8 and +8.6 points; the advantage ports to gpt-5.2-codex, and intrinsic SkillsBench Ret@K rises from 65.5 to 78.2 under matched queries.
These gains trace to isolable mechanisms: candidate ranking that stays robust as the pool grows 10x where a fixed seeding-diffusion pipeline degrades, and set-monotone online edits that enlarge ground-truth recall without evicting prior hits.
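One way to read "set-monotone" is that an edit may only grow the retrieved candidate set, so anything retrieved before the edit is still retrieved after it. The abstract does not define the term, so the sketch below is an interpretation under that assumption. The one-hop `retrieve` rule, the add-only `apply_monotone_edit` helper, and the skill names are hypothetical.

```python
def retrieve(seeds, edges):
    """Return the seed skills plus their one-hop neighbors along directed edges."""
    result = set(seeds)
    for src, dst in edges:
        if src in seeds:
            result.add(dst)
    return result


def recall(retrieved, ground_truth):
    """Fraction of ground-truth skills present in the retrieved set."""
    if not ground_truth:
        return 1.0
    return len(retrieved & ground_truth) / len(ground_truth)


def apply_monotone_edit(edges, new_edge):
    """Add-only edit: the returned edge set contains every original edge."""
    return edges | {new_edge}


seeds = {"cool_object"}
ground_truth = {"cool_object", "open_fridge", "put_object"}

edges_before = {("cool_object", "open_fridge")}
edges_after = apply_monotone_edit(edges_before, ("cool_object", "put_object"))

before = retrieve(seeds, edges_before)
after = retrieve(seeds, edges_after)

# Monotonicity: every prior hit survives the edit.
assert before <= after
print(recall(before, ground_truth), recall(after, ground_truth))
```

Because the edit only adds an edge and retrieval only follows edges, the earlier result is a subset of the later one, so recall against a fixed ground-truth set cannot decrease. An edit that deleted or rewired edges would not carry this guarantee.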