SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces
Quick Answer
SkillSmith is a boundary-first compiler-runtime framework for LLM-based agent systems. On the SkillsBench benchmark, it reduces solve-stage token usage by 57.44% and solve time by 50.57% compared with using raw skills.
Quick Take
By compiling skill packages offline into minimal executable interfaces, SkillSmith lets agents access and execute only the relevant components at runtime, which cuts irrelevant context injection and redundant reasoning overhead. Compiled artifacts produced by a stronger model can be reused by a smaller or more efficient runtime model, improving task accuracy in cases where raw skill interpretation fails.
Key Points
- SkillSmith reduces solve-stage token usage by 57.44% compared to raw skills.
- SkillSmith reduces solve time by 50.57%, which makes it 2.02x faster.
- Minimizes irrelevant context injection and redundant reasoning overhead.
- Compiled artifacts can be reused by smaller or more efficient runtime models.
- Evaluated on the SkillsBench benchmark, where it also reduces thinking iterations by 42.99% and token-proportional monetary cost by 57.44%.
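The percentage reductions and the "2.02x faster" figure are two views of the same measurement. As a quick arithmetic check (the function name `speedup_from_reduction` is an illustrative assumption, not something from the paper), a reduction of r percent leaves a fraction 1 - r/100 of the original, so the speedup factor is its reciprocal:

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction into a ratio of old value to new value."""
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# Solve time: a 50.57% reduction corresponds to the reported 2.02x speedup.
print(round(speedup_from_reduction(50.57), 2))  # 2.02

# Solve-stage tokens: a 57.44% reduction means raw skills use about 2.35x the tokens.
print(round(speedup_from_reduction(57.44), 2))  # 2.35
```

The same conversion explains why the token-proportional monetary cost falls by the same 57.44% as token usage: cost that scales linearly with tokens shrinks by the same fraction.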
Abstract: Recently, skills have been widely adopted in large language model (LLM)-based agent systems across various domains. In existing frameworks, skills are typically injected into the agent reasoning loop as contextual guidance once matched to a runtime task, enabling specialized task-solving capabilities. We find that this execution paradigm introduces two major sources of redundancy: irrelevant context injection and repeated skill-specific reasoning and planning. To this end, we propose SkillSmith, a boundary-first compiler-runtime framework that compiles skill packages offline into minimal executable interfaces. By extracting fine-grained operational boundaries from skills, SkillSmith enables agents to dynamically access and execute only the relevant components at runtime, thereby minimizing unnecessary context injection and redundant reasoning overhead. In the evaluation on the SkillsBench benchmark, SkillSmith reduces solve-stage token usage by 57.44%, thinking iterations by 42.99%, solve time by 50.57% (2.02x faster), and token-proportional monetary cost by 57.44% compared with using raw skills. Moreover, compiled artifacts produced by a stronger model can be reused by a smaller or more efficient runtime model, improving task accuracy in cases where raw skill interpretation fails. The source code and data are available at this https URL.
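The compile-then-dispatch idea in the abstract can be illustrated with a toy sketch. This is not SkillSmith's implementation: the `Operation` class, the `compile_skill` and `select` functions, the keyword-matching rule, and the sample skill text are all hypothetical, chosen only to show how an offline step might reduce a skill package to small callable units so that the runtime loads just the matching ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Operation:
    """Hypothetical compiled unit: a name, a short boundary note, and a callable."""
    name: str
    boundary: str
    run: Callable[[str], str]


def compile_skill(raw_sections: Dict[str, str],
                  handlers: Dict[str, Callable[[str], str]]) -> List[Operation]:
    """Offline step (assumed): keep only sections that have an executable handler
    and shrink each section's guidance to its first sentence."""
    ops = []
    for name, text in raw_sections.items():
        if name in handlers:
            boundary = text.split(".")[0].strip()
            ops.append(Operation(name, boundary, handlers[name]))
    return ops


def select(ops: List[Operation], task: str) -> List[Operation]:
    """Runtime step (assumed): pick operations whose name appears in the task text."""
    task_lower = task.lower()
    return [op for op in ops if op.name in task_lower]


# Made-up skill package: two actionable sections and one narrative section.
raw_sections = {
    "resize": "Resize an image to a target width. Prefer Lanczos filtering and keep the aspect ratio.",
    "crop": "Crop an image to a bounding box. Validate that the box lies inside the image.",
    "history": "Background notes on how this skill evolved over several releases.",
}
handlers = {
    "resize": lambda arg: f"resized:{arg}",
    "crop": lambda arg: f"cropped:{arg}",
}

ops = compile_skill(raw_sections, handlers)
selected = select(ops, "Resize the logo for the landing page")

# Raw injection would place every section in context; the compiled path
# exposes only the boundary note of the matching operation.
raw_context = " ".join(raw_sections.values())
compiled_context = " ".join(op.boundary for op in selected)

print([op.name for op in selected])
print(len(compiled_context) < len(raw_context))
print(selected[0].run("logo.png"))
```

In this toy version the saving comes from two places that mirror the redundancy sources named in the abstract: unrelated sections never enter the context, and the matched operation is invoked directly rather than re-planned from prose. A real system would need a far more careful boundary-extraction and matching step than substring lookup.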
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as: arXiv:2605.15215 [cs.AI] (or arXiv:2605.15215v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.15215 (arXiv-issued DOI via DataCite)
Submission history
From: Zaifeng Pan
[v1] Tue, 12 May 2026 09:25:25 UTC (464 KB)
Originally published at arxiv.org