SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration

arXiv cs.AI·Mingda Zhang, Tiesunlong Shen, Haoran Luo, Wenjin Liu, Zikai Xiao, Erik Cambria, Xiaoying Tang

5/15/2026


Quick Answer

SkillFlow introduces a flow-based framework for automating task orchestration in LLM-based systems, overcoming challenges like strategy collapse and opaque credit assignment.

Quick Take

Across 14 datasets, SkillFlow significantly outperforms baselines on question answering, mathematical reasoning, code generation, and real-world interactive decision making. Its recursive skill evolution mechanism uses flow diagnostics from training, rather than direct LLM prompting, to decide when to evolve and which skills to create or prune.

Key Points

SkillFlow uses a trainable Supervisor and a dynamic skill library for orchestration.
Employs Tempered Trajectory Balance (TTB) to maintain diverse orchestration strategies.
Achieves transparent per-step credit assignment without additional inference costs.
Outperforms existing methods in question answering, reasoning, and decision-making tasks.
Code available at https://anonymous.4open.science/r/SkillFlow-E850.

Article Content

From source RSS / original summary

arXiv:2605.14089v1. Announce Type: new. Abstract: In recent years, a variety of powerful LLM-based agentic systems have been applied to automate complex tasks through task orchestration. However, existing orchestration methods still face key challenges, including strategy collapse under reward maximization, high gradient variance with opaque credit assignment, and unguided skill evolution whose decisions are typically made by directly prompting an LLM to judge rather than derived from principled training signals.

To address these challenges, we propose SkillFlow, a flow-based framework that takes a trainable Supervisor as the agent and a structured environment with dynamic skill library and frozen executor, automating task orchestration through multi-turn interaction. SkillFlow employs Tempered Trajectory Balance (TTB), a regression-based flow-matching loss that samples trajectories proportional to reward, preserving diverse orchestration strategies rather than collapsing to a single mode.

The same flow objective yields a jointly learned backward policy that provides transparent per-step credit assignment at zero additional inference cost. Building on these flow diagnostics, a recursive skill evolution mechanism determines when to evolve, what skills to create or prune, and where decision gaps lie -- closing the loop from training signal to autonomous capability growth.

Experimental results on 14 datasets show that SkillFlow significantly outperforms baselines across question answering, mathematical reasoning, code generation, and real-world interactive decision-making tasks. Our code is available at https://anonymous.4open.science/r/SkillFlow-E850.
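The abstract does not spell out the exact form of Tempered Trajectory Balance, so the following is only a minimal sketch of the standard trajectory balance objective from the GFlowNet literature, with an assumed tempering exponent `beta` applied to the reward. It illustrates the idea named in the abstract: a squared (regression-style) residual that reaches zero when strategies are sampled in proportion to reward, and is large when the policy collapses onto one mode. The strategy names, the reward values, and `beta` are all hypothetical and chosen only for illustration.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward, beta=1.0):
    """Squared trajectory-balance residual for one complete trajectory.

    log_z        : learned scalar estimate of the log partition function
    log_pf_steps : per-step log-probabilities under the forward policy
    log_pb_steps : per-step log-probabilities under the backward policy
    reward       : strictly positive terminal reward
    beta         : ASSUMED tempering exponent (not taken from the paper)
    """
    residual = (log_z + sum(log_pf_steps)
                - beta * math.log(reward) - sum(log_pb_steps))
    return residual ** 2


# Hypothetical one-step problem: three candidate orchestration strategies.
rewards = {"plan_then_code": 1.0, "retrieve_then_answer": 2.0, "direct_answer": 1.0}
beta = 1.0

# A policy that samples each strategy in proportion to reward ** beta.
z = sum(r ** beta for r in rewards.values())
proportional_policy = {name: (r ** beta) / z for name, r in rewards.items()}

# A collapsed policy that puts almost all mass on the single best strategy.
collapsed_policy = {
    "plan_then_code": 0.005,
    "retrieve_then_answer": 0.99,
    "direct_answer": 0.005,
}


def total_loss(policy, log_z):
    # Every trajectory has one step; with a single parent state the
    # backward log-probability is log(1) = 0.
    return sum(
        trajectory_balance_loss(log_z, [math.log(policy[name])], [0.0], r, beta)
        for name, r in rewards.items()
    )


loss_proportional = total_loss(proportional_policy, math.log(z))
loss_collapsed = total_loss(collapsed_policy, math.log(z))
print(f"proportional policy loss: {loss_proportional:.3e}")
print(f"collapsed policy loss:    {loss_collapsed:.3f}")
```

In this toy setting the reward-proportional policy drives the residual to zero (up to floating-point error), while the collapsed policy is penalised heavily on the two low-probability strategies. That is the sense in which a flow-matching loss can preserve several viable strategies instead of rewarding only the single highest-scoring one.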
