Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems
Quick Take
Nexa is a hybrid orchestration paradigm for multi-agent systems that aims to reduce communication and latency while improving the accuracy of the final response.
Key Points
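- Combines parallel and sequential execution: agents first answer in parallel, then optionally exchange messages once over a learned sparse DAG.
- Learns the communication policy with a lightweight transformer model trained by policy-gradient optimization.
- Shows that the learned policy can be reused when the number of agents, the task, or the underlying agent changes.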
Abstract: Multi-agent systems can solve complex tasks through collaboration among multiple Large Language Model (LLM) agents. Existing collaboration frameworks typically operate in either a parallel or a sequential mode. In the parallel mode, agents respond to a query independently and their responses are then aggregated. In the sequential mode, agents communicate via a directed topology and refine one another's responses step by step. However, neither mode achieves the desired objectives of minimizing communication and latency while simultaneously maximizing the accuracy of the final response. In this work, we introduce a hybrid paradigm called Nexa, a trainable response-conditioned policy that bridges the gap between the two modes. Nexa begins with a parallel execution stage, embeds the resulting responses into a shared semantic space, and then predicts a sparse directed acyclic communication graph. If the graph is empty, the system remains purely parallel; if it is non-empty, the system performs one round of sequential message propagation. The policy is a lightweight transformer model, and the method requires neither external LLM judges or reward models nor hand-crafted test-time topology search. We formalize this hybrid execution problem, show that the resulting graph is acyclic by construction and that the framework strictly subsumes pure parallel execution, and present a training procedure based on policy-gradient optimization. Results demonstrate that the response-conditioned policy learned by Nexa in one setting can be reused when the number of agents, the task, or the underlying agent changes, highlighting the generalizability of the learned communication policy.
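The parallel-then-sequential flow described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how acyclicity is enforced, so here we assume a hypothetical per-agent ordering score and only allow edges that point forward in that order, which makes every sampled graph a DAG by construction. In Nexa the graph is predicted by a lightweight transformer over the response embeddings; here the `scores` and `edge_logits` inputs simply stand in for that policy's outputs. The names `sample_dag`, `hybrid_execute`, and `majority_vote`, the independent Bernoulli edge sampling, and the majority-vote aggregation are all illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical agent interface: (query, parent responses) -> response.
Agent = Callable[[str, List[str]], str]
Edge = Tuple[int, int]  # (sender, receiver)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sample_dag(
    scores: Sequence[float],
    edge_logits: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[List[int], List[Edge], float]:
    """Sample a sparse DAG; return (topological order, edges, log-probability)."""
    n = len(scores)
    # Assumption: agents are ordered by a per-agent score, and edges may only
    # point forward in that order, so the graph is acyclic by construction.
    order = sorted(range(n), key=lambda i: scores[i])
    rank = {agent: r for r, agent in enumerate(order)}
    edges: List[Edge] = []
    log_prob = 0.0
    for u in range(n):
        for v in range(n):
            if rank[u] >= rank[v]:
                continue  # skip self-loops and backward edges
            x = edge_logits[u][v]
            if rng.random() < math.exp(log_sigmoid(x)):  # keep edge w.p. sigmoid(x)
                edges.append((u, v))
                log_prob += log_sigmoid(x)
            else:
                log_prob += log_sigmoid(-x)  # log(1 - sigmoid(x))
    return order, edges, log_prob


def hybrid_execute(
    agents: Sequence[Agent], query: str, order: Sequence[int], edges: Sequence[Edge]
) -> List[str]:
    # Stage 1 (parallel): every agent answers independently, with no context.
    responses = [agent(query, []) for agent in agents]
    if not edges:
        return responses  # empty graph: the system stays purely parallel
    parents: Dict[int, List[int]] = {v: [] for v in range(len(agents))}
    for u, v in edges:
        parents[v].append(u)
    # Stage 2 (one sequential pass): visit agents in topological order; each agent
    # with incoming edges re-answers once, conditioned on its parents' responses.
    for v in order:
        if parents[v]:
            responses[v] = agents[v](query, [responses[u] for u in parents[v]])
    return responses


def majority_vote(responses: Sequence[str]) -> str:
    """Hypothetical aggregation step: most common final response."""
    return Counter(responses).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy agents: agent 0 answers "4"; agents 1 and 2 answer "5" alone but
    # adopt their first parent's answer when they receive one.
    agents = [
        lambda q, ctx: "4",
        lambda q, ctx: ctx[0] if ctx else "5",
        lambda q, ctx: ctx[0] if ctx else "5",
    ]
    print(majority_vote(hybrid_execute(agents, "2+2?", [0, 1, 2], [])))
    print(majority_vote(hybrid_execute(agents, "2+2?", [0, 1, 2], [(0, 1), (0, 2)])))
```

The demo prints `5` for the purely parallel run and `4` after the single propagation step. With an empty edge list, `hybrid_execute` returns exactly the parallel responses, which mirrors the abstract's point that the hybrid framework subsumes pure parallel execution. The returned `log_prob` is the quantity a policy-gradient method would differentiate.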
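The abstract states only that the policy is trained with policy-gradient optimization and that the goal is to trade communication against final-answer accuracy. A minimal REINFORCE sketch of that idea follows, with simplifications that are our assumptions rather than details from the paper: each candidate edge is an independent Bernoulli variable with its own free logit (instead of a transformer over response embeddings), the reward is a hypothetical accuracy term minus a per-edge communication cost, and a batch-mean baseline is used for variance reduction. `reinforce_step` and `toy_reward` are illustrative names.

```python
import math
import random
from typing import Callable, List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reinforce_step(
    logits: List[float],
    reward_fn: Callable[[Sequence[int]], float],
    rng: random.Random,
    batch_size: int = 32,
    lr: float = 1.0,
) -> List[float]:
    """One REINFORCE update on independent Bernoulli edge logits."""
    probs = [math.exp(log_sigmoid(x)) for x in logits]
    # Sample a batch of edge sets: y_k ~ Bernoulli(sigmoid(logit_k)).
    samples = [[1 if rng.random() < p else 0 for p in probs] for _ in range(batch_size)]
    rewards = [reward_fn(s) for s in samples]
    baseline = sum(rewards) / batch_size  # batch-mean baseline for variance reduction
    grads = [0.0] * len(logits)
    for s, r in zip(samples, rewards):
        for k, (y, p) in enumerate(zip(s, probs)):
            # d/dx log Bernoulli(y; sigmoid(x)) = y - sigmoid(x)
            grads[k] += (r - baseline) * (y - p) / batch_size
    # Gradient ascent on the expected reward.
    return [x + lr * g for x, g in zip(logits, grads)]


def toy_reward(sample: Sequence[int], cost_per_edge: float = 0.2) -> float:
    """Hypothetical reward: only candidate edge 0 improves accuracy; every edge costs."""
    accuracy = 1.0 if sample[0] else 0.0
    return accuracy - cost_per_edge * sum(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0, 0.0]  # four hypothetical candidate edges
    for _ in range(200):
        logits = reinforce_step(logits, toy_reward, rng)
    print([round(math.exp(log_sigmoid(x)), 3) for x in logits])
```

In this toy setup only candidate edge 0 improves accuracy, so training pushes its keep-probability up and the others down. The per-edge cost is what lets such a policy stay sparse, or collapse to an empty graph (pure parallel execution) when communication does not pay off.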
| Subjects: | Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA) |
| Cite as: | arXiv:2605.15573 [cs.CL] (or arXiv:2605.15573v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15573 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Nurbek Tastan
[v1] Fri, 15 May 2026 03:33:20 UTC (116 KB)
— Originally published at arxiv.org