Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

arXiv cs.CL·Nurbek Tastan, Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, Nicholas D. Lane, Samuel Horvath, Karthik Nandakumar

5/18/2026


Quick Answer

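The Nexa framework introduces a trainable, response-conditioned policy for multi-agent LLM systems. After a parallel round of answers, the policy decides whether agents should also exchange messages sequentially, aiming to keep communication and latency low while improving the accuracy of the final response.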

Quick Take

Nexa first lets all agents answer a query in parallel, embeds their responses in a shared semantic space, and then uses a lightweight transformer policy to predict a sparse directed acyclic communication graph among the agents. If the graph is empty, the system stays purely parallel; otherwise it performs one round of sequential message propagation along the graph. The method requires no external LLM judges or reward models, and the learned policy carries over to new tasks, agent counts, and underlying agents.
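The hybrid execution flow from the paper's abstract (parallel stage, predicted graph, at most one sequential pass) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: agents are plain callables, `predict_edges` is a hypothetical stand-in for the learned transformer policy, the single sequential pass visits agents in topological order (so a receiver sees its senders' already-refined answers), and `majority_vote` is a hypothetical aggregation rule. All names here are ours.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Callable, List, Set, Tuple

# Hypothetical interfaces: an agent maps (query, peer messages) -> response text,
# and the edge predictor maps the parallel responses -> a set of (sender, receiver) edges.
Agent = Callable[[str, List[str]], str]
EdgePredictor = Callable[[List[str]], Set[Tuple[int, int]]]


def hybrid_run(query: str, agents: List[Agent], predict_edges: EdgePredictor) -> List[str]:
    # Stage 1 (parallel): every agent answers on its own, with no peer messages.
    responses = [agent(query, []) for agent in agents]

    # Stage 2: the policy inspects all responses and proposes a communication graph.
    edges = predict_edges(responses)
    if not edges:
        return responses  # empty graph: the system stays purely parallel

    # Stage 3 (one sequential pass): visit receivers in topological order so that a
    # sender's message already includes any refinement it received upstream.
    preds = {i: {src for src, dst in edges if dst == i} for i in range(len(agents))}
    refined = list(responses)
    for node in TopologicalSorter(preds).static_order():  # raises CycleError on a cycle
        if preds[node]:
            messages = [refined[src] for src in sorted(preds[node])]
            refined[node] = agents[node](query, messages)
    return refined


def majority_vote(responses: List[str]) -> str:
    # Illustrative aggregation rule; ties go to the answer encountered first.
    return Counter(responses).most_common(1)[0][0]


# Toy usage: each hypothetical agent copies the first peer message it receives.
def copy_agent(initial: str) -> Agent:
    def agent(query: str, messages: List[str]) -> str:
        return messages[0] if messages else initial
    return agent


team = [copy_agent("4"), copy_agent("5"), copy_agent("4")]
print(hybrid_run("2+2?", team, lambda rs: set()))     # ['4', '5', '4']
print(hybrid_run("2+2?", team, lambda rs: {(0, 1)}))  # ['4', '4', '4']
```

Using `graphlib.TopologicalSorter` (Python 3.9+) means a cyclic edge set fails loudly with `CycleError` before any agent is re-queried, rather than looping.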

Key Points

Nexa runs agents in parallel first and adds a single round of sequential message propagation only when its policy predicts a non-empty communication graph.
The framework needs no external LLM judges or reward models and no hand-crafted test-time topology search.
A communication policy learned in one setting can be reused when the task, the number of agents, or the underlying agent changes.
The policy is a lightweight transformer trained with policy-gradient optimization.
The predicted communication graph is sparse and acyclic by construction, and the framework strictly subsumes pure parallel execution.
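The paper states that its communication graph is acyclic by construction, but this summary does not say how. One common way to obtain that guarantee, shown here purely as an assumption, is to rank the agents and allow only edges that point forward in the ranking; thresholding edge probabilities then controls sparsity. `forward_edges`, `order_scores`, `edge_probs`, and `threshold` are hypothetical names, and in a learned system the scores and probabilities would come from the policy.

```python
from graphlib import CycleError, TopologicalSorter
from typing import List, Set, Tuple


def forward_edges(order_scores: List[float], edge_probs: List[List[float]],
                  threshold: float = 0.5) -> Set[Tuple[int, int]]:
    n = len(order_scores)
    # Rank agents by a per-agent score (ties broken by index) to get a strict total order.
    ranked = sorted(range(n), key=lambda i: (order_scores[i], i))
    rank = {agent: pos for pos, agent in enumerate(ranked)}
    # Keep an edge only if it points forward in that order and clears the threshold.
    # Every kept edge strictly increases rank, so no path can return to its start.
    return {
        (src, dst)
        for src in range(n)
        for dst in range(n)
        if rank[src] < rank[dst] and edge_probs[src][dst] >= threshold
    }


def is_acyclic(n: int, edges: Set[Tuple[int, int]]) -> bool:
    # Independent check: a graph is acyclic iff a topological order exists.
    preds = {i: {s for s, d in edges if d == i} for i in range(n)}
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


scores = [0.2, 0.9, 0.5]               # hypothetical per-agent ordering scores
probs = [[0.9] * 3 for _ in range(3)]  # every candidate edge looks useful
edges = forward_edges(scores, probs)
print(sorted(edges))                                 # [(0, 1), (0, 2), (2, 1)]
print(is_acyclic(3, edges))                          # True
print(forward_edges(scores, probs, threshold=0.95))  # set()
```

Raising the threshold until no edge survives yields the empty graph, which in this sketch corresponds to falling back to pure parallel execution.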

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)


[Submitted on 15 May 2026]


Abstract: Multi-agent systems can solve complex tasks through collaboration between multiple Large Language Model agents. Existing collaboration frameworks typically operate in either a parallel or a sequential mode. In the parallel mode, agents respond independently to queries followed by aggregation of responses. In contrast, sequential systems allow agents to communicate via a directed topology and refine one another step by step. However, both modes are inadequate for achieving the desired objectives of minimizing communication and latency while simultaneously maximizing the accuracy of the final response. In this work, we introduce a hybrid paradigm called Nexa, a trainable response-conditioned policy that bridges the gap between the two modes. Nexa begins with a parallel execution stage, embeds the resulting responses into a shared semantic space, and then predicts a sparse directed acyclic communication graph. If the graph is empty, the system remains purely parallel; if it is non-empty, the system performs one sequential message propagation. The policy is a lightweight transformer model, and the method avoids the need for external LLM judges or reward models, as well as hand-crafted test-time topology search. We formalize this hybrid execution problem, show that the resulting graph is acyclic by construction, and that the framework strictly subsumes pure parallel execution, and present a training procedure based on policy-gradient optimization. Results demonstrate that the response-conditioned policy learned by Nexa under one setting can be reused when the number of agents, the task, or the underlying agent changes, thus emphasizing the generalizability of the learned communication policy.
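The abstract mentions a policy-gradient training procedure without further detail here. The sketch below uses plain REINFORCE with a moving-average baseline, treating each candidate edge as an independent Bernoulli variable. For brevity the per-edge logits are free parameters rather than outputs of a transformer over response embeddings, so this toy policy is not response-conditioned; the reward (a pretend accuracy signal minus 0.1 per edge) and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reinforce_edges(candidates: List[Edge], reward_fn: Callable[[Set[Edge]], float],
                    steps: int = 2000, lr: float = 0.5, seed: int = 0) -> Dict[Edge, float]:
    rng = random.Random(seed)
    logits = {e: 0.0 for e in candidates}  # start every edge at probability 0.5
    baseline = 0.0
    for _ in range(steps):
        probs = {e: sigmoid(l) for e, l in logits.items()}
        # Sample a graph: each candidate edge is an independent Bernoulli draw.
        graph = {e for e, p in probs.items() if rng.random() < p}
        reward = reward_fn(graph)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline reduces variance
        for e, p in probs.items():
            # Gradient of log Bernoulli(x | sigmoid(logit)) w.r.t. the logit is (x - p).
            x = 1.0 if e in graph else 0.0
            logits[e] += lr * advantage * (x - p)
    return {e: sigmoid(l) for e, l in logits.items()}


# Hypothetical reward: pretend only the message 0 -> 1 fixes a wrong answer,
# and charge 0.1 per edge as a stand-in for communication and latency cost.
def toy_reward(graph: Set[Edge]) -> float:
    accuracy = 1.0 if (0, 1) in graph else 0.0
    return accuracy - 0.1 * len(graph)


learned = reinforce_edges([(0, 1), (0, 2), (1, 2)], toy_reward)
print({e: round(p, 2) for e, p in learned.items()})
```

In this toy setup the useful edge's probability should climb toward 1 while the costly, useless edges fade, mirroring the accuracy-versus-communication trade-off the abstract describes.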

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as:	arXiv:2605.15573 [cs.CL]
	(or arXiv:2605.15573v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.15573 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nurbek Tastan
[v1] Fri, 15 May 2026 03:33:20 UTC (116 KB)

— Originally published at arxiv.org
