NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Quick Answer
NeuroMAS proposes a scalable neural-network-like architecture for multi-agent systems, utilizing LLM agents as nodes and reinforcement learning for communication.
Quick Take
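NeuroMAS recasts a multi-agent language system as a scalable, neural-network-like architecture for multi-agent systems, with LLM agents as nodes and reinforcement learning shaping how they communicate. The authors argue that this design is more parameter-efficient when tasks decompose hierarchically, and they report significant improvements over both inference-time and trained multi-agent baselines. Larger systems are hard to train from scratch but become feasible when grown progressively from smaller trained ones.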
Key Points
- NeuroMAS treats multi-agent systems as a trainable neural network architecture.
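- Agent nodes are role-free but structure-aware: the topology only fixes how information can flow, while training determines how nodes communicate, specialize, and coordinate.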
- Experiments show significant improvements over both inference-time and trained multi-agent baselines.
- Organizational scaling is path-dependent, favoring progressive growth from smaller systems.
- A theoretical argument suggests that modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions.
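The "agents as nodes, text as edges" idea in the points above can be made concrete with a small sketch. This is not the authors' implementation: it assumes a fully connected, layered topology, and the names `Node`, `build_network`, `run`, and `echo_agent` are hypothetical. The LLM is replaced by a stub function, and the reinforcement learning step that the paper uses to train the nodes is left out entirely; only the forward flow of textual signals is shown.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
AgentFn = Callable[[str], str]


@dataclass
class Node:
    layer: int
    index: int
    agent: AgentFn

    def forward(self, inputs: List[str]) -> str:
        # Role-free but structure-aware: the prompt carries the node's
        # position in the graph, not a hand-written semantic role.
        header = f"[node layer={self.layer} index={self.index} fan_in={len(inputs)}]"
        return self.agent(header + "\n" + "\n".join(inputs))


def build_network(widths: List[int], agent: AgentFn) -> List[List[Node]]:
    # Assumption: one list of nodes per layer, each layer fully connected
    # to the previous one. Depth is len(widths); width is each entry.
    return [[Node(l, i, agent) for i in range(w)] for l, w in enumerate(widths)]


def run(network: List[List[Node]], task: str) -> List[str]:
    # Textual signals play the role of activations passed along edges.
    signals = [task]
    for layer in network:
        signals = [node.forward(signals) for node in layer]
    return signals


def echo_agent(prompt: str) -> str:
    # Stub agent: returns only the header line so the data flow is visible.
    return prompt.splitlines()[0]


network = build_network([2, 3, 1], echo_agent)
outputs = run(network, "Solve: 2 + 2")
print(outputs)
```

In this sketch, changing `widths` is the only way to alter the system, which mirrors the shift from workflow engineering to architecture design: depth and width are set up front, and what each node actually says would be left to training.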
Paper Resources
Abstract: Multi-agent language systems are often built as hand-designed workflows, where agents are assigned semantic roles and communication protocols are specified in advance. We propose NeuroMAS, a method that first treats a multi-agent language system as a trainable and scalable neural-network-like architecture with LLM agents as nodes and intermediate textual signals as edges. In NeuroMAS, agent nodes are role-free but structure-aware: the topology only determines how information can flow in general, while reinforcement learning training determines how nodes communicate, specialize, and coordinate. This formulation shifts multi-agent design from workflow engineering toward architecture design, where depth, width, connectivity, and growth protocol become scalable sources of capability. Further, we provide a theoretical perspective showing why such modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions. Experiments show that NeuroMAS improves significantly over both inference-time and trained multi-agent baselines. We further find that organizational scaling is path-dependent: larger systems can be challenging to train from scratch, but become feasible when grown progressively from smaller trained systems. These results suggest that learned neural multi-agent systems are a promising scaling axis for LLMs.
| Subjects: | Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Methodology (stat.ME); Machine Learning (stat.ML) |
| Cite as: | arXiv:2605.16757 [cs.AI] (or arXiv:2605.16757v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.16757 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Haoran Lu
[v1] Sat, 16 May 2026 02:11:34 UTC (752 KB)
— Originally published at arxiv.org