NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

arXiv cs.AI·Haoran Lu, Luyang Fang, Wenxuan Zhong, Ping Ma

5/19/2026


Quick Answer

NeuroMAS proposes a scalable neural-network-like architecture for multi-agent systems, utilizing LLM agents as nodes and reinforcement learning for communication.

Quick Take

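NeuroMAS proposes a scalable neural-network-like architecture for multi-agent systems, with LLM agents as nodes and reinforcement learning determining how the nodes communicate. The authors argue that this modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions, and they report significant improvements over both inference-time and trained multi-agent baselines, particularly when larger systems are grown progressively from smaller trained ones.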

Key Points

NeuroMAS treats multi-agent systems as a trainable neural network architecture.
Agent nodes are role-free but aware of structural information flow.
The method improves significantly over both inference-time and trained multi-agent baselines.
Organizational scaling is path-dependent, favoring progressive growth from smaller systems.
A theoretical perspective suggests modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions.


[Submitted on 16 May 2026]


Abstract: Multi-agent language systems are often built as hand-designed workflows, where agents are assigned semantic roles and communication protocols are specified in advance. We propose NeuroMAS, a method that first treats a multi-agent language system as a trainable and scalable neural-network-like architecture with LLM agents as nodes and intermediate textual signals as edges. In NeuroMAS, agent nodes are role-free but structure-aware: the topology only determines how information can flow in general, while reinforcement learning training determines how nodes communicate, specialize, and coordinate. This formulation shifts multi-agent design from workflow engineering toward architecture design, where depth, width, connectivity, and growth protocol become scalable sources of capability. Further, we provide a theoretical perspective showing why such modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions. Experiments show that NeuroMAS improves significantly over both inference-time and trained multi-agent baselines. We further find that organizational scaling is path-dependent: larger systems can be challenging to train from scratch, but become feasible when grown progressively from smaller trained systems. These results suggest that learned neural multi-agent systems are a promising scaling axis for LLMs.
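
As a rough illustration of the "agents as nodes, text as edges" idea in the abstract, the sketch below wires stub agents into fully connected layers and passes text forward through them. It is not the authors' implementation: the names `Node`, `make_stub_agent`, `build_layered`, and `forward`, the fully connected layer topology, and the prompt format are all assumptions made for illustration, and the reinforcement learning step that the paper relies on is not shown. Each node is told only its position in the structure and is given no semantic role, which is one possible reading of "role-free but structure-aware".

```python
from dataclasses import dataclass
from typing import Callable, List

# An agent is any function from a prompt string to a text string.
# In a real system this would be an LLM call; here it is a stub (assumption).
Agent = Callable[[str], str]


@dataclass
class Node:
    name: str      # structural identifier only, not a semantic role
    layer: int     # index of the layer this node sits in
    agent: Agent   # the (stubbed) language model behind the node


def make_stub_agent(name: str) -> Agent:
    """Return a placeholder agent that echoes its name and the prompt length."""
    def agent(prompt: str) -> str:
        return f"{name}:{len(prompt)}"
    return agent


def build_layered(widths: List[int]) -> List[List[Node]]:
    """Build a layered topology; widths[i] is the number of nodes in layer i."""
    layers = []
    for layer_idx, width in enumerate(widths):
        layer = []
        for i in range(width):
            name = f"n{layer_idx}_{i}"
            layer.append(Node(name, layer_idx, make_stub_agent(name)))
        layers.append(layer)
    return layers


def forward(layers: List[List[Node]], task: str) -> List[str]:
    """Pass text through the layers; every node reads all previous-layer outputs."""
    signals = [task]  # the textual "edges" feeding the first layer
    for layer in layers:
        outputs = []
        for node in layer:
            # Structure-aware prompt: position in the graph, but no assigned role.
            prompt = (
                f"[layer {node.layer + 1} of {len(layers)}, node {node.name}]\n"
                f"Task: {task}\n"
                "Inputs:\n" + "\n".join(signals)
            )
            outputs.append(node.agent(prompt))
        signals = outputs
    return signals  # outputs of the final layer


if __name__ == "__main__":
    network = build_layered([3, 2, 1])  # depth 3, widths 3, 2, 1
    print(forward(network, "Summarize the report."))
```

In this sketch, depth and width are just the length and entries of the `widths` list, which is the sense in which architecture choices replace hand-written workflows; what each node actually learns to say would come from training, not from the wiring.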

Subjects:	Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2605.16757 [cs.AI]
	(or arXiv:2605.16757v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.16757 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Haoran Lu
[v1] Sat, 16 May 2026 02:11:34 UTC (752 KB)

— Originally published at arxiv.org
