Beyond Static Evaluation: Co-Evolutionary Mechanisms for LLM-Driven Strategy Evolution in Adversarial Games
Quick Answer
FAMOU is a framework for LLM-driven strategy evolution that adds three co-evolutionary mechanisms to keep evaluation reliable as strategies improve in adversarial games. On the MCTF 2026 3v3 task it achieves a 61.7% win rate against unseen opponents.
Quick Take
On the MCTF 2026 3v3 maritime capture-the-flag task, FAMOU outperformed both baselines under two backbone LLMs, with the highest combined score and the best generalization to unseen opponents. Code-level evolution produced tactical structures absent from the seed strategies, and the evolved strategy placed 1st in the hardware round-robin at the AAMAS 2026 MCTF Competition, which the authors present as evidence of real-world transferability.
Key Points
- FAMOU achieved the highest combined score (0.526) on the MCTF 2026 3v3 task.
- It implements three mechanisms: evaluator co-evolution, hierarchical deep evaluation, and weakness pressure.
- It reached a 61.7% win rate against unseen opponents, the best generalization among the compared methods.
- LLM mutation produced tactical structures absent from the seed strategies, including lookahead search and adaptive interception.
- The FAMOU-evolved strategy placed 1st in the hardware round-robin and 3rd in simulation at the AAMAS 2026 MCTF Competition.
Article Content
From source RSS / original summary
arXiv:2606.10389v1 (announce type: new)
Abstract: Recent advances in LLM-driven code evolution have enabled automated discovery by iteratively generating and improving programs. However, applying these methods to adversarial multi-agent games introduces a fundamental challenge: the evaluation landscape shifts as strategies improve, causing fixed evaluators to become unreliable and evolution to stagnate.
We propose three mechanisms to address this challenge: evaluator co-evolution, which incorporates discovered champions into the opponent pool; hierarchical deep evaluation, which replaces noisy few-game scores with statistically reliable assessments; and weakness pressure, which dynamically up-weights the most difficult opponents to break through plateaus. We implement these mechanisms within FAMOU, a framework built upon the same foundation-model code-evolution paradigm as OpenEvolve and ShinkaEvolve.
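The abstract does not give implementation details, so the following is only a minimal sketch of how the three mechanisms could fit together, not FAMOU's actual code. Every name here (`win_rate`, `weakness_weights`, `hierarchical_score`, `update_pool`), the softmax weighting, the game counts, and the gate threshold are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: play(candidate, opponent) returns True if the candidate wins.
Strategy = str
PlayFn = Callable[[Strategy, Strategy], bool]


def win_rate(candidate: Strategy, opponent: Strategy, play: PlayFn, games: int) -> float:
    """Fraction of `games` that the candidate wins against one opponent."""
    wins = sum(1 for _ in range(games) if play(candidate, opponent))
    return wins / games


def weakness_weights(rates: Dict[Strategy, float], temperature: float = 0.25) -> Dict[Strategy, float]:
    """Weakness pressure (assumed form): a softmax over loss rates, so the
    opponents the candidate struggles against most receive the most weight."""
    scores = {opp: math.exp((1.0 - r) / temperature) for opp, r in rates.items()}
    total = sum(scores.values())
    return {opp: s / total for opp, s in scores.items()}


def hierarchical_score(
    candidate: Strategy,
    pool: List[Strategy],
    play: PlayFn,
    quick_games: int = 4,
    deep_games: int = 40,
    gate: float = 0.5,
) -> Tuple[float, bool]:
    """Hierarchical evaluation (assumed form): a cheap few-game screen first,
    then a many-game assessment only for candidates that pass the gate.
    Returns (score, was_deeply_evaluated)."""
    quick = [win_rate(candidate, opp, play, quick_games) for opp in pool]
    quick_mean = sum(quick) / len(quick)
    if quick_mean < gate:
        return quick_mean, False
    deep = {opp: win_rate(candidate, opp, play, deep_games) for opp in pool}
    weights = weakness_weights(deep)
    return sum(weights[opp] * deep[opp] for opp in pool), True


def update_pool(candidate: Strategy, score: float, best_score: float, pool: List[Strategy]) -> float:
    """Evaluator co-evolution (assumed form): a new champion joins the
    opponent pool, so later candidates must also beat it."""
    if score > best_score and candidate not in pool:
        pool.append(candidate)
        return score
    return best_score
```

In this sketch the opponent pool grows whenever a candidate beats the previous best score, which is one simple way to keep the evaluator from going stale. A real system would also need to handle draws, evaluation cost budgets, and re-scoring of earlier champions against the enlarged pool.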
On the MCTF 2026 3v3 maritime capture-the-flag task, FAMOU consistently outperforms both baselines under two backbone LLMs, achieving the highest combined score (0.526) and the best generalization to unseen opponents (61.7% win rate), while ablations confirm that each mechanism contributes to performance.
Notably, the LLM mutation process generates tactical structures entirely absent from the seed strategies -- including lookahead search and adaptive interception -- demonstrating that code-level evolution can produce nontrivial algorithmic innovations in adversarial settings. The FAMOU-evolved strategy further achieved 1st place in the hardware round-robin and 3rd in simulation at the AAMAS 2026 MCTF Competition, validating its real-world transferability.
The optimized implementation and corresponding evaluation code developed through our evolutionary process are available at: https://github.com/1xiangliu1/FAMOU-CoEvo