Agentic evolution of physically constrained foundation models

arXiv cs.AI·Jiangwei Zhang, Wen Sun, Chong Wang, Shiyao Li, Cheng Che, Chunjing Han, Dan Meng, Jian Yang, Yu Wang, Rui Hou

6/25/2026 · ~1 min read · en

Quick Answer

This paper shows that a new multi-agent discovery engine autonomously designs hardware-compliant systems, evolving methods such as Q-Enhance and MoE-Salient-AQ that outperform human-engineered heuristics.

Quick Take

A new discovery engine autonomously designs hardware-compliant systems, evolving methods like Q-Enhance and MoE-Salient-AQ that outperform human heuristics. It successfully deployed a 235-billion-parameter model on a dual-A100 server, reducing memory needs by 75% with only a 0.64% accuracy drop.

Key Points

The engine evolved two hardware-aware compression methods: Q-Enhance and MoE-Salient-AQ.
Q-Enhance mitigates long-context accuracy loss in dense models.
MoE-Salient-AQ outperforms state-of-the-art manual sparse Mixture-of-Experts designs by 3.7% at sub-3-bit regimes.
The authors deployed a 235-billion-parameter model on a dual-A100 server.
The deployment cut memory requirements by 75% with a 0.64% accuracy degradation.


Article Content

From source RSS / original summary

arXiv:2606.25532v1. Abstract: Artificial intelligence increasingly drives automated scientific discovery, yet contemporary generalist agents lack physical grounding, frequently hallucinating hardware-incompatible designs. Here, we present a physically grounded discovery engine that autonomously architects hardware-compliant computing systems.

Anchored by an Evolutionary Knowledge Graph structuring past scientific innovations, the framework extracts an "algorithmic Chain-of-Thought" to transform blind stochastic search into directed structural evolution.

Applied to the extreme testbed of foundation model deployment, the engine evolved two hardware-aware compression methodologies surpassing human-engineered heuristics: Q-Enhance mitigates long-context accuracy loss in dense models, and MoE-Salient-AQ outperforms state-of-the-art manual sparse Mixture-of-Experts designs by 3.7% at sub-3-bit regimes.
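To give a feel for why sub-3-bit regimes are demanding, here is a minimal sketch of generic round-to-nearest uniform quantization. It is not Q-Enhance or MoE-Salient-AQ, whose details the summary does not give; the function names and the toy weight values are hypothetical and chosen only for illustration.

```python
def quantize_uniform(values, bits):
    """Generic round-to-nearest quantization onto 2**bits evenly spaced levels.

    Illustrative only: this is a textbook baseline, not a method from the paper.
    """
    steps = 2 ** bits - 1                 # number of intervals between levels
    lo, hi = min(values), max(values)
    scale = (hi - lo) / steps if hi > lo else 1.0
    # Snap each value to the nearest level, then map back to the real range.
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_abs_error(original, reconstructed):
    """Average absolute reconstruction error."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


# Hypothetical toy weights, not data from the paper.
weights = [-0.9, -0.4, -0.1, 0.0, 0.05, 0.3, 0.7, 1.1]

errors = {
    bits: mean_abs_error(weights, quantize_uniform(weights, bits))
    for bits in (2, 3, 4, 8)
}

for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error = {err:.4f}")
```

On this toy input the reconstruction error grows as the bit width shrinks, which is the gap that any method operating below 3 bits has to close.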

Utilizing a bandwidth-efficient Sensitivity Profile, we successfully deployed a massive 235-billion-parameter model onto a constrained dual-A100 server, reducing memory requirements by 75% with a marginal 0.64% accuracy degradation. By transforming unconstrained combinatorial search into knowledge-driven autonomy, this establishes a scalable hardware-software co-design paradigm for machine-driven discovery within strict physical boundaries.
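The deployment figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below rests on assumptions the summary does not state: a 16-bit baseline for the uncompressed weights, the 80 GB variant of the A100, and weight storage only (no activations or KV cache). Only the parameter count, the 75% reduction, and the two-GPU setup come from the source.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (weights only, no runtime buffers)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 235e9        # from the summary: 235-billion-parameter model
REDUCTION = 0.75          # from the summary: 75% memory reduction
NUM_GPUS = 2              # from the summary: dual-A100 server

BASELINE_BITS = 16.0      # ASSUMPTION: 16-bit baseline weights
GPU_MEMORY_GB = 80.0      # ASSUMPTION: 80 GB A100 variant

baseline_gb = weight_memory_gb(NUM_PARAMS, BASELINE_BITS)
compressed_gb = baseline_gb * (1 - REDUCTION)
effective_bits = BASELINE_BITS * (1 - REDUCTION)
total_gpu_gb = NUM_GPUS * GPU_MEMORY_GB

print(f"baseline weights:   {baseline_gb:.1f} GB")
print(f"compressed weights: {compressed_gb:.1f} GB")
print(f"effective bits per weight: {effective_bits:.1f}")
print(f"fits before compression: {baseline_gb <= total_gpu_gb}")
print(f"fits after compression:  {compressed_gb <= total_gpu_gb}")
```

Under these assumptions the uncompressed weights need 470 GB, far above the 160 GB available, while a 75% reduction brings them to 117.5 GB, equivalent to an average of about 4 bits per weight. A different baseline precision or GPU variant would change these numbers.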



