Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
Quick Take
The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing. It targets recurring engineering problems, such as cache reuse, context management, and agent scheduling, that arise when developers work with LLM-based agents like Codex and Claude Code. The paper argues that existing research strands lack a unified model, introduces three design laws for reasoning about system performance, and outlines a research roadmap. It is a conceptual survey and reports no new experiments.
Key Points
- ICAM reconciles the question of whether an LLM is more like a CPU or an operating system with a dual-plane view: a probabilistic execution plane and a deterministic control plane.
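- Introduces three design laws: the Semantic Locality Law (KV-cache reuse and inference speedup), the Context Budget Law (effective working sets under finite context windows and attention decay), and the Agent Speedup Law (diminishing returns in multi-agent collaboration).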
- Validates the design laws against published system-level data and relates them to recent evidence on agentic software practices.
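- Argues that work on LLM-as-OS, memory management, agent frameworks, tool protocols, and multi-agent coordination addresses different layers of the same stack but lacks a unified model.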
- Outlines a research roadmap for advancing model-native computing.
Article Content
arXiv:2606.00288v1. Abstract: Large language models are undergoing a transition from model technology to system technology. As developers use Codex, Claude Code, AutoGPT, and related agents to write code, manage projects, and execute multi-step tasks, recurring engineering problems such as cache reuse, context management, agent scheduling, and permission control increasingly resemble classical computer systems problems. This paper develops that analogy as a visionary survey.
We map concepts from computer architecture to the emerging model-native stack and review work on LLM-as-OS, memory management, agent frameworks, tool protocols, multi-agent coordination, cognitive architectures, and safety governance. We argue that these strands address different layers of the same system but lack a unified model. To fill this gap, we propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing with explicit interface contracts and design axioms.
ICAM resolves the apparent tension over whether an LLM is more like a CPU or an operating system through a dual-plane view: a probabilistic execution plane concerned with what can be computed, and a deterministic control plane concerned with what should be computed.
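To make the dual-plane split concrete, here is a minimal Python sketch under stated assumptions: a seeded random choice (`execution_plane`) stands in for the probabilistic model proposing an action, and a fixed allow-list (`control_plane`, `ALLOWED_TOOLS`) stands in for the deterministic layer that decides what should run. All names here are hypothetical illustrations, not ICAM's actual interfaces.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A tool call proposed by the model (hypothetical structure)."""
    tool: str
    argument: str


# Hypothetical policy owned by the control plane.
ALLOWED_TOOLS = frozenset({"read_file", "run_tests"})


def execution_plane(task: str, rng: random.Random) -> Action:
    """Probabilistic plane: a random choice stands in for LLM sampling."""
    candidates = [
        Action("read_file", task),
        Action("run_tests", task),
        Action("delete_file", task),
    ]
    return rng.choice(candidates)


def control_plane(action: Action) -> bool:
    """Deterministic plane: the verdict depends only on the proposal."""
    return action.tool in ALLOWED_TOOLS


def step(task: str, rng: random.Random) -> tuple[Action, bool]:
    """One agent step: sample a proposal, then gate it deterministically."""
    proposal = execution_plane(task, rng)
    return proposal, control_plane(proposal)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        action, approved = step("src/main.py", rng)
        print(action.tool, "approved" if approved else "blocked")
```

The design point of this sketch is that nondeterminism stays confined to proposals, while approval is a pure function of the proposal, so the control plane can be audited and tested independently of the model.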
We further introduce three design laws: the Semantic Locality Law for KV-cache reuse and inference speedup, the Context Budget Law for effective working sets under finite windows and attention decay, and the Agent Speedup Law for diminishing returns in multi-agent collaboration. We validate these laws against published system-level data and relate them to recent evidence on agentic software practices.
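The abstract names the three laws but does not give their formulas. For intuition about two of them, the following Python sketch uses toy Amdahl-style models that are assumptions of this example, not the paper's formulations: `cache_speedup` assumes reused KV-cache entries remove their share of prefill work entirely, and `agent_speedup` assumes a hypothetical linear coordination cost for each additional agent.

```python
import math


def cache_speedup(prefill_fraction: float, hit_rate: float) -> float:
    """Toy model of KV-cache reuse (an assumption, not the paper's law).

    prefill_fraction: share of uncached request latency spent on prefill.
    hit_rate: share of prompt tokens whose KV entries are reused.
    Reused tokens are assumed to cost nothing; decode time is unchanged.
    """
    if not (0.0 <= prefill_fraction <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    remaining = 1.0 - prefill_fraction * hit_rate
    return math.inf if remaining == 0.0 else 1.0 / remaining


def agent_speedup(n_agents: int, parallel_fraction: float, coord_cost: float) -> float:
    """Toy Amdahl-style model with linear coordination overhead (an assumption).

    Parallelizable work is split evenly across agents; each agent beyond the
    first adds coord_cost, measured as a fraction of single-agent time.
    """
    if n_agents < 1:
        raise ValueError("need at least one agent")
    serial = 1.0 - parallel_fraction
    overhead = coord_cost * (n_agents - 1)
    return 1.0 / (serial + parallel_fraction / n_agents + overhead)


def best_team_size(parallel_fraction: float, coord_cost: float, max_agents: int = 64) -> int:
    """Team size in [1, max_agents] that maximizes the toy speedup."""
    return max(
        range(1, max_agents + 1),
        key=lambda n: agent_speedup(n, parallel_fraction, coord_cost),
    )


if __name__ == "__main__":
    print(f"cache speedup: {cache_speedup(0.5, 0.8):.2f}x")
    for n in (1, 2, 4, 8, 16, 32):
        print(n, f"{agent_speedup(n, 0.9, 0.02):.2f}x")
    print("best team size:", best_team_size(0.9, 0.02))
```

In this toy model, minimizing the denominator gives an optimum near sqrt(parallel_fraction / coord_cost) agents. Only the qualitative shape matters here: gains that flatten and, under the assumed overhead, eventually reverse, consistent with the diminishing returns the abstract attributes to the Agent Speedup Law.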
We conclude by identifying where the analogy breaks down and outlining a research roadmap for model-native computing. This is a conceptual and survey contribution; it does not report new experiments.