Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

6/2/2026

·~2 min·6/2/2026·en·3

Quick Answer

Quick Take

The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing, addressing issues like cache reuse and agent scheduling in large language models (LLMs) such as Codex and Claude Code. It introduces design laws to optimize performance and highlights the need for a unified model in LLM systems, while also outlining a research roadmap for future developments.

Key Points

ICAM resolves the debate on LLMs as CPUs or operating systems with a dual-plane view.
Introduces three design laws for optimizing cache reuse, context management, and agent collaboration.
Validates design laws against existing system-level data and agentic software practices.
Highlights the need for a unified model in the emerging model-native stack.
Outlines a research roadmap for advancing model-native computing.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2606. 00288v1 Announce Type: new Abstract: Large language models are undergoing a transition from model technology to system technology. As developers use Codex, Claude Code, AutoGPT, and related agents to write code, manage projects, and execute multi-step tasks, recurring engineering problems such as cache reuse, context management, agent scheduling, and permission control increasingly resemble classical computer systems problems. This paper develops that analogy as a visionary survey.

We map concepts from computer architecture to the emerging model-native stack and review work on LLM-as-OS, memory management, agent frameworks, tool protocols, coordination, cognitive architectures, and safety governance. We argue that these strands address different layers of the same system but lack a unified model. To fill this gap, we propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing with explicit interface contracts and design axioms.

ICAM resolves the apparent tension over whether an LLM is more like a CPU or an operating system through a dual-plane view: a probabilistic execution plane concerned with what can be computed, and a deterministic control plane concerned with what should be computed.

We further introduce three design laws: the Semantic Locality Law for KV-cache reuse and inference speedup, the Context Budget Law for effective working sets under finite windows and attention decay, and the Agent Speedup Law for diminishing returns in multi-agent collaboration. We validate these laws against published system-level data and relate them to recent evidence on agentic software practices.

We conclude by identifying where the analogy breaks down and outlining a research roadmap for model-native computing. This is a conceptual and survey contribution; it does not report new experiments.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·David Krongauz, Arad Zulti, Eran Segal, Teddy Lazebnik

1d ago

FeaturedOriginal

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

AI Summary

The MEDA system utilizes large language models and symbolic regression to autonomously discover ordinary differential equations for biological systems, achieving strong structural recovery and biologically plausible models. It outperforms existing methods by integrating domain knowledge and mechanistic constraints, demonstrating effective retrieval and extrapolation capabilities.

#LLM #Agent #Inference #AI Startup