AgentWall: A Runtime Safety Layer for Local AI Agents
Quick Answer
AgentWall is a runtime safety layer for local AI agents: it checks each proposed action against a declarative policy before execution and requires human approval for sensitive operations.
Quick Take
Implemented as a policy-enforcing MCP proxy and a native OpenClaw plugin, AgentWall works across Claude Desktop, Cursor, Windsurf, Claude Code, and OpenClaw. The paper reports 92.9% policy enforcement accuracy with sub-millisecond overhead across 14 benchmark tests. It targets developers who run agents against their own filesystems, credentials, and infrastructure.
Key Points
- AgentWall intercepts agent actions before execution, evaluating them against a safety policy.
- It requires human approval for sensitive operations and records an execution trail for audit and replay.
- The paper reports 92.9% policy enforcement accuracy with sub-millisecond overhead across 14 benchmark tests.
- It is implemented as a policy-enforcing MCP proxy and a native OpenClaw plugin.
- Open-source implementation available for developers to enhance local AI safety.
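The interception step described in the points above can be sketched as a small policy check. This is an illustration only, not AgentWall's actual policy format or API: the `Rule` fields, the `Decision` values, the example rules, and the `evaluate` function are all hypothetical names chosen for this sketch, and first-match-wins glob matching is an assumed evaluation strategy.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"  # pause and require human approval


@dataclass(frozen=True)
class Rule:
    tool: str          # glob matched against the tool name
    pattern: str       # glob matched against the action's argument string
    decision: Decision


# Hypothetical declarative policy; the first matching rule wins.
POLICY = [
    Rule("shell", "rm -rf *", Decision.DENY),
    Rule("shell", "git push*", Decision.ASK),
    Rule("file_read", "*/.ssh/*", Decision.DENY),
    Rule("file_read", "*", Decision.ALLOW),
]


def evaluate(tool, argument, policy=POLICY, default=Decision.ASK):
    """Return the decision of the first rule matching the proposed action."""
    for rule in policy:
        if fnmatchcase(tool, rule.tool) and fnmatchcase(argument, rule.pattern):
            return rule.decision
    # Assumption: anything the policy does not mention needs approval.
    return default


if __name__ == "__main__":
    print(evaluate("shell", "git push origin main"))
    print(evaluate("file_read", "/tmp/notes.txt"))
```

Defaulting to `Decision.ASK` for unmatched actions is one possible fail-safe choice in this sketch; the source does not say how AgentWall treats actions that no rule covers.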
Abstract: The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the consequences of unsafe or adversarially manipulated behavior become immediate and tangible. Existing AI safety work has focused primarily on model alignment and input filtering, but these approaches do not address what happens at the moment an agent's intent becomes a real action on a real machine. This gap is especially acute in local environments, where developers run agents against their own filesystems, credentials, and infrastructure with little runtime control. This paper introduces AgentWall, a runtime safety and observability layer for local AI agents. AgentWall intercepts every proposed agent action before it reaches the host environment, evaluates it against an explicit declarative policy, requires human approval for sensitive operations, and records a complete execution trail for audit and replay. It is implemented as a policy-enforcing MCP proxy and native OpenClaw plugin, working across Claude Desktop, Cursor, Windsurf, Claude Code, and OpenClaw with a single install command. We present the design, architecture, threat model, and policy model of AgentWall, and demonstrate 92.9% policy enforcement accuracy with sub-millisecond overhead across 14 benchmark tests. AgentWall is open-source at this https URL.
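The approval and audit behavior the abstract describes can be illustrated with a small wrapper around action execution. This is a rough sketch under stated assumptions, not AgentWall's implementation: the `ApprovalGate` class, the `read_trail` helper, the JSON Lines log format, and the field names are all hypothetical.

```python
import json
import time


class ApprovalGate:
    """Hypothetical wrapper: ask before sensitive actions, log every outcome."""

    def __init__(self, sensitive_tools, approve, log_path):
        self.sensitive_tools = set(sensitive_tools)
        self.approve = approve      # callback (tool, args) -> bool, e.g. a UI prompt
        self.log_path = log_path    # append-only JSON Lines file

    def _record(self, entry):
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def run(self, tool, args, execute):
        """Run execute(**args) unless a sensitive action is not approved."""
        entry = {"ts": time.time(), "tool": tool, "args": args}
        if tool in self.sensitive_tools and not self.approve(tool, args):
            entry["outcome"] = "rejected"
            self._record(entry)
            return None
        result = execute(**args)
        entry["outcome"] = "executed"
        entry["result"] = repr(result)
        self._record(entry)
        return result


def read_trail(log_path):
    """Load the recorded trail in order, e.g. for audit or replay."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In this sketch a rejected action is still written to the log, so the trail covers what the agent attempted as well as what actually ran. Exceptions raised by `execute` are not handled here; a fuller version would record them too.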
| Comments: | 16 pages, 2 figures, open-source implementation at this https URL |
| Subjects: | Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR) |
| Cite as: | arXiv:2605.16265 [cs.AI] (or arXiv:2605.16265v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.16265 (arXiv-issued DOI via DataCite) |
Submission history
From: Ashwin Aravind
[v1] Tue, 24 Mar 2026 11:39:35 UTC (14 KB)
— Originally published at arxiv.org