Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements
Quick Take
Using control theory, the paper argues that safety strategies dependent on external enforcement cannot sustain safety once system effects exceed bounded control, and derives what any viable alternative must satisfy.
Key Points
- Proves a class-wide external impossibility result: under a reachability condition, no strategy that depends on continued external enforcement can sustain safety.
- Establishes a conditional necessity result: any remaining viable safety-sustaining strategy must be intrinsic.
- States four structural requirements that any viable strategy must satisfy.
Abstract:
As AI systems become increasingly capable, safety strategies must be evaluated not only by how much they reduce present risk, but by whether they could sustain safety once external control can no longer reliably constrain system behavior. This paper addresses that problem by using control theory to clarify, at a structural level, whether externally enforced safety-sustaining strategies can succeed and, if not, what any alternative strategy would have to satisfy in order to be viable. It establishes two main results. First, under explicit premises including a reachability condition, it proves a class-wide external impossibility result: once the system's effects exceed what bounded external control can counteract, no strategy that depends in any degree on continued external enforcement can sustain AI safety. This failure is structural across the entire externally enforced class rather than contingent on any particular strategy. Second, it establishes a conditional class-level necessity result: if at least one candidate safety-sustaining strategy remains after that elimination, then all such remaining strategies must be intrinsic. It then states four structural requirements for viability: safety may not depend on continued external enforcement; the system's terminal objective must be safety-compatible when first formed; that objective must remain stable under self-modification; and safety must continue to be preserved as capability grows. The paper does not propose a complete strategy for sustaining AI safety. Its contribution is to give formal structure to a widely held concern about the limits of external control. It does so by deriving explicit conditional results that identify which safety-sustaining strategies are ruled out and what any remaining strategies must satisfy.
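The abstract's "bounded external control" and reachability framing can be read in standard control-theoretic terms. Below is a minimal sketch of one such reading, assuming a control-affine model and a barrier-function description of the safe set; the symbols (x, f, u, U_max, h, S) are illustrative assumptions, not the paper's own notation.

```latex
% Illustrative sketch only; assumed notation, not the paper's formalism.
% State x evolves under intrinsic dynamics f plus a bounded external
% control input u:
\[
  \dot{x} = f(x) + u, \qquad \|u\| \le U_{\max}.
\]
% Let the safe set be S = \{ x : h(x) \ge 0 \} for a smooth barrier
% function h. External enforcement fails structurally once, at some
% boundary state (h(x) = 0), even the best admissible control cannot
% keep h from decreasing:
\[
  \max_{\|u\| \le U_{\max}} \nabla h(x)^{\top}\bigl(f(x) + u\bigr) < 0
  \;\Longrightarrow\;
  x(t) \ \text{exits}\ S \ \text{for every admissible}\ u(\cdot).
\]
```

On this reading, the impossibility claim is class-wide because the failure condition depends only on the bound U_max, not on how any particular enforcement strategy chooses u.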
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.12963 [cs.AI] (or arXiv:2605.12963v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12963 (arXiv-issued DOI via DataCite; registration pending) |
Submission history
From: James Mazzu
[v1] Wed, 13 May 2026 03:56:04 UTC (139 KB)
— Originally published at arxiv.org