Safe and Adaptive Cloud Healing: Verifying LLM-Generated Recovery Plans with a Neural-Symbolic World Model
Quick Answer
PASE (Planning-Aware Semantic self-healing Engine) is a framework that uses an LLM to generate recovery plans and a Neural-Symbolic World Model to verify them. On a real-world cloud fault injection dataset, the authors report an average system recovery time reduction of over 40% and improved fault detection accuracy in unknown fault scenarios.
Key Points
- PASE redefines recovery as a neuro-symbolic program synthesis task.
- Utilizes LLMs to generate structured recovery plans from semantic primitives.
- Achieves over 40% reduction in average system recovery time.
- Improves fault detection accuracy in unknown fault scenarios.
- Integrates reasoning, verification, and meta-learning for adaptive recovery.
Article Content
From source RSS / original summary
arXiv:2607.01595v1 Announce Type: new
Abstract: As the scale and complexity of cloud-based AI systems continue to escalate, ensuring service reliability through rapid fault detection and adaptive recovery has become a critical challenge.
While existing approaches integrate Large Language Models (LLMs) for semantic understanding and Deep Reinforcement Learning (DRL) for policy optimization, they often rely on sequential, loosely coupled architectures that underutilize the generative and reasoning capabilities of LLMs. In this paper, we propose a paradigm shift with PASE (Planning-Aware Semantic self-healing Engine), a novel fault self-healing framework that reconceptualizes recovery as a neuro-symbolic program synthesis task.
PASE employs an LLM as a core Plan Synthesis Engine to generate structured recovery plans from a library of semantic primitives. A Neural-Symbolic World Model verifies plan feasibility through simulation, while a Meta-Prompt Optimizer, trained via DRL, learns to generate optimal prompts that guide the LLM's planning process. This tight reason-plan-verify-adapt loop enables dynamic, context-aware recovery strategy generation beyond predefined action spaces.
Experiments on a real-world cloud fault injection dataset demonstrate that PASE significantly outperforms state-of-the-art methods, reducing average system recovery time by over 40% and improving fault detection accuracy in unknown fault scenarios. Our framework advances autonomous system management by unifying LLM-based reasoning with model-assisted verification and meta-learned guidance.
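The plan-then-verify part of the loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the primitive names, the fact names, the `stub_planner` function (a stand-in for the LLM), and the purely symbolic precondition/effect check (a stand-in for the Neural-Symbolic World Model) are all assumptions made for illustration. The DRL-trained Meta-Prompt Optimizer is omitted entirely; here the verifier's rejection reason is simply passed back to the planner as feedback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """Hypothetical semantic recovery primitive with symbolic conditions."""
    name: str
    requires: frozenset  # facts that must hold before the step runs
    adds: frozenset      # facts made true by the step
    removes: frozenset   # facts made false by the step


# Hypothetical primitive library; none of these names come from the paper.
LIBRARY = {
    "drain_traffic": Primitive(
        "drain_traffic",
        frozenset({"node_serving"}),
        frozenset({"node_drained"}),
        frozenset({"node_serving"}),
    ),
    "restart_service": Primitive(
        "restart_service",
        frozenset({"node_drained"}),
        frozenset({"service_healthy"}),
        frozenset({"service_faulty"}),
    ),
    "restore_traffic": Primitive(
        "restore_traffic",
        frozenset({"node_drained", "service_healthy"}),
        frozenset({"node_serving"}),
        frozenset({"node_drained"}),
    ),
}


def verify(plan, state, goal):
    """Simulate the plan symbolically; return (ok, reason)."""
    state = set(state)
    for step in plan:
        prim = LIBRARY.get(step)
        if prim is None:
            return False, f"unknown primitive: {step}"
        missing = prim.requires - state
        if missing:
            return False, f"{step} missing preconditions: {sorted(missing)}"
        state = (state - prim.removes) | prim.adds
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: {sorted(unmet)}"
    return True, "ok"


def stub_planner(feedback):
    """Stand-in for the LLM: the first proposal is unsafe, the retry is repaired."""
    if feedback is None:
        return ["restart_service", "restore_traffic"]
    return ["drain_traffic", "restart_service", "restore_traffic"]


def heal(state, goal, max_rounds=3):
    """Propose plans until one passes verification or the round budget runs out."""
    feedback = None
    for _ in range(max_rounds):
        plan = stub_planner(feedback)
        ok, feedback = verify(plan, state, goal)
        if ok:
            return plan
    return None


initial = {"node_serving", "service_faulty"}
goal = {"node_serving", "service_healthy"}
accepted = heal(initial, goal)
print(accepted)
```

In this sketch the first proposed plan is rejected because it restarts the service before draining traffic, and the second plan passes simulation and reaches the goal. Only verified plans are returned, which is the safety property the verification step is meant to provide.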