QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems
Quick Take
QUIVER is a formal framework for quantifying how perturbations propagate through compound AI systems. Validated on 8,200+ instrumented traces from three structurally distinct pipelines, it reveals distinct sensitivity profiles across architectures, identifies bifurcation thresholds, and localizes stale evaluation artifacts to specific node-field categories.
Key Points
- Defines a sensitivity matrix categorizing edges as amplifiers, absorbers, or threshold-sensitive.
- Decomposes trajectory divergence into value drift, structural path divergence, and iteration count divergence.
- Defines bifurcation thresholds: the smallest perturbation that causes a structural change in the execution path.
- Validated on two production enterprise pipelines and a public DSPy multihop QA pipeline, which show distinct sensitivity profiles.
- Localizes stale evaluation artifacts to specific node-field categories.
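The three-way split of trajectory divergence listed above can be illustrated with a minimal Python sketch. The abstract does not give QUIVER's actual definitions, so everything here is an assumption: the trace format (a list of `(node, output)` pairs with numeric outputs), the function name `decompose_divergence`, and the specific rules used for each component.

```python
from collections import Counter

def decompose_divergence(trace_a, trace_b, distance=lambda x, y: abs(x - y)):
    """Split the difference between two traces into three parts.

    Hypothetical definitions, not taken from the paper:
    - structural: the order in which nodes are first visited differs
    - iteration: total difference in visit counts over shared nodes
    - value_drift: mean distance between the last outputs of shared nodes
    """
    path_a = [node for node, _ in trace_a]
    path_b = [node for node, _ in trace_b]

    # dict.fromkeys keeps first-visit order and drops repeat visits,
    # so loops alone do not count as a structural change.
    structural = list(dict.fromkeys(path_a)) != list(dict.fromkeys(path_b))

    counts_a, counts_b = Counter(path_a), Counter(path_b)
    shared = counts_a.keys() & counts_b.keys()
    iteration = sum(abs(counts_a[n] - counts_b[n]) for n in shared)

    # dict() over (node, output) pairs keeps the last output per node.
    last_a, last_b = dict(trace_a), dict(trace_b)
    drift = (
        sum(distance(last_a[n], last_b[n]) for n in shared) / len(shared)
        if shared else 0.0
    )
    return {"value_drift": drift, "structural": structural, "iteration": iteration}
```

Under these assumptions, a run that retries one node and a run that reroutes through a different node can show similar overall divergence while differing in which component carries it.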
Article Content
From source RSS / original summary
arXiv:2605.23956v1 Announce Type: new
Abstract: Compound AI systems that chain multiple LLM calls into directed computation graphs are now the dominant architecture for production AI. Although these architectures leverage heterogeneous nodes with mixed-mode outputs, no existing framework quantifies how perturbations propagate through such pipelines, where nodes are stochastic and execution paths can diverge structurally.
We introduce QUIVER, a formal framework for measuring perturbation propagation in graph-structured LLM pipelines.
The framework defines: (1) a sensitivity matrix with type-dispatched distance metrics that classifies edges as amplifiers, absorbers, or threshold-sensitive, complemented by occurrence-lift; (2) trajectory divergence decomposing variation into value drift, structural path divergence, and iteration count divergence; (3) bifurcation thresholds identifying the smallest perturbation that causes structural execution path changes; and (4) distribution faithfulness, quantifying when per-node evaluation datasets diverge from production distributions.
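To make component (1) concrete, here is a minimal Python sketch of a type-dispatched distance and an edge classifier built on it. The abstract does not specify the metrics or the classification rule, so the per-type distances, the names `distance` and `classify_edge`, and the `low`/`high` cutoffs are all illustrative assumptions rather than QUIVER's method.

```python
from difflib import SequenceMatcher
from statistics import mean

def distance(a, b):
    """Distance in [0, 1], chosen by value type (assumed metrics)."""
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(a, bool) and isinstance(b, bool):
        return 0.0 if a == b else 1.0
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        scale = max(abs(a), abs(b))
        return 0.0 if scale == 0 else min(1.0, abs(a - b) / scale)
    if isinstance(a, str) and isinstance(b, str):
        return 1.0 - SequenceMatcher(None, a, b).ratio()
    # Fallback for other or mismatched types: exact-match distance.
    return 0.0 if a == b else 1.0

def classify_edge(pairs, low=0.5, high=2.0):
    """Classify an edge from (input_distance, output_distance) pairs.

    Hypothetical rule: the output/input ratio is computed per pair.
    Ratios that span both very low and very high values suggest a
    threshold effect; otherwise the mean ratio decides.
    """
    ratios = [d_out / d_in for d_in, d_out in pairs if d_in > 0]
    if not ratios:
        return "unknown"
    if min(ratios) < low and max(ratios) > high:
        return "threshold-sensitive"
    return "amplifier" if mean(ratios) > 1.0 else "absorber"
```

Running `classify_edge` over every edge of the pipeline graph would fill one label per cell of a sensitivity matrix. Occurrence-lift is not modeled in this sketch.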
We validate on two production enterprise pipelines and a public DSPy multihop QA pipeline, three structurally distinct architectures.
Across 8,200+ instrumented traces (32,000+ pair comparisons), we demonstrate that QUIVER reveals distinct sensitivity profiles across architectures, distinguishes mechanistically different cascade patterns producing identical divergence rates, predicts nodes prone to trajectory bifurcation from observational data alone, and localizes stale evaluation artifacts to specific node-field categories that aggregate metrics cannot surface.
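As a rough illustration of estimating bifurcation thresholds from observational data, the sketch below takes logged trace pairs per node and reports the smallest perturbation magnitude at which the execution path changed. The input format, the function names, and the use of a plain minimum as the estimator are assumptions. The abstract does not describe how QUIVER performs this prediction.

```python
def bifurcation_threshold(observations):
    """Smallest observed perturbation that changed the execution path.

    observations: iterable of (perturbation_magnitude, path_changed).
    Returns None when no path change was observed for the node.
    """
    changed = [mag for mag, path_changed in observations if path_changed]
    return min(changed) if changed else None

def rank_by_fragility(per_node):
    """Order nodes from lowest to highest estimated threshold.

    Nodes with no observed path change are left out of the ranking.
    """
    thresholds = {n: bifurcation_threshold(obs) for n, obs in per_node.items()}
    observed = {n: t for n, t in thresholds.items() if t is not None}
    return sorted(observed, key=observed.get)
```

A plain minimum is sensitive to a single noisy trace. With stochastic nodes, a quantile or a fitted change-probability curve would likely be more robust, at the cost of needing more pairs per node.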