QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems

arXiv cs.AI·Prashanti Nilayam, Sankalp Nayak

5/26/2026

Quick Take

QUIVER introduces a formal framework for quantifying perturbation propagation in compound AI systems and reveals distinct sensitivity profiles across architectures. Validated on 8,200+ instrumented traces from three pipelines, it identifies bifurcation thresholds and localizes stale evaluation artifacts to specific node-field categories that aggregate metrics cannot surface.
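The paper defines a bifurcation threshold as the smallest perturbation that causes a structural change in a pipeline's execution path. A minimal sketch, assuming the path switches exactly once as perturbation grows (an assumption, not a claim from the paper), is a bisection over perturbation magnitude. The `run_pipeline` toy, its router score, and `bifurcation_threshold` are hypothetical names for illustration only.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def run_pipeline(eps: float) -> list[str]:
    """Hypothetical toy pipeline: returns the node path visited at perturbation eps."""
    router_score = 0.8 - eps  # toy assumption: perturbation lowers a router's score
    if router_score >= 0.5:
        return ["retrieve", "answer"]
    return ["retrieve", "rewrite_query", "retrieve", "answer"]


def bifurcation_threshold(
    run: Callable[[float], Sequence[str]],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-4,
) -> Optional[float]:
    """Bisect for the smallest perturbation whose path differs from run(lo).

    Assumes a single switch point t: the path matches the baseline for
    eps < t and differs for eps > t. Returns None if even `hi` changes nothing.
    """
    baseline = list(run(lo))
    if list(run(hi)) == baseline:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if list(run(mid)) == baseline:
            lo = mid  # path unchanged: threshold lies above mid
        else:
            hi = mid  # path changed: threshold is at or below mid
    return hi


threshold = bifurcation_threshold(run_pipeline)
print(f"estimated bifurcation threshold: {threshold:.4f}")
```

Real LLM nodes are stochastic, so a practical version would likely compare path distributions over several samples per magnitude rather than a single run.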

Key Points

Defines a sensitivity matrix categorizing edges as amplifiers, absorbers, or threshold-sensitive.
Decomposes trajectory divergence into value drift, structural path divergence, and iteration count divergence.
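Identifies bifurcation thresholds: the smallest perturbation that causes a structural execution path change.
Validates on three structurally distinct architectures (two production enterprise pipelines and a public DSPy multi-hop QA pipeline), revealing distinct sensitivity profiles.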
Localizes stale evaluation artifacts to specific node-field categories.
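The second key point splits trajectory divergence into three parts. The sketch below shows one way to compute each part for a baseline and a perturbed trace; the trace format and every metric choice (normalized edit distance for path divergence, per-node visit-count gap for iteration divergence, `difflib` text distance for value drift) are assumptions for illustration, not the paper's definitions.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

# Hypothetical trace format: one (node name, output text) tuple per executed step.
Trace = List[Tuple[str, str]]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two node-name sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def first_outputs(trace: Trace) -> Dict[str, str]:
    """Map each node to the output of its first execution."""
    out: Dict[str, str] = {}
    for node, value in trace:
        out.setdefault(node, value)
    return out


def decompose(base: Trace, pert: Trace) -> Dict[str, float]:
    path_a = [node for node, _ in base]
    path_b = [node for node, _ in pert]

    # Structural path divergence: edit distance normalized by the longer path.
    structural = levenshtein(path_a, path_b) / max(len(path_a), len(path_b), 1)

    # Iteration count divergence: L1 gap between per-node visit counts.
    ca, cb = Counter(path_a), Counter(path_b)
    iteration = sum(abs(ca[n] - cb[n]) for n in ca.keys() | cb.keys())

    # Value drift: mean text distance between first outputs of shared nodes.
    fa, fb = first_outputs(base), first_outputs(pert)
    shared = fa.keys() & fb.keys()
    drift = (
        sum(1 - SequenceMatcher(None, fa[n], fb[n]).ratio() for n in shared) / len(shared)
        if shared
        else 0.0
    )
    return {"value_drift": drift, "structural": structural, "iteration": float(iteration)}


base = [("retrieve", "doc A"), ("answer", "Paris")]
pert = [("retrieve", "doc B"), ("rewrite", "q2"), ("retrieve", "doc C"), ("answer", "Paris")]
print(decompose(base, pert))
```

For this toy pair the structural score is 0.5 (two inserted steps over a four-step path), iteration divergence is 2.0, and value drift is roughly 0.1, since only the first `retrieve` output changed.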

Article Content

arXiv:2605.23956v1

Abstract: Compound AI systems that chain multiple LLM calls into directed computation graphs are now the dominant architecture for production AI. Although these architectures leverage heterogeneous nodes with mixed-mode outputs, no existing framework quantifies how perturbations propagate through such pipelines, where nodes are stochastic and execution paths can diverge structurally.

We introduce QUIVER, a formal framework for measuring perturbation propagation in graph-structured LLM pipelines.

The framework defines: (1) a sensitivity matrix with type-dispatched distance metrics that classifies edges as amplifiers, absorbers, or threshold-sensitive, complemented by occurrence-lift; (2) trajectory divergence, which decomposes variation into value drift, structural path divergence, and iteration count divergence; (3) bifurcation thresholds, which identify the smallest perturbation that causes a structural execution path change; and (4) distribution faithfulness, which quantifies when per-node evaluation datasets diverge from production distributions.
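Item (1) pairs a type-dispatched distance with an edge classification. The sketch below is one hypothetical reading: distance is chosen by output type, each edge's gain is total downstream distance per unit of upstream distance, and an edge whose gain jumps sharply between small and large upstream perturbations is labeled threshold-sensitive. The distance functions, the gain cutoff of 1.0, the `jump` factor, and all names here are assumptions; occurrence-lift is not modeled.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Any, Dict, List, Tuple

# One perturbed pair per tuple: (upstream_a, upstream_b, downstream_a, downstream_b).
Sample = Tuple[Any, Any, Any, Any]


def distance(a: Any, b: Any) -> float:
    """Type-dispatched, non-negative distance; metric choices are assumptions."""
    if isinstance(a, bool) or isinstance(b, bool):
        return float(a != b)  # check bool first: bool is a subclass of int
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) / max(abs(a), abs(b), 1e-9)  # relative numeric change
    if isinstance(a, str) and isinstance(b, str):
        return 1.0 - SequenceMatcher(None, a, b).ratio()  # character-level text change
    return float(a != b)  # categorical / fallback: exact match


def gain(pairs: List[Tuple[float, float]]) -> float:
    """Total output distance per unit of input distance."""
    total_in = sum(i for i, _ in pairs)
    return sum(o for _, o in pairs) / total_in if total_in else 0.0


def classify_edge(samples: List[Sample], jump: float = 3.0) -> str:
    """Hypothetical rule: compare gain on small vs. large upstream perturbations."""
    pairs = sorted((distance(ua, ub), distance(da, db)) for ua, ub, da, db in samples)
    pairs = [(i, o) for i, o in pairs if i > 0]  # ignore pairs with no upstream change
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[half:]
    if low and gain(high) > jump * gain(low):
        return "threshold-sensitive"  # response appears only for larger inputs
    return "amplifier" if gain(pairs) > 1.0 else "absorber"


def sensitivity_matrix(
    edges: Dict[Tuple[str, str], List[Sample]]
) -> Dict[Tuple[str, str], str]:
    return {edge: classify_edge(samples) for edge, samples in edges.items()}


edges = {
    ("plan", "draft"): [(1.0, 1.1, 10.0, 15.0), (1.0, 1.2, 10.0, 20.0)],
    ("draft", "score"): [(1.0, 2.0, 10.0, 10.5), (1.0, 3.0, 10.0, 11.0)],
    ("score", "route"): [(0.9, 0.85, "answer", "answer"), (0.9, 0.4, "answer", "clarify")],
}
for edge, label in sensitivity_matrix(edges).items():
    print(edge, label)
```

In the toy data, `("plan", "draft")` magnifies small relative changes (amplifier), `("draft", "score")` dampens them (absorber), and `("score", "route")` ignores a small score shift but flips its route on a large one (threshold-sensitive).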

We validate on two production enterprise pipelines and a public DSPy multi-hop QA pipeline, which together cover three structurally distinct architectures.

Across 8,200+ instrumented traces (32,000+ pair comparisons), we demonstrate that QUIVER reveals distinct sensitivity profiles across architectures, distinguishes mechanistically different cascade patterns producing identical divergence rates, predicts nodes prone to trajectory bifurcation from observational data alone, and localizes stale evaluation artifacts to specific node-field categories that aggregate metrics cannot surface.
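Localizing stale evaluation artifacts means comparing evaluation data with production data at a finer grain than one aggregate score. A minimal sketch, assuming each (node, field) pair yields categorical values, scores each pair with total variation distance and flags those above a cutoff. Total variation distance, the 0.2 cutoff, and the `faithfulness_report` name are swapped-in assumptions, not the paper's stated metric.

```python
from __future__ import annotations

from collections import Counter
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # hypothetical granularity: (node name, output field)


def tvd(p_samples: Sequence[str], q_samples: Sequence[str]) -> float:
    """Total variation distance between two empirical categorical distributions."""
    if not p_samples or not q_samples:
        raise ValueError("both sample lists must be non-empty")
    p, q = Counter(p_samples), Counter(q_samples)
    n_p, n_q = len(p_samples), len(q_samples)
    return 0.5 * sum(abs(p[x] / n_p - q[x] / n_q) for x in p.keys() | q.keys())


def faithfulness_report(
    eval_data: Dict[Key, List[str]],
    prod_data: Dict[Key, List[str]],
    cutoff: float = 0.2,
) -> Tuple[Dict[Key, float], Dict[Key, float]]:
    """Score each (node, field) present in both datasets; flag scores above cutoff."""
    scores = {k: tvd(eval_data[k], prod_data[k]) for k in eval_data.keys() & prod_data.keys()}
    stale = {k: s for k, s in scores.items() if s > cutoff}
    return stale, scores


eval_data = {
    ("router", "intent"): ["billing"] * 5 + ["tech"] * 5,
    ("answer", "lang"): ["en"] * 10,
}
prod_data = {
    ("router", "intent"): ["billing"] * 5 + ["tech"] * 5,
    ("answer", "lang"): ["en"] * 6 + ["de"] * 4,
}
stale, scores = faithfulness_report(eval_data, prod_data)
print("stale:", stale)
```

With this toy data only `("answer", "lang")` is flagged (TVD 0.4), while the router's intent distribution matches production exactly.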
