Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

arXiv cs.AI·Ya-Ting Yang, Quanyan Zhu

5/26/2026

·~1 min·5/26/2026·en·2

Quick Answer

This paper presents a framework for optimizing latency, reliability, and cost in LLM-enabled workflows, introducing performance models for both LLM and non-LLM agents.

Quick Take

This paper presents a framework for optimizing latency, reliability, and cost in LLM-enabled workflows, introducing performance models for both LLM and non-LLM agents. Key findings include a water-filling token allocation policy and optimal workflow reliability characterized by shadow prices, which can significantly impact the design of sequential workflows under given constraints.

Key Points

Introduces performance models for LLM and non-LLM agents to analyze tradeoffs.
Develops a water-filling token allocation policy for optimal resource distribution.
Characterizes optimal workflow reliability using shadow prices for better decision-making.
Focuses on sequential workflows under latency and cost constraints.
Impacts the design of AI systems that integrate multiple interacting agents.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Excerpt

From source RSS / original summary

arXiv:2605. 23929v1 Announce Type: new Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows.

We introduce performance models for both LLM and non-LLM agents that capture the relationship between computational effort and output quality, incorporating the impact of reasoning and output tokens for LLM agents using a parametric exponential reliability function. Then, we study the design of sequential workflows under latency and cost constraints. Main results include a water-filling token allocation policy and characterizations of optimal workflow reliability in terms of shadow prices.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·Mihnea C. Moldoveanu, Joel A. C. Baum

4d ago

FeaturedOriginal

Adversarial Social Epistemology for Assemblies of Humans and Large Language Models

AI Summary

The paper introduces Adversarial Social Epistemology (ASE) to analyze how agents manipulate trust in public communications, highlighting mechanisms that undermine the reliability of testimony and inference. It critiques existing frameworks like epistemic bubbles and misinformation diffusion, proposing a new language for understanding trust breaches and auditing inferential chains in densely interactive environments involving humans and large language models.

#LLM #Agent #Inference #Policy

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

Adversarial Social Epistemology for Assemblies of Humans and Large Language Models

Information Limits and Attractor Dynamics in Economies of Frontier LLM Agents: A Pre-Registered Test

Onnes: A Physics-Grounded LLM Simulator for Cryogenic Fault Diagnosis in Quantum Computing Infrastructure

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

Adversarial Social Epistemology for Assemblies of Humans and Large Language Models

Information Limits and Attractor Dynamics in Economies of Frontier LLM Agents: A Pre-Registered Test

Onnes: A Physics-Grounded Multi-Agent LLM Simulator for Cryogenic Fault Diagnosis in Quantum Computing Infrastructure

Onnes: A Physics-Grounded LLM Simulator for Cryogenic Fault Diagnosis in Quantum Computing Infrastructure