
Sort providers by cost, latency, or throughput on AI Gateway
Quick Answer
Vercel's AI Gateway now lets you sort the providers behind a model by cost, time to first token (TTFT), or throughput (TPS), giving explicit control over which provider serves a request.
Quick Take
Provider sorting is most useful for models served by many providers with noticeable differences in price or speed, such as high-volume, cost-sensitive workloads. Ranking is computed at request time, so it follows the metric you choose and picks up new providers and price changes without code changes.
Key Points
- Providers can be sorted by cost, TTFT, or TPS for optimized selection.
- New providers and pricing changes update the ranking automatically, without code changes.
- Sorting by cost routes requests through the lowest-priced provider.
- AI Gateway supports Zero Data Retention (ZDR) alongside sorting options.
- Routing metadata reveals provider rankings and metrics for transparency.
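
The sorting behavior summarized in these key points can be modeled locally. The following is a minimal TypeScript sketch, not AI Gateway's implementation: the `ProviderStats` shape, the `rankProviders` function, the provider names, and every metric value are hypothetical and exist only to show the sort direction for each key and how a Zero Data Retention filter would apply before sorting.

```typescript
// Hypothetical local model of provider ranking; not AI Gateway's actual code.
type SortKey = 'cost' | 'ttft' | 'tps';

interface ProviderStats {
  name: string;               // made-up provider name
  inputPricePerMTok: number;  // listed input price per million tokens (USD)
  medianTtftMs: number;       // median time to first token, in ms
  medianTps: number;          // median tokens per second
  zeroDataRetention: boolean; // whether the provider qualifies for ZDR
}

// Lower score ranks first, so throughput is negated to put the highest TPS first.
const SCORE: Record<SortKey, (p: ProviderStats) => number> = {
  cost: (p) => p.inputPricePerMTok,
  ttft: (p) => p.medianTtftMs,
  tps: (p) => -p.medianTps,
};

function rankProviders(
  providers: ProviderStats[],
  sort: SortKey,
  zdrOnly = false,
): string[] {
  // Filter first (ZDR), then sort the remaining providers by the chosen metric.
  const eligible = zdrOnly ? providers.filter((p) => p.zeroDataRetention) : providers;
  // Copy before sorting so the caller's array is not mutated.
  return [...eligible]
    .sort((a, b) => SCORE[sort](a) - SCORE[sort](b))
    .map((p) => p.name);
}

// Made-up sample data for illustration only.
const providers: ProviderStats[] = [
  { name: 'alpha', inputPricePerMTok: 0.1, medianTtftMs: 400, medianTps: 150, zeroDataRetention: false },
  { name: 'beta', inputPricePerMTok: 0.25, medianTtftMs: 180, medianTps: 300, zeroDataRetention: true },
  { name: 'gamma', inputPricePerMTok: 0.15, medianTtftMs: 250, medianTps: 220, zeroDataRetention: true },
];

console.log(rankProviders(providers, 'cost'));       // ['alpha', 'gamma', 'beta']
console.log(rankProviders(providers, 'tps'));        // ['beta', 'gamma', 'alpha']
console.log(rankProviders(providers, 'ttft', true)); // ['beta', 'gamma']
```

Because the ranking is recomputed from the current statistics on every call, a changed price or a newly added provider changes the result without any change to the calling code, which mirrors the request-time ranking described in the summary.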
Article Content
You can now sort the providers behind a model by cost, time to first token (TTFT), or throughput (TPS) in AI Gateway.

The default provider order blends provider reliability, quality of model output, cost, and speed of response. You can now use `sort` for explicit control over ranking criteria. For models with many providers and noticeable cost or speed variation, you can use `sort` to optimize on your dimension of choice.

Ranking is computed at request time, so newly added providers, price changes, and shifts in observed latency or throughput flow through automatically without any code changes.

Set `sort` on `providerOptions.gateway` to one of the three values:

| Value | Description | Direction | When to use |
| --- | --- | --- | --- |
| `'cost'` | Sort by the provider's listed input price per million tokens | Lowest price first | High-volume, cost-sensitive work |
| `'ttft'` | Sort by median time to first token, in ms | Lowest latency first | Latency-sensitive workloads where response speed matters |
| `'tps'` | Sort by median tokens per second throughput | Highest first | Long-output generation where total response time matters most |

Basic usage

Use `sort` to ensure routing optimizes for your metric of choice. For example, AI Gateway has more than five providers for GPT OSS 120B with different prices, so sorting by cost is a useful option for requests that should route through the lowest-priced provider.

Providers are tried in sort order. Fallback to the next provider only happens when the higher-ranked one is unavailable.

Combine with other routing controls

`sort` is compatible with other gateway routing options like Zero Data Retention (ZDR). Consider an interactive request to `deepseek/deepseek-v4-pro` where latency and data retention both matter: AI Gateway filters to only the providers for Deepseek V4 Pro that have zero data retention, and then sorts the remaining providers by time to first token (TTFT).

`sort` also composes with `order`: providers listed in `order` are promoted to the front, and the remaining providers follow the requested sort criterion.

Inspecting routing decisions

See exactly why each request landed where it did. Every response includes a `sort` block in the routing metadata showing which providers were considered, the metric values used to rank them, the order they were attempted, and any that were deprioritized due to degraded health.

For more information on sorting via AI Gateway, read the documentation.
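
A minimal sketch of how these options might be assembled for a request. The `sort` and `order` keys, their placement under `providerOptions.gateway`, and the `deepseek/deepseek-v4-pro` model ID come from the text above. Everything else is an assumption for illustration and should be checked against the AI Gateway documentation: the `zeroDataRetention` key name, the `openai/gpt-oss-120b` model ID, the `provider-a` slug, and the `gatewayRequest` helper itself.

```typescript
// Hypothetical helper that builds request settings; not part of any SDK.
type GatewaySort = 'cost' | 'ttft' | 'tps';

interface GatewayRoutingOptions {
  sort?: GatewaySort;          // ranking criterion, from the article
  order?: string[];            // providers promoted to the front, from the article
  zeroDataRetention?: boolean; // ASSUMED key name for the ZDR filter
}

interface GatewayRequest {
  model: string;
  providerOptions: { gateway: GatewayRoutingOptions };
}

function gatewayRequest(model: string, routing: GatewayRoutingOptions): GatewayRequest {
  // Copy the routing options so later edits to the argument do not leak in.
  return { model, providerOptions: { gateway: { ...routing } } };
}

// Cost-sensitive batch work: route through the lowest listed input price.
// The model ID below is an assumed identifier for GPT OSS 120B.
const cheapest = gatewayRequest('openai/gpt-oss-120b', { sort: 'cost' });

// Interactive request: keep only ZDR providers, then sort by time to first token.
const interactive = gatewayRequest('deepseek/deepseek-v4-pro', {
  sort: 'ttft',
  zeroDataRetention: true,
});

// Pin one (hypothetical) provider first; the rest follow the throughput ranking.
const pinned = gatewayRequest('deepseek/deepseek-v4-pro', {
  order: ['provider-a'],
  sort: 'tps',
});

console.log(JSON.stringify([cheapest, interactive, pinned], null, 2));
```

With the AI SDK installed, an object like `cheapest` would typically be spread into a `generateText` or `streamText` call together with a prompt. Keeping the routing options in one small helper makes it easy to switch the sort criterion per workload without touching the call sites.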

