
DeepSeek enters the fight for token volume, Anthropic continues to dominate spend
Quick Take
DeepSeek's token share jumped from under 1% to 17% in May, driven by its new V4 models. V4 Flash is priced roughly 20–50× lower than comparable Anthropic models. Despite this, Anthropic held a dominant 65% share of spend, a sign that teams increasingly route workloads by cost and quality.
Key Points
- DeepSeek's token share rose from under 1% to 17% in May.
- Anthropic's spending share increased from 61% to 65% in the same period.
- DeepSeek V4 Flash costs $0.14 input / $0.28 output per million tokens.
- Teams are optimizing model routing for cost efficiency amidst rising overall spend.
- B2B applications cost 60% more per token compared to B2C applications.
Article Content
From source RSS / original summary
Every month, AI Gateway routes tens of trillions of tokens between production applications and AI labs, giving us visibility into what AI usage actually looks like, separate from leaderboards and benchmarks. We publish the data monthly in the AI Gateway production index.
Last month, headlines about blown token budgets dominated tech news: Uber burned through its annual Claude Code budget shortly after Q1, and Amazon shut down KiroRank to curb unproductive tokenmaxxing. While runaway cost is a real problem, this month’s report shows that spend on production use cases still increased.
Two insights emerged from AI Gateway data in May.
From February to April, volume distribution across labs on AI Gateway changed slowly, but in May, DeepSeek V4's launch completely shifted token share. The low-cost end of the market that barely existed in April became AI Gateway’s third-largest provider by volume in May, without a significant impact on overall spend.
In April, DeepSeek accounted for less than 1% of AI Gateway tokens and less than 0.2% of spend. In May, its volume share jumped to 17% of tokens, putting it in third place, ahead of OpenAI. Almost all of the volume comes from two models: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro, both released in May.
The spend picture tells the other half of the story. Even though DeepSeek’s token share grew to 17% in a single month, its cost share stayed near 1%. DeepSeek V4 Flash launched at $0.14 input / $0.28 output per million tokens, roughly 20–50× lower than comparable Anthropic models and 8–12× lower than other value-tier flagships like Qwen 3.6 Plus and Kimi K2.6. With a savings gap that big, teams adopted V4 Flash quickly. Price alone wouldn’t have shifted DeepSeek’s volume that much in a month, meaning teams testing DeepSeek V4 against their existing evals found the output good enough to ship, not just low-cost enough to try.
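To make the per-million-token pricing concrete, here is a minimal sketch of the arithmetic. The V4 Flash prices come from the article; the workload size (1B input and 200M output tokens per month) and every function and variable name are hypothetical. The article gives only a 20–50× ratio for comparable Anthropic models, not their actual prices, so the sketch scales by that ratio rather than assuming a price list.

```python
# Prices quoted in the article for DeepSeek V4 Flash (USD per million tokens).
V4_FLASH_INPUT_PER_M = 0.14
V4_FLASH_OUTPUT_PER_M = 0.28


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of a workload given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical monthly workload (an assumption, not data from the article).
monthly_cost = request_cost(
    1_000_000_000, 200_000_000, V4_FLASH_INPUT_PER_M, V4_FLASH_OUTPUT_PER_M
)

# Scale by the article's "roughly 20-50x" ratio to bracket the same workload
# on a comparable frontier model. This is a rough range, not a price quote.
frontier_low = monthly_cost * 20
frontier_high = monthly_cost * 50

print(f"V4 Flash: ${monthly_cost:,.2f}")
print(f"Frontier (20-50x): ${frontier_low:,.2f} to ${frontier_high:,.2f}")
```

Under these assumptions the workload costs about $196 a month on V4 Flash, against roughly $3,920 to $9,800 at frontier prices, which is the kind of gap the article credits for the fast adoption.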
Value-tier models have always been available on AI Gateway, but have never captured token share at this scale, meaning DeepSeek V4 was the first model at its price point to clear the quality bar for production work. Even as the low-cost end of the market grew fastest in volume, the expensive end grew faster in dollars. Anthropic’s token share grew from 26% to 32%, and its spend share from 61% to 65%.
OpenAI’s token share held near 13%, but its spend share ticked up from 12% to 13% on a much larger total, so customers were paying more per OpenAI token in May. The average token got more expensive in May, even with DeepSeek pulling the average down. That increase happened because the work that demands frontier models grew faster than the work that doesn’t.
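One way to read the token and spend shares together is as a relative price index: a lab's spend share divided by its token share, where 1.0 means its tokens cost the gateway-wide average. The sketch below uses the May shares quoted in the article. DeepSeek's spend share is given only as "near 1%", so 1.0 is an approximation, and all shares are rounded, so the results are rough. The names are hypothetical.

```python
# May token and spend shares from the article, in percent.
# DeepSeek's spend share is reported only as "near 1%" (approximation).
shares = {
    "Anthropic": {"tokens": 32.0, "spend": 65.0},
    "OpenAI": {"tokens": 13.0, "spend": 13.0},
    "DeepSeek": {"tokens": 17.0, "spend": 1.0},
}


def relative_price(tokens_pct, spend_pct):
    """Price per token relative to the gateway-wide average (1.0 = average)."""
    return spend_pct / tokens_pct


index = {lab: relative_price(s["tokens"], s["spend"]) for lab, s in shares.items()}

for lab, value in index.items():
    print(f"{lab}: {value:.2f}x the average price per token")
```

With these rounded inputs, Anthropic tokens come out at about twice the average price, OpenAI at about the average, and DeepSeek at about 6% of it. The implied Anthropic-to-DeepSeek gap of roughly 35× sits inside the 20–50× range the article quotes for V4 Flash, although the two figures are not measuring exactly the same model mix.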
The AI coding agent use case shows the low-cost/frontier split most clearly. Lower-cost models are now a meaningful part of production workflows, but frontier model use is still growing, driving the increase in overall spend. The frontier is getting more expensive per token, and customers are still paying. Anthropic continues to lead on spend, taking 65% of all gateway spend in May, and 70–80% of spend across every high-stakes use case.
Increased overall spend showed that demand for AI continued to grow in May, but teams applied more precision to their budgets through routing. They sent the cheap, high-volume work to lower-priced models and used frontier models where quality mattered most.
Slow adoption of Google's latest Flash model is a clear example. Gemini 3.5 Flash launched in May at a higher price point than Gemini 3.0 Flash, but migration didn’t happen at scale. By month-end, 3.5 held only 7% of the Flash family’s tokens while 3.0 held 90%. Compared to the rapid adoption of Gemini 3.1 Pro across February and March, slower migration to 3.5 Flash shows that teams happy with 3.0 Flash aren't willing to pay the higher cost yet.
This month's report signals increased pricing sensitivity in the market, even as overall spend and token volume grow. That means developers are looking for ways to get more out of every dollar.
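The routing pattern described above, cheap high-volume work on a lower-priced model and frontier models where quality matters most, can be sketched in a few lines. This is an illustration only, not how AI Gateway or any team actually routes: the `Task` type, the `high_stakes` flag, and the `pick_model` function are hypothetical, and the two model ids are placeholders taken from names that appear on this page.

```python
from dataclasses import dataclass

# Placeholder model ids; swap in whatever your own evals support.
CHEAP_MODEL = "deepseek/deepseek-v4-flash"
FRONTIER_MODEL = "anthropic/claude-opus-4.8"


@dataclass
class Task:
    name: str
    high_stakes: bool  # hypothetical flag: is a wrong answer expensive?


def pick_model(task: Task) -> str:
    """Send high-stakes work to the frontier model, everything else to the cheap one."""
    return FRONTIER_MODEL if task.high_stakes else CHEAP_MODEL


tasks = [
    Task("summarize support ticket", high_stakes=False),
    Task("review payroll export", high_stakes=True),
]

routing = {task.name: pick_model(task) for task in tasks}
for name, model in routing.items():
    print(f"{name} -> {model}")
```

In practice the routing decision would rest on eval results and per-use-case cost data instead of a single boolean, but the shape is the same: classify the work, then choose the price tier.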
Data revealed two optimization strategies. Routing gives teams the ability to adjust their model mix, and budget, in real time as the labs compete for different layers of production AI workloads.
Token vs cost share by B2B classification
B2B applications run fewer, more expensive calls, while B2C applications run many cheap ones. On a per-token basis, B2B cost roughly 60% more than B2C in May.
Agent across tokens and requests
Just under a quarter of requests end in a tool call, but those requests carry well over half of all tokens. Both metrics are roughly flat month-over-month.
Model diversity distribution by request volume
The more requests an app serves, the more models it runs in production. Single-model setups dominate the lowest-volume tier, while at 1M+ requests the majority of apps route across 11 or more models.
Cost vs volume share by use case
Use case cost share indicates how expensive a wrong answer is, not how many tokens it burns. Personal assistants and coding agents run cheap per token, while back-office and recruiting work costs far more.
Read the April 2026 AI Gateway production index.
About this data
This analysis is based on anonymized, aggregate routing data from the Vercel AI Gateway through May 2026.
May 2026 summary
Total AI Gateway tokens grew; total spend grew. Customers paid almost 20% more per token on average than in April.
More from Vercel AI
Opus 4.8 on AI Gateway
Claude Opus 4.8, now available on Vercel AI Gateway, excels in long-horizon agentic execution and complex coding tasks, and produces clearer prose for knowledge work. Users can access it via the anthropic/claude-opus-4.8 model in the AI SDK, benefiting from a unified API with no markup on provider pricing.
