OpenAI reportedly cut response costs for guest ChatGPT users by more than half

The Decoder·Matthias Bastian

6/30/2026

Quick Take

OpenAI has reportedly cut inference costs for guest ChatGPT users by more than half, and serving those users now takes only a few hundred Nvidia GPUs. Whether the optimizations carry over to full-featured accounts is an open question. Separately, DeepSeek released an open-source method that can speed up inference requests by 60 to 85 percent.

Key Points

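Inference costs for guest ChatGPT users were reportedly cut by more than half.
Only a few hundred Nvidia GPUs are now needed to serve guest users; the previous number is not known.
DeepSeek released an open-source method that can speed up inference requests by 60 to 85 percent.
It is unclear whether the optimizations would carry over to full-featured ChatGPT accounts.
Because data center buildouts are slow, the gains will probably give labs breathing room rather than reduce chip demand.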

Matthias Bastian

OpenAI engineers told colleagues earlier this month that they'd managed to cut inference costs—the expense of running existing AI models—by more than half. That's according to a person familiar with the discussions, as reported by The Information.

OpenAI applied the new optimizations to ChatGPT, specifically for visitors who don't have an account. The number of Nvidia GPUs needed to serve those users dropped to just a few hundred. It's not clear how many were required before or what techniques OpenAI used to pull it off. Guest users can only access a very limited set of ChatGPT features, so whether these gains would carry over to the full product is an open question.

DeepSeek also just released a new open-source method that can speed up inference requests by 60 to 85 percent. The freed-up resources could go toward scaling services, better models, faster responses, or bigger margins. But since data center buildouts are moving slowly, gains like these will probably give labs more breathing room rather than cut into chip demand.
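To put the 60 to 85 percent figure in hardware terms, here is a rough back-of-the-envelope sketch. It assumes that "X percent faster" means throughput per GPU rises by X percent under an unchanged workload, which the report does not confirm. The function name `capacity_fraction` is made up for this illustration.

```python
def capacity_fraction(speedup_percent: float) -> float:
    """Share of the original hardware needed to serve the same load.

    Assumption (not from the report): an X percent speedup means each
    GPU handles X percent more requests per unit of time.
    """
    return 1.0 / (1.0 + speedup_percent / 100.0)


for pct in (60, 85):
    needed = capacity_fraction(pct)
    print(f"{pct}% speedup: {needed:.1%} of hardware needed, {1 - needed:.1%} freed")
# 60% speedup: 62.5% of hardware needed, 37.5% freed
# 85% speedup: 54.1% of hardware needed, 45.9% freed
```

Under that assumption, the speedup would free up a bit more than a third to a bit under half of the hardware for the same traffic. Real-world savings depend on the workload and on how the speedup was measured.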

— Originally published at the-decoder.com
