Anthropic's new Claude Sonnet 5 closes the gap to the pricier Opus model series

The Decoder · Matthias Bastian · June 30, 2026

Quick Take

Anthropic's Claude Sonnet 5 surpasses Sonnet 4.6 and approaches Opus 4.8 in benchmarks, scoring 1,618 on GDPval-AA v2. It is available now at an introductory price of $2 per million input tokens through August 31, 2026. The model has stronger agentic capabilities, and Anthropic rates its overall cybersecurity risk as low.

Key Points

Sonnet 5 beats Sonnet 4.6 across all tested categories, closing in on Opus 4.8.
On GDPval-AA v2, Sonnet 5 scores 1,618, surpassing Opus 4.8's 1,615.
Introductory pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026.
Sonnet 5 features improved agentic capabilities, and Anthropic rates its overall cybersecurity risk as low.
Cyber safeguards are enabled by default, blocking risky cyber usage.

Anthropic has released Claude Sonnet 5. In benchmarks, it closes in on the larger Opus 4.8 and even beats it in some areas. The model is available now at an introductory price.

Anthropic calls it the most agentic Sonnet yet: it can build plans, grab tools like browsers and terminals, and work on its own at a level that just months ago only bigger, pricier models could pull off, according to the company. Sonnet 5 is meant to close that gap.

Benchmarks show a clear jump over Sonnet 4.6

Anthropic's published benchmarks show Sonnet 5 beating its predecessor Sonnet 4.6 in every tested category while gaining ground on the pricier Opus 4.8. On agentic coding, Sonnet 5 hits 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6. Opus 4.8 sits at 69.2 percent. On Terminal-Bench 2.1, Sonnet 5 pulls 80.4 percent versus Sonnet 4.6's 67.0 percent. For multidisciplinary reasoning (Humanity's Last Exam), the model reaches 57.4 percent with tools, nearly matching Opus 4.8 at 57.9 percent. On computer use (OSWorld-Verified), Sonnet 5 posts 81.2 percent compared to 78.5 percent for its predecessor.
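To make the size of these jumps easier to compare, the short Python sketch below computes the percentage-point differences from the scores quoted above. The dictionary keys and the helper name `point_diff` are illustrative choices, and `None` marks values the article does not report.

```python
# Scores in percent, copied from the article.
# None marks a value the article does not report.
SCORES = {
    "SWE-bench Pro": {"sonnet_4_6": 58.1, "sonnet_5": 63.2, "opus_4_8": 69.2},
    "Terminal-Bench 2.1": {"sonnet_4_6": 67.0, "sonnet_5": 80.4, "opus_4_8": None},
    "Humanity's Last Exam (with tools)": {"sonnet_4_6": None, "sonnet_5": 57.4, "opus_4_8": 57.9},
    "OSWorld-Verified": {"sonnet_4_6": 78.5, "sonnet_5": 81.2, "opus_4_8": None},
}


def point_diff(a, b):
    """Return a - b in percentage points, or None if either score is missing."""
    if a is None or b is None:
        return None
    return round(a - b, 1)


# Gain of Sonnet 5 over Sonnet 4.6, and remaining gap to Opus 4.8.
gains = {name: point_diff(s["sonnet_5"], s["sonnet_4_6"]) for name, s in SCORES.items()}
gaps = {name: point_diff(s["opus_4_8"], s["sonnet_5"]) for name, s in SCORES.items()}

for name in SCORES:
    print(f"{name}: gain over Sonnet 4.6 = {gains[name]}, gap to Opus 4.8 = {gaps[name]}")
```

By these figures, the largest reported gain is on Terminal-Bench 2.1 (13.4 points), while the gap to Opus 4.8 on Humanity's Last Exam is half a point.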

Sonnet 5 beats its predecessor, Sonnet 4.6, across every tested category and closes in on the pricier Opus 4.8. On knowledge work (GDPval-AA v2), Sonnet 5 even edges past Opus 4.8 with 1,618 points versus 1,615. | Image: Anthropic

On the knowledge work benchmark GDPval-AA v2, which tests AI on real-world knowledge tasks, Sonnet 5 actually beats the larger Opus 4.8, scoring 1,618 to Opus's 1,615. Anthropic says feedback from early-access partners told the same story: Sonnet 5 acts far more agentically than previous versions, which shows in areas such as how it handles search tasks.

Agentic search performance on BrowseComp by effort level and cost per task. Sonnet 5 (orange) clearly outperforms Sonnet 4.6 (gray) at every level while offering cheaper entry points. Opus 4.8 (yellow) stays ahead at the highest effort settings. | Image: Anthropic

Cybersecurity isn't a concern this time

Lately, Anthropic has been making news for models it can't ship. The US government is blocking the company's two most capable models, Mythos 5 and Fable 5, over cybersecurity concerns. That context hangs over the Sonnet 5 launch. Anthropic is clearly eager to get ahead of any similar worries. The model wasn't trained on cybersecurity tasks, the company says, and in tests for risky capabilities like writing software exploits, it scores far below both Opus 4.8 and Mythos 5.

Firefox 147 exploit evaluation. Like its predecessor Sonnet 4.6, Sonnet 5 couldn't develop a fully working exploit but shows a slightly higher partial control rate at 13.2 percent. Mythos 5 and Opus 4.8 are far more capable at this task. | Image: Anthropic

Sonnet 5 does score a bit higher than its predecessor on these tasks, though. So Anthropic has switched on cyber safeguards by default. They flag and block risky cyber usage in real time, on par with the protections already in place for Claude Opus 4.7 and 4.8. They're dialed back compared to Fable 5's guardrails, which users complained about almost immediately. Anthropic says it views the overall cybersecurity risk from Sonnet 5 as low.

On the safety front, the model does a better job turning down malicious requests and fending off prompt injection attacks than Sonnet 4.6, according to Anthropic. Hallucinations and sycophantic behavior, the tendency to just agree with whatever the user says, are down as well. Anthropic's full safety evaluation is in the Claude Sonnet 5 System Card.

Introductory pricing runs through August 2026

Claude Sonnet 5 is live now on all plans. It's the new default for Free and Pro users, and Max, Team, and Enterprise subscribers can access it too. Developers can plug it into Claude Code and the Claude Platform. On the API side, it goes by "claude-sonnet-5". The training cutoff is January 2026, and the context window is one million tokens.

Until August 31, 2026, Anthropic is charging $2 per million input tokens and $10 per million output tokens. After that, prices jump to $3 and $15, which is what previous Sonnet models cost.

Real-world costs might tell a different story: Because the model works more agentically, it's likely to chew through more tokens per task. So even at the same per-token rate, running Sonnet 5 could end up costing more than its predecessors. The same thing happened when Opus went from 4.6 to 4.7.
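The arithmetic behind that point can be sketched in a few lines of Python. The per-token prices and the August 31, 2026 cutoff come from the article. The token counts and the 50 percent increase are hypothetical values chosen only for illustration, and the function names are made up for this sketch.

```python
from datetime import date

# Prices in USD per million tokens, as reported in the article.
INTRO_PRICE = {"input": 2.00, "output": 10.00}     # through August 31, 2026
STANDARD_PRICE = {"input": 3.00, "output": 15.00}  # afterwards
INTRO_END = date(2026, 8, 31)


def price_on(day: date) -> dict:
    """Return the per-million-token prices that apply on the given day."""
    return INTRO_PRICE if day <= INTRO_END else STANDARD_PRICE


def task_cost(input_tokens: int, output_tokens: int, day: date) -> float:
    """Cost in USD of one task, given its token counts and the billing date."""
    price = price_on(day)
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical token counts, not measurements.
baseline = task_cost(200_000, 20_000, date(2026, 9, 15))  # fewer tokens, standard rate
agentic = task_cost(300_000, 30_000, date(2026, 9, 15))   # 50% more tokens, same rate
intro = task_cost(300_000, 30_000, date(2026, 7, 15))     # same run at the intro rate

print(f"standard rate, baseline tokens: ${baseline:.2f}")
print(f"standard rate, 50% more tokens: ${agentic:.2f}")
print(f"intro rate, 50% more tokens:    ${intro:.2f}")
```

Under these assumed numbers, a task that uses 50 percent more tokens costs $1.35 instead of $0.90 at the standard rate, and the introductory discount happens to cancel out exactly that increase.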

— Originally published at the-decoder.com
