
Anthropic's new Claude Sonnet 5 closes the gap to the pricier Opus model series
Quick Answer
Anthropic's Claude Sonnet 5 surpasses Sonnet 4.6 and approaches Opus 4.8 in benchmarks, scoring 1,618 on GDPval-AA v2.
Quick Take
Anthropic's Claude Sonnet 5 surpasses Sonnet 4.6 and approaches Opus 4.8 in benchmarks, scoring 1,618 on GDPval-AA v2. Available now at an introductory price of $2 per million input tokens until August 2026, it features enhanced agentic capabilities while maintaining low cybersecurity risks.
Key Points
- Sonnet 5 beats Sonnet 4.6 across all tested categories, closing in on Opus 4.8.
- On GDPval-AA v2, Sonnet 5 scores 1,618, surpassing Opus 4.8's 1,615.
- Introductory pricing is $2 per million input tokens until August 2026.
- Sonnet 5 features improved agentic capabilities and lower cybersecurity risks.
- Cyber safeguards are enabled by default, blocking risky cyber usage.
📖 Reader Mode
~3 min readAnthropic released Claude Sonnet 5. In benchmarks, it closes in on the larger Opus 4.8 and even beats it in some areas. The model is available now at an introductory price.
Anthropic calls it the most agentic Sonnet yet: it can build plans, grab tools like browsers and terminals, and work on its own at a level that just months ago only bigger, pricier models could pull off, according to the company. Sonnet 5 is meant to close that gap.
Benchmarks show a clear jump over Sonnet 4.6
Anthropic's published benchmarks show Sonnet 5 beating its predecessor Sonnet 4.6 in every tested category while gaining ground on the pricier Opus 4.8. On agentic coding, Sonnet 5 hits 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6. Opus 4.8 sits at 69.2 percent. On Terminal-Bench 2.1, Sonnet 5 pulls 80.4 percent versus Sonnet 4.6's 67.0 percent. For multidisciplinary reasoning (Humanity's Last Exam), the model reaches 57.4 percent with tools, nearly matching Opus 4.8 at 57.9 percent. On computer use (OSWorld-Verified), Sonnet 5 posts 81.2 percent compared to 78.5 percent for its predecessor.

On the knowledge work benchmark GDPval-AA v2, which tests AI on real-world knowledge tasks, Sonnet 5 actually beats the larger Opus 4.8, scoring 1,618 to Opus's 1,615. Anthropic says feedback from early-access partners told the same story. Sonnet 5 acts far more agentically than previous versions, showing up in things like how it handles search tasks.

Cybersecurity isn't a concern this time
Lately, Anthropic has been making news for models it can't ship. The US government is blocking the company's two most capable models, Mythos 5 and Fable 5, over cybersecurity concerns. That context hangs over the Sonnet 5 launch. Anthropic is clearly eager to get ahead of any similar worries. The model wasn't trained on cybersecurity tasks, the company says, and in tests for risky capabilities like writing software exploits, it scores far below both Opus 4.8 and Mythos 5.

Sonnet 5 does score a bit higher than its predecessor on these tasks, though. So Anthropic has switched on cyber safeguards by default. They flag and block risky cyber usage in real time, on par with the protections already in place for Claude Opus 4.7 and 4.8. They're dialed back compared to Fable 5's guardrails, which users complained about almost immediately. Anthropic says it views the overall cybersecurity risk from Sonnet 5 as low.
On the safety front, the model does a better job turning down malicious requests and fending off prompt injection attacks than Sonnet 4.6, according to Anthropic. Hallucinations and sycophantic behavior, the tendency to just agree with whatever the user says, are down as well. Anthropic's full safety evaluation is in the Claude Sonnet 5 System Card.
Introductory pricing runs through August 2026
Claude Sonnet 5 is live now on all plans. It's the new default for Free and Pro users, and Max, Team, and Enterprise subscribers can access it too. Developers can plug it into Claude Code and the Claude Platform. On the API side, it goes by "claude-sonnet-5". The training cutoff is January 2026, with a one-million-token context window.
Until August 31, 2026, Anthropic is charging $2 per million input tokens and $10 per million output tokens. After that, prices jump to $3 and $15, which is what previous Sonnet models cost.
Real-world costs might tell a different story: Because the model works more agentically, it's likely to chew through more tokens per task. So even at the same per-token rate, running Sonnet 5 could end up costing more than its predecessors. The same thing happened when Opus went from 4.6 to 4.7.
— Originally published at the-decoder.com
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from The Decoder
See more →
An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run
Epoch AI's MirrorCode benchmark reveals Claude Opus 4.7 as the leader with a 56% solve rate, reconstructing a 16,000-line toolkit in 14 hours. Despite this, all models tested struggle with the most complex tasks, highlighting limitations in current AI capabilities. The single task consumed $2,600 over 19 days, raising questions about cost-effectiveness in AI development.




