Claude Sonnet 4.5 leads SWE-Bench Verified at… | AI Deep Signal

Claude Sonnet 4.5 leads SWE-Bench Verified at 64.2%

Anthropic · 5/11/2026

Quick Answer

Claude Sonnet 4.5 by Anthropic scores 64.2% on SWE-Bench Verified, up 10.5 percentage points from Sonnet 4's 53.7%.

Quick Take

Claude Sonnet 4.5 by Anthropic scores 64.2% on SWE-Bench Verified, up from 53.7% with Sonnet 4. Anthropic has also introduced a new 200K-token context option for the API.

Key Points

Claude Sonnet 4.5 improves SWE-Bench Verified score from 53.7% to 64.2%.
Anthropic introduces a new 200K-token context option for its API.
The higher score suggests stronger performance on the software engineering tasks developers use the model for.
SWE-Bench Verified is a benchmark that evaluates AI models on real-world software engineering tasks.
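
The size of the jump can be read two ways: as an absolute gain in percentage points or as a relative improvement over the earlier score. The short sketch below works through that arithmetic using only the two figures reported above; the function name `score_gain` is a hypothetical helper introduced for illustration and is not part of any Anthropic tooling.

```python
def score_gain(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_pct - old_pct          # difference between the two scores
    relative = absolute / old_pct * 100   # gain as a share of the old score
    return round(absolute, 1), round(relative, 1)


# Figures reported in the article: Sonnet 4 at 53.7%, Sonnet 4.5 at 64.2%.
absolute, relative = score_gain(53.7, 64.2)
print(f"+{absolute} points, +{relative}% relative")
# prints "+10.5 points, +19.6% relative"
```

The two numbers answer different questions: the 10.5-point figure is the one comparable across benchmark leaderboards, while the roughly 19.6% relative figure describes how much larger the new score is than the old one.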

Article Excerpt

From source RSS / original summary

Claude Sonnet 4.5 reaches 64.2% on SWE-Bench Verified, up from 53.7% on Sonnet 4. Anthropic also released a 200K-token context option for the API.

Read on anthropic.com

More from this source

Anthropic · 5/13/2026

Original

Anthropic publishes Constitutional AI v3 — fewer refusals, better task completion

AI Summary

Anthropic's Constitutional AI v3 reduces refusal rates by 41% while keeping safety regressions below baseline. The refined alignment technique uses self-critique against a smaller principle set and adds a contrastive reinforcement step, which improves task completion.

#LLM #Enterprise AI

Anthropic · 5/11/2026

Original

Anthropic Researcher Mode: Claude builds and runs its own experiments

AI Summary

Anthropic's Claude now features a Researcher Mode that provides persistent compute, file system access, and a code execution sandbox. This allows the model to conduct multi-day investigations, run experiments, and generate detailed reports.

#Agent #AI Assistant