Claude Sonnet 4.5 leads SWE-Bench Verified at 64.2%
Quick Answer
Claude Sonnet 4.5 by Anthropic scores 64.2% on SWE-Bench Verified, up 10.5 percentage points from 53.7% with Sonnet 4.
Quick Take
Claude Sonnet 4.5 by Anthropic scores 64.2% on SWE-Bench Verified, up from 53.7% with Sonnet 4. Anthropic also released a new 200K-token context option for the API.
Key Points
- Claude Sonnet 4.5 improves SWE-Bench Verified score from 53.7% to 64.2%.
- Anthropic introduces a new 200K-token context option for its API.
- The 10.5-point gain (about 20% relative to Sonnet 4) should make the model more useful for developers' coding work.
- SWE-Bench Verified is a benchmark that evaluates AI models on real-world software engineering tasks.
Article Excerpt
From source RSS / original summary: Claude Sonnet 4.5 reaches 64.2% on Verified, up from 53.7% on Sonnet 4. Anthropic also released a 200K-token context option for the API.
More from this source
Anthropic publishes Constitutional AI v3 — fewer refusals, better task completion
Anthropic's Constitutional AI v3 reduces refusal rates by 41% while keeping safety regressions below baseline. The refined alignment technique uses self-critique against a smaller principle set and adds a contrastive reinforcement step, improving task completion.
Anthropic Researcher Mode: Claude builds and runs its own experiments
Anthropic's Claude now features a Researcher Mode with persistent compute, file system access, and a code execution sandbox. With these, the model can conduct multi-day investigations, run experiments, and generate detailed reports.