Anthropic publishes Constitutional AI v3 — fewer refusals, better task completion
Quick Answer
Anthropic's Constitutional AI v3 reduces over-refusal rates by 41% while keeping safety regressions below baseline.
Quick Take
Constitutional AI v3 is a refined alignment technique that applies self-critique against a smaller principle set and adds a contrastive reinforcement step. Anthropic reports 41% fewer over-refusals with safety regressions kept below baseline, which it ties to better task completion.
Key Points
- Constitutional AI v3 cuts over-refusal rates by 41%.
- Safety regressions stay below baseline.
- The technique uses self-critique against a smaller set of principles.
- It adds a contrastive reinforcement step.
- Fewer refusals are paired with better task completion.
Article Excerpt
From source RSS / original summary
Anthropic released Constitutional AI v3, a refined alignment technique that reduces over-refusal rates by 41% while keeping safety regressions below baseline. The technique uses self-critique against a smaller principle set with a contrastive reinforcement step.
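The 41% figure is most naturally read as a relative reduction, though the excerpt does not say so explicitly. Under that assumption, the short Python sketch below shows what such a cut would mean for a given starting point. The 10% baseline and the function name are hypothetical and chosen only for illustration; the source gives no absolute rates.

```python
def apply_relative_reduction(baseline_rate: float, reduction: float) -> float:
    """Return the rate left after cutting baseline_rate by the fraction `reduction`."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_rate * (1.0 - reduction)


# Hypothetical baseline: 10% of benign prompts refused (NOT a figure from the source).
hypothetical_baseline = 0.10
new_rate = apply_relative_reduction(hypothetical_baseline, 0.41)
print(f"{new_rate:.3f}")  # prints 0.059
```

Read this way, a model that wrongly refused 10 in 100 benign prompts would refuse about 6 in 100 afterward. If the 41% were instead meant as percentage points, the arithmetic would differ, so the reading above should be treated as an assumption.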