What is SWE-Bench?

Overview

SWE-Bench is a software-engineering benchmark that tests whether AI systems can fix real GitHub issues inside existing repositories. Coding agents are now judged less by toy coding prompts and more by whether they can understand a bug, edit a multi-file codebase, run the tests, and produce a patch that makes the repository's tests pass.
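As a rough illustration of how a benchmark of this kind can score a patch, the sketch below assumes the commonly described rule: a task counts as resolved when the tests that failed before the fix now pass and the tests that already passed still pass. The names TaskResult, is_resolved, and resolved_rate are hypothetical and are not taken from any official harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes for one issue after the model's patch is applied.

    Hypothetical structure for illustration only.
    """
    # Tests that failed before the fix; True means the test passes now.
    fail_to_pass: dict[str, bool]
    # Tests that passed before the fix; True means the test still passes.
    pass_to_pass: dict[str, bool]


def is_resolved(result: TaskResult) -> bool:
    """A task is resolved only if the bug is fixed and nothing else broke."""
    fixed = bool(result.fail_to_pass) and all(result.fail_to_pass.values())
    no_regressions = all(result.pass_to_pass.values())
    return fixed and no_regressions


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved, the headline number in benchmark tables."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up outcomes for two tasks (not real benchmark data).
example_results = [
    TaskResult({"test_bug": True}, {"test_existing": True}),    # resolved
    TaskResult({"test_bug": True}, {"test_existing": False}),   # regression
]
example_rate = resolved_rate(example_results)
```

In this made-up example the second patch fixes the target test but breaks an existing one, so only one of the two tasks counts as resolved.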

Why it matters

SWE-Bench is one of the clearest signals for whether an AI coding agent can move from autocomplete into practical software maintenance.

Where it appears in AI research

AI coding agent evaluations
Model release benchmark tables
Repository repair and bug-fixing papers
Developer tool comparisons

Related terms