SWE-bench Pro Runner is an open-source evaluation platform for testing AI coding agents on real-world software engineering tasks. It provides 742 curated tasks across 11 production repositories, along with orchestration tooling to launch evaluations, track results, and generate analytics reports.