swebench-pro-runner
SWE-bench Pro Runner is an open-source evaluation platform for testing AI coding agents on real-world software engineering tasks. It provides 742 curated tasks across 11 production repositories, along with orchestration tooling to launch evaluations, track results, and generate analytics reports.