Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"