Python
Data and code for the paper "SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints".