Python
Benchmarking whether system-prompt personas affect Claude's code-generation quality on HumanEval. Spoiler: they don't.
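A comparison like this usually reduces to scoring each persona's pass rate against a no-persona baseline. The following is a minimal sketch of that scoring step only, not the project's actual harness: the function names `pass_at_1` and `compare`, the persona labels, and the outcome lists are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List


def pass_at_1(outcomes: List[bool]) -> float:
    """Fraction of problems whose single generated solution passed its tests."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def compare(results: Dict[str, List[bool]], baseline: str = "none") -> Dict[str, float]:
    """Return each persona's pass@1 minus the baseline persona's pass@1."""
    base = pass_at_1(results[baseline])
    return {name: pass_at_1(outcomes) - base for name, outcomes in results.items()}


# Placeholder outcomes used only to exercise the functions; not real benchmark data.
demo = {
    "none": [True, False, True, True],
    "expert": [True, True, False, True],
}
deltas = compare(demo)
```

A delta near zero for every persona is what "they don't" would look like in this framing; a real run would also need enough problems and repeated samples to separate a true effect from noise.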