persona-bench

Benchmarking whether system prompt personas affect Claude's code generation quality on HumanEval. Spoiler: they don't.

Python · Emerging

Stars: 3
Forks: not reported
Contributors: 1
Last push: 28d ago

Recent commits

ae96922 (Jay W, 3mo ago): Add pass@1 comparison chart to README
f25ee20 (Jay W, 3mo ago): Add OpenAI provider and GPT-4.1 benchmark results
730a0ac (Jay W, 3mo ago): Add multi-provider support (Groq) and Qwen3 32B benchmark results
06f4c05 (Jay W, 3mo ago): Add Claude 3 Haiku benchmark results to README
6733e6c (Jay W, 3mo ago): Remove legacy compat code, add --verify flag and Claude 3 pricing
d92ea8b (Jay W, 3mo ago): Split error into generation/evaluation, harden sandbox, fix filename encoding
056e567 (Jay W, 3mo ago): Add uv, ruff, and mypy tooling
4dfcf56 (Jay W, 3mo ago): Fix Claude API usage: model aliases, adaptive thinking, pricing, stop reasons

Top contributors

Builders behind this project.