persona-bench

Benchmarking whether system prompt personas affect Claude's code generation quality on HumanEval. Spoiler: they don't.

Python · Emerging
Stars: 3 · Forks: — · Contributors: 1 · Last push: 28d ago

Recent commits

  • Add pass@1 comparison chart to README
    ae96922 · Jay W · 3mo ago
  • Add OpenAI provider and GPT-4.1 benchmark results
    f25ee20 · Jay W · 3mo ago
  • Add multi-provider support (Groq) and Qwen3 32B benchmark results
    730a0ac · Jay W · 3mo ago
  • Add Claude 3 Haiku benchmark results to README
    06f4c05 · Jay W · 3mo ago
  • Remove legacy compat code, add --verify flag and Claude 3 pricing
    6733e6c · Jay W · 3mo ago
  • Split error into generation/evaluation, harden sandbox, fix filename encoding
    d92ea8b · Jay W · 3mo ago
  • Add uv, ruff, and mypy tooling
    056e567 · Jay W · 3mo ago
  • Fix Claude API usage: model aliases, adaptive thinking, pricing, stop reasons
    4dfcf56 · Jay W · 3mo ago

Top contributors

  • JayDoubleu · 9 commits