Python

CLIArena

Read the codebases of Codex, Gemini CLI, Mistral Vibe, and OpenCode, then forked three of them to run GLM-4.7 on Terminal-Bench 2.0. Same model, 2x performance gap: the scaffolding is what matters. Also benchmarked all four agents on an unpublished NP-hard optimization problem; Claude Code beat my 8-year-old C++ solution.

Python · Emerging

GitHub

Last push: 4mo ago

Recent commits

2b26a50  update readme (Charles AZAM, 4mo ago)
b0680e7  update article (Charles AZAM, 4mo ago)
26f383d  update article (Charles AZAM, 4mo ago)
db06b0b  update article (Charles AZAM, 4mo ago)
dde8f28  GLM-5 runs (#9) (Charles AZAM, 4mo ago)
