
teamcity-ai-agent-testing-demo

End-to-end TeamCity framework for running AI agents on SWE-Bench Lite. Spin up isolated Docker images per task, extract patches, score them with the official harness, and aggregate success rates. As examples, we'll look at Junie and Google Gemini CLI.
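The final aggregation step can be illustrated with a minimal sketch. This is not the repository's code: the record format `(agent, instance_id, resolved)`, the function name `aggregate_success_rates`, and the sample values are all hypothetical placeholders chosen for illustration, and they are not real benchmark results.

```python
from collections import defaultdict


def aggregate_success_rates(results):
    """Return the fraction of resolved instances per agent.

    `results` is an iterable of (agent, instance_id, resolved) tuples.
    This record shape is an assumption made for the sketch, not the
    format produced by the SWE-bench harness or by this repository.
    """
    totals = defaultdict(int)
    resolved_counts = defaultdict(int)
    for agent, _instance_id, resolved in results:
        totals[agent] += 1
        if resolved:
            resolved_counts[agent] += 1
    # Every agent in `totals` has at least one record, so no division by zero.
    return {agent: resolved_counts[agent] / totals[agent] for agent in totals}


# Placeholder data only; agent names and outcomes are invented.
sample = [
    ("agent-a", "task-1", True),
    ("agent-a", "task-2", False),
    ("agent-b", "task-1", True),
    ("agent-b", "task-2", True),
]
rates = aggregate_success_rates(sample)
```

An instance that errored or produced an empty patch would simply be recorded with `resolved=False` in this sketch, so it still counts toward the agent's total.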




Last push

10mo ago

Recent commits


Merge pull request #2 from JetBrains/olgabedrina-patch-1
8db1637 · Sergei Ugdyzhekov · 10mo ago
Update README.md
0f499a9 · Olga Bedrina · 10mo ago
Update README.md
dfc9fe8 · Olga Bedrina · 10mo ago
Update artifact rules in SWE_Bench_Lite to include datasets directory unpacking.
d26e90f · Sergei Ugdyzhekov · 11mo ago
Refactor error tagging logic in TeamCity to group checks for error and empty patch instances.
3c4302b
