A benchmark for multi-turn debate judgment in large language models.