τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment