AutoBench is an agentic benchmark that evaluates LLMs as Bengaluru auto-rickshaw drivers. It simulates a 7-day period in which the LLM decides which rides to accept, manages its time, and tries to maximize profit.
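To make the setup concrete, here is a minimal sketch of what such a simulation loop could look like. It is not AutoBench's actual code: `RideOffer`, `simulate`, `greedy_policy`, the fare and duration ranges, the 600-minute working day, and the 5-minute waiting cost are all assumptions made for illustration, and a plain Python function stands in for the LLM's accept/reject decision.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class RideOffer:
    """Hypothetical ride offer; the fields are illustrative, not AutoBench's schema."""
    fare: float        # payment for the ride, in arbitrary currency units
    duration_min: int  # minutes the ride would take


# Hypothetical policy interface: (offer, minutes left today) -> accept?
# In a real run this is where the LLM's decision would be plugged in.
Policy = Callable[[RideOffer, int], bool]


def simulate(policy: Policy, days: int = 7, minutes_per_day: int = 600, seed: int = 0) -> float:
    """Run a toy multi-day simulation and return the total fares collected."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    profit = 0.0
    for _ in range(days):
        minutes_left = minutes_per_day
        while minutes_left > 0:
            offer = RideOffer(
                fare=round(rng.uniform(30, 300), 2),  # assumed fare range
                duration_min=rng.randint(10, 60),     # assumed ride length
            )
            if offer.duration_min <= minutes_left and policy(offer, minutes_left):
                profit += offer.fare
                minutes_left -= offer.duration_min
            else:
                minutes_left -= 5  # assumed cost of waiting for the next offer
    return profit


def greedy_policy(offer: RideOffer, minutes_left: int) -> bool:
    """Baseline stand-in for the LLM: accept rides paying at least 4 units per minute."""
    return offer.fare / offer.duration_min >= 4.0


if __name__ == "__main__":
    print(f"Total profit over 7 days: {simulate(greedy_policy):.2f}")
```

Because the policy is just a callable, a simple baseline such as `greedy_policy` can be swapped for a function that queries a model, which keeps the environment logic separate from the agent under evaluation.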