A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.