[COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.