olmocr
Toolkit for linearizing PDFs for LLM datasets/training
Other
Emerging
GitHub
Contributors
4
Last push
16mo ago
Recent commits
Latest commits.
Internal version bump
7d7e81e
Jake Poznanski
16mo ago
double parentheses for proper escaping
7a7c878
Luca Soldaini
16mo ago
Ruff fixes to CI
dc7cb5c
Jake Poznanski
16mo ago
Merge branch 'main' of https://github.com/allenai/olmocr into main
1348a29
Jake Poznanski
16mo ago
Probably need at least 20GB GPU ram to have a good time with olmocr
ca0f911
Jake Poznanski
16mo ago
Update action.yml to use cache v3
9390831
Jake Poznanski
16mo ago
Merge branch 'main' of https://github.com/allenai/olmocr into main
2241853
Jake Poznanski
16mo ago
Fix for calling --pdfs with an invalid pdf
a701a37
Jake Poznanski
16mo ago
Top contributors
Builders behind this project.
jakep-allenai
669 commits
aman-17
7 commits
kyleclo
5 commits
soldni
1 commit