Python tool for PDF text extraction using a hybrid native + OCR approach. Optimized for academic documents with mathematical notation. Supports batch processing and automatic text cleaning, and preserves STEM symbols. Built with PyPDF2 and Tesseract OCR.
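As a rough illustration of what a hybrid native + OCR flow could look like, here is a minimal sketch. It is not taken from this project's code. It assumes the native text layer is tried first and OCR is used only when that layer is nearly empty. The names `extract_page`, `clean_text`, and `MIN_NATIVE_CHARS`, as well as the threshold value and the specific cleaning rules, are hypothetical.

```python
import re
import unicodedata
from typing import Callable, Tuple

# Hypothetical threshold: pages with fewer native characters than this
# are treated as scanned and sent to OCR.
MIN_NATIVE_CHARS = 20


def clean_text(text: str) -> str:
    """Tidy extracted text while leaving mathematical symbols untouched."""
    # NFC (not NFKC) so symbols such as ², ∑, ∫ and α are not folded to ASCII.
    text = unicodedata.normalize("NFC", text)
    # Drop control characters (e.g. form feeds), keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Rejoin words hyphenated across a line break: "deriva-\ntive" -> "derivative".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces/tabs and runs of three or more newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def extract_page(
    native_text: str,
    ocr: Callable[[], str],
    min_chars: int = MIN_NATIVE_CHARS,
) -> Tuple[str, str]:
    """Return (cleaned_text, source), where source is "native" or "ocr".

    `ocr` is a zero-argument callable so the expensive OCR step only runs
    when the native text layer is too short to be useful.
    """
    if len(native_text.strip()) >= min_chars:
        return clean_text(native_text), "native"
    return clean_text(ocr()), "ocr"


if __name__ == "__main__":
    page = "The integral ∫ f(x) dx converges for α > 0."
    print(extract_page(page, ocr=lambda: ""))
```

In practice, `native_text` might come from PyPDF2's `PdfReader(path).pages[i].extract_text()`, and `ocr` might wrap a Tesseract binding such as `pytesseract.image_to_string` applied to a rendered page image. The choice of binding and of a page renderer is an assumption here, not something the description specifies.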