How to tell if you need OCR
Open the PDF and try to select text with your cursor. If you can highlight individual words, the PDF already has text — OCR isn't needed. If your cursor only selects whole pages or rectangular regions, the PDF is image-only and OCR is what makes it searchable and editable.
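To check a long document, or many files at once, you can automate the same test: extract each page's text and flag pages that come back empty or nearly empty. This is a minimal sketch that assumes the third-party pypdf library for extraction. The 20-character threshold is an arbitrary assumption, not a fixed rule.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return zero-based indices of pages whose extracted text is (nearly) empty.

    Image-only pages usually yield "" or a few stray characters when a PDF
    library extracts text. min_chars is an assumed cutoff; tune it to your files.
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


def page_texts_from_pdf(path: str) -> list[str]:
    """Extract per-page text with pypdf (assumes `pip install pypdf`)."""
    # Imported lazily so pages_needing_ocr() works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `pages_needing_ocr(page_texts_from_pdf("scan.pdf"))` (file name hypothetical) lists the pages that look image-only, and an empty list means every page already has text. Checking page by page matters because some PDFs mix typed pages with scanned attachments.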
Common cases that need OCR
- Scanned paper documents (legal records, old contracts, medical records)
- Phone-camera photos of paper, exported as PDF
- PDFs from older photocopiers whose scan-to-PDF feature saves each page as an image
- Faxes converted to PDF
- Screenshots of text saved as PDF
- Books or articles you scanned page by page
What OCR does technically
An OCR engine (like Tesseract, which we use) examines each page image, finds the lines of text, and recognizes them using a model trained for the document's language. The output is the recognized text in reading order, along with the position of each word on the page. We then layer that text invisibly behind the original image at those positions, so the PDF looks identical but is now searchable, copyable, and convertible to Word.
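The same idea can be sketched with the Tesseract command-line tool, which writes a searchable PDF directly when given the `pdf` output config: the page image plus an invisible text layer aligned to the recognized words. This is a minimal sketch, not our production pipeline. It assumes `tesseract` is installed and on PATH, that each page has already been rendered to an image file, and the helper and file names are hypothetical.

```python
import shutil
import subprocess


def tesseract_pdf_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that writes <output_base>.pdf.

    The trailing "pdf" config asks Tesseract for a PDF containing the original
    image with an invisible text layer, instead of a plain .txt file.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def ocr_page_to_pdf(image_path: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract on one page image and return the path of the PDF it wrote."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"
```

Calling `ocr_page_to_pdf("page1.png", "page1")` would produce `page1.pdf`. Keeping command construction separate from execution makes it easy to log or test the exact command before running it.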
Languages and accuracy
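Tesseract supports 100+ languages; we offer the 25 most common. Pick the language your document is written in: running the Spanish model on English text produces noticeably more errors, and a model for the wrong script (English on a Russian document, for example) produces gibberish. For multilingual documents, select multiple languages and the engine handles them together.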
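On the command line, Tesseract takes several languages as three-letter codes joined with `+` (for example `-l eng+spa`). Below is a minimal sketch of turning a language selection into that argument. The `SUPPORTED` set is a hypothetical stand-in for the languages we offer, and the function name is illustrative.

```python
# Hypothetical subset of offered languages, using Tesseract's three-letter codes.
SUPPORTED = {"eng", "spa", "fra", "deu", "ita", "por"}


def tesseract_lang_arg(codes: list[str]) -> str:
    """Join selected language codes into Tesseract's -l value, e.g. "eng+spa"."""
    if not codes:
        raise ValueError("select at least one language")
    unknown = [c for c in codes if c not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported language code(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the user's selection order.
    return "+".join(dict.fromkeys(codes))
```

`tesseract_lang_arg(["eng", "spa"])` returns `"eng+spa"`, which fits a command such as `tesseract scan.png out -l eng+spa pdf`. Each listed language needs its trained data installed for Tesseract to use it.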
Accuracy by scan quality
- Clean printed text at 300 DPI: 99%+ accuracy
- Phone-camera scans (uneven lighting, angled shots): 90–95% accuracy
- Old typewriter or low-contrast scans: 85–90%
- Handwriting (block letters, neat): 70–80%
- Cursive handwriting: 50–70% — usually requires manual correction
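The figures above don't say how accuracy is measured. A common convention, assumed here, is character accuracy: 1 minus the edit distance between the OCR output and the true text, divided by the length of the true text. On a page of, say, 2,000 characters, 99% accuracy still leaves about 20 wrong characters, while 90% leaves about 200. This minimal sketch lets you measure it on a sample you have transcribed by hand.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    error_rate = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - error_rate)
```

Comparing a hand-checked paragraph against the OCR output this way gives a rough sense of how much manual correction a document will need.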
Two-step workflow with OCR
- Scan or upload your document as a PDF
- Run OCR, which turns the image into a searchable PDF
- Optional: convert the OCR'd PDF to Word for editing, or use it as-is for search and copy/paste