How to Extract Text from a Screenshot or Photo (OCR Explained)
You’ve got a photo of a paragraph, or a screenshot of something you can’t select, or a scanned page you wish you could search. The text is right there — visible, readable — but it’s locked inside an image. There’s no way to copy it, edit it, search it, or paste it anywhere.
OCR is the way out. Here’s what it is and how to use it.
What is OCR?
OCR stands for Optical Character Recognition. It’s the technology that reads text from inside an image and turns it back into actual text — letters, words, paragraphs you can copy, paste, edit, search, or feed into another program.
Without OCR, the words in an image are just colored pixels arranged in shapes that happen to look like text. Your computer has no idea what they say. With OCR, those pixels become real text again.
The classic use cases:
- Receipts — snap a photo of a receipt, run OCR, paste the line items into a spreadsheet.
- Screenshots — grab text out of an image where you can’t select it normally (a screenshot of a chat, an error message, a diagram).
- Scanned documents — turn a scan of a paper letter, contract, or form into editable text.
- Photos of book or magazine pages — extract quotes or paragraphs without retyping.
- Foreign-language signs and menus — pull the text out so you can paste it into a translator.
- Handwritten notes (printed letters only — cursive doesn’t work well; we’ll come back to this).
The fastest way to use OCR
Use the Image to Text tool. Drop in your image, and the recognized text appears below. Copy it, edit it, save it as a .txt file.
The flow:
- Open the Image to Text tool
- Drag in a JPG, PNG, WebP, BMP, GIF, or TIFF (up to 20 MB)
- Click “Extract text”
- The OCR engine runs in your browser; the first run downloads the English model (~10 MB, cached after that)
- Recognized text appears in a text box; copy it or download it as .txt
Most images take 2–10 seconds to process depending on size and how much text is on them.
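The format and size limits in the steps above can be checked before any OCR work starts. The following is a minimal sketch, not the tool's actual code: the names `checkImageFile`, `ACCEPTED_EXTENSIONS`, and `MAX_BYTES` are made up for illustration, and it assumes "20 MB" means 20 × 1024 × 1024 bytes.

```typescript
// Accepted formats and the 20 MB cap come from the steps above.
// All names here are illustrative, not the tool's real identifiers.
const ACCEPTED_EXTENSIONS = ["jpg", "jpeg", "png", "webp", "bmp", "gif", "tif", "tiff"];
const MAX_BYTES = 20 * 1024 * 1024; // assumption: "20 MB" is read as 20 MiB

type FileCheck = { ok: true } | { ok: false; reason: string };

function checkImageFile(fileName: string, sizeBytes: number): FileCheck {
  // Take everything after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";

  if (ACCEPTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "unsupported format: " + (ext || "(none)") };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file is larger than 20 MB" };
  }
  return { ok: true };
}
```

Checking the extension is only a first filter; a stricter version would also inspect the file's MIME type or leading bytes.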
What’s actually happening under the hood
The tool uses Tesseract.js, a JavaScript port of Tesseract. Tesseract is an OCR engine that was originally developed at Hewlett-Packard, released as open source in 2005, and developed with Google’s sponsorship from 2006. It is mature and surprisingly good at clean printed text.
The process:
- The image gets loaded into a canvas in your browser
- Tesseract analyzes regions of the image, looking for areas that look like text
- For each text region, it segments out individual lines, then words, then characters
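- The shapes are matched against a trained model of what letters look like in many fonts (recent Tesseract versions use a neural network that reads a whole line at a time rather than isolated characters)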
- The recognized characters are assembled back into text, preserving the layout
The trained model is the file that gets downloaded on first use. There’s one per language: English is ~10 MB, and others range from 8 to 15 MB. After it downloads, your browser caches it, so future OCR runs in the same language start without another download.
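To make the "segment out individual lines" step concrete, here is a deliberately simplified sketch of one classic approach, a horizontal projection profile: any run of pixel rows that contains ink is treated as one text line. This is an illustration only. Tesseract's real layout analysis is far more sophisticated, and the `BinaryImage` type and `findTextLines` name are invented for this example.

```typescript
// A binary image: each row is an array of 0 (background) or 1 (ink).
type BinaryImage = number[][];

// One detected text line, as inclusive row indices.
interface LineBand {
  top: number;
  bottom: number;
}

function findTextLines(image: BinaryImage): LineBand[] {
  const bands: LineBand[] = [];
  let start = -1; // -1 means "not currently inside a text line"

  image.forEach((row, y) => {
    const hasInk = row.some((px) => px !== 0);
    if (hasInk && start === -1) {
      start = y; // first inked row of a new line
    } else if (!hasInk && start !== -1) {
      bands.push({ top: start, bottom: y - 1 }); // line ended on the previous row
      start = -1;
    }
  });

  // Close a line that runs to the bottom edge of the image.
  if (start !== -1) {
    bands.push({ top: start, bottom: image.length - 1 });
  }
  return bands;
}
```

This also hints at why skewed photos cause trouble: once lines are tilted, the blank gaps between them no longer line up with pixel rows, and a simple row-based split like this one merges neighboring lines.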
What kinds of images work well
Tesseract is built for clean printed text. It works best on:
- Screenshots of digital text — practically perfect. The text is rendered cleanly, contrast is high, fonts are consistent.
- High-resolution scans of printed pages — books, magazines, letters, contracts. 300 DPI scans work great.
- Photos of printed text taken in good lighting, with the camera roughly parallel to the page.
It struggles with:
- Low-resolution images (anything below roughly 150 pixels per inch). Letters that look fine to your eye may not have enough pixels for Tesseract to distinguish similar shapes (rn vs m, 0 vs O, l vs 1).
- Heavy compression artifacts (JPGs saved at very low quality). The “blockiness” of low-quality JPG hides character details.
- Skewed angles. If the photo is taken at 30°+ off square, Tesseract has trouble.
- Tight columns or unusual layouts (newspaper columns, business cards). The line-segmentation step can mix up which text belongs to which paragraph.
- Stylized fonts (handwritten-looking scripts, heavy decorative fonts). Tesseract is trained on standard serif and sans-serif fonts.
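The resolution figures above (trouble below about 150 pixels per inch, good results at 300 DPI) are easy to check for a scan when you know the physical page width. This is a small sketch with invented names (`scanDpi`, `rateResolution`); the middle "usable" band is an assumption of this example, not a threshold from Tesseract.

```typescript
// Pixels per inch of a scan, from its pixel width and the paper's width in inches.
function scanDpi(pixelWidth: number, pageWidthInches: number): number {
  if (pixelWidth <= 0 || pageWidthInches <= 0) {
    throw new RangeError("width values must be positive");
  }
  return pixelWidth / pageWidthInches;
}

type ResolutionVerdict = "too low" | "usable" | "good";

// Thresholds follow the article: under ~150 struggles, 300 works great.
// The label for the range in between is an assumption of this sketch.
function rateResolution(dpi: number): ResolutionVerdict {
  if (dpi < 150) return "too low";
  if (dpi < 300) return "usable";
  return "good";
}
```

For example, a US Letter page (8.5 inches wide) scanned to an image 2550 pixels wide works out to 300 DPI, while the same page at 850 pixels wide is only 100 DPI.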
Handwriting: why it usually doesn’t work
You can try, but the expected result for cursive handwriting is “mostly garbage with occasional correct words.” Tesseract is trained on printed characters that follow consistent shapes. Handwriting varies endlessly in slant, spacing, joining, and character shape; each person’s writing is essentially a different “font” that the engine has never seen.
What does work, sometimes:
- Block printing in a consistent style — engineering drawings, neatly printed forms
- Computer-generated faux-handwriting fonts in screenshots and marketing materials
If you have actual cursive handwritten notes to convert, you’d need a different kind of tool (often a paid cloud service trained specifically on handwriting). Tesseract isn’t the right tool for that.
What about scanned PDFs?
Those need a different flow. A PDF has pages, and each page is rendered separately. The PDF OCR tool handles this — it converts each page of your PDF to an image internally, OCRs each one, and concatenates the result into a single text output.
When to use PDF OCR vs Image OCR:
- Scanned multi-page PDF (a contract, a scanned book, a fax-style document) → PDF OCR
- A single image (photo, screenshot, JPG/PNG) → Image to Text
You don’t need to extract pages or convert them yourself first — the PDF OCR tool handles all of that.
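The page-by-page flow described above (render each page to an image, OCR it, join the results) can be sketched as a simple loop. This is not the PDF OCR tool's actual code: `renderPage` and `recognize` are hypothetical stand-ins for a PDF rasterizer and an OCR engine call, and a real browser implementation would be asynchronous. The sketch is kept synchronous so the structure is easy to see.

```typescript
// Hypothetical sketch: the two callbacks stand in for a real PDF renderer
// and a real OCR call, both of which would be async in a browser.
function ocrPdfPages<Image>(
  pageCount: number,
  renderPage: (pageNumber: number) => Image, // rasterizes one page (1-based)
  recognize: (image: Image) => string // returns the text found in one image
): string {
  const pageTexts: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const image = renderPage(page);
    pageTexts.push(recognize(image).trim());
  }
  // Join pages with a blank line so page boundaries stay visible.
  return pageTexts.join("\n\n");
}
```

Processing one page at a time keeps memory use bounded, since only a single rendered page image needs to exist at any moment.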
Languages other than English
Tesseract supports about 100 languages. We expose the eight most-searched languages directly through dedicated tools:
- Spanish OCR (PDF version)
- French OCR (PDF version)
- German OCR (PDF version)
- Italian OCR (PDF version)
- Portuguese OCR (PDF version)
- Chinese (Simplified) OCR (PDF version)
- Japanese OCR (PDF version)
- Russian OCR (PDF version)
Each language uses a separate trained model, optimized for the alphabet and word patterns of that language. The English model isn’t great at Spanish (it’ll read accented characters as something else); the Spanish model isn’t great at English. Pick the language that matches your image.
Or — on any OCR page, use the Language dropdown above the recognize button to switch on the fly. You don’t have to navigate to a different URL.
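Under the hood, each language choice corresponds to a Tesseract model identified by a short code. The codes below are the standard Tesseract names for the languages listed above; how this site's dropdown maps to them is an assumption, and `langCode` is an invented helper name.

```typescript
// Standard Tesseract model codes for the languages listed above.
// The lookup table and function are illustrative, not the site's code.
const TESSERACT_LANG_CODES: { [language: string]: string } = {
  english: "eng",
  spanish: "spa",
  french: "fra",
  german: "deu",
  italian: "ita",
  portuguese: "por",
  "chinese (simplified)": "chi_sim",
  japanese: "jpn",
  russian: "rus",
};

function langCode(language: string): string {
  const key = language.trim().toLowerCase();
  const code = Object.prototype.hasOwnProperty.call(TESSERACT_LANG_CODES, key)
    ? TESSERACT_LANG_CODES[key]
    : undefined;
  if (code === undefined) {
    throw new Error('no model listed for "' + language + '"');
  }
  return code;
}
```

Failing loudly on an unknown language is a deliberate choice here: silently falling back to English would produce the garbled output described above.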
Privacy: what stays on your device
Every part of OCR on these tools runs in your browser:
- The image is read with the browser’s File API — never uploaded.
- Tesseract.js itself is a JavaScript library that loads from our server, then runs locally.
- The language model is downloaded once (roughly 8–15 MB, depending on the language), cached, then used locally.
- The recognized text never leaves your browser unless you choose to copy or save it.
That matters when you’re OCR-ing things like:
- Personal documents (medical records, tax forms)
- Confidential business material (contracts, financial statements, internal docs)
- Anything with names, addresses, or other identifiers
Most other free online OCR services upload your image to a server, process it there, and serve back the text. Even when those services promise to delete the image after an hour, your file spent time on someone else’s machine. Browser-based OCR avoids that entirely.
Tips for getting better results
Five quick tips that consistently improve OCR accuracy:
- Higher resolution helps a lot. If you can scan at 300 DPI instead of 150 DPI, do it. Letters need pixels to be recognizable, and doubling the resolution can cut the error rate sharply.
- High contrast helps. Black text on a white background works better than gray text on a cream background. If your image is washed out, increase the contrast in any photo editor before OCR.
- Crop tightly. OCR engines waste effort on non-text regions (background, edges, decorative elements). Crop down to just the text area before running OCR.
- Straighten skewed scans. Even a 5° rotation can hurt accuracy noticeably. Most photo apps have a “straighten” tool.
- Pick the right language. Running an English model on a French document will produce garbled output (it’ll guess based on English word patterns). Always match the OCR language to the text language.
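The contrast tip can also be applied in code. Below is a minimal sketch of linear contrast stretching on grayscale values (0 to 255): the darkest pixel is mapped to 0, the brightest to 255, and everything else is scaled in between. The function name `stretchContrast` is invented for this example, and this is just one simple way to boost contrast, not what any particular photo editor does.

```typescript
// Linearly rescale grayscale values so they span the full 0..255 range.
function stretchContrast(gray: number[]): number[] {
  if (gray.length === 0) return [];

  // Find the darkest and brightest values present in the image.
  let lo = 255;
  let hi = 0;
  for (const v of gray) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }

  // A completely flat image has no contrast to stretch; return a copy.
  if (hi === lo) return gray.slice();

  const range = hi - lo;
  return gray.map((v) => Math.round(((v - lo) * 255) / range));
}
```

For a washed-out image whose values only run from 60 to 160, the input `[60, 80, 100, 160]` becomes `[0, 51, 102, 255]`, so faint gray text on a pale background ends up much closer to black on white.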
TL;DR
- Image with text → Image to Text
- Scanned PDF → PDF OCR
- Non-English text → use the language dropdown on either tool, or jump to one of the language-specific pages above
- Runs entirely in your browser; your image is never uploaded
- Works great on clean printed text; struggles with handwriting, blurry images, and weird layouts