How to Extract Text from a Screenshot or Photo (OCR Explained)

You’ve got a photo of a paragraph, or a screenshot of something you can’t select, or a scanned page you wish you could search. The text is right there — visible, readable — but it’s locked inside an image. There’s no way to copy it, edit it, search it, or paste it anywhere.

OCR is the way out. Here’s what it is and how to use it.

What is OCR?

OCR stands for Optical Character Recognition. It’s the technology that reads text from inside an image and turns it back into actual text — letters, words, paragraphs you can copy, paste, edit, search, or feed into another program.

Without OCR, the words in an image are just colored pixels arranged in shapes that happen to look like text. Your computer has no idea what they say. With OCR, those pixels become real text again.

The classic use cases:

Receipts — snap a photo of a receipt, run OCR, paste the line items into a spreadsheet.
Screenshots — grab text out of an image where you can’t select it normally (a screenshot of a chat, an error message, a diagram).
Scanned documents — turn a scan of a paper letter, contract, or form into editable text.
Photos of book or magazine pages — extract quotes or paragraphs without retyping.
Foreign-language signs and menus — pull the text out so you can paste it into a translator.
Handwritten notes (printed letters only — cursive doesn’t work well; we’ll come back to this).

The fastest way to use OCR

Use the Image to Text tool. Drop in your image, and the recognized text appears below. Copy it, edit it, save it as a .txt file.

The flow:

Open the Image to Text tool
Drag in a JPG, PNG, WebP, BMP, GIF, or TIFF (up to 20 MB)
Click “Extract text”
The OCR engine runs in your browser — first run downloads the English model (~10 MB, cached after that)
Recognized text appears in a textbox; copy it or download as .txt

Most images take 2–10 seconds to process depending on size and how much text is on them.
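
The file-type and size limits in the flow above are easy to check before any OCR work starts. The sketch below is only an illustration: `checkUpload`, `ACCEPTED_EXTENSIONS`, and `MAX_BYTES` are hypothetical names, the tool's real validation code is not shown in this article, and "20 MB" is assumed here to mean 20 × 1024 × 1024 bytes.

```typescript
// Hypothetical pre-flight check; not the tool's actual code.
const ACCEPTED_EXTENSIONS: string[] = [
  "jpg", "jpeg", "png", "webp", "bmp", "gif", "tif", "tiff",
];
const MAX_BYTES = 20 * 1024 * 1024; // assumption: "20 MB" means 20 MiB

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Take everything after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (ACCEPTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "Unsupported file type: " + (ext || "(none)") };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "File is larger than 20 MB" };
  }
  return { ok: true };
}
```

In a browser, the two arguments would typically come from the `name` and `size` properties of the dropped `File` object. Checking the extension is a convenience, not proof of the real file format.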

What’s actually happening under the hood

The tool uses Tesseract.js, a JavaScript port of Tesseract. The engine originated at Hewlett-Packard in the 1980s, was released as open source in 2005, and was then developed for years under Google’s sponsorship, starting in 2006. It is mature and surprisingly good at clean printed text.

The process:

The image gets loaded into a canvas in your browser
Tesseract analyzes regions of the image, looking for areas that look like text
For each text region, it segments out individual lines and words
Each line is read by a trained recognition model (a neural network in Tesseract 4 and later; older versions matched segmented characters against learned letter shapes in many fonts)
The recognized characters are assembled back into text, keeping line breaks and reading order
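
The last step, turning recognized pieces back into lines of text, can be illustrated with a toy example. This is a simplified sketch and not Tesseract's actual layout algorithm: the `WordBox` shape, the `assembleLines` function, and the "within half a line height" rule are all assumptions made for illustration.

```typescript
// A recognized word with its bounding box (hypothetical shape, in pixels).
interface WordBox {
  text: string;
  x: number;      // left edge
  y: number;      // top edge
  height: number; // box height
}

interface TextLine {
  centerY: number;  // vertical center of the first word placed on the line
  height: number;   // height of that first word
  words: WordBox[];
}

function assembleLines(words: WordBox[]): string {
  const centerOf = (w: WordBox): number => w.y + w.height / 2;
  // Visit words from top to bottom.
  const sorted = words.slice().sort((a, b) => centerOf(a) - centerOf(b));
  const lines: TextLine[] = [];
  for (const word of sorted) {
    const last = lines.length > 0 ? lines[lines.length - 1] : undefined;
    const cy = centerOf(word);
    // Same line if the vertical centers are closer than half a line height.
    if (last !== undefined && Math.abs(cy - last.centerY) < last.height / 2) {
      last.words.push(word);
    } else {
      lines.push({ centerY: cy, height: word.height, words: [word] });
    }
  }
  // Within each line, read left to right.
  return lines
    .map((line) =>
      line.words.slice().sort((a, b) => a.x - b.x).map((w) => w.text).join(" ")
    )
    .join("\n");
}
```

A single top-to-bottom pass like this is also why multi-column layouts are hard: words from two neighboring columns share the same vertical position and end up merged into one line.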

The trained model is the file that gets downloaded on first use. There’s one per language: English is ~10 MB, others vary from 8 to 15 MB. After it downloads, your browser caches it until the site’s data is cleared, so future OCR runs in the same language skip the download.

What kinds of images work well

Tesseract is built for clean printed text. It works best on:

Screenshots of digital text — practically perfect. The text is rendered cleanly, contrast is high, fonts are consistent.
High-resolution scans of printed pages — books, magazines, letters, contracts. 300 DPI scans work great.
Photos of printed text taken in good lighting, with the camera roughly parallel to the page.

It struggles with:

Low-resolution images (scans below roughly 150 DPI, or screenshots where the letters are only a few pixels tall). Letters that look fine to your eye may not have enough pixels for Tesseract to distinguish similar shapes (rn vs m, 0 vs O, l vs 1).
Heavy compression artifacts (JPGs saved at very low quality). The “blockiness” of low-quality JPG hides character details.
Skewed angles. If the photo is taken at 30°+ off square, Tesseract has trouble.
Tight columns or unusual layouts (newspaper columns, business cards). The line-segmentation step can mix up which text belongs to which paragraph.
Stylized fonts (handwritten-looking scripts, heavy decorative fonts). Tesseract is trained on standard serif and sans-serif fonts.
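
To make the low-resolution point concrete, here is the arithmetic that links scan resolution to how many pixels a letter gets. One typographic point is 1/72 of an inch, so the nominal font size in pixels is the point size times the DPI, divided by 72. The function names are hypothetical, and the target height passed to `scaleFactorNeeded` is a value you choose yourself, not a documented Tesseract threshold.

```typescript
// Nominal font size (em height) in pixels for text scanned at a given DPI.
// Capital letters are somewhat shorter than the em height.
function emHeightPx(pointSize: number, dpi: number): number {
  return (pointSize * dpi) / 72; // 1 point = 1/72 inch
}

// How much to enlarge an image so its text reaches a chosen pixel height.
// Returns 1 when the text is already at least that tall.
function scaleFactorNeeded(currentPx: number, targetPx: number): number {
  return currentPx >= targetPx ? 1 : targetPx / currentPx;
}

// 12 pt text at 150 DPI is 25 px tall; at 300 DPI it is 50 px tall.
const at150 = emHeightPx(12, 150);
const at300 = emHeightPx(12, 300);
```

Enlarging an image in software does not restore detail the scan never captured, so rescanning at a higher DPI is preferable when the original page is available.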

Handwriting: why it usually doesn’t work

You can try, but the expected result for cursive handwriting is “mostly garbage with occasional correct words.” Tesseract is trained on printed characters that follow consistent shapes. Handwriting has unlimited variation in slant, spacing, joining, character shape — each person’s writing is essentially a different “font” that the engine has never seen.

What does work, sometimes:

Block printing in a consistent style — engineering drawings, neatly printed forms
Computer-generated faux-handwriting fonts in screenshots and marketing materials

If you have actual cursive handwritten notes to convert, you’d need a different kind of tool (often a paid cloud service trained specifically on handwriting). Tesseract isn’t the right tool for that.

What about scanned PDFs?

Those need a different flow. A PDF has pages, and each page is rendered separately. The PDF OCR tool handles this — it converts each page of your PDF to an image internally, OCRs each one, and concatenates the result into a single text output.

When to use PDF OCR vs Image OCR:

Scanned multi-page PDF (a contract, a scanned book, a fax-style document) → PDF OCR
A single image (photo, screenshot, JPG/PNG) → Image to Text

You don’t need to extract pages or convert them yourself first — the PDF OCR tool handles all of that.
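
The final concatenation step can be pictured as follows. This is a minimal sketch under assumptions: `joinPageTexts` is a hypothetical helper, and the page-marker format is invented for illustration, since the PDF OCR tool's real output format is not described here. It takes the text already recognized for each page, in page order, and joins it into one string.

```typescript
// Combine per-page OCR results into a single text output.
// pageTexts[0] is the text of page 1, pageTexts[1] of page 2, and so on.
function joinPageTexts(pageTexts: string[]): string {
  return pageTexts
    .map((text, index) => "--- Page " + (index + 1) + " ---\n" + text.trim())
    .join("\n\n");
}
```

Keeping a marker per page makes it easier to trace a recognition error back to the page of the scan it came from.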

Languages other than English

Tesseract supports about 100 languages. We expose the eight most-searched ones directly through dedicated tools.

Each language uses a separate trained model, optimized for the alphabet and word patterns of that language. The English model isn’t great at Spanish (it’ll read accented characters as something else); the Spanish model isn’t great at English. Pick the language that matches your image.

Alternatively, on any OCR page, use the Language dropdown above the recognize button to switch on the fly. You don’t have to navigate to a different URL.

Privacy: what stays on your device

Every part of OCR on these tools runs in your browser:

The image is read with the browser’s File API — never uploaded.
Tesseract.js itself is a JavaScript library that loads from our server, then runs locally.
The language model is downloaded once (roughly 8–15 MB, depending on the language), cached, then used locally.
The recognized text never leaves your browser unless you choose to copy or save it.

That matters when you’re OCR-ing things like:

Personal documents (medical records, tax forms)
Confidential business material (contracts, financial statements, internal docs)
Anything with names, addresses, or other identifiers

Many other free online OCR services upload your image to a server, process it there, and send back the text. Even when those services promise to delete the image after an hour, your file has spent time on someone else’s machine. Browser-based OCR avoids that entirely.

Tips for getting better results

Five quick tips that consistently improve OCR accuracy:

Higher resolution helps a lot. If you can scan at 300 DPI instead of 150 DPI, do it. Letters need pixels to be recognizable, and doubling the resolution gives every letter four times as many pixels to work with.
High contrast helps. Black text on white background works better than gray text on cream background. If your image is washed out, increase the contrast in any photo editor before OCR.
Crop tightly. OCR engines waste effort on non-text regions (background, edges, decorative elements). Crop down to just the text area before running OCR.
Straighten skewed scans. Even a 5° rotation can hurt accuracy noticeably. Most photo apps have a “straighten” tool.
Pick the right language. Running an English model on a French document will produce garbled output (it’ll guess based on English word patterns). Always match the OCR language to the text language.
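
The contrast tip can also be applied in code instead of a photo editor. The sketch below shows a linear contrast stretch, one simple way to increase contrast: the darkest pixel is mapped to 0, the brightest to 255, and everything in between is scaled proportionally. It is an illustration only, not something the tool is known to do, and `stretchContrast` is a hypothetical name. The input is assumed to be grayscale values from 0 to 255, one per pixel.

```typescript
// Linear contrast stretch on grayscale pixel values (0 = black, 255 = white).
function stretchContrast(gray: number[]): number[] {
  if (gray.length === 0) return [];
  const lo = gray.reduce((min, v) => Math.min(min, v), 255); // darkest pixel
  const hi = gray.reduce((max, v) => Math.max(max, v), 0);   // brightest pixel
  if (hi === lo) return gray.slice(); // flat image: nothing to stretch
  return gray.map((v) => Math.round(((v - lo) * 255) / (hi - lo)));
}
```

A washed-out scan whose values only span 50 to 250 comes out spanning the full 0 to 255 range. A single stray black or white pixel defeats this simple version, so practical implementations often ignore the darkest and brightest few percent of pixels when choosing the endpoints.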

TL;DR

Image with text → Image to Text
Scanned PDF → PDF OCR
Non-English text → use the language dropdown on either tool, or jump to one of the language-specific pages above
Runs entirely in your browser; your image is never uploaded
Works great on clean printed text; struggles with handwriting, blurry images, and weird layouts