Free · Private · No sign-up

Convert PDF to Markdown & Plain Text

pdfs2txt turns PDF documents — research papers, reports, ebooks, scanned pages, and forms — into clean, structured Markdown you can edit, search, version-control, or feed straight into an AI model. Drop a file in, pick how you want images handled, and download the result in seconds.

Keeps headings, lists, tables, and emphasis intact
Built-in OCR recovers text from scanned and photographed pages
Files are processed only for the conversion, then deleted

Convert a PDF now

Upload a PDF and download the converted Markdown file.

Why convert PDFs to Markdown?

PDF is a great format for printing a document, but a frustrating one for working with the text inside it. The words you see on screen are often locked into a fixed page layout, split across columns, or — in the case of scanned documents — stored as flat images with no real text at all. Copy and paste from a PDF and you frequently get jumbled line breaks, missing characters, or nothing usable.

Markdown solves this. It is plain text with a small amount of structure, so it opens in any editor, lives happily in Git, pastes cleanly into note apps like Obsidian and Notion, and is a format that large language models read well. Converting a PDF to Markdown gives you back control of your content: you can search it, diff it, reformat it, and reuse it anywhere. pdfs2txt does that conversion for you while preserving the document's headings, lists, tables, and emphasis.

Features

Structure-aware extraction

Headings, bullet and numbered lists, tables, bold and italic text are carried over as real Markdown — not flattened into one wall of text.
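To make "real Markdown" concrete for tables, here is a minimal sketch of how rows of cell text could be written out as a GitHub-style pipe table. It is an illustration only, not pdfs2txt's actual code: the function names `escape_cell` and `rows_to_markdown` are hypothetical, and it assumes the cell text has already been extracted from the PDF.

```python
def escape_cell(text: str) -> str:
    """Escape pipes and flatten newlines so a cell stays on one table row."""
    return text.replace("|", "\\|").replace("\n", " ").strip()


def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a pipe table; the first row is treated as the header.

    Hypothetical helper for illustration, not part of pdfs2txt.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [
        [escape_cell(cell) for cell in row] + [""] * (width - len(row))
        for row in rows
    ]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(rows_to_markdown([["Item", "Qty"], ["Paper", "2"]]))
```

Merged cells and nested tables have no equivalent in this syntax, which is one reason very complex tables can lose structure.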

OCR for scanned pages

When a page is an image with no selectable text, built-in Tesseract OCR reads the characters and adds them to the output automatically.
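As a rough sketch of the idea, a converter can decide page by page whether the text layer is too thin to trust and fall back to OCR only there. The rule pdfs2txt applies is not documented on this page, so the 20-character threshold and the names `needs_ocr` and `pages_needing_ocr` below are assumptions for illustration.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True when a page's text layer has too few visible characters.

    The 20-character default is an assumed threshold, not pdfs2txt's value.
    """
    # Count only non-whitespace characters so blank lines do not look like text.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars


def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return the 1-based numbers of pages that should be sent to OCR."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needs_ocr(text, min_chars)
    ]


pages = ["A full paragraph of selectable text on this page.", "", " \n 3 "]
print(pages_needing_ocr(pages))
```

Running OCR only on the flagged pages keeps digitally created pages fast and avoids replacing a good text layer with recognition errors.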

Optional AI image processing

Connect an OpenAI or Google Gemini key to interpret diagrams, complex tables, and handwriting that traditional OCR struggles with.

Page chunking

Split the output by page with horizontal rules, so you can trace any passage back to its original page number.
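A minimal sketch of how page-chunked output can be produced and then used to trace a passage back to its page. It assumes the separator is a `---` rule surrounded by blank lines; the exact separator pdfs2txt writes is not stated here, and `join_pages` and `find_page` are hypothetical names.

```python
from typing import Optional

# Assumed separator: a Markdown horizontal rule with a blank line on each side.
PAGE_RULE = "\n\n---\n\n"


def join_pages(pages: list[str]) -> str:
    """Join per-page Markdown into one document, one rule between pages."""
    return PAGE_RULE.join(page.strip() for page in pages)


def find_page(markdown: str, passage: str) -> Optional[int]:
    """Return the 1-based page number containing the passage, or None."""
    for number, chunk in enumerate(markdown.split(PAGE_RULE), start=1):
        if passage in chunk:
            return number
    return None


doc = join_pages(["# Title\n\nIntro.", "Second page text.\n"])
print(find_page(doc, "Second page"))
```

Splitting on the rule is only reliable when the pages themselves contain no horizontal rules of their own; a document that uses `---` in its body would need a more distinctive page marker.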

Private by design

Your upload is processed only for the conversion and then deleted. No accounts, no storage, no tracking of your document contents.

Free and instant

No sign-up, no paywall, no watermarks. Most documents finish converting in a few seconds.

What people use it for

Feeding documents to AI

Turn reports and papers into clean Markdown for ChatGPT, Claude, or a RAG pipeline that needs well-structured text.

Research & study notes

Pull quotes, sections, and references out of academic PDFs into your note-taking app without retyping.

Archiving scanned files

Make old scanned contracts, letters, and forms searchable by recovering their text with OCR.

Publishing & docs

Move content from a PDF into a static site, wiki, or README without copy-paste formatting headaches.

Guides & tutorials

In-depth, practical articles on PDF extraction, OCR, Markdown workflows, and getting documents ready for large language models.

Extracting Structured JSON Data from PDFs — Schemas, Tools, and Validation

How to turn unstructured PDF content into clean JSON. Schema design, regex vs layout vs LLM extraction, and validating the output you get back.

6 min read

Fixing Reading Order in Multi-Column and Magazine PDFs

Why two-column papers and magazine layouts come out scrambled when converted, how reading order detection works, and how to get coherent text back.

5 min read

Measuring and Improving OCR Accuracy — CER, WER, and What "95%" Really Means

How OCR accuracy is actually measured, why character and word error rates differ, how to benchmark a tool on your own documents, and how to push accuracy up.

5 min read

Extracting Images and Figures from PDFs — Embedded Bitmaps vs Rendered Pages

How to pull images, charts, and figures out of a PDF. The difference between embedded image extraction and page rendering, resolution gotchas, and tools.

5 min read

Extracting Data from Invoice PDFs at Scale — Fields, Tools, and Accuracy

A practical guide to pulling structured data from invoice PDFs. Which fields to target, prebuilt vs custom models, line-item extraction, and validation.

5 min read

Converting PDFs to EPUB and Ebook Formats — Reflowable Text from Fixed Pages

How to turn a fixed-layout PDF into a reflowable EPUB that works on e-readers. Why it's hard, the Markdown-as-intermediate approach, tools, and cleanup.

5 min read

Browse all 31 guides →

Frequently asked questions

Is pdfs2txt free to use?

Yes. Converting PDFs to Markdown with the built-in text extraction and Tesseract OCR is completely free, with no account, no file limits beyond a size cap of roughly 50 MB, and no watermarks. The only optional paid part is AI image processing, which uses your own OpenAI or Gemini API key and is billed by those providers, not by us.

What happens to my uploaded files?

Your PDF is uploaded over an encrypted HTTPS connection, processed on the server only for the duration of the conversion, and then deleted. The converted Markdown is held briefly for a single download and removed after you fetch it. We do not store, share, or analyse the contents of your documents. See our privacy policy for details.

Can it handle scanned PDFs?

Yes. If a page has little or no selectable text, pdfs2txt detects this and runs OCR to recover the words from the page image. Choose the OCR option before converting. For scans with diagrams or handwriting, the optional AI image processing produces better results. Our guide on converting scanned PDFs to text walks through the details.

Will my tables and formatting be preserved?

In most cases, yes. The converter maps PDF headings, lists, emphasis, and tables onto their Markdown equivalents. Very complex or visually styled tables can lose some structure; for those, see our guide on converting PDF tables to Markdown.

Why convert to Markdown instead of Word or plain .txt?

Markdown keeps document structure (headings, lists, tables) in a lightweight, human-readable form that works everywhere — editors, Git, static-site generators, note apps, and AI tools — without the bloat of a proprietary format. Plain .txt loses all structure; Word files are heavier and harder to diff or automate. You can always convert Markdown onward to other formats later.

Do I need an API key?

No. Standard text extraction and OCR run entirely on our server and need no key. An API key is only required if you opt into AI image processing with OpenAI or Google Gemini, in which case you supply your own key and it is used only for that single request.

What size and type of PDFs are supported?

Standard PDF files up to roughly 50 MB are supported, whether they are digitally created or scanned. Password-protected PDFs need to be unlocked before uploading.