Converting PDFs to EPUB and Ebook Formats — Reflowable Text from Fixed Pages

A PDF is a fixed page. An EPUB is reflowable text that adapts to any screen, font size, and orientation. Converting between them isn't a format swap — it's a fundamental change in how the content is structured. That's why so many "PDF to EPUB" conversions produce ebooks that are unreadable on a phone: tiny fixed text, broken line wrapping, page numbers stranded mid-sentence.

This guide explains why the conversion is genuinely hard and lays out an approach that produces a real reflowable ebook rather than a PDF in an EPUB wrapper.

Why PDF → EPUB is harder than it sounds

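PDF and EPUB have opposite design goals. A PDF fixes every glyph at a set position on a page of set size, so it looks the same everywhere. An EPUB is a stream of structured text that the reading device lays out for its own screen, font size, and orientation.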

To convert properly you have to throw away the page layout and recover the underlying logical structure: which text is a chapter heading, which is body, where paragraphs begin and end, where one chapter stops and the next starts. PDFs don't store any of that. They store positioned glyphs, which is the same root issue behind PDF text that won't copy cleanly and scrambled reading order.
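Recovering structure from positioned glyphs is mostly heuristics. As a minimal illustration, the sketch below assumes a simplified, hypothetical `Span` record (text, font size, vertical position) for one single-column page. It takes the most common font size as body text and promotes noticeably larger spans to Markdown headings. The 1.2x and 1.6x thresholds are arbitrary assumptions, and a real converter needs more than this.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical positioned run of text, as a PDF extractor might report it."""
    text: str
    size: float  # font size in points
    y: float     # vertical position on the page (smaller = higher up)


def body_size(spans: list[Span]) -> float:
    """Treat the font size carrying the most characters as body text."""
    counts = Counter()
    for s in spans:
        counts[round(s.size, 1)] += len(s.text)
    return counts.most_common(1)[0][0]


def to_markdown_lines(spans: list[Span]) -> list[str]:
    """Map spans much larger than body text to '#' / '##' headings."""
    base = body_size(spans)
    lines = []
    for s in sorted(spans, key=lambda s: s.y):  # top-to-bottom, single column only
        ratio = s.size / base
        if ratio >= 1.6:
            lines.append(f"# {s.text}")
        elif ratio >= 1.2:
            lines.append(f"## {s.text}")
        else:
            lines.append(s.text)
    return lines
```

Sorting by `y` alone only works for a single column; multi-column pages need reading-order reconstruction before headings can be assigned.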

The naive converters that just embed each PDF page as an image, or dump raw positioned text into one HTML blob, skip this reconstruction entirely. The result technically opens in an e-reader but doesn't reflow — defeating the purpose.

The key insight: convert to Markdown first

The cleanest path from PDF to a good EPUB goes through a structured intermediate format, and Markdown is ideal for it. The pipeline:

PDF  →  Markdown (recover structure)  →  EPUB (apply ebook formatting)

Markdown forces the content into logical structure — headings, paragraphs, lists, emphasis — exactly the structure an EPUB needs and exactly what the PDF lost. Once you have clean Markdown, generating a valid, reflowable EPUB is a solved problem. This is the same reason Markdown works as a hub format for so many workflows; see building a PDF-to-LLM workflow for the general pattern.

So the hard part is step one — getting clean, well-structured Markdown — and the rest is mechanical.
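The pipeline above can be sketched in Python, assuming pymupdf4llm is installed (its `to_markdown()` function returns a Markdown string) and `pandoc` is on the PATH. The function names and the optional `cleanup` hook are hypothetical; the Pandoc arguments mirror the command in Step 2.

```python
import subprocess
from pathlib import Path


def pdf_to_markdown(pdf_path: str, md_path: str) -> None:
    """Step 1: recover structure as Markdown (requires `pip install pymupdf4llm`)."""
    import pymupdf4llm  # imported lazily so the rest of the module works without it

    md_text = pymupdf4llm.to_markdown(pdf_path)
    Path(md_path).write_text(md_text, encoding="utf-8")


def pandoc_epub_cmd(md_path, epub_path, title, author, cover=None, toc_depth=2):
    """Step 2: the Pandoc invocation as an argument list (no shell quoting needed)."""
    cmd = [
        "pandoc", md_path, "-o", epub_path,
        "--metadata", f"title={title}",
        "--metadata", f"author={author}",
        "--toc", f"--toc-depth={toc_depth}",
    ]
    if cover:
        cmd.append(f"--epub-cover-image={cover}")
    return cmd


def convert(pdf_path, epub_path, title, author, cover=None, cleanup=None):
    """Run both steps; `cleanup` is an optional str -> str fix-up applied in between."""
    md_path = Path(epub_path).with_suffix(".md")
    pdf_to_markdown(pdf_path, str(md_path))
    if cleanup is not None:
        fixed = cleanup(md_path.read_text(encoding="utf-8"))
        md_path.write_text(fixed, encoding="utf-8")
    subprocess.run(pandoc_epub_cmd(str(md_path), epub_path, title, author, cover), check=True)
```

Keeping the intermediate `.md` file on disk is deliberate: it is the file you inspect and hand-correct before the EPUB is generated.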

Step 1: PDF to clean Markdown

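Use a converter that preserves heading hierarchy and paragraph structure (this site's converter produces page-structured Markdown; pymupdf4llm and marker are good library options). Then clean it up, because ebook readers are unforgiving of artifacts: strip running headers, footers, and page numbers; rejoin words hyphenated across line breaks; and merge paragraphs that were split at page boundaries.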

The hyphenation and paragraph-rejoining cleanup is where most of the manual effort goes, and it's worth doing — it's the difference between a polished ebook and an obviously-converted one.
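A minimal sketch of that cleanup with regular expressions, assuming page numbers sit on lines of their own and that a lowercase block following a paragraph with no closing punctuation continues that paragraph. The patterns and function names are illustrative assumptions, not rules that hold for every book, so review the output by eye.

```python
import re

# Assumed artifact shapes; adjust the patterns to what your converter emits.
PAGE_NUMBER_LINE = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)
STRUCTURAL = ("#", "- ", "* ", "|", ">")  # Markdown line starts that must not be merged


def is_structural(block: str) -> bool:
    return any(ln.startswith(STRUCTURAL) for ln in block.split("\n"))


def drop_page_numbers(text: str) -> str:
    """Remove lines that hold nothing but a page number."""
    return "\n".join(ln for ln in text.split("\n") if not PAGE_NUMBER_LINE.match(ln))


def dehyphenate(text: str) -> str:
    """Join 'exam-' + line break (+ blank lines) + 'ple' into 'example'.

    Only lowercase-to-lowercase splits are joined, but genuine compounds such as
    'well-known' broken at a line end will still be glued together.
    """
    return re.sub(r"([a-z])-\n\s*([a-z])", r"\1\2", text)


def rejoin_paragraphs(text: str) -> str:
    """Unwrap hard line breaks and mend paragraphs split across pages."""
    blocks = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        if any(ln.startswith(STRUCTURAL) for ln in lines):
            blocks.append("\n".join(lines))  # headings, lists, tables keep their lines
            continue
        para = " ".join(lines)
        prev = blocks[-1] if blocks else ""
        # Lowercase start after an unfinished sentence: likely the same paragraph.
        if prev and not is_structural(prev) and not prev.endswith((".", "!", "?", ":")) and para[0].islower():
            blocks[-1] = prev + " " + para
        else:
            blocks.append(para)
    return "\n\n".join(blocks)


def clean(text: str) -> str:
    return rejoin_paragraphs(dehyphenate(drop_page_numbers(text)))
```

The order matters: page numbers go first so that a word hyphenated across a page break ends up adjacent to its second half before `dehyphenate` runs.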

Step 2: Markdown to EPUB

With clean Markdown, Pandoc is the standard tool:

pandoc book.md -o book.epub \
  --metadata title="My Book" \
  --metadata author="Author Name" \
  --toc --toc-depth=2 \
  --epub-cover-image=cover.jpg

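This produces a valid EPUB 3 with a navigable table of contents, the title and author metadata, a cover image, and the book split into separate chapter files.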

--toc-depth controls how deep the navigation goes; --split-level (or --epub-chapter-level in older Pandoc) controls where the book splits into separate chapter files, which affects load performance on e-readers.
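If a build script has to run on machines with different Pandoc versions, a small helper can pick the right option name. This sketch assumes `--split-level` arrived with Pandoc 3.0 and that the first line of `pandoc --version` reads like `pandoc 3.1.9`; the function names are hypothetical.

```python
import re
import subprocess


def split_flag(version_output: str, level: int = 1) -> str:
    """Return the chapter-splitting option for the Pandoc that printed `version_output`."""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+)\.", version_output)
    if match is None:
        raise ValueError("could not read the Pandoc version")
    major = int(match.group(1))
    # Assumption: Pandoc 3.x uses --split-level; older releases use --epub-chapter-level.
    name = "--split-level" if major >= 3 else "--epub-chapter-level"
    return f"{name}={level}"


def installed_split_flag(level: int = 1) -> str:
    """Ask the Pandoc on PATH for its version and choose the option accordingly."""
    out = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return split_flag(out, level)
```

Splitting at level 1 gives one file per top-level heading, which is usually one file per chapter.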

The alternative: Calibre

Calibre is the other major path and goes PDF → EPUB directly, with a built-in conversion engine and heuristics for detecting chapters and removing headers/footers. It's more convenient (GUI, one step) but the structure recovery is less controllable than the Markdown route. For a quick personal conversion, Calibre is fine; for a clean, distributable ebook, the Markdown-intermediate path gives better results because you can fix the structure before generating the EPUB.

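Calibre's "Heuristic processing" options (in the conversion dialog) help a lot: enable them to unwrap broken lines and remove unnecessary hyphens, and use the "Search & replace" regular expressions to strip running headers and footers. Its built-in editor also lets you fix the EPUB after conversion.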
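Calibre also ships a command-line converter, `ebook-convert`, which is handy for batch jobs. A minimal sketch, assuming `ebook-convert` is on the PATH: `--enable-heuristics` turns on the heuristic processing described above, and `--title`/`--authors` set metadata. The wrapper function names are hypothetical.

```python
import subprocess


def calibre_cmd(pdf_path, epub_path, heuristics=True, title=None, authors=None):
    """Build an ebook-convert argument list for a one-step PDF -> EPUB conversion."""
    cmd = ["ebook-convert", pdf_path, epub_path]
    if heuristics:
        cmd.append("--enable-heuristics")  # line unwrapping, dehyphenation, etc.
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    return cmd


def calibre_convert(pdf_path, epub_path, **kwargs):
    """Run the conversion; raises CalledProcessError if Calibre reports failure."""
    subprocess.run(calibre_cmd(pdf_path, epub_path, **kwargs), check=True)
```

Looping `calibre_convert` over a folder of PDFs is the quick-personal-conversion route; the output still benefits from a pass in Calibre's editor.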

When the PDF is scanned

If the source is a scanned book, you need OCR before any of this, because there's no text to restructure until OCR creates it. Run the scan through OCR (see scanned PDF to text), accept that OCR errors will need proofreading, then enter the Markdown pipeline above. For book-length scans, budget real proofreading time: even at 99% character accuracy, a 300-page book has thousands of errors.
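The estimate behind that warning is just total characters times error rate. The sketch assumes roughly 2,000 characters per page, a hypothetical figure; adjust it for your book's typeface and trim size.

```python
def expected_ocr_errors(pages: int, chars_per_page: int, accuracy: float) -> int:
    """Expected character errors = pages x characters per page x (1 - accuracy)."""
    return round(pages * chars_per_page * (1 - accuracy))
```

Under that assumption, `expected_ocr_errors(300, 2000, 0.99)` gives 6000, and raising accuracy to 99.9% still leaves about 600 errors to find.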

What won't convert well

Set expectations. Some PDF content doesn't survive the trip to reflowable text, especially anything whose meaning depends on where it sits on the page.

For heavily-designed content, a fixed-layout EPUB (which preserves the design but doesn't reflow) or simply keeping the PDF may be the honest answer.

Conclusion

A good PDF-to-EPUB conversion is really a structure-recovery problem wearing a format-conversion costume. Routing through Markdown makes that explicit: it forces the content back into the logical structure the PDF discarded, and from there Pandoc turns out a clean, reflowable ebook in one command.

To get started on step one, the converter here will turn your PDF into structured Markdown — including OCR for scanned pages — giving you the intermediate file the rest of the pipeline needs.