Converting Research Papers to Markdown for Obsidian, Notion, and Logseq
Picture the typical research workflow: dozens of PDFs in a Downloads folder, vague memory of reading three of them, no idea where the useful quote was. Note-taking apps like Obsidian, Notion, and Logseq fix this — but only if you can get the PDFs into them as searchable, linkable text.
This article walks through the conversion-to-PKM pipeline for research papers, book chapters, and reports. The goal: turn a passive archive of files into an active, queryable knowledge base.
Why convert to Markdown specifically
Markdown is the right intermediate format because most modern note-taking apps either store Markdown natively or import it cleanly:
- Obsidian, Logseq, Bear, Joplin store notes as plain `.md` files in a folder
- Notion isn't Markdown-native but imports `.md` cleanly into its block model
- Roam Research, Tana use outliner formats but accept Markdown paste
- iA Writer, Typora, Zettlr edit Markdown directly
Beyond compatibility, plain Markdown future-proofs your notes. There's no proprietary database to migrate out of, no app you're locked into, no risk that your notes become unreadable when a company changes direction. Markdown also preserves headings and structure, which means you can outline-fold long papers and navigate them by section.
The conversion step
The conversion approach depends on the document type:
- Digital papers (most arXiv preprints, modern journal PDFs, government reports): use the converter on this site, pymupdf4llm, or marker. No OCR needed. Expect 10–30 seconds per paper.
- Older scanned papers or photographed pages: add OCR. See converting scanned PDFs to text. Expect 1–3 minutes per paper.
- Papers with critical tables or equations: use a vision-model-based converter so figures and structured content survive. See preserving tables when converting PDF to Markdown.
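The choice between the first two routes can be automated with a rough heuristic. The sketch below assumes you have already extracted one plain-text string per page with whatever tool you use; the function name `needs_ocr` and both thresholds are illustrative assumptions, not settings from any particular converter.

```python
def needs_ocr(page_texts: list[str], min_chars: int = 200,
              max_sparse_ratio: float = 0.5) -> bool:
    """Guess whether a PDF is a scan, given the text extracted from each page.

    Hypothetical heuristic: a digital PDF yields plenty of characters per page,
    while a scanned one yields little or nothing until OCR is applied.
    """
    if not page_texts:
        return True  # nothing extractable at all, so treat it as a scan
    # Count pages whose extracted text is too short to be a real page of prose.
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    return sparse / len(page_texts) > max_sparse_ratio
```

Papers that mix digital pages with a few scanned figures fall below the ratio and stay on the fast path; tune the thresholds against a handful of your own files before trusting them.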
A pragmatic tip: keep the original PDF alongside the converted Markdown. Sometimes you'll need to verify a quote against the source, and the PDF is the ground truth; a converted file can't always reproduce it perfectly.
Cleanup that actually pays off
Time-box this — five minutes per paper, no more. The cleanup that's worth doing:
- Strip running headers and footers. Journal name, page numbers, date strings that appear on every page. These are noise in a knowledge base.
- Fix the title block. Promote the paper's title to `# H1`. Authors and affiliation on a single line below.
- Promote the abstract heading. Abstracts often come out as `**Abstract**` (bold text); promote to `## Abstract` so it shows up in your editor's outline view.
- Verify references are intact. Often the most useful section months later, when you're trying to trace a claim back to its source.
- Drop obvious garbage. Ligature artifacts (`ﬁ` → `fi`), broken hyphenation across line breaks (`auto-\nmation` → `automation`), stray PDF metadata.
If you don't plan to read the paper again — and most papers you save fall into that bucket — skip cleanup entirely. The Pareto principle applies: 20% of your collection gets 80% of the use; clean only the papers you actually return to.
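Most of the mechanical cleanup can be scripted so the five minutes go to the title block and references. The following is a minimal sketch using only the standard library; `clean_markdown`, the `min_repeats` threshold, and the 80-character cutoff are assumptions chosen for illustration, and the heuristics will occasionally misfire (for example, a real compound such as "well-known" split across a line break loses its hyphen).

```python
import re
from collections import Counter

# Ligature code points that PDF text extraction often leaves behind.
LIGATURES = {"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl",
             "\ufb03": "ffi", "\ufb04": "ffl"}

def clean_markdown(text: str, min_repeats: int = 3) -> str:
    """Apply the cheap cleanup steps to converted Markdown (heuristic sketch)."""
    # Replace ligature characters with plain letters.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # Rejoin words hyphenated across a line break: "auto-\nmation" -> "automation".
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Promote a bold abstract label to a level-2 heading.
    text = re.sub(r"^\*\*Abstract\*\*\s*$", "## Abstract", text, flags=re.MULTILINE)
    # Drop bare page numbers and short lines that repeat verbatim
    # (running headers and footers). Headings are never dropped.
    lines = text.split("\n")
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped.isdigit():
            continue
        if (stripped and len(stripped) < 80 and counts[stripped] >= min_repeats
                and not stripped.startswith("#")):
            continue
        kept.append(line)
    return "\n".join(kept)
```

Run it on a copy and diff against the converter output for the first few papers; repeated short lines inside tables or lists are the most likely false positives.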
Obsidian-specific workflow
Obsidian's strength is local-first, plain-Markdown storage with strong linking. The workflow that scales:
- Create a dedicated `Sources/` or `Papers/` folder for converted papers.
- Use a consistent file naming convention: `Author Year — Short Title.md`. This sorts alphabetically by author, then by year, and is searchable.
- For each paper, create a sibling literature note: `Author Year - notes.md` with your own takeaways. Link to the source from the notes with `[[Author Year — Short Title]]`.
- Use Obsidian's "Backlinks" pane to see which of your daily notes, project pages, and thoughts cite a given paper.
- Install the Citations plugin if you manage BibTeX entries; it creates literature notes from a `.bib` file and keeps them in sync.
- Tag by topic and reading status: `#topic/machine-learning #status/read`. Avoid deep tag hierarchies; they become maintenance overhead.
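The naming convention and the sibling literature note are easy to generate so they stay consistent across the vault. This is a sketch under assumptions: the helper names, the `## Takeaways` heading, the default `unread` status, and the set of characters stripped from titles are all illustrative choices, not Obsidian requirements.

```python
import re

# Characters that are unsafe in file names or that break [[wikilinks]]
# (an assumed, conservative list).
UNSAFE = re.compile(r'[\\/:*?"<>|#^\[\]]')

def source_name(author: str, year: int, short_title: str) -> str:
    """Build the note name 'Author Year — Short Title' (without the .md extension)."""
    title = UNSAFE.sub("", short_title).strip()
    return f"{author} {year} — {title}"

def literature_note(author: str, year: int, short_title: str,
                    topic: str, status: str = "unread") -> tuple[str, str]:
    """Return (file name, body) for the sibling note that holds your own takeaways."""
    filename = f"{author} {year} - notes.md"
    body = (
        f"Source: [[{source_name(author, year, short_title)}]]\n"
        f"#topic/{topic} #status/{status}\n\n"
        "## Takeaways\n\n"
    )
    return filename, body
```

For example, `literature_note("Doe", 2020, "Widgets", "machine-learning")` yields a file named `Doe 2020 - notes.md` whose first line links back to `Doe 2020 — Widgets`. Two papers by the same author in the same year would collide on the notes file name, so add a suffix in that case.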
The combination of plain Markdown storage, full-text search, and graph view means your papers stop being a passive archive and become a navigable knowledge graph. The investment is small — converting and cleanly filing one paper takes about 10 minutes — and the payoff compounds.
Notion-specific workflow
Notion handles imported Markdown surprisingly well, but the database model needs setup:
- Create a Papers database with properties for Authors, Year, Topic, Status, Source Link.
- For each paper, drag the converted `.md` file into the database. Notion creates a page with the content as blocks.
- Fill in the database properties so you can filter and sort. The properties are what turn the database into a useful research tool.
- Use Notion's full-text search across the database content. It works on the body of imported pages.
- For round-tripping back out (e.g., feeding selected papers to an LLM), use Notion's API or export-to-Markdown feature.
One pitfall: very large papers can hit Notion's per-page block limits. For a 50-page PDF that converts to 2,000+ paragraphs, break it into sections by chapter and link them from a parent page.
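Splitting before import is straightforward when the converted file has clean headings. The sketch below cuts at level-1 and level-2 headings and builds a parent page listing the parts; the function names and the "Front matter" label are assumptions, and a `#` line inside a fenced code block would be mistaken for a heading.

```python
import re

# Only level-1 and level-2 headings start a new section.
HEADING = re.compile(r"^(#{1,2})\s+(.+?)\s*$")

def split_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (title, body) sections at level-1 and level-2 headings."""
    sections = []
    title, body = "Front matter", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section, skipping ones with no content.
            if any(l.strip() for l in body):
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(2), []
        else:
            body.append(line)
    if any(l.strip() for l in body):
        sections.append((title, "\n".join(body).strip()))
    return sections

def parent_index(paper_title: str, sections: list[tuple[str, str]]) -> str:
    """Build the parent page: a title plus one list entry per section page."""
    entries = "\n".join(f"- {title}" for title, _ in sections)
    return f"# {paper_title}\n\n{entries}\n"
```

Write each section body to its own `.md` file, import them, then turn the entries on the parent page into links to the imported pages inside Notion.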
Logseq-specific workflow
Logseq is outliner-first: every line in a Logseq page is a block you can reference individually. Pasting a converted Markdown paper creates a deep block tree following the heading structure.
This is genuinely useful for dense papers because:
- You can quote a specific paragraph by block reference, not by copy-paste. Your daily journal can include `((block-uuid))` references that pull in the original paragraph from the source paper.
- Rearranging a paper's argument structure is drag-and-drop, which helps when you're trying to understand a paper by reorganizing it.
- Block references are bi-directional: you can see everywhere you've quoted a given paragraph.
Combine with Logseq's PDF annotation feature: highlight passages in the original PDF and the highlights become blocks in your daily journal, linked back to the source.
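To see how a heading structure maps onto a block tree, it can help to build the outline yourself rather than rely on paste behavior. The sketch below is an approximation, not Logseq's own import logic: it assumes two-space indentation, keeps the `#` markers on heading blocks, and nests each paragraph one level under the most recent heading.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")

def to_outline(markdown: str) -> str:
    """Convert headings and paragraphs into an indented '- ' block outline."""
    blocks = []   # (indent level, text) pairs
    depth = 0     # level of the most recent heading; 0 means none seen yet
    para = []     # lines of the paragraph currently being collected

    def flush():
        # Emit the collected paragraph as one block under the current heading.
        if para:
            blocks.append((depth, " ".join(para)))
            para.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            flush()
            depth = len(match.group(1))
            blocks.append((depth - 1, stripped))
        elif not stripped:
            flush()
        else:
            para.append(stripped)
    flush()
    return "\n".join("  " * level + "- " + text for level, text in blocks)
```

Each paragraph becomes its own block, which is what makes per-paragraph block references possible. Papers that skip heading levels (a `#` followed directly by `###`) will produce indentation jumps that may need manual adjustment.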
Linking and discovery
The biggest payoff of converting PDFs to Markdown isn't the conversion itself — it's the linking afterward. A few patterns:
- Maps of Content (MoCs). A topical page that links to all relevant papers on a given subject. Turns scattered downloads into a navigable index.
- Concept notes. Atomic notes about a specific idea, each linking to the papers that introduced or discussed it. The Zettelkasten approach.
- Citation chains. When paper A cites paper B, link the two. Over time, this creates a graph of the literature you've read that you can navigate visually in Obsidian's graph view.
- Cross-references in your own writing. When you write notes that cite a paper, include the `[[link]]`; backlinks make every cited paper a hub.
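Because the links are plain text, the same structures can be computed outside any app. The sketch below builds a backlink index and a Map of Content from a dictionary of note names to note bodies; the function names and the in-memory input are assumptions for illustration, and the pattern covers only the `[[Target]]`, `[[Target|alias]]`, and `[[Target#Heading]]` forms.

```python
import re
from collections import defaultdict

# Captures the target of [[Target]], [[Target|alias]] and [[Target#Heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")

def backlinks(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each linked note name to the sorted list of notes that link to it."""
    index = defaultdict(set)
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            index[target.strip()].add(name)
    return {target: sorted(sources) for target, sources in index.items()}

def map_of_content(topic: str, paper_names: list[str]) -> str:
    """Build a topical index page that links to every listed paper."""
    lines = [f"# {topic}", ""] + [f"- [[{name}]]" for name in sorted(paper_names)]
    return "\n".join(lines) + "\n"
```

A paper with many entries in the backlink index is a hub worth a second read; one with none is a candidate for the "skip cleanup" bucket.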
The links are where the value lives. The conversion is the price of admission.
A note on long papers
Papers over 50 pages can swamp your vault. Two strategies:
- Selective retention. Keep only the abstract, introduction, and conclusion in Markdown. Link to the full PDF for deep dives.
- Per-chapter split. Use the page-chunks option in your converter to split a book-length PDF by chapter. Save each chapter as a separate file with a shared prefix (`Book Title — Chapter 01.md`, `Book Title — Chapter 02.md`).
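Selective retention can be done by filtering on headings after conversion. This is a sketch under assumptions: it treats level-2 headings as the paper's top-level sections, keeps the level-1 title block, matches section names by keyword, and appends a standard Markdown link to the PDF. The function name, the keyword list, and the link text are illustrative.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
KEEP = ("abstract", "introduction", "conclusion")

def keep_sections(markdown: str, pdf_path: str,
                  keep: tuple[str, ...] = KEEP, section_level: int = 2) -> str:
    """Keep the title block plus sections whose heading contains a keyword."""
    kept = []
    keeping = True  # text before the first heading is kept
    for line in markdown.splitlines():
        match = HEADING.match(line)
        # Only headings at or above section_level trigger a decision;
        # deeper subsections inherit the decision of their parent section.
        if match and len(match.group(1)) <= section_level:
            depth = len(match.group(1))
            title = match.group(2).lower()
            keeping = depth < section_level or any(word in title for word in keep)
        if keeping:
            kept.append(line)
    # Point back to the original for deep dives; angle brackets allow spaces.
    kept += ["", f"[Full PDF](<{pdf_path}>)"]
    return "\n".join(kept) + "\n"
```

Numbered headings such as `## 1. Introduction` and plurals such as `## Conclusions` still match, since the check is a substring test on the lowercased heading.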
Conclusion
The conversion step is mechanical. The cleanup and linking, where the value lives, are the parts that take thought. Start with five papers and a 30-minute total budget; scale from there once the workflow feels natural.
If you don't already have a conversion tool, the converter on this site outputs Markdown by default and handles both digital and scanned PDFs.