PDF Privacy and Security — What Happens to Your Document When You Convert It Online
You upload a contract, a tax return, or a medical record to a free PDF converter. The file gets converted in seconds. What just happened?
Most users hesitate for half a second before clicking through. But "free" is rarely free, and the contents of your PDF can be valuable to the people running the site. This article walks through the realistic privacy and security implications of online PDF conversion: what to ask, what to avoid, and when to convert offline instead.
What an online converter actually does with your file
The honest pipeline for most online converters:
- Your browser uploads the PDF over HTTPS.
- The server saves it to disk in a temporary directory.
- The conversion process opens the file, extracts text, and generates the output.
- The output is delivered back to your browser.
- The input and output files are deleted.
The final step is the variable one. Different services do different things:
- Delete immediately after a successful download
- Delete after a time-out (often 1–24 hours)
- Delete only when storage fills up (effectively "indefinitely")
- Keep a copy "for service improvement" — which often means training data
- Forward to a third-party API (Google Cloud, AWS) where the third party's retention policies apply
Without auditing the operator, you don't know which of these is happening to your file. The privacy policy is the only signal available, and not every operator writes one that matches what they actually do.
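For a sense of what the best case looks like in code, here is a minimal Python sketch of the pipeline above with deletion built in. It is an illustration under stated assumptions, not how any particular service works: `convert_pdf_bytes` and the `convert` callable are hypothetical names standing in for a real upload handler and conversion engine.

```python
import tempfile
from pathlib import Path


def convert_pdf_bytes(pdf_bytes: bytes, convert) -> bytes:
    """Save an upload to a temp directory, convert it, and delete both files.

    `convert` is a hypothetical callable taking (input_path, output_path);
    it stands in for whatever conversion engine a service uses.
    """
    # TemporaryDirectory removes the directory and everything in it on exit,
    # even if the conversion raises an exception.
    with tempfile.TemporaryDirectory(prefix="upload-") as workdir:
        input_path = Path(workdir) / "input.pdf"
        output_path = Path(workdir) / "output.txt"
        input_path.write_bytes(pdf_bytes)   # save the upload to temporary storage
        convert(input_path, output_path)    # run the conversion
        result = output_path.read_bytes()   # read the output to send back
    # Input and output files are gone once the with-block exits.
    return result
```

An operator who wants any of the other retention behaviours only has to skip the cleanup, which is why the published policy matters more than the pipeline itself.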
Red flags to watch for
Before uploading anything sensitive, check the converter for these warning signs:
- No HTTPS. Without it, the upload is readable by anyone on the same network, which matters most on open networks such as free coffee-shop Wi-Fi.
- No privacy policy, or a policy with no retention statement. A serious operator publishes retention terms. A vague policy that says "we may use your data to improve our service" usually means longer retention than you'd guess.
- Account-required uploads with broad permission requests. A simple converter doesn't need access to your Google Drive or Dropbox.
- Free tool with no clear business model. Free is fine if the business model is "we sell a paid version too" or visible advertising. Free with no paid tier and no ads usually means the business model is the data.
- "AI-powered" without disclosing the third-party API. If the conversion uses OpenAI, Google, or another vendor's API, the privacy policy should say so. Silence is a sign that the operator hasn't thought about it.
- Browser plugins that request access to all sites. Almost never necessary for PDF conversion.
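The first check in this list is the only one that is easy to automate. As a small sketch, assuming you have the upload URL as a string (the function name `uses_https` and the example host are made up for illustration), a script that uploads files could refuse anything that is not HTTPS:

```python
from urllib.parse import urlparse


def uses_https(url: str) -> bool:
    """Return True only if the URL has an https scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


if __name__ == "__main__":
    # converter.example is a placeholder host, not a real service.
    for candidate in ("https://converter.example/upload",
                      "http://converter.example/upload"):
        print(candidate, "ok" if uses_https(candidate) else "refuse")
```

Passing this check only says the file is encrypted in transit. It says nothing about retention, logging, or third-party processing once the file arrives.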
Questions to ask the converter
If a converter doesn't answer these, treat the absence as the answer:
- Where is the server located? (Affects which jurisdiction's laws apply to your data.)
- How long are uploaded files retained after conversion?
- Is the content sent to any third-party API for processing?
- Are conversions logged? At what level of detail?
- What happens to the output file after you download it?
- Who on the operator's team has access to the file system during processing?
A trustworthy converter has answers to these on its privacy page. The good privacy pages are short and specific.
Document categories that should never go to an online converter
For these, convert offline. Period:
- Healthcare records. HIPAA implications in the US; equivalent laws elsewhere. Even for personal records, the chain-of-custody risk isn't worth the convenience.
- Legal documents under privilege. Attorney work product, settlements, NDAs, confidential pleadings.
- Tax returns and financial statements. Identity theft risk; the data is high-value to attackers.
- Identity documents. Passports, driver's licenses, social security cards, birth certificates.
- Trade secrets and proprietary technical documentation. Including engineering specs, source code printouts, and internal training materials.
- Documents naming third parties without their consent. Contracts that name other parties, employee evaluations, customer data.
- Drafts of unpublished writing. Manuscripts, research findings before publication, journalistic notes with confidential sources.
- Government documents marked confidential or above.
If a document falls into any of these categories, the five minutes saved by using an online converter isn't worth the lifetime tail-risk of an inadvertent leak.
How to convert offline
Several practical options that keep your file on your machine:
Command-line tools (free, technical)
# pdftotext — part of poppler-utils
sudo apt-get install poppler-utils # Ubuntu/Debian
brew install poppler # macOS
pdftotext input.pdf output.txt
# pymupdf4llm for Markdown output
pip install pymupdf4llm
python -c "import pymupdf4llm; print(pymupdf4llm.to_markdown('input.pdf'))" > output.md
# pdfs2txt CLI — clone the repository and run locally
git clone https://github.com/<your-fork>/pdfs2txt
cd pdfs2txt
python pdfs2txt.py input.pdf
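If you have a folder of PDFs rather than a single file, the `pdftotext` command above can be wrapped in a short script. This is a sketch, assuming `pdftotext` from poppler-utils is installed and on your PATH; the function names and the use of the `-layout` option (which keeps the original physical layout) are choices made for this example.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path: Path, out_dir: Path) -> list[str]:
    """Build the pdftotext invocation for one file, preserving layout."""
    out_path = out_dir / (pdf_path.stem + ".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(out_path)]


def convert_folder(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every PDF in src_dir locally; return the PDFs that were converted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf_path in sorted(src_dir.glob("*.pdf")):
        # check=True raises CalledProcessError if pdftotext exits non-zero.
        subprocess.run(pdftotext_command(pdf_path, out_dir), check=True)
        converted.append(pdf_path)
    return converted


if __name__ == "__main__":
    done = convert_folder(Path("pdfs"), Path("text"))
    print(f"Converted {len(done)} file(s)")
```

Everything here runs as a local subprocess, so no file content crosses the network.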
Desktop applications
- Adobe Acrobat Pro: native export to plain text, Word, or other formats. Mature and reliable.
- Calibre: free, primarily for ebooks but handles PDFs reasonably.
- Pandoc: converts between many document formats, but it cannot read PDF as input. Use it after extracting the text with another tool.
Self-hosted converter
The pdfs2txt project includes a Docker deployment. You can run the same web UI on your own machine, in a virtual machine, or behind a VPN. The conversion runs on your hardware; nothing leaves your network.
docker build -t pdfs2txt .
docker run -p 8000:8000 pdfs2txt
For offline OCR specifically, install Tesseract locally and use it directly. See the Tesseract OCR guide.
What "deleted immediately" really means
Even when a converter promises immediate deletion, the reality is more nuanced:
- "Deleted" usually means "unlinked from the filesystem." The actual bytes remain on disk until the operating system overwrites them.
- Backups, log files, and crash dumps may still contain copies.
- If a CDN or reverse proxy cached the file (some setups do this for performance), that provider retains a copy for its own cache window.
- Memory dumps from a server crash can contain document contents.
- An adversary with server access can capture files in real time, before any deletion happens.
The realistic claim a trustworthy converter can make is: "We don't retain files in normal operation, and we delete temporary copies promptly." Not: "Your file ceases to exist the moment you click download."
For documents where this distinction matters, offline conversion is the only privacy-preserving option.
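The gap between "unlinked" and "overwritten" can be narrowed, though not closed, in code. The sketch below is a best-effort illustration with a hypothetical function name, not a secure-erase tool: it overwrites a file with zeros before unlinking it. On SSDs with wear levelling, copy-on-write or journaling filesystems, and anything covered by snapshots or backups, the original blocks may survive regardless.

```python
import os
from pathlib import Path


def overwrite_and_delete(path: Path) -> None:
    """Best-effort: overwrite a file with zeros, flush to disk, then unlink it."""
    remaining = path.stat().st_size
    chunk = b"\0" * 65536
    with open(path, "r+b") as handle:
        while remaining > 0:
            n = min(remaining, len(chunk))
            handle.write(chunk[:n])
            remaining -= n
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the zeros to the device
    path.unlink()  # removes the directory entry; this alone leaves the data blocks


if __name__ == "__main__":
    demo = Path("demo-secret.txt")
    demo.write_text("account number 1234")
    overwrite_and_delete(demo)
    print(demo.exists())
```

Even a service that did this for every upload could not make the stronger promise, because copies in logs, crash dumps, and caches are outside the reach of the function.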
How pdfs2txt handles this
For transparency, here's what this specific service does with your file:
- Uploads go to the server's temporary directory and are processed there.
- The output Markdown is written to disk briefly (so it survives across worker processes) and deleted after the first download.
- No long-term storage. No analytics on file contents. No third-party API receives the file unless you explicitly choose an AI-vision option and provide your own API key.
- API keys you provide are used for the single request that needs them and removed from memory immediately after.
- The source code is open. Anyone can audit what the server does with a file.
For maximum privacy, deploy your own copy from the GitHub repository. The same code, your hardware, no third party in the loop. See the project README for Docker deployment instructions.
The full privacy policy lists this in detail.
Conclusion
Online PDF converters are fine for non-sensitive documents — the marketing PDF you downloaded yesterday, the public research paper, the user manual for your dishwasher. For anything you'd hesitate to email to a stranger, convert offline.
Read privacy policies. The good ones are short and specific. The bad ones are vague or missing — and the vagueness usually means the operator hasn't decided what they'll do with your data, which is worse than a clear policy you disagree with.
When in doubt, default to offline conversion. The tools are free, the setup takes about 10 minutes, and the file never leaves your machine.