Converting a PDF to Word gives you an editable document you can revise, reformat, and share. The best method depends on whether your PDF contains real text (a digital PDF) or just scanned images of text (a scanned PDF). Here is every reliable free approach, from built-in tools to Python automation.
Method 1: Microsoft Word (built-in, no extra cost if you have Word)
If you already have Microsoft Word 2013 or later, it can open PDF files directly and convert them to editable DOCX format. No plugins or third-party tools needed.
- Open Microsoft Word
- Click File > Open > Browse
- Select your PDF file and click Open
- A dialog appears: "Word will now convert your PDF to an editable Word document." Click OK
- The converted document opens. Save it as DOCX via File > Save As > Word Document (.docx)
What to expect: Text, headings, and simple tables convert well. Multi-column layouts and heavily styled elements often need manual cleanup. The original PDF file is never modified.
Works on: Windows (Word 2013+), Mac (Word 2016+)
Method 2: Google Docs (free, no software needed)
Google Docs converts PDFs to editable documents for free, entirely in your browser:
- Go to drive.google.com and sign in
- Click New > File upload and select your PDF
- Once uploaded, right-click the file and select Open with > Google Docs
- Google Docs converts the PDF and opens it as an editable document
- To save as DOCX, click File > Download > Microsoft Word (.docx)
What to expect: Good for text-heavy PDFs. Tables and complex formatting convert with mixed results. Google Docs also has basic OCR for scanned PDFs.
Best for: Quick edits on any device without installing software.
Method 3: LibreOffice Writer (free, open source)
LibreOffice is a free, open-source office suite that imports PDF files and saves them as DOCX:
- Download and install LibreOffice from libreoffice.org
- Open LibreOffice Writer
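- Click File > Open, select your PDF, set the file type dropdown to "PDF - Portable Document Format (Writer)", and click Open
- The PDF opens in Writer as an editable document (without that file type, LibreOffice opens PDFs in Draw, which cannot save as DOCX)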
- Click File > Save As, choose Word 2007-365 (.docx), and save
What to expect: Similar quality to Microsoft Word. Handles text-based PDFs reliably. Complex formatting may need adjustment.
Works on: Windows, macOS, Linux.
Method 4: Python with pdf2docx
pdf2docx is one of the most capable open-source Python libraries for PDF to Word conversion. It preserves paragraphs, tables, images, and basic layout.
pip install pdf2docx

from pdf2docx import Converter

# Convert a single file
cv = Converter("input.pdf")
cv.convert("output.docx")
cv.close()
print("Conversion complete: output.docx")

Convert specific pages:
from pdf2docx import Converter

cv = Converter("report.pdf")
# Pages are zero-indexed and end is exclusive: start=0, end=3 converts pages 1-3
cv.convert("report_excerpt.docx", start=0, end=3)
cv.close()

Batch convert a folder:
import os
from pdf2docx import Converter

input_dir = "./pdfs"
output_dir = "./docx"
os.makedirs(output_dir, exist_ok=True)

for filename in sorted(os.listdir(input_dir)):
    # Match the extension case-insensitively so "SCAN.PDF" is not skipped
    stem, ext = os.path.splitext(filename)
    if ext.lower() != ".pdf":
        continue
    pdf_path = os.path.join(input_dir, filename)
    docx_path = os.path.join(output_dir, stem + ".docx")
    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path)
    finally:
        cv.close()  # release the file even if conversion fails
    print(f"Converted: {filename}")

Install notes: pdf2docx requires Python 3.6+ and works on Windows, macOS, and Linux without any system dependencies.
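The zero-indexed page convention in the "Convert specific pages" example is easy to get wrong by one. Below is a minimal sketch of a helper that turns a human-style page spec such as "1-3,7" into zero-based indices. `parse_pages` and `convert_pages` are hypothetical names, not part of pdf2docx. The sketch assumes the `pages` argument of `Converter.convert` accepts a list of zero-based indices, which is worth checking against the version you have installed.

```python
def parse_pages(spec: str) -> list[int]:
    """Turn a 1-based page spec like "1-3,7" into sorted zero-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Page 1 for a human is index 0 for the converter.
        indices.update(range(start - 1, end))
    return sorted(indices)


def convert_pages(pdf_path: str, docx_path: str, spec: str) -> None:
    """Convert only the pages named in spec (requires pdf2docx)."""
    # Imported lazily so parse_pages can be used without pdf2docx installed.
    from pdf2docx import Converter

    cv = Converter(pdf_path)
    try:
        # Assumption: `pages` takes a list of zero-based page indices.
        cv.convert(docx_path, pages=parse_pages(spec))
    finally:
        cv.close()
```

For example, `convert_pages("report.pdf", "excerpt.docx", "1-3,7")` would request the first three pages plus page seven.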
Method 5: Command line with LibreOffice headless
LibreOffice's headless mode converts PDFs from the terminal with no GUI. Useful for scripts and CI/CD pipelines.
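If the surrounding pipeline is already written in Python, the headless call can be wrapped with `subprocess`. This is a rough sketch under a few assumptions: the binary is on PATH as `libreoffice` or `soffice`, and the PDF is loaded through Writer's import filter (`writer_pdf_import`) so that a DOCX export is available. `build_command` and `convert` are illustrative names. The equivalent shell commands follow.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf: Path, outdir: Path, binary: str = "libreoffice") -> list[str]:
    """Build the argument list for one headless PDF-to-DOCX conversion."""
    return [
        binary,
        "--headless",
        # Assumption: load the PDF with Writer's import filter; by default
        # LibreOffice treats PDFs as Draw documents, which have no DOCX export.
        "--infilter=writer_pdf_import",
        "--convert-to", "docx",
        "--outdir", str(outdir),
        str(pdf),
    ]


def convert(pdf: Path, outdir: Path, timeout: int = 120) -> Path:
    """Run LibreOffice headless and return the path of the expected DOCX."""
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(pdf, outdir, binary), check=True, timeout=timeout)
    docx = outdir / (pdf.stem + ".docx")
    if not docx.exists():
        # The exit status alone may not reveal a failed conversion,
        # so check that the output file was actually written.
        raise RuntimeError(f"no output produced for {pdf}")
    return docx
```

Checking for the output file rather than trusting the exit code is a deliberate choice here, since a CI job should fail loudly when a conversion silently produces nothing.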
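# Convert a single file
# --infilter loads the PDF with Writer's import filter; without it LibreOffice
# treats the PDF as a Draw document and DOCX export usually fails
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx document.pdf

# Convert all PDFs in current directory
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx *.pdf

# Specify output directory
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir ./output *.pdf

Install LibreOffice: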
# macOS
brew install --cask libreoffice

# Ubuntu / Debian
sudo apt install libreoffice

# Docker (for serverless environments): these two lines go in a Dockerfile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y libreoffice --no-install-recommends

Batch conversion in a shell script:
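#!/bin/bash
INPUT_DIR="./pdfs"
OUTPUT_DIR="./docx"
mkdir -p "$OUTPUT_DIR"

# Skip the loop when no PDFs match instead of passing a literal "*.pdf"
shopt -s nullglob
for pdf in "$INPUT_DIR"/*.pdf; do
    echo "Converting: $pdf"
    # --infilter makes LibreOffice import the PDF as a Writer document
    libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir "$OUTPUT_DIR" "$pdf"
done
echo "Done. Files saved to $OUTPUT_DIR"

Conversion quality comparison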
| Method | Text accuracy | Tables | Images | Scanned PDF | Effort |
|---|---|---|---|---|---|
| Microsoft Word (built-in) | Excellent | Good | Good | Basic OCR | Zero |
| Google Docs | Good | Fair | Fair | Basic OCR | Zero |
| LibreOffice Writer | Good | Good | Good | None | Low |
| pdf2docx (Python) | Excellent | Excellent | Good | None | Low |
| LibreOffice headless | Good | Good | Good | None | Low |
Conversion quality varies by PDF complexity. A simple single-column text PDF will convert near-perfectly with any method. A PDF with overlapping elements, custom fonts, or intricate tables will require manual cleanup regardless of the tool.
Scanned PDFs: when you need OCR
A scanned PDF is a document that was printed and then photographed or scanned. The PDF contains an image of text, not actual text characters. Plain conversion tools produce blank or garbled output for scanned PDFs.
How to tell if your PDF is scanned: try selecting text by clicking and dragging. If nothing highlights, the PDF is image-only and requires OCR.
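The same check can be automated when there are many files to triage. A minimal sketch, assuming the third-party pypdf package for text extraction. The 20-character threshold and the "more than half the pages" rule are arbitrary illustrative choices, and both function names are hypothetical.

```python
def looks_scanned(page_texts: list[str], min_chars: int = 20) -> bool:
    """Guess whether a PDF is image-only from its per-page extracted text.

    A page counts as empty when it yields fewer than min_chars
    non-whitespace characters; the PDF is flagged when most pages are empty.
    """
    if not page_texts:
        return True
    empty = sum(1 for text in page_texts if len("".join(text.split())) < min_chars)
    return empty > len(page_texts) / 2


def extract_page_texts(pdf_path: str) -> list[str]:
    """Extract text per page with pypdf (pip install pypdf)."""
    # Imported lazily so looks_scanned works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `looks_scanned(extract_page_texts("input.pdf"))` then gives a rough yes/no answer. Note that a scanned PDF that already went through OCR carries a hidden text layer and will be reported as not scanned.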
Free OCR options:
Microsoft Word: Word's PDF import includes basic OCR. Open the scanned PDF via File > Open and Word will attempt to recognize the text. Quality varies with scan clarity.
Google Docs: upload the scanned PDF to Google Drive, right-click, and open with Google Docs. Google applies OCR automatically. Works well for clean scans with standard fonts.
Tesseract (open source, command line):
# Install Tesseract and poppler (poppler provides pdftoppm)
# macOS: brew install tesseract poppler
# Ubuntu: sudo apt install tesseract-ocr poppler-utils

# First, convert PDF pages to images at 300 DPI
pdftoppm -r 300 scanned.pdf page

# Then run OCR on each page (writes a .txt file next to each image)
for img in page-*.ppm; do
    tesseract "$img" "${img%.ppm}" -l eng
done

# Combine text files if needed
cat page-*.txt > output.txt

Tesseract works best on images at 300 DPI or higher; lower resolutions degrade character recognition significantly.
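The Tesseract pipeline stops at plain text. To end up with a Word document, the per-page text files can be written into a DOCX. The sketch below assumes the third-party python-docx package and one text file per page. `to_paragraphs` and `write_docx` are hypothetical helper names, and the paragraph rebuilding is deliberately naive: blank lines separate paragraphs, and line breaks inside a paragraph become spaces.

```python
import re


def to_paragraphs(ocr_text: str) -> list[str]:
    """Rebuild paragraphs from hard-wrapped OCR output."""
    paragraphs = []
    # Blank lines (possibly containing stray whitespace) separate paragraphs.
    for block in re.split(r"\n\s*\n", ocr_text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


def write_docx(page_files: list[str], docx_path: str) -> None:
    """Write one OCR text file per page into a DOCX (pip install python-docx)."""
    # Imported lazily so to_paragraphs works without python-docx installed.
    from docx import Document

    doc = Document()
    for i, name in enumerate(page_files):
        with open(name, encoding="utf-8") as fh:
            for para in to_paragraphs(fh.read()):
                doc.add_paragraph(para)
        if i < len(page_files) - 1:
            doc.add_page_break()  # keep the original page boundaries
    doc.save(docx_path)
```

A call such as `write_docx(sorted(glob.glob("page-*.txt")), "output.docx")` would cover the files produced by the loop above, given `import glob`. Headings, tables, and fonts are not recovered this way, only running text.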
Why conversion quality is never perfect
PDF and Word use fundamentally different layout models:
- PDF places every element at an absolute x/y coordinate on the page. Text is rendered at a fixed position, regardless of surrounding content.
- Word (DOCX) uses a flow layout: text wraps, tables expand, paragraphs reflow as you edit.
Converting from absolute to flow layout requires the converter to infer structure (this group of text is a paragraph, this block is a table header) from visual proximity alone. That inference is imperfect, especially for:
- Multi-column layouts
- Headers and footers with complex positioning
- Mixed font sizes used for decoration rather than hierarchy
- Tables with merged cells or no visible borders
For documents where formatting precision matters, expect to spend a few minutes cleaning up after conversion.
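To make the inference problem concrete, here is a toy sketch of what a converter has to do: take words that carry only absolute coordinates and guess lines and paragraphs from their spacing. The `Word` type, the tolerances, and the function name are all illustrative assumptions. Real converters use far more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, in points
    y: float  # baseline, measured from the top of the page


def group_into_paragraphs(words: list[Word], line_tol: float = 2.0,
                          para_gap: float = 18.0) -> list[str]:
    """Infer lines and paragraphs from absolute positions alone."""
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # Words whose baselines nearly coincide are treated as one line.
        if lines and abs(lines[-1][0].y - word.y) <= line_tol:
            lines[-1].append(word)
        else:
            lines.append([word])

    paragraphs: list[list[str]] = []
    prev_y = None
    for line in lines:
        text = " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        # A vertical jump larger than para_gap is read as a paragraph break.
        if prev_y is None or line[0].y - prev_y > para_gap:
            paragraphs.append([text])
        else:
            paragraphs[-1].append(text)
        prev_y = line[0].y
    return [" ".join(p) for p in paragraphs]
```

Because this sketch only looks at vertical position, it would interleave the lines of a two-column page into one stream, which is exactly the kind of failure listed above for multi-column layouts.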
Extract text from PDF without converting to Word
If you only need the text content (not the formatting), extracting plain text is faster than converting to DOCX. PDF4.dev's free PDF to text tool extracts all text from a PDF in your browser, with no upload to any server. For batch extraction in Python, pdfplumber and PyMuPDF are the two most reliable libraries.
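For the batch case, a short sketch of per-page extraction with either library, plus a folder loop that writes one `.txt` file per PDF. The function names are hypothetical, and the calls used (`pdfplumber.open`, `page.extract_text`, `fitz.open`, `page.get_text`) should be checked against the versions you have installed.

```python
from pathlib import Path


def extract_with_pdfplumber(pdf_path: Path) -> list[str]:
    """Per-page text via pdfplumber (pip install pdfplumber)."""
    import pdfplumber

    with pdfplumber.open(str(pdf_path)) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]


def extract_with_pymupdf(pdf_path: Path) -> list[str]:
    """Per-page text via PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF's import name

    with fitz.open(str(pdf_path)) as doc:
        return [page.get_text() for page in doc]


def extract_folder(input_dir: Path, output_dir: Path,
                   extractor=extract_with_pdfplumber) -> list[Path]:
    """Write one text file per PDF in input_dir and return the written paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        pages = extractor(pdf_path)
        out = output_dir / (pdf_path.stem + ".txt")
        # Separate pages with a blank line in the output file.
        out.write_text("\n\n".join(pages), encoding="utf-8")
        written.append(out)
    return written
```

Passing the extractor as a parameter keeps the folder loop independent of the library, so `extract_folder(Path("pdfs"), Path("txt"), extractor=extract_with_pymupdf)` switches backends without other changes.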
You can also compress the PDF before sharing it, split it into individual pages, or merge multiple documents into one file.
Building a document workflow with PDF generation
If you need to go in the other direction (generate PDFs programmatically from data), PDF4.dev handles that: you define an HTML template with variables, and the API renders it to a pixel-perfect PDF in milliseconds.
// Generate a PDF invoice from a template + data
const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    template_id: "invoice",
    data: {
      company_name: "Acme Corp",
      invoice_number: "INV-2026-042",
      total: "$3,200.00",
    },
  }),
});
const pdf = await response.arrayBuffer();
// Returns a binary PDF ready to save or stream

The API uses Playwright (headless Chromium) for rendering, so every CSS layout rule, custom font, and table works exactly as it does in a browser. No layout surprises. See the full guide to generating PDFs from HTML templates with Node.js.