# Markdown Tools
Convert documents to high-quality markdown with intelligent multi-tool orchestration.
## Dual Mode Architecture

| Mode | Speed | Quality | Use Case |
|------|-------|---------|----------|
| Quick (default) | Fast | Good | Drafts, simple documents |
| Heavy | Slower | Best | Final documents, complex layouts |

## Quick Start

### Installation
```bash
# Required: PDF/DOCX/PPTX support
uv tool install "markitdown[pdf]"
pip install pymupdf4llm
brew install pandoc
```
### Basic Conversion

```bash
# Quick Mode (default): fast, single best tool
uv run --with pymupdf4llm --with markitdown scripts/convert.py document.pdf -o output.md

# Heavy Mode: multi-tool parallel execution with merge
uv run --with pymupdf4llm --with markitdown scripts/convert.py document.pdf -o output.md --heavy

# Check available tools
uv run scripts/convert.py --list-tools
```
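The internals of `convert.py` are not shown in this guide. As a rough sketch of how mode-dependent tool selection and Heavy Mode's parallel execution could fit together, the Python below transcribes the Tool Selection Matrix into two lookup tables and runs the selected tools concurrently. The function names (`select_tools`, `run_tools`) and the `converters` mapping are hypothetical stand-ins, not the script's actual API; a converter here is any callable that takes a file path and returns markdown, for example a thin wrapper around `pymupdf4llm.to_markdown`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Tool choices per format, transcribed from the Tool Selection Matrix.
QUICK_TOOLS: Dict[str, List[str]] = {
    "pdf": ["pymupdf4llm"],
    "docx": ["pandoc"],
    "pptx": ["markitdown"],
    "xlsx": ["markitdown"],
}
HEAVY_TOOLS: Dict[str, List[str]] = {
    "pdf": ["pymupdf4llm", "markitdown"],
    "docx": ["pandoc", "markitdown"],
    "pptx": ["markitdown", "pandoc"],
    "xlsx": ["markitdown"],
}

# Hypothetical converter signature: input path -> markdown text.
Converter = Callable[[str], str]


def select_tools(path: str, heavy: bool = False) -> List[str]:
    """Return the tool names to run for this file, based on its extension."""
    ext = path.rsplit(".", 1)[-1].lower()
    table = HEAVY_TOOLS if heavy else QUICK_TOOLS
    if ext not in table:
        raise ValueError(f"unsupported format: .{ext}")
    return table[ext]


def run_tools(path: str, converters: Dict[str, Converter], heavy: bool = False) -> Dict[str, str]:
    """Run every selected, installed tool concurrently; map tool name -> markdown."""
    tools = [t for t in select_tools(path, heavy) if t in converters]
    if not tools:
        raise RuntimeError("No conversion tools available")
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = {t: pool.submit(converters[t], path) for t in tools}
        # Waits for every tool; results keep the matrix's tool order.
        return {t: f.result() for t, f in futures.items()}
```

Threads are a reasonable fit in this sketch because the real work happens in native libraries or external processes such as pandoc; Heavy Mode would then hand the per-tool outputs to the segment merge step.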
## Tool Selection Matrix

| Format | Quick Mode Tool | Heavy Mode Tools |
|--------|-----------------|------------------|
| PDF | pymupdf4llm | pymupdf4llm + markitdown |
| DOCX | pandoc | pandoc + markitdown |
| PPTX | markitdown | markitdown + pandoc |
| XLSX | markitdown | markitdown |

### Tool Characteristics

- **pymupdf4llm**: LLM-optimized PDF conversion with native table detection and image extraction
- **markitdown**: Microsoft's universal converter, good for Office formats
- **pandoc**: Excellent structure preservation for DOCX/PPTX

## Heavy Mode Workflow
Heavy Mode runs multiple tools in parallel and selects the best segments:
1. **Parallel Execution**: Run all applicable tools simultaneously.
2. **Segment Analysis**: Parse each output into segments (tables, headings, images, paragraphs).
3. **Quality Scoring**: Score each segment based on completeness and structure.
4. **Intelligent Merge**: Select the best version of each segment across tools.

### Merge Criteria

| Segment Type | Selection Criteria |
|--------------|--------------------|
| Tables | More rows/columns, proper header separator |
| Images | Alt text present, local paths preferred |
| Headings | Proper hierarchy, appropriate length |
| Lists | More items, nested structure preserved |
| Paragraphs | Content completeness |

## Image Extraction
```bash
# Extract images with metadata
uv run --with pymupdf scripts/extract_pdf_images.py document.pdf -o ./assets

# Generate markdown references file
uv run --with pymupdf scripts/extract_pdf_images.py document.pdf --markdown refs.md
```
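The exact layout of `refs.md` is not specified here. As an illustration only, a references file could be assembled from per-image metadata as below. The metadata keys (`file`, `page`, `width`, `height`) and the function names are assumptions, not the documented schema of `images_metadata.json`.

```python
import json
from pathlib import Path
from typing import Dict, List


def refs_markdown(metadata: List[Dict], assets_dir: str = "assets") -> str:
    """One markdown image reference per extracted image, ordered by page."""
    lines = []
    # sorted() is stable, so images keep their extraction order within a page.
    for entry in sorted(metadata, key=lambda e: e["page"]):
        # Descriptive alt text (the Merge Criteria favor images that have it).
        alt = f"Page {entry['page']} image, {entry['width']}x{entry['height']} px"
        lines.append(f"![{alt}]({assets_dir}/{entry['file']})")
    return "\n\n".join(lines) + "\n"


def write_refs(metadata_json: Path, out_md: Path) -> None:
    """Read a metadata JSON list and write the references file to out_md.

    Assumes out_md lives in the parent of the assets directory, so image
    paths are written relative to it (e.g. "assets/img_page1_1.png").
    """
    metadata = json.loads(metadata_json.read_text(encoding="utf-8"))
    out_md.write_text(refs_markdown(metadata, metadata_json.parent.name), encoding="utf-8")
```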
`extract_pdf_images.py` writes:

- **Images**: `assets/img_page1_1.png`, `assets/img_page2_1.jpg`
- **Metadata**: `assets/images_metadata.json` (page, position, dimensions)

## Quality Validation
```bash
# Validate conversion quality
uv run --with pymupdf scripts/validate_output.py document.pdf output.md

# Generate HTML report
uv run --with pymupdf scripts/validate_output.py document.pdf output.md --report report.html
```
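How `validate_output.py` computes its percentages is not documented in this guide. One simple way to approximate text retention is a bag-of-words comparison between text extracted from the source (for example with PyMuPDF's `page.get_text()`) and the converted markdown. The sketch below assumes that approach and grades a percentage against the thresholds in the Quality Metrics table; `text_retention` and `grade` are hypothetical names.

```python
import re
from collections import Counter


def _words(text: str) -> Counter:
    # Word tokens only, so markdown syntax such as #, |, * and - is ignored.
    return Counter(re.findall(r"\w+", text.lower()))


def text_retention(source_text: str, markdown: str) -> float:
    """Percentage of source word occurrences that also occur in the markdown."""
    src, out = _words(source_text), _words(markdown)
    total = sum(src.values())
    if total == 0:
        return 100.0
    kept = sum(min(count, out[word]) for word, count in src.items())
    return 100.0 * kept / total


def grade(metric: str, pct: float) -> str:
    """Pass/Warn/Fail per the Quality Metrics table ("text", "table" or "image")."""
    if metric == "text":
        passed, warn_min = pct > 95.0, 85.0
    else:
        passed, warn_min = pct >= 100.0, {"table": 90.0, "image": 80.0}[metric]
    if passed:
        return "Pass"
    return "Warn" if pct >= warn_min else "Fail"
```

Usage: `grade("text", text_retention(source_text, markdown))`. A word-multiset check like this ignores word order, so it measures completeness rather than reading order.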
### Quality Metrics

| Metric | Pass | Warn | Fail |
|--------|------|------|------|
| Text Retention | >95% | 85-95% | <85% |
| Table Retention | 100% | 90-99% | <90% |
| Image Retention | 100% | 80-99% | <80% |

## Merge Outputs Manually
```bash
# Merge multiple markdown files
python scripts/merge_outputs.py output1.md output2.md -o merged.md

# Show segment attribution
python scripts/merge_outputs.py output1.md output2.md -o merged.md --verbose
```
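The exact scoring in `merge_outputs.py` is not documented here. The Python below is a minimal sketch of the idea behind the Merge Criteria: split each output into blank-line-separated segments, score each segment with simple heuristics, and keep the highest-scoring version. It assumes the tool outputs line up segment by segment (a real merger would need to align them first, for example with `difflib`), only recognizes pipe tables whose rows start with `|`, and does not check heading hierarchy; every name in it is hypothetical.

```python
import re
from itertools import zip_longest
from typing import List

# GFM-style delimiter row such as "|---|:---:|" (outer pipes optional).
SEPARATOR = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")
# Bullet or numbered list item; group 1 captures the indentation.
LIST_ITEM = re.compile(r"^(\s*)([-*+]|\d+\.)\s")
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]*)")


def split_segments(markdown: str) -> List[str]:
    """Split markdown into blank-line-separated blocks (crude: ignores code fences)."""
    return [b.strip("\n") for b in re.split(r"\n\s*\n", markdown) if b.strip()]


def score_segment(block: str) -> float:
    """Heuristic quality score; higher is better. Loosely follows the Merge Criteria."""
    lines = block.splitlines()
    first = lines[0].lstrip()
    if first.startswith("|"):
        # Table: rows x columns, plus a bonus for a proper header separator.
        rows = [ln for ln in lines if not SEPARATOR.match(ln)]
        cols = max((ln.strip().strip("|").count("|") + 1 for ln in rows), default=0)
        has_sep = len(lines) > 1 and SEPARATOR.match(lines[1]) is not None
        return float(len(rows) * cols + (10 if has_sep else 0))
    if first.startswith("!["):
        # Image: one point for alt text, one for a local (non-URL) path.
        m = IMAGE.match(first)
        if not m:
            return 0.0
        alt, path = m.groups()
        return float(bool(alt.strip())) + float(not path.startswith(("http://", "https://")))
    if first.startswith("#"):
        # Heading: only checks length; hierarchy would need document context.
        return 1.0 if len(first.lstrip("#").strip()) <= 120 else 0.0
    if LIST_ITEM.match(lines[0]):
        # List: more items is better, with a bonus if nesting survived.
        items = [m for m in (LIST_ITEM.match(ln) for ln in lines) if m]
        nested = any(len(m.group(1)) >= 2 for m in items)
        return float(len(items) + (5 if nested else 0))
    # Paragraph: word count as a rough completeness proxy.
    return float(len(block.split()))


def merge_positional(outputs: List[str]) -> str:
    """Keep the best-scoring version of each segment, assuming outputs align by position."""
    merged = []
    for versions in zip_longest(*(split_segments(o) for o in outputs)):
        candidates = [v for v in versions if v is not None]
        merged.append(max(candidates, key=score_segment))  # ties keep the earlier output
    return "\n\n".join(merged) + "\n"
```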
## Path Conversion (Windows/WSL)
```bash
# Windows → WSL conversion
python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
# Output: /mnt/c/Users/name/Documents/file.pdf
```
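The mapping shown above is mechanical: the drive letter becomes a lowercase directory under `/mnt`, and backslashes become forward slashes. A minimal sketch, assuming WSL's default `/mnt` automount root (it can be changed in `wsl.conf`) and plain drive-letter paths only (no UNC paths); `windows_to_wsl` is a hypothetical name, not the API of `convert_path.py`. Inside WSL itself, the built-in `wslpath` utility performs the same conversion.

```python
import re

# Drive letter, colon, then a backslash or forward slash, e.g. "C:\..." or "D:/...".
_DRIVE_PATH = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def windows_to_wsl(path: str) -> str:
    """Convert a drive-letter Windows path to its WSL /mnt/<drive>/ form."""
    m = _DRIVE_PATH.match(path)
    if not m:
        raise ValueError(f"not a drive-letter Windows path: {path!r}")
    drive, rest = m.groups()
    rest = rest.replace("\\", "/").rstrip("/")
    return f"/mnt/{drive.lower()}/{rest}" if rest else f"/mnt/{drive.lower()}"
```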
## Common Issues
**"No conversion tools available"**

```bash
# Install all tools
pip install pymupdf4llm
uv tool install "markitdown[pdf]"
brew install pandoc
```
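If the error persists after installing, it can help to check what is actually visible to the running Python interpreter and on `PATH`. The diagnostic below is a minimal sketch independent of `convert.py --list-tools`, whose own checks may differ; `available_tools` is a hypothetical name. Run it with the same interpreter that runs the scripts (for example under `uv run --with pymupdf4llm --with markitdown`), since packages added with `--with` exist only in that environment.

```python
import importlib.util
import shutil
from typing import Dict


def available_tools() -> Dict[str, bool]:
    """Report which converters this environment can use."""
    return {
        # Python package: importable from the current interpreter?
        "pymupdf4llm": importlib.util.find_spec("pymupdf4llm") is not None,
        # markitdown: importable, or on PATH as the CLI that `uv tool install` provides.
        "markitdown": importlib.util.find_spec("markitdown") is not None
        or shutil.which("markitdown") is not None,
        # pandoc is a standalone binary, so look it up on PATH.
        "pandoc": shutil.which("pandoc") is not None,
    }


if __name__ == "__main__":
    for name, ok in available_tools().items():
        print(f"{name:12} {'ok' if ok else 'missing'}")
```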
**FontBBox warnings during PDF conversion**

These are harmless font-parsing warnings; the output is still correct.
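Because the warnings are harmless, ignoring them is fine. When they clutter the logs of your own Python code that calls markitdown, a context manager like the one below can hide them. This assumes the messages come from pdfminer.six (used by markitdown for PDF text) through Python's standard `logging` module under the `pdfminer` logger hierarchy; `quiet_pdfminer` is a hypothetical helper, not part of either library.

```python
import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def quiet_pdfminer(level: int = logging.ERROR) -> Iterator[None]:
    """Temporarily raise the "pdfminer" logger's threshold.

    Assumption: FontBBox warnings are logged by pdfminer submodules, which
    inherit this threshold as long as they do not set their own level.
    """
    logger = logging.getLogger("pdfminer")
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)  # restore the original threshold
```

Usage: wrap the conversion call, e.g. `with quiet_pdfminer(): result = converter.convert("document.pdf")`.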
**Images missing from output**

Use Heavy Mode for better image preservation, or extract images separately with `scripts/extract_pdf_images.py`.
**Tables broken in output**

Use Heavy Mode, which selects the most complete version of each table, or check the result with `scripts/validate_output.py`.

## Bundled Scripts

| Script | Purpose |
|--------|---------|
| `convert.py` | Main orchestrator with Quick/Heavy mode |
| `merge_outputs.py` | Merge multiple markdown outputs |
| `validate_output.py` | Quality validation with HTML report |
| `extract_pdf_images.py` | PDF image extraction with metadata |
| `convert_path.py` | Windows to WSL path converter |

## References

- `references/heavy-mode-guide.md`: Detailed Heavy Mode documentation
- `references/tool-comparison.md`: Tool capabilities comparison
- `references/conversion-examples.md`: Batch operation examples