LiteParse Skill
Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.
Initial Setup
When this skill is invoked, respond with:
I'm ready to use LiteParse to parse files locally. Before we begin, please confirm that:
- @llamaindex/liteparse is installed globally (npm i -g @llamaindex/liteparse)
- The lit CLI command is available in your terminal
If both are set, please provide:
1. One or more files to parse (PDF, DOCX, PPTX, XLSX, images, etc.)
2. Any specific options: output format (json/text), page ranges, OCR preferences, DPI, etc.
3. What you'd like to do with the parsed content.
I will produce the appropriate lit CLI command or TypeScript script and, once you approve it, report the results.
Then wait for the user's input.
Step 0 — Install LiteParse (if needed)
If liteparse is not yet installed, install it globally:
npm i -g @llamaindex/liteparse
Verify installation:
lit --version
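The verification step above can also be scripted so it fails with a clear message. This is a minimal sketch, assuming only that the CLI is exposed as `lit` and that `lit --version` prints a version string; the function name `check_lit` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: check_lit is a hypothetical helper, not part of LiteParse.

# Report whether the lit CLI is on PATH; return non-zero if it is missing.
check_lit() {
  if command -v lit >/dev/null 2>&1; then
    echo "lit found: $(lit --version)"
  else
    echo "lit not found; install with: npm i -g @llamaindex/liteparse" >&2
    return 1
  fi
}
```

Calling `check_lit || exit 1` at the top of a script stops it before any parse command runs against a missing install.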
For Office document support (DOCX, PPTX, XLSX), LibreOffice is required:
macOS
brew install --cask libreoffice
Ubuntu/Debian
apt-get install libreoffice
For image parsing, ImageMagick is required:
macOS
brew install imagemagick
Ubuntu/Debian
apt-get install imagemagick
Step 1 — Produce the CLI Command or Script
Parse a Single File
Basic text extraction
lit parse document.pdf
JSON output saved to a file
lit parse document.pdf --format json -o output.json
Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"
Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr
Use an external HTTP OCR server for higher accuracy
lit parse document.pdf --ocr-server-url http://localhost:8828/ocr
Higher DPI for better quality
lit parse document.pdf --dpi 300
Batch Parse a Directory
lit batch-parse ./input-directory ./output-directory
Only process PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
Generate Page Screenshots
Screenshots are useful for LLM agents that need to see visual layout.
All pages
lit screenshot document.pdf -o ./screenshots
Specific pages
lit screenshot document.pdf --pages "1,3,5" -o ./screenshots
High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots
Page range
lit screenshot document.pdf --pages "1-10" -o ./screenshots
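The parse and screenshot commands above can be combined into one helper when an agent needs both the extracted content and the page images for the same pages. This is a minimal sketch, assuming `lit` is on PATH and that the flags shown in this section can be combined in a single call. The function names `valid_page_spec` and `parse_and_shoot` and the output layout (`parsed.json`, `screenshots/`) are hypothetical, and the page-spec grammar is inferred from the examples here (such as "1-5,10,15-20"), not taken from the CLI's documentation.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and output layout are hypothetical.

# Accept comma-separated page numbers or ranges, e.g. "1-5,10,15-20".
# The grammar is an assumption inferred from the examples in this document.
valid_page_spec() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

# Parse the given pages of a file to JSON, then screenshot the same pages.
# Usage: parse_and_shoot FILE PAGES OUT_DIR
parse_and_shoot() {
  local file=$1 pages=$2 out_dir=$3
  if ! valid_page_spec "$pages"; then
    echo "invalid page spec: $pages" >&2
    return 2
  fi
  mkdir -p "$out_dir/screenshots" || return 1
  # Screenshots are only taken if the parse step succeeds.
  lit parse "$file" --format json --target-pages "$pages" -o "$out_dir/parsed.json" &&
    lit screenshot "$file" --pages "$pages" -o "$out_dir/screenshots"
}
```

For example, `parse_and_shoot document.pdf "1-5,10" ./out` would write `./out/parsed.json` and place the page images under `./out/screenshots`. Validating the page spec first keeps a malformed range from reaching either command.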
Step 3 — Key Options Reference
OCR Options
Option
Description
(default)
Tesseract.js — zero setup, built-in
--ocr-language fra
Set OCR language (ISO code)
--ocr-server-url