GPU-optimized OCR skill using Surya - a modern, high-accuracy OCR engine.

When to Use

Extracting text from screenshots, photos, or scanned images
Processing PDFs with embedded images
Multi-language document OCR (90+ languages including Japanese)
Layout analysis and table detection
When GPU acceleration is available and desired

Key Features

| Feature | Details |
|---|---|
| Accuracy | Higher benchmark similarity than Tesseract (0.97 vs 0.88) |
| GPU Support | PyTorch-based, CUDA optimized |
| Languages | 90+ languages including CJK |
| Layout | Document layout analysis, table recognition |
| LaTeX | Inline math equation recognition |

Quick Start

Installation

Step 1: GPU Check

Before installing, check whether a GPU is available:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

⚠️ If CUDA = False but you have an NVIDIA GPU:

You have CPU-only PyTorch installed. Reinstall with CUDA support:

# Uninstall CPU version
pip uninstall torch torchvision torchaudio -y

# Install CUDA version (check your CUDA version with: nvidia-smi)
# CUDA 12.1 (recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# CUDA 11.8 (older GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

No GPU? Surya works on CPU too (slower, but functional).
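The choice between the two wheel indexes above can be written as a small lookup. This is a minimal sketch, assuming the "CUDA Version" reported by nvidia-smi is passed in as a string. `pick_torch_index_url` is a hypothetical helper (not part of PyTorch or Surya), and its thresholds simply mirror the two options listed above.

```python
from typing import Optional

# Wheel indexes taken from the install commands above.
CU121_INDEX = "https://download.pytorch.org/whl/cu121"
CU118_INDEX = "https://download.pytorch.org/whl/cu118"


def pick_torch_index_url(cuda_version: str) -> Optional[str]:
    """Map the "CUDA Version: X.Y" shown by nvidia-smi to a wheel index.

    Hypothetical helper: returns None when neither listed build is expected
    to fit, in which case the CPU build is the fallback.
    """
    major_str, _, minor_str = cuda_version.strip().partition(".")
    version = (int(major_str), int(minor_str or 0))
    if version >= (12, 1):
        return CU121_INDEX
    if version >= (11, 8):
        return CU118_INDEX
    return None


if __name__ == "__main__":
    url = pick_torch_index_url("12.4")
    print(f"pip install torch torchvision torchaudio --index-url {url}")
```

A driver reporting a newer CUDA version than the wheel was built for is generally fine, which is why the comparison uses "at least" rather than an exact match.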

Step 2: Install Surya

# Core OCR (includes pypdfium2 for PDF support)
pip install surya-ocr

Note: Surya includes pypdfium2 for PDF processing, so no external dependencies such as Poppler are required.

Basic Usage

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

# Load image
image = Image.open("document.png")

# Initialize predictors (auto-detects GPU)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# Run OCR
predictions = recognition_predictor([image], det_predictor=detection_predictor)

# Get text
for page in predictions:
    for line in page.text_lines:
        print(line.text)
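When the text is needed as data rather than printed, the loop above can be folded into a small function. This is a sketch that relies only on the `.text_lines` and `.text` attributes used in the example; `pages_to_text` is a hypothetical name, and the demo uses stand-in objects rather than real Surya output.

```python
from types import SimpleNamespace
from typing import Iterable, List


def pages_to_text(predictions: Iterable) -> List[str]:
    """Return one string per page, with recognized lines joined by newlines."""
    return ["\n".join(line.text for line in page.text_lines) for page in predictions]


if __name__ == "__main__":
    # Stand-in objects shaped like the predictions above (not real Surya output).
    fake_page = SimpleNamespace(
        text_lines=[SimpleNamespace(text="Hello"), SimpleNamespace(text="World")]
    )
    print(pages_to_text([fake_page])[0])
```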

CLI Usage

# OCR single image
surya_ocr image.png

# OCR with output to JSON
surya_ocr image.png --output_dir ./results

# Launch GUI (requires streamlit)
pip install streamlit
surya_gui
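The JSON written with `--output_dir` can be post-processed in Python. The sketch below assumes the file maps each input name to a list of page dictionaries, each with a `text_lines` list whose entries carry a `text` field. That layout and the file location should be checked against the installed Surya version, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def extract_text(results: Dict[str, list]) -> Dict[str, List[str]]:
    """Map each input name to a list of page texts.

    Assumed layout: {name: [{"text_lines": [{"text": ...}, ...]}, ...]}.
    """
    return {
        name: [
            "\n".join(line["text"] for line in page.get("text_lines", []))
            for page in pages
        ]
        for name, pages in results.items()
    }


def load_results(path: str) -> Dict[str, List[str]]:
    """Read a results JSON file (path supplied by the caller) and extract text."""
    return extract_text(json.loads(Path(path).read_text(encoding="utf-8")))
```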

Helper Script CLI

# Basic usage
python scripts/ocr_helper.py image.png

# With verbose logging
python scripts/ocr_helper.py image.png -v

# Specify languages and output file
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Disable OOM auto-retry
python scripts/ocr_helper.py large_image.png --no-retry

GPU Configuration

Surya auto-detects GPU. Adjust VRAM usage with environment variables:

| Variable | Default | Notes |
|---|---|---|
| RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM (e.g., 256 for 12GB) |
| DETECTOR_BATCH_SIZE | 36 | Reduce if OOM errors occur |

# Linux/macOS
export RECOGNITION_BATCH_SIZE=256
export DETECTOR_BATCH_SIZE=16
surya_ocr image.png

# Windows PowerShell
$env:RECOGNITION_BATCH_SIZE = 256
$env:DETECTOR_BATCH_SIZE = 16
surya_ocr image.png
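The same variables can be set from Python. This sketch picks values by VRAM tier: 512/36 are the defaults and 256 at 12 GB comes from the guidance above, while the 24 GB cutoff and the lowest tier are assumptions chosen for illustration. `batch_env_for_vram` is a hypothetical helper.

```python
import os
from typing import Dict

# (minimum VRAM in GB, recognition batch, detector batch), largest tier first.
# The 24 GB cutoff and the 128/8 tier are illustrative assumptions.
TIERS = [
    (24.0, 512, 36),
    (12.0, 256, 16),
    (0.0, 128, 8),
]


def batch_env_for_vram(vram_gb: float) -> Dict[str, str]:
    """Return environment variable values for the first tier that fits."""
    for min_gb, recognition, detector in TIERS:
        if vram_gb >= min_gb:
            return {
                "RECOGNITION_BATCH_SIZE": str(recognition),
                "DETECTOR_BATCH_SIZE": str(detector),
            }
    raise ValueError("vram_gb must be non-negative")


if __name__ == "__main__":
    # Set the variables before importing surya so they are seen at load time.
    os.environ.update(batch_env_for_vram(12.0))
    print(os.environ["RECOGNITION_BATCH_SIZE"])
```

Setting the variables before the first `import surya` is the safer order, on the assumption that Surya reads its settings when the package is loaded.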

OOM Auto-Retry

The helper script automatically retries with a reduced batch size when the GPU runs out of memory (OOM):

# Auto-retry enabled by default
text = ocr_image("large_image.png")  # Retries up to 3x

# Disable if you want manual control
text = ocr_image("large_image.png", auto_retry=False)
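The retry behaviour can be pictured as a loop that halves the batch size after each OOM error. This is a generic sketch of that pattern, not the helper script's actual code: `run_with_oom_retry` is a hypothetical name, halving is an assumed reduction strategy, and `MemoryError` stands in for the GPU error so the example runs without PyTorch.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_oom_retry(
    run: Callable[[int], T],
    batch_size: int,
    max_retries: int = 3,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> T:
    """Call run(batch_size); on an OOM error, halve the batch size and retry."""
    for attempt in range(max_retries + 1):
        try:
            return run(batch_size)
        except oom_errors:
            # Give up once retries are exhausted or the batch cannot shrink.
            if attempt == max_retries or batch_size <= 1:
                raise
            batch_size = max(1, batch_size // 2)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    def fake_ocr(batch_size: int) -> str:
        # Pretend anything above 128 exhausts memory.
        if batch_size > 128:
            raise MemoryError("simulated OOM")
        return f"ok at batch size {batch_size}"

    print(run_with_oom_retry(fake_ocr, 512))
```

With PyTorch installed, passing `oom_errors=(torch.cuda.OutOfMemoryError,)` would target CUDA OOM errors specifically.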

Use Cases

| Use case | How |
|---|---|
| Screenshot OCR | python scripts/ocr_helper.py screenshot.png |
| PDF Processing | ocr_pdf("document.pdf") → returns list of page texts |
| Batch Processing | ocr_batch(["img1.png", "img2.png"]) → returns dict |
| Japanese/CJK | Auto-detected, no config needed |

Scripts

| Script | Description |
|---|---|
| scripts/ocr_helper.py | Helper functions with OOM auto-retry, verbose logging, batch support |

Helper Script Features

| Name | Description |
|---|---|
| verbose | Enable detailed logging (-v in CLI) |
| auto_retry | Automatically reduce batch size on OOM (default: on) |
| ocr_image() | Single image OCR |
| ocr_pdf() | PDF OCR (all pages) |
| ocr_batch() | Batch OCR for multiple images |
| set_verbose() | Enable/disable logging programmatically |

Troubleshooting

GPU Not Detected (CUDA = False)

Symptom: CUDA available: False even with NVIDIA GPU

Cause: CPU-only PyTorch installed instead of CUDA version

Fix:

# 1. Check your CUDA version
nvidia-smi  # Look for "CUDA Version: X.X"

# 2. Reinstall PyTorch with CUDA
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify:

import torch
print(torch.cuda.is_available())  # Should be True
print(torch.cuda.get_device_name(0))  # Should show your GPU name

CUDA Out of Memory

Reduce batch sizes:

export RECOGNITION_BATCH_SIZE=128
export DETECTOR_BATCH_SIZE=8

CPU Fallback

If no GPU is available, Surya automatically falls back to the CPU (slower, but functional).

Model Download

The first run downloads the models (~2GB), so an internet connection is required.

References

Surya GitHub - Official repository
Surya Documentation - Usage guide
Benchmark Results - Accuracy comparisons

License Notice

This skill: CC BY-NC 4.0 (wrapper scripts only)

Surya (underlying OCR engine):

Code: GPL-3.0
Models: Free for research, personal use, and startups under $2M funding/revenue
Commercial use beyond $2M: See Surya Pricing
