pdf-pro

Installs: 37
Rank: #18903

Install

npx skills add https://github.com/yuniorglez/gemini-elite-core --skill pdf-pro

Skill: PDF Pro (Standard 2026)

Role: The PDF Pro is a specialized agent responsible for the entire lifecycle of document engineering. This includes "Semantic Extraction" using AI models, "High-Fidelity Generation" via headless browsers, and "Forensic Modification" using low-level byte manipulation. In 2026, the Squaads AI Core prioritizes Bun-native and JavaScript-first solutions for seamless integration with Next.js 16.2.

🎯 Primary Objectives

- Semantic Extraction: Move beyond raw text to structured JSON using LLM-assisted OCR and layout analysis.
- High-Fidelity Generation: Use Puppeteer/Playwright for pixel-perfect HTML-to-PDF conversion with CSS print support.
- PDF 2.0 Compliance: Implement AES-256 encryption, UTF-8 metadata, and accessible (Tagged) PDF structures.
- Edge-Ready Processing: Use lightweight libraries like unpdf for serverless and edge environments.

🏗️ The 2026 Toolbelt

1. Bun-Native & JS Libraries (Primary)

- pdf-lib: Byte-level modification, merging, splitting, and form filling.
- unpdf: Ultra-lightweight extraction for Edge/Serverless.
- Puppeteer/Playwright: The gold standard for generating PDFs from React templates.
- Mistral/OpenAI OCR: Semantic extraction for complex layouts and handwriting.

2. Forensic Utilities (Legacy/Advanced)

- qpdf: CLI tool for structural repairs and decryption.
- poppler-utils: Fast C-based text and image extraction.

🛠️ Implementation Patterns

1. High-Fidelity Generation (Next.js 16.2)

Generating PDFs from React components ensures visual consistency with the web app.

    // app/api/generate-pdf/route.ts
    import puppeteer from 'puppeteer';

    export async function POST(req: Request) {
      const { htmlContent } = await req.json();
      const browser = await puppeteer.launch({ headless: true });
      try {
        const page = await browser.newPage();
        // Wait until network activity settles so fonts and images are loaded.
        await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
        const pdfBuffer = await page.pdf({
          format: 'A4',
          printBackground: true,
          margin: { top: '20px', bottom: '20px' },
        });
        return new Response(pdfBuffer, {
          headers: { 'Content-Type': 'application/pdf' },
        });
      } finally {
        // Always release the browser, even if rendering throws.
        await browser.close();
      }
    }

2. AI-Driven Semantic Extraction

Using LLMs to turn unstructured PDF text into objects validated against Zod schemas.

    import { extractText, getDocumentProxy } from 'unpdf';
    import { generateObject } from 'ai'; // AI SDK

    // `myModel` and `invoiceSchema` (a Zod schema) are assumed to be defined elsewhere.
    async function extractInvoice(buffer: Buffer) {
      // unpdf expects a Uint8Array rather than a Node.js Buffer.
      const pdf = await getDocumentProxy(new Uint8Array(buffer));
      // mergePages: true returns the text as one string instead of one entry per page.
      const { text } = await extractText(pdf, { mergePages: true });
      const { object } = await generateObject({
        model: myModel,
        schema: invoiceSchema,
        prompt: `Extract structured data from this PDF text: ${text}`,
      });
      return object;
    }

🔒 PDF 2.0 Security & Integrity

AES-256 Encryption

PDF 2.0 deprecates weak algorithms. Use qpdf or modern JS wrappers for secure locking.
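For the JS-wrapper route, one option is to assemble the qpdf arguments in TypeScript and hand them to a process spawner. The following is a minimal sketch under stated assumptions: qpdf is installed and on the PATH, and `QpdfEncryptOptions` and `buildQpdfEncryptArgs` are hypothetical names introduced here, not part of qpdf or any npm package. The empty-owner-password guard is likewise an assumption of this sketch.

```typescript
// Hypothetical helper types and names; nothing here comes from a real library.
interface QpdfEncryptOptions {
  userPassword: string;
  ownerPassword: string;
  input: string;
  output: string;
}

// Builds the argument list for qpdf's positional encryption syntax:
//   qpdf --encrypt <user-password> <owner-password> <key-length> -- <in> <out>
function buildQpdfEncryptArgs(opts: QpdfEncryptOptions): string[] {
  // Sketch-level guard (an assumption of this example): refuse an empty owner password.
  if (opts.ownerPassword.length === 0) {
    throw new Error("An owner password is required to lock the document.");
  }
  return [
    "--encrypt",
    opts.userPassword,
    opts.ownerPassword,
    "256", // key length in bits; 256 selects AES-256
    "--",
    opts.input,
    opts.output,
  ];
}

const args = buildQpdfEncryptArgs({
  userPassword: "user-pass",
  ownerPassword: "owner-pass",
  input: "input.pdf",
  output: "secured.pdf",
});

// prints: qpdf --encrypt user-pass owner-pass 256 -- input.pdf secured.pdf
console.log(["qpdf", ...args].join(" "));
```

The resulting array can be passed to `execFile("qpdf", args)` from `node:child_process`, or to `Bun.spawn(["qpdf", ...args])`. Passing an argument array rather than a single shell string avoids quoting problems when passwords contain spaces or shell metacharacters.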

    # Secure a PDF with 2026 standards
    qpdf --encrypt user-pass owner-pass 256 -- input.pdf secured.pdf

Digital Signatures (PAdES)

Integrate with OIDC providers or Hardware Security Modules (HSMs) for legally binding signatures.

🚫 The "Do Not List" (Anti-Patterns)

- NEVER use pypdf for complex layout extraction; it fails on multi-column or overlapping text. Use pdfplumber or AI OCR.
- NEVER generate PDFs using canvas drawing commands if HTML/CSS templates are an option. Maintenance is a nightmare.
- NEVER store unencrypted PDFs containing PII (Personally Identifiable Information) in public S3 buckets.
- NEVER rely on window.print() for automated server-side generation. It is non-deterministic.

🛠️ Troubleshooting Guide

| Issue | Likely Cause | 2026 Corrective Action |
| --- | --- | --- |
| Missing Fonts | System fonts not in container | Use Puppeteer with embedded Google Fonts or WOFF2. |
| Garbled Text | Complex CID encoding | Use poppler with -enc UTF-8 or an AI-OCR layer. |
| Huge File Size | High-res images not optimized | Run a compression pass using ghostscript or pdf-lib scaling. |
| Form Filling Fails | Flattened PDF fields | Use pdf-lib to inspect AcroForm fields before writing. |

📚 Reference Library

- AI Extraction Patterns: Mastering semantic document understanding.
- High-Fidelity Generation: HTML-to-PDF at scale.
- Legacy Utilities: When to reach for Python/CLI tools.

📜 Standard Operating Procedure (SOP)

1. Requirement Check: Is the goal Creation, Extraction, or Modification?
2. Tool Selection: Creation -> Puppeteer. Extraction -> AI SDK + unpdf. Modification -> pdf-lib.
3. Environment Check: Is this running in an Edge Function? (If yes, avoid Puppeteer.)
4. Implementation: Build with strict TypeScript typing.
5. Audit: Verify PDF 2.0 metadata and accessibility (A11y) tags.

📈 Quality Metrics

- Extraction Accuracy: 98% (measured against ground-truth JSON).
- Generation Speed: < 2s for a 10-page document.
- Security Audit: Zero weak crypto algorithms (verified via qpdf).

🔄 Last Refactor Details

- By: Gemini Elite Conductor
- Date: January 22, 2026
- Version: 1.1.0 (2026 Standard)
- Focus: Shift from Python-centric to JS-centric AI-integrated document engineering.

End of PDF Pro Standard (v1.1.0)
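The Tool Selection and Environment Check steps of the SOP above can be sketched as a small lookup function. This is only an illustration: the `Goal` and `ToolChoice` types and the `selectTools` function are hypothetical names introduced here, and the warning text is an assumption. The goal-to-tool mapping itself is taken from the SOP.

```typescript
// Hypothetical types and function names, introduced only for this sketch.
type Goal = "creation" | "extraction" | "modification";

interface ToolChoice {
  tools: string[];
  warnings: string[];
}

// Maps an SOP goal to the tools the SOP names, and flags the one
// environment conflict the SOP calls out (Puppeteer in an Edge Function).
function selectTools(goal: Goal, isEdgeFunction: boolean): ToolChoice {
  const byGoal: Record<Goal, string[]> = {
    creation: ["Puppeteer"],
    extraction: ["AI SDK", "unpdf"],
    modification: ["pdf-lib"],
  };
  const warnings: string[] = [];
  if (isEdgeFunction && goal === "creation") {
    // The SOP only says to avoid Puppeteer here; it does not name a replacement.
    warnings.push("Puppeteer should be avoided in Edge Functions.");
  }
  // Copy the array so callers cannot mutate the lookup table.
  return { tools: [...byGoal[goal]], warnings };
}

console.log(selectTools("extraction", true).tools.join(" + ")); // prints: AI SDK + unpdf
console.log(selectTools("creation", true).warnings.length); // prints: 1
```

Returning warnings alongside the tool list, rather than throwing, leaves the decision about where to run generation with the caller.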
