Transform academic papers and content into professional slide deck images with automatic figure extraction.
Usage
/paper-slide-deck path/to/paper.pdf
/paper-slide-deck path/to/paper.pdf --style academic-paper
/paper-slide-deck path/to/content.md --style sketch-notes
/paper-slide-deck path/to/content.md --audience executives
/paper-slide-deck path/to/content.md --lang zh
/paper-slide-deck path/to/content.md --slides 10
/paper-slide-deck path/to/content.md --outline-only
/paper-slide-deck # Then paste content
Script Directory
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as SKILL_DIR
- Script path = ${SKILL_DIR}/scripts/<script-name>.ts (generate-slides.py is the one Python script)
- Replace all ${SKILL_DIR} in this document with the actual path
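The path-resolution steps above can be sketched in Python as follows. This is an illustrative helper, not part of the skill: the name `script_path` and the example skill location are hypothetical, and the function only joins paths without checking that the script exists.

```python
from pathlib import Path


def script_path(skill_dir: str, script_name: str) -> Path:
    """Return <skill_dir>/scripts/<script_name> for a script listed in this document."""
    return Path(skill_dir) / "scripts" / script_name


# Example with a made-up skill location (an assumption, not a real install path).
print(script_path("/opt/skills/paper-slide-deck", "detect-figures.ts"))
```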
Script Reference:
| Script | Purpose |
|--------|---------|
| scripts/generate-slides.py | Generate AI slides via Gemini API (Python) |
| scripts/merge-to-pptx.ts | Merge slides into PowerPoint |
| scripts/merge-to-pdf.ts | Merge slides into PDF |
| scripts/detect-figures.ts | Auto-detect figures/tables in PDF |
| scripts/extract-figure.ts | Extract figure from PDF page (uses PyMuPDF fallback) |
| scripts/apply-template.ts | Apply figure container template |
Options
| Option | Description |
|--------|-------------|
| --style <name> | Visual style (see Style Gallery) |
| --audience <type> | Target audience: beginners, intermediate, experts, executives, general |
| --lang <code> | Output language (en, zh, ja, etc.) |
| --slides <number> | Target slide count |
| --outline-only | Generate outline only, skip image generation |
Style Gallery
| Style | Description | Best For |
|-------|-------------|----------|
| academic-paper | Clean professional, precise charts | Conference talks, thesis defense |
| blueprint (Default) | Technical schematics, grid texture | Architecture, system design |
| chalkboard | Black chalkboard, colorful chalk | Education, tutorials, classroom |
| notion | SaaS dashboard, card-based layouts | Product demos, SaaS, B2B |
| bold-editorial | Magazine cover, bold typography, dark | Product launches, keynotes |
| corporate | Navy/gold, structured layouts | Investor decks, proposals |
| dark-atmospheric | Cinematic dark mode, glowing accents | Entertainment, gaming |
| editorial-infographic | Magazine explainers, flat illustrations | Tech explainers, research |
| fantasy-animation | Ghibli/Disney style, hand-drawn | Educational, storytelling |
| intuition-machine | Technical briefing, bilingual labels | Technical docs, academic |
| minimal | Ultra-clean, maximum whitespace | Executive briefings, premium |
| pixel-art | Retro 8-bit, chunky pixels | Gaming, developer talks |
| scientific | Academic diagrams, precise labeling | Biology, chemistry, medical |
| sketch-notes | Hand-drawn, warm & friendly | Educational, tutorials |
| vector-illustration | Flat vector, retro & cute | Creative, children's content |
| vintage | Aged-paper, historical styling | Historical, heritage, biography |
| watercolor | Hand-painted textures, natural warmth | Lifestyle, wellness, travel |
Auto Style Selection
| Content Signals | Selected Style |
|-----------------|----------------|
| paper, thesis, defense, conference, ieee, acm, icml, neurips, cvpr, acl, aaai, iclr | academic-paper |
| tutorial, learn, education, guide, intro, beginner | sketch-notes |
| classroom, teaching, school, chalkboard, blackboard | chalkboard |
| architecture, system, data, analysis, technical | blueprint |
| creative, children, kids, cute, illustration | vector-illustration |
| briefing, bilingual, infographic, concept | intuition-machine |
| executive, minimal, clean, simple, elegant | minimal |
| saas, product, dashboard, metrics, productivity | notion |
| investor, quarterly, business, corporate, proposal | corporate |
| launch, marketing, keynote, bold, impact, magazine | bold-editorial |
| entertainment, music, gaming, creative, atmospheric | dark-atmospheric |
| explainer, journalism, science communication | editorial-infographic |
| story, fantasy, animation, magical, whimsical | fantasy-animation |
| gaming, retro, pixel, developer, nostalgia | pixel-art |
| biology, chemistry, medical, pathway, scientific | scientific |
| history, heritage, vintage, expedition, historical | vintage |
| lifestyle, wellness, travel, artistic, natural | watercolor |
| Default | blueprint |
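The signal table above can be read as a keyword lookup. Below is a minimal sketch in Python, under stated assumptions: the document does not say how ties are resolved (for example, "creative" and "gaming" each appear in two rows), so this version simply returns the first matching row in table order and matches whole words only. The function name `select_style` is hypothetical.

```python
import re

# Rows copied from the Auto Style Selection table, in table order.
STYLE_SIGNALS = [
    ("paper thesis defense conference ieee acm icml neurips cvpr acl aaai iclr", "academic-paper"),
    ("tutorial learn education guide intro beginner", "sketch-notes"),
    ("classroom teaching school chalkboard blackboard", "chalkboard"),
    ("architecture system data analysis technical", "blueprint"),
    ("creative children kids cute illustration", "vector-illustration"),
    ("briefing bilingual infographic concept", "intuition-machine"),
    ("executive minimal clean simple elegant", "minimal"),
    ("saas product dashboard metrics productivity", "notion"),
    ("investor quarterly business corporate proposal", "corporate"),
    ("launch marketing keynote bold impact magazine", "bold-editorial"),
    ("entertainment music gaming creative atmospheric", "dark-atmospheric"),
    ("explainer journalism science-communication", "editorial-infographic"),
    ("story fantasy animation magical whimsical", "fantasy-animation"),
    ("gaming retro pixel developer nostalgia", "pixel-art"),
    ("biology chemistry medical pathway scientific", "scientific"),
    ("history heritage vintage expedition historical", "vintage"),
    ("lifestyle wellness travel artistic natural", "watercolor"),
]
DEFAULT_STYLE = "blueprint"


def select_style(text: str) -> str:
    """Return the style of the first table row with a whole-word signal in text."""
    lowered = text.lower()
    for signals, style in STYLE_SIGNALS:
        for signal in signals.split():
            phrase = signal.replace("-", " ")  # "science-communication" is a two-word signal
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                return style
    return DEFAULT_STYLE


print(select_style("A NeurIPS paper on transformers"))  # academic-paper
```

An explicit `--style` value would bypass this lookup entirely.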
Layout Gallery
Optional layout hints for individual slides. Specify in outline's // LAYOUT section.
Slide-Specific Layouts
| Layout | Description | Best For |
|--------|-------------|----------|
| title-hero | Large centered title + subtitle | Cover slides, section breaks |
| quote-callout | Featured quote with attribution | Testimonials, key insights |
| key-stat | Single large number as focal point | Impact statistics, metrics |
| split-screen | Half image, half text | Feature highlights, comparisons |
| icon-grid | Grid of icons with labels | Features, capabilities, benefits |
| two-columns | Content in balanced columns | Paired information, dual points |
| three-columns | Content in three columns | Triple comparisons, categories |
| image-caption | Full-bleed image + text overlay | Visual storytelling, emotional |
| agenda | Numbered list with highlights | Session overview, roadmap |
| bullet-list | Structured bullet points | Simple content, lists |
Infographic-Derived Layouts
| Layout | Description | Best For |
|--------|-------------|----------|
| linear-progression | Sequential flow left-to-right | Timelines, step-by-step |
| binary-comparison | Side-by-side A vs B | Before/after, pros-cons |
| comparison-matrix | Multi-factor grid | Feature comparisons |
| hierarchical-layers | Pyramid or stacked levels | Priority, importance |
| hub-spoke | Central node with radiating items | Concept maps, ecosystems |
| bento-grid | Varied-size tiles | Overview, summary |
| funnel | Narrowing stages | Conversion, filtering |
| dashboard | Metrics with charts/numbers | KPIs, data display |
| venn-diagram | Overlapping circles | Relationships, intersections |
| circular-flow | Continuous cycle | Recurring processes |
| winding-roadmap | Curved path with milestones | Journey, timeline |
| tree-branching | Parent-child hierarchy | Org charts, taxonomies |
| iceberg | Visible vs hidden layers | Surface vs depth |
| bridge | Gap with connection | Problem-solution |
Academic-Specific Layouts
| Layout | Description | Best For |
|--------|-------------|----------|
| paper-title | Title, authors, affiliations, venue | Conference paper cover |
| outline-agenda | Numbered section list with highlights | Talk structure overview |
| methods-diagram | Central architecture/pipeline diagram | Methods, system design |
| results-chart | Chart area + data annotations | Quantitative results |
| equation-focus | Centered equation + variable definitions | Mathematical derivations |
| qualitative-grid | 2x2 or 3x2 image comparison grid | Visual results, ablations |
| references-list | Numbered citation list | Key references slide |
| contributions | Numbered contribution points | Contributions summary |
Usage: Add Layout: <name> in slide's // LAYOUT section to guide visual composition.
Design Philosophy
This deck is designed for reading and sharing, not live presentation:
- Each slide must be self-explanatory without verbal commentary
- Structure content for logical flow when scrolling
- Include all necessary context within each slide
- Optimize for social media sharing and offline reading
File Management
Output Directory
Each session creates an independent directory named by content slug:
slide-deck/{topic-slug}/
├── source-{slug}.{ext} # Source files (text, images, etc.)
├── outline.md
├── outline-{style}.md # Style variant outlines
├── prompts/
│ └── 01-slide-cover.md, 02-slide-{slug}.md, ...
├── 01-slide-cover.png, 02-slide-{slug}.png, ...
├── {topic-slug}.pptx
└── {topic-slug}.pdf
Slug Generation:
- Extract main topic from content (2-4 words, kebab-case)
- Example: "Introduction to Machine Learning" → intro-machine-learning
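The kebab-case half of slug generation is mechanical and can be sketched as below. Picking and shortening the topic itself (for example, "Introduction to" becoming "intro") is a judgment call this sketch does not attempt. The helper name `to_kebab_slug` is hypothetical.

```python
import re


def to_kebab_slug(topic: str, max_words: int = 4) -> str:
    """Lowercase the topic, keep alphanumeric words, and join up to max_words with hyphens."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words[:max_words])


print(to_kebab_slug("Intro Machine Learning"))  # intro-machine-learning
```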
Conflict Resolution
If slide-deck/{topic-slug}/ already exists:
- Append timestamp: {topic-slug}-YYYYMMDD-HHMMSS
- Example: intro-ml exists → intro-ml-20260118-143052
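A minimal sketch of the conflict rule above, assuming the timestamp is taken from the local clock at the moment of the check. The function name `resolve_deck_dir` is hypothetical; it only computes the path and leaves directory creation to the caller.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_deck_dir(root: Path, topic_slug: str, now: Optional[datetime] = None) -> Path:
    """Return root/topic_slug, or a timestamped sibling if that directory already exists."""
    candidate = root / topic_slug
    if not candidate.exists():
        return candidate
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return root / f"{topic_slug}-{stamp}"


# Usage: deck_dir = resolve_deck_dir(Path("slide-deck"), "intro-ml")
```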
Source Files
Copy all sources with naming source-{slug}.{ext}:
- source-article.md (main text content)
- source-diagram.png (image from conversation)
- source-data.xlsx (additional file)
Multiple sources supported: text, images, files from conversation.
Workflow
Step 1: Analyze Content
- Save source content (if pasted, save as source.md)
- Follow references/analysis-framework.md for deep content analysis
- Determine style (use --style or auto-select from signals)
- Detect languages (source vs. user preference)
- Plan slide count (--slides or dynamic)
- For academic papers (PDF with figures): Run automatic figure detection:
npx -y bun ${SKILL_DIR}/scripts/detect-figures.ts --pdf source-paper.pdf --output figures.json
This outputs a JSON file with all detected figures/tables, their page numbers, and captions.
Step 2: Generate Outline Variants
- Generate 3 style variant outlines based on content analysis
- Follow references/outline-template.md for structure
- Auto-populate IMAGE_SOURCE for academic papers:
  - Read figures.json from Step 1
  - Map figures to slides using rules in references/analysis-framework.md Section 8
  - Automatically add // IMAGE_SOURCE blocks to appropriate slides:
    - Architecture/pipeline figures → Methods slides (Source: extract)
    - Results tables → Quantitative results slides (Source: extract)
    - Comparison images → Qualitative results slides (Source: extract)
    - Conceptual/simple diagrams → Leave for AI generation (Source: generate or omit)
- Save as outline-{style}.md for each variant
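The figure-to-slide mapping in this step can be pictured as a small lookup. The sketch below is illustrative only: the category labels and slide-type names are hypothetical, and the actual rules live in references/analysis-framework.md Section 8, which may classify figures differently.

```python
from typing import Optional, Tuple

# Hypothetical category labels mapped to (target slide type, Source value).
FIGURE_RULES = {
    "architecture": ("methods", "extract"),
    "pipeline": ("methods", "extract"),
    "results-table": ("quantitative-results", "extract"),
    "comparison": ("qualitative-results", "extract"),
    "conceptual": (None, "generate"),
}


def map_figure(kind: str) -> Tuple[Optional[str], str]:
    """Return (slide type, Source value); unknown kinds fall back to AI generation."""
    return FIGURE_RULES.get(kind, (None, "generate"))


print(map_figure("pipeline"))  # ('methods', 'extract')
```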
Step 3: User Confirmation
Single AskUserQuestion with all applicable options:
| Style variant | Always (3 options + custom)
| Language | Only if source ≠ user language
After selection:
- Copy selected outline-{style}.md to outline.md
- Regenerate in different language if requested
- User may edit outline.md for fine-tuning
If --outline-only, stop here.
Step 4: Generate Prompts
- Read references/base-prompt.md
- Combine with style instructions from outline
- Add slide-specific content
- If Layout: specified in outline, include layout guidance in prompt:
  - Reference layout characteristics for image composition
  - Example: Layout: hub-spoke → "Central concept in middle with related items radiating outward"
- Save to prompts/ directory
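The assembly order in Step 4 can be sketched as plain string concatenation. This is an assumption about shape, not the skill's actual prompt format: `build_prompt` and the "Layout guidance" label are hypothetical, and only the hub-spoke hint text comes from this document.

```python
from typing import Optional

# Only the hub-spoke wording is taken from this document; other layouts
# would need their own hint text.
LAYOUT_HINTS = {
    "hub-spoke": "Central concept in middle with related items radiating outward",
}


def build_prompt(base_prompt: str, style_instructions: str,
                 slide_content: str, layout: Optional[str] = None) -> str:
    """Join base prompt, style instructions, slide content, and an optional layout hint."""
    parts = [base_prompt.strip(), style_instructions.strip(), slide_content.strip()]
    hint = LAYOUT_HINTS.get(layout) if layout else None
    if hint:
        parts.append(f"Layout guidance ({layout}): {hint}")
    return "\n\n".join(parts)


print(build_prompt("BASE", "STYLE", "SLIDE", layout="hub-spoke"))
```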
Step 5: Image Generation Method Selection
Before generating images, ask user to choose generation method:
Use AskUserQuestion with options:
| 1 | Gemini API (Recommended) | Official Google API via Python. Requires GOOGLE_API_KEY env var.
| 2 | Gemini Web (Browser-based) | ⚠️ Uses reverse-engineered web API. No API key needed but may break.
Based on selection:
Option 1: Gemini API (Python)
- Verify API key: Check GOOGLE_API_KEY or GEMINI_API_KEY environment variable
- Run generation script:
python ${SKILL_DIR}/scripts/generate-slides.py <slide-deck-dir> --model gemini-3-pro-image-preview
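The API key check can be sketched as below. The order of precedence between the two variables is an assumption (GOOGLE_API_KEY first), since the document only says either may be set. The name `find_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def find_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty value among GOOGLE_API_KEY and GEMINI_API_KEY."""
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if find_api_key() is None:
    print("Set GOOGLE_API_KEY or GEMINI_API_KEY before running generate-slides.py")
```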
Script Features:
- Auto-installs google-genai package if missing
- Retry logic with exponential backoff (3 retries)
- Skips already-generated slides (> 10KB)
- Supports custom model via --model flag
- Outputs to slides/ subdirectory
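Two of the features above, skipping finished slides and retrying with exponential backoff, follow a common pattern. The sketch below illustrates that pattern and is not the code in generate-slides.py: the helper names, the 1-second base delay, and the doubling factor are assumptions.

```python
import time
from pathlib import Path

MIN_VALID_BYTES = 10 * 1024  # the "> 10KB" threshold from the feature list


def already_generated(image_path: Path) -> bool:
    """Treat an existing image larger than 10KB as a finished slide."""
    return image_path.is_file() and image_path.stat().st_size > MIN_VALID_BYTES


def with_backoff(task, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Run task(); on failure retry up to `retries` times, doubling the delay each time."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.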
Troubleshooting:
- If server disconnection errors occur, script auto-retries
- For persistent failures, re-run the script (it skips completed slides)
- Check API quota if many failures occur
Option 2: Gemini Web Skill
- Consent Check: Read consent file at:
  - Windows: $APPDATA/baoyu-skills/gemini-web/consent.json
  - macOS: ~/Library/Application Support/baoyu-skills/gemini-web/consent.json
  - Linux: ~/.local/share/baoyu-skills/gemini-web/consent.json
- If no consent or version mismatch, display disclaimer and ask:
⚠️ DISCLAIMER: This uses a reverse-engineered Gemini Web API (NOT official).
Risks: May break anytime, no support, possible account risk.
- For each slide, run:
npx -y bun ${GEMINI_WEB_SKILL_DIR}/scripts/main.ts \
--promptfiles prompts/01-slide-cover.md \
--image 01-slide-cover.png \
--sessionId slides-{topic-slug}-{timestamp}
Where GEMINI_WEB_SKILL_DIR = path to baoyu-danger-gemini-web skill directory.
- Proxy support: If user is in restricted network, prepend:
HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890
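For the consent check in this option, locating the file per platform can be sketched as follows. The function name `consent_path` is hypothetical. The sketch resolves the path only; the consent file's fields and the version comparison are not described in this document, so they are left out.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def consent_path(platform: str = sys.platform,
                 env: Mapping[str, str] = os.environ,
                 home: Optional[Path] = None) -> Path:
    """Return the consent.json location for Windows, macOS, or Linux."""
    home = home or Path.home()
    tail = Path("baoyu-skills") / "gemini-web" / "consent.json"
    if platform.startswith("win"):
        return Path(env["APPDATA"]) / tail
    if platform == "darwin":
        return home / "Library" / "Application Support" / tail
    return home / ".local" / "share" / tail


print(consent_path("linux", {}, Path("/home/user")))
```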
Step 5.5: Process IMAGE_SOURCE (Automatic Figure Extraction)
For academic presentations, IMAGE_SOURCE metadata was auto-populated in Step 2 based on figure detection from Step 1.
Automatic Execution:
- Parse outline to identify slides with Source: extract
- Create figures directory: mkdir -p figures
- For each extract slide, automatically:
  - Read the Figure number, Page, and Caption from metadata
  - Run figure extraction script:
    npx -y bun ${SKILL_DIR}/scripts/extract-figure.ts \
    --pdf source-paper.pdf \
    --page <page-number> \
    --output figures/figure-<N>.png
  - Run template application script:
    npx -y bun ${SKILL_DIR}/scripts/apply-template.ts \
    --figure figures/figure-<N>.png \
    --title "<slide-headline>" \
    --caption "Figure <N>: <caption-text>" \
    --output <NN>-slide-<slug>.png
  - Report: "Extracted: Figure N → slide NN"
- For slides with Source: generate (or no IMAGE_SOURCE):
  - Proceed to Step 6 for AI generation
Note: Source PDF must be saved as source-paper.pdf in output directory.
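When driving the extraction from a script, the extract-figure.ts command for one slide can be built as an argument list. This is a sketch under assumptions: the helper name `extract_figure_cmd` is hypothetical, and only the flags shown in this step (--pdf, --page, --output) are used.

```python
from typing import List


def extract_figure_cmd(skill_dir: str, page: int, figure_no: int,
                       pdf: str = "source-paper.pdf") -> List[str]:
    """Build the extract-figure.ts command line for one Source: extract slide."""
    if page < 1:
        raise ValueError("page numbers are 1-indexed")
    return [
        "npx", "-y", "bun", f"{skill_dir}/scripts/extract-figure.ts",
        "--pdf", pdf,
        "--page", str(page),
        "--output", f"figures/figure-{figure_no}.png",
    ]


# Usage (run from the deck's output directory):
#   import subprocess
#   subprocess.run(extract_figure_cmd(skill_dir, page=4, figure_no=2), check=True)
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.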
Troubleshooting:
- If figure detection missed a figure: manually add // IMAGE_SOURCE block to outline
- If wrong figure mapped: edit the Figure: and Page: values in outline
- If extraction fails: check PDF page number (1-indexed)
PyMuPDF Fallback for Page Extraction:
If extract-figure.ts fails with "Image or Canvas expected" error (common with complex PDFs), use PyMuPDF:
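from pathlib import Path

import fitz  # PyMuPDF (pip install pymupdf)

page_num = 3  # example value: the 1-indexed page from the outline's Page: field
Path("extracted").mkdir(exist_ok=True)  # pix.save() does not create the directory

doc = fitz.open("source-paper.pdf")
page = doc[page_num - 1]  # PyMuPDF pages are 0-indexed
mat = fitz.Matrix(3, 3)  # 3x scale for high resolution
pix = page.get_pixmap(matrix=mat)
pix.save(f"extracted/page-{page_num}.png")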
Then apply template using apply-template.ts.
Step 6: Generate Images
- Use selected method from Step 5
- Skip slides already processed in Step 5.5 (those with Source: extract)
- Generate session ID: slides-{topic-slug}-{timestamp}
- Generate each remaining slide with same session ID
- Report progress: "Generated X/N"
- Auto-retry once on generation failure
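The loop described in Step 6 can be sketched as below. Several details are assumptions: the timestamp format is borrowed from the conflict-resolution naming, N in the progress message is taken to be the number of remaining (non-extract) slides, and the slide dictionaries with "name" and "source" keys are a hypothetical in-memory form of the outline.

```python
from datetime import datetime
from typing import Callable, Dict, List


def make_session_id(topic_slug: str, now: datetime) -> str:
    """Format slides-{topic-slug}-{timestamp}; the timestamp layout is an assumption."""
    return f"slides-{topic_slug}-{now.strftime('%Y%m%d-%H%M%S')}"


def generate_remaining(slides: List[Dict[str, str]],
                       generate: Callable[[str, str], None],
                       session_id: str) -> List[str]:
    """Generate non-extract slides with one session ID, retrying each failure once."""
    todo = [s for s in slides if s.get("source") != "extract"]
    progress = []
    for i, slide in enumerate(todo, start=1):
        try:
            generate(slide["name"], session_id)
        except Exception:
            generate(slide["name"], session_id)  # single auto-retry; a second failure propagates
        progress.append(f"Generated {i}/{len(todo)}")
    return progress
```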
Step 7: Merge to PPTX and PDF
npx -y bun ${SKILL_DIR}/scripts/merge-to-pptx.ts <slide-deck-dir>
npx -y bun ${SKILL_DIR}/scripts/merge-to-pdf.ts <slide-deck-dir>
Step 8: Output Summary
Slide Deck Complete!
Topic: [topic]
Style: [style name]
Location: [directory path]
Slides: N total
- 01-slide-cover.png ✓ Cover
- 02-slide-intro.png ✓ Content
- ...
- {NN}-slide-back-cover.png ✓ Back Cover
Outline: outline.md
PPTX: {topic-slug}.pptx
PDF: {topic-slug}.pdf
Slide Modification
See references/modification-guide.md for:
- Edit single slide workflow
- Add new slide (with renumbering)
- Delete slide (with renumbering)
- File naming conventions
Image Generation Dependencies
Gemini API (Option 1 - Recommended)
Requires:
- GOOGLE_API_KEY or GEMINI_API_KEY environment variable
- Python 3.8+ with pip
- google-genai package (auto-installed by script)
Model: gemini-3-pro-image-preview (default)
Gemini Web Skill (Option 2)
Requires:
- baoyu-danger-gemini-web skill installed at .claude/skills/baoyu-danger-gemini-web
- Google Chrome browser with logged-in Google account
- User consent for reverse-engineered API disclaimer
PDF Figure Extraction
Requires:
- Primary: pdfjs-dist npm package (use legacy build for Node.js)
- Fallback: pymupdf Python package (more reliable for complex PDFs)
- canvas npm package for apply-template.ts
References
| File | Content |
|------|---------|
| references/analysis-framework.md | Deep content analysis for presentations |
| references/outline-template.md | Outline structure and STYLE_INSTRUCTIONS format |
| references/modification-guide.md | Edit, add, delete slide workflows |
| references/content-rules.md | Content and style guidelines |
| references/base-prompt.md | Base prompt for image generation |
| references/figure-container-template.md | Visual specs for extracted figure containers |
| references/styles/<style>.md | Full style specifications |
Notes
Image Generation
- Gemini API (Nano Banana Pro): Recommended. Stable, reliable, requires API key
- Gemini Web: No API key needed, but uses reverse-engineered API with account risk
- Generation time: 10-30 seconds per slide
- Auto-retry once on generation failure
- Maintain style consistency via session ID
Content Guidelines
- Use stylized alternatives for sensitive public figures
- Both methods use the same underlying Gemini model for image generation
Extension Support
Custom styles and configurations via EXTEND.md.
Check paths (priority order):
- .paper-skills/paper-slide-deck/EXTEND.md (project)
- ~/.paper-skills/paper-slide-deck/EXTEND.md (user)
If found, load before Step 1. Extension content overrides defaults.
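The priority order above amounts to a first-match search. A minimal sketch, assuming the project path is resolved against the current project root. The name `find_extend_file` is hypothetical, and how extension content is merged over the defaults is outside this sketch.

```python
from pathlib import Path
from typing import Optional


def find_extend_file(project_root: Path, home: Path) -> Optional[Path]:
    """Return the first existing EXTEND.md, checking project scope before user scope."""
    relative = Path(".paper-skills") / "paper-slide-deck" / "EXTEND.md"
    for base in (project_root, home):
        candidate = base / relative
        if candidate.is_file():
            return candidate
    return None


# Usage: extend = find_extend_file(Path.cwd(), Path.home())
```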