extract-moves-from-video


Install

npx skills add https://github.com/letta-ai/skills --skill extract-moves-from-video
Extract Moves From Video
Overview
This skill provides a systematic approach for extracting text commands from video recordings. Common use cases include extracting gameplay commands from text adventure games (like Zork), capturing terminal commands from screen recordings, or transcribing any typed input visible in video content.
Workflow
Step 1: Analyze the Source Video
Before processing, understand the video characteristics:
Determine video properties
Resolution, duration, frame rate
Identify text regions
Where commands appear on screen (e.g., after a prompt character like `>`)
Assess text style
Font type, color, background contrast (terminal text on dark backgrounds requires specific handling)
Check for audio
Determine if audio transcription could supplement OCR (verify audio contains relevant content before installing large packages like Whisper)
Understand typing patterns
Estimate how frequently new commands appear to inform frame sampling rate
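One way to collect the properties above in a single pass is `ffprobe`, which ships with FFmpeg. The sketch below assumes `ffprobe` is on the PATH; `probe_video` and `parse_probe` are hypothetical helper names, and the returned dictionary layout is just one convenient choice. It also reports whether an audio stream exists, which is a cheap first check before considering transcription.

```python
import json
import subprocess
from fractions import Fraction


def parse_probe(probe_json):
    """Turn ffprobe's JSON output into the properties needed for planning."""
    data = json.loads(probe_json)
    streams = data.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        raise ValueError("no video stream found")
    return {
        "width": int(video["width"]),
        "height": int(video["height"]),
        # r_frame_rate is a ratio string such as "30000/1001"
        "fps": float(Fraction(video["r_frame_rate"])),
        "duration": float(data["format"]["duration"]),
        "has_audio": any(s.get("codec_type") == "audio" for s in streams),
    }


def probe_video(path):
    """Run ffprobe on `path` and return resolution, frame rate, duration, and audio presence."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type,width,height,r_frame_rate:format=duration",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_probe(out)
```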
Step 2: Download and Prepare Video
Download the video using appropriate tools (`yt-dlp`, `youtube-dl`, or direct download)
Verify download integrity before proceeding
Extract video metadata to confirm its properties match expectations
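A minimal sketch of the download-and-verify steps, assuming `yt-dlp` and `ffmpeg` are installed and on the PATH. The `-f mp4` selector asks yt-dlp for the best format served as a single `.mp4` file; the integrity check decodes the whole file to FFmpeg's null muxer and treats any reported error as a failed download. The helper names are hypothetical.

```python
import os
import subprocess


def download(url, out_path="video.mp4"):
    # "-f mp4" selects the best format available as a single .mp4 file
    subprocess.run(["yt-dlp", "-f", "mp4", "-o", out_path, url], check=True)


def is_intact(path):
    """Return True if the file exists, is non-empty, and decodes without ffmpeg errors."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # Decode every frame to the null muxer; corruption shows up as errors on stderr.
    proc = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"],
        capture_output=True, text=True,
    )
    return proc.returncode == 0 and not proc.stderr.strip()
```

If `is_intact` returns False, re-download before extracting any frames.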
Step 3: Extract Frames Strategically
Frame extraction requires balancing coverage against processing time:
Analyze command frequency first
Manually review a sample of the video to understand how often new commands appear
Choose an appropriate sampling rate:
Fast typing: 0.5-1 second intervals
Slow typing: 2-3 second intervals
When uncertain, extract at higher frequency and subsample later (avoids re-extraction)
Use FFmpeg for extraction (it does not create the output directory, so create it first):
```bash
mkdir -p frames
ffmpeg -i video.mp4 -vf "fps=1" frames/frame_%04d.png
```
Focus on relevant screen regions
If commands appear in a specific area, crop frames to that region to improve OCR accuracy
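The cropping and subsampling advice can be combined into one helper that builds the FFmpeg command. This is a sketch under assumptions: the crop box `(w, h, x, y)` is in pixels and must be measured from a sample frame, the helper names are hypothetical, and `frame_time` only approximates each frame's timestamp.

```python
import os
import subprocess


def build_ffmpeg_command(video, out_dir, fps=1, crop=None):
    """Build an ffmpeg frame-extraction command; crop is (w, h, x, y) in pixels or None."""
    filters = [f"fps={fps}"]
    if crop is not None:
        w, h, x, y = crop
        filters.append(f"crop={w}:{h}:{x}:{y}")  # keep only the command region
    return ["ffmpeg", "-i", video, "-vf", ",".join(filters),
            os.path.join(out_dir, "frame_%04d.png")]


def extract_frames(video, out_dir, fps=1, crop=None):
    os.makedirs(out_dir, exist_ok=True)  # ffmpeg does not create the output directory
    subprocess.run(build_ffmpeg_command(video, out_dir, fps, crop), check=True)


def subsample(frame_paths, extracted_fps, target_interval_s):
    """Keep every k-th frame so the spacing is roughly target_interval_s seconds."""
    step = max(1, round(extracted_fps * target_interval_s))
    return frame_paths[::step]


def frame_time(index, fps):
    """Approximate timestamp in seconds of frame_%04d number `index` (ffmpeg numbers from 1)."""
    return (index - 1) / fps
```

Extracting once at a higher rate (e.g., `fps=2`) and then calling `subsample(paths, 2, 1.0)` gives a one-second view without re-running FFmpeg.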
Step 4: Optimize OCR Configuration
OCR accuracy depends heavily on proper configuration for the specific video type:
Test on sample frames first
Before processing all frames, tune OCR settings on 5-10 representative frames
Configure the Tesseract page segmentation mode (`--psm`):
--psm 6
Assume uniform block of text
--psm 7
Single text line
--psm 13
Raw line (treat as single line, no analysis)
Preprocess images for better OCR:
Binarization
Convert to black/white with appropriate threshold
Invert colors if text is light on a dark background
Increase contrast for low-contrast videos
Scale up small text (2x-3x enlargement often helps)
Test multiple threshold values
Common values (127, 150, 180) behave differently from video to video; test empirically to see which gives the best results.
Example preprocessing with Python/OpenCV:
```python
import cv2

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
if img is None:  # imread returns None instead of raising on a bad path
    raise FileNotFoundError('frame.png')

# Invert if light text on dark background
img = cv2.bitwise_not(img)

# Binarize with tested threshold
_, img = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)

# Scale up for better OCR
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# Save the result for OCR (output filename is illustrative)
cv2.imwrite('frame_preprocessed.png', img)
```
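To tune the threshold and `--psm` mode on a handful of sample frames, one option is a small grid search scored by Tesseract's mean word confidence. Using confidence as the score is an assumption of this sketch, not a guarantee of correct text, so eyeball the winning setting's output before committing to it. It assumes `opencv-python` and `pytesseract` are installed; `sweep_settings` and `mean_confidence` are hypothetical names. Unlike the single-frame example, it scales before thresholding so interpolated gray edges are binarized too.

```python
from itertools import product


def mean_confidence(data):
    """Average word confidence from pytesseract's image_to_data dict, ignoring non-word boxes."""
    confs = [float(c) for c, t in zip(data["conf"], data["text"])
             if float(c) >= 0 and str(t).strip()]  # -1 marks layout boxes with no text
    return sum(confs) / len(confs) if confs else 0.0


def sweep_settings(paths, thresholds=(127, 150, 180), psms=(6, 7, 13), invert=True, scale=2):
    """Score every (threshold, psm) pair on sample frames; return the best pair and all scores."""
    import cv2  # imported here so mean_confidence() works without OpenCV/pytesseract installed
    import pytesseract

    if not paths:
        raise ValueError("need at least one sample frame")
    grays = []
    for path in paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            raise FileNotFoundError(path)
        if invert:
            gray = cv2.bitwise_not(gray)
        grays.append(cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC))

    scores = {}
    for thr, psm in product(thresholds, psms):
        total = 0.0
        for gray in grays:
            _, binary = cv2.threshold(gray, thr, 255, cv2.THRESH_BINARY)
            data = pytesseract.image_to_data(
                binary, config=f"--psm {psm}", output_type=pytesseract.Output.DICT
            )
            total += mean_confidence(data)
        scores[(thr, psm)] = total / len(grays)
    return max(scores, key=scores.get), scores
```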
Step 5: Extract and Parse Commands
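Run OCR on preprocessed frames
Use the Python wrapper (`pytesseract`) rather than hand-rolled subprocess calls; it accepts in-memory images and returns parsed output, although it still runs the `tesseract` binary under the hood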
Identify command patterns
Look for prompt markers (e.g., `>` or `$`) that precede commands
Handle OCR output carefully:
Do not assume commands start at the beginning of a line (OCR introduces stray whitespace)
Account for partial prompt character recognition (e.g., `>` may become `›` or `»`)
Use flexible pattern matching:
```python
import re

# More robust than grep "^>": use command_pattern.search(line), not .match(),
# so leading whitespace and common OCR misreads of ">" are tolerated
command_pattern = re.compile(r'[>›»]\s*(.+)')
```
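The OCR and parsing steps can be sketched as two small functions: one that runs `pytesseract.image_to_string` on each frame path, and one that applies the flexible prompt pattern line by line and normalizes whitespace. The function names are hypothetical, and `--psm 6` is only a placeholder default for whatever the tuning step selects.

```python
import re

# Flexible prompt pattern: ">" or a common OCR misread of it, then the command text
PROMPT = re.compile(r'[>›»]\s*(.+)')


def commands_in_frame(ocr_text):
    """Return every prompted command in one frame's OCR text, top to bottom."""
    found = []
    for line in ocr_text.splitlines():
        m = PROMPT.search(line)  # search(), not match(): OCR may add leading spaces
        if m:
            cmd = " ".join(m.group(1).split())  # collapse OCR-inserted runs of whitespace
            if cmd:
                found.append(cmd)
    return found


def ocr_frames(paths, psm=6):
    """Run Tesseract on each frame file and return the raw text per frame."""
    import pytesseract  # imported here so commands_in_frame() works on saved OCR text alone
    return [pytesseract.image_to_string(p, config=f"--psm {psm}") for p in paths]
```

Keeping OCR separate from parsing means the slow OCR output can be saved once and re-parsed as the pattern is refined.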
Step 6: Clean and Deduplicate Results
Critical
Understand the data domain before cleaning:
Preserve legitimate duplicates
In many contexts (games, shell sessions), the same command can appear multiple times intentionally
Use temporal deduplication
Only remove duplicates from consecutive frames showing the same command, not all duplicates globally
Handle partial commands
Commands being typed appear partially; only capture complete commands
Validate corrections
When fixing OCR errors, verify corrections are contextually appropriate
Temporal deduplication approach:
```python
def temporal_dedupe(commands):
    """Remove only consecutive duplicates, preserving repeated commands."""
    result = []
    prev = None
    for cmd in commands:
        if cmd != prev:
            result.append(cmd)
        prev = cmd
    return result
```
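One way to combine "only capture complete commands" with temporal deduplication: treat a prompt line as complete only when something (game output or a fresh prompt) appears below it, since the bottom line of the screen may still be mid-typing, then collapse consecutive repeats. This sketch assumes commands scroll downward as in a typical terminal and that frames are sampled fast enough for at most one command to complete between frames; the helper names are hypothetical. `itertools.groupby` collapses only consecutive duplicates, the same behavior as `temporal_dedupe`.

```python
import re
from itertools import groupby

PROMPT = re.compile(r'[>›»]\s*(.+)')


def last_complete_command(ocr_text):
    """Return the last prompted command that has something below it, or None."""
    lines = [ln for ln in ocr_text.splitlines() if ln.strip()]
    last = None
    # The bottom non-empty line has nothing after it yet, so it may be mid-typing: skip it.
    for line in lines[:-1]:
        m = PROMPT.search(line)
        if m:
            cmd = " ".join(m.group(1).split())
            if cmd:
                last = cmd
    return last


def clean_commands(frame_texts):
    """One complete command per frame, then collapse consecutive repeats only."""
    per_frame = [c for c in (last_complete_command(t) for t in frame_texts) if c]
    return [cmd for cmd, _ in groupby(per_frame)]
```

A command genuinely entered twice in a row will still collapse to one entry, so check the game's responses around any suspicious gap.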
Step 7: Verify Results
Verification is essential for accuracy:
Sample verification
Manually compare extracted commands against source frames for a random sample
Domain validation
If extracting game commands, verify they are valid commands for that game
Sequence logic check
Verify the command sequence makes logical sense (e.g., movement commands follow plausible paths)
Count verification
Compare total extracted commands against expected count based on video length and typing speed
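The sampling, count, and domain checks can be partly automated; the sequence-logic check still needs a human. In this sketch the typing-pace bounds and the verb vocabulary are inputs you supply for the specific video and game (nothing here knows any game's real command set), and the function names are hypothetical. A fixed random seed makes the review sample reproducible.

```python
import random


def review_sample(records, k=10, seed=0):
    """Pick k (frame_index, command) records at random, returned in video order."""
    rng = random.Random(seed)
    return sorted(rng.sample(records, min(k, len(records))))


def count_is_plausible(n_commands, duration_s, min_gap_s, max_gap_s):
    """Check the count against bounds implied by the fastest and slowest plausible pace."""
    return duration_s / max_gap_s <= n_commands <= duration_s / min_gap_s


def unknown_verbs(commands, known_verbs):
    """Commands whose first word is outside the domain vocabulary (likely OCR errors)."""
    return [c for c in commands if not c.split() or c.split()[0].lower() not in known_verbs]
```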
Common Pitfalls
OCR Quality Issues
Mistake
Using default OCR settings without optimization
Solution
Always tune the `--psm` mode and image preprocessing on sample frames first
Incorrect Deduplication
Mistake
Using global deduplication (e.g., `awk '!seen[$0]++'`), which removes all repeated commands
Solution
Use temporal deduplication that only removes consecutive duplicates
Prompt Detection Failures
Mistake
Using rigid patterns like `grep "^>"` that assume specific formatting
Solution
Use flexible regex that accounts for OCR variations and whitespace
Wasted Tool Installation
Mistake
Installing large packages (Whisper for audio) without verifying they're needed
Solution
Check if audio contains useful content before installing audio processing tools
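Before installing Whisper, FFmpeg's `volumedetect` audio filter gives a quick loudness reading. This sketch assumes `ffmpeg` is on the PATH; the -50 dB silence cutoff and the helper names are assumptions, and loud audio may still contain no speech (it could be music), so a short listen is still worthwhile.

```python
import re
import subprocess

MAX_VOLUME = re.compile(r"max_volume:\s*(-?[\d.]+|-inf) dB")


def parse_max_volume(ffmpeg_stderr):
    """Extract max_volume (dB) from volumedetect's log output, or None if absent."""
    m = MAX_VOLUME.search(ffmpeg_stderr)
    return float(m.group(1)) if m else None


def audio_max_volume(video):
    # -vn drops video; volumedetect reports its statistics on stderr
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", video, "-vn", "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    return parse_max_volume(proc.stderr)


def audio_worth_transcribing(max_volume_db, silence_db=-50.0):
    """Treat missing or near-silent audio as not worth installing a transcriber for."""
    return max_volume_db is not None and max_volume_db > silence_db
```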
No Intermediate Checkpointing
Mistake
Processing all frames without saving intermediate results, losing progress on timeouts
Solution
Save results after each processing stage; implement progress checkpoints
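A minimal checkpointing sketch for the OCR stage: results are written to a JSON file every `save_every` frames, and a rerun skips frames already in the file. The file name, batch size, and function names are hypothetical; `ocr_fn` stands in for whatever OCR call you use. Writing to a temporary file and then calling `os.replace` keeps a half-written checkpoint from replacing a good one if the process is killed mid-write.

```python
import json
import os


def _save(results, checkpoint):
    tmp = checkpoint + ".tmp"
    with open(tmp, "w") as f:
        json.dump(results, f)
    os.replace(tmp, checkpoint)  # swap in the new checkpoint only after a complete write


def ocr_with_checkpoint(frame_paths, ocr_fn, checkpoint="ocr_checkpoint.json", save_every=25):
    """Run ocr_fn on each frame, resuming from any results already saved in `checkpoint`."""
    results = {}
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            results = json.load(f)
    pending = [p for p in frame_paths if p not in results]
    for i, path in enumerate(pending, 1):
        results[path] = ocr_fn(path)
        if i % save_every == 0:
            _save(results, checkpoint)
    _save(results, checkpoint)
    return [results[p] for p in frame_paths]
```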
Abandoned Verification
Mistake
Not validating extracted commands against source material
Solution
Always verify a sample of extractions and validate the overall sequence logic
Verification Checklist
Before finalizing extracted commands:
Sample of extracted commands verified against source frames
Command count is reasonable for video duration
No obvious OCR artifacts remain (random characters, split words)
Legitimate repeated commands are preserved (not incorrectly deduplicated)
Command sequence follows logical order
Domain-specific validation performed (e.g., commands are valid for the game/application)
Tool Selection Guide

| Task | Recommended Tool | Notes |
|---|---|---|
| Video download | yt-dlp | More actively maintained than youtube-dl |
| Frame extraction | ffmpeg | Industry standard, reliable |
| OCR | tesseract via pytesseract | Convenient Python wrapper; still calls the tesseract binary |
| Image preprocessing | OpenCV (`cv2`) | Flexible, well-documented |
| Pattern matching | Python `re` module | More flexible than grep |