Extract Moves From Video

Overview

This skill provides a systematic approach for extracting text commands from video recordings. Common use cases include extracting gameplay commands from text adventure games (like Zork), capturing terminal commands from screen recordings, or transcribing any typed input visible in video content.

Workflow

Step 1: Analyze the Source Video

Before processing, understand the video characteristics:

Determine video properties

Resolution, duration, frame rate

Identify text regions

Where commands appear on screen (e.g., after a prompt character like

>

)

Assess text style

Font type, color, background contrast (terminal text on dark backgrounds requires specific handling)

Check for audio

Determine if audio transcription could supplement OCR (verify audio contains relevant content before installing large packages like Whisper)

Understand typing patterns

Estimate how frequently new commands appear to inform frame sampling rate

Step 2: Download and Prepare Video

Download video

using appropriate tools (

yt-dlp

,

youtube-dl

, or direct download)

Verify download integrity

before proceeding

Extract video metadata

to confirm properties match expectations

Step 3: Extract Frames Strategically

Frame extraction requires balancing coverage against processing time:

Analyze command frequency first

Manually review a sample of the video to understand how often new commands appear

Choose appropriate sampling rate

:

Fast typing: 0.5-1 second intervals

Slow typing: 2-3 second intervals

When uncertain, extract at higher frequency and subsample later (avoids re-extraction)

Use FFmpeg for extraction

:

ffmpeg

-i

video.mp4

-vf

"fps=1"

frames/frame_%04d.png

Focus on relevant screen regions

If commands appear in a specific area, crop frames to that region to improve OCR accuracy

Step 4: Optimize OCR Configuration

OCR accuracy depends heavily on proper configuration for the specific video type:

Test on sample frames first

Before processing all frames, tune OCR settings on 5-10 representative frames

Configure Tesseract page segmentation mode (

--psm

)

:

--psm 6

Assume uniform block of text

--psm 7

Single text line

--psm 13

Raw line (treat as single line, no analysis)

Preprocess images for better OCR

:

Binarization

Convert to black/white with appropriate threshold
Invert colors
if text is light on dark background
Increase contrast
for low-contrast videos
Scale up
small text (2x-3x enlargement often helps)
Test multiple threshold values: Common values (127, 150, 180) work differently depending on video; empirically test which produces best results Example preprocessing with Python/OpenCV: import cv2 img = cv2 . imread ( 'frame.png' , cv2 . IMREAD_GRAYSCALE )

Invert if light text on dark background

img

cv2 . bitwise_not ( img )

Binarize with tested threshold

_ , img = cv2 . threshold ( img , 150 , 255 , cv2 . THRESH_BINARY )

Scale up for better OCR

img

cv2

.

resize

(

img

,

None

,

fx

=

2

,

fy

=

2

,

interpolation

=

cv2

.

INTER_CUBIC

)

Step 5: Extract and Parse Commands

Run OCR on preprocessed frames

Use Python bindings (
pytesseract
) for efficiency over subprocess calls
Identify command patterns: Look for prompt markers (e.g.,

, $ ,

) that precede commands Handle OCR output carefully : Do not assume commands start at line beginning (OCR introduces whitespace) Account for partial prompt character recognition (e.g.,

may become › or » ) Use flexible pattern matching :

More robust than grep "^>"

import

re

command_pattern

=

re

.

compile

(

r'[>›»]\s*(.+)'

)

Step 6: Clean and Deduplicate Results

Critical

Understand the data domain before cleaning:

Preserve legitimate duplicates

In many contexts (games, shell sessions), the same command can appear multiple times intentionally

Use temporal deduplication

Only remove duplicates from consecutive frames showing the same command, not all duplicates globally

Handle partial commands

Commands being typed appear partially; only capture complete commands

Validate corrections

When fixing OCR errors, verify corrections are contextually appropriate

Temporal deduplication approach:

def

temporal_dedupe

(

commands

)

:

"""Remove only consecutive duplicates, preserving repeated commands."""

result

=

[

]

prev

=

None

for

cmd

in

commands

:

if

cmd

!=

prev

:

result

.

append

(

cmd

)

prev

=

cmd

return

result

Step 7: Verify Results

Verification is essential for accuracy:

Sample verification

Manually compare extracted commands against source frames for a random sample

Domain validation

If extracting game commands, verify they are valid commands for that game

Sequence logic check

Verify the command sequence makes logical sense (e.g., movement commands follow plausible paths)

Count verification

Compare total extracted commands against expected count based on video length and typing speed

Common Pitfalls

OCR Quality Issues

Mistake

Using default OCR settings without optimization

Solution

Always tune

--psm

mode and image preprocessing on sample frames first

Incorrect Deduplication

Mistake

Using global deduplication (e.g.,

awk '!seen[$0]++'

) which removes all repeated commands

Solution

Use temporal deduplication that only removes consecutive duplicates

Prompt Detection Failures

Mistake

Using rigid patterns like

grep "^>"

that assume specific formatting

Solution

Use flexible regex that accounts for OCR variations and whitespace

Wasted Tool Installation

Mistake

Installing large packages (Whisper for audio) without verifying they're needed

Solution

Check if audio contains useful content before installing audio processing tools

No Intermediate Checkpointing

Mistake

Processing all frames without saving intermediate results, losing progress on timeouts

Solution

Save results after each processing stage; implement progress checkpoints

Abandoned Verification

Mistake

Not validating extracted commands against source material
Solution: Always verify a sample of extractions and validate overall sequence logic Verification Checklist Before finalizing extracted commands: Sample of extracted commands verified against source frames Command count is reasonable for video duration No obvious OCR artifacts remain (random characters, split words) Legitimate repeated commands are preserved (not incorrectly deduplicated) Command sequence follows logical order Domain-specific validation performed (e.g., commands are valid for the game/application) Tool Selection Guide Task Recommended Tool Notes Video download yt-dlp More maintained than youtube-dl Frame extraction ffmpeg Industry standard, reliable OCR tesseract via pytesseract Use Python bindings for efficiency Image preprocessing OpenCV ( cv2 ) Flexible, well-documented Pattern matching Python re module More flexible than grep

extract-moves-from-video

安装

Invert if light text on dark background

img

Binarize with tested threshold

Scale up for better OCR

img

More robust than grep "^>"