OpenCV - Computer Vision and Image Processing OpenCV (Open Source Computer Vision Library) is the de facto standard library for computer vision tasks. It provides 2500+ optimized algorithms for real-time image and video processing, from basic operations like reading images to advanced tasks like face recognition and 3D reconstruction. When to Use Reading, writing, and displaying images and videos from files or cameras. Image preprocessing (resizing, cropping, rotating, color conversion). Edge detection (Canny, Sobel) and contour finding. Feature detection and matching (SIFT, ORB, AKAZE). Object detection (Haar Cascades, HOG, DNN module for YOLO/SSD). Face detection and recognition. Image segmentation (thresholding, watershed, GrabCut). Video analysis (motion detection, object tracking, optical flow). Camera calibration and 3D reconstruction. Image stitching and panorama creation. Real-time applications requiring fast performance. Reference Documentation Official docs : https://docs.opencv.org/4.x/ GitHub : https://github.com/opencv/opencv Tutorials : https://docs.opencv.org/4.x/d9/df8/tutorial_root.html Search patterns : cv2.imread , cv2.cvtColor , cv2.Canny , cv2.findContours , cv2.VideoCapture Core Principles Image as NumPy Array OpenCV represents images as NumPy arrays with shape (height, width, channels). This allows seamless integration with NumPy operations and other scientific Python libraries. BGR Color Space (Not RGB!) OpenCV uses BGR (Blue-Green-Red) instead of RGB by default. This is critical to remember when displaying images or integrating with other libraries. In-Place vs Copy Operations Many OpenCV functions modify images in-place for performance. Understanding when copies are made is essential for efficient code. C++ Performance in Python OpenCV is written in optimized C++, making it extremely fast even when called from Python. Avoid Python loops when OpenCV vectorized operations exist. Quick Reference Installation
Basic OpenCV
pip install opencv-python
With contrib modules (SIFT, SURF, etc.)
pip install opencv-contrib-python
Headless (no GUI, for servers)
pip install opencv-python-headless Standard Imports import cv2 import numpy as np import matplotlib . pyplot as plt Basic Pattern - Read, Process, Display import cv2
1. Read image
img
cv2 . imread ( 'image.jpg' )
2. Process (convert to grayscale)
gray
cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
3. Display
cv2 . imshow ( 'Grayscale' , gray ) cv2 . waitKey ( 0 )
Wait for key press
cv2 . destroyAllWindows ( ) Basic Pattern - Video Processing import cv2
1. Open video capture
cap
cv2 . VideoCapture ( 0 )
0 = default camera, or 'video.mp4'
while True :
2. Read frame
ret , frame = cap . read ( ) if not ret : break
3. Process frame
gray
cv2 . cvtColor ( frame , cv2 . COLOR_BGR2GRAY )
4. Display
cv2 . imshow ( 'Video' , gray )
5. Exit on 'q' key
if cv2 . waitKey ( 1 ) & 0xFF == ord ( 'q' ) : break
6. Cleanup
cap . release ( ) cv2 . destroyAllWindows ( ) Critical Rules ✅ DO Check Image Loaded - Always verify img is not None after cv2.imread() to catch file errors. Use cv2.cvtColor() for Color Conversion - Don't manually rearrange channels; use the provided conversion codes. Release Resources - Always call cap.release() and cv2.destroyAllWindows() when done with video/windows. Copy Before Modifying - Use img.copy() if you need to preserve the original image. Use Appropriate Data Types - Keep images as uint8 (0-255) for display, convert to float32 (0-1) for mathematical operations. Validate VideoCapture - Check cap.isOpened() before reading frames. Use BGR2RGB for Matplotlib - Convert BGR to RGB when displaying with matplotlib. Vectorize Operations - Use OpenCV's built-in functions instead of Python loops over pixels. ❌ DON'T Don't Assume RGB - OpenCV uses BGR by default; convert to RGB for matplotlib or PIL. Don't Forget waitKey() - Without cv2.waitKey() , windows won't display properly. Don't Mix PIL and OpenCV Directly - Convert between them explicitly (OpenCV uses BGR, PIL uses RGB). Don't Process Video in Memory - Process frame-by-frame to avoid memory issues with large videos. Don't Use Python Loops for Pixels - This is 100x slower than vectorized operations. Don't Hardcode Paths - Use os.path.join() or pathlib for cross-platform compatibility. Anti-Patterns (NEVER) import cv2 import numpy as np
❌ BAD: Not checking if image loaded
img
cv2 . imread ( 'image.jpg' ) gray = cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
Crashes if file doesn't exist!
✅ GOOD: Always validate
img
cv2 . imread ( 'image.jpg' ) if img is None : raise FileNotFoundError ( "Image not found" ) gray = cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
❌ BAD: Using Python loops for pixel manipulation
for i in range ( img . shape [ 0 ] ) : for j in range ( img . shape [ 1 ] ) : img [ i , j ] = img [ i , j ] * 0.5
Extremely slow!
✅ GOOD: Vectorized NumPy operations
img
( img * 0.5 ) . astype ( np . uint8 )
❌ BAD: Displaying BGR image with matplotlib
plt . imshow ( img )
Colors will be wrong!
✅ GOOD: Convert to RGB first
img_rgb
cv2 . cvtColor ( img , cv2 . COLOR_BGR2RGB ) plt . imshow ( img_rgb )
❌ BAD: Not releasing video capture
cap
cv2 . VideoCapture ( 'video.mp4' ) while cap . read ( ) [ 0 ] : pass
Memory leak! Camera still locked!
✅ GOOD: Always release
cap
cv2 . VideoCapture ( 'video.mp4' ) try : while cap . read ( ) [ 0 ] : pass finally : cap . release ( ) Image I/O and Display Reading and Writing Images import cv2
Read image (returns None if failed)
img
cv2 . imread ( 'image.jpg' )
Read as grayscale
gray
cv2 . imread ( 'image.jpg' , cv2 . IMREAD_GRAYSCALE )
Read with alpha channel
img_alpha
cv2 . imread ( 'image.png' , cv2 . IMREAD_UNCHANGED )
Write image
cv2 . imwrite ( 'output.jpg' , img )
Write with quality (JPEG: 0-100, PNG: 0-9 compression)
cv2 . imwrite ( 'output.jpg' , img , [ cv2 . IMWRITE_JPEG_QUALITY , 95 ] ) cv2 . imwrite ( 'output.png' , img , [ cv2 . IMWRITE_PNG_COMPRESSION , 9 ] )
Check if image loaded
if img is None : print ( "Error: Could not load image" ) else : print ( f"Image shape: { img . shape } " )
(height, width, channels)
Display Images import cv2
Display image in window
cv2 . imshow ( 'Window Name' , img ) cv2 . waitKey ( 0 )
Wait indefinitely for key press
cv2 . destroyAllWindows ( )
Display for specific duration (milliseconds)
cv2 . imshow ( 'Image' , img ) cv2 . waitKey ( 3000 )
Wait 3 seconds
cv2 . destroyAllWindows ( )
Display multiple images
cv2 . imshow ( 'Original' , img ) cv2 . imshow ( 'Gray' , gray ) cv2 . waitKey ( 0 ) cv2 . destroyAllWindows ( )
Display with matplotlib (convert BGR to RGB!)
import matplotlib . pyplot as plt img_rgb = cv2 . cvtColor ( img , cv2 . COLOR_BGR2RGB ) plt . imshow ( img_rgb ) plt . axis ( 'off' ) plt . show ( ) Video Capture import cv2
Open camera (0 = default, 1 = second camera, etc.)
cap
cv2 . VideoCapture ( 0 )
Open video file
cap
cv2 . VideoCapture ( 'video.mp4' )
Check if opened successfully
if not cap . isOpened ( ) : print ( "Error: Could not open video" ) exit ( )
Get video properties
fps
cap . get ( cv2 . CAP_PROP_FPS ) width = int ( cap . get ( cv2 . CAP_PROP_FRAME_WIDTH ) ) height = int ( cap . get ( cv2 . CAP_PROP_FRAME_HEIGHT ) ) total_frames = int ( cap . get ( cv2 . CAP_PROP_FRAME_COUNT ) ) print ( f"Video: { width } x { height } @ { fps } fps, { total_frames } frames" )
Read and process frames
while True : ret , frame = cap . read ( ) if not ret : print ( "End of video or error" ) break
Process frame here
cv2 . imshow ( 'Frame' , frame ) if cv2 . waitKey ( 1 ) & 0xFF == ord ( 'q' ) : break cap . release ( ) cv2 . destroyAllWindows ( ) Writing Videos import cv2 cap = cv2 . VideoCapture ( 'input.mp4' )
Get video properties
fps
int ( cap . get ( cv2 . CAP_PROP_FPS ) ) width = int ( cap . get ( cv2 . CAP_PROP_FRAME_WIDTH ) ) height = int ( cap . get ( cv2 . CAP_PROP_FRAME_HEIGHT ) )
Create VideoWriter
fourcc
cv2 . VideoWriter_fourcc ( * 'mp4v' )
or 'XVID', 'MJPG'
out
cv2 . VideoWriter ( 'output.mp4' , fourcc , fps , ( width , height ) ) while True : ret , frame = cap . read ( ) if not ret : break
Process frame
processed
cv2 . cvtColor ( frame , cv2 . COLOR_BGR2GRAY ) processed = cv2 . cvtColor ( processed , cv2 . COLOR_GRAY2BGR )
Convert back to 3-channel
Write frame
out . write ( processed ) cap . release ( ) out . release ( ) cv2 . destroyAllWindows ( ) Image Transformations Resizing and Cropping import cv2 img = cv2 . imread ( 'image.jpg' )
Resize to specific dimensions
resized
cv2 . resize ( img , ( 800 , 600 ) )
(width, height)
Resize by scale factor
scaled
cv2 . resize ( img , None , fx = 0.5 , fy = 0.5 )
50% of original
Resize with interpolation methods
resized_linear
cv2 . resize ( img , ( 800 , 600 ) , interpolation = cv2 . INTER_LINEAR )
Default
resized_cubic
cv2 . resize ( img , ( 800 , 600 ) , interpolation = cv2 . INTER_CUBIC )
Better quality
resized_area
cv2 . resize ( img , ( 400 , 300 ) , interpolation = cv2 . INTER_AREA )
Best for shrinking
Crop (using NumPy slicing)
height , width = img . shape [ : 2 ] cropped = img [ 100 : 400 , 200 : 600 ]
[y1:y2, x1:x2]
Center crop
crop_size
300 center_x , center_y = width // 2 , height // 2 x1 = center_x - crop_size // 2 y1 = center_y - crop_size // 2 center_cropped = img [ y1 : y1 + crop_size , x1 : x1 + crop_size ] Rotation and Flipping import cv2
Flip horizontally
flipped_h
cv2 . flip ( img , 1 )
Flip vertically
flipped_v
cv2 . flip ( img , 0 )
Flip both
flipped_both
cv2 . flip ( img , - 1 )
Rotate 90 degrees clockwise
rotated_90
cv2 . rotate ( img , cv2 . ROTATE_90_CLOCKWISE )
Rotate 180 degrees
rotated_180
cv2 . rotate ( img , cv2 . ROTATE_180 )
Rotate 90 degrees counter-clockwise
rotated_90_ccw
cv2 . rotate ( img , cv2 . ROTATE_90_COUNTERCLOCKWISE )
Rotate by arbitrary angle (around center)
height , width = img . shape [ : 2 ] center = ( width // 2 , height // 2 ) angle = 45
degrees
Get rotation matrix
M
cv2 . getRotationMatrix2D ( center , angle , scale = 1.0 )
Apply rotation
rotated
cv2 . warpAffine ( img , M , ( width , height ) )
Rotate and scale
M_scaled
cv2 . getRotationMatrix2D ( center , 30 , scale = 0.8 ) rotated_scaled = cv2 . warpAffine ( img , M_scaled , ( width , height ) ) Color Space Conversions import cv2 img = cv2 . imread ( 'image.jpg' )
BGR to RGB (for matplotlib)
rgb
cv2 . cvtColor ( img , cv2 . COLOR_BGR2RGB )
BGR to Grayscale
gray
cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
BGR to HSV (useful for color-based segmentation)
hsv
cv2 . cvtColor ( img , cv2 . COLOR_BGR2HSV )
BGR to LAB
lab
cv2 . cvtColor ( img , cv2 . COLOR_BGR2LAB )
Grayscale to BGR (add color channels)
gray_bgr
cv2 . cvtColor ( gray , cv2 . COLOR_GRAY2BGR )
Extract individual channels
b , g , r = cv2 . split ( img )
Merge channels
merged
cv2 . merge ( [ b , g , r ] ) Image Filtering and Enhancement Blurring and Smoothing import cv2
Gaussian blur (reduce noise)
blurred
cv2 . GaussianBlur ( img , ( 5 , 5 ) , 0 )
(kernel_size, sigma)
Median blur (good for salt-and-pepper noise)
median
cv2 . medianBlur ( img , 5 )
kernel_size must be odd
Bilateral filter (edge-preserving smoothing)
bilateral
cv2 . bilateralFilter ( img , 9 , 75 , 75 )
(d, sigmaColor, sigmaSpace)
Average blur
avg_blur
cv2 . blur ( img , ( 5 , 5 ) )
Box filter
box
cv2 . boxFilter ( img , - 1 , ( 5 , 5 ) ) Edge Detection import cv2
Convert to grayscale first
gray
cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
Canny edge detection
edges
cv2 . Canny ( gray , threshold1 = 50 , threshold2 = 150 )
Sobel edge detection (gradient in x and y)
sobelx
cv2 . Sobel ( gray , cv2 . CV_64F , 1 , 0 , ksize = 3 )
X gradient
sobely
cv2 . Sobel ( gray , cv2 . CV_64F , 0 , 1 , ksize = 3 )
Y gradient
sobel
cv2 . magnitude ( sobelx , sobely )
Laplacian edge detection
laplacian
cv2 . Laplacian ( gray , cv2 . CV_64F )
Scharr (more accurate than Sobel for small kernels)
scharrx
cv2 . Scharr ( gray , cv2 . CV_64F , 1 , 0 ) scharry = cv2 . Scharr ( gray , cv2 . CV_64F , 0 , 1 ) Morphological Operations import cv2 import numpy as np
Define kernel
kernel
np . ones ( ( 5 , 5 ) , np . uint8 )
Erosion (shrink white regions)
eroded
cv2 . erode ( img , kernel , iterations = 1 )
Dilation (expand white regions)
dilated
cv2 . dilate ( img , kernel , iterations = 1 )
Opening (erosion followed by dilation) - removes noise
opening
cv2 . morphologyEx ( img , cv2 . MORPH_OPEN , kernel )
Closing (dilation followed by erosion) - closes gaps
closing
cv2 . morphologyEx ( img , cv2 . MORPH_CLOSE , kernel )
Gradient (difference between dilation and erosion) - outlines
gradient
cv2 . morphologyEx ( img , cv2 . MORPH_GRADIENT , kernel )
Top hat (difference between input and opening) - bright spots
tophat
cv2 . morphologyEx ( img , cv2 . MORPH_TOPHAT , kernel )
Black hat (difference between closing and input) - dark spots
blackhat
cv2 . morphologyEx ( img , cv2 . MORPH_BLACKHAT , kernel ) Thresholding import cv2 gray = cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
Simple threshold
ret , thresh = cv2 . threshold ( gray , 127 , 255 , cv2 . THRESH_BINARY )
Binary inverse
ret , thresh_inv = cv2 . threshold ( gray , 127 , 255 , cv2 . THRESH_BINARY_INV )
Truncate
ret , thresh_trunc = cv2 . threshold ( gray , 127 , 255 , cv2 . THRESH_TRUNC )
To zero
ret , thresh_tozero = cv2 . threshold ( gray , 127 , 255 , cv2 . THRESH_TOZERO )
Otsu's thresholding (automatic threshold calculation)
ret , thresh_otsu = cv2 . threshold ( gray , 0 , 255 , cv2 . THRESH_BINARY + cv2 . THRESH_OTSU )
Adaptive thresholding (different threshold for different regions)
adaptive_mean
cv2 . adaptiveThreshold ( gray , 255 , cv2 . ADAPTIVE_THRESH_MEAN_C , cv2 . THRESH_BINARY , 11 , 2 ) adaptive_gaussian = cv2 . adaptiveThreshold ( gray , 255 , cv2 . ADAPTIVE_THRESH_GAUSSIAN_C , cv2 . THRESH_BINARY , 11 , 2 ) Contours and Shape Detection Finding and Drawing Contours import cv2
Convert to grayscale and threshold
gray
cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY ) ret , thresh = cv2 . threshold ( gray , 127 , 255 , cv2 . THRESH_BINARY )
Find contours
contours , hierarchy = cv2 . findContours ( thresh , cv2 . RETR_EXTERNAL , cv2 . CHAIN_APPROX_SIMPLE )
Draw all contours
img_contours
img . copy ( ) cv2 . drawContours ( img_contours , contours , - 1 , ( 0 , 255 , 0 ) , 2 )
-1 = all contours
Draw specific contour
cv2 . drawContours ( img_contours , contours , 0 , ( 255 , 0 , 0 ) , 3 )
First contour
Iterate through contours
for i , contour in enumerate ( contours ) :
Calculate area
area
cv2 . contourArea ( contour )
Calculate perimeter
perimeter
cv2 . arcLength ( contour , True )
Filter by area
if area
1000 : cv2 . drawContours ( img_contours , [ contour ] , - 1 , ( 0 , 0 , 255 ) , 2 )
Get bounding rectangle
x , y , w , h = cv2 . boundingRect ( contour ) cv2 . rectangle ( img_contours , ( x , y ) , ( x + w , y + h ) , ( 255 , 0 , 0 ) , 2 ) Shape Approximation import cv2 for contour in contours :
Approximate contour to polygon
epsilon
0.02 * cv2 . arcLength ( contour , True ) approx = cv2 . approxPolyDP ( contour , epsilon , True )
Number of vertices
n_vertices
len ( approx )
Classify shape
if n_vertices == 3 : shape = "Triangle" elif n_vertices == 4 :
Check if rectangle or square
x , y , w , h = cv2 . boundingRect ( approx ) aspect_ratio = float ( w ) / h shape = "Square" if 0.95 <= aspect_ratio <= 1.05 else "Rectangle" elif n_vertices
4 : shape = "Circle" if n_vertices
10 else "Polygon"
Draw and label
cv2 . drawContours ( img , [ approx ] , - 1 , ( 0 , 255 , 0 ) , 2 ) x , y = approx [ 0 ] [ 0 ] cv2 . putText ( img , shape , ( x , y ) , cv2 . FONT_HERSHEY_SIMPLEX , 0.5 , ( 255 , 255 , 255 ) , 2 ) Contour Features import cv2 import numpy as np for contour in contours :
Moments (for center of mass)
M
cv2 . moments ( contour ) if M [ 'm00' ] != 0 : cx = int ( M [ 'm10' ] / M [ 'm00' ] ) cy = int ( M [ 'm01' ] / M [ 'm00' ] ) cv2 . circle ( img , ( cx , cy ) , 5 , ( 255 , 0 , 0 ) , - 1 )
Minimum enclosing circle
( x , y ) , radius = cv2 . minEnclosingCircle ( contour ) center = ( int ( x ) , int ( y ) ) radius = int ( radius ) cv2 . circle ( img , center , radius , ( 0 , 255 , 0 ) , 2 )
Fit ellipse (requires at least 5 points)
if len ( contour )
= 5 : ellipse = cv2 . fitEllipse ( contour ) cv2 . ellipse ( img , ellipse , ( 255 , 0 , 255 ) , 2 )
Convex hull
hull
cv2 . convexHull ( contour ) cv2 . drawContours ( img , [ hull ] , - 1 , ( 0 , 255 , 255 ) , 2 )
Solidity (contour area / convex hull area)
hull_area
cv2 . contourArea ( hull ) contour_area = cv2 . contourArea ( contour ) solidity = contour_area / hull_area if hull_area
0 else 0 Feature Detection and Matching ORB (Oriented FAST and Rotated BRIEF) import cv2 img1 = cv2 . imread ( 'image1.jpg' , cv2 . IMREAD_GRAYSCALE ) img2 = cv2 . imread ( 'image2.jpg' , cv2 . IMREAD_GRAYSCALE )
Create ORB detector
orb
cv2 . ORB_create ( nfeatures = 1000 )
Detect keypoints and compute descriptors
kp1 , des1 = orb . detectAndCompute ( img1 , None ) kp2 , des2 = orb . detectAndCompute ( img2 , None )
Draw keypoints
img1_kp
cv2 . drawKeypoints ( img1 , kp1 , None , color = ( 0 , 255 , 0 ) )
Match descriptors using BFMatcher
bf
cv2 . BFMatcher ( cv2 . NORM_HAMMING , crossCheck = True ) matches = bf . match ( des1 , des2 )
Sort matches by distance (best first)
matches
sorted ( matches , key = lambda x : x . distance )
Draw top matches
img_matches
cv2 . drawMatches ( img1 , kp1 , img2 , kp2 , matches [ : 50 ] , None , flags = cv2 . DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS ) cv2 . imshow ( 'Matches' , img_matches ) cv2 . waitKey ( 0 ) SIFT (Scale-Invariant Feature Transform) import cv2
Note: SIFT is in opencv-contrib-python, not opencv-python
img
cv2 . imread ( 'image.jpg' , cv2 . IMREAD_GRAYSCALE )
Create SIFT detector
sift
cv2 . SIFT_create ( )
Detect keypoints and compute descriptors
keypoints , descriptors = sift . detectAndCompute ( img , None )
Draw keypoints
img_kp
cv2 . drawKeypoints ( img , keypoints , None , flags = cv2 . DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS ) print ( f"Number of keypoints: { len ( keypoints ) } " ) Feature Matching with FLANN import cv2 import numpy as np
Detect features
sift
cv2 . SIFT_create ( ) kp1 , des1 = sift . detectAndCompute ( img1 , None ) kp2 , des2 = sift . detectAndCompute ( img2 , None )
FLANN parameters
FLANN_INDEX_KDTREE
1 index_params = dict ( algorithm = FLANN_INDEX_KDTREE , trees = 5 ) search_params = dict ( checks = 50 ) flann = cv2 . FlannBasedMatcher ( index_params , search_params ) matches = flann . knnMatch ( des1 , des2 , k = 2 )
Lowe's ratio test
good_matches
[ ] for m , n in matches : if m . distance < 0.7 * n . distance : good_matches . append ( m ) print ( f"Good matches: { len ( good_matches ) } " )
Draw good matches
img_matches
cv2 . drawMatches ( img1 , kp1 , img2 , kp2 , good_matches , None , flags = cv2 . DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS ) Object Detection Haar Cascade (Face Detection) import cv2
Load pre-trained Haar Cascade
face_cascade
cv2 . CascadeClassifier ( cv2 . data . haarcascades + 'haarcascade_frontalface_default.xml' ) eye_cascade = cv2 . CascadeClassifier ( cv2 . data . haarcascades + 'haarcascade_eye.xml' ) img = cv2 . imread ( 'people.jpg' ) gray = cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY )
Detect faces
faces
face_cascade . detectMultiScale ( gray , scaleFactor = 1.1 , minNeighbors = 5 , minSize = ( 30 , 30 ) )
Draw rectangles around faces
for ( x , y , w , h ) in faces : cv2 . rectangle ( img , ( x , y ) , ( x + w , y + h ) , ( 255 , 0 , 0 ) , 2 )
Detect eyes in face region
roi_gray
gray [ y : y + h , x : x + w ] roi_color = img [ y : y + h , x : x + w ] eyes = eye_cascade . detectMultiScale ( roi_gray ) for ( ex , ey , ew , eh ) in eyes : cv2 . rectangle ( roi_color , ( ex , ey ) , ( ex + ew , ey + eh ) , ( 0 , 255 , 0 ) , 2 ) cv2 . imshow ( 'Faces' , img ) cv2 . waitKey ( 0 ) Template Matching import cv2 img = cv2 . imread ( 'image.jpg' ) template = cv2 . imread ( 'template.jpg' )
Convert to grayscale
img_gray
cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY ) template_gray = cv2 . cvtColor ( template , cv2 . COLOR_BGR2GRAY ) h , w = template_gray . shape
Template matching
result
cv2 . matchTemplate ( img_gray , template_gray , cv2 . TM_CCOEFF_NORMED )
Find locations above threshold
threshold
0.8 locations = np . where ( result
= threshold )
Draw rectangles
for pt in zip ( * locations [ : : - 1 ] ) : cv2 . rectangle ( img , pt , ( pt [ 0 ] + w , pt [ 1 ] + h ) , ( 0 , 255 , 0 ) , 2 ) cv2 . imshow ( 'Matches' , img ) cv2 . waitKey ( 0 ) Practical Workflows 1. Document Scanner (Perspective Transform) import cv2 import numpy as np def order_points ( pts ) : """Order points: top-left, top-right, bottom-right, bottom-left.""" rect = np . zeros ( ( 4 , 2 ) , dtype = "float32" ) s = pts . sum ( axis = 1 ) rect [ 0 ] = pts [ np . argmin ( s ) ]
Top-left
rect [ 2 ] = pts [ np . argmax ( s ) ]
Bottom-right
diff
np . diff ( pts , axis = 1 ) rect [ 1 ] = pts [ np . argmin ( diff ) ]
Top-right
rect [ 3 ] = pts [ np . argmax ( diff ) ]
Bottom-left
return rect def four_point_transform ( image , pts ) : """Apply perspective transform to get bird's eye view.""" rect = order_points ( pts ) ( tl , tr , br , bl ) = rect
Compute width and height
widthA
np . linalg . norm ( br - bl ) widthB = np . linalg . norm ( tr - tl ) maxWidth = max ( int ( widthA ) , int ( widthB ) ) heightA = np . linalg . norm ( tr - br ) heightB = np . linalg . norm ( tl - bl ) maxHeight = max ( int ( heightA ) , int ( heightB ) )
Destination points
dst
np . array ( [ [ 0 , 0 ] , [ maxWidth - 1 , 0 ] , [ maxWidth - 1 , maxHeight - 1 ] , [ 0 , maxHeight - 1 ] ] , dtype = "float32" )
Perspective transform
M
cv2 . getPerspectiveTransform ( rect , dst ) warped = cv2 . warpPerspective ( image , M , ( maxWidth , maxHeight ) ) return warped
Usage
img
cv2 . imread ( 'document.jpg' ) gray = cv2 . cvtColor ( img , cv2 . COLOR_BGR2GRAY ) edges = cv2 . Canny ( gray , 50 , 150 )
Find contours
contours , _ = cv2 . findContours ( edges , cv2 . RETR_EXTERNAL , cv2 . CHAIN_APPROX_SIMPLE ) contours = sorted ( contours , key = cv2 . contourArea , reverse = True )
Find document contour (assume largest quadrilateral)
for contour in contours : peri = cv2 . arcLength ( contour , True ) approx = cv2 . approxPolyDP ( contour , 0.02 * peri , True ) if len ( approx ) == 4 : pts = approx . reshape ( 4 , 2 ) scanned = four_point_transform ( img , pts ) cv2 . imshow ( 'Scanned' , scanned ) cv2 . waitKey ( 0 ) break 2. Motion Detection import cv2 def detect_motion ( video_path ) : """Detect motion in video using frame differencing.""" cap = cv2 . VideoCapture ( video_path ) ret , frame1 = cap . read ( ) ret , frame2 = cap . read ( ) while cap . isOpened ( ) :
Compute difference between frames
diff
cv2 . absdiff ( frame1 , frame2 ) gray = cv2 . cvtColor ( diff , cv2 . COLOR_BGR2GRAY ) blur = cv2 . GaussianBlur ( gray , ( 5 , 5 ) , 0 ) _ , thresh = cv2 . threshold ( blur , 20 , 255 , cv2 . THRESH_BINARY )
Dilate to fill gaps
dilated
cv2 . dilate ( thresh , None , iterations = 3 )
Find contours
contours , _ = cv2 . findContours ( dilated , cv2 . RETR_EXTERNAL , cv2 . CHAIN_APPROX_SIMPLE )
Draw bounding boxes
for contour in contours : if cv2 . contourArea ( contour ) < 500 : continue x , y , w , h = cv2 . boundingRect ( contour ) cv2 . rectangle ( frame1 , ( x , y ) , ( x + w , y + h ) , ( 0 , 255 , 0 ) , 2 ) cv2 . putText ( frame1 , "Motion" , ( x , y - 10 ) , cv2 . FONT_HERSHEY_SIMPLEX , 0.5 , ( 0 , 255 , 0 ) , 2 ) cv2 . imshow ( 'Motion Detection' , frame1 )
Update frames
frame1
frame2 ret , frame2 = cap . read ( ) if not ret or cv2 . waitKey ( 1 ) & 0xFF == ord ( 'q' ) : break cap . release ( ) cv2 . destroyAllWindows ( )
Usage
detect_motion('video.mp4')
- Color-Based Object Tracking import cv2 import numpy as np def track_colored_object ( video_path , lower_color , upper_color ) : """Track object by color in HSV space.""" cap = cv2 . VideoCapture ( video_path ) while True : ret , frame = cap . read ( ) if not ret : break
Convert to HSV
hsv
cv2 . cvtColor ( frame , cv2 . COLOR_BGR2HSV )
Create mask for color
mask
cv2 . inRange ( hsv , lower_color , upper_color )
Remove noise
mask
cv2 . erode ( mask , None , iterations = 2 ) mask = cv2 . dilate ( mask , None , iterations = 2 )
Find contours
contours , _ = cv2 . findContours ( mask , cv2 . RETR_EXTERNAL , cv2 . CHAIN_APPROX_SIMPLE ) if contours :
Find largest contour
largest
max ( contours , key = cv2 . contourArea )
Get center and radius
( ( x , y ) , radius ) = cv2 . minEnclosingCircle ( largest ) if radius
10 :
Draw circle and center
cv2 . circle ( frame , ( int ( x ) , int ( y ) ) , int ( radius ) , ( 0 , 255 , 0 ) , 2 ) cv2 . circle ( frame , ( int ( x ) , int ( y ) ) , 5 , ( 0 , 0 , 255 ) , - 1 ) cv2 . imshow ( 'Tracking' , frame ) cv2 . imshow ( 'Mask' , mask ) if cv2 . waitKey ( 1 ) & 0xFF == ord ( 'q' ) : break cap . release ( ) cv2 . destroyAllWindows ( )
Usage: Track red object
lower_red = np.array([0, 100, 100])
upper_red = np.array([10, 255, 255])
track_colored_object(0, lower_red, upper_red)
- QR Code Detection import cv2 def detect_qr_code ( image_path ) : """Detect and decode QR codes.""" img = cv2 . imread ( image_path )
Initialize QR code detector
detector
cv2 . QRCodeDetector ( )
Detect and decode
data , bbox , straight_qrcode = detector . detectAndDecode ( img ) if bbox is not None :
Draw bounding box
n_lines
len ( bbox ) for i in range ( n_lines ) : point1 = tuple ( bbox [ i ] [ 0 ] . astype ( int ) ) point2 = tuple ( bbox [ ( i + 1 ) % n_lines ] [ 0 ] . astype ( int ) ) cv2 . line ( img , point1 , point2 , ( 0 , 255 , 0 ) , 3 )
Display decoded data
if data : print ( f"QR Code data: { data } " ) cv2 . putText ( img , data , ( 50 , 50 ) , cv2 . FONT_HERSHEY_SIMPLEX , 1 , ( 0 , 255 , 0 ) , 2 ) cv2 . imshow ( 'QR Code' , img ) cv2 . waitKey ( 0 ) cv2 . destroyAllWindows ( )
Usage
detect_qr_code('qrcode.jpg')
- Image Stitching (Panorama) import cv2 def create_panorama ( images ) : """Stitch multiple images into panorama."""
Create stitcher
stitcher
cv2 . Stitcher_create ( )
Stitch images
status , pano = stitcher . stitch ( images ) if status == cv2 . Stitcher_OK : print ( "Panorama created successfully" ) return pano else : print ( f"Error: { status } " ) return None
Usage
img1
cv2 . imread ( 'image1.jpg' ) img2 = cv2 . imread ( 'image2.jpg' ) img3 = cv2 . imread ( 'image3.jpg' ) panorama = create_panorama ( [ img1 , img2 , img3 ] ) if panorama is not None : cv2 . imshow ( 'Panorama' , panorama ) cv2 . waitKey ( 0 ) Performance Optimization Use GPU Acceleration import cv2
Check CUDA availability
print ( f"CUDA devices: { cv2 . cuda . getCudaEnabledDeviceCount ( ) } " )
Upload to GPU
gpu_img
cv2 . cuda_GpuMat ( ) gpu_img . upload ( img )
GPU operations (must use cv2.cuda module)
gpu_gray
cv2 . cuda . cvtColor ( gpu_img , cv2 . COLOR_BGR2GRAY )
Download from GPU
result
gpu_gray . download ( ) Vectorize Operations
❌ SLOW: Python loops
for i in range ( height ) : for j in range ( width ) : img [ i , j ] = img [ i , j ] * 0.5
✅ FAST: NumPy vectorization
img
( img * 0.5 ) . astype ( np . uint8 )
✅ FAST: OpenCV built-in functions
img
cv2 . convertScaleAbs ( img , alpha = 0.5 , beta = 0 ) Multi-threading for Video import cv2 from threading import Thread from queue import Queue class VideoCapture : """Threaded video capture for better performance.""" def init ( self , src ) : self . cap = cv2 . VideoCapture ( src ) self . q = Queue ( maxsize = 128 ) self . stopped = False def start ( self ) : Thread ( target = self . _reader , daemon = True ) . start ( ) return self def _reader ( self ) : while not self . stopped : ret , frame = self . cap . read ( ) if not ret : self . stop ( ) break self . q . put ( frame ) def read ( self ) : return self . q . get ( ) def stop ( self ) : self . stopped = True self . cap . release ( )
Usage
cap
VideoCapture ( 0 ) . start ( ) while True : frame = cap . read ( )
Process frame...
if cv2 . waitKey ( 1 ) & 0xFF == ord ( 'q' ) : break cap . stop ( ) Common Pitfalls and Solutions The "BGR vs RGB" Color Confusion OpenCV uses BGR, most other libraries use RGB.
❌ Problem: Colors look wrong in matplotlib
img
cv2 . imread ( 'image.jpg' ) plt . imshow ( img )
Blue and red are swapped!
✅ Solution: Convert to RGB
img_rgb
cv2 . cvtColor ( img , cv2 . COLOR_BGR2RGB ) plt . imshow ( img_rgb )
✅ Alternative: Use OpenCV's imshow
cv2 . imshow ( 'Correct Colors' , img ) cv2 . waitKey ( 0 ) The "Window Won't Close" Problem Windows stay open without proper key handling.
❌ Problem: Window frozen
cv2 . imshow ( 'Image' , img )
Program hangs!
✅ Solution: Always use waitKey
cv2 . imshow ( 'Image' , img ) cv2 . waitKey ( 0 )
Wait for key press
cv2 . destroyAllWindows ( ) The "Video Capture Not Released" Problem Camera stays locked if not released properly.
❌ Problem: Camera locked after crash
cap
cv2 . VideoCapture ( 0 )
... code crashes ...
Camera still locked!
✅ Solution: Use try-finally
cap
cv2 . VideoCapture ( 0 ) try : while True : ret , frame = cap . read ( )
... process ...
finally : cap . release ( ) cv2 . destroyAllWindows ( ) The "Image Modification" Confusion Some operations modify in-place, others return new images.
In-place modification
cv2 . rectangle ( img , ( 10 , 10 ) , ( 100 , 100 ) , ( 0 , 255 , 0 ) , 2 )
Modifies img
Returns new image
blurred
cv2 . GaussianBlur ( img , ( 5 , 5 ) , 0 )
img unchanged
✅ Always use .copy() if you need original
img_copy
img . copy ( ) cv2 . rectangle ( img_copy , ( 10 , 10 ) , ( 100 , 100 ) , ( 0 , 255 , 0 ) , 2 ) The "Contour Hierarchy" Misunderstanding findContours returns different structures based on retrieval mode.
External contours only (most common)
contours , hierarchy = cv2 . findContours ( thresh , cv2 . RETR_EXTERNAL , cv2 . CHAIN_APPROX_SIMPLE )
All contours with full hierarchy
contours , hierarchy = cv2 . findContours ( thresh , cv2 . RETR_TREE , cv2 . CHAIN_APPROX_SIMPLE )
⚠️ hierarchy structure: [Next, Previous, First_Child, Parent]
Most use cases only need RETR_EXTERNAL
OpenCV is the Swiss Army knife of computer vision. Its vast library of optimized algorithms, combined with Python's ease of use, makes it the perfect tool for everything from simple image processing to complex real-time vision systems. Master these fundamentals, and you'll have the foundation to tackle any computer vision challenge.