Layout Analyzer Skill Overview This skill enables document layout analysis using surya - an advanced document understanding system. Detect text blocks, tables, figures, headings, and determine reading order in complex documents. How to Use Provide the document image or PDF Specify what layout elements to detect I'll analyze the structure and return detected regions Example prompts: "Analyze the layout of this document page" "Detect all tables and text blocks in this image" "Determine the reading order for this PDF page" "Find headings and paragraphs in this document" Domain Knowledge surya Fundamentals from surya . detection import DetectionPredictor from surya . layout import LayoutPredictor from surya . reading_order import ReadingOrderPredictor from PIL import Image
Load image
image
Image . open ( "document.png" )
Detect layout elements
layout_predictor
LayoutPredictor ( ) layout_result = layout_predictor ( [ image ] ) Layout Element Types Element Description Text Regular paragraph text Title Document/section titles Section-header Section headings List-item Bulleted/numbered items Table Tabular data Figure Images/diagrams Caption Figure/table captions Footnote Footnotes Formula Mathematical equations Page-header Headers Page-footer Footers Text Detection from surya . detection import DetectionPredictor from PIL import Image
Initialize detector
detector
DetectionPredictor ( )
Load image
image
Image . open ( "document.png" )
Detect text regions
results
detector ( [ image ] )
Access results
for page_result in results : for bbox in page_result . bboxes : print ( f"Text region: { bbox . bbox } " ) print ( f"Confidence: { bbox . confidence } " ) Layout Analysis from surya . layout import LayoutPredictor from PIL import Image
Initialize layout predictor
layout_predictor
LayoutPredictor ( )
Analyze layout
image
Image . open ( "document.png" ) layout_results = layout_predictor ( [ image ] )
Process results
for page_result in layout_results : for element in page_result . bboxes : print ( f"Type: { element . label } " ) print ( f"Bbox: { element . bbox } " ) print ( f"Confidence: { element . confidence } " ) Reading Order Detection from surya . reading_order import ReadingOrderPredictor from surya . layout import LayoutPredictor from PIL import Image
Get layout first
layout_predictor
LayoutPredictor ( ) image = Image . open ( "document.png" ) layout_results = layout_predictor ( [ image ] )
Determine reading order
reading_order_predictor
ReadingOrderPredictor ( ) order_results = reading_order_predictor ( [ image ] , layout_results )
Access ordered elements
for page_result in order_results : for i , element in enumerate ( page_result . ordered_bboxes ) : print ( f" { i + 1 } . { element . label } : { element . bbox } " ) OCR with Layout from surya . ocr import OCRPredictor from surya . layout import LayoutPredictor from PIL import Image
Initialize predictors
ocr_predictor
OCRPredictor ( ) layout_predictor = LayoutPredictor ( )
Load image
image
Image . open ( "document.png" )
Get layout
layout_results
layout_predictor ( [ image ] )
Run OCR
ocr_results
ocr_predictor ( [ image ] )
Combine results
for layout , ocr in zip ( layout_results , ocr_results ) : for layout_elem in layout . bboxes : print ( f"Element: { layout_elem . label } " )
Find OCR text within this layout element
for text_line in ocr . text_lines : if boxes_overlap ( layout_elem . bbox , text_line . bbox ) : print ( f" Text: { text_line . text } " ) Processing PDFs from surya . layout import LayoutPredictor from pdf2image import convert_from_path def analyze_pdf_layout ( pdf_path ) : """Analyze layout of all pages in PDF."""
Convert PDF to images
images
convert_from_path ( pdf_path )
Initialize predictor
layout_predictor
LayoutPredictor ( )
Analyze all pages
results
layout_predictor ( images ) document_structure = [ ] for page_num , page_result in enumerate ( results ) : page_elements = [ ] for element in page_result . bboxes : page_elements . append ( { 'type' : element . label , 'bbox' : element . bbox , 'confidence' : element . confidence } ) document_structure . append ( { 'page' : page_num + 1 , 'elements' : page_elements } ) return document_structure structure = analyze_pdf_layout ( "document.pdf" ) Visualization from surya . layout import LayoutPredictor from PIL import Image , ImageDraw , ImageFont def visualize_layout ( image_path , output_path ) : """Visualize detected layout elements.""" image = Image . open ( image_path ) layout_predictor = LayoutPredictor ( ) results = layout_predictor ( [ image ] )
Create drawing context
draw
ImageDraw . Draw ( image )
Color mapping for element types
colors
{ 'Text' : 'blue' , 'Title' : 'red' , 'Table' : 'green' , 'Figure' : 'purple' , 'Section-header' : 'orange' , 'List-item' : 'cyan' , } for element in results [ 0 ] . bboxes : bbox = element . bbox color = colors . get ( element . label , 'gray' )
Draw rectangle
draw . rectangle ( bbox , outline = color , width = 2 )
Add label
- draw
- .
- text
- (
- (
- bbox
- [
- 0
- ]
- ,
- bbox
- [
- 1
- ]
- -
- 15
- )
- ,
- f"
- {
- element
- .
- label
- }
- (
- {
- element
- .
- confidence
- :
- .2f
- }
- )"
- ,
- fill
- =
- color
- )
- image
- .
- save
- (
- output_path
- )
- return
- output_path
- Best Practices
- Use High-Quality Images
-
- 150+ DPI for best results
- Preprocess if Needed
-
- Deskew rotated documents
- Validate Results
-
- Check confidence scores
- Handle Multi-page
-
- Process pages individually
- Combine with OCR
- Get text within detected regions Common Patterns Document Structure Extraction def extract_document_structure ( image_path ) : """Extract hierarchical document structure.""" from surya . layout import LayoutPredictor from surya . reading_order import ReadingOrderPredictor image = Image . open ( image_path )
Get layout
layout_predictor
LayoutPredictor ( ) layout_results = layout_predictor ( [ image ] )
Get reading order
order_predictor
ReadingOrderPredictor ( ) order_results = order_predictor ( [ image ] , layout_results ) structure = { 'title' : None , 'sections' : [ ] , 'tables' : [ ] , 'figures' : [ ] } current_section = None for element in order_results [ 0 ] . ordered_bboxes : if element . label == 'Title' : structure [ 'title' ] = element elif element . label == 'Section-header' : current_section = { 'header' : element , 'content' : [ ] } structure [ 'sections' ] . append ( current_section ) elif element . label == 'Table' : structure [ 'tables' ] . append ( element ) elif element . label == 'Figure' : structure [ 'figures' ] . append ( element ) elif current_section and element . label in [ 'Text' , 'List-item' ] : current_section [ 'content' ] . append ( element ) return structure Table Region Extraction def extract_table_regions ( image_path ) : """Extract table regions from document.""" from surya . layout import LayoutPredictor image = Image . open ( image_path ) layout_predictor = LayoutPredictor ( ) results = layout_predictor ( [ image ] ) tables = [ ] for element in results [ 0 ] . bboxes : if element . label == 'Table' : bbox = element . bbox
Crop table region
table_image
image . crop ( bbox ) tables . append ( { 'bbox' : bbox , 'image' : table_image , 'confidence' : element . confidence } ) return tables Examples Example 1: Academic Paper Analysis from surya . layout import LayoutPredictor from surya . reading_order import ReadingOrderPredictor from pdf2image import convert_from_path def analyze_academic_paper ( pdf_path ) : """Analyze structure of academic paper.""" images = convert_from_path ( pdf_path ) layout_predictor = LayoutPredictor ( ) order_predictor = ReadingOrderPredictor ( ) paper_structure = { 'pages' : [ ] , 'element_counts' : { 'Title' : 0 , 'Section-header' : 0 , 'Text' : 0 , 'Table' : 0 , 'Figure' : 0 , 'Formula' : 0 , 'Footnote' : 0 } } layout_results = layout_predictor ( images ) order_results = order_predictor ( images , layout_results ) for page_num , ( layout , order ) in enumerate ( zip ( layout_results , order_results ) ) : page_structure = { 'page' : page_num + 1 , 'elements' : [ ] } for element in order . ordered_bboxes : page_structure [ 'elements' ] . append ( { 'type' : element . label , 'bbox' : element . bbox , 'order' : element . position } )
Count element types
if element . label in paper_structure [ 'element_counts' ] : paper_structure [ 'element_counts' ] [ element . label ] += 1 paper_structure [ 'pages' ] . append ( page_structure ) return paper_structure paper = analyze_academic_paper ( 'research_paper.pdf' ) print ( f"Total tables: { paper [ 'element_counts' ] [ 'Table' ] } " ) print ( f"Total figures: { paper [ 'element_counts' ] [ 'Figure' ] } " ) Example 2: Form Field Detection from surya . layout import LayoutPredictor from PIL import Image def detect_form_fields ( image_path ) : """Detect form fields and labels.""" image = Image . open ( image_path ) layout_predictor = LayoutPredictor ( ) results = layout_predictor ( [ image ] ) form_fields = [ ] for element in results [ 0 ] . bboxes :
Look for text elements that might be labels
if element . label == 'Text' :
Check if there's a box/line nearby (potential input field)
form_fields . append ( { 'type' : 'potential_label' , 'bbox' : element . bbox , 'confidence' : element . confidence } ) return form_fields fields = detect_form_fields ( 'form.png' ) print ( f"Found { len ( fields ) } potential form elements" ) Example 3: Multi-column Article from surya . layout import LayoutPredictor from surya . reading_order import ReadingOrderPredictor from PIL import Image def process_multicolumn_article ( image_path ) : """Process multi-column article layout.""" image = Image . open ( image_path ) layout_predictor = LayoutPredictor ( ) order_predictor = ReadingOrderPredictor ( ) layout_results = layout_predictor ( [ image ] ) order_results = order_predictor ( [ image ] , layout_results )
Group elements by column
image_width
image . width column_threshold = image_width / 2 columns = { 'left' : [ ] , 'right' : [ ] , 'full_width' : [ ] } for element in order_results [ 0 ] . ordered_bboxes : bbox = element . bbox element_center = ( bbox [ 0 ] + bbox [ 2 ] ) / 2 element_width = bbox [ 2 ] - bbox [ 0 ]
Determine column
if element_width
column_threshold * 1.5 : columns [ 'full_width' ] . append ( element ) elif element_center < column_threshold : columns [ 'left' ] . append ( element ) else : columns [ 'right' ] . append ( element ) return { 'layout' : 'multi-column' , 'columns' : columns , 'reading_order' : order_results [ 0 ] . ordered_bboxes } article = process_multicolumn_article ( 'newspaper_page.png' ) print ( f"Left column: { len ( article [ 'columns' ] [ 'left' ] ) } elements" ) print ( f"Right column: { len ( article [ 'columns' ] [ 'right' ] ) } elements" ) Limitations Handwritten layouts may be inaccurate Very small text regions may be missed Complex nested layouts challenging GPU recommended for batch processing Multi-language support varies Installation pip install surya-ocr
For PDF processing
pip install pdf2image Resources surya GitHub Model Documentation Examples