Structure, validate, and format long-form markdown content for documentation, blogs, and static site generators. Auto-generate tables of contents, add frontmatter, validate structure, and convert between markdown flavors.
## Workflow

The markdown formatting process follows these steps:

1. **Load** - Read markdown file or content
2. **Validate** - Check heading hierarchy, broken links, structure issues
3. **Format** - Apply formatting rules (spacing, code blocks, etc.)
4. **Generate** - Add TOC, frontmatter, cross-references
5. **Export** - Save in target markdown flavor
## Quick Start

```python
from scripts.markdown_formatter import MarkdownFormatter

# Load and format markdown
formatter = MarkdownFormatter(file_path='document.md')

# Generate table of contents
toc = formatter.generate_toc(max_depth=3)

# Validate structure
validation = formatter.validate_structure()
if not validation['valid']:
    print("Issues found:")
    for error in validation['errors']:
        print(f"  - {error['message']}")

# Add frontmatter
formatter.add_frontmatter({
    'title': 'My Document',
    'author': 'John Doe',
    'date': '2024-01-15'
})

# Export formatted version
formatter.export(
    output_path='formatted.md',
    include_toc=True,
    target_flavor='github'
)
```
## Formatting Operations

### 1. Table of Contents Generation

Auto-generate a TOC from the document's heading structure:

- Customizable depth (H2, H3, etc.)
- GitHub-style anchor links
- Numbered or bulleted format
- Smart indentation based on heading levels
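The TOC steps above can be sketched in plain Python. This is a minimal illustration, and `github_slug` and `build_toc` are hypothetical helpers rather than part of `MarkdownFormatter`. The slug rules only approximate GitHub's: lowercase the heading, drop punctuation, and turn spaces into hyphens. Lines inside fenced code blocks are skipped so that `#` comments are not mistaken for headings.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, the title, optional closing '#'s
HEADING_RE = re.compile(r'^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$')
# Opening or closing code fence (backticks or tildes), indented at most 3 spaces
FENCE_RE = re.compile(r'^ {0,3}(`{3,}|~{3,})')


def github_slug(title: str) -> str:
    """Hypothetical helper: approximates GitHub's anchor rules, not the exact algorithm."""
    slug = title.strip().lower()
    slug = re.sub(r'[^\w\- ]', '', slug)  # keep letters, digits, '_', '-', spaces
    return slug.replace(' ', '-')


def build_toc(markdown: str, start_level: int = 2, max_depth: int = 3) -> str:
    """Hypothetical helper: nested, GitHub-style TOC for heading levels start_level..max_depth."""
    lines = ['## Table of Contents', '']
    in_fence = False
    for line in markdown.splitlines():
        if FENCE_RE.match(line):
            in_fence = not in_fence  # simplified: any fence line toggles the state
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        if start_level <= level <= max_depth:
            indent = '  ' * (level - start_level)  # two spaces per nesting level
            lines.append(f'{indent}- [{title}](#{github_slug(title)})')
    return '\n'.join(lines)
```

As in `generate_toc()`, `max_depth` here is the deepest heading level included, and `start_level=2` skips the document title.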
### 2. Frontmatter Management

Add YAML/TOML/JSON frontmatter for static site generators:

- YAML (`---`) for Jekyll/Hugo
- TOML (`+++`) for Hugo
- JSON for custom parsers
- Structured metadata (title, author, date, tags, etc.)
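To show how the three frontmatter styles differ, the hypothetical `render_frontmatter` helper below serializes flat metadata (scalars and lists of scalars) using only the standard library. It is a sketch, not the formatter's implementation, and it makes two assumptions. First, every string is emitted double-quoted using JSON string syntax, which YAML double-quoted strings and TOML basic strings also accept, so a value such as `2024-01-15` stays a string rather than becoming a native date. Second, keys are assumed to be simple identifiers.

```python
import json

# Delimiter lines for each frontmatter style; JSON frontmatter has none
DELIMITERS = {'yaml': '---', 'toml': '+++'}


def _scalar(value) -> str:
    """Render one scalar. Check bool before int because bool subclasses int."""
    if isinstance(value, bool):
        return 'true' if value else 'false'
    if isinstance(value, (int, float)):
        return str(value)
    return json.dumps(str(value))  # double-quoted, escaped string


def render_frontmatter(metadata: dict, fmt: str = 'yaml') -> str:
    """Hypothetical helper: serialize flat metadata as YAML, TOML, or JSON frontmatter."""
    if fmt == 'json':
        return json.dumps(metadata, indent=2) + '\n'  # a bare JSON object
    if fmt not in DELIMITERS:
        raise ValueError(f'unsupported frontmatter format: {fmt!r}')
    lines = [DELIMITERS[fmt]]
    for key, value in metadata.items():
        if fmt == 'yaml' and isinstance(value, list):
            lines.append(f'{key}:')
            lines.extend(f'  - {_scalar(item)}' for item in value)
        elif fmt == 'yaml':
            lines.append(f'{key}: {_scalar(value)}')
        elif isinstance(value, list):  # TOML inline array
            items = ', '.join(_scalar(item) for item in value)
            lines.append(f'{key} = [{items}]')
        else:
            lines.append(f'{key} = {_scalar(value)}')
    lines.append(DELIMITERS[fmt])
    return '\n'.join(lines) + '\n'
```

For example, `render_frontmatter({'title': 'My Doc', 'draft': False}, 'toml')` produces a `+++` block containing `title = "My Doc"` and `draft = false`.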
### 3. Structure Validation

Check document structure for common issues:

- **Heading hierarchy** - Detect skipped levels (H2 → H4)
- **Broken links** - Find invalid internal (`#anchor`) and external links
- **Duplicate headings** - Identify heading ID conflicts
- **Missing elements** - Check for required sections
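Two of these checks, skipped heading levels and duplicate headings, need nothing more than a line-by-line scan. The sketch below uses `check_headings`, a hypothetical stand-in rather than the formatter's code. It returns a dictionary shaped like the documented `validate_structure()` result, treating skips as errors and duplicates as warnings to mirror that example. It only recognizes ATX (`#`-style) headings.

```python
import re

HEADING_RE = re.compile(r'^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$')
FENCE_RE = re.compile(r'^ {0,3}(`{3,}|~{3,})')


def check_headings(markdown: str) -> dict:
    """Hypothetical helper: report heading-level skips (errors) and repeated headings (warnings)."""
    errors, warnings = [], []
    seen = set()
    previous_level = None
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if FENCE_RE.match(line):
            in_fence = not in_fence  # ignore '#' lines inside code blocks
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Going deeper by more than one level (e.g. H2 -> H4) is a skip;
        # going back up any number of levels is allowed.
        if previous_level is not None and level > previous_level + 1:
            errors.append({
                'type': 'heading_skip',
                'line': number,
                'message': f'Heading level jumps from H{previous_level} to H{level}',
            })
        if title in seen:
            warnings.append({
                'type': 'duplicate_heading',
                'line': number,
                'message': f'Heading "{title}" appears multiple times',
            })
        seen.add(title)
        previous_level = level
    return {'valid': not errors, 'errors': errors, 'warnings': warnings}
```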
### 4. Code Block Formatting

Enhance code blocks with syntax highlighting markers:

- Add language tags to fenced code blocks
- Convert indented code to fenced blocks
- Default language specification
- Consistent formatting
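Converting indented code to fenced blocks can be approximated as follows. `indented_to_fenced` is a hypothetical sketch, not the formatter's API, and it rests on three simplifying assumptions:

- An indented block must start after a blank line, because CommonMark does not let indented code interrupt a paragraph.
- Indentation is exactly four spaces rather than tabs.
- List-item continuation lines are not distinguished from code.

```python
import re

FENCE_RE = re.compile(r'^ {0,3}(`{3,}|~{3,})')
TICKS = '`' * 3  # fence marker, built at runtime


def indented_to_fenced(markdown: str, default_language: str = 'text') -> str:
    """Hypothetical helper: rewrite 4-space-indented code blocks as tagged fenced blocks."""
    lines = markdown.split('\n')
    out, i, in_fence = [], 0, False
    while i < len(lines):
        line = lines[i]
        if FENCE_RE.match(line):
            in_fence = not in_fence  # leave existing fenced blocks untouched
        starts_block = (
            not in_fence
            and line.startswith('    ')
            and line.strip() != ''
            and (i == 0 or lines[i - 1].strip() == '')
        )
        if not starts_block:
            out.append(line)
            i += 1
            continue
        block = []
        # Consume indented lines plus any blank lines between them
        while i < len(lines) and (lines[i].startswith('    ') or lines[i].strip() == ''):
            block.append(lines[i][4:])  # drop the four-space indent
            i += 1
        # Trailing blank lines belong after the fence, so hand them back
        while block and block[-1].strip() == '':
            block.pop()
            i -= 1
        out.extend([TICKS + default_language, *block, TICKS])
    return '\n'.join(out)
```

The `default_language` argument plays the same role as in `format_code_blocks()`. Real language detection is out of scope for this sketch.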
### 5. Cross-Reference Linking

Auto-link headings and create cross-references:

- Generate unique heading IDs
- Link section mentions (e.g., "see Introduction")
- Create anchor links for internal navigation
- Handle duplicate heading names
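Duplicate heading names need distinct anchors. GitHub handles this by appending `-1`, `-2`, and so on to repeated slugs. The hypothetical sketch below reproduces that numbering scheme on top of a simplified slug function, which approximates GitHub's rules and is not the formatter's implementation. It then renders `<a id="...">` tags in the same shape as the `auto_link_headings()` example.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical, simplified GitHub-style slug: lowercase, strip punctuation, spaces to hyphens."""
    return re.sub(r'[^\w\- ]', '', title.strip().lower()).replace(' ', '-')


def unique_anchor_ids(titles: list[str]) -> list[str]:
    """Assign each heading a unique ID, suffixing repeats as -1, -2, ..."""
    occurrences: dict[str, int] = {}
    ids = []
    for title in titles:
        base = slug = slugify(title)
        # Keep counting until the suffixed slug is free; this also avoids
        # clashing with a heading whose own text already slugs to "intro-1".
        while slug in occurrences:
            occurrences[base] += 1
            slug = f'{base}-{occurrences[base]}'
        occurrences[slug] = 0
        ids.append(slug)
    return ids


def anchor_tags(titles: list[str]) -> list[str]:
    """Render the IDs as HTML anchors, e.g. <a id="getting-started"></a>."""
    return [f'<a id="{anchor}"></a>' for anchor in unique_anchor_ids(titles)]
```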
### 6. Spacing and Consistency

Apply consistent formatting rules:

- Line breaks around headings
- List formatting (bullets, numbers)
- Code block spacing
- Paragraph breaks
- Horizontal rules
### 7. Flavor Conversion

Convert between markdown flavors:

- **GitHub Flavored Markdown** - Task lists, tables, syntax highlighting
- **CommonMark** - Standard specification
- **Jekyll** - Liquid templates, includes
- **Hugo** - Shortcodes, taxonomies
## Validation Checks

The validator identifies these common issues:

| Issue | Description | Example |
|-------|-------------|---------|
| Heading Skip | Level jumps (H2 → H4) | Missing H3 between H2 and H4 |
| Broken Link | Invalid internal/external link | `[link](#missing-section)` |
| Duplicate Heading | Same heading appears multiple times | Two "Introduction" headings |
| Missing ID | Heading lacks unique identifier | Anchor link fails |
| Invalid Structure | Incorrect nesting or formatting | List inside heading |
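For the **Broken Link** row, internal anchors can be checked offline by comparing each `[text](#anchor)` target with the slugs of the document's headings. `find_broken_anchors` below is a hypothetical sketch with several limits: it uses a simplified GitHub-style slug, ignores duplicate-heading suffixes and code fences, and leaves external URLs alone. The formatter's own behavior may differ. The `'broken_link'` type name is also an assumption, modeled on the documented `validate_structure()` entries.

```python
import re

HEADING_RE = re.compile(r'^#{1,6}\s+(.+?)(?:\s+#+)?\s*$')
ANCHOR_LINK_RE = re.compile(r'\[([^\]]*)\]\(#([^)\s]+)\)')  # [text](#anchor)


def slugify(title: str) -> str:
    """Hypothetical, simplified GitHub-style slug: lowercase, strip punctuation, spaces to hyphens."""
    return re.sub(r'[^\w\- ]', '', title.strip().lower()).replace(' ', '-')


def find_broken_anchors(markdown: str) -> list[dict]:
    """Hypothetical helper: list internal links whose #anchor matches no heading."""
    lines = markdown.splitlines()
    # Collect the anchor each heading would receive
    anchors = set()
    for line in lines:
        match = HEADING_RE.match(line)
        if match:
            anchors.add(slugify(match.group(1)))
    problems = []
    for number, line in enumerate(lines, start=1):
        for link in ANCHOR_LINK_RE.finditer(line):
            target = link.group(2)
            if target not in anchors:
                problems.append({
                    'type': 'broken_link',
                    'line': number,
                    'message': f'Link target "#{target}" matches no heading',
                })
    return problems
```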
## API Reference

### MarkdownFormatter

**Initialization**:

```python
# From a file
formatter = MarkdownFormatter(file_path='document.md')

# Or from a markdown string
formatter = MarkdownFormatter(content='# Markdown text...')
```

**Parameters**:

- `file_path` (str): Path to markdown file (optional)
- `content` (str): Direct markdown content (optional)

One of `file_path` or `content` must be provided.
### Table of Contents

#### generate_toc()

```python
toc = formatter.generate_toc(
    max_depth=3,     # Max heading level (1-6)
    start_level=2,   # Start from H2 (skip H1)
    style='github'   # 'github', 'numbered', or 'bullets'
)
```

**Returns**: TOC markdown string

**Styles**:

- `github` - Bulleted list with anchor links
- `numbered` - Numbered outline
- `bullets` - Simple bullet list

**Example Output** (github style):

```markdown
## Table of Contents

- [Introduction](#introduction)
- [Getting Started](#getting-started)
  - [Installation](#installation)
  - [Configuration](#configuration)
- [Advanced Topics](#advanced-topics)
```
### Frontmatter

#### add_frontmatter()

```python
content = formatter.add_frontmatter(
    metadata={
        'title': 'Document Title',
        'author': 'John Doe',
        'date': '2024-01-15',
        'tags': ['markdown', 'documentation']
    },
    format='yaml'  # 'yaml', 'toml', or 'json'
)
```

**Returns**: Markdown content with frontmatter prepended

**Example Output** (YAML):

```yaml
---
title: Document Title
author: John Doe
date: 2024-01-15
tags:
  - markdown
  - documentation
---
```
### Validation

#### validate_structure()

```python
result = formatter.validate_structure()
```

**Returns**: Dictionary with validation results

```python
{
    'valid': bool,
    'errors': [
        {
            'type': 'heading_skip',
            'line': 45,
            'message': 'Heading level jumps from H2 to H4'
        }
    ],
    'warnings': [
        {
            'type': 'duplicate_heading',
            'line': 120,
            'message': 'Heading "Introduction" appears multiple times'
        }
    ]
}
```
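### Code Blocks

#### format_code_blocks()

```python
content = formatter.format_code_blocks(
    add_language_tags=True,
    default_language='text'
)
```

**Returns**: Markdown with formatted code blocks

**Converts** an indented code block:

```markdown
    code here
```

**To** a fenced code block with a language tag:

````markdown
```text
code here
```
````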
### Cross-References

#### auto_link_headings()

```python
content = formatter.auto_link_headings()
```

**Returns**: Markdown with heading IDs and cross-reference links

Generates GitHub-style anchors:

- `# Getting Started` → `<a id="getting-started"></a>`
- Links "see Getting Started" → `[Getting Started](#getting-started)`
### Spacing

#### fix_spacing()

```python
content = formatter.fix_spacing()
```

**Returns**: Markdown with consistent spacing

Applies these rules:

- 2 blank lines before H1
- 1 blank line before H2-H6
- 1 blank line around code blocks
- 1 blank line around lists
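The heading rules in this list (two blank lines before H1, one before H2-H6) can be sketched as a single pass. The pass collapses whatever blank lines precede a heading and then inserts the required number. `space_headings` is a hypothetical illustration that covers only the heading rules, not the formatter's `fix_spacing()` implementation. Code fences are skipped so that `#` comments inside them are left alone.

```python
import re

HEADING_RE = re.compile(r'^(#{1,6})\s')
FENCE_RE = re.compile(r'^ {0,3}(`{3,}|~{3,})')


def space_headings(markdown: str) -> str:
    """Hypothetical helper: exactly 2 blank lines before H1, 1 before H2-H6, none at the top."""
    out, in_fence = [], False
    for line in markdown.split('\n'):
        if FENCE_RE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            while out and out[-1].strip() == '':  # drop existing blank lines
                out.pop()
            if out:  # no padding before the first line of the document
                out.extend([''] * (2 if len(match.group(1)) == 1 else 1))
        out.append(line)
    return '\n'.join(out)
```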
### Flavor Conversion

#### convert_to_flavor()

```python
content = formatter.convert_to_flavor(target='jekyll')
```

**Parameters**:

- `target` (str): `'github'`, `'commonmark'`, `'jekyll'`, or `'hugo'`

**Returns**: Converted markdown string
### Export

#### export()

```python
formatter.export(
    output_path='formatted.md',
    include_toc=True,
    include_frontmatter=True,
    target_flavor='github'
)
```

**Parameters**:

- `output_path` (str): Output file path
- `include_toc` (bool): Add TOC at beginning
- `include_frontmatter` (bool): Preserve/add frontmatter
- `target_flavor` (str): Target markdown flavor
## CLI Usage

### Generate TOC

```bash
python scripts/markdown_formatter.py \
  --input document.md \
  --toc \
  --toc-depth 3 \
  --toc-style github \
  --output formatted.md
```
### Add Frontmatter

```bash
# From the command line
python scripts/markdown_formatter.py \
  --input document.md \
  --frontmatter title="My Doc" author="John Doe" date="2024-01-15" \
  --output formatted.md

# From a file
python scripts/markdown_formatter.py \
  --input document.md \
  --frontmatter-file metadata.yaml \
  --output formatted.md
```
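By the time the script sees `--frontmatter title="My Doc" author="John Doe"`, the shell has already removed the quotes, so each argument arrives as a plain `key=value` string. The sketch below shows one way such pairs could be parsed with `argparse`. It is hypothetical, since the script's actual argument handling is not shown here, and the names `build_parser` and `parse_frontmatter_pairs` are illustrative. It splits on the first `=` only, so values may themselves contain `=`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a few of the documented flags."""
    parser = argparse.ArgumentParser(prog='markdown_formatter.py')
    parser.add_argument('--input', '-i', required=True)
    parser.add_argument('--output', '-o')
    # nargs='+' collects every value up to the next option flag
    parser.add_argument('--frontmatter', nargs='+', default=[], metavar='KEY=VALUE')
    return parser


def parse_frontmatter_pairs(pairs: list[str]) -> dict[str, str]:
    """Hypothetical helper: turn ['title=My Doc', ...] into {'title': 'My Doc', ...}."""
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition('=')  # split on the first '=' only
        if not sep or not key.strip():
            raise ValueError(f'expected KEY=VALUE, got {pair!r}')
        metadata[key.strip()] = value
    return metadata
```

In a real entry point, a `ValueError` from `parse_frontmatter_pairs` could be passed to `parser.error()`, which prints the usage message and exits with status 2.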
### Validate Structure

```bash
python scripts/markdown_formatter.py \
  --input document.md \
  --validate \
  --format json
```

**Output**:

```json
{
  "valid": false,
  "errors": [
    {
      "type": "heading_skip",
      "line": 45,
      "message": "Heading level jumps from H2 to H4"
    }
  ],
  "warnings": []
}
```
### Full Formatting

```bash
python scripts/markdown_formatter.py \
  --input document.md \
  --toc \
  --frontmatter title="My Doc" \
  --auto-link \
  --fix-spacing \
  --flavor github \
  --output formatted.md
```
### Batch Processing

```bash
# Format all markdown files in a directory
mkdir -p formatted
for file in docs/*.md; do
  python scripts/markdown_formatter.py \
    --input "$file" \
    --toc \
    --fix-spacing \
    --output "formatted/$(basename "$file")"
done
```
### CLI Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--input`, `-i` | Input markdown file | Required |
| `--output`, `-o` | Output file path | stdout |
| `--toc` | Generate table of contents | False |
| `--toc-depth` | Max TOC depth (1-6) | 3 |
| `--toc-style` | TOC style (github/numbered/bullets) | github |
| `--frontmatter` | Key=value pairs for frontmatter | - |
| `--frontmatter-file` | YAML file with frontmatter | - |
| `--auto-link` | Auto-link headings | False |
| `--fix-spacing` | Fix spacing and formatting | False |
| `--flavor` | Target markdown flavor | github |
| `--validate` | Validate structure only | False |
| `--format` | Output format for validation (json/text) | text |
## Examples

### Example 1: Auto-Generate TOC

```python
formatter = MarkdownFormatter(file_path='guide.md')
toc = formatter.generate_toc(max_depth=3, style='github')
print(toc)
```

Output:

```markdown
## Table of Contents

- [Introduction](#introduction)
- [Setup](#setup)
  - [Installation](#installation)
  - [Configuration](#configuration)
```
### Example 2: Add Jekyll Frontmatter

```python
formatter = MarkdownFormatter(file_path='post.md')

formatter.add_frontmatter({
    'layout': 'post',
    'title': 'Getting Started with Markdown',
    'date': '2024-01-15',
    'categories': ['tutorial', 'markdown'],
    'tags': ['beginner', 'documentation']
}, format='yaml')

formatter.export('_posts/2024-01-15-getting-started.md')
```
### Example 3: Validate Document Structure

```python
formatter = MarkdownFormatter(file_path='documentation.md')
result = formatter.validate_structure()

if result['valid']:
    print("Document structure is valid!")
else:
    print("Errors found:")
    for error in result['errors']:
        print(f"Line {error['line']}: {error['message']}")

# Report warnings whether or not the document is valid
if result['warnings']:
    print("\nWarnings:")
    for warning in result['warnings']:
        print(f"Line {warning['line']}: {warning['message']}")
```
### Example 4: Fix Common Issues

```python
formatter = MarkdownFormatter(file_path='messy.md')

# Fix spacing issues
formatter.fix_spacing()

# Format code blocks
formatter.format_code_blocks(default_language='python')

# Add heading IDs
formatter.auto_link_headings()

# Export cleaned version
formatter.export('clean.md', target_flavor='github')
```
### Example 5: Convert for Hugo Static Site

```python
formatter = MarkdownFormatter(file_path='article.md')

# Add Hugo frontmatter
formatter.add_frontmatter({
    'title': 'My Article',
    'date': '2024-01-15T10:00:00Z',
    'draft': False,
    'tags': ['hugo', 'static-site'],
    'categories': ['web-development']
}, format='toml')

# Generate TOC
toc = formatter.generate_toc(max_depth=2)

# Convert to Hugo flavor
formatter.convert_to_flavor('hugo')

# Export
formatter.export(
    output_path='content/posts/my-article.md',
    include_toc=True,
    target_flavor='hugo'
)
```
### Example 6: Batch Validation

```bash
# Validate all markdown files (recursive ** globbing needs bash's globstar option)
shopt -s globstar
for file in docs/**/*.md; do
  echo "Validating $file..."
  python scripts/markdown_formatter.py \
    --input "$file" \
    --validate \
    --format json > "${file}.validation.json"
done

# Find files with errors
jq -r 'select(.valid == false) | input_filename' docs/**/*.validation.json
```
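If `jq` is not available, the same summary can be produced in Python. This sketch assumes two things: each report is the JSON object shown under **Validate Structure**, and each report was written next to its source file with a `.validation.json` suffix, as in the loop above. `invalid_reports` is a hypothetical helper name.

```python
import json
from pathlib import Path


def invalid_reports(root: str) -> list[Path]:
    """Hypothetical helper: *.validation.json reports under root whose "valid" field is false."""
    failed = []
    for report in sorted(Path(root).rglob('*.validation.json')):
        data = json.loads(report.read_text(encoding='utf-8'))
        if data.get('valid') is False:
            failed.append(report)
    return failed
```

Usage: `for path in invalid_reports('docs'): print(path)`.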
## Dependencies

```text
markdown>=3.5.0
pyyaml>=6.0.0
beautifulsoup4>=4.12.0
pandas>=2.0.0
```

Install dependencies:

```bash
pip install -r scripts/requirements.txt
```
## Limitations

- **Link Validation**: External link checking requires network requests (not performed by default)
- **Markdown Parsing**: Uses the Python-Markdown library; some edge cases may differ from other parsers
- **Flavor Differences**: Not all flavor-specific features are converted (e.g., Hugo shortcodes)
- **Heading Anchors**: Anchor generation follows GitHub's algorithm but may differ from other platforms
- **Code Language Detection**: Automatic language detection is limited; manual tags are recommended
- **Large Files**: Very large files (>10MB) may be slow to process
- **Unicode**: Some Unicode characters in heading anchors may cause issues
- **Nested Lists**: Complex nested list structures may not format perfectly
- **HTML in Markdown**: Raw HTML blocks are preserved but not validated
- **Math Equations**: LaTeX math equations are not parsed or validated
## Markdown Flavor Notes

### GitHub Flavored Markdown (GFM)

- Task lists: `- [ ] Task` / `- [x] Done`
- Tables with alignment
- Strikethrough: `~~text~~`
- Automatic link detection

### CommonMark

- Strict specification adherence
- No extensions (no task lists, no tables)
- Predictable parsing

### Jekyll

- Liquid templating: `{{ variable }}`
- Includes: `{% include file.html %}`
- Frontmatter required

### Hugo

- Shortcodes: `{{< shortcode >}}`
- TOML frontmatter preferred
- Taxonomies (tags, categories)
- Nested sections