Documentation Quality Assurance Systematic documentation quality system combining automated validation scripts with ToolUniverse-specific structural audits. When to Use Pre-release documentation review After major refactoring (commands, APIs, tool counts changed) User reports confusing or outdated documentation Circular navigation or structural problems suspected Want to establish automated validation pipeline Approach: Two-Phase Strategy Phase A: Automated Validation (15-20 min) Create validation scripts for systematic detection Test commands, links, terminology consistency Priority-based fixes (blockers → polish) Phase B: ToolUniverse-Specific Audit (20-25 min) Circular navigation checks MCP configuration duplication Tool count consistency Auto-generated file conflicts Phase A: Automated Validation A1. Build Validation Script Create scripts/validate_documentation.py :
!/usr/bin/env python3
"""Documentation validator for ToolUniverse""" import re import glob from pathlib import Path DOCS_ROOT = Path ( "docs" )
ToolUniverse-specific patterns
DEPRECATED_PATTERNS
[ ( r"python -m tooluniverse.server" , "tooluniverse-server" ) , ( r"600+?\s+tools" , "1000+ tools" ) , ( r"750+?\s+tools" , "1000+ tools" ) , ] def is_false_positive ( match , content ) : """Smart context checking to avoid false positives""" start = max ( 0 , match . start ( ) - 100 ) end = min ( len ( content ) , match . end ( ) + 100 ) context = content [ start : end ] . lower ( )
Skip if discussing deprecation itself
if any ( kw in context for kw in [ 'deprecated' , 'old version' , 'migration' ] ) : return True
Skip technical values (ports, dimensions, etc.)
if any ( kw in context for kw in [ 'width' , 'height' , 'port' , '":"' ] ) : return True return False def validate_file ( filepath ) : """Check one file for issues""" with open ( filepath , 'r' , encoding = 'utf-8' ) as f : content = f . read ( ) issues = [ ]
Check deprecated patterns
for old_pattern , new_text in DEPRECATED_PATTERNS : matches = re . finditer ( old_pattern , content ) for match in matches : if is_false_positive ( match , content ) : continue line_num = content [ : match . start ( ) ] . count ( '\n' ) + 1 issues . append ( { 'file' : filepath , 'line' : line_num , 'severity' : 'HIGH' , 'found' : match . group ( ) , 'suggestion' : new_text } ) return issues
Scan all docs
all_issues
[ ] for doc_file in glob . glob ( str ( DOCS_ROOT / "/*.md" ) , recursive = True ) : all_issues . extend ( validate_file ( doc_file ) ) for doc_file in glob . glob ( str ( DOCS_ROOT / "/*.rst" ) , recursive = True ) : all_issues . extend ( validate_file ( doc_file ) )
Report
if all_issues : print ( f"❌ Found { len ( all_issues ) } issues\n" ) for issue in all_issues : print ( f" { issue [ 'file' ] } : { issue [ 'line' ] } [ { issue [ 'severity' ] } ]" ) print ( f" Found: { issue [ 'found' ] } " ) print ( f" Should be: { issue [ 'suggestion' ] } \n" ) exit ( 1 ) else : print ( "✅ Documentation validation passed" ) exit ( 0 ) A2. Command Accuracy Check Test that commands in docs actually work:
Extract and test commands
grep
-r
"^\s\$\s"
docs/
|
while
read
line
;
do
cmd
=
$(
echo
"
$line
"
|
sed
's/.\$ //'
|
cut
-d
' '
-f1
)
if
!
command
-v
"
$cmd
"
&>
/dev/null
;
then
echo
"❌ Command not found:
$cmd
in
$line
"
fi
done
A3. Link Integrity Check
For RST docs:
def
check_rst_links
(
docs_root
)
:
"""Validate :doc: references"""
pattern
=
r':doc:([^]+)`'
for
rst_file
in
glob
.
glob
(
f"
{
docs_root
}
//.rst"
,
recursive
=
True
)
:
with
open
(
rst_file
)
as
f
:
content
=
f
.
read
(
)
matches
=
re
.
finditer
(
pattern
,
content
)
for
match
in
matches
:
ref
=
match
.
group
(
1
)
Check if target exists
possible
[ f" { ref } .rst" , f" { ref } .md" , f" { ref } /index.rst" ] if not any ( Path ( docs_root , p ) . exists ( ) for p in possible ) : print ( f"❌ Broken link in { rst_file } : { ref } " ) A4. Terminology Consistency Track variations and standardize:
Define standard terms
TERMINOLOGY
- {
- 'api_endpoint'
- :
- [
- 'endpoint'
- ,
- 'url'
- ,
- 'route'
- ,
- 'path'
- ]
- ,
- 'tool_count'
- :
- [
- 'tools'
- ,
- 'resources'
- ,
- 'integrations'
- ]
- ,
- }
- def
- check_terminology
- (
- content
- )
- :
- """Find inconsistent terminology"""
- for
- standard
- ,
- variations
- in
- TERMINOLOGY
- .
- items
- (
- )
- :
- counts
- =
- {
- v
- :
- content
- .
- lower
- (
- )
- .
- count
- (
- v
- )
- for
- v
- in
- variations
- }
- if
- len
- (
- [
- c
- for
- c
- in
- counts
- .
- values
- (
- )
- if
- c
- >
- 0
- ]
- )
- >
- 2
- :
- return
- f"Inconsistent terminology:
- {
- counts
- }
- "
- return
- None
- Phase B: ToolUniverse-Specific Audit
- B1. Circular Navigation Check
- Issue
- Documentation pages that reference each other in loops. Check manually :
Find cross-references
grep -r ":doc:`" docs/*.rst | grep -E "(quickstart|getting_started|installation)" Checklist : Is there a clear "Start Here" on docs/index.rst ? Does navigation follow linear path: index → quickstart → getting_started → guides? No "you should have completed X first" statements that create dependency loops? Common patterns to fix : quickstart.rst → "See getting_started" getting_started.rst → "Complete quickstart first" B2. Duplicate Content Check Common duplicates in ToolUniverse : Multiple FAQs: docs/faq.rst and docs/help/faq.rst Getting started: docs/installation.rst , docs/quickstart.rst , docs/getting_started.rst MCP configuration: All files in docs/guide/building_ai_scientists/ Detection :
Find MCP config duplication
- rg
- "MCP.*configuration"
- docs/
- -l
- |
- wc
- -l
- rg
- "pip install tooluniverse"
- docs/
- -l
- |
- wc
- -l
- Action
-
- Consolidate or clearly differentiate
- B3. Tool Count Consistency
- Standard
- Use "1000+ tools" consistently. Detection :
Find all tool count mentions
- rg
- "[0-9]++?\s+(tools|resources|integrations)"
- docs/ --no-filename
- |
- sort
- -u
- Check
- :
- Are different numbers used (600, 750, 1195)?
- Is "1000+ tools" used consistently?
- Exact counts avoided in favor of "1000+"?
- B4. Auto-Generated File Headers
- Auto-generated directories
- :
- docs/tools/*_tools.rst
- (from
- generate_config_index.py
- )
- docs/api/*.rst
- (from
- sphinx-apidoc
- )
- Required header
- :
- .. AUTO-GENERATED - DO NOT EDIT MANUALLY
- .. Generated by: docs/generate_config_index.py
- .. Last updated: 2024-02-05
- ..
- .. To modify, edit source files and regenerate.
- Check
- :
- head
- -5
- docs/tools/*_tools.rst
- |
- grep
- "AUTO-GENERATED"
- B5. CLI Tools Documentation
- Check pyproject.toml for all CLIs
- :
- grep
- -A
- 20
- "[project.scripts]"
- pyproject.toml
- Common undocumented
- :
- tooluniverse-expert-feedback
- tooluniverse-expert-feedback-web
- generate-mcp-tools
- Action
- Ensure all in docs/reference/cli_tools.rst B6. Environment Variables Discovery :
Find all env vars in code
rg "os.getenv|os.environ" src/tooluniverse/ -o | sort -u rg "TOOLUNIVERSE_[A-Z_]+" src/tooluniverse/ -o | sort -u Categories to document : Cache: TOOLUNIVERSE_CACHE_ Logging: TOOLUNIVERSE_LOG_ LLM: TOOLUNIVERSE_LLM_ API keys: _API_KEY Check : Does docs/reference/environment_variables.rst exist? Are variables categorized? Each has: default, description, example? Is there .env.template at project root? B7. ToolUniverse-Specific Jargon Terms to define on first use : Tool Specification EFO ID MCP, SMCP Compact Mode Tool Finder AI Scientist Check : Is there docs/glossary.rst ? Terms defined inline with :term: references? Glossary linked from main index? B8. CI/CD Documentation Regeneration Required in .github/workflows/deploy-docs.yml : - name : Regenerate tool documentation run : | cd docs python generate_config_index.py python generate_remote_tools_docs.py python generate_tool_reference.py Check : CI/CD regenerates docs before build? Regeneration happens BEFORE Sphinx build? docs/api/ excluded from cache? Priority Framework Issue Severity Severity Definition Examples Timeline CRITICAL Blocks release Broken builds, dangerous instructions Immediate HIGH Blocks users Wrong commands, broken setup Same day MEDIUM Causes confusion Inconsistent terminology, unclear examples Same week LOW Reduces quality Long files, minor formatting Future task Fix Order Run automated validation → Fix HIGH issues Check circular navigation → Fix CRITICAL loops Verify tool counts → Standardize to "1000+" Check auto-generated headers → Add missing Validate CLI docs → Document all from pyproject.toml Check env vars → Create reference page Review jargon → Create/update glossary Verify CI/CD → Add regeneration steps Validation Checklist Before considering docs "done": Accuracy Automated validation passes All commands tested Version numbers current Counts match reality Structure (ToolUniverse-specific) No circular navigation Clear "Start Here" entry point Linear learning path Max 2-3 level hierarchy Consistency "1000+ tools" everywhere Same terminology throughout Auto-generated files have headers All CLIs documented Completeness All features documented All CLIs in pyproject.toml covered All env vars documented Glossary includes all jargon Output: Audit Report
- Documentation Quality Report
- **
- Date
- **
-
- [date]
- **
- Scope
- **
- Automated validation + ToolUniverse audit
Executive Summary
Files scanned: X
Issues found: Y (Critical: A, High: B, Medium: C, Low: D)
Critical Issues 1. ** [Issue] ** - Location: file:line - Problem: [description] - Fix: [action] - Effort: [time]
Automated Validation Results
Deprecated commands: X instances
Inconsistent counts: Y instances
Broken links: Z instances
ToolUniverse-Specific Findings
Circular navigation: [yes/no]
Tool count variations: [list]
Missing CLI docs: [list]
Auto-generated headers: X missing
Recommendations 1. Immediate (today): [list] 2. This week: [list] 3. Next sprint: [list]
Validation Command
Run
python scripts/validate_documentation.py
to verify fixes
CI/CD Integration
Add to
.github/workflows/validate-docs.yml
:
name
:
Validate Documentation
on
:
[
pull_request
]
jobs
:
validate
:
runs-on
:
ubuntu
-
latest
steps
:
-
uses
:
actions/checkout@v2
-
name
:
Install dependencies
run
:
pip install
-
r requirements.txt
-
name
:
Run validation
run
:
python scripts/validate_documentation.py
-
name
:
Check auto
-
generated headers
run
:
|
for f in docs/tools/*_tools.rst; do
if ! head -1 "$f" | grep -q "AUTO-GENERATED"; then
echo "Missing header: $f"
exit 1
fi
done
Common Issues Quick Reference
Issue
Detection
Fix
Deprecated command
rg "old-cmd" docs/
Replace with
new-cmd
Wrong tool count
rg "[0-9]+ tools" docs/
Change to "1000+ tools"
Circular nav
Manual trace
Remove back-references
Missing header
head -1 file.rst
Add AUTO-GENERATED header
Undocumented CLI
Check pyproject.toml
Add to cli_tools.rst
Missing env var
rg "os.getenv" src/
Add to env vars reference
Best Practices
Automate first
- Build validation before manual audit
Context matters
- Smart pattern matching avoids false positives
Fix systematically
- Batch similar issues together
Validate continuously
- Add to CI/CD pipeline
ToolUniverse-specific last
- Automated checks catch most issues
Success Criteria
Documentation quality achieved when:
✅ Automated validation reports 0 HIGH issues
✅ No circular navigation
✅ "1000+ tools" used consistently
✅ All auto-generated files have headers
✅ All CLIs from pyproject.toml documented
✅ All env vars have reference page
✅ Glossary covers all technical terms
✅ CI/CD validates on every PR