Guides configuration and auditing of robots.txt for search engine and AI crawler control.

## When Invoking

On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use, or when the user asks to skip, go directly to the main output.
## Scope (Technical SEO)

| Area | Guidance |
|---|---|
| Robots.txt | Review Disallow/Allow; avoid blocking important pages |
| Crawler access | Ensure crawlers (including AI crawlers) can access key pages |
| Indexing | Misconfigured robots.txt can block indexing; verify no accidental blocks |
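To verify that key pages are not accidentally blocked, Python's standard-library `urllib.robotparser` can evaluate rules against sample URLs. A minimal sketch; the rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; in practice, load the live file with
# parser.set_url("https://example.com/robots.txt") and parser.read()
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# Confirm important pages remain crawlable
print(parser.can_fetch("*", "https://example.com/products/"))   # True
print(parser.can_fetch("*", "https://example.com/admin/users")) # False
```

Note that `urllib.robotparser` applies rules in file order rather than Google's longest-match precedence, so treat it as a sanity check, not an exact simulation of Googlebot.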
## Initial Assessment

Check for product marketing context first: if `.claude/product-marketing-context.md` or `.cursor/product-marketing-context.md` exists, read it for the site URL and indexing goals.

Identify:

- **Site URL**: base domain (e.g., `https://example.com`)
- **Indexing scope**: full site, partial, or specific paths to exclude
- **AI crawler strategy**: allow search/indexing vs. block training-data crawlers
## Best Practices

### Purpose and Limitations

| Point | Note |
|---|---|
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search results without a snippet) |
| No-index | Use a `noindex` meta tag or authentication for sensitive content; robots.txt is publicly readable |
| Indexed vs. non-indexed | Not all content should be indexed. robots.txt and `noindex` complement each other: robots.txt for path-level crawl control, `noindex` for page-level indexing. See indexing |
| Advisory | Rules are advisory; malicious crawlers may ignore them |
## Location and Format

| Item | Requirement |
|---|---|
| Path | Site root: `https://example.com/robots.txt` |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |
## Core Directives

| Directive | Purpose | Example |
|---|---|---|
| `User-agent:` | Target crawler | `User-agent: Googlebot`, `User-agent: *` |
| `Disallow:` | Block a path prefix | `Disallow: /admin/` |
| `Allow:` | Allow a path (can override Disallow) | `Allow: /public/` |
| `Sitemap:` | Declare the sitemap's absolute URL | `Sitemap: https://example.com/sitemap.xml` |
| `Clean-param:` | Strip query params (Yandex) | See below |
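Put together, a minimal file using these directives might look like this (domain and paths are illustrative):

```txt
# Applies to all crawlers
User-agent: *
Disallow: /admin/
# Allow overrides the broader Disallow for this subtree
Allow: /admin/help/

# Sitemap is declared outside any group, as an absolute URL
Sitemap: https://example.com/sitemap.xml
```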
## Critical: Do Not Block Rendering Resources

- **Do not** block CSS, JS, or images; Google needs them to render pages
- **Only** block paths that don't need crawling: admin, API, temp files
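An illustrative contrast (paths are hypothetical; the "bad" rules are shown commented out):

```txt
User-agent: *
# Bad: blocks assets Google needs to render pages
# Disallow: /assets/
# Disallow: /*.js$

# Good: block only non-content paths
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
```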
## AI Crawler Strategy

| User-agent | Purpose | Typical setting |
|---|---|---|
| OAI-SearchBot | ChatGPT search | Allow |
| GPTBot | OpenAI training | Disallow |
| Claude-SearchBot | Claude search | Allow |
| ClaudeBot | Anthropic training | Disallow |
| PerplexityBot | Perplexity search | Allow |
| Google-Extended | Gemini training | Disallow |
| CCBot | Common Crawl | Disallow |
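A sketch implementing the table above: search bots allowed, training-data crawlers blocked. Adjust to the site's actual policy; RFC 9309 permits multiple `User-agent` lines per group:

```txt
# Search crawlers: allow
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /

# Training-data crawlers: block
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /
```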
## Clean-param (Yandex)

```txt
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
```
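Only Yandex interprets `Clean-param`; other crawlers ignore the line. An optional second field restricts the rule to a path prefix (the parameters and path here are illustrative):

```txt
User-agent: Yandex
Clean-param: ref&fbclid /blog/
```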
## Output Format

1. **Current state** (if auditing)
2. **Recommended robots.txt** (full file)
3. **Compliance checklist**

References: Google robots.txt