Install
npx skills add https://github.com/tavily-ai/tavily-plugins --skill crawl-url
- URL Crawler
- Crawls websites using Tavily Crawl API and saves each page as a separate markdown file in a flat directory structure.
- Prerequisites
- Tavily API Key Required
- Get your key at https://tavily.com
- Add to ~/.claude/settings.json:
- {"env": {"TAVILY_API_KEY": "tvly-your-api-key-here"}}
- Restart Claude Code after adding your API key.
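To confirm the key is visible before running a crawl, a small check like the one below may help. It is only a sketch: `find_tavily_key` is a hypothetical helper, not part of the skill, and the lookup order (environment first, then the `env` block of the settings file) is an assumption.

```python
import json
import os
from pathlib import Path


def find_tavily_key(settings_path, environ=os.environ):
    """Return the Tavily key from the environment, else from settings.json, else None.

    Hypothetical helper; the precedence used here is an assumption.
    """
    key = environ.get("TAVILY_API_KEY")
    if key:
        return key
    try:
        settings = json.loads(Path(settings_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing or malformed settings file: treat as "no key configured".
        return None
    env_block = settings.get("env") if isinstance(settings, dict) else None
    if isinstance(env_block, dict):
        return env_block.get("TAVILY_API_KEY")
    return None


if __name__ == "__main__":
    found = find_tavily_key(Path.home() / ".claude" / "settings.json")
    print("TAVILY_API_KEY found" if found else "TAVILY_API_KEY missing")
```

The check never prints the key itself, so it is safe to run in a shared terminal.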
- When to Use
- Use this skill when the user wants to:
- Crawl and extract content from a website
- Download API documentation, framework docs, or knowledge bases
- Save web content locally for offline access or analysis
- Usage
- Execute the crawl script with a URL and optional instruction:
- python scripts/crawl_url.py <URL> [--instruction "guidance text"]
- Required Parameters
- URL: The website to crawl (e.g., https://docs.stripe.com/api)
- Optional Parameters
- --instruction, -i: Natural language guidance for the crawler (e.g., "Focus on API endpoints only")
- --output, -o: Output directory (default: /crawled_context/)
- --depth, -d: Max crawl depth (default: 2, range: 1-5)
- --breadth, -b: Max links per level (default: 50)
- --limit, -l: Max total pages to crawl (default: 50)
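The parameters above could be declared with `argparse` roughly as follows. This is a sketch of the documented interface, not the actual source of `scripts/crawl_url.py`; `build_parser` is a hypothetical name, and the output default is left as `None` here rather than guessing how the script resolves its default directory.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="crawl_url.py",
        description="Crawl a website and save each page as a markdown file.",
    )
    parser.add_argument("url", help="The website to crawl")
    parser.add_argument("-i", "--instruction", default=None,
                        help="Natural language guidance for the crawler")
    # Assumption: None stands in for the script's own default directory.
    parser.add_argument("-o", "--output", default=None, help="Output directory")
    parser.add_argument("-d", "--depth", type=int, default=2,
                        choices=range(1, 6), metavar="N",
                        help="Max crawl depth (1-5, default: 2)")
    parser.add_argument("-b", "--breadth", type=int, default=50,
                        help="Max links per level (default: 50)")
    parser.add_argument("-l", "--limit", type=int, default=50,
                        help="Max total pages to crawl (default: 50)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Using `choices=range(1, 6)` makes the parser reject a depth outside the documented 1-5 range before any crawl starts.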
- Output
- The script creates a flat directory structure at /crawled_context// with one markdown file per crawled page. Filenames are derived from URLs (e.g., docs_stripe_com_api_authentication.md).
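One plausible way to derive such a filename is to join the host and path and collapse every run of non-alphanumeric characters into an underscore. The sketch below reproduces the example name above, but the script's real rule may differ; `url_to_filename` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse


def url_to_filename(url):
    """Turn a URL into a flat, filesystem-safe markdown filename (assumed rule)."""
    parts = urlparse(url)
    raw = parts.netloc + parts.path
    # Collapse dots, slashes, and any other punctuation into single underscores.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return (stem or "index") + ".md"


if __name__ == "__main__":
    print(url_to_filename("https://docs.stripe.com/api/authentication"))
```

Query strings and fragments are ignored in this sketch, which is another assumption about the mapping.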
- Each markdown file includes:
- Frontmatter with source URL and crawl timestamp
- The extracted content in markdown format
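A file with this layout could be assembled as shown below. The frontmatter keys `source_url` and `crawled_at` are illustrative guesses, as is the `render_page` helper; the script's actual key names are not stated here.

```python
from datetime import datetime, timezone


def render_page(source_url, content, crawled_at):
    """Return markdown text with a frontmatter header followed by the page content."""
    lines = [
        "---",
        f"source_url: {source_url}",                 # assumed key name
        f"crawled_at: {crawled_at.isoformat()}",     # assumed key name
        "---",
        "",
        content.rstrip(),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(render_page("https://docs.stripe.com/api", "# API Reference", now))
```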
- Examples
- Basic Crawl
- python scripts/crawl_url.py https://docs.anthropic.com
- Crawls the Anthropic docs with default settings and saves the pages to /crawled_context/docs_anthropic_com/.
- With Instruction
- python scripts/crawl_url.py https://react.dev --instruction "Focus on API reference pages and hooks documentation"
- Uses a natural language instruction to guide the crawler toward specific content.
- Custom Output Directory
- python scripts/crawl_url.py https://docs.stripe.com/api -o ./stripe-api-docs
- Saves results to a custom directory.
- Adjust Crawl Parameters
- python scripts/crawl_url.py https://nextjs.org/docs --depth 3 --breadth 100 --limit 200
- Increases crawl depth, breadth, and page limit for more comprehensive coverage.
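For orientation, the command-line values presumably end up in a Tavily Crawl request along these lines. This is a sketch under assumptions: `build_crawl_payload` is hypothetical, and the field names (`max_depth`, `max_breadth`, `limit`, `instructions`) should be checked against the Tavily API reference before relying on them. No network call is made.

```python
def build_crawl_payload(url, instruction=None, depth=2, breadth=50, limit=50):
    """Map the CLI options onto an assumed Crawl API request body."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    payload = {
        "url": url,
        "max_depth": depth,      # --depth
        "max_breadth": breadth,  # --breadth
        "limit": limit,          # --limit
    }
    if instruction:
        payload["instructions"] = instruction  # --instruction
    return payload


if __name__ == "__main__":
    print(build_crawl_payload("https://nextjs.org/docs", depth=3, breadth=100, limit=200))
```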
- Important Notes
- API Key Required: Set the TAVILY_API_KEY environment variable (loads from .env if available)
- Crawl Time: Deeper crawls take longer (depth 3+ may take many minutes)
- Filename Safety: URLs are converted to safe filenames automatically
- Flat Structure: All files saved in /crawled_context// directory regardless of original URL hierarchy
- Duplicate Prevention: Files are overwritten if URLs generate identical filenames
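Because distinct URLs can flatten to the same name, it can be worth checking a URL list for collisions before trusting the output. The sketch below assumes a flattening rule that replaces every run of non-alphanumeric characters with an underscore; `safe_name` and `find_collisions` are hypothetical helpers, and the script's real rule may produce different collisions.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def safe_name(url):
    """Flatten a URL into a filename (assumed rule, for illustration)."""
    parts = urlparse(url)
    stem = re.sub(r"[^A-Za-z0-9]+", "_", parts.netloc + parts.path).strip("_")
    return (stem or "index") + ".md"


def find_collisions(urls):
    """Return {filename: [urls]} for every filename shared by two or more URLs."""
    by_name = defaultdict(list)
    for url in urls:
        by_name[safe_name(url)].append(url)
    return {name: group for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    sample = ["https://example.com/a/b", "https://example.com/a_b"]
    for name, group in find_collisions(sample).items():
        print(f"{name} would be written {len(group)} times: {group}")
```

Under this rule, `/a/b` and `/a_b` on the same host both map to `example_com_a_b.md`, so only the page written last would survive.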