# crawl-url

Installs: 73 · Rank: #10641

## Install

```
npx skills add https://github.com/tavily-ai/tavily-plugins --skill crawl-url
```
## URL Crawler

Crawls websites using the Tavily Crawl API and saves each page as a separate markdown file in a flat directory structure.
## Prerequisites

**Tavily API Key Required**

- Get your key at https://tavily.com
- Add it to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

Restart Claude Code after adding your API key.
## When to Use

Use this skill when the user wants to:

- Crawl and extract content from a website
- Download API documentation, framework docs, or knowledge bases
- Save web content locally for offline access or analysis
## Usage

Execute the crawl script with a URL and an optional instruction:

```
python scripts/crawl_url.py <URL> [--instruction "guidance text"]
```
### Required Parameters

- `URL`: The website to crawl (e.g., https://docs.stripe.com/api)

### Optional Parameters

- `--instruction, -i`: Natural language guidance for the crawler (e.g., "Focus on API endpoints only")
- `--output, -o`: Output directory (default: `/crawled_context/`)
- `--depth, -d`: Max crawl depth (default: 2, range: 1-5)
- `--breadth, -b`: Max links per level (default: 50)
- `--limit, -l`: Max total pages to crawl (default: 50)
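The flags above map naturally onto a standard `argparse` interface. The sketch below is an illustrative reconstruction of the CLI described in this document, not the actual contents of `scripts/crawl_url.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI (assumed, not the real script)."""
    parser = argparse.ArgumentParser(
        description="Crawl a website with the Tavily Crawl API"
    )
    parser.add_argument("url", help="The website to crawl, e.g. https://docs.stripe.com/api")
    parser.add_argument("--instruction", "-i", default=None,
                        help="Natural language guidance for the crawler")
    parser.add_argument("--output", "-o", default="/crawled_context/",
                        help="Output directory")
    parser.add_argument("--depth", "-d", type=int, default=2, choices=range(1, 6),
                        help="Max crawl depth (1-5)")
    parser.add_argument("--breadth", "-b", type=int, default=50,
                        help="Max links per level")
    parser.add_argument("--limit", "-l", type=int, default=50,
                        help="Max total pages to crawl")
    return parser


args = build_parser().parse_args(["https://docs.anthropic.com", "--depth", "3"])
```

Positional `url` plus short and long option forms keeps the invocations in the Examples section below valid.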
## Output

The script creates a flat directory structure at `/crawled_context//` with one markdown file per crawled page. Filenames are derived from URLs (e.g., `docs_stripe_com_api_authentication.md`).

Each markdown file includes:

- Frontmatter with the source URL and crawl timestamp
- The extracted content in markdown format
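The URL-to-filename mapping and the file layout can be pictured as below. This is an assumed reconstruction inferred from the examples in this document, not the script's actual code:

```python
import re
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a page URL into a flat, filesystem-safe markdown filename.

    Assumed behavior, inferred from the examples in this document.
    """
    parsed = urlparse(url)
    # Join host and path, then collapse any non-alphanumeric run to "_".
    raw = parsed.netloc + parsed.path
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return safe + ".md"


def page_markdown(url: str, content: str, timestamp: str) -> str:
    """Assumed shape of each saved file: YAML frontmatter, then the content."""
    return f"---\nsource_url: {url}\ncrawled_at: {timestamp}\n---\n\n{content}\n"


print(url_to_filename("https://docs.stripe.com/api/authentication"))
# docs_stripe_com_api_authentication.md
```

Collapsing dots and slashes to underscores is what makes a nested URL hierarchy safe to store in a single flat directory.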
## Examples

### Basic Crawl

```
python scripts/crawl_url.py https://docs.anthropic.com
```

Crawls the Anthropic docs with default settings and saves the output to `/crawled_context/docs_anthropic_com/`.

### With Instruction

```
python scripts/crawl_url.py https://react.dev --instruction "Focus on API reference pages and hooks documentation"
```

Uses a natural language instruction to guide the crawler toward specific content.

### Custom Output Directory

```
python scripts/crawl_url.py https://docs.stripe.com/api -o ./stripe-api-docs
```

Saves results to a custom directory.

### Adjust Crawl Parameters

```
python scripts/crawl_url.py https://nextjs.org/docs --depth 3 --breadth 100 --limit 200
```

Increases crawl depth, breadth, and page limit for more comprehensive coverage.
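Presumably the script forwards these flags to the Tavily Crawl API as a request payload. The sketch below is purely hypothetical: the field names (`url`, `instructions`, `max_depth`, `max_breadth`, `limit`) are assumptions based on the CLI options above, not a confirmed description of Tavily's request schema:

```python
def build_crawl_payload(url, instruction=None, depth=2, breadth=50, limit=50):
    """Hypothetical crawl request payload.

    Field names are assumptions derived from the CLI flags, not
    Tavily's documented schema.
    """
    payload = {
        "url": url,
        "max_depth": depth,      # assumed field name
        "max_breadth": breadth,  # assumed field name
        "limit": limit,          # assumed field name
    }
    if instruction:
        payload["instructions"] = instruction  # assumed field name
    return payload


payload = build_crawl_payload("https://nextjs.org/docs", depth=3, breadth=100, limit=200)
```

Consult Tavily's own API reference for the authoritative parameter names before relying on this shape.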
## Important Notes

- **API Key Required**: Set the `TAVILY_API_KEY` environment variable (loaded from `.env` if available)
- **Crawl Time**: Deeper crawls take longer (depth 3+ may take many minutes)
- **Filename Safety**: URLs are converted to safe filenames automatically
- **Flat Structure**: All files are saved in the `/crawled_context//` directory regardless of the original URL hierarchy
- **Duplicate Prevention**: Files are overwritten when different URLs generate identical filenames