URL Crawler

Crawls websites using Tavily Crawl API and saves each page as a separate markdown file in a flat directory structure.

Prerequisites

Tavily API Key Required

- Get your key at

https://tavily.com

Add to

~/.claude/settings.json

:

{

"env"

:

{

"TAVILY_API_KEY"

:

"tvly-your-api-key-here"

}

Restart Claude Code after adding your API key.

When to Use

Use this skill when the user wants to:

Crawl and extract content from a website

Download API documentation, framework docs, or knowledge bases

Save web content locally for offline access or analysis

Usage

Execute the crawl script with a URL and optional instruction:

python scripts/crawl_url.py

<

URL

>

[

--instruction

"guidance text"

]

Required Parameters

URL

The website to crawl (e.g.,

https://docs.stripe.com/api

)

Optional Parameters

--instruction, -i

Natural language guidance for the crawler (e.g., "Focus on API endpoints only")

--output, -o

Output directory (default:

/crawled_context/

)

--depth, -d

Max crawl depth (default: 2, range: 1-5)

--breadth, -b

Max links per level (default: 50)

--limit, -l

Max total pages to crawl (default: 50)

Output

The script creates a flat directory structure at

/crawled_context//

with one markdown file per crawled page. Filenames are derived from URLs (e.g.,

docs_stripe_com_api_authentication.md

).

Each markdown file includes:

Frontmatter with source URL and crawl timestamp

The extracted content in markdown format

Examples

Basic Crawl

python scripts/crawl_url.py https://docs.anthropic.com

Crawls the Anthropic docs with default settings, saves to

/crawled_context/docs_anthropic_com/

.

With Instruction

python scripts/crawl_url.py https://react.dev

--instruction

"Focus on API reference pages and hooks documentation"

Uses natural language instruction to guide the crawler toward specific content.

Custom Output Directory

python scripts/crawl_url.py https://docs.stripe.com/api

-o

./stripe-api-docs

Saves results to a custom directory.

Adjust Crawl Parameters

python scripts/crawl_url.py https://nextjs.org/docs

--depth

3

--breadth

100

--limit

200

Increases crawl depth, breadth, and page limit for more comprehensive coverage.

Important Notes

API Key Required

Set

TAVILY_API_KEY

environment variable (loads from

.env

if available)

Crawl Time

Deeper crawls take longer (depth 3+ may take many minutes)

Filename Safety

URLs are converted to safe filenames automatically

Flat Structure

All files saved in
/crawled_context//
directory regardless of original URL hierarchy
Duplicate Prevention: Files are overwritten if URLs generate identical filenames

crawl-url

安装