# firecrawl-scraping

Installs: 154
Rank: #5583

## Install

```shell
npx skills add https://github.com/casper-studios/casper-marketplace --skill firecrawl-scraping
```

## Overview

Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.

## Quick Decision Tree

```
What are you scraping?
│
├── Single page (article, blog, docs)
│   └── references/single-page.md
│       └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    └── references/website-crawler.md
        └── (Use Apify Website Content Crawler for multi-page)
```

## Environment Setup

Required in `.env`:

```
FIRECRAWL_API_KEY=fc-your-api-key-here
```

Get your API key: https://firecrawl.dev/app/api-keys
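A minimal way to load the key in Python without extra dependencies is to parse the `.env` file yourself. This is a sketch; the `load_env` helper below is hypothetical and not part of the skill's scripts (a real project might use `python-dotenv` instead):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments; does not overwrite variables
    that are already set in the environment.
    """
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # fall back to whatever is already in the environment

if __name__ == "__main__":
    load_env()
    key = os.environ.get("FIRECRAWL_API_KEY")
    print("key loaded" if key else "FIRECRAWL_API_KEY is not set")
```

Using `setdefault` means a key exported in the shell takes precedence over the `.env` file, which keeps the "environment variables, not hardcoded values" rule intact.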
## Common Usage

**Simple scrape:**

```shell
python scripts/firecrawl_scrape.py "https://example.com/article"
```

**With options:**

```shell
python scripts/firecrawl_scrape.py "https://wsj.com/article" \
  --proxy stealth \
  --format markdown summary \
  --timeout 60000
```
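Internally, a script like this presumably maps the CLI flags onto a JSON body for Firecrawl's scrape endpoint. The sketch below shows one plausible mapping; the field names (`formats`, `proxy`, `timeout`) are assumptions based on Firecrawl's v1 API, not verified against this skill's script:

```python
def build_scrape_payload(url, formats=("markdown",), proxy="auto", timeout=30000):
    """Map CLI-style options onto a Firecrawl-like scrape request body."""
    allowed_proxies = {"basic", "stealth", "auto"}
    if proxy not in allowed_proxies:
        raise ValueError(f"unknown proxy mode: {proxy!r}")
    return {
        "url": url,
        "formats": list(formats),
        "proxy": proxy,
        "timeout": timeout,  # milliseconds, matching the --timeout flag
    }

payload = build_scrape_payload(
    "https://wsj.com/article",
    formats=("markdown", "summary"),
    proxy="stealth",
    timeout=60000,
)
```

The payload would then be POSTed with the API key in an `Authorization: Bearer` header; validating the proxy mode client-side gives a clearer error than a round-trip to the API.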
## Proxy Modes

| Mode | Use Case |
|------|----------|
| `basic` | Standard sites, fastest |
| `stealth` | Anti-bot protection, premium content (WSJ, NYT) |
| `auto` | Let Firecrawl decide (recommended) |
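If you wanted to resolve the mode client-side instead of relying on `auto`, it could look like the heuristic below. This is purely illustrative: Firecrawl's real `auto` mode decides server-side, and the `PROTECTED_DOMAINS` set is an assumed example list, not an actual allowlist from the service:

```python
from urllib.parse import urlparse

# Illustrative, not exhaustive: sites the table above flags as anti-bot/premium.
PROTECTED_DOMAINS = {"wsj.com", "nytimes.com"}

def pick_proxy(url, mode="auto"):
    """Return an explicit mode unchanged; resolve 'auto' locally by
    choosing stealth for known protected domains and basic otherwise."""
    if mode != "auto":
        return mode
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return "stealth" if host in PROTECTED_DOMAINS else "basic"
```

Defaulting to `basic` keeps costs down, since the cost section below notes that stealth may consume extra credits.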
## Output Formats

- `markdown`: Clean markdown content (default)
- `html`: Raw HTML
- `summary`: AI-generated summary
- `screenshot`: Page screenshot
- `links`: All links on the page
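A wrapper script can reject typos in `--format` before spending credits on a request. A small sketch, assuming the five documented formats above are the complete set (the helper itself is hypothetical):

```python
VALID_FORMATS = {"markdown", "html", "summary", "screenshot", "links"}

def validate_formats(requested):
    """Return the requested formats, defaulting to markdown when empty
    and rejecting anything outside the documented set."""
    formats = list(requested) or ["markdown"]
    unknown = set(formats) - VALID_FORMATS
    if unknown:
        raise ValueError(f"unsupported format(s): {sorted(unknown)}")
    return formats
```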
## Cost

~1 credit per page. Stealth proxy may use additional credits.
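For budgeting a batch job, a rough estimator can be built from the numbers above. The stealth surcharge is deliberately a parameter: the document only says stealth "may use additional credits", so the default of 4 extra credits per page is a placeholder assumption, not Firecrawl's actual pricing:

```python
def estimate_credits(pages, stealth=False, stealth_surcharge=4):
    """Rough cost estimate: ~1 credit per page, plus an assumed per-page
    surcharge when the stealth proxy is used. Check Firecrawl's pricing
    for the real surcharge before relying on this number."""
    per_page = 1 + (stealth_surcharge if stealth else 0)
    return pages * per_page
```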
## Security Notes

### Credential Handling

- Store `FIRECRAWL_API_KEY` in a `.env` file (never commit it to git)
- API keys can be regenerated at https://firecrawl.dev/app/api-keys
- Never log or print API keys in script output
- Use environment variables, not hardcoded values
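One way to honor the "never log or print API keys" rule is to scrub output before it is written anywhere. A small sketch; the `fc-` prefix pattern is an assumption based on the example key format above:

```python
import re

# Assumes Firecrawl keys look like "fc-<alphanumeric>", per the example key.
_KEY_PATTERN = re.compile(r"fc-[A-Za-z0-9_-]+")

def mask_keys(text):
    """Redact Firecrawl-style API keys from text before logging it."""
    return _KEY_PATTERN.sub("fc-***", text)
```

Running every log line through `mask_keys` makes it harder to leak a key via debug output or a pasted traceback.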
### Data Privacy

- Only scrapes publicly accessible web pages
- Scraped content is processed by Firecrawl servers temporarily
- Markdown output is stored locally in the `.tmp/` directory
- Screenshots (if requested) are stored locally
- No persistent data retention by Firecrawl after the request
### Access Scopes

- The API key provides full access to scraping features
- No granular permission scopes are available
- Monitor usage via the Firecrawl dashboard
### Compliance Considerations

- **Robots.txt**: Firecrawl respects robots.txt by default
- **Public content only**: Only scrape publicly accessible pages
- **Terms of Service**: Respect the target site's ToS
- **Rate limiting**: Built-in rate limiting prevents abuse
- **Stealth proxy**: Use stealth mode only when necessary (paywalled news, not auth bypass)
- **GDPR**: Scraped content may contain PII; handle it accordingly
- **Copyright**: Respect the intellectual property rights of scraped content

## Troubleshooting

### Common Issues

**Issue: Credits exhausted**

- Symptoms: API returns "insufficient credits" or a quota-exceeded error
- Cause: Account credits depleted
- Solution:
  - Check your credit balance at https://firecrawl.dev/app
  - Upgrade your plan or purchase additional credits
  - Reduce scraping frequency
  - Use basic proxy mode to conserve credits

**Issue: Page not rendering correctly**

- Symptoms: Empty content or partial HTML returned
- Cause: JavaScript-heavy page not fully loading
- Solution:
  - Enable JavaScript rendering with the `--js-render` flag
  - Increase the timeout with `--timeout 60000` (60 seconds)
  - Try stealth proxy mode for protected sites
  - Wait for specific elements with `--wait-for selector`

**Issue: 403 Forbidden error**

- Symptoms: Script returns a 403 status code
- Cause: Site blocking automated access
- Solution:
  - Enable stealth proxy mode
  - Add a delay between requests
  - Try at different times (some sites rate limit by time of day)
  - Check whether the site requires login (not supported)

**Issue: Empty markdown output**

- Symptoms: Scrape succeeds but the markdown is empty or malformed
- Cause: Dynamic content loaded after page load, or an unusual page structure
- Solution:
  - Increase the wait time for JavaScript to execute
  - Use `--wait-for` to wait for specific content
  - Try the `html` format to inspect the raw content
  - Check whether the content is in an iframe (not always supported)

**Issue: Timeout errors**

- Symptoms: Request times out before completion
- Cause: Slow page load or large page content
- Solution:
  - Increase the timeout value (up to 120000 ms)
  - Use the basic proxy for a faster response
  - Target specific page sections if possible
  - Check whether the site is experiencing issues

## Resources

- references/single-page.md - Single-page scraping details
- references/website-crawler.md - Multi-page website crawling

## Integration Patterns

### Scrape and Analyze

- Skills: firecrawl-scraping → parallel-research
- Use case: Scrape competitor pages, then analyze content strategy
- Flow:
  1. Scrape competitor website pages with Firecrawl
  2. Convert to clean markdown
  3. Use parallel-research to analyze positioning, messaging, and features

### Scrape and Document

- Skills: firecrawl-scraping → content-generation
- Use case: Create summary documents from web research
- Flow:
  1. Scrape multiple article pages on a topic
  2. Combine the markdown content
  3. Generate a summary document via content-generation

### Scrape and Enrich CRM

- Skills: firecrawl-scraping → attio-crm
- Use case: Enrich company records with website data
- Flow:
  1. Scrape the company website (about page, team page, product pages)
  2. Extract key information (funding, team size, products)
  3. Update the company record in Attio CRM with the enriched data
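Several of the troubleshooting fixes (403s, timeouts, rate limits) amount to "retry with a growing delay". A minimal exponential-backoff schedule, as a sketch; the helper name and defaults are illustrative and not part of the skill's scripts:

```python
import random

def backoff_delays(attempts=4, base=1.0, cap=30.0, jitter=False):
    """Exponential backoff schedule in seconds for retrying transient
    failures: base, 2*base, 4*base, ... capped at `cap`. Optional jitter
    spreads out retries so parallel scrapers don't hammer a site in sync.
    """
    delays = []
    for i in range(attempts):
        delay = min(cap, base * (2 ** i))
        if jitter:
            delay *= random.uniform(0.5, 1.5)
        delays.append(delay)
    return delays
```

A caller would `time.sleep(delay)` between attempts and give up after the schedule is exhausted, surfacing the last error to the user.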