Firecrawl Scraping

Overview

Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.

Quick Decision Tree

What are you scraping?
│
├── Single page (article, blog, docs)
│   └── references/single-page.md
│   └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    └── references/website-crawler.md
    └── (Use Apify Website Content Crawler for multi-page)

Environment Setup
Required in .env:

FIRECRAWL_API_KEY=fc-your-api-key-here

Get your API key: https://firecrawl.dev/app/api-keys
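The .env setup above can be read from Python without a third-party package. This is a minimal sketch that assumes a simple `KEY=value` file in the working directory. `load_dotenv_file` and `get_api_key` are hypothetical helpers, and scripts/firecrawl_scrape.py may load the key differently (for example with python-dotenv).

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> None:
    """Copy KEY=value pairs from a .env file into os.environ (hypothetical helper).

    Simplified parser: skips blank lines and # comments, strips matching
    surrounding quotes, and never overrides variables that are already set.
    Inline comments after a value are NOT handled.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)


def get_api_key() -> str:
    """Return FIRECRAWL_API_KEY, failing with a message that never echoes a secret."""
    load_dotenv_file()
    key = os.environ.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set; add it to .env")
    return key
```

Using `setdefault` means a key exported in the shell wins over the file, which keeps CI secrets and local .env files from silently overwriting each other.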
Common Usage

Simple Scrape

python scripts/firecrawl_scrape.py "https://example.com/article"
With Options

python scripts/firecrawl_scrape.py "https://wsj.com/article" \
  --proxy stealth \
  --format markdown summary \
  --timeout 60000
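When the script is driven from other Python code, building the argument list explicitly avoids shell-quoting problems with URLs that contain `&` or `?`. This sketch uses only the flags shown above (`--proxy`, `--format`, `--timeout`). `build_scrape_command` and `run_scrape` are hypothetical helpers, and where the script writes its output (stdout or .tmp/) is not specified here, so `run_scrape` simply returns the completed process.

```python
import subprocess
import sys

SCRIPT = "scripts/firecrawl_scrape.py"


def build_scrape_command(url, proxy=None, formats=None, timeout_ms=None):
    """Build the argv for the CLI invocations shown above (hypothetical helper)."""
    # sys.executable plays the role of `python`, so the current interpreter/venv is used.
    cmd = [sys.executable, SCRIPT, url]
    if proxy is not None:
        cmd += ["--proxy", proxy]
    if formats:
        cmd += ["--format", *formats]  # --format takes one or more values
    if timeout_ms is not None:
        cmd += ["--timeout", str(timeout_ms)]
    return cmd


def run_scrape(url, **options):
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_scrape_command(url, **options)
    return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

For example, `build_scrape_command("https://wsj.com/article", proxy="stealth", formats=["markdown", "summary"], timeout_ms=60000)` reproduces the "With Options" command as a list.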
Proxy Modes

| Mode | Use Case |
|------|----------|
| basic | Standard sites, fastest |
| stealth | Anti-bot protection, premium content (WSJ, NYT) |
| auto | Let Firecrawl decide (recommended) |
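If you choose modes yourself instead of using `auto`, one pattern consistent with the troubleshooting advice is to start with `basic` and escalate to `stealth` only after a 403, which conserves credits and keeps stealth use to cases where it is necessary. This is a minimal sketch. The `scrape` callable is a hypothetical stand-in for however you actually call Firecrawl, assumed to return `(status_code, content)`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface: scrape(url, proxy_mode) -> (status_code, content)
ScrapeFn = Callable[[str, str], Tuple[int, Optional[str]]]


def scrape_with_escalation(url: str, scrape: ScrapeFn,
                           modes=("basic", "stealth")) -> Tuple[str, str]:
    """Try cheaper proxy modes first; move to the next mode only on HTTP 403.

    Returns (mode_used, content). Any other non-200 status is raised at once,
    because switching proxies will not fix problems such as a 404.
    """
    for mode in modes:
        status, content = scrape(url, mode)
        if status == 403:
            continue  # blocked: try the next, more expensive mode
        if status == 200:
            return mode, content or ""
        raise RuntimeError(f"{url}: HTTP {status} with proxy={mode}")
    raise RuntimeError(f"{url}: blocked (HTTP 403) in every mode tried: {list(modes)}")
```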
Output Formats

- markdown: Clean markdown content (default)
- html: Raw HTML
- summary: AI-generated summary
- screenshot: Page screenshot
- links: All links on the page
Cost

~1 credit per page. Stealth proxy may use additional credits.
Security Notes

Credential Handling

- Store FIRECRAWL_API_KEY in a .env file (never commit it to git)
- API keys can be regenerated at https://firecrawl.dev/app/api-keys
- Never log or print API keys in script output
- Use environment variables, not hardcoded values
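One way to enforce "never log or print API keys" is a standard-library `logging.Filter` that rewrites any record containing the key. This is a sketch, not part of the existing script; `RedactSecretFilter` and the logger name `firecrawl_scrape` are hypothetical.

```python
import logging
import os


class RedactSecretFilter(logging.Filter):
    """Replace a secret value with a placeholder in every log record it sees."""

    def __init__(self, secret: str, placeholder: str = "[REDACTED]"):
        super().__init__()
        self.secret = secret
        self.placeholder = placeholder

    def filter(self, record: logging.LogRecord) -> bool:
        if self.secret:
            message = record.getMessage()  # merge msg % args before searching
            if self.secret in message:
                record.msg = message.replace(self.secret, self.placeholder)
                record.args = None  # message is already formatted
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("firecrawl_scrape")  # hypothetical logger name
logger.addFilter(RedactSecretFilter(os.environ.get("FIRECRAWL_API_KEY", "")))
```

A filter attached to a logger only sees records logged directly on that logger, not ones propagated from child loggers. Attach it to a handler instead if every record passing through that handler must be covered. Tracebacks (`exc_info`) are not rewritten by this sketch.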
Data Privacy

- Only scrapes publicly accessible web pages
- Scraped content is processed temporarily by Firecrawl servers
- Markdown output is stored locally in the .tmp/ directory
- Screenshots (if requested) are stored locally
- No persistent data retention by Firecrawl after the request
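The local-storage behavior above can be sketched as follows. The script's actual file-naming scheme is not documented here, so `output_path_for` and `save_markdown` are hypothetical helpers. Because scraped content may contain PII (see the GDPR note under Compliance Considerations), the sketch optionally masks email addresses before writing. This is a simple regex heuristic and does not catch names, phone numbers, or other personal data.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Simple heuristic pattern for email addresses; not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def output_path_for(url: str, out_dir: str = ".tmp") -> Path:
    """Derive a filesystem-safe .md filename from a URL's host and path."""
    parts = urlparse(url)
    stem = f"{parts.netloc}{parts.path}".strip("/") or "page"
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem)  # e.g. '/' becomes '_'
    return Path(out_dir) / f"{stem}.md"


def save_markdown(url: str, markdown: str, out_dir: str = ".tmp",
                  mask_emails: bool = True) -> Path:
    """Write scraped markdown under out_dir, optionally masking email addresses."""
    if mask_emails:
        markdown = EMAIL_RE.sub("[email removed]", markdown)
    path = output_path_for(url, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

Query strings are ignored when naming files, so two URLs that differ only by query parameters would overwrite each other. Add a short hash of the full URL to the name if that matters.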
Access Scopes

- The API key provides full access to scraping features
- No granular permission scopes are available
- Monitor usage via the Firecrawl dashboard
Compliance Considerations

- Robots.txt: Firecrawl respects robots.txt by default
- Public Content Only: Only scrape publicly accessible pages
- Terms of Service: Respect the target site's ToS
- Rate Limiting: Built-in rate limiting prevents abuse
- Stealth Proxy: Use stealth mode only when necessary (paywalled news, not auth bypass)
- GDPR: Scraped content may contain PII; handle it accordingly
- Copyright: Respect the intellectual property rights of scraped content

Troubleshooting

Common Issues

Issue: Credits exhausted
- Symptoms: API returns "insufficient credits" or a quota-exceeded error
- Cause: Account credits depleted
- Solution:
  - Check your credit balance at https://firecrawl.dev/app
  - Upgrade your plan or purchase additional credits
  - Reduce scraping frequency
  - Use basic proxy mode to conserve credits

Issue: Page not rendering correctly
- Symptoms: Empty content or partial HTML returned
- Cause: JavaScript-heavy page not fully loading
- Solution:
  - Enable JavaScript rendering with the --js-render flag
  - Increase the timeout with --timeout 60000 (60 seconds)
  - Try stealth proxy mode for protected sites
  - Wait for specific elements with --wait-for <selector>

Issue: 403 Forbidden error
- Symptoms: Script returns a 403 status code
- Cause: Site blocking automated access
- Solution:
  - Enable stealth proxy mode
  - Add a delay between requests
  - Try at different times (some sites rate limit by time)
  - Check whether the site requires login (not supported)

Issue: Empty markdown output
- Symptoms: Scrape succeeds but the markdown is empty or malformed
- Cause: Dynamic content loaded after page load, or an unusual page structure
- Solution:
  - Increase the wait time for JavaScript to execute
  - Use --wait-for to wait for specific content
  - Try the html format to see the raw content
  - Check whether the content is in an iframe (not always supported)

Issue: Timeout errors
- Symptoms: Request times out before completion
- Cause: Slow page load or large page content
- Solution:
  - Increase the timeout value (up to 120000 ms)
  - Use basic proxy for a faster response
  - Target specific page sections if possible
  - Check whether the site is experiencing issues

Resources

- references/single-page.md: Single page scraping details
- references/website-crawler.md: Multi-page website crawling

Integration Patterns

Scrape and Analyze
- Skills: firecrawl-scraping → parallel-research
- Use case: Scrape competitor pages, then analyze content strategy
- Flow:
  1. Scrape competitor website pages with Firecrawl
  2. Convert them to clean markdown
  3. Use parallel-research to analyze positioning, messaging, and features

Scrape and Document
- Skills: firecrawl-scraping → content-generation
- Use case: Create summary documents from web research
- Flow:
  1. Scrape multiple article pages on a topic
  2. Combine the markdown content
  3. Generate a summary document via content-generation

Scrape and Enrich CRM
- Skills: firecrawl-scraping → attio-crm
- Use case: Enrich company records with website data
- Flow:
  1. Scrape the company website (about page, team page, product pages)
  2. Extract key information (funding, team size, products)
  3. Update the company record in Attio CRM with the enriched data
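The "Combine the markdown content" step of Scrape and Document can be sketched as a plain concatenation that keeps each page as its own section, so the summary step can tell which page a passage came from. `combine_markdown` is a hypothetical helper, and the `# ` / `## Source:` heading layout is an assumption, not a format content-generation requires.

```python
from pathlib import Path
from typing import Iterable


def combine_markdown(files: Iterable[Path], title: str) -> str:
    """Concatenate scraped markdown files into one document (hypothetical helper).

    Files are processed in sorted path order; each non-empty file becomes a
    section headed by its filename stem. Empty scrapes are skipped.
    """
    sections = [f"# {title}"]
    for path in sorted(files):
        body = path.read_text(encoding="utf-8").strip()
        if not body:
            continue  # skip empty scrapes (see "Empty markdown output")
        sections.append(f"## Source: {path.stem}\n\n{body}")
    return "\n\n".join(sections) + "\n"
```

For example, `combine_markdown(Path(".tmp").glob("*.md"), "Topic research")` would gather every markdown file saved in .tmp/.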